Tuesday 6 January 2009

Project Euler 89: Converting to and from Roman Numerals

It would not surprise me at all to learn that the Roman Numeral system was invented by a Stone Mason with one eye on his retirement fund. As he charged per character chiseled he could expect a substantial fee whenever called upon to inscribe numbers; dates on tombstones were even better, because on the whole he could expect the length of the inscription and thus the fee to increase as the years progressed.

Everybody knows how the Roman Numeral system works, though the only time we usually get to show off our skills is when the end-credits roll on the the big or small screen and the date of film or programme production flashes up. It consists of 7 symbols, each standing for a value:

Numeral Value
I 1
V 5
X 10
L 50
C 100
D 500
M 1000

A number in Roman Numerals is just a string of these numerals written in descending order (e.g. M's first, followed by D's, etc.). To convert from Roman Numerals to decimal we just work through the string, adding up the values of the numerals. So MMVIIII represents 2 x 1000 + 1 x 5 + 4 x 1 = 2009.

The only complication is the subtractive rule. This rule makes it possible for some numbers to be written in a more compact form (I'm sure archaeologists will one day unearth a document recording the protests made by the Mason's Union on its introduction!). It allows a numeral of a lower value to be placed before one of a higher value to indicate that the lesser should be subtracted from the greater. Thus IV means 5 - 1 = 4, and XC = 100 - 10 = 90. Unfortunately, it's not quite that simple. According to PPI MMDCL1, only I, X and C can be used subtractively, and then only in front of numerals up to ten times as big. So I can only be used in front of V and X, X in front of L and C; and C in front of D and M.

Converting Numerals to Decimal Programmatically

It's the subtractive rule that stumped me at first when I tried to figure out an algorithm to parse roman numerals. It certainly makes things complicated if you think about parsing the string from left to right. But as soon as you start thinking backwards, that is working from right to left, the difficulty evaporates:

  1. Split the string into characters
  2. Convert each character into the value it represents
  3. Start from the right - the lowest value numeral
  4. Keep a running total, and a record of the maximum numeral encountered so far
  5. Take characters one by one:
    1. If character is greater than the maximum, add it to the running total and update the maximum
    2. If character is less than the maximum, subtract it from the running total

This converts straight-forwardly into LINQ:

public static int ConvertToDecimal(this string numerals)
{
    var result = numerals
        .ToCharArray()
        .Reverse()
        .Select(c => _numeralToNumberLookup[c.ToString()])
        .Aggregate(
            new { MaxValue = 0, RunningTotal = 0 },
            (state, item) => new
            {
                MaxValue = Math.Max(state.MaxValue, item),
                RunningTotal = item >= state.MaxValue ? state.RunningTotal + item 
                                                               : state.RunningTotal - item
            }, 
            aggregate => aggregate.RunningTotal);

    return result;
}

You'll note that I'm using an anonymous type in my call to Aggregate to allow me to record both pieces of state. Notice that I have to seed the call to Aggregate with an empty instance of the anonymous type (line 8). The C# compiler makes this possible by ensuring that any anonymous type that has properties with the same name and type and in the same order as another anonymous type actually become the same type. This overload of Aggregate also requires a lambda expression that extracts the overall result from the final aggregate object.

Converting Numbers to Numerals (with added recurserators!)

So that's Parsing taken care of. What about turning a number into numerals?

The algorithm isn't much more difficult, though the implementation I've chosen is a little unusual. The algorithm is recursive. We start off with the number we want to convert, and a stack containing the values of all the numerals that are available to us, greatest on the top. This stack includes the values of "composite numerals" - pairs of numerals that follow the subtractive rule, for example, 4 (IV), 9 (IX) and 40 (XL). In each iteration we peek at the numeral value on the top of the stack and test to see whether the number is divisible by it. If it is we've got us some numerals - we produce part of the final numeral string by looking up the appropriate numeral(s) and repeating them as many times as the number is divisible by the numeral value. If not, we recurse anyway. To recurse, we pop the top numeral off the stack, and pass the reduced list of numerals through to the next iteration, along with the remainder of the number after dividing by the numeral value. We terminate the recursion when we are asked to convert number 0 - which can't be done in roman numerals. If you understood all that, you have permission to skip the next paragraph; otherwise read on for a worked run-through of the algorithm.

For example, suppose we want to convert 2051. We start with the complete list of numerals: { 1000, 900, 500, 400, ... 50, 40, 10, 9, 1}. Peek at the first numeral value, and discover that it's 1000. 2051 is 2 x 1000, with remainder 51. Looking up 1000 in our numeral dictionary yields "M", so the first part of the numeral string is "MM". We're done with that iteration, so we remove 1000 from our list of numerals, and pass the smaller list, and the remainder 51 to the next iteration. The next few numeral values {900, 500, etc.} don't divide into 51, so we keep popping numerals, passing remainders and recursing till we get to a numeral list starting {50, 40, ...}.  51 is divisible by 50 (so we deduce that the next part of the numeral string is "L"). We then pop, pass and recurse until the number 1 is given for conversion, and the numeral list just contains {1}. At this point we add numeral "I" to the result, recursing hits the terminating condition, so we stop.

Now for the implementation. I could have used a straight-forward recursive function: I did this with my solution to Problem 31 (Counting Coin Combinations) which had a similar algorithm. But my Lazy Trampoline hasn't seen much action recently, so I decided to give it a bounce. The idea of the Lazy Trampoline is to create a recursive function that is able to yield a result after each iteration - a recurserator, you could call it.

It looks like this2:

public static string ConvertToNumerals(this int number)
{
    Func<int, IStack<int>, IEnumerable<string>> numeralsEnumerator = Trampoline.MakeLazyTrampoline((int numberToConvert, IStack<int> availableNumerals) =>
    {
        if (numberToConvert == 0)
        {
            return Trampoline.YieldBreak<int, IStack<int>, string>();
        }

        int currentNumeral = availableNumerals.Peek();

        int quotient = numberToConvert / currentNumeral;
        int remainder = numberToConvert % currentNumeral;

        string numberAsNumerals = _numberToNumeralLookup[currentNumeral].Repeat(quotient);

        return Trampoline.YieldAndRecurse<int, IStack<int>, string>(
            numberAsNumerals, 
            remainder, availableNumerals.Pop());
    }
    );

    var result = numeralsEnumerator(number, _availableNumerals)
             .Aggregate(
                new StringBuilder(),
                (sb, numerals) => sb.Append(numerals),
                sb => sb.ToString());

    return result;
}

The call to MakeLazyTrampoline (line 3) generates an iterator (numeralsEnumerator). Each item in the sequence generated by the iterator is a string consisting of one type of numeral. These are all combined by the query starting line 23 to create the complete numeral.

Within the recurserator, note that I return special values generated by Trampoline.YieldBreak and Trampoline.YieldAndRecurse. These control the behaviour of the recurserator. YieldBreak indicates that the recursion should stop now; YieldAndRecurse indicates (as its first parameter) which value should be returned in the sequence, and (in the second and third parameters) which values should be passed through to the next iteration.

It may have slipped under you radar, but in Line 15 I'm making a call to the Repeat method which doesn't exist on the String class. Here's the missing extension method definition:

public static string Repeat(this string value, int count)
{
    if (count == 0)
    {
        return string.Empty;
    }
    
    if (count == 1)
    {
        return value;
    }

    return Enumerable.Repeat(value, count)
        .Aggregate(new StringBuilder(), 
        (aggregate, item) => aggregate.Append(item), 
        aggregate => aggregate.ToString());
}

Solving Project Euler Problem 89

Project Euler Problem 89 gives us a file containing a load of numbers in Roman Numeral form, and asks us to calculate how many characters can be saved if the numbers are compacted using the subtractive rule. Given the two extension methods we've just implemented, the solution is a simple pair of LINQ queries:

[EulerProblem(89, Title = "Develop a method to express Roman numerals in minimal form.")]
class Problem89
{
    public long Solve()
    {
        var lines = File.ReadAllLines("ProblemData\\Problem89Data.txt");

        var numberOfCharacters = lines.Select(line => line.Length).Sum();

        var minimisedNumberOfCharacters = lines
            .Select(line => line.ConvertToDecimal())
            .Select(number => number.ConvertToNumerals())
            .Select(numerals => numerals.Length)
            .Sum();

        return numberOfCharacters - minimisedNumberOfCharacters;
    }
}

The full code is available, as always, on the MSDN Code Gallery page. Hope you like it Saevar.

Footnotes
  1. PPI - Prex pro Ineo. ie Request For Comment! Thanks, InterTran!
  2. Thanks again to Eric Lippert for his Immutable Stack code.

4 comments:

Anonymous said...

It's also perfectly acceptable to write 2009 as either MMVIIII or the shorter, subtractive version MMIX.

Jonathan Wood said...

Yikes. Very interesting using LINQ to do this. If anyone wants to see code to do this without using LINQ, check out http://www.blackbeltcoder.com/Articles/strings/generating-and-parsing-roman-numerals.

mappo said...

Tiny correction needed.
In the parsing section, step 5.1, one should add to the running total if current character is greater than _or_equal_to_ the maximum.

Step 5 may therefore be written as:
IF currentCharacter < maxumum
THEN subtract
ELSE add

David Anderson Lino de Sousa said...

IIX would break your code. It would output 8. Right?

Post a Comment