Thursday 19 June 2008

Project Euler, Problem 13: Summing big integers

Today we'll look at a Project Euler problem that takes us right back to primary school. Problem 13 asks us to sum together 100 big integers - each with 50 digits. 50 digits puts the numbers right outside the range of the Int32 datatype, and even long can't handle numbers that big; double obviously could, but not with the precision that we need. It looks like we'll need to dig deeper to solve this one - which I'm sure was Euler's intention in setting the problem!

We could use one of the BigInteger implementations that are available (in the J# library with .Net, or on CodeProject - but not the one that failed to make it into .Net 3.5!). However, since my goal with this series is to learn functional programming techniques with C#, and oil my rusty mathematics, I'm going to show how to do the addition from scratch. Don't expect amazing performance or a reference implementation for long integer addition in what I show here: this is a functional programming tutorial, not an unsafe C# tutorial.

So how do we sum up numbers? Please excuse me for a moment whilst I imagine myself back to school, to the tiny desks and the classrooms smelling of disinfectant, in front of the squeaky blackboard where Mrs Jones is teaching us how to do long hand addition.

Long hand addition

It all comes flooding back:

  1. Write the digits of the numbers in columns. Draw a line underneath (neatly, with a ruler!). Write a plus sign to the left.
  2. Start at the right hand column (the "tens column"), and add up all its digits
  3. Write the last digit of the column's sum at the bottom of the column; the other digits should be written in the column to the left (the "hundreds column")- this is the carry over.
  4. Move to the next column. Sum up its digits, and also add the carry over from the previous column. Split up the sum as before: the last digit is written to the column, the first digits in the column to the left.
  5. Carry on till all the columns have been added up, or Darren spills paint over my workbook.
  6. Read off the answer from underneath the line.

Let's code up a solution that works in the same way.

For convenience, we'll just paste in the block of numbers from the Project Euler website as a big string; as in a previous solution, a LINQ query effortlessly chops and changes the string into a form we can work with. Since the 50 digit numbers are too long to be stored as numbers, we'll represent them as sequences of digits - and to make things easier later on, we'll store them with the least-significant digit (the digit representing the "units") first. I'm storing each digit as an int32 which is horribly inefficient, but helps keeps the code simple. The LINQ query gives us a sequence of these sequences of digits, as you can see:

const string Input = @"37107287533902102798797998220837590246510135740250
                     46376937677490009712648124896970078050417018260538
                        ...
   ";

// split the input string block into lines
var numbers = Input.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    // convert each line of the strings to a sequence of digits:
    .Select(line =>
        // process the line as individual characters
        line.ToCharArray()
        // ignore any whitespace characters
        .Where(c => char.IsDigit(c))
        .Select(c => int.Parse(c.ToString()))
        // reverse the sequence putting least significant digits first; this makes
        // subsequent processing easier
        .Reverse()
        );

That effectively completes step 1 above, if you imagine all the sequences of digits laid out with columns aligning, and are prepared to forgo the neat underline that Mrs Jones insisted on in my workbook.

Now things begin to get interesting, because for step 2 we need to process one column of digits at a time - which means taking one element from each of the sequences of digits before moving on to the next position in the sequence. This is different from anything we've done before, where we have always processed sequences sequentially - one entire sequence after another. To handle sequences in this new way we need to introduce another staple of functional programming, the Zip function.

Zip in this context has nothing to do with compression. Instead you need to be thinking pants - or trousers, as we in the UK call them. Specifically, think of the zipper, the status of which some gentlemen get so paranoid about. As you pull the zipper up, it pulls together the two separate halves of the fly and joins them. The Zip function works just like that: it draws together two (or more) sequences, creating a new sequence where each element is made up of the elements from one point of all the independent sequences.

This is my implementation of Zip in C# 3.0:

public static IEnumerable<IEnumerable<T>> Zip<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    // get the enumerators for each sequence
    var enumerators = sequences.Select(sequence => sequence.GetEnumerator()).ToList();

    // make a list of the enumerators that still have elements to yield
    var activeEnumerators = new List<IEnumerator<T>>();
    var items = new List<T>();

    while (enumerators.Count > 0)
    {
        items.Clear();
        activeEnumerators.Clear();

        // take an element from each iterator
        // if the iterator doesn't have any more elements/
        // don't add it to the activeEnumerators list
        foreach (var enumerator in enumerators)
        {
            if (enumerator.MoveNext())
            {
                items.Add(enumerator.Current);
                activeEnumerators.Add(enumerator);
            }
        }

        yield return items;

        enumerators.Clear();
        enumerators.AddRange(activeEnumerators);
    }
}

This implementation differs from others you might see around because it keeps going while any of the independent sequences have elements - it's greedy; you can easily change it to match the usual, frugal, implementations which terminate the output sequence as soon as any of the input sequences dry up by putting a yield break in an else clause after if (enumerator.MoveNext()).

If we apply this Zip method to our list of digit sequences it will produce for us a sequence where each element is effectively a column of digits in our sum, which is just what we need.

The next step is to add up each of the columns, doing the carry-over from one to the next. This is just the kind of job that Aggregate likes to do. Each iteration of the Aggregate method will add up the digits of one column of numbers, push the appropriate digit onto a stack that we'll build up representing the digits of the result, and record the carry-over ready for the next iteration. What do we do with the carry-over when we get to the last column? Well, we need to stick its digits in front of the other digits we've calculated, just as when we're doing long-hand division. The Aggregate method has an overload that allows us to apply an operation to the final accumulation variable; in our case the final accumulation variable will contain all the digits of the sum, and a separate record of the final carry-over as a number, so we'll supply a method that will break the carry-over into its component digits, and append it to the sequence of sum digits:

// calculate the sum of the numbers, returning the result as a sequence of integers
// representing the digits of the sum.
var sumDigits = numbers
                // process the entire list of numbers column by column
                // ie, process least significant digits of all numbers first, then the next column of digits, etc.
                .Zip()
                // work through each column, calculating which digit represents the sum for that column,
                // and what the carryover is
                .Aggregate(
                    // initialise an annoymous data structure to hold our workings out as we
                    // move column-by-column.
                    // Digits holds the digits of the sum that we've calculated so far
                    // CarryOver holds whatever has been carried over from the previous column
                    new { Digits = Stack<int>.Empty, CarryOver = 0},
                    (previousResult, columnDigits)=>
                        {
                            var columnSum = columnDigits.Sum() + previousResult.CarryOver;
                            var nextDigit = columnSum % 10;
                            var carryOver = columnSum / 10;
                            return new { Digits = previousResult.Digits.Push(nextDigit), CarryOver = carryOver };
                        },
                    // to complete the aggregation, we need to add the digits of the final carry-over
                    // to the digits of the sum;
                    // note that because completeSum.Digits is a stack, when it is enumerated
                    // we get the items back in reverse order - ie most significant digit first; hence the call to Reverse()
                    // to make the order of the digits in the sum we return consistent with the input digits
                    (completeSum) => completeSum.CarryOver == 0 ? completeSum.Digits.Reverse() : completeSum.CarryOver.Digits().Concat(completeSum.Digits).Reverse()
                    );

Note that I'm making use of Eric Lippert's Immutable Stack implementation to store the digits of the sum as we accumulate them.

The problem actually asked us to find the first ten digits of this sum. A simple query will extract those for us:

// as an answer, we're asked to give the 10 most significant digits of the sum
return sumDigits.Reverse()
    .Take(10)
    .Aggregate(new StringBuilder(), (sb, digit) => sb.Append(digit), sb => sb.ToString())
    .ToLong();

The full code is available on my Project Euler page on MSDN Code Gallery

Footnotes:

[Update 23/6/2008] Corrected the implementation of the "(completeSum) => ..." lambda

2 comments:

Delan said...

While this is a great solution, it does turn out that in this case, as only 10 digits are required, and double-precision floats give about 16 digits of precision, simply adding up the numbers gives the correct answer. The code I used was:

#include
double in[100] = {
37107287533902102798797998220837590246510135740250.0,
...
};
int main() {
        double j = 0.0; int i;
        for (i = 0; i < 100; i++)
                j += in[i];
        printf("%f\n", j);
        return 0;
}

Jsmith said...

nice delan

Post a Comment