Monday 28 April 2008

Project Euler Problem 8

Problem 8 in the the Project Euler series is pure number crunching. The site gives a 1000 digit number, and asks us to find the maximum product of 5 consecutive digits within this number. No number theory required here then.

My idea for solving this problem is to create a sliding window into the big number. The window will be 5 digits wide. We'll position it over the first digit, record the product of all the visible digits, then slide it along to the next position. When we can't slide it any further (when we're 5 digits from the end of the number), then we'll look over all the products we've recorded, and report the maximum.

The first problem in writing the code is to know how to deal with the 1000 digit number. C# cannot deal with such a beast by treating it as a number, and in fact that wouldn't be helpful anyway. We need to look at the digits individually. So what we'll do is supply the number as a string, then convert it into a list of integers, each integer representing one digit. This is what the code looks like:

const string InputDigits = @"73167176531330624919225119674426574742355349194934
                            96983520312774506326239578318016984801869478851843
                            85861560789112949495459501737958331952853208805511
                            12540698747158523863050715693290963295227443043557
                            66896648950445244523161731856403098711121722383113
                            62229893423380308135336276614282806444486645238749
                            30358907296290491560440772390713810515859307960866
                            70172427121883998797908792274921901699720888093776
                            65727333001053367881220235421809751254540594752243
                            52584907711670556013604839586446706324415722155397
                            53697817977846174064955149290862569321978468622482
                            83972241375657056057490261407972968652414535100474
                            82166370484403199890008895243450658541227588666881
                            16427171479924442928230863465674813919123162824586
                            17866458359124566529476545682848912883142607690042
                            24219022671055626321111109370544217506941658960408
                            07198403850962455444362981230987879927244284909188
                            84580156166097919133875499200524063689912560717606
                            05886116467109405077541002256983155200055935729725
                            7163626956188267042825248360082325753042075296345";

var digits = InputDigits
   .ToCharArray()
   .Where(c => char.IsDigit(c))
   .Select(c=> int.Parse(c.ToString()))
   .ToList();

Notice that I've used an "@" quoted string to allow it to span multiple lines, so as to preserve the presentation of the number. To create the List digits we first break up the string into individual characters with ToCharArray(). The Where clause filters out any new line or space characters from the sequence (they're there because of the way I've formatted the string). Then the Select clause converts each character into its corresponding number. Finally the ToList() method causes the sequence of integers to be evaluated and put into a List.

Now that we have our list of digits, the next step is to run our sliding window over it. This will create a list of quintuplets for us - mini-sequences, each containing 5 digits. I'm going to show two ways of doing this: one using my Lazy Trampoline from last time, and a second method that, in this case, evaluates much quicker.

The idea with using the Lazy Trampoline is to process the sequence of digits recursively. Remember that the Lazy Trampoline allows a tail recursive function to return a result after each iteration. Given a sequence, we'll return a sequence consisting of the first 5 elements; then we'll drop the first element of the sequence, and recurse, passing the remainder of the sequence as the parameter. The recursion stops when there are fewer than 5 elements left. Here's the code:

var quintupletsSelector = Trampoline.MakeLazyTrampoline((IEnumerable<int> digitsSequence) =>
{
   // if there are five elements remaining in the sequence
   if (digitsSequence.Skip(5).Any())
   {
       return Trampoline.YieldAndRecurse(digitsSequence.Take(5), digitsSequence.Skip(1));
   }
   else
   {
       return Trampoline.YieldBreak<IEnumerable<int>, IEnumerable<int>>();
   }
});

Executing this statement will cause quintupletsSelector to be initialised as a delegate that takes a sequence of integers, and returns a sequence of quintuplets. This is quite a nice demonstration of how you can recursively process a sequence. The problem is that this implementation is fairly slow (on my machine it takes about 20 seconds to produce all the quintuplets). The reason for that is hidden away behind the implementation of the Skip method. We're using it to create a new sequence that starts from the second member of the first sequence. The problem is that Skip works on sequences, not lists. This means that it can't just jump to the index you give it, it has to enumerate all the preceding items, so that it can ignore them. It may look like we're only asking it to Skip one item each time, but remember that it's working recursively. So we start off working with digitSequence. In the next iteration we're working with digitSequence.Skip(1), in the second digitSequence.Skip(1).Skip(1), and so on. Before any real work can be done, we have to run through the sequence, ignoring any digits that we've already processed. Not very efficient! In a real functional language, we would avoid this by using a data structure that actually allowed us to drop the first item in the list, and just work with the remainder in the next iteration. I might investigate such data structures soon, but I don't want to complicate things just yet.

So, in the shower this morning I thought up a faster way of producing the quintuplets. We'll treat the list as a list, rather than pretending its just a sequence. This will allow us to jump in at a particular index, rather than having to scan through from the beginning of the sequence each time.

I created an extension method RangeFrom that will extract a range from a List, starting from a given index:

public static IEnumerable<T> RangeFrom<T>(this IList<T> list, int startIndex, int count)
{
   for (var i = startIndex; i < startIndex + count; i++)
   {
       yield return list[i];
   }
}

With this extension method, we can now create a query expression to create the quintuplets. This completes almost instantaneously, as you would hope

var quintupletsSelector2 = from index in 0.To(digits.Count - 5)
                          select digits.RangeFrom(index, 5);

The final step is easy (since I've also defined an extension method to calculate the Product of a sequence):

quintupletsSelector2
   .Select(quintuplet => quintuplet.Product())
   .Max()
   .DisplayAndPause();

Or if you want to try out the Lazy Trampoline method:

quintupletsSelector(digits)
   .Select(quintuplet => quintuplet.Product())
   .Max()
   .DisplayAndPause();

Here's the complete sample (you'll also need the Trampoline code from last time)

public class Problem8
{
   public static void Main()
   {
       const string InputDigits = @"73167176531330624919225119674426574742355349194934
                                96983520312774506326239578318016984801869478851843
                                85861560789112949495459501737958331952853208805511
                                12540698747158523863050715693290963295227443043557
                                66896648950445244523161731856403098711121722383113
                                62229893423380308135336276614282806444486645238749
                                30358907296290491560440772390713810515859307960866
                                70172427121883998797908792274921901699720888093776
                                65727333001053367881220235421809751254540594752243
                                52584907711670556013604839586446706324415722155397
                                53697817977846174064955149290862569321978468622482
                                83972241375657056057490261407972968652414535100474
                                82166370484403199890008895243450658541227588666881
                                16427171479924442928230863465674813919123162824586
                                17866458359124566529476545682848912883142607690042
                                24219022671055626321111109370544217506941658960408
                                07198403850962455444362981230987879927244284909188
                                84580156166097919133875499200524063689912560717606
                                05886116467109405077541002256983155200055935729725
                                7163626956188267042825248360082325753042075296345";
       var digits = InputDigits
           .ToCharArray()
           .Where(c => char.IsDigit(c))
           .Select(c => int.Parse(c.ToString()))
           .ToList();

       var quintupletsSelector = Trampoline.MakeLazyTrampoline((IEnumerable<int> digitsSequence) =>
       {
           // if there are five elements remaining in the sequence
           if (digitsSequence.Skip(5).Any())
           {
               return Trampoline.YieldAndRecurse(digitsSequence.Take(5), digitsSequence.Skip(1));
           }
           else
           {
               return Trampoline.YieldBreak<IEnumerable<int>, IEnumerable<int>>();
           }
       });

       var quintupletsSelector2 = from index in 0.To(digits.Count - 5)
                                  select digits.RangeFrom(index, 5);

       // use the Trampoline method:
       quintupletsSelector(digits)
           .Select(quintuplet => quintuplet.Product())
           .Max()
           .DisplayAndPause();

       // use the fast method:
       quintupletsSelector2
           .Select(quintuplet => quintuplet.Product())
           .Max()
           .DisplayAndPause();
   }
}

public static class Extensions
{

   public static IEnumerable<int> To(this int start, int end)
   {
       for (int i = start; i <= end; i++)
       {
           yield return i;
       }
   }

   public static IEnumerable<T> RangeFrom<T>(this IList<T> list, int startIndex, int count)
   {
       for (var i = startIndex; i < startIndex + count; i++)
       {
           yield return list[i];
       }
   }

   public static int Product(this IEnumerable<int> factors)
   {
       int product = 1;

       foreach (var factor in factors)
       {
           product *= factor;
       }

       return product;
   }

   public static void DisplayAndPause(this object result)
   {
       Console.WriteLine(result);
       Console.ReadLine();
   }
}

7 comments:

Anonymous said...

A truly outstanding blog and excellent series of posts on the Project Euler problems in particular, and functional programming in general.

I just wanted to point out one minor problem in your solution.

var quintupletsSelector2 =
from index in 0.To(digits.Count - 6)
select digits.RangeFrom(index, 5);

This misses the last quintuplet in the series. It should be: digits.Count - 5

Cheers and here's looking forward to more posts. Thanks again and congratulations on your pending fatherhood!

Unknown said...

peterb,
I'm glad you like it. It makes my day when I get comments like that.

Thanks also for pointing out my off-by-one error: it should be corrected now.

Sam

Anonymous said...

I love project euler.....I solved his by using a similar method using C.......I used a Max vaiable which held the current max, a Current variable ......

T said...

These posts are really good for learning linq, thanks for writing them. Here's a more compact solution for problem 8. It takes about 150ms to finish.

class Problem8
{
public static long Functional()
{
string numberString = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";

//convert string to int ienumerable
IEnumerable numbers = numberString.ToCharArray()
.Select(item => int.Parse(item.ToString()));

return numbers.WindowProduct(5).Take(995).Max();
}
}

public static class Problem8Extensions
{
public static IEnumerable WindowProduct(this IEnumerable source, int size)
{
int counter = 0;
while (true)
{
yield return source.SkipWhile((number, index) => index <= counter)
.Take(size).Aggregate((x, y) => x * y);
counter++;
}
}
}

T said...

My last post should have <-T-> (without dashes) after each IEnumerable.

Unknown said...

how about:
maxProd = Enumerable.Range(0, digits.Length - 5).Max(i => Enumerable.Range(i, 5).Select(a => digits[a]).Aggregate((x, y) => x * y));

William Blackburn said...

I second peterb's notion.  

Post a Comment