Monday, 28 April 2008

Project Euler Problem 8

Problem 8 in the the Project Euler series is pure number crunching. The site gives a 1000 digit number, and asks us to find the maximum product of 5 consecutive digits within this number. No number theory required here then.

My idea for solving this problem is to create a sliding window into the big number. The window will be 5 digits wide. We'll position it over the first digit, record the product of all the visible digits, then slide it along to the next position. When we can't slide it any further (when we're 5 digits from the end of the number), then we'll look over all the products we've recorded, and report the maximum.

The first problem in writing the code is to know how to deal with the 1000 digit number. C# cannot deal with such a beast by treating it as a number, and in fact that wouldn't be helpful anyway. We need to look at the digits individually. So what we'll do is supply the number as a string, then convert it into a list of integers, each integer representing one digit. This is what the code looks like:

const string InputDigits = @"73167176531330624919225119674426574742355349194934
                            96983520312774506326239578318016984801869478851843
                            85861560789112949495459501737958331952853208805511
                            12540698747158523863050715693290963295227443043557
                            66896648950445244523161731856403098711121722383113
                            62229893423380308135336276614282806444486645238749
                            30358907296290491560440772390713810515859307960866
                            70172427121883998797908792274921901699720888093776
                            65727333001053367881220235421809751254540594752243
                            52584907711670556013604839586446706324415722155397
                            53697817977846174064955149290862569321978468622482
                            83972241375657056057490261407972968652414535100474
                            82166370484403199890008895243450658541227588666881
                            16427171479924442928230863465674813919123162824586
                            17866458359124566529476545682848912883142607690042
                            24219022671055626321111109370544217506941658960408
                            07198403850962455444362981230987879927244284909188
                            84580156166097919133875499200524063689912560717606
                            05886116467109405077541002256983155200055935729725
                            7163626956188267042825248360082325753042075296345";

var digits = InputDigits
   .ToCharArray()
   .Where(c => char.IsDigit(c))
   .Select(c=> int.Parse(c.ToString()))
   .ToList();

Notice that I've used an "@" quoted string to allow it to span multiple lines, so as to preserve the presentation of the number. To create the List digits we first break up the string into individual characters with ToCharArray(). The Where clause filters out any new line or space characters from the sequence (they're there because of the way I've formatted the string). Then the Select clause converts each character into its corresponding number. Finally the ToList() method causes the sequence of integers to be evaluated and put into a List.

Now that we have our list of digits, the next step is to run our sliding window over it. This will create a list of quintuplets for us - mini-sequences, each containing 5 digits. I'm going to show two ways of doing this: one using my Lazy Trampoline from last time, and a second method that, in this case, evaluates much quicker.

The idea with using the Lazy Trampoline is to process the sequence of digits recursively. Remember that the Lazy Trampoline allows a tail recursive function to return a result after each iteration. Given a sequence, we'll return a sequence consisting of the first 5 elements; then we'll drop the first element of the sequence, and recurse, passing the remainder of the sequence as the parameter. The recursion stops when there are fewer than 5 elements left. Here's the code:

var quintupletsSelector = Trampoline.MakeLazyTrampoline((IEnumerable<int> digitsSequence) =>
{
   // if there are five elements remaining in the sequence
   if (digitsSequence.Skip(5).Any())
   {
       return Trampoline.YieldAndRecurse(digitsSequence.Take(5), digitsSequence.Skip(1));
   }
   else
   {
       return Trampoline.YieldBreak<IEnumerable<int>, IEnumerable<int>>();
   }
});

Executing this statement will cause quintupletsSelector to be initialised as a delegate that takes a sequence of integers, and returns a sequence of quintuplets. This is quite a nice demonstration of how you can recursively process a sequence. The problem is that this implementation is fairly slow (on my machine it takes about 20 seconds to produce all the quintuplets). The reason for that is hidden away behind the implementation of the Skip method. We're using it to create a new sequence that starts from the second member of the first sequence. The problem is that Skip works on sequences, not lists. This means that it can't just jump to the index you give it, it has to enumerate all the preceding items, so that it can ignore them. It may look like we're only asking it to Skip one item each time, but remember that it's working recursively. So we start off working with digitSequence. In the next iteration we're working with digitSequence.Skip(1), in the second digitSequence.Skip(1).Skip(1), and so on. Before any real work can be done, we have to run through the sequence, ignoring any digits that we've already processed. Not very efficient! In a real functional language, we would avoid this by using a data structure that actually allowed us to drop the first item in the list, and just work with the remainder in the next iteration. I might investigate such data structures soon, but I don't want to complicate things just yet.

So, in the shower this morning I thought up a faster way of producing the quintuplets. We'll treat the list as a list, rather than pretending its just a sequence. This will allow us to jump in at a particular index, rather than having to scan through from the beginning of the sequence each time.

I created an extension method RangeFrom that will extract a range from a List, starting from a given index:

public static IEnumerable<T> RangeFrom<T>(this IList<T> list, int startIndex, int count)
{
   for (var i = startIndex; i < startIndex + count; i++)
   {
       yield return list[i];
   }
}

With this extension method, we can now create a query expression to create the quintuplets. This completes almost instantaneously, as you would hope

var quintupletsSelector2 = from index in 0.To(digits.Count - 5)
                          select digits.RangeFrom(index, 5);

The final step is easy (since I've also defined an extension method to calculate the Product of a sequence):

quintupletsSelector2
   .Select(quintuplet => quintuplet.Product())
   .Max()
   .DisplayAndPause();

Or if you want to try out the Lazy Trampoline method:

quintupletsSelector(digits)
   .Select(quintuplet => quintuplet.Product())
   .Max()
   .DisplayAndPause();

Here's the complete sample (you'll also need the Trampoline code from last time)

public class Problem8
{
   public static void Main()
   {
       const string InputDigits = @"73167176531330624919225119674426574742355349194934
                                96983520312774506326239578318016984801869478851843
                                85861560789112949495459501737958331952853208805511
                                12540698747158523863050715693290963295227443043557
                                66896648950445244523161731856403098711121722383113
                                62229893423380308135336276614282806444486645238749
                                30358907296290491560440772390713810515859307960866
                                70172427121883998797908792274921901699720888093776
                                65727333001053367881220235421809751254540594752243
                                52584907711670556013604839586446706324415722155397
                                53697817977846174064955149290862569321978468622482
                                83972241375657056057490261407972968652414535100474
                                82166370484403199890008895243450658541227588666881
                                16427171479924442928230863465674813919123162824586
                                17866458359124566529476545682848912883142607690042
                                24219022671055626321111109370544217506941658960408
                                07198403850962455444362981230987879927244284909188
                                84580156166097919133875499200524063689912560717606
                                05886116467109405077541002256983155200055935729725
                                7163626956188267042825248360082325753042075296345";
       var digits = InputDigits
           .ToCharArray()
           .Where(c => char.IsDigit(c))
           .Select(c => int.Parse(c.ToString()))
           .ToList();

       var quintupletsSelector = Trampoline.MakeLazyTrampoline((IEnumerable<int> digitsSequence) =>
       {
           // if there are five elements remaining in the sequence
           if (digitsSequence.Skip(5).Any())
           {
               return Trampoline.YieldAndRecurse(digitsSequence.Take(5), digitsSequence.Skip(1));
           }
           else
           {
               return Trampoline.YieldBreak<IEnumerable<int>, IEnumerable<int>>();
           }
       });

       var quintupletsSelector2 = from index in 0.To(digits.Count - 5)
                                  select digits.RangeFrom(index, 5);

       // use the Trampoline method:
       quintupletsSelector(digits)
           .Select(quintuplet => quintuplet.Product())
           .Max()
           .DisplayAndPause();

       // use the fast method:
       quintupletsSelector2
           .Select(quintuplet => quintuplet.Product())
           .Max()
           .DisplayAndPause();
   }
}

public static class Extensions
{

   public static IEnumerable<int> To(this int start, int end)
   {
       for (int i = start; i <= end; i++)
       {
           yield return i;
       }
   }

   public static IEnumerable<T> RangeFrom<T>(this IList<T> list, int startIndex, int count)
   {
       for (var i = startIndex; i < startIndex + count; i++)
       {
           yield return list[i];
       }
   }

   public static int Product(this IEnumerable<int> factors)
   {
       int product = 1;

       foreach (var factor in factors)
       {
           product *= factor;
       }

       return product;
   }

   public static void DisplayAndPause(this object result)
   {
       Console.WriteLine(result);
       Console.ReadLine();
   }
}

Friday, 25 April 2008

Commenting

I've not had many comments on my posts: 5 in total so far. That could be because nobody's reading the blog; or maybe I'm not writing anything interesting enough to comment on.

But perhaps it's because commenting is too hard: up till now, Blogger has only been accepting comments from registered users. In the interests of getting some feedback, I'm now allowing annoymous comments. But don't get excited, spammers - I will be moderating everything before it's posted.

Thursday, 24 April 2008

Lazy Trampolining

A couple of epsiodes ago, I showed how you can avoid stack overflow in certain kinds of recursive function using a technique called trampolining. Just mentioning such a word made me feel out of breath, so I went on to develop a technique that I've called lazy trampolining, or lazy recursion.

The idea is to combine the trampoline that I created last time with an Iterator. This allows you to create functions that return a result at each iteration before recursing. Because it uses an Iterator, and Iterators are Lazy, you get to decide how long the recursion should carry on for. I'm still trying to work out how valuable this idea is, but since I've already come up with a couple of practical uses, I figured it would be worth sharing with the world.

The simplest example I've come up with so far is a Fibonacci sequence generator. You may remember that we defined one for Project Euler Problem 2 that used a while loop. Here's an equivalent generator using my Lazy Trampoline:

public static IEnumerable<int> FibonacciTerms(int a, int b)
{
   var recursiveFibonacciFunction = Trampoline.MakeLazyTrampoline((int x, int y) =>
                Trampoline.YieldAndRecurse(x + y, y, x + y)
                );

  return recursiveFibonacciFunction(a, b);
}

You can see that I'm calling the Trampoline.YieldAndRecurse method to indicate what I want to happen in the next iteration. I pass first the result from the current iteration that I want included in the output sequence, then I specify the parameters to be used for the next recursive call.

Now for a slightly longer example. I've taken the algorithm that I had for finding the largest prime factor of a number, and modified it so that it returns all the factors of the number as a sequence. In each iteration of the function, a new factor is found, and this is returned as a PrimeFactor object which contains both the prime number, and its multiplicity (how many times the factor divides into the target number).

        private static IEnumerable<PrimeFactor> PrimeFactors(long prime)
        {
            var function = Trampoline.MakeLazyTrampoline((long previousPrime, long remainder) =>
                {
                    // find next factor of number
                    long nextFactor = (previousPrime + 1).To(remainder)
                        .SkipWhile(x => remainder % x > 0)
                        .FirstOrDefault();

                    if (nextFactor == remainder)
                    {
                        return Trampoline.YieldBreak<long, long, PrimeFactor>(new PrimeFactor { Prime = remainder, Multiplicity = 1 });
                    }
                    else
                    {
                        // find its multiplicity
                        long multiplicity = Enumerable.Range(1, Int32.MaxValue)
                            .TakeWhile(x => remainder % (long)Math.Pow(nextFactor, x) == 0).Last();
                        long quotient = remainder / (long)Math.Pow(nextFactor, multiplicity);

                        PrimeFactor nextPrimeFactor = new PrimeFactor { Prime = nextFactor, Multiplicity = multiplicity };

                        if (quotient == 1)
                        {
                            return Trampoline.YieldBreak<long, long, PrimeFactor>(nextPrimeFactor);
                        }
                        else
                        {
                            return Trampoline.YieldAndRecurse<long, long, PrimeFactor>(nextPrimeFactor, nextFactor, quotient);
                        }
                    }
                }
                );

            return function(1, prime);
        }

Here I use the YieldBreak method: this is used to return one final result, and then terminate the recursion. In the code shown below, I've also included a YieldBreak method that just terminates without returning a final result.

This Lazy Trampoline technique appears to me to be a generalisation of the Unfold operation that we discussed last time. Unfold generates a sequence, but at each iteration you're only given the previous item in the sequence from which to generate the next item. With this technique, you get to decide what parameters to pass through to the next iteration.

This is what the code for the Lazy Trampoline looks like.

public static class Trampoline
{

    public static Func<T1, T2, IEnumerable<TResult>> MakeLazyTrampoline<T1, T2, TResult>(this Func<T1, T2, Bounce<T1, T2, TResult>> function)
    {
        return (T1 param1, T2 param2) => LazyTrampoline(function, param1, param2);
    }

    private static IEnumerable<TResult> LazyTrampoline<T1, T2, TResult>(Func<T1, T2, Bounce<T1, T2, TResult>> function, T1 param1, T2 param2)
    {
        T1 currentParam1 = param1;
        T2 currentParam2 = param2;

        while (true)
        {
            Bounce<T1, T2, TResult> result = function(currentParam1, currentParam2);

            if (result.HasResult)
            {
                yield return result.Result;
            }

            if (!result.Recurse)
            {
                yield break;
            }

            currentParam1 = result.Param1;
            currentParam2 = result.Param2;
        }
    }

    public static Bounce<T1, T2, TResult> YieldAndRecurse<T1, T2, TResult>(TResult result, T1 arg1, T2 arg2)
    {
        return new Bounce<T1, T2, TResult>(arg1, arg2, result);
    }

    public static Bounce<T1, T2, TResult> YieldBreak<T1, T2, TResult>()
    {
        return new Bounce<T1, T2, TResult>();
    }

    public static Bounce<T1, T2, TResult> YieldBreak<T1, T2, TResult>(TResult result)
    {
        return new Bounce<T1, T2, TResult>(result);
    }
}

As you can see, I had to be slightly creative in generating the recursive function in MakeLazyTrampoline. This is because the C# compiler won't allow you to create Iterators as lambda expressions (for very good reasons, no doubt). So I define the Iterator as its own method, then the lambda function that I actually return calls through to the Iterator to generate the sequence. As before, I've only shown the two-parameter version of the code. In the version I've included for download, there's a one-parameter version - you can easily extend it to any number of parameters.

You can download the Trampoline code, and the two examples shown above, here.

In the next episode, I'll show you how I used the Lazy Trampoline to solve Project Euler Problem 8.

Monday, 21 April 2008

Project Euler Problem 7 (and 10)

This post tackles Problem 7 in the Project Euler series.

A couple of episodes ago we introduced the Aggregate method. What this does is work its way through a sequence, summarising it in some way, until finally it produces a single result representing all the other items. I gave examples like finding the sum of a sequence, or computing the average value. "Aggregate" is what we call it if we speak C#: in other languages the same operation goes under the names "reduce", "accumulate", "compress" and perhaps most commonly, "fold".

These all convey the idea of working through a data structure (often a list or sequence, but not always), systematically combining the elements; because we're talking functional languages here, when you use one of these Fold operations you get to specify a function saying how the elements are to be combined.

Today, I'm going to introduce the opposite of the Fold operation, Unfold. (Aside: did anybody else have the thought that these ideas where invented by a gal keeping her mind off the tedium of doing the laundry?)

Where Fold takes a sequence and reduces it to a single element, Unfold starts with a single element, and amplifies it into a sequence. Unfold doesn't exist as such in the .Net class libraries, so I've had to code it up myself. It was a lot of work, as you can see:

    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> generator)
    {
        // include seed in the sequence
        yield return seed;

        T current = seed;

        // now continue the sequence
        while (true)
        {
            current = generator(current);
            yield return current;
        }
    }

What I've done is to define an Iterator. For its parameters it takes a seed value and a function that will generate the next item in the sequence given the previous item. Then it's just a case of calling the generator function in a loop, yielding up the result each time, then feeding that result back to the generator function. The OddNumbersGreaterThan method shown below is an example of how you might use this Unfold method.

But what has any of this to do with finding the 10001st Prime, as Problem 7 requests?

Well, there are any ways of calculating the sequence of Prime numbers (the Sieve of Eratosthenes being probably the most famous). But the one that works most naturally (if not most efficiently) in a Functional language involves Unfolding.

What you do is start with a small list of the first few primes - this will be our seed list. Then you use an unfold operation that adds a new prime to the list at each stage. It does that by trial division: starting with the next odd number after the biggest prime in the list, check whether any of the known primes divide into it. If not, you've found your new prime; if it is divisible by any, move on to the next candidate.

Obviously, as the list of known primes gets bigger, finding the next prime in this way will get slower and slower: this isn't the most efficient algorithm, as I said. But we can make it fast enough to be acceptable if we remember again that when we're checking a particular candidate, we don't need to worry about checking divisibility by primes bigger than the square root of the candidate: if it is divisible by a prime bigger than the square root, it must also be divisible by a prime smaller than the square root.

This, then is what the solution looks like:

public class Problem7
{
    public static void Main()
    {
        var firstPrimes = new long[] { 2, 3, 5, 7, 11 };
        //
        var primes = firstPrimes.Unfold(priorPrimes =>
                        priorPrimes
                            .Last()
                            .OddNumbersGreaterThan()
                            .SkipWhile(
                                candidate => priorPrimes
                                  .TakeWhile(prime => prime * prime <= candidate)
                                  .Any(prime => candidate.IsDivisibleBy(prime)))
                            .First()
                            );
        //
        primes.Skip(10000).First().DisplayAndPause();
    }
}

The var keyword, by the way, is not defining a variant, or anything loosely-typed like that. It's just telling the compiler to declare a variable but to figure out what the type should be and substitute that type before it continues the compilation.

First up, we create a list of the first few primes, to get the whole thing started. Then, in line 7, we call the Unfold extension method (I'll show you that code in a minute) to generate the sequence of primes by extending the list firstPrimes. We pass the Unfold method a lambda expression ( lines 8 to 15) which tells it how to use the list of all previously found primes, priorPrimes, to generates the next prime.

And this is how it does it. The Last() method in line 9 finds the last item in this list - the biggest prime; this is taken up by the OddNumbersGreaterThan() method which generates a sequence of odd numbers bigger than that prime. The SkipWhile clause causes any of these odd numbers which are divisible by any of the known primes to be ignored; and finally the First() expression returns the first candidate which is not divisible by any of the previous primes - this is the next prime number: and thus we've extended the sequence by one item.

Just to explain that SkipWhile clause in more detail: we use it to ignore odd numbers from the sequence which are not prime. It does this by running through the sequence of known primes smaller than the Square root of candidate (the TakeWhile clause creates the subsequence for us). The Any method checks whether there are any items in a sequence meeting a condition: in this case, are there any primes which divide into our candidate?

It just remains, in line 18, to work our way through the sequence, ignoring the first 10000 items, finally displaying the 10001st.

As a bonus, now you've seen that, you have everything you need to go and solve Problem 10 as well: summing all primes less than 2 million. All it takes is a slight modification to line 18, but I'll leave that to you.

Thanks to Jacob Carpenter for inspiration for this post.

The complete code is shown below:

public class Problem7
{
    public static void Main()
    {
        var firstPrimes = new long[] { 2, 3, 5, 7, 11 };
        //
        var primes = firstPrimes.Unfold(priorPrimes =>
                        priorPrimes
                            .Last()
                            .OddNumbersGreaterThan()
                            .SkipWhile(
                                candidate => priorPrimes
                                  .TakeWhile(prime => prime * prime <= candidate)
                                  .Any(prime => candidate.IsDivisibleBy(prime)))
                            .First()
                            );
        //
        primes.Skip(10000).First().DisplayAndPause();
    }
}

public static class Extensions
{
    public static bool IsDivisibleBy(this long number, long factor)
    {
        return number % factor == 0;
    }

    public static IEnumerable<long> OddNumbersGreaterThan(this long prime)
    {
        return (prime + 2).Unfold(item => item + 2);
    }

    public static void DisplayAndPause(this object result)
    {
        Console.WriteLine(result);
        Console.ReadLine();
    }
}

public static class Functional
{
    public static IEnumerable<T> Unfold<T>(this IList<T> seedList, Func<IList<T>, T> generator)
    {
        List<T> previousItems = new List<T>(seedList);

        // enumerate all the items in the seed list
        foreach (T item in seedList)
        {
            yield return item;
        }

        // now extend the list
        while (true)
        {
            T newItem = generator(previousItems);
            previousItems.Add(newItem);
            yield return newItem;
        }
    }

    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> generator)
    {
        // include seed in the sequence
        yield return seed;

        T current = seed;

        // now continue the sequence
        while (true)
        {
            current = generator(current);
            yield return current;
        }
    }
}

Thursday, 17 April 2008

Project Euler Problem 6

This post tackles Problem 6 in the Project Euler series.

How many budding young mathematicaians have been regaled with the story of Karl Friedrich Gauss and his lightning fast calculations? The story goes that, in a maths lesson at his primary school, his teacher wanted to keep the class busy for an hour, so he told them to work out the sum of all the numbers from 1 through to 100. The others in the class laboured away for the rest of the lesson, but only minutes after being given the problem Gauss, amongst the youngest in his class, had scribbled the answer on his slate and handed it in. The teacher didn't bother to look at his result, assuming that it couldn't possibly be right. When he finally did check at the end of the lesson, Gauss's slate was the only one holding the correct answer. (And here are 109 other versions of this story!)

Gauss' secret? There's a formula for finding the sum of this sequence which made it easy for him to do it in his head: 1 + 2 + ... + n = n(n+1)/2. You can arrive at the formula by noting that when you write down the sum, it's possible to rearrange the terms into pairs of numbers each summing to the same value. So for example, 1 + 2 + 3 + ... + 100 can be written (1 + 100) + (2 + 99) + (3 + 98) + ... + (49 + 52) + (50 + 51). Each pair sums to 101, and their are 50 pairs.

Using other techniques we can find a formula for the sum of squares of the numbers 1 to 100. It is 1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1)/6. You can see the derivation of that formula here.

These are the formulae I would use if I wanted to solve Problem 6 mathematically. But my computer spends a large part of its day with the processor sitting at only 2% CPU usage, and I'm afraid it might be getting flabby, like its operator. So I'm going to make it do its calculations the long way.

Here's the code:

    public static class Problem6
    {
        public static void Main()
        {
            int sumOfNumbers = 1.To(100).Sum();
            int squareOfSum = sumOfNumbers * sumOfNumbers;

            int sumOfSquares = 1.To(100).Select(x => x * x).Sum();

            int difference = squareOfSum - sumOfSquares;

            difference.DisplayAndPause();
        }
    }

    public static class Problem6Extensions
    {
        public static IEnumerable<int> To(this int first, int last)
        {
            for (int i = first; i <= last; i++)
            {
                yield return i;
            }
        }

        public static void DisplayAndPause(this int number)
        {
            Console.WriteLine(number);
            Console.ReadLine();
        }
    }

I think this is all self-explanatory, except perhaps the call to the Select method in the line I've highlighted. Put simply, the Select method operates on a sequence and performs a transformation on every element it sees. In this case we're transforming each of the input numbers into their corresponding square number. This sequence of squares is then passed through to the Sum method which gives us the total we need.

So there we are. Nice and simple this time, before the pace picks up in the next episode.

Monday, 14 April 2008

Project Euler Problem 5

My brain must have been in power-saving mode when I first tackled Problem 5. "Find the smallest number divisible by everything from 1 to 20? Easy!", I thought. And instructed the computer to calculate 1 x 2 x 3 x ... x 19 x 20. But when I submitted this answer to the Project Euler website, a big red cross was the response I got.

At this point I realised I needed to divert more blood to my cranial regions. Then I realised where I'd gone wrong. My answer certainly was divisible by all the numbers from 1 to 20, but it was much bigger than it needed to be to meet that requirement.

Notice, for example, that if a number is divisible by 4 and by 5, then it will also be divisible by 20 since 20 = 4 x 5. Thus to make my answer smaller and still meet the requirement, I could drop the final multiplication by 20. I guess I could keep on pruning the number in this way until it could be pruned no more.

Instead, I went for a bottom up approach. What I decided to do was to start from 1, and build up a number integer by integer, making sure that it was divisible by each but no bigger than it needed to be. Here's how I did it.

Suppose we've reached number n, and we've found that P is the smallest number that is divisible by 1 through to n-1. We need to find out what to multiply P by to make sure it is divisible by n as well. Let's introduce a function called gcd, which stands for "greatest common divisor". The gcd(a,b) is the biggest number which divides into both a and b.

At this point, its useful to remember an important thing about prime numbers. Prime numbers are, for mathematicians, what Quarks are to physicists. They are the fundamental building blocks of numbers, the Lego bricks, if you like. The Fundamental Theorem of Arithmetic says that every number bigger than 1 can be broken down into a product of prime numbers in exactly one way. So for example, 20 = 2 x 2 x 5, and there's no other combination of prime numbers that will make 20.

Now, pretend that we've got the computer to calculate the gcd of P and of n and found out that it's some number, g. So the gcd function is finding out for us the maximum possible product of prime numbers that is common to both P and n. So P = g x "some primes", and n = g x "some other primes", and we can be sure that no prime that is part of "some primes" is also part of "some other primes". This means that if we multiple P by "some other primes", the result will be divisible by n, but no bigger than it needs to be.

How do we translate this into Functional C#?

Well, one of the extension methods provided by our friend the Enumerable class is Aggregate. Aggregate takes a sequence and boils it down to a single number (so it's actually an implementation of the Fold concept from Functional Programming). It's up to you what you do to combine the members of the sequence together. You could add them together to find the sum of the sequence, do a comparison to find the Max, or sum and divide to find the average. At each stage, Aggregate tells you what the current aggregate is, and the next item you need to incorporate. In our case, we can use Aggregate on the sequence of numbers from 1 to 20, and get it to calculate the smallest number divisible by every item in that sequence, using the method outlined above.

Here's the code:

public static class Problem5Gcd
{
    public static void Main()
    {
        long result =
            1.To(20)
            .Aggregate(
                (product, factor) => product * (factor / product.Gcd(factor))
            );

        Console.WriteLine(result);
        Console.Read();
    }
}

public static class Problem5Extensions
{
    public static IEnumerable<int> To(this int start, int end)
    {
        for (int i = start; i <= end; i++)
        {
            yield return i;
        }
    }

    public static int Gcd(this int a, int b)
    {
        if (b == 0)
        {
            return a;
        }
        else
        {
            return Gcd(b, a % b);
        }
    }
}

As you can see, the lambda expression we pass to Aggregate takes two parameters, the first one being the smallest number divisible by all previous items in the sequence, and the second being the next item in the sequence. We just need to modify the first parameter so that it is also divisible by that next item, and then return it.

I've defined Gcd as an extension method as you can see. You can details about the algorithm for calculating that here.

Wednesday, 9 April 2008

Bouncing on your tail

Anybody using Recursion to write functions, as we did in the last episode, should be aware of the major pitfall: Stack Overflow.

Do you know why Stack Overflow errors happen? Let me tell you a story.

Inside your computer, as Sean Hargreaves has informed us, are lots of little elves who scribble away at fantastic speeds, doing all the calculations that we programmers throw at them. Each elf has his own little pile of paper on which he jots the results of his sums. As you would expect, the elves need to be very methodical, and they follow a particular system.

Imagine that Charles, Chief Processing elf, is working his way through a function. He scribbles busily on his piece of paper, humming to himself as he notes down the result of each calculation (that's him you can hear if you stick your head next to the case). Then his instructions tell him that he's got to call another function. So he reaches out, grabs a fresh piece of paper, writes the name of the function at the top (and underlines it) together with the values of the parameters, and then plugs away at it. When he gets to the end, and has found his answer, he screws the paper up, and throws it over his shoulder (ever wondered why your case gets clogged up with so much dust?). He writes the answer on the piece of paper now at the top of the pile, right where he left off before the function call.

So you see, for each nested function call, his pile of paper gets one sheet taller. If we give poor Charles a recursive function to work on, the pile will grow and grow and grow, and he'll never be able to chuck any pieces away. As the pile grows, he has to stand on his chair to reach the top. Then he has to fetch a stepladder. The pile keeps on growing. Now he's standing on tip-toes, and then, as he reaches out to get a fresh sheet for his next function call, Catastrophe! He wobbles, the ladder falls, the pile topples. Stack Overflow!

(You can make all this tie up with what Wikipedia says if you know that Charles calls his pile of paper a "Call Stack", and each piece on it a "Stack Frame").

Some Compilers (like the F# compiler for instance) are kind to poor Charles, and change his instructions slightly to help him avoid accidents. It can do this when the final instruction in a function is to call another function and immediately return its result. This is called a Tail Call. If the last instruction of a recursive function is the recursive call, then we call that Tail Recursion. You'll notice that the function we created last time for factoring primes is tail recursive.

So what can the compiler do? Well, because the last instruction Charles has to carry out is to call the next function and then return its result, he knows that he doesn't need to keep his workings-out for the current function - he isn't going to need them again. So the compiler can tell Charles that, rather than starting a fresh sheet of paper, he can scrub his current one with his eraser and reuse it for the next function. To put it more boringly, the function is changed so that rather than making a call, it does a goto back to the top of the function, and thus avoids using a new Stack Frame.

Unfortunately, the C# compiler has no compassion for poor elves, and does nothing to help them avoid Stack Overflow (though in certain circumstances the .Net Just-In-Time compiler does help out a bit ). Sam to the rescue!

Not so long ago, I read about a technique called Trampolining. I thought at first that it might have been named that way by lazy geeks who wanted to make themselves sound athletic. I now know the real reason: trampolining makes functions bounce! You start with a tail-recursive function (that's important: it won't work for ordinary recursive functions). Then you change it so that rather then actually calling itself or returning a value it returns a message. One of two kinds of message in fact. The one kind of message says, "I want to bounce: please call me again with these parameters"; the other kind says, "I've had enough bouncing: here's the final answer". This new function obviously won't work by itself: it needs to be wrapped in a trampoline. The trampoline keeps on calling the function, each time using the the new set of parameters, until the function finally gives an answer.

This is what it looks like, applied to the prime factorisation algorithm from last time:

public long FindGreatestPrimeFactor(long factor, long n)
{

  var function = Trampoline.MakeTrampoline((long factorGreaterThan, long number) =>
  {
      long upperBound = (long)Math.Ceiling(Math.Sqrt(number));


       // find next factor of number
      long nextFactor = Range(factorGreaterThan + 1, upperBound)
                              .SkipWhile(x => number % x > 0).FirstOrDefault();

      // if no other factor was found, then the number must be prime
      if (nextFactor == 0)
      {
          return Trampoline.ReturnResult<long,long,long>(number);
      }
      else
      {
          // find the multiplicity of the factor
          long multiplicity = Enumerable.Range(1, Int32.MaxValue)
                                  .TakeWhile(x => number % (long)Math.Pow(nextFactor, x) == 0)
                                  .Last();

          long quotient = number / (long)Math.Pow(nextFactor, multiplicity);

          if (quotient == 1)
          {
              return Trampoline.ReturnResult<long,long,long>(nextFactor);
          }
          else
          {
              return Trampoline.Recurse<long,long,long>(nextFactor, quotient);
          }
      }
  });

  return function(factor, n);
}

And here's the Trampoline class:

public static class Trampoline
{
  public static Func<T1,T2,TResult> MakeTrampoline<T1,T2,TResult>(Func<T1,T2,Bounce<T1,T2, TResult>> function)
  {
      Func<T1,T2,TResult> trampolined = (T1 arg1, T2 arg2) =>
      {
          T1 currentArg1 = arg1;
          T2 currentArg2 = arg2;

          while (true)
          {
              Bounce<T1,T2, TResult> result = function(currentArg1, currentArg2);

              if (result.HasResult)
              {
                  return result.Result;
              }
              else
              {
                  currentArg1 = result.Param1;
                  currentArg2 = result.Param2;
              }
          }
      };

      return trampolined;
  }


  public static Bounce<T1,T2, TResult> Recurse<T1,T2, TResult>(T1 arg1, T2 arg2)
  {
      return new Bounce<T1,T2, TResult>arg1, arg2);
  }

  public static Bounce<T1,T2, TResult> ReturnResult<T1,T2, TResult>(TResult result)
  {
      return new Bounce<T1,T2, TResult>result);
  }

}


public struct Bounce<T1,T2, TResult>
{
  public T1 Param1 {get; private set;}
  public T2 Param2 { get; private set; }

  public TResult Result { get; private set; }
  public bool HasResult { get; private set; }
  public bool Recurse { get; private set; }

  public Bounce(T1 param1, T2 param2) : this()
  {
      Param1 = param1;
      Param2 = param2;
      HasResult = false;

      Recurse = true;

  }

  public Bounce(TResult result) : this()
  {
      Result = result;
      HasResult = true;

      Recurse = false;

  }
}

MakeTrampoline is an example of a higher-order function: it's a function factory - a function that creates another function. It takes our modified recursive function - the one that returns messages - and creates from it the recursive version that actually returns the result.

The type of the parameter to MakeTrampoline, indicates that we're expecting a delegate, a pointer to a function of two arguments, returning an instance of the Bounce struct. Bounce is what I'm using to convey the messages from the wanna-be recursive function. It either holds the result of the recursion, or the two parameters that should be used for the next recursive call. You can see that I've created two factory methods on Trampoline, Recurse, and ReturnResult, to make the syntax for returning messages from the would-be recursive function a bit more obvious.

Inside the MakeTrampoline method, I've defined a closure. This is an anonymous function that captures some of the variables in the function that surrounds it. In this case, the closure captures the value of the function parameter. This means that the anonymous function - our trampoline- can then call function in a loop, each time doing as it requests: either simulating recursion by calling function again with the new set of parameters, or, ultimately, returning a value and jumping out of the loop. Because it is a function factory, MakeTrampoline returns a delegate pointing to this inner function, rather than actually calling it itself. Thus its up to calling functions, like FindGreatestPrimeFactor to call the newly minted function to get the result.

So what do you think? I've tried different variations on this theme to see if I can improve the usage syntax, but I haven't been able to come up with anything significantly better - if anybody has any ideas I'd be glad to hear them. In the source, you can find another overload of the MakeTrampoline method which takes a 1-parameter function. As you can see, it's straightforward to extend to Functions with any number of parameters, but will unfortunately involve code-duplication. I've tried to think of a way round that too, but the only methods I've come up with involve sacrificing some efficiency.

Now this Trampoline isn't a panacea. It won't save you from badly written recursive functions that never return: in that case you'll just exchange a StackOverflow exception for a hung application as the Trampoline goes into an infinite loop. And I don't recommend it for simple cases where you know that you've only got a few levels of recursion. Standard recursion will always perform better. But it might be useful in cases with very deep recursion, or in frameworks with less memory allocated to the Stack (I think the .Net Compact Framework is one such case). And this technique has suggested to me another, Lazy Trampolining, which I'll tell you about some other time.

The source for this article can be found here.

Monday, 7 April 2008

Project Euler Problem 4

This post tackles Problem 4 in the Project Euler series, which wants us to find the largest palindrome that is a product of two, three digit, numbers.

Without further ado, here's my solution:

    public class Problem4
    {
        public static void Main()
        {
            (
             from x in 100.To(999)
             from y in 100.To(999)
             let product = x * y
             where product.IsPalindromic()
             orderby product descending
             select product
             )
             .First()
             .DisplayAndPause();
        }

    }

If you've not had much exposure to C#3.0, this might come as quite a shock: maybe you're wondering whether I slipped up, and pasted in some mutant SQL? But I can assure you that what I have written is quite legal, and I've no fear that Anders will be issuing an arrest warrant for me!

This snippet introduces several new features of C# 3.0, so we'll work through it slowly.

The part inside the brackets is a Query Expression. It might look like an island of SQL in an ocean of C# - albeit backwards SQL. In fact it's actually a different way (in many cases a more natural way) of writing the kind of expressions which query sequences that we've written in previous posts. The compiler will take that expression and translate it into a chain of calls to the appropriate extension methods on the Enumerable class. I'll let Anders supply the details of how that works.

The from clause defines the source of the data for the query, and names a range variable that you use in subsequent clauses: the range variable is a bit like the iteration variable in a foreach loop. But what is the source of data here? What I've done is to define an extension method, To, that I can call as if it where a member of int. It creates an iterator that produces all the integers in the range of specified.

I've mentioned extension methods in previous posts. Now's the time to see how they work. This is the definition of To:

public static class Problem4Extensions
{
    public static IEnumerable To(this int first, int last)
    {
        for (int i = first; i <= last; i++)      
        { 
            yield return i;
        } 
    }
}

As you can see, To is just a static method defined in a public class (that's a key requirement of extension methods, by the way - they won't work if you try putting them anywhere else). The only thing special about it is that the first parameter has the this keyword in front of it. The "this" keyword marks out the type that will appear to be extended with the new method. The extension method body is written just as any other static method would be. So don't get too excited: extension methods don't give you any special powers to delve into the internals of the classes that you're extending.

We use two from clauses here, and this will generate for us all possible pairs of numbers between 100 and 999.

The let clause is nice: it allows us to store the result of an expression to save duplicting it throughout the query. Because, having generated the product of x and y, we then need to check whether it is palindromic - the where clause filters it out if it's not. For that we've got another extension method:

    public static class Problem4Extensions
    {
        ...

        public static bool IsPalindromic(this int number)
        {
            var digits = number.ToString().ToCharArray();

            return digits.SequenceEqual(digits.Reverse());
        }
    }

Using the orderby clause we sort the palindromes, so that the biggest ones come first. To complete the query expression, we need a select statement to actually suck some data out of the query. Here we just pull out the product (that we now know is a palindrome). The answer to the problem is clearly the first number in this sequence, which First() kindly gives us. Finally, so that I can boast again of having solved the problem in one line of code, I've defined an extension method to print the result to the console, and wait for the user admire it:

    public static class Problem4Extensions
    {
        ...
        public static void DisplayAndPause(this int number)
        {
            Console.WriteLine(number);
            Console.ReadLine();
        }
    }

You can download the complete solution here. I should point out that this brute-force means of finding the answer is by no means the best. Look on the Project Euler forums for more mathematically informed ways of tackling the problem.

Thursday, 3 April 2008

Book Review: Pro LINQ

Pro LINQ: Language Integrated Query in C#2008 by Joseph C. Rattz, Jr. Rating: 3 out of 5.

I ordered a copy of Pro LINQ just before Christmas for a colleague who needed to get up to speed with C#3.0 for our most recent project. I got my hands on it a little while back, and thought it would make for a topical first book review for the blog.

Pro Linq aims to cover all the new Language Integrated Query features that have been introduced into .Net 3.5, from a C# perspective. After a brief chapter painting a picture of the new LINQ landscape, it kicks off with an introduction to the new language features in C# 3.0. Then follow sections on LINQ to Objects, LINQ to XML, the oft-forgotten LINQ to DataSet, concluding with LINQ to SQL. It would have been interesting to hear about LINQ to Entities, but the author, reasonably enough, declared that outside the scope of his book.

The author declares that his book "is all about code". And in fact it is organised around code. Most of each section is organised around the methods and properties of the LINQ classes.

What I liked

On the whole I found Pro LINQ easy to read. The author has a conversational style, and gives reasonably clear explanations.

There's a thorough, exhaustive (though some might say exhausting!) reference of all the new LINQ APIs in every area mentioned above.

The most useful section for me was the one covering LINQ to SQL. After covering all the basics, it goes into some detail about the harder aspects like handling concurrency conflicts.

Right at the beginning of the book, the author has included a "Tips to Get you started" section listing common pitfalls.

What I didn't like

Given that LINQ is so revolutionary, I finished the book feeling rather uninspired, despite my own enthusiasm for the subject. I think this was partly the fault of the examples used. I can appreciate that it must be very hard to create something fresh for everything you want to illustrate, but I felt the author could have come up with something more imaginative than filtering a list of US Presidents, for example.

What I would have liked to have seen were examples of the different parts of LINQ working together. Most of the samples dealt with just one concept at a time (which I know is how it needs to be to get started) but the excitement comes from putting everything together, and we never really got to that point.

There was also a lot of repetitiveness. This book has an example for almost every overload of every Enumerable class extension method, and not much variation in the examples either. And do we really need the console output of every code sample? These things made a lot of the book read like MSDN. The fact that under every method, the author had listed Exceptions that might be thrown by the method just added to the feeling that I might be better off reading through the online help.

Don't expect either to find in this book much about deeper considerations: when to use LINQ, or when to avoid it; how well the LINQ APIs perform, or indeed how they work; or anything about the bigger picture of Functional Programming is. This book doesn't get much beyond the limited picture of what the LINQ APIs are, and how to call them.

Conclusion

I'm afraid the last few paragraphs sound negative: that's because I had high hopes for this book, but came away dissapointed. It is certainly useful, and I'll pass it around the office for people getting started, but I expect that I'll find a replacement before long. Hopefully one that talks more about the why and the when, not just the what and the how.

Wednesday, 2 April 2008

How inclusive?

Seen on the label of a bag of Thorntons Chocolates: "Caution. May include inclusions"

Tuesday, 1 April 2008

Links I Like #1

There are already numerous link blogs in the blog-o-sphere, and several link blogs targetting .Net: I'm not going to try to compete.

But I thought I would occassionally compile some of my favourites - for selfish reasons as much as anything: they'll get included in the custom Google Search engine that lives behind the box in the top left of the blog, and I'll stand a chance of finding them again.

Technical

Non-Technical