Monday 21 April 2008

Project Euler Problem 7 (and 10)

This post tackles Problem 7 in the Project Euler series.

A couple of episodes ago we introduced the Aggregate method. What this does is work its way through a sequence, summarising it in some way, until finally it produces a single result representing all the other items. I gave examples like finding the sum of a sequence, or computing the average value. "Aggregate" is what we call it if we speak C#: in other languages the same operation goes under the names "reduce", "accumulate", "compress" and perhaps most commonly, "fold".

These all convey the idea of working through a data structure (often a list or sequence, but not always), systematically combining the elements; because we're talking functional languages here, when you use one of these Fold operations you get to specify a function saying how the elements are to be combined.

Today, I'm going to introduce the opposite of the Fold operation, Unfold. (Aside: did anybody else have the thought that these ideas where invented by a gal keeping her mind off the tedium of doing the laundry?)

Where Fold takes a sequence and reduces it to a single element, Unfold starts with a single element, and amplifies it into a sequence. Unfold doesn't exist as such in the .Net class libraries, so I've had to code it up myself. It was a lot of work, as you can see:

    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> generator)
    {
        // include seed in the sequence
        yield return seed;

        T current = seed;

        // now continue the sequence
        while (true)
        {
            current = generator(current);
            yield return current;
        }
    }

What I've done is to define an Iterator. For its parameters it takes a seed value and a function that will generate the next item in the sequence given the previous item. Then it's just a case of calling the generator function in a loop, yielding up the result each time, then feeding that result back to the generator function. The OddNumbersGreaterThan method shown below is an example of how you might use this Unfold method.

But what has any of this to do with finding the 10001st Prime, as Problem 7 requests?

Well, there are any ways of calculating the sequence of Prime numbers (the Sieve of Eratosthenes being probably the most famous). But the one that works most naturally (if not most efficiently) in a Functional language involves Unfolding.

What you do is start with a small list of the first few primes - this will be our seed list. Then you use an unfold operation that adds a new prime to the list at each stage. It does that by trial division: starting with the next odd number after the biggest prime in the list, check whether any of the known primes divide into it. If not, you've found your new prime; if it is divisible by any, move on to the next candidate.

Obviously, as the list of known primes gets bigger, finding the next prime in this way will get slower and slower: this isn't the most efficient algorithm, as I said. But we can make it fast enough to be acceptable if we remember again that when we're checking a particular candidate, we don't need to worry about checking divisibility by primes bigger than the square root of the candidate: if it is divisible by a prime bigger than the square root, it must also be divisible by a prime smaller than the square root.

This, then is what the solution looks like:

public class Problem7
{
    public static void Main()
    {
        var firstPrimes = new long[] { 2, 3, 5, 7, 11 };
        //
        var primes = firstPrimes.Unfold(priorPrimes =>
                        priorPrimes
                            .Last()
                            .OddNumbersGreaterThan()
                            .SkipWhile(
                                candidate => priorPrimes
                                  .TakeWhile(prime => prime * prime <= candidate)
                                  .Any(prime => candidate.IsDivisibleBy(prime)))
                            .First()
                            );
        //
        primes.Skip(10000).First().DisplayAndPause();
    }
}

The var keyword, by the way, is not defining a variant, or anything loosely-typed like that. It's just telling the compiler to declare a variable but to figure out what the type should be and substitute that type before it continues the compilation.

First up, we create a list of the first few primes, to get the whole thing started. Then, in line 7, we call the Unfold extension method (I'll show you that code in a minute) to generate the sequence of primes by extending the list firstPrimes. We pass the Unfold method a lambda expression ( lines 8 to 15) which tells it how to use the list of all previously found primes, priorPrimes, to generates the next prime.

And this is how it does it. The Last() method in line 9 finds the last item in this list - the biggest prime; this is taken up by the OddNumbersGreaterThan() method which generates a sequence of odd numbers bigger than that prime. The SkipWhile clause causes any of these odd numbers which are divisible by any of the known primes to be ignored; and finally the First() expression returns the first candidate which is not divisible by any of the previous primes - this is the next prime number: and thus we've extended the sequence by one item.

Just to explain that SkipWhile clause in more detail: we use it to ignore odd numbers from the sequence which are not prime. It does this by running through the sequence of known primes smaller than the Square root of candidate (the TakeWhile clause creates the subsequence for us). The Any method checks whether there are any items in a sequence meeting a condition: in this case, are there any primes which divide into our candidate?

It just remains, in line 18, to work our way through the sequence, ignoring the first 10000 items, finally displaying the 10001st.

As a bonus, now you've seen that, you have everything you need to go and solve Problem 10 as well: summing all primes less than 2 million. All it takes is a slight modification to line 18, but I'll leave that to you.

Thanks to Jacob Carpenter for inspiration for this post.

The complete code is shown below:

public class Problem7
{
    public static void Main()
    {
        var firstPrimes = new long[] { 2, 3, 5, 7, 11 };
        //
        var primes = firstPrimes.Unfold(priorPrimes =>
                        priorPrimes
                            .Last()
                            .OddNumbersGreaterThan()
                            .SkipWhile(
                                candidate => priorPrimes
                                  .TakeWhile(prime => prime * prime <= candidate)
                                  .Any(prime => candidate.IsDivisibleBy(prime)))
                            .First()
                            );
        //
        primes.Skip(10000).First().DisplayAndPause();
    }
}

public static class Extensions
{
    public static bool IsDivisibleBy(this long number, long factor)
    {
        return number % factor == 0;
    }

    public static IEnumerable<long> OddNumbersGreaterThan(this long prime)
    {
        return (prime + 2).Unfold(item => item + 2);
    }

    public static void DisplayAndPause(this object result)
    {
        Console.WriteLine(result);
        Console.ReadLine();
    }
}

public static class Functional
{
    public static IEnumerable<T> Unfold<T>(this IList<T> seedList, Func<IList<T>, T> generator)
    {
        List<T> previousItems = new List<T>(seedList);

        // enumerate all the items in the seed list
        foreach (T item in seedList)
        {
            yield return item;
        }

        // now extend the list
        while (true)
        {
            T newItem = generator(previousItems);
            previousItems.Add(newItem);
            yield return newItem;
        }
    }

    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> generator)
    {
        // include seed in the sequence
        yield return seed;

        T current = seed;

        // now continue the sequence
        while (true)
        {
            current = generator(current);
            yield return current;
        }
    }
}

5 comments:

urza said...

Hello, may i ask what is the difference between your unfold and SelectMany?

Unknown said...

@urza,
SelectMany is designed to flatten many sequences into a single sequence - if you had listOfLists = List>, for example, listOfLists.SelectMany(list => list) would flatten the list of lists into a single sequence of IEnumerable containing all the individual lists joined together.

unfold, on the other hand, is used to generate sequences starting with a single item, using a function to compute the next item based on the previous one.

Sameer Shah said...

Nice explanation. However, I was a little confused when you said
"we don't need to worry about checking divisibility by primes bigger than
the square root of the candidate: if it is divisible by a prime bigger
than the square root, it must also be divisible by a prime smaller than
the square root"

However, I got what you meant by referring the following forum
http://mathforum.org/library/drmath/view/56001.html

William Blackburn said...

hello, I was going through your solution and I am wondering why you have 2 Unfold extension methods?  I know they are both necessary(I tried removing them), but I don't fully understand why.  Can you clear this up for me? 

Samuel Jack said...

No worries - Writing down the question is always a good way of unlocking the answer yourself. You've heard of Rubber Ducking?

Post a Comment