Monday 31 March 2008

Project Euler Problem 3

This post tackles Problem 3 in the Project Euler series.

Do you remember making Factor Trees in maths lessons? They are a neat way of showing how a composite number (a number that is not prime) is composed of prime factors. Suppose we wanted to factor the number 60. We might start by noticing that it is divisible by 2: 2 x 30 = 60. So we write

        60
       /  \
      2   30

2 is prime, so we're finished with that branch of the tree. Look at the other branch: we've now got to solve the same problem, but with a smaller number. 30 is also divisible by 2, so we add another layer to the tree:

        60
       /  \
      2   30
         /  \
        2    15

Applying the same procedure to 15 gives the final tree:

        60
       /  \
      2   30
         /  \
        2    15
            /  \
           3    5

The procedure stops when every branch of the tree has a prime number hanging from it. I'm not one for fortune telling, but reading these leaves will tell you something useful: the prime factorisation of the root number. In this case 60 = 2 x 2 x 3 x 5.

Problem 3 wants us to find the largest prime factor of a composite number. The Factor Trees methods suggests a nice recursive algorithm, and recursion has a very Functional feel to it. The algorithm is quite straight-forward.

We start with the number that we want to factorise, and feeling friendly, we call it n. We then try to find the first number bigger than 1 which divides n: call this number p. p must be prime: if it wasn't we'd have found another number first that divides n. We next cut n down to size by dividing it by p as may times as we can. Now for the recursive part: we start the procedure over again, this time with the smaller number that we found when dividing n by p. Any prime factor of this smaller number will be larger than p, and is also a factor of n so this trick will give us the answer we need.

There are just two special cases: first, there might not be a number p that divides n. In that case, n must be prime, so n is the answer we are looking for. Second, it might be that when we have divided n by p, we end up with 1, which won't factor any further: but in this case p is the largest prime that divides n, so p is the answer we need.

There's one other thing to note: when we are hunting for p we don't need to try all the numbers as far as n. We only need to go as far as √n. This is because n = √n x √n, so if it has a factor greater than √n, it will have a corresponding factor smaller than √n which we will find by searching only as far as √n.

We could be even cleverer and say that we only need to try dividing n by odd numbers (because if it is divisible by an even number, it will be divisible by 2, and we're chopping out 2 as a factor right at the beginning of the search). I want to keep the code simplish though, so for now I won't bother.

Now for some code:

public void Solve()
{
    long factor = FindGreatestPrimeFactor(1, 600851475143);
    Console.WriteLine(factor);
    Console.Read();
}

public long FindGreatestPrimeFactor(long factorGreaterThan, long number)
{
    long upperBound = (long)Math.Ceiling(Math.Sqrt(number));

    // find next factor of number
    long nextFactor = Range(factorGreaterThan + 1, upperBound)
        .SkipWhile(x => number % x > 0).FirstOrDefault();

    // if no other factor was found, then the number must be prime
    if (nextFactor == 0)
    {
        return number;
    }
    else
    {
        // find the multiplicity of the factor
        long multiplicity = Enumerable.Range(1, Int32.MaxValue)
            .TakeWhile(x => number % (long)Math.Pow(nextFactor, x) == 0)
            .Last();

        long quotient = number / (long)Math.Pow(nextFactor, multiplicity);

        if (quotient == 1)
        {
            return nextFactor;
        }
        else
        {
            return FindGreatestPrimeFactor(nextFactor, quotient);
        }
    }
}

private IEnumerable<long> Range(long first, long last)
{
    for (long i = first; i <= last; i++)
    {
        yield return i;
    }
}

A few words of explanation might be called for.

First, you'll recognise the Range method as being an iterator. I created this one, rather than using Enumerable.Range primarily because I needed to be able to specify first and last as long rather than int. I also took the opportunity to make it more convenient to use by making it take an last parameter, rather than a count as Enumerable.Range does.

In order to find the next factor, we generate a sequence of all the numbers greater than any factor we found previously, up to √number. SkipWhile is another Extension method that works on a sequence (it kind of does the opposite of TakeWhile which we used last time). It ignores the first part of a sequence, and only includes the part after the condition stops being met. In this case, it skips all numbers that don't divide number, and then returns the rest of the input sequence. FirstOrDefault(), as you might expect from its name takes a sequence, and returns the first element of it; or if there are no elements in the sequence, it returns a default value - the default for the Type of element in the sequence, in this case 0.

Finding the multiplicity of the prime factor is similarly straightforward (multiplicity in this case means, how many times the factor divides into number). We generate the sequence of numbers, m, where nextFactor^m divides number, and then take the last one in that sequence.

And there we have it. Not as compact a solution as for previous problems, but interesting non-the-less.

Thursday 27 March 2008

Problem 2

This post tackles Problem 2 in the Project Euler series.

We have the Italian mathematician Fibonacci to thank for importing into Europe the Hindu-Arabic numerals that are the bane of every child learning their 1,2,3. If it wasn't for Fibonacci, we'd still be trying to do multiplication and division with Roman numerals. He also had the amazing insights into the mathematics of rabbit breading which led to the famous series that we consider today, the Fibonacci series.

To generate each term in the Fibonacci series, you simply add together the two terms in the series before it. So if the first two terms are 1 and 2, you continue, to get a series like

1,2,3,5,8,13,...

Problem 2 of Project Euler wants us to sum up all the terms in the Fibonacci series that are divisible by 2 and less than or equal to 4 million.

A while loop might spring to mind at this point, but we want to force ourselves to think differently. Lets try thinking in terms of sequences again, as we did last time. The Fibonacci series is an obvious sequence - an infinite one in fact. We need to generate terms in that sequence until they become too big, then filter out any that are odd, finally adding up what we have left.

Since C#2.0 there's been a very neat way of generating sequences: Iterators. An Iterator is just a method that returns an IEnumerable result (or another similar type - follow the link for the details), and contains the special yield keyword. Look at this one:

private static IEnumerable<int> FibonacciTerms(int firstTerm, int secondTerm)
        {
            yield return firstTerm;
            yield return secondTerm;

            int previous = firstTerm;
            int current = secondTerm;
  
            while (true)
            {
                int next = previous + current;
                previous = current;
                current = next;
                yield return current;
            }
        }

This iterator creates the Fibonacci sequence for us. You might look at the while(true) line, and think that we've just coded up an infinite loop. But there's more to it than meets the eye.

Iterators are very lazy creatures. They never do more than they're asked to do. In fact, if all you were to do was to call the FibonacciTerms method, none of the code above would get executed. It's only when you call an Iterator inside a foreach block that it reluctantly starts to do any work. The first time through a foreach loop, the iterator starts at the top of its method, huffing and puffing through each line of code, until it reaches a yield statement. With a sigh of relief, it returns to the foreach loop the value it has calculated so far, and then goes to sleep. For all subsequent iterations of the foreach loop, the iterator picks up from exactly where it left off, right after the previous yield statement, and carries on through the method until it finds another one, and then yields up its result and go back to sleep.

So rather than looping for ever, the FibonacciTerms method will keep going only as long as it is asked to produce more terms. Is a foreach loop the only way to put it to work? Step up the Enumerable class again. It provides the TakeWhile extension method that does the foreach loop for us. It pulls items out of a sequence, pushing them into an output sequence, until a certain condition is met. So we can say

FibonacciTerms(1, 2).TakeWhile(x => (x <= 4000000) )
and that will generate a sequence of all the Fibonacci terms less than 4 million. Again we're giving TakeWhile a lambda expression so it can decide whether the term that it got from FibonacciTerms meets the condition.

It only remains for us to filter out the odd terms, and to sum up everything else:

private static int SumEvenFibonarciTerms() { return FibonacciTerms(1, 2) .TakeWhile(x => (x <= 4000000)).Where(x => x % 2 == 0) .Sum(); }

Monday 24 March 2008

Problem 1

This post tackles Problem 1 in the Project Euler Series.

As you would expect, Problem 1 won't tax our mathematical skills too far, but it does let us get stuck into the new features of C#3.0.

We're asked to find the sum of all multiples of 3 or 5 below one thousand.

I'd be tempted to dive straight in, and code up a for loop. But that's the old way of doing things. We want to be Functional. How about telling the compiler what we want, and letting it figure out how to get it?

I guess we want to start with all the numbers from 1 to 999, drop all those that aren't divisible by 3 or 5, and then add up the rest.

This is where the excitement begins. A lot of the power of the new Language INtegrated Query (LINQ) features in C#3.0 come from tools to work with sequences. We use sequences all the time: whenever we use foreach on an IEnumerable type, we're iterating through a sequence.

LINQ and its associated APIs allow us to process sequences of things, without there being a for, or an i or a j in sight.

Some of the APIs produce sequences for us. Enumerable.Range(int start, int count) for example will produce a sequence of consecutive integers beginning with start.

Other APIs take a sequence as an input, tweak it in some way, and return a new sequence. Such is the Enumerable.Where method. This takes a sequence, drops things that don't meet a certain condition, then returns a sequence containing everything else.

Here's how you might use this:

Enumerable.Range(1, 999).Where(x => x % 5 == 0 || x % 3 == 0)

This is doing the first part of what we need: taking all the integers less than 1000, and giving us a sequence containing only the ones divisible by 3 and 5.

There are two curious things about that piece of code. One is the way we are calling the Where method. Surely the Range method returns an IEnumerable? Since when did IEnumerable have a Where method? That's the magic of Extension methods, which we'll dig into another time. For now, its enough to know that the Enumerable class adds many useful methods, include Where, which you can call on any object that implements the IEnumerable generic interface. You just need to add a reference to the System.Linq namespace to your code, and the extension methods magically appear.

The other interesting thing is how we're specifying the condition to the Where method. What's up with the "=>" symbol? That tells the compiler that we are creating a Lambda expression. A lambda expresssion is just a little inline block of code that we can pass to another method. This expression just takes one argument x, and returns true or false depending on whether x is divisible by 5 or 3. Isn't the syntax elegant? No need for the return keyword, and no need to tell C# that x is an int or that the expression returns a bool. It works all that out for itself.

Now that it has the lambda expression, the Where method can take every item in its input sequence (the one that it's getting from Enumerable.Range), call the expression to see whether the item meets the condition, and if it does, pass it though to the output sequence.

So it looks like we're almost ready to give Euler his answer. Except that he asked for the sum of all these numbers. That's easy. Enumerable provides another extension method, Sum(), that will get that for us. We just tag it on the end of what we've got already:

static int SumMultiplesOfThreeAndFive()
{
    return Enumerable.Range(1, 999)
            .Where(x => x % 5 == 0 || x % 3 == 0)
            .Sum();
}
Sum just works its way through a sequence, adding up everything it encounters and returns the result.

And there you have it. A Functional solution, in one line of code (ignoring a few newline characters!) to Project Euler, Problem 1.

[Updated to correct a few typos on 24/3/2008]

[Updated to correct a few typos on 25/3/2008]

[Updated to correct solution in line with change to problem on 5/5/2008]

Saturday 22 March 2008

Project Euler

In the not so distant, yet scarily dim past, I completed a maths degree at Birmingham University. From the day I started work at Paragon (which was two weeks after I graduated, and one week after my honeymoon) I've been rather concerned about how fast I've been forgetting all my hardly earned knowledge. These days, I doubt whether I could tell a lemma from a lemur. I stumbled across Project Euler a couple of weeks ago, and its sounds like the ideal remedy to counter my brain rot. Every two weeks since October 2001 they've been publishing a new little maths problem. Let them speak for themselves:
Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.
I thought it would make for an interesting series of posts if I took on Euler with all the might of C#3.0, seeing if I can tackle each problem using functional programming techniques. Since that is my aim, I can't promise that the posts will be insightful from a mathematical point of view, or that I'll hit upon the best algorithms. But you never know. Problem 1 coming up next time.

Wednesday 19 March 2008

What's going on here then?

I remember when I first heard about Object Orientated programming. At the time I tinkered around with BASIC programming and felt pretty smug because I knew that goto was considered harmful. This was back in the dark ages when the World Wide Web was still being woven and Bulletin Board Services were all the rage. For the poor folk without a modem there were Public Domain libraries. I used to browse through the catalogue of APDL (the Archimedes Public Domain Library - yes, it's still going today!), and every so often, I'd save up enough pocket money to order a few floppy disks worth of software.

It was in that catalogue that I discovered a tutorial on C++ and Object Orientated programming (supplied, no expense spared, in text files). It sounded so exotic. It conjured up in my mind thoughts about just clicking things together. I remember babbling on to my brother (who was not in the least interested - but I had to tell someone) that it was all about objects passing messages to each other using this magic "->" symbol and doing something called polymorphism. And then there was inheritance. Maybe I would get rich!

I feel some of the same excitement now that I've heard about functional programming. People are flashing around the same magic symbols, only they've moved on from one bar to two: "->" has evolved into "=>". And there's a new spell book of magic words: "Lambda functions", "Monads", and "Memoization", not forgetting "Y Combinators", "Closures" and "Tail Recursion".

I'm now a little more battle-hardened; and we've been warned that there's no magic silver bullet that will slay all our problems. But so many of those in the know are claiming great things for Functional programming in C# that I feel I can't be left behind.

I've been told that the best way to learn is to teach (which does make me wonder about some of my teachers at school). And I think I've discovered the perfect source of examples.

I'll tell you all about it next time.

About me

I suppose I could kick off without introducing myself, but that would be rather rude.

I always write Samuel Jack in my by-line in the vain hope that the second and third syllables of my name will get a hearing; I'm invariably called Sam.

I'm a twentythirty-something software developer living near Bromsgrove in the UK. Through my company, Seaturtle Software Ltd, I work with customers world-wide solving challenging software problems. Get in touch if you have something that fits the bill.

My portfolio spans the sectors. I started off my career working mainly in the nuclear industry (where my applications were used to oversee the clean-up operations at a major UK nuclear site). I spent a few happy years developing a spreadsheet modelling platform aimed at financial directors. Most recently I've been working in the entertainment sector, writing software for DJs. Oh - and I also found time to create a little mobile phone game.

The thing that I love about being a software developer is having ideas and then seeing them come to life - making something that works and is useful. That's one reason for the title of my blog. The other is that, now that C#3.0 is out, everyone is going on about functional programming. So I thought I'd have some fun in finding out what-on-earth they're talking about. Perhaps you'd like to join me?

Monday 17 March 2008

Coming soon

An obligatory hello-world!

Exciting posts on Functional Programming coming up soon.