Thursday 28 August 2008

Project Euler Problems 18 and 67: Finding the Maximal Route through a Triangle

If this doesn't pique the interest of my fellow developers then I don't know what will: a problem that will take twenty billion years to solve (at an extremely optimistic estimate), unless we find an efficient algorithm. The problem in question is Project Euler Problem 67. We're given a triangle made up of one hundred rows of numbers, and asked to consider all routes from the top to the bottom, made by moving from one row to the next via adjacent cells. Over all such routes, we have to find the maximum sum. Can't picture it? Let me help:

TriangleRoute

For the route that I've drawn, the sum is 5 + 15 + 29 + 45 = [train your brain and work it out yourself!].

If we only wanted to tackle the baby brother of this problem (Problem 18 - the same problem for a triangle with just 15 rows), we could get away with checking each of the routes. But it's not feasible to do that here because there are 299 possibilities: hence the need for an efficient algorithm. Any ideas?

You could take a bath and wait for the Eureka moment. Or if you want to avoid the ensuing streaking, read on.

An algorithm

Suppose that for each cell in a row we know the maximum sum over all routes ending at that cell. From that it's easy to work out the same for each of the cells in the next row. Here's how you do it. If you've got a cell with value c, somewhere in the middle of the row, then you can only reach it from either of the two cells adjacent to it in the row above. 

CalculatingCellMaxima

We're assuming that we already know the maximum sum for routes ending at these cells: suppose it is a and b respectively. So the biggest possible sum we can get for routes ending at c is Max(c + a, c + b). For the two cells at either end of the row the calculation is even simpler: they can only be reached from the cell at the appropriate end of the row above them, so we just sum the cell value with the known maximum sum to the cell above. In this way, we can crunch through the rows, 'flattening' the triangle until we reach the last row, at which point we will know the maximum possible sum over all routes ending at each of the cells in that row.

That's the middle of an algorithm. To solve the overall problem we just need to add a beginning and an end. The middle part starts by assuming that we already know the maximum sum for all routes ending at a particular row. So to get the algorithm rolling we need a row for which we do have that information, and the first row of the triangle is the obvious candidate: all routes start there, so the maximum up to that point is just the value of the cell in that row. How about the end of the algorithm? To solve the problem, we need to know the maximum sum from top to bottom through the triangle; we've calculated the maximum sum ending at each of the cells in the last row, so we just need to take the maximum over all these cells.

Some code

To code this, we'll reach once again into our Functional Programming toolbox. I think the Aggregate method is just the hammer to crack this particular nut. Do you see how?

Ignore the input part for now, and assume we've got a sequence of List<int>, one List for each row of the triangle. Like this:

{75}
{95, 64}
{17, 47, 82}
{18, 35, 87, 10}
...
In this layout, it's a little tricky to visualise which cells are adjacent to which: you need to look at the cell directly above, and the one above but to the left. For example, the 35 in row four is adjacent to 47 and 17 in the row above.

We'll use Aggregate to 'flatten' the triangle represented by this sequence of Lists using the algorithm I described above: the aggregate that we'll be calculating is a List containing the maximum sum over all routes to each cell up to the last row we've processed. Remember that Aggregate needs a function that combines the aggregate so far with the next item in the series. We'll supply a function that takes the maximums so far and combines them with the cell values for the next row. The code looks like this:

return rows.Aggregate(
    new List<int> {0},
    (currentMaxima, nextRow) =>
        {
            var rowLength = nextRow.Count();
            return nextRow
                .Select((cell, index) =>
                        index == 0 ? cell + currentMaxima[index]
                        : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                        : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                .ToList();
        }
    )
    .Max();

Notice that at line 7 we're processing the sequence nextRow (containing the cell values) using an overload of the Select method that gives us the index of each cell value in the sequence; we use this index to find the appropriate entries in the list currentMaxima containing the maximum sums to the cells in the row above.

And with that, I believe we've solved the problem. It takes 2 milliseconds to solve the 100-row triangle problem on my computer - a satisfying improvement over 20 billion years.

Here's the complete code. Note that in the source code project I have included two files containing the data for these problems.

namespace ProjectEuler
{
    [EulerProblem(18, Title = "Find the maximum sum travelling from the top of the triangle to the base.")]
    public class Problem18
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem18Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    [EulerProblem(67, Title = "Using an efficient algorithm find the maximal sum in the triangle?")]
    public class Problem67
    {
        public long Solve()
        {
            var dataPath = @"ProblemData\Problem67Data.txt";

            return MaximumSumThroughTriangleSolver.SolveFromFile(dataPath);
        }
    }

    public static class MaximumSumThroughTriangleSolver
    {
        public static int SolveFromFile(string dataFilePath)
        {
            string problemData = File.ReadAllText(dataFilePath);

            return Solve(problemData);
        }

        public static int Solve(string problemData)
        {
            var rows = problemData
                .Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
                .Select(line =>
                        line.Split(' ')
                            .Select(token => int.Parse(token))
                            .ToList()
                )
                .ToList();

            return rows.Aggregate(
                new List<int> {0},
                (currentMaxima, nextRow) =>
                    {
                        var rowLength = nextRow.Count();
                        return nextRow
                            .Select((cell, index) =>
                                    index == 0 ? cell + currentMaxima[index]
                                    : index == rowLength - 1 ? cell + currentMaxima[index - 1]
                                    : Math.Max(cell + currentMaxima[index - 1], cell + currentMaxima[index]))
                            .ToList();
                    }
                )
                .Max();
        }
    }
}

23 comments:

Anonymous said...

Hey! I believe you explained this as best you could but I still am not getting how to resolve this using your algorithm (maybe it's because I am not a c# guy or I am not that great with math)

Could you give more examples or something?

Anonymous said...

in your first illustration, the last row in the triangle has 5 elements....i think it should have 4.

Unknown said...

You're right about that. I made the same mistake in the second illustration too!

I wonder how many other people spotted that but couldn't be bothered to tell me.

Thanks.

Anonymous said...

Your explanation was exactly what I needed to get past #67. #81 is similar, I think. a matrix for right and down. Rotate, deal with the edge cases and it's the same problem.

Unknown said...

Kylar,
Glad I could help. It's interesting how my explanations suit some, but not others.

I think you might right about 81: I hadn't looked at it before.

Sam

Anonymous said...

Thanks. it would have take me month to think of an useful idea. Your concept help me do my code in a jiffy

Anonymous said...

The first illustration is very wrong. Picking 5-15-27-58 would yield 105 instead of 95, and that's clearly a greater sum.

Unknown said...

You're right that the route you've shown would give a greater sum. However in that picture, I was just trying to illustrate one possible route through the triangle, not necessarily the maximal one.

Anonymous said...

I am trying to solve this problem using Mathematica any advice

Unknown said...

@Anonymous,
I'm afraid I've never used Mathematica

Anonymous said...

You're an exceptionally talented problem solver, Sameul. Thanks for sharing.

Anonymous said...

But yes, the other comments are right about the problem with the graphics. Each row should increase by only a single item. A small flaw in an otherwise outstanding post.

Unknown said...

Peter,
Glad you liked it. Thanks for reading!

Sam

Anonymous said...

I'm trying to solve this problem using an easy Dijktra's algorithm. Am I wrong or should it be easier this way?

Anonymous said...

5stars

Unknown said...

wouldn't it be simpler to do essentially the same as you did, but to start at the bottom and work ones way up the pyramid.

E.g. represent the data as

List numbers = new List>()
{
{15},
{23,1},
{56,33,5},
{5,11,9,33}
};

and then just do:

for (int i = numbers.Count-2; i >= 0; i--)
{
for (int j = 0; j < numbers[i].Count; j++)
{
numbers[i][j] += Math.Max(numbers[i + 1][j], numbers[i + 1][j + 1]);
}
}

once your done, numbers[0][0] holds the result.

works quite fast for me.

cheers

Anonymous said...

Just for yucks, here is an F# equivalent.


let answer =
System.IO.File.ReadLines(".\\triangle.txt")
|> Seq.map(fun x -> x.Split(' ') |> Seq.map(fun y -> System.Int32.Parse(y)))
|> Seq.fold(fun acc x ->
let rowLen = Seq.length x
x |> Seq.mapi(fun i y ->
let current = fun () -> y+(Seq.nth i acc)
let previous = fun () -> y+(Seq.nth (i-1) acc)
if i=0 then current()
elif i=rowLen-1 then previous()
else System.Math.Max(current() , previous())
)
|> Seq.toList
) [0;]
|> List.max

Unknown said...

Hi there,

Thank you for your excellent linq tutorials!

I noticed that the output of your code is the same when removing line 46. Is that line there for a specific reason?

Máté Kovács-Deák said...

Excellent algorithm, which is easy to implement in other languages.

WillNess said...

This has a nice short and clear formulation in Haskell - the bottom up solution:

eu_18_67 tri = head $ foldr1        (\xs ys-> zipWith (+) xs (zipWith max ys (tail ys)))       tri

Cheers,

Matthew Welson said...

I know you've had this already but thanks again for this help. I just couldn't get the right start to this problem (I'm only doing 18) and this was great.
I'll write my own code rather than use yours as I only have a basic knowledge but this method will help a lot

paulius_l said...

I knew this problem could be solved from the top down but could not come up with solution. Very good explanation.

khaled khunaifer said...

This solution is a brute-force, but it would take a long time or a massive processing power .. since the problem size is 2^99

Post a Comment