Monday, 7 July 2008

Reporting Progress during LINQ queries

How many times have you had someone use your software, then say to you: "That feature is so much faster now. What did you do?" And in fact all you had done was implement a progress bar, thus harnessing, not the power of the CPU, but the power of psychology!

Reporting progress of an operation is very easy when you have a list of objects that you loop through in a for or foreach block: you just need to know the total number of objects before you start, increment a counter as you process them, and periodically raise an event to notify interested parties (usually your UI) of the percentage progress. But how do you calculate progress when there are no loops in sight - when you are using LINQ queries to process sequences?
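For comparison, here is a minimal sketch of the loop-based approach just described. The class and method names (LoopProgressSketch, ProcessAll) are hypothetical and not from the original post, and the choice to notify only when the whole-number percentage changes is one assumed reading of "periodically".

```csharp
using System;
using System.Collections.Generic;

public static class LoopProgressSketch
{
    // Hypothetical helper (not from the post): processes every item in a
    // collection and reports whole-number percentages as it goes.
    public static void ProcessAll<T>(ICollection<T> items, Action<T> process, Action<int> reportProgress)
    {
        int total = items.Count;   // the total is known before the loop starts
        int completed = 0;
        int lastReported = -1;

        foreach (var item in items)
        {
            process(item);
            completed++;

            // only notify when the percentage actually changes
            int percent = (int)((long)completed * 100 / total);
            if (percent != lastReported)
            {
                lastReported = percent;
                reportProgress(percent);
            }
        }
    }
}
```

A call such as LoopProgressSketch.ProcessAll(files, Process, p => worker.ReportProgress(p)) would then drive a progress bar, where files, Process and worker stand for whatever your application provides.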

I had a flash of inspiration the other night, followed by an impromptu coding session the next morning when the unusually bright sunlight woke me up early. I thought I'd share the results with you.

LINQ Pipeline Monitors

One way to think of LINQ queries with sequences is as a pipeline, or an oil refinery. You have the data source (often a collection) acting like the oil well, but producing items. Items flow along the pipeline (think of the "."s in the expression as the pipe). Some items are removed from the pipeline by filters - Where clauses; some items are converted to other products - Select clauses; finally, just as oil is cracked and graded into gasoline, diesel and so on, items in LINQ queries are often grouped and sorted. In oil pipelines there are also flow meters, measuring how much oil is going through the pipes. If we can create the equivalent of a flow meter for LINQ queries, then we've got our means of reporting progress.

What I came up with is an extension method, WithProgressReporting(...), that you can just insert in the appropriate place in your LINQ query. Given an input sequence, it simply passes items through to an output sequence. But as items pass through, it counts how many it sees, and calls a lambda expression to report progress against a pre-determined count of the total number of items.

I've created two variants of this.

  • The first one ensures that all the items in the sequence have been generated up front, so that it knows how many there are to process; use this if your data source is a collection, or if generating the items takes insignificant time compared with the processing you're going to do.
  • The second variant can be used if generating the items in the sequence takes time (so you want to report progress of this part as well), but you know in advance how many there will be: it doesn't attempt to buffer the sequence before passing the items through to the output sequence.

Here are both variants:

public static class Extensions
{
    public static IEnumerable<T> WithProgressReporting<T>(this IEnumerable<T> sequence, Action<int> reportProgress)
    {
        if (sequence == null) { throw new ArgumentNullException("sequence"); }
    
        // make sure we can find out how many elements are in the sequence
        ICollection<T> collection = sequence as ICollection<T>;
        if (collection == null)
        {
            // buffer the entire sequence
            collection = new List<T>(sequence);
        }

        int total = collection.Count;
        return collection.WithProgressReporting(total, reportProgress);
    }

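    public static IEnumerable<T> WithProgressReporting<T>(this IEnumerable<T> sequence, long itemCount, Action<int> reportProgress)
    {
        // validate eagerly: checks placed inside an iterator block would not
        // run until the caller starts enumerating the result
        if (sequence == null) { throw new ArgumentNullException("sequence"); }
        if (reportProgress == null) { throw new ArgumentNullException("reportProgress"); }

        return ReportProgressIterator(sequence, itemCount, reportProgress);
    }

    private static IEnumerable<T> ReportProgressIterator<T>(IEnumerable<T> sequence, long itemCount, Action<int> reportProgress)
    {
        int completed = 0;
        foreach (var item in sequence)
        {
            // hand the item to the consumer first, so progress is only
            // reported once the downstream work on it has finished
            yield return item;

            completed++;
            reportProgress((int)(((double)completed / itemCount) * 100));
        }
    }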
}

WithProgressReporting in action

Here's a toy example showing how you might use the first method when you're using a BackgroundWorker to do calculations on a background thread. Focus on the WithProgressReporting call inside the DoWork handler, where I'm reporting progress to the BackgroundWorker:

public void TestProgressReporting()
{
    BackgroundWorker worker = new BackgroundWorker();
    worker.WorkerReportsProgress = true;
    worker.DoWork += (sender, e) =>
          {
              // pretend we have a collection of items to process
              // (Enumerable.Range stands in for a 1.To(1000) helper that isn't shown here)
              var items = Enumerable.Range(1, 1000);
              items
                  .WithProgressReporting(progress => worker.ReportProgress(progress))
                  .ForEach(item => Thread.Sleep(10)); // simulate some real work
          };

    worker.ProgressChanged += (sender, e) =>
        {
            // make sure the figure is written to the
            // same point on screen each time
            Console.SetCursorPosition(1, 0);
            Console.Write(e.ProgressPercentage);
        };

    worker.RunWorkerAsync();
    Console.Read();
}

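// ForEach helper used above; in a real project, add it to the Extensions
// class shown earlier rather than declaring the class a second time
public static class Extensions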
{
   public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
   {
       foreach (var item in sequence)
       {
           action(item);
       }
   }
}
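The toy example only exercises the first variant. As a rough sketch of the second one, the code below inserts WithProgressReporting after a Select that produces items lazily, passing in a count that is known up front. LoadRecord and LoadAll are hypothetical names invented for illustration, and the overload is copied into the sketch only so that it compiles on its own; leave the copy out if the Extensions class above is already in your project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KnownCountSketch
{
    // Copy of the post's second overload, included so this sketch compiles alone.
    public static IEnumerable<T> WithProgressReporting<T>(
        this IEnumerable<T> sequence, long itemCount, Action<int> reportProgress)
    {
        int completed = 0;
        foreach (var item in sequence)
        {
            yield return item;
            completed++;
            reportProgress((int)(((double)completed / itemCount) * 100));
        }
    }

    // Hypothetical stand-in for slow item generation (e.g. parsing a record).
    private static string LoadRecord(int id)
    {
        return "record " + id;
    }

    // Loads recordCount records one at a time, reporting progress as each
    // one passes through the pipeline.
    public static List<string> LoadAll(int recordCount, Action<int> reportProgress)
    {
        return Enumerable.Range(1, recordCount)
            .Select(id => LoadRecord(id))                        // generated lazily
            .WithProgressReporting(recordCount, reportProgress)  // count known, no buffering
            .ToList();
    }
}
```

Because nothing is buffered, the time spent generating each record counts towards the reported progress, which is the situation the second variant is meant for.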

Now, stay tuned. I've not exhausted this flash of inspiration yet...

8 comments:

LINQ Master said...

Very interesting idea. Looking forward to the next post.

Sam said...

Thanks for the feedback - do you want to try out the new rating gadget for me as well ;-) ?

Pete said...

Nice.

Where did you get your rating gadget?

Thanks,
Pete.

Sam said...

Pete,
It's from outbrain.com

LINQ Master said...

Very interesting idea. Looking forward to the next post.

Mick said...

Hi Sam,

thanks for sharing your flashes of inspiration - it's also explained very well, so that even I, being only a hobby developer, can pick up some knowledge from it ;-)

For two days now I've been trying to implement your technique in my VB application, using an external DLL for the extensions to avoid trouble with the 'yield' operator. But calling the extension methods with the lambdas you use still won't work: the VB-translated line 'items.WithProgressReporting(Function(progress) worker.ReportProgress(progress)).foreach....' won't compile because 'worker.ReportProgress(progress)' doesn't seem to provide a value in VB. It's funny, because in C# your example works perfectly. Do you have any idea what might be wrong with the translated lambda expression, or how to 'dismantle' it into separate procedures?

Thank you very much,
Mick

Mick said...

It's resolved: VB needs "Sub" instead of "Function" within the Lambda expression in this case. Just wanted to let others know.

Sergio said...

Hi,
what about using an Enumerable like Directory.EnumerateFiles where you don't want to execute Count for performance?
