Thursday 10 July 2008

Cancelling Long-running LINQ queries

Last time you dropped by, we were discussing how that polite software should tell its users how it's getting on with the tasks it has been set - it should give progress reports; I showed how to implement progress reporting in LINQ queries. Now the book of software etiquette also says (on page 214, but you'll have to trust me on that because I've got the only copy - stored in my imagination) that well mannered programs should allow their users to cancel very long running tasks.

When I was new to the .Net framework, and hadn't read around much, I was inclined to think that the best way of supporting cancellation is spin up a new Thread, do the work there and, if the user should press "Cancel", pull the trigger on the Thread using Thread.Abort(). But as Ian Griffiths explains, Thread.Abort() is a nuclear option that you should only invoke if you're prepared to deal with the radioactive fallout afterwards - by which I mean the inconsistent state that the ThreadAbortException generated by Thread.Abort() can leave in its wake as it goes about terminating your thread. If you want to stop your program in its tracks without damaging the environment you should use a good old-fashioned bool flag that one thread can set and the worker thread can check periodically to see whether it should break out of its current job.

But when we're using LINQ expressions, with nary a loop in sight, how can we periodically check that flag?

Remember the pipeline analogy I used of LINQ queries last time? There we talked about introducing flow meters to observe the number of elements passing through the pipe - that idea led to the progress reporting we needed. Well, the other thing every sensibly-constructed pipeline has are valves that can shut the whole operation down: that's just what we want to support cancellation. The idea is that, when we hear that we need to cancel, we starve the LINQ query of any further items so that it terminates early.

As before, we can implement this LINQ "valve" using an Extension method containing an iterator that just pulls items from the input sequence on the left and passes them through to the output sequence on the right, but crucially calls a delegate after every item to ask whether it should carry on or cut the output sequence short; we can even combine this technique with the one from last time to support reporting progress.

Also as before, I've provided two versions of this extension method: one that buffers the input collection in order to count it, and one that doesn't buffer it - but does require you to tell it how many elements there will be.

public static IEnumerable<T> AsCancelable<T>(this IEnumerable<T> sequence, Func<int,bool> shouldCancel)
{
    if (sequence == null) { throw new ArgumentNullException("sequence"); }

     // make sure we can find out how many elements are in the sequence
     ICollection<T> collection = sequence as ICollection<T>;
     if (collection == null)
     {
        // buffer the entire sequence
        collection = new List<T>(sequence);
     }

     long total = collection.Count;
     return collection.AsCancelable(total, shouldCancel);
}

public static IEnumerable<T> AsCancelable<T>(this IEnumerable<T> sequence, long itemCount, Func<int, bool> shouldCancel)
{
    if (sequence == null) { throw new ArgumentNullException("sequence"); }

    long completed = 0;

    foreach (var item in sequence)
    {
        yield return item;

        completed++;
        bool cancel = shouldCancel((int)(((double)completed / itemCount) * 100));
        if (cancel) { yield break; }
    }
}

To provide an example of how you might use this, I've applied AsCancelable to a slightly modified version of my solution to Project Euler Problem 4. To make the code run for longer (to give us an opportunity to cancel it!), I've changed it to find the largest palindrome that is a product of two 4 digit numbers (Problem 4 was about two 3 digit numbers).

public void TestCancellableProgressReporting()
{
    bool cancelled = false;

    Func<int, bool> progressController = percent =>
                        {
                            Console.SetCursorPosition(1, 0);
                            Console.WriteLine(percent + "%");
                            Console.WriteLine("Press 'c' to cancel");
                            cancelled = (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.C);
                            return cancelled;
                        };

    var largestPalindrome =
                        (
                        from long x in 1000.To(9999).AsCancelable(progressController)
                        from long y in 1000.To(9999)
                        let product = x * y
                        where product.IsPalindromic()
                        orderby product descending
                        select product
                        )
                        .FirstOrDefault();
    if (!cancelled)
    {
        largestPalindrome
            .DisplayAndPause();
    }
    else
    {
        "Calculation Cancelled".DisplayAndPause();
    }
}

A couple of things to note about this:

  • You need to make sure your LINQ query is prepared for the starvation it might face. In the original solution to Problem 4 I used a call to First() rather than FirstOrDefault() at line 23 (the difference being that First() throws an exception if it doesn't find any elements in the sequence, whereas FirstOrDefault will return the default value for the sequence type - 0 in this case). That was fine in the original context, because there were always elements in the sequence. When we use AsCancelable however, the sequence might be terminated before any elements have been passed through
  • You need to be sure to ignore the result of the query if it has been cancelled. This is obvious really, but I nearly forgot when writing the sample. In this case largestPalindrome will always be assigned a value, but only when we don't cancel the query will it be a valid value.
  • AsCancelable will only work in expressions where we are directly processing sequences - ie when the data source is IEnumerable rather than IQueryable. It will not work with LINQ to SQL for example, because LINQ to SQL translates the expressions you give it to SQL statements which clearly don't support cancellation! You might want to check out AsEnumerable in this context.
  • You might also want to note in passing the way to read input from the Console without blocking: check Console.KeyAvailable first!

7 comments:

Strife Onizuka said...

Couldn't you just as easily use TakeWhile for this task?

Unknown said...

BW,
You can certainly use TakeWhile to support cancellation, but it wouldn't give you the progress reporting as well.

Sam

Anonymous said...

how to cancel a linq to sql execution?

Unknown said...

Cancelling a Linq to SQL query is much more difficult, I'm afraid. I don't really have any answers.

My best suggestion is to run the query on a separate thread so that you can abandon it, if needs be. You might not be able to do this in all circumstances though, because I think the DataContext is tied to a specific thread.

Anonymous said...

This is brilliant. Thanks for sharing.

I'm using this to cancel Linq to SQL queries without any problems so far. There are some limitations, but for the most part they do not matter:

1. call to AsCancelable must be the last thing in the pipeline so instead of putting it right after first from, you should put it after select, ie.
var query = from row in dc.table where... select ...;
var query = query.AsCancelable(...);

This allows runtime to push your query to SQLServer as if this was just another ordinary query.

2. Of course, you won't be able to cancel query at database level, but you can stop fetching rows, which is what I actually neaded.

3. Unfortunately this means you can't use stuf like Max, First, etc. in any usefull way. If you put it before AsCancelable, you'll only get one record so there's no need to cancel anything, and if you put it after, it will be evaluated at client, which is probably not what you intended.

Aside from this I don't see any obvious problems.

Also, if you want to use it to query SQL, it makes sense to declare it as
public static IEnumerable(T) AsCancelable(T)(this IQueryable(T) sequence...)
This way you can say
long itemCount = sequence.Count();
right there in the extension method itself and save yourself the trouble of doing it manually, and sending it in as a parameter every time you use it. Count will be executed at database and it won't enumerate collection in the process.

Thanks again for pointing me in the right direction. This was bogging me down for some time now.

Unknown said...

@Anonymous:
Glad I could help

Anonymous said...

What did you get as an answer for that 4-digit product palindrome?

Post a Comment