When I wrote up my post about the ways of interacting with Excel from .Net I listed a number of .Net components that can be used for reading and writing native Excel files; I left you with the hard task of picking one for yourself. Today I'm going to write you a helping hand.
Over the last couple of days I've been doing some bug fixing on a little utility we have that analyses an Excel spreadsheet and creates named ranges for cells that are formatted in particular styles. The biggest bug I wanted to fix was terrible performance. When I first looked at the application a couple of months ago the performance was unbelievable: never mind making a cup of coffee whilst it did its stuff; we could go home and get a good nights kip before it delivered the results. It didn't take long to pin most of the blame for that on Excel COM Interop: each call to a property or method on a COM object has a significant overhead, and we were querying the Style property of each cell in a 120Mb workbook!
Hence my interest in .Net components for loading Excel files: I had a license to Syncfusion XlsIO available, and by switching the code to use that instead of Excel Automation through COM, the time to query cell styles dropped to almost nothing. However, XlsIO has some performance issues of its own (more on these in a moment), and it still could take up to an hour for the whole operation to complete. So I decided to have a look at the other components in the market place.
In the hope that there will be at least one other person on the web who finds this useful, I'm recording my first impressions of several of these Excel IO components. I'm not claiming that these are objective reviews: your kilometreage will almost certainly vary.
I've been using XlsIO on and off for a couple of years. Mostly, it has helped me get the job done; several times it has been the cause of temporary localised baldness. My thoughts:
- It has an API that covers the vast majority of Excel features
- The API is mostly consistent with the Excel COM API, though there are several exceptions to catch you out: Workbooks.Add vs Worksheets.Create being one trivial example.
- There are some highly non-trivial differences from the Excel COM API. For example, the Excel Range object has Rows and Columns properties that return references to collections; the XlsIO Range object has properties with the same names, but each return arrays of Range objects and allocate a new array every time you access the property. This is a huge performance pitfall for the unwary.
- The Range model is different to Excel's in that in XlsIO, a Range can only contain one Rectangular region, whereas excel allows Ranges to be created that are the union of many rectangular regions.
- As I hinted in the introduction, there are other performance problems. In my large (120Mb) workbook, deleting a single row could take up 20 seconds. Deleting Named ranges was another costly operation. Excel demonstrates that these operations don't have to take that long.
- The documentation is sparse, to put it politely, and the class reference often doesn't state more than the obvious. "ICombinedRange: represents a Combined Range" being one typical example. I have however had assurances from Syncfusion that they are working to improve this.
I started my exploration of other components by downloading and installing XLSReadWrite. Then uninstalling it again. Call me a CLR snob, but I didn't like the thought of working with a component that is clearly designed for Delphi. This showed because the API commits two capital crimes: Every type in is prefixed with a T; and most of the namespaces contain just one type. The other point to note about XLSReadWrite is that the "shape" of the API is nothing like Excel's so any code you have using COM Interop would need a lot of reworking to use this component.
I'm afraid that CLR snobbishness also put me off ActiveXLS. The ActiveXLS team produce Spreadsheet components for both .Net and Java, and it appears that the .Net version is a straight port of the Java version: it is the paired "get_" and "set_" methods, and the absence of Properties that give the game away. Surely an organisation selling a component "optimised for Visual Studio" (as the home page claims) should make .Net developers feel at home and at least use Pascal casing for methods, rather than camel casing (which everybody knows should only be used for method parameters)?
The other thing that struck me as odd was that in the ActiveXLS forums (which have been active since 2006) there have only been about 200 posts. Is it that they have an intuitive API with superb documentation: or perhaps a very small user-base? Maybe I'm just cynical.
SpreadsheetGear was the component I finally settled on. This appears to be far and away the most mature (though correspondingly the most expensive) of the components that I looked at. Though I didn't try that part of it, this component also offers Windows Forms controls for editing spreadsheets. My impressions:
- The Object model is very similar to Excel's, and easy to learn. It didn't take me long to port my code from XlsIO (which is also similar to Excel).
- One big inconsistency is that properties giving access to cells use a 0-based index system rather than the 1-based system in Excel.
- The Range model is (as far as I can tell) identical to Excel's. Intersect and Union methods are provided for Ranges and seem to work as I'd expect.
- There's a surprising omission in the API in the current version (2007): no support for Styles. If you need style support you'll need to get the 2008 version (currently in beta).
- All the non-public code is obfuscated. This can cause problems when debugging. For example when I was trying to look at the Workbook.Names collection in the Quick Watch window (VS 2008) I expanded one of the Name items in the collection, but was unable to inspect any of its properties. It was only by rewriting the expression in Quick Watch window to include a cast to IName that I could see the property values.
- SpreadsheetGear do not have any public forums that I could find: the only way to get support is to fill in a form and wait for them to get back to you.
- Performance of the component was very good. Remember the application that took all night with COM Interop, and up to an hour with XlsIO? It now takes under a minute with SpreadsheetGear.
- As a bonus, here's the answer to an issue that took me an afternoon to figure out (and has caught me out again since then). When you supply a Range address to an IName (whether by using Names.Add or IName.RefersTo) remember to prefix it with an "=" sign; otherwise the IName.RefersToRange property won't get updated.
- I tried contacting Independentsoft about Spreadsheet.Net but never received a link to the evaluation download. I would judge by the website that this component isn't going to be as complete or mature as the others.
- Gembox Software offer a completely free version of their spreadsheet control, limited to 5 worksheets of 150 rows each. Unfortunately they don't seem to offer an evaluation version of the full product, so I wasn't able to try it out on my big workbook. A quick scan of the online help shows that the API is not dissimilar to Excel's, and does follow the .Net framework design guidelines.
- FarPoint Spread does Excel import and export, but its focus is on providing a spreadsheet-like Grid component, so I didn't look into this any further.
- Infragistics have Infragistics.Excel but it looks like it can only be purchased as part of one of their suites. From the documentation, it doesn't look as fully featured as either XlsIO or SpreadsheetGear.
- ComponentOne is another component suite vendor that has lumped an Excel IO component in with their suite. Again, the documentation shows that it has fairly limited capabilities compared with the leaders.
- Aspose.Cells is a component that appears, by my reading of its documentation, to sit somewhere in the middle of the market, in terms of functionality and price. Aspose is a vendor that sells components for .Net and Java. and gets it right: the API's are "localised" for the framework culture. The .Net API gets Properties and Pascal cased methods, and the Java API keeps its get and set methods.