Thursday, 21 August 2008

Which .Net Excel IO Component should I use?

Introduction

When I wrote up my post about the ways of interacting with Excel from .Net I listed a number of .Net components that can be used for reading and writing native Excel files; I left you with the hard task of picking one for yourself. Today I'm going to write you a helping hand.

Over the last couple of days I've been doing some bug fixing on a little utility we have that analyses an Excel spreadsheet and creates named ranges for cells that are formatted in particular styles. The biggest bug I wanted to fix was terrible performance. When I first looked at the application a couple of months ago the performance was unbelievable: never mind making a cup of coffee whilst it did its stuff; we could go home and get a good nights kip before it delivered the results. It didn't take long to pin most of the blame for that on Excel COM Interop: each call to a property or method on a COM object has a significant overhead, and we were querying the Style property of each cell in a 120Mb workbook!

Hence my interest in .Net components for loading Excel files: I had a license to Syncfusion XlsIO available, and by switching the code to use that instead of Excel Automation through COM, the time to query cell styles dropped to almost nothing. However, XlsIO has some performance issues of its own (more on these in a moment), and it still could take up to an hour for the whole operation to complete. So I decided to have a look at the other components in the market place.

In the hope that there will be at least one other person on the web who finds this useful, I'm recording my first impressions of several of these Excel IO components. I'm not claiming that these are objective reviews: your kilometreage will almost certainly vary.

Syncfusion XlsIO

I've been using XlsIO on and off for a couple of years. Mostly, it has helped me get the job done; several times it has been the cause of temporary localised baldness. My thoughts:

  • It has an API that covers the vast majority of Excel features
  • The API is mostly consistent with the Excel COM API, though there are several exceptions to catch you out: Workbooks.Add vs Worksheets.Create being one trivial example.
  • There are some highly non-trivial differences from the Excel COM API. For example, the Excel Range object has Rows and Columns properties that return references to collections; the XlsIO Range object has properties with the same names, but each return arrays of Range objects and allocate a new array every time you access the property. This is a huge performance pitfall for the unwary.
  • The Range model is different to Excel's in that in XlsIO, a Range can only contain one Rectangular region, whereas excel allows Ranges to be created that are the union of many rectangular regions.
  • As I hinted in the introduction, there are other performance problems. In my large (120Mb) workbook, deleting a single row could take up 20 seconds. Deleting Named ranges was another costly operation. Excel demonstrates that these operations don't have to take that long.
  • The documentation is sparse, to put it politely, and the class reference often doesn't state more than the obvious. "ICombinedRange: represents a Combined Range" being one typical example. I have however had assurances from Syncfusion that they are working to improve this.

XLSReadWrite

I started my exploration of other components by downloading and installing XLSReadWrite. Then uninstalling it again. Call me a CLR snob, but I didn't like the thought of working with a component that is clearly designed for Delphi. This showed because the API commits two capital crimes: Every type in is prefixed with a T; and most of the namespaces contain just one type. The other point to note about XLSReadWrite is that the "shape" of the API is nothing like Excel's so any code you have using COM Interop would need a lot of reworking to use this component.

ActiveXLS

I'm afraid that CLR snobbishness also put me off ActiveXLS. The ActiveXLS team produce Spreadsheet components for both .Net and Java, and it appears that the .Net version is a straight port of the Java version: it is the paired "get_" and "set_" methods, and the absence of Properties that give the game away. Surely an organisation selling a component "optimised for Visual Studio" (as the home page claims) should make .Net developers feel at home and at least use Pascal casing for methods, rather than camel casing (which everybody knows should only be used for method parameters)?

The other thing that struck me as odd was that in the ActiveXLS forums (which have been active since 2006) there have only been about 200 posts. Is it that they have an intuitive API with superb documentation: or perhaps a very small user-base? Maybe I'm just cynical.

SpreadsheetGear

SpreadsheetGear was the component I finally settled on. This appears to be far and away the most mature (though correspondingly the most expensive) of the components that I looked at. Though I didn't try that part of it, this component also offers Windows Forms controls for editing spreadsheets. My impressions:

  • The Object model is very similar to Excel's, and easy to learn. It didn't take me long to port my code from XlsIO (which is also similar to Excel).
  • One big inconsistency is that properties giving access to cells use a 0-based index system rather than the 1-based system in Excel.
  • The Range model is (as far as I can tell) identical to Excel's. Intersect and Union methods are provided for Ranges and seem to work as I'd expect.
  • There's a surprising omission in the API in the current version (2007): no support for Styles. If you need style support you'll need to get the 2008 version (currently in beta).
  • All the non-public code is obfuscated. This can cause problems when debugging. For example when I was trying to look at the Workbook.Names collection in the Quick Watch window (VS 2008) I expanded one of the Name items in the collection, but was unable to inspect any of its properties. It was only by rewriting the expression in Quick Watch window to include a cast to IName that I could see the property values.
  • SpreadsheetGear do not have any public forums that I could find: the only way to get support is to fill in a form and wait for them to get back to you.
  • Performance of the component was very good. Remember the application that took all night with COM Interop, and up to an hour with XlsIO? It now takes under a minute with SpreadsheetGear.
  • As a bonus, here's the answer to an issue that took me an afternoon to figure out (and has caught me out again since then). When you supply a Range address to an IName (whether by using Names.Add or IName.RefersTo) remember to prefix it with an "=" sign; otherwise the IName.RefersToRange property won't get updated.

Honourable Mentions

  • I tried contacting Independentsoft about Spreadsheet.Net but never received a link to the evaluation download. I would judge by the website that this component isn't going to be as complete or mature as the others.
  • Gembox Software offer a completely free version of their spreadsheet control, limited to 5 worksheets of 150 rows each. Unfortunately they don't seem to offer an evaluation version of the full product, so I wasn't able to try it out on my big workbook. A quick scan of the online help shows that the API is not dissimilar to Excel's, and does follow the .Net framework design guidelines.
  • FarPoint Spread does Excel import and export, but its focus is on providing a spreadsheet-like Grid component, so I didn't look into this any further.
  • Infragistics have Infragistics.Excel but it looks like it can only be purchased as part of one of their suites. From the documentation, it doesn't look as fully featured as either XlsIO or SpreadsheetGear.
  • ComponentOne is another component suite vendor that has lumped an Excel IO component in with their suite. Again, the documentation shows that it has fairly limited capabilities compared with the leaders.
  • Aspose.Cells is a component that appears, by my reading of its documentation, to sit somewhere in the middle of the market, in terms of functionality and price. Aspose is a vendor that sells components for .Net and Java. and gets it right: the API's are "localised" for the framework culture. The .Net API gets Properties and Pascal cased methods, and the Java API keeps its get and set methods.

19 comments:

Anonymous said...

I think you catch the same problem as me in XlsIO... but trick is that if you sit a little with any profiler then you will find that in most cases can be found less expensive in time XlsIO operation.

As syncfusion support said it's due to Excel API interfaces they have to keep...

You can ask syncfusion support for assist and after a while got optimized code from syncfusion developers.

Oleksandr Kucherenko said...

BTW can you provide a sample of code?! for testing on other commercial components?!

Anonymous said...

Could you give a sample code in VBA or XlsIO or SpreadsheetGear (or better all of them) which was so much faster in SpreadsheetGear? (I really want to see it with my eyes)

Sam said...

I didn't get round to creating any proper benchmarks, and I would be embarassed to release the code I have to the world: it's not fit to be seem ;-).

As I mentioned, it was a couple of specific API calls when used with a very large workbook that caused the problem with XlsIO - on the whole performance wasn't too bad.

The problem with the Delete row call seemed to be to do with checking and updating possible formula references to the row. SpreadsheetGear just didn't have this problem; they have obviously coded that part more efficiently.

Dhivya said...

Ranges with union of multiple rectangular ranges is also supported by Syncfusion's XlsIO. Here is how it can be done:

//Union of multiple ranges
IRanges rangeCollection = mySheet.CreateRangesCollection();
IRange range1 = mySheet.Range["A1:A2"];
IRange range2 = mySheet.Range["C1:C2"];
rangeCollection.Add(range1);
rangeCollection.Add(range2);
rangeCollection.Text = "XlsIO";

Sam said...

Thanks Dhivya. I was aware that you can use RangesCollection like that, but my point was that the standard IRange object isn't the same as Excel's. Also, the Merge operation on Ranges only works if ranges can be merged to form one Rectangle; it fails if ranges are disjoint.

dave said...

Thanks for posting this. As somebody who is altogether unfamiliar with programmatically reading/writing excel spreadsheets yet finding myself needing to do so, I found this post to be a good starting point for my own research.

Sam said...

Dave,
Glad I could help. Perhaps you might like to add your rating to the post :-)

Sam

dave said...

Here are my first impressions on some of the products you didn't look at in depth. To give some perspective, my goal was to find a component that would let me create a spreadsheet that will be filled out by users. I needed to use dropdown lists for some columns, driven off of named ranges. Because my needs are fairly simple, I decided to see if any of the cheaper products would work for me. Performance with large data sets is not my primary concern.

Gembox is very limited. It supports writing values, functions, and some simple formatting. It doesn't support setting validation for cells, which is needed to create dropdown lists. Their documentation is pretty sparse, and they only give a few trivial examples of how to use their APIs.

Apose has an API that is much more fully featured, and from what I can tell, fully supports the full range of excel's validation options for cells. Their documentation was pretty good, and they provided lots of examples. It supports everything I need to do, and I'll likely be using it.

Sam said...

Dave,
Thanks for posting that. I'm sure readers with requirements like your will find that helpful.

Sam

Zachary French said...

Hay,
Was also looking into excel libs, but for my purposes only needed to read excel.
Found this to be really inexpensive compared to the others.
http://www.csvreader.com/

Anonymous said...

Aspose is far more robust than Gembox but Gembox is fast, fast, fast for most items. I maintain extension methods for both engines and use them as needed.

Aspose also has an issue with language. The developers do not always take the best approach vis-a-vis what an English-speaking developer would do. I have been able to do complex actions though and have made workarounds using extension methods.

Anonymous said...

For those interested, also check out

Nasosoft Nasosoft Excel
Bytescout Spreadsheet SDK
ExpertXLS Excel Library for .NET

Mathew said...

Hi Samuel,
Your post enlightened my slow performance woes with Syncfusion xlsio. Creating a worksheet with 40+ columns and 25000+ rows took almost 10-15 minutes!
Your point on the IRange object (creating a new array each time) was likely the cause of the performance issue.

Instead of using the indexer of the worksheet object (i.e. sheet[rowNumber, columnNumber].Text = "some value"), I used the SetText and SetNumber functions of the worksheet. Using these, and reducing the amount of cell styling, the worksheet was made in under 15 seconds, and the output size was halved!

Anonymous said...

There is also SoftArtisans ExcelWriter: http://officewriter.softartisans.com

dave said...

Here are my first impressions on some of the products you didn't look at in depth. To give some perspective, my goal was to find a component that would let me create a spreadsheet that will be filled out by users. I needed to use dropdown lists for some columns, driven off of named ranges. Because my needs are fairly simple, I decided to see if any of the cheaper products would work for me. Performance with large data sets is not my primary concern.

Gembox is very limited. It supports writing values, functions, and some simple formatting. It doesn't support setting validation for cells, which is needed to create dropdown lists. Their documentation is pretty sparse, and they only give a few trivial examples of how to use their APIs.

Apose has an API that is much more fully featured, and from what I can tell, fully supports the full range of excel's validation options for cells. Their documentation was pretty good, and they provided lots of examples. It supports everything I need to do, and I'll likely be using it.

Oleksandr Kucherenko said...

BTW can you provide a sample of code?! for testing on other commercial components?!

Adrian said...

Try considering EasyXLS (http://www.easyxls.com) as well ;)

Guest said...

Hello, I wrote a simple Windows Form application in Visual Studio 2010, that tries to write something in a cell and save the page to disk. But nothing works, I do not know how much. Could you help me?

Post a Comment