Monday, 30 June 2008

Reading and Writing Excel files with .Net

document_outIt would be an unusual business application that did not have a requirement to export data in one form or another. Probably the most useful form of report to produce is one that can be opened by Excel, because then everybody and his dog can slice and dice and chart and pivot the data, and you don't have to implement any of those features in your application because Excel does it all.

So what options are there for reading and writing Excel files with .Net?

Plain Text

The easiest option is to use text files. Excel is quite happy to open a straight "txt" file, though it will make you tell it how to chop the lines into columns, and that's not terribly user-friendly. More useful are the CSV (Coma Separated Value) or TSV (Tab Separated Value) formats because these have structure built in. Excel will export its own data in any of these formats.

If you opt for this method, check out FileHelpers, an open-source project (released under the LGPL license) for reading and writing structured files. It lets you work a lot like the XML Serialization feature in the .Net framework. You can mark up classes with attributes, and FileHelpers will write data from your classes to structured files (CSV and TSV included), and will also do the reverse, turning structured files into objects. Nice!

The obvious limitation of plain text files is that you can't format the data or include things like charts. For that you need to work with one of the more sophisticated file types.

Native Excel Files

Native Excel files (the ones with the ".xls" extension) are stored in BIFF format - a binary format that is impenetrable to anybody without the specification. The good news is that, in February 2008, Microsoft released the spec for the Excel format, and other binary file formats used by the Office family. The bad news is that the spec for the Excel format alone weighs in at 349 pages, so good luck implementing that! If you're wondering why it is so complicated, Joel Spolsky (who used to work on the Excel team at Microsoft) has done some explaining.

Fortunately, there are quite a number of companies who saw the complexity of the task as an opportunity, and have created components in .Net to read and write native formats Excel files. Here are a few that I've come across (by no means an exhaustive list):

[Update: I've now written up my first impressions of several of these components]

Using one of these libraries is generally the fastest way of dealing with native excel files (fastest both in terms of performance of the code, and how long you take to write that code!). But there is one other option to consider, if for some reason you can't purchase a library, or you want absolute compatibility with Excel: automating Excel itself using COM Interop. This has the advantage of letting you export files in any of the formats that Excel supports. Suresh Behera has put together a list of Microsoft HowTos to get you started with this technique.

You should be aware that this comes with some limitations. The most obvious is that Excel has to be installed on a computer along with your software, and you have to be careful that the version is compatible with the one you developed with. You also can't use this method to access Excel files on a Server since Excel is a component with a UI (perhaps I should say shouldn't, rather than can't, because some people have made this work). Finally, it's pretty slow if you're dealing with large workbooks because of the overhead the COM imposes on method calls.

Open XML Format Excel files

When Microsoft released Office 2007, they changed the default file format for Word, Excel and Powerpoint to new, XML based formats. They've published the specifications for these under the Office Open XML label - the format for Excel is called SpreadsheetML (you might have heard of the controversy as Microsoft went about to get these formats approved as standards, first by the ECMA, and then by ISO). It's not just Office 2007 that supports these formats; Microsoft have provided an add-in to add support to all versions of Office right back to Office 2000.

An XML-based format. That sounds like it should be easy to write. Right? Well it is much easier than working with the binary formats, but the format has still got to support Excel's huge feature list, so it's never going to be simple. You probably want to start by poking around inside some files in the new format. The main thing you need to know is that they are basically zip files stuffed full of xml files. You can change the extension of a ".docx" file (the new Word file format) or ".xlsx" (the new Excel file format) to ".zip", then open it with your favourite Zip extractor. Some other places you might want to check out:

If you're not up for the challenge of reading and writing the format yourself, many of the libraries I listed above have support for it, to varying degrees. And this time, Microsoft themselves are working on components to read and write the formats. Have a look at the Open XML Format SDK. The first version has already been released with some basic support for the formats; the SDK development roadmap indicates that increased functionality will be available later this year.

6 comments:

Vadim Kantorov said...

I employed Apache POI project (recompiled for .NET with IKVM. I guess one could try NPOI project from CodePlex) to read / write plain old Excel files.
It's fast and exposes all the essential functionality.
More than that it's free and doesn't use any COM interop.
I recommend it to everyone for fun and profit :)

Manuel De Leon said...

If you need to create Excel 2007/2010 files then give this open source a try: http://closedxml.codeplex.com

advanced excel training london said...

I read your article thoroughly and found it quite useful and informative.I really enjoyed while reading your article.Thanks for sharing.

Sham Yemul said...

thanks a lot for this useful information, helped me quickly to get onto wheels

sham

simontufel said...

i observe your article with very deeply. then i realize it was  also very quick response.

rehan mustafa said...

if you want to create, read, edit and convert your excel file to other formats in .net language then try this .NET Excel API:

http://www.aspose.com/.net/excel-component.aspx

Post a Comment