Excel file generation made easy(er)

In this post I’ll discuss a technique to generate Microsoft Excel compatible files without any third party library and in an easier than ever manner.

If you ever faced the problem of generating Excel report/export from your application, I guess you know libraries like Jakarta POI and its wonderful APIs on the “Horrible SpreadSheet Format”. I also guess you know that Microsoft Office products can save HTML documents. For example you can save an HTML file from an Excel SpreadSheet. But not everyone knows that saved HTML files can flawlessy be opened back with the application that made it as if they were native application documents. That’s the little magic.
Let’s try a demonstration, follow those steps:

  1. Open Microsoft Excel
  2. Write some garbage inside of it (including formulas if you like), and rich formatting
  3. Save it as HTML file, let’s say “test-worksheet.html”
  4. Now close Excel and rename “test-worksheet.html” to “test-worksheet.xls”.
  5. Double click on it… and voilà… Excel opens the file like any other native Excel file: no difference, for the end user.

The nice thing about HTML files is that they are easier to be understood by humans than binary files, and easier also to be generated for example by a Java Server Page.
If you open the generated file you’ll see that it’s an HTML file that contains sections of metadata that will be interpreted by Excel as document properties and cell formulas. Of course you can send plain HTML to Excel application, but the common scenario is that the customer asks to generate the file from a given template. The best way to respect that template is to export it to HTML and play on the generated file.
In a web application it is necessary to set up proper Content-Type and Content-Disposition on the response. I use to set the Content-Disposition header in the Servlet and the Content-Type in the JSP, because otherwise it could happen that the JSP overrides the Content-Type with the default “text/html”.

// the servlet "ExcelSampleServlet.java"
public class ExcelSampleServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException {
        Order order = OrderManager.getOrderById(request.getParameter("id"));
        
        request.setAttribute("order", order);
        
        response.setHeader("Content-Disposition", 
                           "attachment; filename=order.xls");
        RequestDispatcher d = request.getRequestDispatcher("/sales-order.jsp");
        d.forward(request, response);
    }
}

// the jsp "sales-order.jsp"
<%@ page contentType="application/vnd.ms-excel; charset=windows-1252" 
%><html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns="http://www.w3.org/TR/REC-html40">
...

Here you find a mini war file with sources that demonstrates the technique; it has been tested on Tomcat 4.x and Tomcat 5.x. (As for anything on this site, use at your own risk)

In my current project we export excel files in that way. Generated files are quite big (about 2mb). So I am thinking about using a gzip SevletFilter to compress the response transparently to greatly reduce HTTP traffic. Most of the browsers support gzipped response, and it’s possible to detect this feature and behave consequently.

Limitations & Conclusions

This solutions has fit most of the requirements I’ve met until now. Problems come out if you want to generate files with charts, multiple worksheets, etc. because, when saving such kind of files will generate multiple files, that cannot be easily handled as a single document.
But consider that, due to the fact that Microsoft Document Files formats are proprietary and secret, almost any library that generates binary Excel files has several limitations and compatibility issues. Stallman could tell you more.
Another option could be generating OpenOffice documents, that comply to the standard OpenDocument Format, that actually are zipped xml files with attachments, easy enought to be manipulated programmatically. It seems that in the next Microsoft products could support OpenDocument format. For now, you should consider binary generation libraries like POI, or the way I exposed in this article.
Notice that Excel Viewer seems to not have support for HTML-saved Excel files; luckily, I never met a customer that even knows about Excel Viewer existence, nor anyone required it to work. So please, don’t tell anyone!

After my last post on “trimming jdbc” I’ve got (for now) 5 post saying “you should take a look at ${somelibrary}”. I hope this time I won’t: I felt sick of NIH syndrome. Maybe I am. Anyway I enjoy seeing that someone reads what I wrote, so thankyou for any comment.


5 Responses to “Excel file generation made easy(er)”  

  1. 1 Pedro

    I don’t try it yet, but i’ll try on my job… I do a single question? if it is so easy do this conversion why may I use POI???

  2. 2 Luigi

    Try downloading and deploying the demo webapp, and see yourself how easy it is.

    I would use POI if it is needed to import excel files, as POI is able also to read excel files. I found that reading excel is tricky also with POI, so better to not use excel at all for importing data.

  3. 3 Malcolm

    This is a good method but note Excel 2007 will show a warning when opening the file that the file is ‘not in the expected format’ if the user agress to open the file it will still open normally.

  4. 4 Mehran Emrani

    Luigi,
    When Excel (2007) opens HTML files, it does its own formatting of columns, which in some cases, distorts values. Here are some examples:

    001E00005 is changed into scientific notation: 1E 5
    JAN09 is changed to 9-JAN
    0123 is changed to 123

    Do you know how to “tell” Excel not to do auto formatting when it opens an HTML file?

    Thanks.

  1. 1 MSWord generation made easy(er) - NewInstance


Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>



Calendar

November 2006
M T W T F S S
« Oct   Dec »
 12345
6789101112
13141516171819
20212223242526
27282930  

Follow me

twitter flickr LinkedIn feed

Subscribe by email

Enter your email address:

Archives


Categories

Tag Cloud


Listening