David Truxall

Adrift in .Net


.Net 3.0

Building a Basic Excel Document with Open XML

I recently gave a talk about Open XML and found that there were not many complete code samples showing how to build Office 2007 documents using .Net and SpreadsheetML. Most of the examples I ran into were snippets or single functions, or just samples of the SpreadsheetML itself. As one of my demos, I created a C# class that builds a basic spreadsheet. This post describes that class.

There are prerequisite installs required to run this code:

  • .Net 3.0 Framework (System.IO.Packaging is part of WPF)
  • SDK for Open XML Formats, which is currently a CTP, so the object model may change with the final release (and the code in this post would have to change with it).
  • Code Snippets that are available for Open XML.

The class (called Spreadsheet) does two basic things:

  1. Create a spreadsheet package
  2. Insert data into a worksheet in the newly created package

The first step is creating the package, which consists of XML files containing the SpreadsheetML and XML files that manage the relationships between those files. In an Open XML spreadsheet, the minimal spreadsheet package requires three kinds of XML documents:

  1. A workbook file
  2. A worksheet file
  3. A relationship file
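
One way to see these files for yourself is to open an existing .xlsx with System.IO.Packaging and list its parts. The following is a minimal sketch, not part of the Spreadsheet class; the PackageInspector name is made up for illustration, and it assumes a reference to WindowsBase.dll.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // lives in WindowsBase.dll

// Hypothetical helper for exploring a package; not part of the SDK.
public static class PackageInspector
{
    // Prints the URI and content type of every part in an Open XML package.
    public static void ListParts(string path)
    {
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                Console.WriteLine("{0}  ({1})", part.Uri, part.ContentType);
            }
        }
    }
}
```

Calling PackageInspector.ListParts on a workbook saved from Excel 2007 is a quick way to compare a full package with the minimal one built in this post.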

Additionally, SpreadsheetML uses a concept called "Shared Strings". SpreadsheetML stores Shared Strings separately from the worksheet in their own document, so the package stores less data when strings are re-used. Strings can also be added to the spreadsheet "in-line", without using Shared Strings storage. For this example labels are stored as Shared Strings to demonstrate the concept, so the spreadsheet package also requires a Shared Strings document.
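
To make the idea concrete, here is a small sketch of the bookkeeping behind Shared Strings: each distinct string is stored once, and a cell refers to it by its zero-based position in the table. The SharedStringIndex class is hypothetical and only models the concept; it does not read or write any package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the SDK: models the shared-string idea.
public class SharedStringIndex
{
    private readonly List<string> items = new List<string>();
    private readonly Dictionary<string, int> lookup = new Dictionary<string, int>();

    // Returns the zero-based index of the text, adding it only if it is new.
    public int GetOrAdd(string text)
    {
        int index;
        if (!lookup.TryGetValue(text, out index))
        {
            index = items.Count;
            items.Add(text);
            lookup.Add(text, index);
        }
        return index;
    }

    // Number of distinct strings stored so far.
    public int Count
    {
        get { return items.Count; }
    }
}
```

Adding "Red" twice returns the same index both times and leaves Count at 1, which is where the space saving comes from. In the worksheet, a cell marked with t="s" holds that index as its value instead of the text.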

The SDK for Open XML Formats provides a new component, Microsoft.Office.DocumentFormat.OpenXml.dll, that wraps some of the functionality of creating an Open XML document with System.IO.Packaging. Essentially it manages creating the files and the relationships between the files in the package. Once you have created the files and relationships, you still need to write code to insert actual data into the documents. This example uses two steps:

  1. Create the basic XML document using a template of existing XML
  2. Insert data into the existing XML.

The following are the contents of three small XML files created and added to a Templates directory in the solution. These three files are the basis for the required parts of the package:

The workbook template

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheets>
        <sheet name="{1}" sheetId="1" r:id="{0}" />
    </sheets>
</workbook>

Notice that the XML contains .Net placeholders. Later on we can replace these with actual values that can vary at run time.

The worksheet template

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" >
    <sheetData/>
</worksheet>

The shared strings template

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
</sst>

These XML templates make up the basic content of the package. The C# class contains a CreateSpreadsheet procedure which creates the basic pieces of the package. The main thing to notice is that creating a part object (workbook, shared strings, worksheet) only creates the part file, not the content of that file. The templates above become the content for the parts. There is no need to manage the relationship files directly, because the API does that automatically.

public void CreateSpreadsheet(string path, string firstSheetName)
{
    using (SpreadsheetDocument doc = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
    {
        //Add the workbook
        WorkbookPart workbook = doc.AddWorkbookPart();

        //Create the shared strings part
        SharedStringTablePart stringTable = workbook.AddNewPart<SharedStringTablePart>();
        this.AddPartXml(stringTable, this.ReadXML(@"Templates\SharedStringTemplate.xml"));

        //Create a worksheet
        WorksheetPart sheet = workbook.AddNewPart<WorksheetPart>();

        //Get the relationship id so the workbook and worksheet can be related
        string sheetId = workbook.GetIdOfPart(sheet);

        this.AddPartXml(workbook, this.WorkbookXml(sheetId, firstSheetName));
        this.AddPartXml(sheet, this.ReadXML(@"Templates\WorkSheetTemplate.xml"));

        doc.Close();
    }
}

The only interesting part is retrieving the ID of the worksheet part when building the workbook part. To create the content of each part the procedure opens an XML file and streams the content into the file. There are helper functions for this, which are really just standard ways of handling XML in .Net:

// Writes the XML text into the part's stream as UTF-8.
protected void AddPartXml(OpenXmlPart part, string xml)
{
    using (Stream stream = part.GetStream())
    {
        byte[] buffer = (new UTF8Encoding()).GetBytes(xml);
        stream.Write(buffer, 0, buffer.Length);
    }
}

// Reads a template file, relative to the current directory, into a string.
protected string ReadXML(string fileName)
{
    // The using block closes the file once the contents have been read.
    using (StreamReader reader = new StreamReader(Environment.CurrentDirectory + @"\" + fileName))
    {
        return reader.ReadToEnd();
    }
}

// Fills the workbook template with the relationship ID and the sheet name.
protected string WorkbookXml(string sheetId, string sheetName)
{
    string contents = this.ReadXML(@"Templates\WorkbookTemplate.xml");

    return string.Format(contents, sheetId, sheetName);
}

Notice the WorkbookXml procedure has a call to string.Format to replace some placeholders with actual data: the ID of the worksheet part relationship and the name of the worksheet. The name of the worksheet is important later, when we want to add data to the worksheet.
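
One caveat with filling an XML template through string.Format: a value such as a sheet name containing & or a quote would produce malformed XML, and any literal braces in a template would have to be doubled. The sketch below shows one possible way to guard against the first problem, using SecurityElement.Escape from the framework. The TemplateFiller class is a hypothetical stand-in and is not part of the Spreadsheet class in the download.

```csharp
using System;
using System.Security; // SecurityElement.Escape

// Hypothetical helper; the Spreadsheet class in this post calls string.Format directly.
public static class TemplateFiller
{
    // Fills {0} and {1} in the template, escaping both values so they are
    // safe to place inside XML attribute values.
    public static string Fill(string template, string relationshipId, string sheetName)
    {
        return string.Format(
            template,
            SecurityElement.Escape(relationshipId),
            SecurityElement.Escape(sheetName));
    }
}
```

With this in place, a sheet named R&D ends up in the workbook part as name="R&amp;D" rather than breaking the document.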

The second step is to actually add data to the worksheet. The class uses two functions available as Code Snippets (XLInsertStringIntoCell, and XLInsertNumberIntoCell). I won't reproduce the code here as I don't own it, but essentially the functions open the proper parts and insert the data. These functions take in the file, the sheet name, cell reference and cell value as parameters.
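
To give a feel for what inserting a value involves, here is a simplified sketch of my own. It is not the Code Snippets code, and the WorksheetXml.AppendNumberCell name is made up. It works on the worksheet XML as a string, and it assumes the row does not exist yet and that rows are appended in order; the real functions also have to open the package, find the right part, and handle existing rows and cells.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical sketch; not the XLInsertNumberIntoCell snippet.
public static class WorksheetXml
{
    private const string Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Appends <row r="rowIndex"><c r="cellRef"><v>value</v></c></row> to sheetData
    // and returns the updated worksheet XML.
    public static string AppendNumberCell(string worksheetXml, int rowIndex, string cellRef, double value)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(worksheetXml);

        // SpreadsheetML elements live in a namespace, so XPath needs a prefix for it.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", Ns);
        XmlNode sheetData = doc.SelectSingleNode("/x:worksheet/x:sheetData", ns);
        if (sheetData == null)
        {
            throw new InvalidOperationException("The worksheet has no sheetData element.");
        }

        XmlElement row = doc.CreateElement("row", Ns);
        row.SetAttribute("r", rowIndex.ToString(CultureInfo.InvariantCulture));

        XmlElement cell = doc.CreateElement("c", Ns);
        cell.SetAttribute("r", cellRef);

        // Numbers are written with the invariant culture so the decimal separator is a period.
        XmlElement v = doc.CreateElement("v", Ns);
        v.InnerText = value.ToString(CultureInfo.InvariantCulture);

        cell.AppendChild(v);
        row.AppendChild(cell);
        sheetData.AppendChild(row);

        return doc.OuterXml;
    }
}
```

A string cell stored as a Shared String would look similar, except that the cell carries t="s" and its value is the index of the string in the Shared Strings part.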

Lastly, I wrote a console app to exercise the Spreadsheet class:

class Program
{
    protected static readonly string fileName = "example.xlsx";
    protected static readonly string firstSheetName = "Sheet1";

    static void Main(string[] args)
    {
        string path = Environment.CurrentDirectory + @"\" + fileName;

        Spreadsheet file = new Spreadsheet();

        file.CreateSpreadsheet(path, firstSheetName);

        file.XLInsertStringIntoCell(fileName, firstSheetName, "A1", "Category");
        file.XLInsertStringIntoCell(fileName, firstSheetName, "B1", "Value");
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A2", "Red");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B2", 30);
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A3", "Blue");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B3", 60);
        file.XLInsertStringIntoCell(fileName, firstSheetName, "A4", "Green");
        file.XLInsertNumberIntoCell(fileName, firstSheetName, "B4", 10);

        Console.WriteLine("Workbook created at " + path);

        Console.ReadKey();
    }
}

Before the comments start to fly, I want to point out a couple things:

  • This bit of code is not that efficient. I realize it opens and closes the package once for every cell it writes. It is meant to demonstrate what is possible, not what is necessarily the best practice. There are very few code samples available, and I am shooting for simplicity here.
  • I know ExcelPackage is on CodePlex, does a better job of wrapping the APIs involved, and is much easier to write code with. Once you have a basic understanding of these APIs you will appreciate the work being done on that project.

Download the VS 2005 project. Don't forget to install all the prerequisites listed above before trying the project. I didn't include the two functions from the Code Snippets in the project (since I didn't write that code), so you will have to add those yourself.

posted Tuesday, November 06, 2007 11:40 AM by davetrux with 0 Comments

Open XML References

Resources

ECMA-376 Standard Specification

Word 2007 Content Control Toolkit

Open XML Code Snippets for Visual Studio 2005

OpenXmlDeveloper.org

Open XML e-book

XML in Office Developer Portal

SDK for Open XML Formats CTP (online)

MSDN Forum for Open XML SDK

Software

Package Explorer

ExcelPackage

Word 2007 Content Control Toolkit

Open XML Code Snippets for Visual Studio 2005

SDK for Open XML Formats CTP (download)

Microsoft .Net Framework 3.0

Articles

Introducing the Office (2007) Open XML File Formats

Building Server-Side Document Generation Solutions Using the Open XML Object Model (Part 1 of 2)

Building Server-Side Document Generation Solutions Using the Open XML Object Model (Part 2 of 2)

Manipulating Excel 2007 and PowerPoint 2007 Files with the Open XML Object Model (Part 1 of 2)

Manipulating Excel 2007 and PowerPoint 2007 Files with the Open XML Object Model (Part 2 of 2)

Dive Into SpreadsheetML (Part 1 of 2)

Dive Into SpreadsheetML (Part 2 of 2)

Manipulating Word 2007 Files with the Open XML Object Model (Part 1 of 3)

Manipulating Word 2007 Files with the Open XML Object Model (Part 2 of 3)

Manipulating Word 2007 Files with the Open XML Object Model (Part 3 of 3)

Blogs

Wouter van Vugt

Brian Jones

Kevin Boske

posted Friday, October 19, 2007 11:24 AM by davetrux with 0 Comments

Speaking at Day of .Net in Ann Arbor

I will be speaking at Day of .Net in Ann Arbor, MI on October 20th. It's a Saturday, and it's a completely free event. I will be presenting Creating Office Documents with Open XML. I will be going over the Packaging API and how to programmatically create and manipulate Office docs.

There are four concurrent sessions all day long covering many aspects of .Net.

See you there!

 

Day of .Net October 20, 2007 - See You there!

posted Thursday, October 04, 2007 10:03 AM by davetrux with 1 Comments

Day of .Net in Ann Arbor, MI

Registration is now open for Day of .Net in Ann Arbor on October 20th, 2007 at Washtenaw Community College in Ann Arbor, MI. Register here for this completely-free, all day learning event. Start planning now to take advantage of this event!

Day of .Net October 20, 2007 - See You there!

posted Tuesday, September 11, 2007 8:40 AM by davetrux with 0 Comments

Speaking at West Michigan Dot Net User Group - WCF

I will be speaking on July 17th on Windows Communication Foundation. See the West Michigan .NET Users Group site for details.

posted Saturday, July 07, 2007 2:59 PM by davetrux with 0 Comments

Even More WCF Links

Articles by Juval Lowy

Aaron Skonnard's Wiki

WCF Developer Center on MSDN

posted Friday, May 18, 2007 10:54 AM by davetrux with 0 Comments

Speaking at West Michigan Day of .Net

I will be speaking at the West Michigan Day of .Net on May 19th in Grand Rapids, MI at Davenport University.

My talk is Introduction to WCF. WCF is certainly the understated pillar of .Net 3.0. There is a ton of great content all day long, and best of all it's FREE! This is a great opportunity, don't miss out.

WM Day of .Net May 19, 2007 - I'll be there!

posted Tuesday, April 10, 2007 8:57 AM by davetrux with 0 Comments

More Good WCF Links
The Server Side .Net has a good WCF link aggregation.

posted Wednesday, April 04, 2007 8:48 AM by davetrux with 0 Comments

WCF Presentation

I am giving a presentation March 15th at the Greater Lansing User Group .net about Windows Communication Foundation.

If you have worked with web services at all you will really appreciate WCF. Come by and check it out.

posted Wednesday, March 14, 2007 1:45 PM by davetrux with 0 Comments

WCF Link List

Architecture Overview

MSDN Forum

WCF Tools

Contract Versioning

Terminology

Integrating WWF and WCF

WCF Developer Home

WCF Team Blog

posted Wednesday, March 14, 2007 1:43 PM by davetrux with 0 Comments

Vista and Visual Studio 2003 - Not Supported

Now that Vista has gone RTM, lots of us are trying to decide when to upgrade. Here is something I have not seen much mention of:

Visual Studio 2003 is not supported in Vista.

This is going to hold back adoption on my work machine. I'm a consultant, so I have to work on whatever platform the client is using, which unfortunately is sometimes VS 2003. Is VS 2003 going to be relegated to life in a virtual machine from now on?

posted Thursday, November 09, 2006 12:42 PM by davetrux with 0 Comments



