Skip to content
Donner's Daily Dose of Drama
Donner's Daily Dose of Drama
  • The Good
    • Blogging
    • Consumer Protection
    • Environment
    • Ethics
    • Geek’s Home
    • Lisa Lanett
    • Medfield
    • Music
    • Parenting and Technology
    • Travel
    • wow
  • The Bad
    • Business
    • Ebay
    • Investment
    • Job search
    • Personal Finance
    • Politics
  • The Ugly
    • Information Technology
      • Business Intelligence
      • Content Management
      • Free Software
      • I18N and L10N
      • Java
      • Open Source
      • Mobile Devices
      • Open Source Business Intelligence
      • OSBI
      • SDA
      • Security
      • Smartphone
      • Software Best Practices
      • Software Engineering
      • SQL Server
      • Streaming Media
      • Web
    • Austria
    • Fiction
    • Hardware
    • iPod
    • Miscellaneous
    • Uncategorized
    • Video
    • Weekend Warrior
Donner's Daily Dose of Drama

Introduction to Microsoft’s Open Xml Format SDK 2.0 with a focus on Excel documents

Christian Donner, March 24, 2009July 16, 2009

Those looking for ways to render application data and content into Microsoft Office documents used to have a limited number of options. There once was something called Visual Basic for Applications (Microsoft ended support in 2007) or, more recently, Visual Studio Tools for Applications (see Wikipedia). OLE Automation has been around since the early days of Windows and can be used to access the Office application’s object model from the outside world (see Wikipedia). OLE is no longer mainstream technology, and there are conflicts with the managed code programming and deployment model of .Net.

All these options have in common that they are geared towards automating Office applications, rather than manipulating office documents. Consequently, they require that the Office applications are installed, which can be a prohibitive complexity for server-side document processing.

Alternatively, SQL Server Reporting Services has been an easy path to generating native Excel documents without Excel on the server, but Reporting Services is not exactly a lightweight product, and customization and integration  with other applications is a complex task.

This sets the stage for the arrival of the Open Xml SDK, made possible by the Office Open XML format specification in 2006 (see Wikipedia). This article will explain what the Open Xml SDK is and what it can be used for. I will also provide a simple code example for manipulating an Excel spreadsheet document.

The Microsoft Open XML Format SDK (visit the download page) is a very young product. Version 1.0 was released as recently as June 2008. Version 2.0 is available today as a Community Technology Preview release (CTP). The main difference is that 2.0 provides strongly typed classes, which means that your code is not dealing with generic Xml elements, but instead works with things like documents, sheets, and cells (in the case of a spreadsheet document).

In this article, I will not concern myself at all with standardization considerations or the politics of standards. Instead, I am solely interested in the practical applications of the SDK, and how it can shorten the path to achieving loose integration between Microsoft Office and custom .Net applications.

Deployment Models

After installation, the SDK makes a number of project templates available in Visual Studio. These templates can be used to create assemblies that extend Office (Add-In) or that can be attached to Office documents. The latter is an interesting concept. Excel executes managed code that can handle events in the document’s life cycle. A spreadsheet could be configured to execute managed code with pulls data from SQL Server and populates certain cells – all when the document is opened. This code would not be integrated with a stand-alone application, though, and sharing a security context and other state information would not be possible. 

Project types available through the SDK

Once the SDK libraries are installed, they can be referenced by and integrated with any application code. This is a model that works much better for an solutions that needs to generate or manipulate Office documents.

Application Integration

The SDK exposes objects and methods that allow direct access to elements of the document. There are types for Sheets, Rows,  Columns, and Cells. A typical workflow would be:

  • Open an XLSX document (the template)
  • Update XLSX document
  • Save a copy of, or print, the XLSX document (the output document)

Unfortunately, but not surprisingly, the SDK does not provide methods that emulate the behavior of the Office application. For instance, when a formula is changed in an Excel spreadsheet that is being edited in Excel, the spreadsheet application will automatically search for and update references in other cells so that their validity is maintained.

The SDK does not do that. When an application needs to change a formula in the cell of an Excel document, it has explicitely to search the document for references and update them. This can quickly become a complex undertaking, as one might imagine, and it is well worth evaluating what type of manipulations should be allowed to the application in order to keep the amount of code managable.

The most uncritical approach for Excel spreadsheets is to update only values, not formulas. Formulas, if needed, can be in place in the  template, and need to be capable of handling variable amounts of data for the approach to work.

C# Code Example

Let’s update a cell in an existing spreadsheet document. I prefer to make my code reusable, so we will write a few wrapper methods for the SDK and place them in a static class that I call OpenXMLHelper.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Xml;
using System.IO;
namespace OpenXMLWindowsApp
{
    static class OpenXMLHelper
    {
        public static SpreadsheetDocument
                  OpenSpreadSheetForUpdate(string docName)
        {
            return SpreadsheetDocument.Open(docName, true);
        }

        public static void
           CloseSpreadSheet(SpreadsheetDocument spreadSheet)
        {
            spreadSheet.Close();
        }
    }
}

As you can see, we are starting out small. There are two methods that open and close a document.
Next, let’s add a method that we can call to update the value of a certain spreadsheet cell with a string. The method has to accept the following parameters: a Spreadsheet document object, a sheet name, a cell address (specified by a column name and a row index), a string value, and a flag indicating whether the cell node should be created if it does not already exist.

public static void UpdateCell(SpreadsheetDocument spreadSheet,
			      string sheetName,
			      string text,
			      uint rowIndex,
			      string columnName,
			      bool createCell)
{
    WorksheetPart worksheetPart =
          GetWorksheetPartByName(spreadSheet, sheetName);

    if (worksheetPart != null)
    {
	Cell cell = GetCell(worksheetPart.Worksheet, columnName,
                                                rowIndex, createCell);

	if (cell != null)
	{
	    cell.CellValue = new CellValue(text);
	    cell.DataType = new EnumValue(CellValues.Number);
	}

	// Save the worksheet.
	worksheetPart.Worksheet.Save();
    }
    else
    {
	throw new
              Exception("Worksheet not found in spreadsheet document");
    }
}

This method calls another method GetCell that we have to write. We assume here that the value that we passed in is numeric. This could be easily improved to handle different value types, of course. The spreadsheet document is comprised of parts, and worksheets are one type of parts. The method GetWorksheetPartByName is another method that is missing. It will return the WorksheetPart for a given worksheet name (the name is what you see on the little tab at the bottom of the Excel window). Other than that, there is nothing out of the ordinary here. The interesting code will be in the GetCell() method, so let’s look at that next.

public static Cell GetCell(Worksheet worksheet,
			   string columnName,
			   uint rowIndex,
			   bool createCell)
{
    Row row = GetRow(worksheet, rowIndex, createCell);

    Cell cell = null;

    if (row.Elements().Where(c => string.Compare(
       c.CellReference.Value, columnName + rowIndex, true)
              == 0).Count() > 0)
    {
	cell = row.Elements().Where(c => string.Compare(
                c.CellReference.Value, columnName + rowIndex, true)
                    == 0).First();
    }
    else if (createCell)
    {
	// Cells must be in sequential order according to CellReference.
	// Determine where to insert the new cell.
	Cell refCell = null;
	string cellReference = columnName + rowIndex;

	foreach (Cell ecell in row.Elements())
	{
	    if (string.Compare(
                   ecell.CellReference.Value, cellReference, true) > 0)
	    {
		refCell = ecell;
		break;
	    }
	}

	cell = new Cell() { CellReference = cellReference };
	row.InsertBefore(cell, refCell);
	worksheet.Save();
    }

    return cell;
}

This method retrieves the row, queries it for the cell what we are interested in (using a Lambda expression), and if the cell does not exist and the caller wants one to be created, inserts a new cell at the correct place in the row. Now let’s write the method that retrieves a Row object.

public static Row GetRow(Worksheet worksheet,
			 uint rowIndex,
			 bool createCell)
{
    Row row = null;

    if (worksheet.GetFirstChild().
          Elements().Where(
          r => r.RowIndex == rowIndex).Count() > 0)
    {
	row = worksheet.GetFirstChild().
	     Elements().Where(
	       r => r.RowIndex == rowIndex).First();
    }
    else if (createCell)
    {
	row = new Row() { RowIndex = rowIndex };
	worksheet.GetFirstChild().Append(row);
    }

    return row;
}

Since we allow a cell to be created if it does not exist, we have to first also create a new row if one does not exist. The method above handles that for us. The one thing left to do is the method that finds a worksheet by name. Here it is.

public static WorksheetPart GetWorksheetPartByName(
       SpreadsheetDocument document, string sheetName)
{
    IEnumerable sheets =
          document.WorkbookPart.Workbook.GetFirstChild().
          Elements().Where(s => s.Name == sheetName);

    if (sheets.Count() == 0)
    {
	return null;
    }

    string relationshipId = sheets.First().Id.Value;
    WorksheetPart worksheetPart =
        (WorksheetPart)document.WorkbookPart.
        GetPartById(relationshipId);

    return worksheetPart;
}

This method does not handle the situation if a worksheet does not exist. The following method will insert a new worksheet at the end of the worksheet list:

public static WorksheetPart
        InsertWorksheet(WorkbookPart workbookPart)
{
    // Add a new worksheet part to the workbook.
    WorksheetPart newWorksheetPart =
          workbookPart.AddNewPart();
    newWorksheetPart.Worksheet =
         new Worksheet(new SheetData());
    newWorksheetPart.Worksheet.Save();

    Sheets sheets =
         workbookPart.Workbook.GetFirstChild();
    string relationshipId =
         workbookPart.GetIdOfPart(newWorksheetPart);

    // Get a unique ID for the new sheet.
    uint sheetId = 1;
    if (sheets.Elements().Count() > 0)
    {
	sheetId = sheets.Elements().Select(
	           s => s.SheetId.Value).Max() + 1;
    }

    string sheetName = "Sheet" + sheetId;

    // Append the new worksheet
    // and associate it with the workbook.
    Sheet sheet = new Sheet()
     {
         Id = relationshipId,
         SheetId = sheetId,
         Name = sheetName
     };
    sheets.Append(sheet);
    workbookPart.Workbook.Save();

    return newWorksheetPart;
}

We now have a working utility class that let’s us update cells in a spreadsheet document, there is one more method that I would like to add. It has to do with a the fact that the SDK does not update references in formulas, as I already mentioned. It also does not recalculate formula results. So, if you update a cell and you have a formula that uses this cell to calculate something somewhere else in the spreadsheet, this value will not reflect the chance, when the spreadsheet is loaded into into Excel. Getting Excel to reevaluate formulas is not straight-forward, so we will work around that. By deleting all calculated cells in the spreadsheet, we can force Excel to reevaluate all of them when the document is opened. Here is a little method that does exactly that.

// remove all values of cells with formulas on a sheet
// so that Excel refreshes the them upon Open
public static void ClearAllValuesInSheet
      (SpreadsheetDocument spreadSheet, string sheetName)
{
    WorksheetPart worksheetPart =
        GetWorksheetPartByName(spreadSheet, sheetName);

    foreach (Row row in
       worksheetPart.Worksheet.
          GetFirstChild().Elements())
    {
	foreach (Cell cell in row.Elements())
	{
	    if (cell.CellFormula != null &&
	          cell.CellValue != null)
	    {
		cell.CellValue.Remove();
	    }
	}

    }

    worksheetPart.Worksheet.Save();
}

This concludes the Open XML Format SDK primer.

Get the complete source code for my OpenXMLHelper.cs class here.

Related Posts:

  • OpenVPN
  • Enphase Envoy Local Access
  • My USPS Certified Mail Experience Explained
  • The Voip.ms SMS Integration for Home Assistant
  • SUTAB Scam?

Software Engineering Excel SDKOpen XMLOpen XML Format SDKXML Format

Post navigation

Previous post
Next post

Comments (27)

  1. Jona Kee says:
    April 10, 2009 at 10:37 am

    Hi Christian,

    Before the Microsoft Open XML SDK existed I’ve always found the cleanest (but not the simplest) solution to generating Office Documents from the server was with XSLT.

    At previous clients, we were able to develop transforms that created SpreadsheetML. Opened in Excel, the end result was multi-tabbed workbooks with formatting, formulas and bound charts. This is a nice solution that doesn’t require compiled code changes or redeployments when a report needs to get changed; simply update the XSLT.

    If you have experience with XSLT this can be a really solid solution. If you dont, you can always save the existing XLS file as XML and use that as a guide.

    Simple example: http://msdn.microsoft.com/en-us/library/bb226693(office.11).aspx

  2. Christian Donner says:
    April 10, 2009 at 12:01 pm

    Hi Jona,
    I don’t see the strength of the Open XML SDK in generating spreadsheets from scratch. XSLT is probably the better choice for that. But the SDK is great for manipulating existing documents.

  3. Tej says:
    May 7, 2009 at 1:35 pm

    Do you have working code for the snippets you have above? I am getting errors when I try to compile your code into a library class.

  4. Christian Donner says:
    May 7, 2009 at 2:21 pm

    I sent you an email with the source for a C# class. This is still prototype code. Let me know if you need examples for how to use it.

  5. Burak says:
    June 17, 2009 at 2:55 pm

    Hello,

    I am trying to create an excel file from scratch using openxml.

    Dim sheetData As SheetData = worksheetPart.Worksheet.GetFirstChild(Of SheetData)()

    blows up, because workSheetPart.WorkSheet is null.

    How can I add a WorkSheet to WorkSheetpart and add SheetData to Worksheet?

    Thank you,

    Burak

    Using myWorkBook As SpreadsheetDocument = SpreadsheetDocument.Create(FilePath, SpreadsheetDocumentType.Workbook)

    myWorkBook.AddWorkbookPart()

    myWorkBook.WorkbookPart.AddNewPart(Of WorksheetPart)(“TestSheet1”)

    Dim worksheetPart As WorksheetPart = myWorkBook.WorkbookPart.WorksheetParts.First

    Dim sheetData As SheetData = worksheetPart.Worksheet.GetFirstChild(Of SheetData)()

    Dim contentRow As New Row

    Dim firstCell As New Cell
    firstCell.DataType = CellValues.InlineString
    Dim cellStr As New InlineString
    Dim t As Text = New Text
    t.Text = “Hello world”
    cellStr.AppendChild(t)
    firstCell.AppendChild(cellStr)

    contentRow.AppendChild(firstCell)

    sheetData.AppendChild(contentRow)

    End Using

  6. Christian Donner says:
    June 17, 2009 at 3:29 pm

    Burak, check out these links:

    http://msdn.microsoft.com/en-us/library/cc881781(office.14).aspx

    http://openxmldeveloper.org/articles/Creating_Spreadsheets_Server.aspx

  7. Burak says:
    June 17, 2009 at 4:23 pm

    Hello Christian,

    I used ExcelPackage but it has shortcomings and it’s not being added to more (which is shame) , so that’s why I started delving into OpenXml.

    I looked at the first site and modified my code as follows

    Dim spreadSheet As SpreadsheetDocument = SpreadsheetDocument.Create(FilePath, SpreadsheetDocumentType.Workbook)

    Using (spreadSheet)

    spreadSheet.AddWorkbookPart()

    Dim newWorksheetPart As WorksheetPart = spreadSheet.WorkbookPart.AddNewPart(Of WorksheetPart)()
    newWorksheetPart.Worksheet = New Worksheet(New SheetData())
    newWorksheetPart.Worksheet.Save()

    Dim sheets As Sheets = spreadSheet.WorkbookPart.Workbook.GetFirstChild(Of Sheets)()

    it blows up this time with
    “object reference not set to an instance of an object”

    I guess it’s complaining about sheets.

    How do I create and add sheets?

    Thanks,

    Burak

  8. Burak says:
    June 17, 2009 at 5:09 pm

    I also noticed when we call spreadSheet.AddWorkbookPart()

    the workbook inside workbookpart shows up as being set to nothing and i think that’s what’s causing

    spreadSheet.WorkbookPart.Workbook.GetFirstChild(Of Sheets)()

    to blow up

    i looked inside workbookpart object with reflector but it’s constructor is not creating a workbook

    i am confused

  9. Christian Donner says:
    June 17, 2009 at 8:21 pm

    I don’t have time to look into your code, but it appears that you are not creating a workbook. The hierarchy is document-workbook-workbookpart.

  10. Jim McGregor says:
    June 22, 2009 at 5:05 pm

    Christian,

    I’m trying to iterate through the cells of a simple spreadsheet to get some values. Your examples look very useful, but I am running into errors when I try to use your code snippets. A working example would be very helpful.

    Regards — Jim

  11. Christian Donner says:
    June 22, 2009 at 9:25 pm

    Jim, I will email you a .cs class file that compiles, and I will add a link to this post for others.

  12. Laurent says:
    September 9, 2009 at 8:00 am

    Dear Christian,

    Your article is great. I have used it as a source of inspiration for my own code. The ClearAllValuesInSheet has been very helpful for me. I am now trying to do the same with arrays. Using the same code on the example below, I get rid of the “v” tag for C16. Once I have updated workbook.xml with a new array {9,6} under the definedname ITICTRD for instance, I get 9 in C16, which is great but I still get 0 in D16. If I remove the “v” tag from D16 and zip, D16 remains empty.

    -ITICTRD

    0

    Have you ever been through the array issue? If yes, it would be great if you could share your key findings.

    Thanks.

    Laurent

  13. Laurent says:
    September 9, 2009 at 8:01 am

    This is the example from sheet1.xml
    It was not taken into account into my previous post because of tags

    “”
    “-ITICTRD”
    “”
    “”
    “0”
    “”

  14. Christian Donner says:
    September 9, 2009 at 8:02 am

    Sorry, Laurent, I have not done anything with the SDK lately and can’t help you there.

  15. KING says:
    October 10, 2009 at 1:57 am

    HI?
    How to delete worksheet?

  16. Roozy says:
    November 19, 2009 at 5:14 am

    Hi Christian,

    I am trying to use open xml format to recognize the repeating pattern in a set of powerpoint presentations so that I can later use the repeating parts to make a template like presentation and spare the users time instead of starting from scratch. Do you think open xml is a good idea for this purpose?

  17. Christian Donner says:
    November 19, 2009 at 6:55 pm

    Roozy, I don’t know the Powerpoint API of the Open XML SDK, but if what you want to do can be done, I am sure it can be done with Open XML.

  18. Ryan says:
    July 13, 2010 at 2:02 pm

    Hi Christian,
    UpdateCell doesn’t seem to work for me once I get passed ‘Z’ for the column name. For example ‘AA’ doesn’t seem to work correctly. Is this a known problem?

    Thanks,

  19. Ryan says:
    July 13, 2010 at 3:57 pm

    Nevermind, I found the problem. This line of code is the culprit in GetCell():

    if (string.Compare(ecell.CellReference.Value, cellReference, true) > 0)

    This will return true when comparing B1 to AA1.

  20. Florian says:
    June 27, 2011 at 7:04 am

    Hi,

    i’m using the OpenXML SDK 2.0 and i have a little problem with the columns. However i come to the column AA or AB the first columns from B to Z are empty.
    Excel give me an error message that there is something unreadable in the sheet. But A , AA and AB isn’t empty.

    Thanks

  21. Florian says:
    June 27, 2011 at 7:29 am

    Oh sorry,

    i have found the error,
    Ryan had the same problem.

    Thanks

  22. Florian says:
    July 4, 2011 at 2:52 am

    Hi Christian,
    i have another question, I’m looking to a way to export some summations. I only have found the example in the SDK, but there is no way to export the formula. Is there another way to export it or is there another DataType for the formula?

    Thanks

  23. jared says:
    October 27, 2011 at 10:16 am

    Thanks. Florian and Ryan, I am having the same issue. How did you solve it? I found the line, but don’t know the work around.

  24. jared says:
    October 27, 2011 at 10:34 am

    Hi guys, I was able to figure it out. I just compared the length of each cell name and only do the second comparison if the lengths are equal. If one has a length of 3 (AA1), and the other has a length of 2 (B1) skip the if statement.

    if (string.Compare(ecell.CellReference.Value, cellReference, true) > 0)

    becomes:

    if ((ecell.CellReference.Value.Length == cellReference.Length) && (string.Compare(ecell.CellReference.Value, cellReference, true) > 0))

  25. Pingback: OpenXML SDK: Make Excel recalculate formula | SeekPHP.com
  26. Pingback: Splitting Excel File Using VB | Yonts Answers
  27. Raul Chico says:
    September 23, 2015 at 4:58 pm

    Thanks for the code!!!!!

Leave a Reply

Your email address will not be published. Required fields are marked *

Pages

  • About
  • Awards
    • TechnoLawyer
  • Contact Christian Donner
  • Project Portfolio
  • Publications
  • Speaking Engagements

Recent Comments

  • Christian Donner on Sealing a leaky cast-iron fireplace chimney damper
  • Eric on Sealing a leaky cast-iron fireplace chimney damper
  • Christian Donner on Contact Christian Donner
  • Max on Contact Christian Donner
  • Christian Donner on Contact Christian Donner

Tags

AHCI Amazon Android ASP.Net AT&T Droid Drupal email Error failure featured firmware Garmin Godaddy Google honda Internet Explorer 8 iPhone Lenovo Lisa Lanett Modules NAS Nexus One Paypal Performance Privacy QNAP raid RS-407 sauna Security spam SQL SR3600 Synology T-Mobile T430s transmission tylö Verizon Virus VMWare Windows 7 windows 8.1 Windows Mobile
  • About
  • Awards
    • TechnoLawyer
  • Contact Christian Donner
  • Project Portfolio
  • Publications
  • Speaking Engagements
©2025 Donner's Daily Dose of Drama | WordPress Theme by SuperbThemes