Extract Formatted Text From Excel Cell With C# (Rich Text Format)

by filip 16. September 2009 17:17

I was writing an application that needed to convert text in a cell in an Excel workbook to HTML. It is fairly trivial to get formatting for the entire cell, but each individual character in the cell could have different formatting itself, so I needed something more specific than cell-level formatting info.

At first, I started using the Excel.Range.get_Characters( pos, len ) method to get info out of the cell.  The code would loop through all characters, get them one by one, and check the formatting.  For example:

   1: Microsoft.Office.Interop.Excel.Range Range = (Microsoft.Office.Interop.Excel.Range)Cell;
   2: int TextLength = Range.Text.ToString().Length;
   3: for (int CharCount = 1; CharCount <= TextLength; CharCount++)
   4: {
   5:     Microsoft.Office.Interop.Excel.Characters charToTest = Range.get_Characters(CharCount, 1);
   6:     bool IsBold = (bool)charToTest.Font.Bold;
   7:     bool IsItalic = (bool)charToTest.Font.Italic;
   8:     // other formatting tests here
   9: }

However, that method proved to be incredibly slow for cells that have more than just a few characters.  For cells that have 1000+ characters, it would take several minutes to run the test across all characters. I kept playing around with different ways to speed up the whole process, but it just became apparent that making the call to Excel to get all of this information was not going to be acceptable.

Finally, I think I’ve found the solution. It is possible to copy the text from a cell to the clipboard, and then use the Clipboard class to retrieve the formatted text, and parse it with C#. I ended up using the System.Windows.DataFormats.Rtf format to extract the data from the clipboard in the following way:

string rtf = (string)System.Windows.Clipboard.GetData(System.Windows.DataFormats.Rtf);

Then, I create a System.Windows.Forms.RichTextBox, and use that to parse the data. The following is a sample of the solution, and it is reasonably quick.

   1: Microsoft.Office.Interop.Excel.Range Range = (Microsoft.Office.Interop.Excel.Range)Cell;
   2: Range.Copy(System.Reflection.Missing.Value);
   3:             
   4: string rtf = (string)System.Windows.Clipboard.GetData(System.Windows.DataFormats.Rtf);
   5: System.Windows.Forms.RichTextBox rtb = new System.Windows.Forms.RichTextBox();
   6: rtb.Rtf = rtf;
   7:             
   8: int CharCount = rtb.Text.Length;
   9:  
  10: for (int CharNum = 0; CharNum < CharCount; CharNum++)
  11: {
  12:    rtb.Select(CharNum, 1);
  13:    System.Drawing.Font Font = rtb.SelectionFont;
  14:    bool IsCharBold = Font.Bold;
  15:    bool IsCharUnderline = Font.Underline;
  16:    bool IsCharItalic = Font.Italic;
  17:  
  18:    // other code here
  19: }

Tags: , , ,

Windows Development

About Filip Stanek

Death Note Pic I'm a developer at ACG in Cincinnati, OH. I like ASP.NET, Flash, and other web technologies, & enjoy playing chess, video games, etc.

Currently playing:
- StarCraft IIE-mail me Send mail

Disqus

Month List