zoukankan      html  css  js  c++  java
  • 使用DocumentFormat.OpenXml操作Excel文件.xlsx

    1.开始

    DocumentFormat.OpenXml是ms官方给一个操作office三大件新版文件格式(.xlsx,.docx,.pptx)的组件;特色是它定义了OpenXml所包含的所有对象(たぶん),能做到精确微调文件内容格式;因此它没有EppPlus那么容易上手,性能也很看使用者的水平。。

    DocumentFormat.OpenXml的语法很接近直接操作xml,所以使用它来操作Excel,得先熟悉Excel的xml文档结构:

    ↑已经忘记从哪里找来的了; WorkbookPart包含4个重要子节点:

    • WorkSheetPart:表格数据内容就在这里面,结构最复杂的部分,Workheet的子节点除了Colmns、SheetData还有合并单元格集合MergeCells(图中缺失);
    • WorkSheet:存放表单id及命名(sheet1, Sheet2...),这里有excel的坑,如果包含多个Sheet直接Sheets.First()有可能获取到最后一张Sheet,最好根据Name来搜索;
    • WorkbootStylePart:存放样式;
    • SharedStringTablePart(上图中缺失):共享字符串集合,字符串默认会存在里面,然后Cell引用其数组下标,这也是类似保存1w行"一二三亖"的.xlsx比.txt小的原因
    了解了这些想用DocumentFormat.OpenXml从零开始new一个样式还看得过去的Excel文档依然很麻烦(相信用EppPlus、NPOI也一样),而且OpenXml的同级节点还强调先后顺序,错了用excel打开就会报错,相信没人想这么做;正确的姿势应该是加载模板写入数据另存为,模板中把要用到的样式都定义好,然后用代码拷贝对应单元格的样式。
     
    先定义一些扩展方法,不然用起来会很累:
      1 using System;
      2 using System.Collections.Generic;
      3 using System.Linq;
      4 using System.Xml;
      5 using DocumentFormat.OpenXml;
      6 using DocumentFormat.OpenXml.Packaging;
      7 using DocumentFormat.OpenXml.Spreadsheet;
      8 
      9 namespace EOpenXml
     10 {
     11     public static class OpenXmlExcelExtentions
     12     {
     13         public static Sheet GetSheet(this WorkbookPart workbookPart, string sheetName)
     14         {
     15             return workbookPart.Workbook
     16                 .GetFirstChild<Sheets>()
     17                 .Elements<Sheet>().Where(s => s.Name == sheetName).FirstOrDefault();
     18         }
     19 
     20         /// <summary>
     21         /// Given a worksheet and a row index, return the row.
     22         /// </summary>
     23         /// <param name="sheetData"></param>
     24         /// <param name="rowIndex"></param>
     25         /// <returns></returns>
     26         public static Row GetRow(this SheetData sheetData, uint rowIndex)
     27         {
     28             return sheetData.
     29                   Elements<Row>().Where(r => r.RowIndex == rowIndex).FirstOrDefault();
     30         }
     31         public static Cell GetCell(this SheetData sheetData, string columnName, uint rowIndex)
     32         {
     33             Row row = GetRow(sheetData, rowIndex);
     34 
     35             if (row == null)
     36                 return null;
     37 
     38             return row.Elements<Cell>().Where(c => string.Compare
     39                       (c.CellReference.Value, columnName +
     40                       rowIndex, true) == 0).FirstOrDefault();
     41         }
     42 
     43         // https://msdn.microsoft.com/en-us/library/office/cc861607.aspx
     44         // Given a column name, a row index, and a WorksheetPart, inserts a cell into the worksheet. 
     45         // If the cell already exists, returns it. 
     46         public static Cell GetOrCreateCell(this SheetData sheetData, string columnName, uint rowIndex)
     47         {
     48             string cellReference = columnName + rowIndex;
     49 
     50             // If the worksheet does not contain a row with the specified row index, insert one.
     51             Row row;
     52             if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0)
     53             {
     54                 row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
     55             }
     56             else
     57             {
     58                 row = new Row() { RowIndex = rowIndex };
     59                 sheetData.Append(row);
     60             }
     61 
     62             return row.GetOrCreateCell(cellReference);
     63         }
     64         public static Cell GetOrCreateCell(this Row row, string cellReference)
     65         {
     66             // If there is not a cell with the specified column name, insert one.  
     67             if (row.Elements<Cell>().Where(c => c?.CellReference?.Value == cellReference).Count() > 0)
     68             {
     69                 return row.Elements<Cell>().Where(c => c.CellReference.Value == cellReference).First();
     70             }
     71             else
     72             {
     73                 // Cells must be in sequential order according to CellReference. Determine where to insert the new cell.
     74                 Cell refCell = null;
     75                 foreach (Cell cell in row.Elements<Cell>())
     76                 {
     77                     if (cell.CellReference.Value.Length == cellReference.Length)
     78                     {
     79                         if (string.Compare(cell.CellReference.Value, cellReference, true) > 0)
     80                         {
     81                             refCell = cell;
     82                             break;
     83                         }
     84                     }
     85                 }
     86 
     87                 Cell newCell = new Cell() { CellReference = cellReference };
     88                 row.InsertBefore(newCell, refCell);
     89                 return newCell;
     90             }
     91         }
     92 
     93         public static string GetValue(this Cell cell, SharedStringTablePart shareStringPart)
     94         {
     95             if (cell == null)
     96                 return null;
     97             string cellvalue = cell.InnerText;
     98             if (cell.DataType != null)
     99             {
    100                 if (cell.DataType == CellValues.SharedString)
    101                 {
    102                     int id = -1;
    103                     if (Int32.TryParse(cellvalue, out id))
    104                     {
    105                         SharedStringItem item = GetItem(shareStringPart, id);
    106                         if (item.Text != null)
    107                         {
    108                             //code to take the string value  
    109                             cellvalue = item.Text.Text;
    110                         }
    111                         else if (item.InnerText != null)
    112                         {
    113                             cellvalue = item.InnerText;
    114                         }
    115                         else if (item.InnerXml != null)
    116                         {
    117                             cellvalue = item.InnerXml;
    118                         }
    119                     }
    120                 }
    121             }
    122             return cellvalue;
    123         }
    124         public static string GetValue(this Cell cell, string[] shareStringPartValues)
    125         {
    126             if (cell == null)
    127                 return null;
    128             string cellvalue = cell.InnerText;
    129             if (cell.DataType != null)
    130             {
    131                 if (cell.DataType == CellValues.SharedString)
    132                 {
    133                     int id = -1;
    134                     if (Int32.TryParse(cellvalue, out id))
    135                     {
    136                         cellvalue = shareStringPartValues[id];
    137                     }
    138                 }
    139             }
    140             return cellvalue;
    141         }
    142 
    143         public static Cell SetValue(this Cell cell, object value = null, SharedStringTablePart shareStringPart = null, int shareStringItemIndex = -1, uint styleIndex = 0)
    144         {
    145             if (value == null)
    146             {
    147                 cell.CellValue = new CellValue();
    148                 if (shareStringItemIndex != -1)
    149                 {
    150                     cell.CellValue = new CellValue(shareStringItemIndex.ToString());
    151                     cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
    152                 }
    153             }
    154             else if (value is string str)
    155             {
    156                 if (shareStringPart == null)
    157                 {
    158                     cell.CellValue = new CellValue(str);
    159                     cell.DataType = new EnumValue<CellValues>(CellValues.String);
    160                 }
    161                 else
    162                 {
    163                     // Insert the text into the SharedStringTablePart.
    164                     int index = shareStringPart.GetOrInsertItem(str, false);
    165                     // Set the value of cell
    166                     cell.CellValue = new CellValue(index.ToString());
    167                     cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
    168                 }
    169             }
    170             else if (value is int || value is short || value is long ||
    171               value is float || value is double || value is uint ||
    172               value is ulong || value is ushort || value is decimal)
    173             {
    174                 cell.CellValue = new CellValue(value.ToString());
    175                 cell.DataType = new EnumValue<CellValues>(CellValues.Number);
    176             }
    177             else if (value is DateTime date)
    178             {
    179                 cell.CellValue = new CellValue(date.ToString("yyyy-MM-dd")); // ISO 861
    180                 cell.DataType = new EnumValue<CellValues>(CellValues.Date);
    181             }
    182             else if (value is XmlDocument xd)
    183             {
    184                 if (shareStringPart == null)
    185                 {
    186                     throw new Exception("Param [shareStringPart] can't be null when value type is XmlDocument.");
    187                 }
    188                 else
    189                 {
    190                     int index = shareStringPart.GetOrInsertItem(xd.OuterXml, true);
    191                     // Set the value of cell
    192                     cell.CellValue = new CellValue(index.ToString());
    193                     cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
    194                 }
    195             }
    196 
    197             if (styleIndex != 0)
    198                 cell.StyleIndex = styleIndex;
    199 
    200             return cell;
    201         }
    202 
    203         // https://msdn.microsoft.com/en-us/library/office/gg278314.aspx
    204         // Given text and a SharedStringTablePart, creates a SharedStringItem with the specified text 
    205         // and inserts it into the SharedStringTablePart. If the item already exists, returns its index.
    206         public static int GetOrInsertItem(this SharedStringTablePart shareStringPart, string content, bool isXml)
    207         {
    208             // If the part does not contain a SharedStringTable, create one.
    209             if (shareStringPart.SharedStringTable == null)
    210             {
    211                 shareStringPart.SharedStringTable = new SharedStringTable();
    212             }
    213 
    214             int i = 0;
    215 
    216             // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
    217             foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
    218             {
    219                 if ((!isXml && item.InnerText == content) || (isXml && item.OuterXml == content))
    220                 {
    221                     return i;
    222                 }
    223 
    224                 i++;
    225             }
    226 
    227             // The text does not exist in the part. Create the SharedStringItem and return its index.
    228             if (isXml)
    229                 shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(content));
    230             else
    231                 shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(content)));
    232             shareStringPart.SharedStringTable.Save();
    233 
    234             return i;
    235         }
    236         private static SharedStringItem GetItem(this SharedStringTablePart shareStringPart, int id)
    237         {
    238             return shareStringPart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id);
    239         }
    240         
    241         /// <summary>
    242         ///  https://docs.microsoft.com/en-us/office/open-xml/how-to-merge-two-adjacent-cells-in-a-spreadsheet
    243         /// </summary>
    244         /// <param name="worksheet"></param>
    245         /// <returns></returns>
    246         public static MergeCells GetOrCreateMergeCells(this Worksheet worksheet)
    247         {
    248             MergeCells mergeCells;
    249             if (worksheet.Elements<MergeCells>().Count() > 0)
    250             {
    251                 mergeCells = worksheet.Elements<MergeCells>().First();
    252             }
    253             else
    254             {
    255                 mergeCells = new MergeCells();
    256 
    257                 // Insert a MergeCells object into the specified position.
    258                 if (worksheet.Elements<CustomSheetView>().Count() > 0)
    259                 {
    260                     worksheet.InsertAfter(mergeCells, worksheet.Elements<CustomSheetView>().First());
    261                 }
    262                 else if (worksheet.Elements<DataConsolidate>().Count() > 0)
    263                 {
    264                     worksheet.InsertAfter(mergeCells, worksheet.Elements<DataConsolidate>().First());
    265                 }
    266                 else if (worksheet.Elements<SortState>().Count() > 0)
    267                 {
    268                     worksheet.InsertAfter(mergeCells, worksheet.Elements<SortState>().First());
    269                 }
    270                 else if (worksheet.Elements<AutoFilter>().Count() > 0)
    271                 {
    272                     worksheet.InsertAfter(mergeCells, worksheet.Elements<AutoFilter>().First());
    273                 }
    274                 else if (worksheet.Elements<Scenarios>().Count() > 0)
    275                 {
    276                     worksheet.InsertAfter(mergeCells, worksheet.Elements<Scenarios>().First());
    277                 }
    278                 else if (worksheet.Elements<ProtectedRanges>().Count() > 0)
    279                 {
    280                     worksheet.InsertAfter(mergeCells, worksheet.Elements<ProtectedRanges>().First());
    281                 }
    282                 else if (worksheet.Elements<SheetProtection>().Count() > 0)
    283                 {
    284                     worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetProtection>().First());
    285                 }
    286                 else if (worksheet.Elements<SheetCalculationProperties>().Count() > 0)
    287                 {
    288                     worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetCalculationProperties>().First());
    289                 }
    290                 else
    291                 {
    292                     worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetData>().First());
    293                 }
    294                 worksheet.Save();
    295             }
    296             return mergeCells;
    297         }
    298 
    299         /// <summary>
    300         ///  Given the names of two adjacent cells, merges the two cells.
    301         ///  Create the merged cell and append it to the MergeCells collection.
    302         ///  When two cells are merged, only the content from one cell is preserved:
    303         ///  the upper-left cell for left-to-right languages or the upper-right cell for right-to-left languages.
    304         /// </summary>
    305         /// <param name="mergeCells"></param>
    306         /// <param name="cell1Name"></param>
    307         /// <param name="cell2Name"></param>
    308         public static void MergeTwoCells(this MergeCells mergeCells, string cell1Name, string cell2Name)
    309         {
    310             MergeCell mergeCell = new MergeCell() { Reference = new StringValue(cell1Name + ":" + cell2Name) };
    311             mergeCells.Append(mergeCell);
    312         }
    313 
    314         public static IEnumerable<string> GetItemValues(this SharedStringTablePart shareStringPart)
    315         {
    316             foreach (var item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
    317             {
    318                 if (item.Text != null)
    319                 {
    320                     //code to take the string value  
    321                     yield return item.Text.Text;
    322                 }
    323                 else if (item.InnerText != null)
    324                 {
    325                     yield return item.InnerText;
    326                 }
    327                 else if (item.InnerXml != null)
    328                 {
    329                     yield return item.InnerXml;
    330                 }
    331                 else
    332                 {
    333                     yield return null;
    334                 }
    335             };
    336         }
    337         public static XmlDocument GetCellAssociatedSharedStringItemXmlDocument(this SheetData sheetData, string columnName, uint rowIndex, SharedStringTablePart shareStringPart)
    338         {
    339             Cell cell = GetCell(sheetData, columnName, rowIndex);
    340             if (cell == null)
    341                 return null;
    342             if (cell.DataType == CellValues.SharedString)
    343             {
    344                 int id = -1;
    345                 if (Int32.TryParse(cell.InnerText, out id))
    346                 {
    347                     SharedStringItem ssi = shareStringPart.GetItem(id);
    348                     var doc = new XmlDocument();
    349                     doc.LoadXml(ssi.OuterXml);
    350                     return doc;
    351                 }
    352             }
    353             return null;
    354         }
    355     }
    356 }
    View Code

    2.插入数据

     1         private static void GenerateExcel()
     2         {
     3             using (MemoryStream mem = new MemoryStream())
     4             {
     5                 using (var temp = File.OpenRead(@"E:	emplate.xlsx"))
     6                 {
     7                     temp.CopyTo(mem);
     8                 }
     9 
    10                 using (SpreadsheetDocument doc = SpreadsheetDocument.Open(mem, true))
    11                 {
    12                     WorkbookPart wbPart = doc.WorkbookPart;
    13                     Worksheet worksheet = wbPart.WorksheetParts.First().Worksheet;
    14                     //statement to get the sheetdata which contains the rows and cell in table  
    15                     SheetData sheetData = worksheet.GetFirstChild<SheetData>();
    16 
    17                     SharedStringTablePart shareStringPart;
    18                     if (wbPart.GetPartsOfType<SharedStringTablePart>().Any())
    19                         shareStringPart = wbPart.GetPartsOfType<SharedStringTablePart>().First();
    20                     else
    21                         shareStringPart = wbPart.AddNewPart<SharedStringTablePart>();
    22                     //假设模板第一行是Title不用动,把要用到的样式都定义在了第二行的单元格里
    23                     var secondRow = sheetData.GetRow(2);
    24                     uint[] lineStyles = secondRow.Elements<Cell>().Select(c => c.StyleIndex.Value).ToArray();
    25                     sheetData.RemoveChild(secondRow);
    26                     //从第二行开始循环插入4列1000数据
    27                     uint currentRowIndex = 2;
    28                     for (int i = 0;i<1000;i++)
    29                     {
    30                         Row row = new Row();
    31                         row.RowIndex = currentRowIndex;//设置行号
    32                         row.AppendChild(new Cell().SetValue(1, shareStringPart, styleIndex: lineStyles[0]));
    33                         row.AppendChild(new Cell().SetValue(DateTime.Now, shareStringPart, styleIndex: lineStyles[1]));
    34                         row.AppendChild(new Cell().SetValue(3.1415926535, shareStringPart, styleIndex: lineStyles[2]));
    35                         row.AppendChild(new Cell().SetValue("通商宽衣", shareStringPart, styleIndex: lineStyles[3]));//这里慢
    36                         sheetData.AppendChild(row);
    37                         currentRowIndex++;
    38                     }
    39                     wbPart.Workbook.Save();
    40                     doc.SaveAs($@"E:Temp_{DateTime.Now.ToString("yyMMddHHmm")}.xlsx");
    41                     doc.Close();
    42                 }
    43                 mem.Close();
    44             }
    45         }

    以上就生成了一个Excel打开不会报任何格式错误提示的标准.xlsx文件;但有需要优化的地方:在每次插入字符串的时候会去循环共享字符集,调用shareStringPart.GetItemValues().ToArray()可以将集合全部存到数组或Dictionary<string,int>里面会快很多,如果清楚集合内容就不用去判断重复了,当然也可以简单粗暴的保存为CellValues.InlineString,这样在重复字符串比较多的时候两种方式所生成的文件大小会有很大差异

     3.快速遍历

     1         public static void Read()
     2         {
     3             using (var sd = SpreadsheetDocument.Open(@"E:	emp.xlsx", false))
     4             {
     5                 WorkbookPart wbPart = sd.WorkbookPart;
     6                 SharedStringTablePart shareStringPart;
     7                 if (wbPart.GetPartsOfType<SharedStringTablePart>().Count() > 0)
     8                     shareStringPart = wbPart.GetPartsOfType<SharedStringTablePart>().First();
     9                 else
    10                     shareStringPart = wbPart.AddNewPart<SharedStringTablePart>();
    11                 string[] shareStringItemValues = shareStringPart.GetItemValues().ToArray();
    12 
    13                 WorksheetPart worksheetPart = wbPart.WorksheetParts.First();
    14                 uint dataRowStart = 2;//数据开始行
    15                 OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
    16                 while (reader.Read())
    17                 {
    18                     if (reader.ElementType == typeof(Worksheet))
    19                     {
    20                         reader.ReadFirstChild();
    21                     }
    22 
    23                     if (reader.ElementType == typeof(Row))
    24                     {
    25                         Row r = (Row)reader.LoadCurrentElement();
    26                         if (r.RowIndex < dataRowStart)
    27                             continue;
    28                         foreach (Cell c in r.Elements<Cell>())
    29                         {
    30                             if (c.CellReference != null && c.CellReference.HasValue)
    31                             {
    32                                 string cv = c.GetValue(shareStringItemValues);
    33                                 Console.WriteLine(cv);
    34                                 if (c.CellReference.Value == "B" + r.RowIndex)
    35                                     Console.WriteLine("刚读取的是B列");
    36                             }
    37                         }
    38                     }
    39                 }
    40                 sd.Close();
    41             }
    42         }

    4.总结

    DocumentFormat.OpenXml不友好但操作透明,使用前最好先自行封装下,习惯之后相信能用的很爽。

    以上です。

  • 相关阅读:
    HTML
    短信发送平台-阿里大于
    java基础练习题
    2019年让程序员崩溃的 60 个瞬间,笑死我了
    JDBC连接时出现的问题总结
    Java 学习笔记 IO流与File操作
    Java 学习笔记 两大集合框架Map和Collection
    我的github博客地址
    重新认识mapreduce
    java打字游戏
  • 原文地址:https://www.cnblogs.com/CameronWu/p/10658198.html
Copyright © 2011-2022 走看看