zoukankan      html  css  js  c++  java
  • Excel 2007 底层实现方式

    一、EXCEL的底层实现

    能力有限,了解的比较浅,有不足之处望指正,首先看下图:

    一、 excel2007是使用xml格式来存储的,把一个excel文件后缀改为.zip,打开之后就直接可以看到一个excel文件对应的xml格式的文件了。

    1.1、docProps目录

    1.1.1、 core.xml文件

    <?xml version="1.0" encoding="utf-8"?>
    <cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" 
      xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <!-- 创建人 -->
      <dc:creator>Administrator</dc:creator>
      <!-- 最后修改人 -->
      <cp:lastModifiedBy>Administrator</cp:lastModifiedBy>
      <!-- 创建时间 -->
      <dcterms:created xsi:type="dcterms:W3CDTF">2008-09-11T17:22:52Z</dcterms:created>
      <!-- 修改时间 -->
      <dcterms:modified xsi:type="dcterms:W3CDTF">2017-03-31T15:29:39Z</dcterms:modified>
    </cp:coreProperties>
    View Code

    1.1.2、 app.xml文件

    <?xml version="1.0" encoding="utf-8"?>
    <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" 
      xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
        <!-- 文档类型 -->
        <Application>Microsoft Excel</Application>
        <!-- 是否加密 -->
        <DocSecurity>0</DocSecurity>
        <ScaleCrop>false</ScaleCrop>
        <HeadingPairs>
            <vt:vector size="2" baseType="variant">
                <vt:variant>
                    <vt:lpstr>工作表</vt:lpstr>
                </vt:variant>
                <vt:variant>
                    <vt:i4>3</vt:i4>
                </vt:variant>
            </vt:vector>
        </HeadingPairs>
        <TitlesOfParts>
            <vt:vector size="3" baseType="lpstr">
                <vt:lpstr>Sheet1</vt:lpstr>
                <vt:lpstr>Sheet2</vt:lpstr>
                <vt:lpstr>Sheet3</vt:lpstr>
            </vt:vector>
        </TitlesOfParts>
        <Company/>
        <LinksUpToDate>false</LinksUpToDate>
        <!-- 是否共享 -->
        <SharedDoc>false</SharedDoc>
        <HyperlinksChanged>false</HyperlinksChanged>
        <!-- 版本 -->
        <AppVersion>12.0000</AppVersion>
    </Properties>
    View Code

     1.2、 xl目录,该目录保存的就是excel中的详细内容了

    1.2.1、workbook.xml文件

      文件包含一对<sheets>标签,其中的每个<sheet>元素都代表Excel 2007文件中的一个sheet,工作表的名称就是其name属性的值,这里有三个sheet

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
        xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
        <fileVersion appName="xl" lastEdited="4" lowestEdited="4" rupBuild="4506"/>
        <workbookPr defaultThemeVersion="124226"/>
        <bookViews>
            <workbookView xWindow="0" yWindow="105" windowWidth="12765" windowHeight="5715"/>
        </bookViews>
        <sheets>
            <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
            <sheet name="Sheet2" sheetId="2" r:id="rId2"/>
            <sheet name="Sheet3" sheetId="3" r:id="rId3"/>
        </sheets>
        <calcPr calcId="125725"/>
    </workbook>
    View Code

    1.2.2、 _rels/workbook.xml.rels

      定义每个sheetid对应的sheet内容文件sheet1.xml,共享的单元格内容文件sharedstring.xml,样式文件style.xml是当前单元格的样式字体,颜色等样式的xml配置。

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
        <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/>
        <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
        <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
        <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
        <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
        <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
    </Relationships>
    View Code

    1.2.3、 worksheets文件夹,保存的就是excel中的sheet

    1.2.4、sheet.xml文件

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
        xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
        <!-- 
            容积,即该sheet中以供占用了多大的地方,哪个单元格到哪个单元格,矩形的左上角单元格和右下角单元格
            此属性可以判断列数,如下就是:(M-A+1)
        -->
        <dimension ref="A1:M8"/>
        <!-- sheet视图的配置信息,如打开时默认选中的单元格是哪个 -->
        <sheetViews>
            <sheetView tabSelected="1" workbookViewId="0">
                <selection activeCell="G12" sqref="G12"/>
            </sheetView>
        </sheetViews>
        <!-- sheet的格式参数 -->
        <sheetFormatPr defaultRowHeight="14.25"/>
        <!-- 行中每列的配置参数 -->
        <cols>
            <col min="1" max="1" width="7.875" customWidth="1"/>
            <col min="2" max="2" width="6.75" customWidth="1"/>
            <col min="3" max="3" width="6" customWidth="1"/>
            <col min="4" max="4" width="4.5" customWidth="1"/>
            <col min="5" max="5" width="8.5" customWidth="1"/>
            <col min="6" max="6" width="9.375" customWidth="1"/>
            <col min="7" max="7" width="13.5" customWidth="1"/>
            <col min="8" max="8" width="7" customWidth="1"/>
            <col min="9" max="9" width="7.625" customWidth="1"/>
            <col min="10" max="10" width="4.625" customWidth="1"/>
            <col min="11" max="11" width="9.5" bestFit="1" customWidth="1"/>
        </cols>
        <!-- 数据区域,这个里面就是我们的单元格的相关数据信息了,是以行(Row)的方式记录的 -->
        <sheetData>
            <!-- excel中的行 -->
            <row r="1" spans="1:13">
                <!-- 
                    excel中的列
                        r:单元格坐标
                        s:单元格的style,对应style.xml中cellXfs元素下的某个元素xf元素,
                          此处就是对应索引为12的元素,从0开始计算,即为第13个xf元素。
                        t:元素类型,s表示是字符串类型,对应到sharedStrings.xml中的sst元素
                -->
                <c r="A1" s="12" t="s">
                    <!-- 
                        如果是t="s" 则存储的是sharedStrings.xml中sst元素下的某个si元素,此处对应索引为0的元素,即第一个si元素
                        否则,这里存储的就是单元格的值 
                    -->
                    <v>0</v>
                </c><c r="B1" s="12" t="s">
                    <v>10</v>
                </c><c r="C1" s="12" t="s">
                    <v>1</v>
                </c><c r="D1" s="12" t="s">
                    <v>2</v>
                </c><c r="E1" s="12" t="s">
                    <v>3</v>
                </c><c r="F1" s="12" t="s">
                    <v>4</v>
                </c><c r="G1" s="12" t="s">
                    <v>5</v>
                </c><c r="H1" s="12" t="s">
                    <v>6</v>
                </c><c r="I1" s="12" t="s">
                    <v>7</v>
                </c><c r="J1" s="12" t="s">
                    <v>8</v>
                </c><c r="K1" s="12" t="s">
                    <v>9</v>
                </c>
            </row>
            <row r="2" spans="1:13">
                <c r="A2" t="s">
                    <v>11</v>
                </c><c r="B2" s="10" t="s">
                    <v>12</v>
                </c><c r="C2" s="1">
                    <v>2</v>
                </c><c r="D2" s="2" t="s">
                    <v>13</v>
                </c><c r="E2" s="3" t="s">
                    <v>14</v>
                </c><c r="F2" s="4">
                    <v>89</v>
                </c><c r="G2" s="5">
                    <v>10</v>
                </c><c r="H2" s="6">
                    <v>12</v>
                </c><c r="I2" s="7">
                    <v>0.2</v>
                </c><c r="J2" s="8">
                    <v>0.5</v>
                </c><c r="K2" s="9">
                    <v>4564645</v>
                </c><c r="L2" s="10"/>
                <c r="M2" s="11"/>
            </row>
            <row r="3" spans="1:13">
                <!-- 
                    此处为什么没有A3?
                    因为excel中常规列如果没有值,此处就不记录,因此在读取excel是应该特别注意,否则可能导致列错乱
                -->
                <c r="B3" s="10"/>
                <c r="C3" s="1"/>
                <c r="D3" s="2"/>
                <c r="E3" s="3"/>
                <c r="F3" s="4"/>
                <c r="G3" s="5"/>
                <c r="H3" s="6"/>
                <c r="I3" s="7"/>
                <c r="J3" s="8"/>
                <c r="K3" s="9"/>
            </row>
            
            <!-- 该行每列都是常规列,因此没有值的就不记录在这里 -->
            <row r="4" spans="1:13">
                <c r="A4" s="12" t="s">
                    <v>15</v>
                </c><c r="E4" t="s">
                    <v>16</v>
                </c>
            </row>
            <!-- 5,6,7 行都是合并再一起的单元格 -->
            <row r="5" spans="1:13">
                <!-- 合并单元格的值记录在起始单元格里,其余单元格置空 -->
                <c r="A5" s="14" t="s">
                    <v>17</v>
                </c><c r="B5" s="13"/>
                <c r="C5" s="13"/>
                <c r="D5" s="13"/>
                <c r="E5" s="13"/>
                <c r="F5" s="13"/>
                <c r="G5" s="13"/>
                <c r="H5" s="13"/>
                <c r="I5" s="13"/>
                <c r="J5" s="13"/>
                <c r="K5" s="13"/>
            </row>
            <row r="6" spans="1:13">
                <c r="A6" s="13"/>
                <c r="B6" s="13"/>
                <c r="C6" s="13"/>
                <c r="D6" s="13"/>
                <c r="E6" s="13"/>
                <c r="F6" s="13"/>
                <c r="G6" s="13"/>
                <c r="H6" s="13"/>
                <c r="I6" s="13"/>
                <c r="J6" s="13"/>
                <c r="K6" s="13"/>
            </row>
            <row r="7" spans="1:13">
                <c r="A7" s="13"/>
                <c r="B7" s="13"/>
                <c r="C7" s="13"/>
                <c r="D7" s="13"/>
                <c r="E7" s="13"/>
                <c r="F7" s="13"/>
                <c r="G7" s="13"/>
                <c r="H7" s="13"/>
                <c r="I7" s="13"/>
                <c r="J7" s="13"/>
                <c r="K7" s="13"/>
            </row>
            <row r="8" spans="1:13">
                <c r="A8" s="13"/>
                <c r="B8" s="13"/>
                <c r="C8" s="13"/>
                <c r="D8" s="13"/>
                <c r="E8" s="13"/>
                <c r="F8" s="13"/>
                <c r="G8" s="13"/>
                <c r="H8" s="13"/>
                <c r="I8" s="13"/>
                <c r="J8" s="13"/>
                <c r="K8" s="13"/>
            </row>
        </sheetData>
        <!-- 这个是记录哪个到哪个是合并单元格的信息的 -->
        <mergeCells count="1">
            <!-- A5到K8被合并了 -->
            <mergeCell ref="A5:K8"/>
        </mergeCells>
        <!-- -->
        <phoneticPr fontId="1" type="noConversion"/>
        <!-- 页的相关属性设置 -->
        <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
        <!-- 分页设置 -->
        <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
        <!-- 图形信息,由图可以看出,这里引用了draw的文档了-->
        <drawing/>
    </worksheet>
    View Code

    1.2.5、sharedStrings.xml文件

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="18" uniqueCount="18">
        <!-- si存储的是字符串的值 -->
        <!-- 此处存储的是sheet.xml中c(列)中属性带有t="s"的列的值,
            带有t="s"字符串列的定位是通过c元素下的v元素中的值来定位的
            此处对应的就是v的值为0的单元格的实际值
        -->
        <si>
            <t>常规</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <!-- sheet.xml中c元素中属性t="s"的子元素<v>1</v>的单元格的实际值 -->
        <si>
            <t>数值</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>货币</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>会计专用</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>短日期</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>长日期</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>时间</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>百分比</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>分数</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>科学计数</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>文本</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>common</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>String</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>$</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>D1</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>常规A</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <r>
                <rPr>
                    <sz val="11"/>
                    <color theme="1"/>
                    <rFont val="宋体"/>
                    <family val="3"/>
                    <charset val="134"/>
                </rPr><t>常规</t>
            </r><r>
                <rPr>
                    <sz val="11"/>
                    <color theme="1"/>
                    <rFont val="Tahoma"/>
                    <family val="2"/>
                    <charset val="134"/>
                </rPr><t>D</t>
            </r>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
        <si>
            <t>说明: 为了简单,行数就设置的少点  
       第一行为标题行(本身是常规行),分别标注了每列的属性。
       第三行和第二行一样的列格式
       第四行为常规行</t>
            <phoneticPr fontId="1" type="noConversion"/>
        </si>
    </sst>
    View Code

    1.2.6、style.xml文件

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <!-- 数字格式化显示的定义 -->
        <numFmts count="5">
            <!-- 此处的numFmtId和cellXfs中的numFmtId对应 -->
            <!-- 会计专用 -->
            <!-- 该元素表示cellXfs中xf元素的numFmtId="44"对应的格式化样式 -->
            <numFmt numFmtId="44" formatCode="_ &quot;¥&quot;* #,##0.00_ ;_ &quot;¥&quot;* -#,##0.00_ ;_ &quot;¥&quot;* &quot;-&quot;??_ ;_ @_ "/>
            <!-- 数值格式 -->
            <numFmt numFmtId="176" formatCode="0.00_);[Red](0.00)"/>
            <!-- 货币 -->
            <numFmt numFmtId="177" formatCode="&quot;¥&quot;#,##0.00_);[Red](&quot;¥&quot;#,##0.00)"/>
            <!-- 长日期 -->
            <numFmt numFmtId="178" formatCode="[$-F800]dddd\, mmmm dd\, yyyy"/>
            <!-- 时间 -->
            <numFmt numFmtId="179" formatCode="[$-F400]h:mm:ss AM/PM"/>
        </numFmts>
        <!-- 字体样式 -->
        <fonts count="3">
            <font>
                <sz val="11"/>
                <color theme="1"/>
                <name val="Tahoma"/>
                <family val="2"/>
                <charset val="134"/>
            </font>
            <font>
                <sz val="9"/>
                <name val="Tahoma"/>
                <family val="2"/>
                <charset val="134"/>
            </font>
            <font>
                <sz val="11"/>
                <color theme="1"/>
                <name val="宋体"/>
                <family val="3"/>
                <charset val="134"/>
            </font>
        </fonts>
        <!-- 填充样式,包括背景色 -->
        <fills count="2">
            <fill>
                <patternFill patternType="none"/>
            </fill>
            <fill>
                <patternFill patternType="gray125"/>
            </fill>
        </fills>
        <!-- 边框 -->
        <borders count="1">
            <border>
                <left/>
                <right/>
                <top/>
                <bottom/>
                <diagonal/>
            </border>
        </borders>
        <!-- 提供给cellXfs来调用 -->
        <cellStyleXfs count="1">
            <!-- 此处的numFmtId和cellXfs中的numFmtId对应 -->
            <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
        </cellStyleXfs>
        <!-- 真正的样式定义 -->
        <cellXfs count="15">
            <!-- 此处的numFmtId和cellStyleXfs、numFmts等中的numFmtId对应 -->
            <!-- sheet.xml中的c(列)元素属性中s的属性值都对应这里一个xf元素,s的属性值存储的就是xf的索引-->
            <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
            <!-- 该元素表示sheet.xml中c元素中s="1"对应的样式 -->
            <xf numFmtId="176" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <!-- 该元素表示sheet.xml中c元素中s="2"对应的样式 -->
            <xf numFmtId="177" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="44" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="178" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="179" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="10" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="12" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="11" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="49" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="13" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
            <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1"/>
            <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1">
                <alignment horizontal="left" vertical="top"/>
            </xf>
            <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1" applyAlignment="1">
                <alignment horizontal="left" vertical="top" wrapText="1"/>
            </xf>
        </cellXfs>
        <cellStyles count="1">
            <cellStyle name="常规" xfId="0" builtinId="0"/>
        </cellStyles>
        <dxfs count="0"/>
        <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
    </styleSheet>
    View Code

    二、使用java读取EXCEL

     1. 依赖的jar包

    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.15</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.15</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
      <version>3.15</version>
    </dependency>
    <dependency>
      <groupId>org.apache.xerces</groupId>
      <artifactId>com.springsource.org.apache.xerces</artifactId>
      <version>2.8.1</version>
    </dependency>
    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <version>2.6</version>
    </dependency>
    View Code

    2.  IExcelRowReader.java

    package com.wms.xml;
    
    import java.util.List;
    
    public interface IExcelRowReader {
        void getRows(int sheetIndex, int curRow, List<String> rowlist);
    }
    View Code

    3. ExcelReaderUtil.java

    package com.wms.xml;
    
    import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
    import org.xml.sax.SAXException;
    
    import java.io.IOException;
    
    public class ExcelReaderUtil {
        public static void read2007Excel(IExcelRowReader rowReader, String path) throws
                OpenXML4JException, SAXException, IOException {
            Excel2007Reader excel2007Reader = new Excel2007Reader();
            excel2007Reader.setRowReader(rowReader);
            excel2007Reader.load(path);
        }
    
    }
    View Code

    4. Excel2007Reader.java

    package com.wms.xml;
    
    import org.apache.commons.lang.StringUtils;
    import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.ss.usermodel.BuiltinFormats;
    import org.apache.poi.ss.usermodel.DataFormatter;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.apache.poi.xssf.model.SharedStringsTable;
    import org.apache.poi.xssf.model.StylesTable;
    import org.apache.poi.xssf.usermodel.XSSFCellStyle;
    import org.apache.poi.xssf.usermodel.XSSFRichTextString;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    
    public class Excel2007Reader extends DefaultHandler {
    
        //列元素
        private static final String C_ELEMENT = "c";
        //列中属性r
        private static final String R_ATTR = "r";
        //列中的v元素
        private static final String V_ELEMENT = "v";
        //列中的t元素
        private static final String T_ELEMENT = "t";
        //列中属性值
        private static final String S_ATTR_VALUE = "s";
        //列中属性值
        private static final String T_ATTR_VALUE = "t";
    
        //sheet r:Id前缀
        private static final String RID_PREFIX = "rId";
    
        //行元素
        private static final String ROW_ELEMENT = "row";
    
        //时间格式化字符串
        private static final String DATE_FORMAT = "yyyy-MM-dd HH:mm:ss";
    
        //saxParser
        private static final String CLASS_SAXPARSER = "org.apache.xerces.parsers.SAXParser";
    
        //填充字符串
        private static final String CELL_FILL_STR = "@";
        //列的最大位数
        private static final int MAX_CELL_BIT = 3;
    
    
        //excel 2007 的共享字符串表,对应sharedString.xml
        private SharedStringsTable sharedStringsTable;
    
        private final DataFormatter dataFormatter = new DataFormatter();
    
        //当前行
        private int curRow;
    
        //当前列
        private int curCell;
    
        //上一次的内容
        private String lastContent;
    
        //是否是String类型的
        private boolean nextIsString;
    
        //单元数据类型
        private CellDataType cellDataType;
    
        //当前列坐标, 如A1,B5
        private String curCoordinate;
    
        //前一个列的坐标
        private String preCoordinate;
    
        //行的最大列坐标
        private String maxCellCoordinate;
    
        //单元格的格式表,对应style.xml
        private StylesTable stylesTable;
    
        //单元格存储格式的索引,对应style.xml中的numFmts元素的子元素索引
        private int numFmtIndex;
        //单元格存储的格式化字符串,nmtFmt的formateCode属性的值
        private String numFmtString;
    
        //sheet的索引
        private int sheetIndex = -1;
    
        //存储每行的列元素
        List<String> rowCellList = new ArrayList<String>();
    
        //单元格的数据格式
        enum CellDataType {
            BOOL("b"),
            ERROR("e"),
            FORMULA("str"),
            INLINESTR("inlineStr"),
            SSTINDEX("s"),
            NUMBER(""),
            DATE("m/d/yy"),
            NULL("");
    
            private String name;
    
            CellDataType(String name) {
                this.name = name;
            }
    
            public String getName() {
                return name;
            }
    
            public void setName(String name) {
                this.name = name;
            }
        }
    
        private IExcelRowReader rowReader;
    
        public void setRowReader(IExcelRowReader rowReader) {
            this.rowReader = rowReader;
        }
    
        /**
         * 读取excel中的制定索引sheet
         *
         * @param path
         * @param sheetIndex
         * @throws OpenXML4JException
         * @throws IOException
         * @throws SAXException
         */
        public void load(String path, int sheetIndex) throws OpenXML4JException, IOException,
                SAXException {
            OPCPackage opcPackage = OPCPackage.open(path);
            XSSFReader xssfReader = new XSSFReader(opcPackage);
            //获取styleTable
            stylesTable = xssfReader.getStylesTable();
            //获取共享字符串表
            SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
            this.sharedStringsTable = sharedStringsTable;
            //获取解析器
            XMLReader xmlReader = fetchSheetReader(sharedStringsTable);
            // 根据 rId# 或 rSheet# 查找sheet
            InputStream sheetInputStream = xssfReader.getSheet(RID_PREFIX + sheetIndex);
            InputSource sheetInputSource = new InputSource(sheetInputStream);
            xmlReader.parse(sheetInputSource);
            //关闭流
            closeStream(sheetInputStream);
        }
    
        /**
         * 遍历所有的sheet
         *
         * @param path
         * @throws IOException
         * @throws OpenXML4JException
         * @throws SAXException
         */
        public void load(String path) throws IOException, OpenXML4JException, SAXException {
            OPCPackage opcPackage = OPCPackage.open(path);
            XSSFReader xssfReader = new XSSFReader(opcPackage);
            stylesTable = xssfReader.getStylesTable();
            SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
            this.sharedStringsTable = sharedStringsTable;
            XMLReader xmlReader = fetchSheetReader(sharedStringsTable);
            Iterator<InputStream> sheetsInputStream = xssfReader.getSheetsData();
            while (sheetsInputStream.hasNext()) {
                curRow = 0;
                sheetIndex++;
                InputStream sheetInputStream = sheetsInputStream.next();
                InputSource sheetInputSource = new InputSource(sheetInputStream);
                xmlReader.parse(sheetInputSource);
                closeStream(sheetInputStream);
            }
    
        }
    
        /**
         * 获取sheet的解析器
         *
         * @param sharedStringsTable
         * @return
         * @throws SAXException
         */
        public XMLReader fetchSheetReader(SharedStringsTable sharedStringsTable) throws SAXException {
            XMLReader xmlReader = XMLReaderFactory.createXMLReader(CLASS_SAXPARSER);
            xmlReader.setContentHandler(this);
            return xmlReader;
        }
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
                throws SAXException {
            //c 表示列
            if (C_ELEMENT.equals(qName)) {
    
                //获取当前列坐标
                String tempCurCoordinate = attributes.getValue(R_ATTR);
                //前一列为null,则将其设置为"@",A为第一列,ascii码为65,前一列即为@,ascii码64
                if (preCoordinate == null) {
                    preCoordinate = CELL_FILL_STR;
                } else {
                    //存在,则前一列要设置为上一列的坐标
                    preCoordinate = curCoordinate;
                }
                //重置当前列
                curCoordinate = tempCurCoordinate;
    
                //设置单元格类型
                setCellType(attributes);
                //t 属性值
                String type = attributes.getValue(T_ATTR_VALUE);
                //s 表示该列为字符串
                if (S_ATTR_VALUE.equals(type)) {
                    nextIsString = true;
                }
            }
    
            lastContent = "";
        }
    
        /**
         * 设置单元格的类型
         *
         * @param attribute
         */
        private void setCellType(Attributes attribute) {
            //默认是数字类型
            cellDataType = CellDataType.NUMBER;
            //重置numFmtIndex,numFmtString的值
            numFmtIndex = 0;
            numFmtString = "";
    
            //单元格的格式类型
            String cellType = attribute.getValue(T_ATTR_VALUE);
            //获取单元格的xf索引,对应style.xml中cellXfs的子元素xf
            String xfIndexStr = attribute.getValue(S_ATTR_VALUE);
    
            if (CellDataType.BOOL.getName().equals(cellType)) {
                cellDataType = CellDataType.BOOL;
            } else if (CellDataType.ERROR.getName().equals(cellType)) {
                cellDataType = CellDataType.ERROR;
            } else if (CellDataType.INLINESTR.getName().equals(cellType)) {
                cellDataType = CellDataType.INLINESTR;
            } else if (CellDataType.FORMULA.getName().equals(cellType)) {
                cellDataType = CellDataType.FORMULA;
            } else if (CellDataType.SSTINDEX.getName().equals(cellType)) {
                cellDataType = CellDataType.SSTINDEX;
            }
    
            if (xfIndexStr != null) {
                int xfIndex = Integer.parseInt(xfIndexStr);
                XSSFCellStyle xssfCellStyle = stylesTable.getStyleAt(xfIndex);
                numFmtIndex = xssfCellStyle.getDataFormat();
                numFmtString = xssfCellStyle.getDataFormatString();
    
                if (CellDataType.DATE.getName().equals(numFmtString)) {
                    cellDataType = CellDataType.DATE;
                    numFmtString = DATE_FORMAT;
                }
    
                if (numFmtString == null) {
                    cellDataType = CellDataType.NULL;
                    numFmtString = BuiltinFormats.getBuiltinFormat(numFmtIndex);
                }
            }
    
        }
    
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            String value = StringUtils.trim(lastContent);
            if (T_ELEMENT.equals(qName)) {
                rowCellList.add(curCell++, value);
            } else if (C_ELEMENT.equals(qName)) {
                value = getDataValue(value);
                //补全单元格之间的空格
                fillBlackCell(curCoordinate, preCoordinate, false);
                rowCellList.add(curCell++, value);
            } else {
                //如果是row标签,说明已经到了一行的结尾
                if (ROW_ELEMENT.equals(qName)) {
                    //最大列坐标以第一行的为准
                    if (curRow == 0) {
                        maxCellCoordinate = curCoordinate;
                    }
    
                    //补全一行尾部可能缺失的单元格
                    if (maxCellCoordinate != null) {
                        fillBlackCell(maxCellCoordinate, curCoordinate, true);
                    }
    
                    rowReader.getRows(sheetIndex, curRow, rowCellList);
    
                    //一行结束
                    //清空rowCellList,
                    rowCellList.clear();
                    //行数增加
                    curRow++;
                    //当前列置0
                    curCell = 0;
                    //置空当前列坐标和前一列坐标
                    curCoordinate = null;
                    preCoordinate = null;
                }
            }
        }
    
        /**
         * 填充空白单元格
         *
         * @param curCoordinate
         * @param preCoordinate
         */
        private void fillBlackCell(String curCoordinate, String preCoordinate, boolean isEnd) {
            if (!curCoordinate.equals(preCoordinate)) {
                int len = calNullCellCount(curCoordinate, preCoordinate, isEnd);
                for (int i = 0; i < len; i++) {
                    rowCellList.add(curCell++, "");
                }
            }
        }
    
        /**
         * 计算当前单元格和前一个单元格之间的空白单元格数量
         * 如果是尾部则不减1
         *
         * @param curCoordinate
         * @param preCoordinate
         * @return
         */
        private int calNullCellCount(String curCoordinate, String preCoordinate, boolean isEnd) {
            // excel2007最大行数是1048576,最大列数是16384,最后一列列名是XFD
            String curCellCoordinate = curCoordinate.replaceAll("\d+", "");
            String preCellCoordinate = preCoordinate.replaceAll("\d+", "");
    
            curCellCoordinate = fillChar(curCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR);
            preCellCoordinate = fillChar(preCellCoordinate, MAX_CELL_BIT, CELL_FILL_STR);
    
            char[] cur = curCellCoordinate.toCharArray();
            char[] pre = preCellCoordinate.toCharArray();
    
            int len = (cur[0] - pre[0]) * 26 * 26 + (cur[1] - pre[1]) * 26 + (cur[2] - pre[2]);
            if (!isEnd) {
                len = len - 1;
            }
            return len;
        }
    
        /**
         * 将不足指定位数的字符串补全,高位补上指定字符串
         *
         * @param cellCoordinate
         * @param maxLen
         * @param fillChar
         * @return
         */
        private String fillChar(String cellCoordinate, int maxLen, String fillChar) {
            int coordinateLen = cellCoordinate.length();
            if (coordinateLen < maxLen) {
                for (int i = 0; i < (maxLen - coordinateLen); i++) {
                    cellCoordinate = fillChar + cellCoordinate;
                }
            }
            return cellCoordinate;
        }
    
        private String getDataValue(String lastContent) {
            String value = "";
            XSSFRichTextString xssfRichTextString = null;
            switch (cellDataType) {
                case BOOL:
                    char first = lastContent.charAt(0);
                    value = first == '0' ? "FALSE" : "TRUE";
                    break;
                case ERROR:
                    value = ""ERROR:" + lastContent + '"';
                    break;
                case FORMULA:
                    value = '"' + lastContent + '"';
                    break;
                case INLINESTR:
                    xssfRichTextString = new XSSFRichTextString(lastContent);
                    value = xssfRichTextString.getString();
                    xssfRichTextString = null;
                    break;
                case SSTINDEX:
                    try {
                        int index = Integer.parseInt(lastContent);
                        xssfRichTextString = new XSSFRichTextString(sharedStringsTable.getEntryAt
                                (index));
                        value = xssfRichTextString.getString();
                        xssfRichTextString = null;
                    } catch (NumberFormatException e) {
                        value = lastContent;
                    }
                    break;
                case NUMBER:
                    if (numFmtString != null) {
                        value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent),
                                numFmtIndex, numFmtString);
                    } else {
                        value = lastContent;
                    }
                    value = value.replace("_", "");
                    break;
                case DATE:
                    value = dataFormatter.formatRawCellContents(Double.parseDouble(lastContent),
                            numFmtIndex, numFmtString);
                    break;
                default:
                    value = "";
                    break;
            }
    
            return value;
        }
    
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // 得到单元格内容的值
            lastContent += new String(ch, start, length);
        }
    
        /**
         * 关闭流
         *
         * @param inputStream
         */
        public void closeStream(InputStream inputStream) {
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    
        public static void main(String[] args) {
    //        System.out.println("m/d/yy" == CellDataType.DATE.getName());
    //        System.out.println('"' + "ss" + '"');
    //        char[] letter = "@@A".toCharArray();
    //        char[] letter_1 = "@@B".toCharArray();
    //        int res = (letter[0] - letter_1[0]) * 26 * 26 + (letter[1] - letter_1[1]) * 26 +
    //                (letter[2] - letter_1[2]);
    //        System.out.println(res);
    //        System.out.println(Integer.valueOf('@'));
            IExcelRowReader rowReader = new ExcelRowReader();
            try {
                // ExcelReaderUtil.readExcel(rowReader,
                // "E://2016-07-04-011940a.xls");
                System.out.println("**********************************************");
                ExcelReaderUtil.read2007Excel(rowReader, "F:\studygit\test.xlsx");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    
    
    }
    View Code
  • 相关阅读:
    Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)
    Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
    Coursera, Big Data 3, Integration and Processing (week 5)
    Linux Socket I/O
    Coursera, Big Data 3, Integration and Processing (week 4)
    概率统计知识复习
    Coursera, Big Data 3, Integration and Processing (week 1/2/3)
    Coursera, Big Data 2, Modeling and Management Systems (week 4/5/6)
    [面试] mysql 面试题
    APP完整的启动流程
  • 原文地址:https://www.cnblogs.com/wangmingshun/p/6654143.html
Copyright © 2011-2022 走看看