zoukankan      html  css  js  c++  java
  • POI-处理大Excel文件(xlsx)

    上次简述了使用poi读取大xls文件,这里说下读取xlsx格式的文件的方法

    环境模拟

    先准备一个大的excel文件(xlsx大小5M),再将jvm的heap缩小到100m(JVM 参数 -Xmx100m)用于模拟OOM
    并使用参数在OOM时dump内存 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://dump.hprof

    使用XSSF读取

    在gradle中引入解析xlsx需要的jar包

    	compile 'org.apache.poi:poi:3.15'
        compile 'org.apache.poi:poi-ooxml:3.15'
        compile 'xerces:xercesImpl:2.11.0'
    

    之后读取xlsx文件

        public static void main(String [] args) throws IOException {
            InputStream is = new FileInputStream("d://large.xlsx");
            Workbook wb = new XSSFWorkbook(is);
        }
    

    运行之后

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    	at java.util.Arrays.copyOf(Arrays.java:3236)
    	at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
    	at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:136)
    	at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:56)
    	at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
    	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:342)
    	at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
    	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:285)
    	at blog.excel.Xlsx.main(Xlsx.java:17)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:483)
    	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
    

    同样报出了OOM,原因也是处理xlsx时会将数据完全读入内存,导致内存溢出。

    使用EventApi流式读取

    POI也为xlsx提供了流式读取的方式,用于减小内存的使用

    public class ExampleEventUserModel{
        public void processOneSheet(String filename) throws Exception {
            OPCPackage pkg = OPCPackage.open(filename);
            XSSFReader r = new XSSFReader( pkg );
            SharedStringsTable sst = r.getSharedStringsTable();
    
            XMLReader parser = fetchSheetParser(sst);
    
            // 获得第一个sheet
            InputStream sheet2 = r.getSheet("rId1");
            InputSource sheetSource = new InputSource(sheet2);
            parser.parse(sheetSource);
            sheet2.close();
        }
    
        public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
            XMLReader parser =
                    XMLReaderFactory.createXMLReader(
                            "org.apache.xerces.parsers.SAXParser"
                    );
            ContentHandler handler = new SheetHandler(sst);
            parser.setContentHandler(handler);
            return parser;
        }
    
        /**
         * 处理sax的handler
         */
        private static class SheetHandler extends DefaultHandler {
            private SharedStringsTable sst;
            private String lastContents;
            private boolean nextIsString;
    
            private SheetHandler(SharedStringsTable sst) {
                this.sst = sst;
            }
    
            //元素开始时的handler
            public void startElement(String uri, String localName, String name,
                                     Attributes attributes) throws SAXException {
                // c => 单元格
                if(name.equals("c")) {
                    System.out.print(attributes.getValue("r") + " - ");
                    // 获取单元格类型
                    String cellType = attributes.getValue("t");
                    if(cellType != null && cellType.equals("s")) {
                        nextIsString = true;
                    } else {
                        nextIsString = false;
                    }
                }
                lastContents = "";
            }
    
            //元素结束时的handler
            public void endElement(String uri, String localName, String name)
                    throws SAXException {
                if(nextIsString) {
                    int idx = Integer.parseInt(lastContents);
                    lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                    nextIsString = false;
                }
    
                // v => 单元格内容
                if(name.equals("v")) {
                    System.out.println(lastContents);
                }
            }
    
            //读取元素间内容时的handler
            public void characters(char[] ch, int start, int length)
                    throws SAXException {
                lastContents += new String(ch, start, length);
            }
        }
    
        public static void main(String[] args) throws Exception {
            ExampleEventUserModel example = new ExampleEventUserModel();
            example.processOneSheet("d://large.xlsx");
        }
    
    }
    

    不足

    同样的使用这种方法可以流式读取打的xlsx文件,但是只限于读取内部的数据,而且无法进行修改操作。之后会介绍写大文件的方法

  • 相关阅读:
    sql 相关子查询
    sql 执行计划
    SQL表连接查询(inner join、full join、left join、right join)
    sql执行顺序
    sql 语句 嵌套子查询 执行顺序分析
    只有程序员才看得懂的情书
    Give Me an E
    hdu 1114 (背包变形)
    模版 并查集
    背包 讲解
  • 原文地址:https://www.cnblogs.com/resentment/p/6367404.html
Copyright © 2011-2022 走看看