  • Converting Word to HTML with POI

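    At the moment there are four ways to render a Word document as HTML:

    POI (fairly tedious)

    A plugin (paid, or capped at 500 calls per day, so not suitable for large companies)

    Converting the Word file to PDF and showing it on the page through Flash (not great: tedious and hard to work with)

    Using H5 (HTML5), which I am not very familiar with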

    Since I chose POI, let's get started.

    Step one: pull in the jars through Maven.

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>xdocreport</artifactId>
        <version>1.0.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>ooxml-schemas</artifactId>
        <version>1.3</version>
    </dependency>

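    POI runs into version problems when parsing: some classes cannot be used with the other file format. Word 2003 (.doc) and Word 2007 (.docx) therefore have to be converted with different methods.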
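
    Because the two formats need different code paths, it can help to check which one a file really is before converting it, instead of trusting the extension. The sketch below is not from the original post and uses only the JDK: WordFormatSniffer and WordFormat are hypothetical names, and the check identifies the container type only (an OLE2 file could also be .xls or .ppt, and a ZIP file could be any archive).

```java
import java.io.IOException;
import java.io.InputStream;

class WordFormatSniffer {
    /** Hypothetical enum for this sketch; it is not part of POI. */
    enum WordFormat { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature, the container used by binary .doc files.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header signature; a .docx file is a ZIP package.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Reads up to 8 bytes from the stream and compares them with the known signatures. */
    static WordFormat detect(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            read += n;
        }
        if (startsWith(head, read, OLE2)) {
            return WordFormat.DOC;
        }
        if (startsWith(head, read, ZIP)) {
            return WordFormat.DOCX;
        }
        return WordFormat.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

    A caller could open the file, call detect, and then route DOCX to the XWPF-based method and DOC to the HWPF-based method. The stream is consumed by the check, so the file should be reopened for the actual conversion.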

    First, the 2007 format

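        // Imports needed by this snippet (xdocreport 1.0.x package names; JUnit 4 assumed for @Test)
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.FileOutputStream;
        import java.io.InputStream;
        import java.io.OutputStreamWriter;
        import java.io.Writer;
        import java.nio.charset.StandardCharsets;
        import org.apache.poi.xwpf.converter.core.BasicURIResolver;
        import org.apache.poi.xwpf.converter.core.FileImageExtractor;
        import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
        import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
        import org.apache.poi.xwpf.usermodel.XWPFDocument;
        import org.junit.Test;

        @Test
        public void word2007ToHtml() throws Exception {
            String filepath = "e:/files/";
            String sourceFileName = filepath + "前言.docx";
            String targetFileName = filepath + "1496717486420.html";
            // filepath already ends with "/", so no leading slash is needed here
            String imagePathStr = filepath + "image/";
            // try-with-resources closes both the input stream and the writer
            try (InputStream in = new FileInputStream(sourceFileName);
                 Writer writer = new OutputStreamWriter(
                         new FileOutputStream(targetFileName), StandardCharsets.UTF_8)) {
                XWPFDocument document = new XWPFDocument(in);
                XHTMLOptions options = XHTMLOptions.create();
                // Folder where the extracted images are written
                options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
                // Path used for the images inside the generated HTML
                options.URIResolver(new BasicURIResolver("image"));
                XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
                xhtmlConverter.convert(document, writer, options);
            }
        }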

    Next, the 2003 format, which I have not tested

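        // Imports needed by the Word 2003 snippet (JUnit 4 assumed for @Test)
        import java.io.BufferedWriter;
        import java.io.ByteArrayOutputStream;
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.FileNotFoundException;
        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStreamWriter;
        import javax.xml.parsers.DocumentBuilder;
        import javax.xml.parsers.DocumentBuilderFactory;
        import javax.xml.parsers.ParserConfigurationException;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerConfigurationException;
        import javax.xml.transform.TransformerException;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.dom.DOMSource;
        import javax.xml.transform.stream.StreamResult;
        import org.apache.poi.hwpf.HWPFDocument;
        import org.apache.poi.hwpf.converter.PicturesManager;
        import org.apache.poi.hwpf.converter.WordToHtmlConverter;
        import org.apache.poi.hwpf.usermodel.PictureType;
        import org.junit.Test;
        import org.w3c.dom.Document;

        @Test
        public void test(){
            DocxToHtml("E:/files/1496635038432.doc","E:/files/1496635038432.html");
        }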
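        // Despite its name, this method handles the binary .doc format (HWPF), not .docx
        public static void DocxToHtml(String fileAllName, String outPutFile) {
            // Open the input file; try-with-resources closes the stream when done
            try (InputStream in = new FileInputStream(fileAllName)) {
                // Parse the stream into a Word document object
                HWPFDocument wordDocument = new HWPFDocument(in);
                // Create the DOM builder factory
                DocumentBuilderFactory domBuilderFactory = DocumentBuilderFactory.newInstance();
                // Create the DOM builder
                DocumentBuilder domBuilder = domBuilderFactory.newDocumentBuilder();
                // Create an empty DOM document that will receive the HTML
                Document dom = domBuilder.newDocument();
                // Create the converter that writes into that DOM
                WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(dom);
                // Supply a PicturesManager; its return value becomes the image path in the HTML.
                // This implementation does not write the image bytes anywhere.
                wordToHtmlConverter.setPicturesManager(new PicturesManager() {
                    public String savePicture(byte[] content,
                            PictureType pictureType, String suggestedName,
                            float widthInches, float heightInches) {
                        return suggestedName;
                    }
                });
                // Run the conversion on the Word document
                wordToHtmlConverter.processDocument(wordDocument);
                // Save the pictures contained in the document (currently commented out)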
            /*    List<?> pics=wordDocument.getPicturesTable().getAllPictures();    
                if(pics!=null){    
                    for(int i=0;i<pics.size();i++){    
                        Picture pic = (Picture)pics.get(i);   
                        try {    
                            pic.writeImageContent(new FileOutputStream("E:/test/"+ pic.suggestFullFileName()));    
                        } catch (FileNotFoundException e) {    
                            e.printStackTrace();    
                        }      
                    }    
                } */
                // Get the DOM that the converter populated
                Document htmlDocument = wordToHtmlConverter.getDocument();
                // Wrap the DOM as a transform source
                DOMSource domSource = new DOMSource(htmlDocument);

                // In-memory byte buffer for the serialized HTML
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                // Transform result that writes into the buffer
                StreamResult streamResult = new StreamResult(out);
                // Create the serializer from the transformer factory
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer serializer = tf.newTransformer();
                // Configure the output encoding, indentation and method
                serializer.setOutputProperty(OutputKeys.ENCODING, "GB2312");
                serializer.setOutputProperty(OutputKeys.INDENT, "yes");
                serializer.setOutputProperty(OutputKeys.METHOD, "html");

                serializer.transform(domSource, streamResult);
                // Decode with the same charset the serializer used (not the platform default),
                // then write the file
                writeFile(new String(out.toByteArray(), "GB2312"), outPutFile);
                out.close();
            } catch (IOException | TransformerException | ParserConfigurationException e) {
                // FileNotFoundException and TransformerConfigurationException are
                // subclasses of the types above, so they are handled here as well
                e.printStackTrace();
            }
        }
        
        
        public static void writeFile(String content, String path) {
            // try-with-resources closes the writer and the underlying stream,
            // and a failure while closing is reported instead of being swallowed
            try (FileOutputStream fos = new FileOutputStream(new File(path));
                 BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, "GB2312"))) {
                bw.write(content);
            } catch (IOException ioe) {
                // Also covers FileNotFoundException
                ioe.printStackTrace();
            }
        }
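
    In the Word 2003 code above, the PicturesManager callback only returns the suggested name and the block that writes the images is commented out, so the generated HTML can end up pointing at image files that were never written. One way to close that gap is to write the bytes inside the callback. The sketch below is not from the original post and uses only the JDK: PictureStore, imageDir and urlPrefix are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PictureStore {
    private final Path imageDir;    // directory on disk where images are written
    private final String urlPrefix; // prefix the HTML should use to reference them

    PictureStore(Path imageDir, String urlPrefix) {
        this.imageDir = imageDir;
        this.urlPrefix = urlPrefix;
    }

    /** Writes the image bytes to imageDir and returns the path to put in the HTML. */
    String save(byte[] content, String suggestedName) throws IOException {
        Files.createDirectories(imageDir);
        // Keep only the last path segment so the name cannot point outside imageDir
        String safeName = Paths.get(suggestedName).getFileName().toString();
        Files.write(imageDir.resolve(safeName), content);
        return urlPrefix + "/" + safeName;
    }
}
```

    Inside savePicture the callback could return store.save(content, suggestedName). Since savePicture in the code above does not declare a checked exception, the IOException would need to be wrapped, for example in an UncheckedIOException.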

    These two methods convert Word to HTML. Note that table borders are not displayed in IE8.
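
    The post does not say why IE8 drops the borders, so the following is only a possible workaround and has not been tested against IE8: post-process the generated HTML and inject explicit CSS borders for table cells. The class name TableBorderPatch and the chosen style rules are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableBorderPatch {
    // Explicit borders for every table cell; the exact rules are an assumption
    static final String STYLE =
            "<style type=\"text/css\">table{border-collapse:collapse;}"
            + "td,th{border:1px solid #000;}</style>";

    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    /** Inserts the style block just before the closing head tag, or prepends it if there is none. */
    static String addBorders(String html) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            return STYLE + html;
        }
        return html.substring(0, m.start()) + STYLE + html.substring(m.start());
    }
}
```

    It could be applied to the HTML string right before it is written to disk, for example on the content passed to writeFile.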

    I will keep improving this approach.

  • Original post: https://www.cnblogs.com/blackdeng/p/6951282.html