zoukankan      html  css  js  c++  java
  • 使用JPedal取代PDFBox

    http://wanggp.iteye.com/blog/1144177

    ————————————————————————————————————————————————

    之前都是使用PDFBOX0.8版本来实现PDF转为Image,0.8版本的PDFBox转为Image还有N多问题,比如部分扫描PDF无法转换、缺少字体等等问题。而且我们是修改PDFBox源代码来解决上述问题,但是还是不能解决全部问题。

         JPedal是一个商业的处理PDF软件,但是JPedal有一个裁切版,裁切版JPedal使用LGPL协议进行开源,可免费使用。如下摘抄官方说明:

    JPedal is a commercial PDF library, so it is not free (and it cannot realistically be because no income means no money to fund development and support). OEM customers also get access to the source code so they have free access to the product in the sense they are not limited – they can alter it if they want. Commercial users get free support in the sense we charge everyone a yearly fee to cover general support costs.
    
    We also have a cutdown version of the PDF viewer which we release under an LGPL license. This means that you can access the source code and the jar and use them without any payment. You just have to abide by the LGPL license. In this sense it is totally free.
    
    We build it from the full version and remove items (so it gets most bug fixes and some features). So it is free in that sense. Our hope is that it will encourage lots of people to use it, to do interesting things with it and some may become commercial clients. And we like to have a free entry-level version – it appeals to the rebel in our nature  
    
    And being a cut-down version of a commercial product means you are likely to see updates – there are several ‘dead’ free Java PDF libraries because they do not generate any revenues to put back into development and support.

       选择使用JPedal替换Pdfbox出于如下方面考虑:

        第一:解决扫描类PDF、缺少字体问题,不用修改源代码,解决软件后续维护升级问题。

        第二:转换效率高。一个70页PDF,使用PDFBox转换时间为27秒左右,而且使用JPedal的转换时间才16秒,大大地缩短转换时间。

        第三:由于只需要把PDFBox转换为Image,暂无其他需求,故裁切版JPedal已可满足需求。

       下面使用JPedal 转换为图片的代码

                                                                   /** instance of PdfDecoder to convert PDF into image */
                                          PdfDecoder decode_pdf = new PdfDecoder(true);
    
                                             /** set mappings for non-embedded fonts to use */
                                            PdfDecoder.setFontReplacements(decode_pdf);
    
    
                    /** open the PDF file - can also be a URL or a byte array */
    
                    decode_pdf.openPdfFileFromInputStream(in, false);
                    // decode_pdf.openPdfFile("C:/myPDF.pdf", "password"); //encrypted
                    // file
                    // decode_pdf.openPdfArray(bytes); //bytes is byte[] array with PDF
                    // decode_pdf.openPdfFileFromURL("http://www.mysite.com/myPDF.pdf",false);
    
                    /** get page 1 as an image */
                    // page range if you want to extract all pages with a loop
                    // int start = 1, end = decode_pdf.getPageCount();
                    int pageCount = decode_pdf.getPageCount();
                    
    
                    if (curPage > pageCount || curPage <= 0)
                        curPage = pageCount;
    
                    BufferedImage img = null;
    
                    img = decode_pdf.getPageAsImage(curPage );
                    
                    pageCnt=String.valueOf(pageCount);
                    
                    FileOutputStream out;
                    out = new FileOutputStream(file);
    
                    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
                    encoder.encode(img);
                    out.close();
    评论
    3 楼 tyl 2013-04-07  
    最新版的jpedal jar包能否提供一个,我下载的好像不是最新的,方法没法用,谢谢
    2 楼 jinxiongyi 2012-05-10  
    转成图片的质量是怎么调的?我转的,,感觉不够清晰
    1 楼 melin 2011-12-12  
    我现在最新版本的jpedal,转换为png的时候,中文不能显示。你是怎么处理的?

    发现最新版本的API有些变化,PdfDecoder.setFontReplacements(decode_pdf);  变为:FontMappings.setFontReplacements(); 但是没有带参数。你的decode_pdf值是“UTF-8”?
  • 相关阅读:
    python 将函数参数一键转化成字典的技巧,非**kwargs,公有方法和函数抵制kwargs。
    js复制文本内容到剪贴板
    python 记录linux网速到文件。
    使用redis原生list结构作为消息队列取代celery框架。
    supervisor来自动化部署,集成git
    linux windows安装python的最佳方式,miniconda
    celery
    python设计模式之猴子补丁模式
    一个自定义python分布式专用爬虫框架。支持断点爬取和确保消息100%不丢失,哪怕是在爬取进行中随意关停和随意对电脑断电。
    python sort和sorted区别。
  • 原文地址:https://www.cnblogs.com/cuizhf/p/4273631.html
Copyright © 2011-2022 走看看