  • Converting PDF files to high-resolution images with the Java library pdfbox

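    I recently needed to convert PDF files into high-resolution images, using the pdfbox and fontbox libraries. The renderImageWithDPI method lets you specify the rendering resolution. The higher the DPI, the sharper and larger the output image, and the longer the conversion takes.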
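
    To get a feel for how DPI drives output size, here is a minimal sketch that estimates pixel dimensions and in-memory size for a page. It assumes the usual 72 PDF units per inch and an int-packed RGB raster at 4 bytes per pixel. The class and method names are hypothetical, and the exact rounding pdfbox applies may differ by a pixel.

```java
public class DpiSizeEstimate {
    // PDF user space is commonly 72 units (points) per inch.
    static final float POINTS_PER_INCH = 72f;

    /** Approximate pixel length of a side given in points, rendered at the given DPI. */
    public static int pixelsFor(float points, float dpi) {
        return Math.max(1, Math.round(points * dpi / POINTS_PER_INCH));
    }

    /** Rough in-memory size in bytes, assuming 4 bytes per pixel (int-packed RGB). */
    public static long rawBytes(int widthPx, int heightPx) {
        return 4L * widthPx * heightPx;
    }

    public static void main(String[] args) {
        float widthPt = 612f;  // US Letter width in points
        float heightPt = 792f; // US Letter height in points
        for (float dpi : new float[] {72f, 144f, 288f}) {
            int w = pixelsFor(widthPt, dpi);
            int h = pixelsFor(heightPt, dpi);
            System.out.printf("dpi=%.0f: %d x %d px, about %d KiB in memory%n",
                    dpi, w, h, rawBytes(w, h) / 1024);
        }
    }
}
```

    Doubling the DPI doubles each side and quadruples the pixel count, which is why both conversion time and file size climb quickly at higher settings.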

    Note: as Adobe software grows more capable and supports more and more formats, some Java libraries cannot convert every file, so newer formats may fail to convert.

    1 Add the dependencies

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.16</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.16</version>
    </dependency>

    2 The code

    public static void convertPdf2Image(String pdfPath, String imageDirPath) {
            log.info("start convert pdf file:[{}] to image path:[{}]", pdfPath, imageDirPath);
            if (!new File(pdfPath).exists()) {
                log.info("pdfFilename:[{}] not exist", pdfPath);
                return;
            }
            if (!new File(imageDirPath).exists()) {
                log.info("imageDir:[{}] not exist", imageDirPath);
                return;
            }
            byte[] pdfContent = FileUtil.getFileContentByte(pdfPath);
            String filename = FileUtil.getFilename(pdfPath);
            float dpi = 200;
            convertPdf2Image(pdfContent, filename, imageDirPath, dpi);
            log.info("convert pdf file:[{}] to image success", filename);
        }
    
    private static void convertPdf2Image(byte[] pdfContent, String pdfFilename, String imageDirPath, float dpi) {
            log.info("convert pdfFilename:[{}] to imageDir:[{}] with dpi:[{}]", pdfFilename, imageDirPath, dpi);
            if (ArrayUtils.isEmpty(pdfContent)) {
                return;
            }
            // keep the output legible: use at least 90 DPI
            if (dpi < 90) {
                dpi = 90;
            }
            String baseSir = imageDirPath;
            if (baseSir.endsWith("/") || baseSir.endsWith("\\")) {
                baseSir += pdfFilename + "_";
            } else {
                baseSir += File.separator + pdfFilename + "_";
            }
            PDDocument document = null;
            BufferedOutputStream outputStream = null;
            try {
                document = PDDocument.load(pdfContent);
                int pageCount = document.getNumberOfPages();
                PDFRenderer pdfRenderer = new PDFRenderer(document);
                String imgPath;
                for (int i = 0; i < pageCount; i++) {
                    imgPath = baseSir + i + ".png";
                    outputStream = new BufferedOutputStream(new FileOutputStream(imgPath));
                    BufferedImage image = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(image, "png", outputStream);
                    outputStream.close();
                    log.info("convert to png, total[{}], now[{}], ori:[{}], des[{}]", pageCount, i + 1, pdfFilename, imgPath);
                }
            } catch (IOException e) {
                log.error("convert pdf to image error, pdfFilename:" + pdfFilename, e);
            } finally {
                IOUtil.closeSilently(outputStream);
                IOUtil.closeSilently(document);
            }
        }
    
    // source of IOUtil.closeSilently
    public static void closeSilently(Closeable io) {
            if (io != null) {
                try {
                    io.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
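
    As an alternative to tracking the stream in a finally block and building the output path by string concatenation, the per-page write can be sketched with try-with-resources and java.nio.file.Path. This is a minimal sketch using only the JDK, not the code above: PageImageWriter, pagePath and writePng are hypothetical names, and a blank BufferedImage stands in for the page that renderImageWithDPI would return.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

public class PageImageWriter {
    /** Builds "<dir>/<baseName>_<pageIndex>.png" without manual separator handling. */
    public static Path pagePath(Path dir, String baseName, int pageIndex) {
        return dir.resolve(baseName + "_" + pageIndex + ".png");
    }

    /** Writes one image as PNG; the stream is closed even if writing fails. */
    public static Path writePng(BufferedImage image, Path dir, String baseName, int pageIndex)
            throws IOException {
        Path target = pagePath(dir, baseName, pageIndex);
        try (OutputStream out = Files.newOutputStream(target)) {
            // ImageIO.write returns false when no suitable writer is registered.
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("no PNG writer available");
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdf2img");
        // Stand-in for a rendered page.
        BufferedImage blank = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        System.out.println("wrote " + writePng(blank, dir, "sample", 0));
    }
}
```

    Because Path.resolve inserts the separator itself, the check for a trailing "/" or "\" is no longer needed, and each page's stream is closed as soon as that page is written.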

    Problems encountered in practice

    1) ERROR o.a.p.contentstream.PDFStreamEngine 911 - Cannot read JBIG2 image: jbig2-imageio is not installed

    2) Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

    3) java.lang.IllegalArgumentException: Numbers of source Raster bands and source color space components do not match at java.awt.image.ColorConvertOp.filter

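    Problems 1 and 2 require the JAI and jbig2 plugins, which are added through jai-imageio-core, jai-imageio-jpeg2000, and jbig2-imageio. The dependency list below also pulls in the TwelveMonkeys imageio-jpeg plugin, which appears to be there for problem 3, the JPEG color-space exception.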

    <dependency>
        <groupId>com.twelvemonkeys.imageio</groupId>
        <artifactId>imageio-jpeg</artifactId>
        <version>3.4.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.github.jai-imageio/jai-imageio-core -->
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.4.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.github.jai-imageio/jai-imageio-jpeg2000 -->
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/jbig2-imageio -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.2</version>
    </dependency>
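
    One way to confirm that the plugins were actually picked up from the classpath is to ask ImageIO whether a reader is registered for each format. The sketch below uses only the JDK. ImageReaderCheck and hasReader are hypothetical names, and the format names "JBIG2" and "JPEG2000" are my assumption about what the plugins register, so treat a false result as a hint rather than proof.

```java
import javax.imageio.ImageIO;

public class ImageReaderCheck {
    /** Returns true if some ImageIO reader claims the given format name. */
    public static boolean hasReader(String formatName) {
        // Pick up plugins that were added to the classpath after ImageIO initialised.
        ImageIO.scanForPlugins();
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "JBIG2" and "JPEG2000" are assumed names; "png" and "jpeg" ship with the JDK.
        for (String format : new String[] {"png", "jpeg", "JBIG2", "JPEG2000"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

    Running this once at startup, before any conversion, can surface a missing plugin earlier than the per-page errors listed above.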

    Sample files that trigger the problems

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/000208-p1.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/001659-p14.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/main%20doc.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/573636.pdf

    References

    https://stackoverflow.com/questions/42169154/pdfbox1-8-12-convert-pdf-to-white-page-image

    https://stackoverflow.com/questions/20424796/pdf-box-generating-blank-images-due-to-jbig2-images-in-it

    https://blog.csdn.net/qq_15801963/article/details/80746830

    https://my.oschina.net/u/2345654/blog/1058192

    https://stackoverflow.com/questions/18351583/illegalargumentexception-numbers-of-source-raster-bands-and-source-color-space

    https://stackoverflow.com/questions/10416378/imageio-read-illegal-argument-exception-raster-bands-colour-space-components

  • Original post: https://www.cnblogs.com/zhaopengcheng/p/11377458.html