  • Creating bookmarks with iText

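            I often download e-books in PDF format, and some otherwise good books come without bookmarks, which makes it hard to keep the overall structure in mind while reading. So I decided to write a small tool that parses text in a specific format into bookmarks and saves them into the PDF.
            The idea is that the table of contents can be copied from the book's description page on Douban, JD, Dangdang, or Amazon. Take "HTTP: The Definitive Guide" as an example:
    The table of contents looks like this: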
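    Chapter 1 Overview of HTTP 3
    1.1 HTTP: The Internet's Multimedia Courier 4
    1.2 Web Clients and Servers 4
    1.3 Resources 5
    1.3.1 Media Types 6
    1.3.2 URIs 7
    1.3.3 URLs 7
    1.3.4 URNs 8
    1.4 Transactions 9
    1.4.1 Methods 9
    1.4.2 Status Codes 10
    1.4.3 Web Pages Can Consist of Multiple Objects 10
    1.5 Messages 11
    1.6 Connections 13
    1.6.1 TCP/IP 13
    1.6.2 Connections, IP Addresses, and Port Numbers 14
    1.6.3 A Real Example Using Telnet 16
    1.7 Protocol Versions 18
    1.8 Architectural Components of the Web 19
    1.8.1 Proxies 19
    1.8.2 Caches 20
    1.8.3 Gateways 20
    1.8.4 Tunnels 21
    1.8.5 Agents 21
    1.9 The End of the Beginning 22
    1.10 For More Information 22
    1.10.1 HTTP Protocol Information 22
    1.10.2 Historical Perspective 23
    1.10.3 Other World Wide Web Information 23
    Chapter 2 URLs and Resources 25
    2.1 Navigating the Internet's Resources 26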
    Every line ends with a page number, separated from the title by a space.
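To make the line format concrete, here is a minimal sketch of how one such line could be split into title, page, and nesting level. `TocLine` and its fields are hypothetical names used only for illustration. Falling back to page 1 when no trailing number is found mirrors what the listing further down does, and the level is taken from the number of dots in the first token, as in that listing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the original tool. */
final class TocLine {
    // Title, then whitespace, then a trailing page number.
    private static final Pattern WITH_PAGE = Pattern.compile("^(.*\\S)\\s+(\\d+)$");

    final String title;
    final int page;
    final int level;

    private TocLine(String title, int page, int level) {
        this.title = title;
        this.page = page;
        this.level = level;
    }

    static TocLine parse(String rawLine) {
        String line = rawLine.trim();
        String title = line;
        int page = 1; // assumption: a line without a page number points at page 1
        Matcher m = WITH_PAGE.matcher(line);
        if (m.matches()) {
            title = m.group(1);
            page = Integer.parseInt(m.group(2));
        }
        // Level = number of dot-separated parts in the first token, minus one
        // ("1.3.2" -> 2, "1.3" -> 1, "Chapter" -> 0).
        String firstToken = title.split("\\s+")[0];
        int level = Math.max(0, firstToken.split("\\.").length - 1);
        return new TocLine(title, page, level);
    }

    public static void main(String[] args) {
        TocLine t = parse("1.3.2 URIs 7");
        // prints: 2 | 1.3.2 URIs | 7
        System.out.println(t.level + " | " + t.title + " | " + t.page);
    }
}
```

Checking that the last token is numeric before treating it as a page keeps a multi-word line without a page number (such as a book title) from losing its last word.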
    After processing, these lines become a nested bookmark tree in the PDF.
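The nesting itself can be illustrated without iText. The sketch below is a simplified, iterative variant of the idea, not the tool's own code: it keeps a stack of currently open ancestors and attaches each entry to the nearest ancestor with a smaller level. The class name `OutlineTreeSketch` is hypothetical. The `"Title"` and `"Kids"` keys follow the map layout that the listing further down passes to iText.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of turning (level, title) pairs into a tree. */
final class OutlineTreeSketch {

    /** levels[i] is the nesting level of titles[i]; entries are in document order. */
    static List<Map<String, Object>> build(int[] levels, String[] titles) {
        List<Map<String, Object>> roots = new ArrayList<>();
        Deque<Map<String, Object>> open = new ArrayDeque<>(); // open ancestors
        Deque<Integer> openLevels = new ArrayDeque<>();       // their levels
        for (int i = 0; i < titles.length; i++) {
            Map<String, Object> node = new LinkedHashMap<>();
            node.put("Title", titles[i]);
            // Drop every open entry that cannot be a parent of this node.
            while (!openLevels.isEmpty() && openLevels.peek() >= levels[i]) {
                open.pop();
                openLevels.pop();
            }
            if (open.isEmpty()) {
                roots.add(node);
            } else {
                @SuppressWarnings("unchecked")
                List<Map<String, Object>> kids = (List<Map<String, Object>>) open.peek()
                        .computeIfAbsent("Kids", k -> new ArrayList<Map<String, Object>>());
                kids.add(node);
            }
            open.push(node);
            openLevels.push(levels[i]);
        }
        return roots;
    }

    /** Renders the tree as indented text, two spaces per level. */
    static String render(List<Map<String, Object>> nodes, String indent) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, Object> node : nodes) {
            sb.append(indent).append(node.get("Title")).append('\n');
            Object kids = node.get("Kids");
            if (kids != null) {
                @SuppressWarnings("unchecked")
                List<Map<String, Object>> children = (List<Map<String, Object>>) kids;
                sb.append(render(children, indent + "  "));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 2, 1, 0};
        String[] titles = {"Chapter 1", "1.3 Resources", "1.3.1 Media Types",
                "1.4 Transactions", "Chapter 2"};
        System.out.print(render(build(levels, titles), ""));
    }
}
```

Running `main` prints the two chapters at the left margin, the two sections indented under the first chapter, and the subsection indented under section 1.3.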
     
     
    The main logic is as follows:
    package org.fra.pdf.bussiness;

    import java.io.BufferedReader;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Stack;

    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.IntHashtable;
    import com.itextpdf.text.pdf.PdfArray;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfIndirectReference;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfNumber;
    import com.itextpdf.text.pdf.PdfObject;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;
    import com.itextpdf.text.pdf.PdfString;
    import com.itextpdf.text.pdf.SimpleBookmark;

    public class AddPdfOutLineFromTxt {
        private Stack<OutlineInfo> parentOutlineStack = new Stack<OutlineInfo>();

        public void createPdf(String destPdf, String sourcePdf,
                BufferedReader bufRead, int pattern) throws IOException,
                DocumentException {

            // Unknown mode: do nothing
            if (pattern != AddBookmarkConstants.RESERVED_OLD_OUTLINE
                    && pattern != AddBookmarkConstants.RESERVED_NONE
                    && pattern != AddBookmarkConstants.RESERVED_FIRST_OUTLINE)
                return;
            // Open the source PDF
            PdfReader reader = new PdfReader(sourcePdf);

            List<HashMap<String, Object>> outlines = new ArrayList<HashMap<String, Object>>();
            if (pattern == AddBookmarkConstants.RESERVED_OLD_OUTLINE) {
                // getBookmark returns null when the PDF has no bookmarks
                List<HashMap<String, Object>> existing = SimpleBookmark
                        .getBookmark(reader);
                if (existing != null) {
                    outlines.addAll(existing);
                }
            } else if (pattern == AddBookmarkConstants.RESERVED_FIRST_OUTLINE) {
                addFirstOutlineReservedPdf(outlines, reader);
            }

            addBookmarks(bufRead, outlines, null, 0);
            // Create the stamper that writes the modified copy
            PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(
                    destPdf));

            stamper.setOutlines(outlines);
            stamper.close();
            reader.close();
        }

        private void addBookmarks(BufferedReader bufRead,
                List<HashMap<String, Object>> outlines,
                HashMap<String, Object> preOutline, int preLevel)
                throws IOException {
            String contentFormatLine = null;
            // Mark the start of the line so it can be re-read after reset().
            // The read-ahead limit has to cover a whole line; with a limit of 1
            // the mark can be invalidated when the buffer is refilled.
            bufRead.mark(8192);
            if ((contentFormatLine = bufRead.readLine()) != null) {
                FormattedBookmark bookmark = parseFormmattedText(contentFormatLine);

                HashMap<String, Object> map = parseBookmarkToHashMap(bookmark);

                int level = bookmark.getLevel();
                // Same level as the previous line: a sibling, so add it to the
                // current list and keep reading
                if (level == preLevel) {
                    outlines.add(map);
                    addBookmarks(bufRead, outlines, map, level);
                }
                // Deeper than the previous line: a child of that line, so create
                // a new "Kids" list and add this entry to it
                else if (level > preLevel) {
                    if (preOutline == null) {
                        throw new IllegalArgumentException(
                                "The first line must be a top-level entry: "
                                        + contentFormatLine);
                    }
                    List<HashMap<String, Object>> kids = new ArrayList<HashMap<String, Object>>();
                    kids.add(map);
                    preOutline.put("Kids", kids);
                    // Remember the parent's context so we can return to it
                    parentOutlineStack.push(new OutlineInfo(preOutline, outlines,
                            preLevel));
                    addBookmarks(bufRead, kids, map, level);
                }
                // Shallower than the previous line: the children are done, so
                // rewind one line and go back up one level
                else if (level < preLevel) {
                    bufRead.reset();
                    OutlineInfo obj = parentOutlineStack.pop();
                    addBookmarks(bufRead, obj.getOutlines(), obj.getPreOutline(),
                            obj.getPreLevel());
                }

            }
        }

        private HashMap<String, Object> parseBookmarkToHashMap(
                FormattedBookmark bookmark) {
            HashMap<String, Object> map = new HashMap<String, Object>();
            map.put("Title", bookmark.getTitle());
            map.put("Action", "GoTo");
            map.put("Page", bookmark.getPage() + " Fit");
            return map;
        }

        private FormattedBookmark parseFormmattedText(String contentFormatLine) {
            FormattedBookmark bookmark = new FormattedBookmark();
            String line = contentFormatLine.trim();
            String title = line;
            // A line that does not end in a page number is usually the book
            // title (if the input is well formed); point it at page 1.
            String destPage = "1";

            int lastSpaceIndex = line.lastIndexOf(' ');
            if (lastSpaceIndex != -1) {
                String lastToken = line.substring(lastSpaceIndex + 1);
                // Only treat the last token as a page if it is numeric
                if (lastToken.matches("\\d+")) {
                    title = line.substring(0, lastSpaceIndex).trim();
                    destPage = lastToken;
                }
            }

            // Nesting level = number of dots in the leading section number,
            // e.g. "1.3.2" is level 2
            String[] titleSplit = title.split(" ");
            int dotCount = titleSplit[0].split("\\.").length - 1;

            bookmark.setLevel(dotCount);
            bookmark.setPage(destPage);
            bookmark.setTitle(title);
            return bookmark;
        }

        private void addFirstOutlineReservedPdf(
                List<HashMap<String, Object>> outlines, PdfReader reader) {
            PdfDictionary catalog = reader.getCatalog();
            PdfObject obj = PdfReader.getPdfObjectRelease(catalog
                    .get(PdfName.OUTLINES));
            // The PDF has no bookmarks
            if (obj == null || !obj.isDictionary())
                return;
            PdfDictionary outlinesDictionary = (PdfDictionary) obj;
            // Get the first top-level bookmark
            PdfObject firstObj = PdfReader
                    .getPdfObjectRelease(outlinesDictionary.get(PdfName.FIRST));
            if (firstObj == null || !firstObj.isDictionary())
                return;
            PdfDictionary firstOutline = (PdfDictionary) firstObj;

            PdfString titleObj = firstOutline.getAsString(PdfName.TITLE);
            if (titleObj == null)
                return;
            String title = titleObj.toUnicodeString();

            PdfArray dest = firstOutline.getAsArray(PdfName.DEST);

            // No direct destination: fall back to a GoTo action, if present
            if (dest == null) {
                PdfObject action = PdfReader.getPdfObjectRelease(firstOutline
                        .get(PdfName.A));
                if (action != null && action.isDictionary()) {
                    PdfDictionary actionDict = (PdfDictionary) action;
                    if (PdfName.GOTO.equals(PdfReader
                            .getPdfObjectRelease(actionDict.get(PdfName.S)))) {
                        dest = actionDict.getAsArray(PdfName.D);
                    }
                }
            }
            // Named destinations and other action types are not handled here
            if (dest == null)
                return;
            String destStr = parseDestString(dest, reader);

            String[] decodeStr = destStr.split(" ");
            int num = Integer.parseInt(decodeStr[0]);
            HashMap<String, Object> map = new HashMap<String, Object>();
            map.put("Title", title);
            map.put("Action", "GoTo");
            map.put("Page", num + " Fit");

            outlines.add(map);
        }

        // Converts a destination into the "Page" string used by SimpleBookmark:
        // the 1-based page number followed by the fit type, e.g. "12 Fit".
        private String parseDestString(PdfArray dest, PdfReader reader) {
            String destStr = "";
            if (dest.isString()) {
                destStr = dest.toString();
            } else if (dest.isName()) {
                destStr = PdfName.decodeName(dest.toString());
            } else if (dest.isArray()) {
                // Map each page's object number to its 1-based page number
                IntHashtable pages = new IntHashtable();
                int numPages = reader.getNumberOfPages();
                for (int k = 1; k <= numPages; ++k) {
                    pages.put(reader.getPageOrigRef(k).getNumber(), k);
                    reader.releasePage(k);
                }

                destStr = makeBookmarkParam((PdfArray) dest, pages);
            }
            return destStr;
        }

        private String makeBookmarkParam(PdfArray dest, IntHashtable pages) {
            StringBuilder s = new StringBuilder();
            PdfObject obj = dest.getPdfObject(0);
            if (obj.isNumber()) {
                // A numeric first element is a zero-based page index
                s.append(((PdfNumber) obj).intValue() + 1);
            } else {
                // Otherwise it is a reference to the page object
                s.append(pages.get(getNumber((PdfIndirectReference) obj)));
            }
            // Fit type without its leading slash, e.g. "/Fit" becomes "Fit"
            s.append(' ').append(dest.getPdfObject(1).toString().substring(1));
            for (int k = 2; k < dest.size(); ++k) {
                s.append(' ').append(dest.getPdfObject(k).toString());
            }
            return s.toString();
        }

        private int getNumber(PdfIndirectReference indirect) {
            PdfDictionary pdfObj = (PdfDictionary) PdfReader
                    .getPdfObjectRelease(indirect);
            // If the reference points at a page tree node rather than a page,
            // use the node's first kid
            if (pdfObj.contains(PdfName.TYPE)
                    && pdfObj.get(PdfName.TYPE).equals(PdfName.PAGES)
                    && pdfObj.contains(PdfName.KIDS)) {
                PdfArray kids = (PdfArray) pdfObj.get(PdfName.KIDS);
                indirect = (PdfIndirectReference) kids.getPdfObject(0);
            }
            return indirect.getNumber();
        }
    }
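The listing refers to three helper types that are not shown: `AddBookmarkConstants`, `FormattedBookmark`, and `OutlineInfo`. The sketch below is only a guess at their shape, inferred from how the listing calls them. The constant values are arbitrary assumptions, and the real classes may differ. To compile together with the listing they would need to live in the same package (`org.fra.pdf.bussiness`) or be made `public`.

```java
import java.util.HashMap;
import java.util.List;

/** Assumed mode constants; the actual values in the original project are unknown. */
final class AddBookmarkConstants {
    static final int RESERVED_NONE = 0;          // drop existing bookmarks
    static final int RESERVED_OLD_OUTLINE = 1;   // keep all existing bookmarks
    static final int RESERVED_FIRST_OUTLINE = 2; // keep only the first existing bookmark

    private AddBookmarkConstants() {
    }
}

/** Assumed value object for one parsed line of the table of contents. */
final class FormattedBookmark {
    private int level;
    private String page;  // kept as a string because the listing concatenates it with " Fit"
    private String title;

    int getLevel() { return level; }
    void setLevel(int level) { this.level = level; }
    String getPage() { return page; }
    void setPage(String page) { this.page = page; }
    String getTitle() { return title; }
    void setTitle(String title) { this.title = title; }
}

/** Assumed snapshot of a parent's context, pushed when descending into its children. */
final class OutlineInfo {
    private final HashMap<String, Object> preOutline;
    private final List<HashMap<String, Object>> outlines;
    private final int preLevel;

    OutlineInfo(HashMap<String, Object> preOutline,
            List<HashMap<String, Object>> outlines, int preLevel) {
        this.preOutline = preOutline;
        this.outlines = outlines;
        this.preLevel = preLevel;
    }

    HashMap<String, Object> getPreOutline() { return preOutline; }
    List<HashMap<String, Object>> getOutlines() { return outlines; }
    int getPreLevel() { return preLevel; }
}
```

The three modes only need to be distinct integers for the checks in `createPdf` to work. An `enum` would be a more idiomatic choice, but the listing compares `int` values, so the sketch keeps integers.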
     
     
     
  • Original article: https://www.cnblogs.com/kakafra/p/2944747.html