  • Reading the content of an HTML document with htmlparser

    1. Add the required jar

    htmlparser-2.1.jar

    2. Method and code

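    import java.io.File;

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.visitors.HtmlPage;

    // Returns the plain text of the document's <body>, or "" if parsing fails.
    public static String readHtml(File html) {
        StringBuilder text = new StringBuilder();
        try {
            // Parser accepts a local file path or a URL.
            Parser parser = new Parser(html.getAbsolutePath());
            parser.setEncoding("UTF-8");

            // HtmlPage collects the title and the body nodes while visiting.
            HtmlPage visitor = new HtmlPage(parser);
            parser.visitAllNodesWith(visitor);

            // Concatenate the plain text of every top-level body node.
            NodeList nodes = visitor.getBody();
            for (int i = 0; i < nodes.size(); i++) {
                Node node = nodes.elementAt(i);
                text.append(node.toPlainTextString());
            }
        } catch (ParserException e) {
            // A single try block avoids calling methods on a parser that failed to construct.
            e.printStackTrace();
        }
        return text.toString();
    }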
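
    To try readHtml, a UTF-8 encoded HTML file is needed. The following is a minimal sketch that creates one using only the JDK. The class SampleHtml and its two methods are hypothetical helpers introduced here for illustration; they are not part of the original post or of htmlparser.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: builds a small HTML file that readHtml could be pointed at.
class SampleHtml {

    // Builds a minimal HTML document whose body holds the given text.
    static String sampleHtml(String bodyText) {
        return "<html><head><meta charset=\"UTF-8\"><title>sample</title></head>"
                + "<body><p>" + bodyText + "</p></body></html>";
    }

    // Writes the document to a temporary file as UTF-8 and returns that file.
    static File writeSample(String bodyText) {
        try {
            Path path = Files.createTempFile("sample", ".html");
            Files.write(path, sampleHtml(bodyText).getBytes(StandardCharsets.UTF_8));
            File file = path.toFile();
            file.deleteOnExit(); // clean up when the JVM exits
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is a two-character non-ASCII string, so the encoding matters.
        File html = writeSample("\u4f60\u597d, htmlparser");
        System.out.println("Sample written to " + html.getAbsolutePath());
    }
}
```

    With htmlparser-2.1.jar on the classpath, the returned File can be passed straight to readHtml. Writing the file as UTF-8 matches the setEncoding("UTF-8") call in the method, which is what should keep non-ASCII text from being garbled.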

  • Original article: https://www.cnblogs.com/git-niu/p/6903697.html