  • [JavaWeb] Scraping dynamic web pages

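    Jsoup only parses the HTML returned by the server. It does not execute JavaScript, so it cannot see content that is produced by JS or Ajax after the page loads. HtmlUnit can be used to fetch such dynamic pages: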

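    // Assuming HtmlUnit 2.x; HtmlUnit 3.x renamed these packages to org.htmlunit.*
    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.io.IOException;

    private String getPage(String url, boolean enabledJs, boolean ignoreSSL, boolean enabledCss, boolean enabledAjax) throws IOException {
        // WebClient is AutoCloseable; try-with-resources closes it even if the request fails
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setJavaScriptEnabled(enabledJs); // turn JavaScript execution on or off
            webClient.getOptions().setUseInsecureSSL(ignoreSSL); // skip SSL certificate validation when true
            webClient.getOptions().setCssEnabled(enabledCss); // pass false to avoid the extra requests for CSS
            webClient.getOptions().setThrowExceptionOnScriptError(false); // do not throw when a page script fails
            if (enabledAjax) {
                // re-synchronizes Ajax calls where possible so their results are present when getPage returns
                webClient.setAjaxController(new NicelyResynchronizingAjaxController());
            }

            HtmlPage page = webClient.getPage(url);
            webClient.waitForBackgroundJavaScript(10000); // wait up to 10 seconds for background JS jobs
            return page.asXml(); // serialize the DOM as it stands after scripts have run
        }
    }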
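The fixed 10-second `waitForBackgroundJavaScript` call above always allows up to the full timeout, even when the content you need shows up much earlier. One alternative is to poll for a condition and return as soon as it holds. The following is only a minimal sketch using the JDK alone: `PollingWait` and `waitUntil` are hypothetical names that are not part of HtmlUnit, and the condition in `main` is a stand-in for a real page check.

```java
import java.util.function.BooleanSupplier;

public final class PollingWait {
    private PollingWait() {}

    /**
     * Hypothetical helper: re-checks the condition every intervalMillis
     * until it is true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for one interval, but never past the deadline
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000L);
            Thread.sleep(Math.min(intervalMillis, remainingMillis));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("ready = " + ready);
    }
}
```

Inside `getPage`, the condition could be something like `() -> page.getElementById("content") != null`, where `"content"` is an assumed element id chosen for illustration. Whether polling beats the fixed wait depends on the page, so treat this as a starting point rather than a drop-in replacement.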
  • Original article: https://www.cnblogs.com/cnsec/p/13286738.html