  • Scraping web page data with Java and HtmlUnit

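                // Imports required by this excerpt (HtmlUnit 2.x package names):
                //   import java.util.List;
                //   import com.gargoylesoftware.htmlunit.BrowserVersion;
                //   import com.gargoylesoftware.htmlunit.WebClient;
                //   import com.gargoylesoftware.htmlunit.html.*;

                // Emulate Chrome and configure the client before loading the page.
                WebClient webClient=new WebClient(BrowserVersion.CHROME);
                webClient.setJavaScriptTimeout(5000);                          // limit for a single script, in ms
                webClient.getOptions().setUseInsecureSSL(true);                // accept any SSL certificate
                webClient.getOptions().setJavaScriptEnabled(true);             // the data is filled in by JavaScript
                webClient.getOptions().setCssEnabled(false);                   // CSS is not needed for scraping
                webClient.getOptions().setThrowExceptionOnScriptError(false);  // tolerate script errors on the page
                webClient.getOptions().setTimeout(100000);                     // HTTP timeout, in ms
                webClient.getOptions().setDoNotTrackEnabled(false);            // do not send the DNT header

                // Load the page, then wait up to 20 s for background JavaScript to finish.
                HtmlPage page=webClient.getPage(this.path);
                webClient.waitForBackgroundJavaScript(20000);

                // Extra fixed pause kept from the original code.
                Thread.sleep(5000);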
                
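                // The forecast is rendered inside <div id="forecast">.
                HtmlDivision div=(HtmlDivision)page.getElementById("forecast");
                String xml=div.asXml();
                // If the loading placeholder is still in the markup, the scripted content never rendered.
                if(xml.indexOf("forecast-data-loading")>=0)
                {
                    System.out.println("HtmlUnit failed to parse the page");
                }
                else
                {
                    System.out.println("HtmlUnit parsed the page successfully");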
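                    // One averaged value per forecast table; exactly eight tables are expected.
                    int[] aqis=new int[8];
                    
                    int i=0;
                    List<HtmlTable> tables=(List<HtmlTable>)div.getByXPath("./div[2]/center[1]/table");
                    // Skip processing if the layout does not match, so aqis is never overrun.
                    if(tables.size()==8)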
                    {
                        for(HtmlTable table : tables)
                        {  
                            List<HtmlTableRow> trs=(List<HtmlTableRow>)table.getByXPath("./tbody/tr[4]");
                            HtmlTableRow tr=trs.get(0);
                            
                            int aqi=0;
                            List<HtmlTableCell> cells = (List<HtmlTableCell>)tr.getByXPath("./td");
                            for(HtmlTableCell cell : cells)
                            {
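                                String s=cell.asText();
                                // Each cell holds two numbers on separate lines. split() takes a regex,
                                // so this pattern accepts both \n and \r\n line endings.
                                String [] values=s.trim().split("\\r?\\n");
                                aqi=aqi+(Integer.parseInt(values[0].trim())+Integer.parseInt(values[1].trim()))/2 ;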
                            }
                            aqi=aqi/cells.size();
                            aqis[i]=aqi;
                            i=i+1;
                        }
                    }
                }
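
The averaging step in the inner loop can be tried out without HtmlUnit. The following is only a minimal sketch, assuming each cell's text is two integers on separate lines (which is what the split and parseInt calls in the excerpt imply). The class name AqiAverager and its method names are hypothetical and do not appear in the original post.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mirrors the per-cell arithmetic of the excerpt.
class AqiAverager {

    // Midpoint of a cell of the assumed form "low<newline>high".
    // Integer division is used, as in the excerpt.
    static int midpoint(String cellText) {
        String[] values = cellText.trim().split("\\r?\\n");
        if (values.length < 2) {
            throw new IllegalArgumentException("expected two lines, got: " + cellText);
        }
        int low = Integer.parseInt(values[0].trim());
        int high = Integer.parseInt(values[1].trim());
        return (low + high) / 2;
    }

    // Average of the midpoints of all cells in one table row.
    static int averageOfMidpoints(List<String> cellTexts) {
        if (cellTexts.isEmpty()) {
            throw new IllegalArgumentException("no cells to average");
        }
        int sum = 0;
        for (String text : cellTexts) {
            sum += midpoint(text);
        }
        return sum / cellTexts.size();
    }

    public static void main(String[] args) {
        // Two made-up cells, one with \n and one with \r\n line endings.
        List<String> cells = Arrays.asList("10\n20", "30\r\n41");
        System.out.println(averageOfMidpoints(cells));
    }
}
```

Keeping the arithmetic in a separate method makes it possible to check the parsing against sample strings before pointing the scraper at a live page. Unlike the excerpt, this version rejects an empty row or a cell with fewer than two lines explicitly instead of failing with an arithmetic or index error.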
  • Original article: https://www.cnblogs.com/tiandi/p/6218905.html