<!-- htmlUnit -->
<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.19</version>
</dependency>
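import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.util.List;

WebClient webClient = new WebClient(BrowserVersion.CHROME); // choose the browser to emulate
HtmlPage mainPage = webClient.getPage("https://www.baidu.com/");
// Use XPath to select the anchors whose class attribute is "mnav"
List<HtmlAnchor> list = (List<HtmlAnchor>) mainPage.getByXPath("//a[@class='mnav']");
for (HtmlAnchor temp : list) {
    System.out.println(temp.asText());
}
webClient.close();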
Cookies can be set in code through the WebClient's CookieManager.
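One common way to get cookie values is to copy the Cookie header from a browser session that is already logged in. The following is a minimal sketch of the parsing step only, written in plain Java so it runs without HtmlUnit on the classpath. The class name `CookieHeaderParser`, the sample header, and the cookie names are all hypothetical and are not taken from the original notes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of HtmlUnit): splits a raw Cookie header
// such as "a=1; b=2" into name/value pairs, keeping their original order.
final class CookieHeaderParser {

    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null) {
            return cookies;
        }
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            String name = pair.substring(0, eq).trim();
            // Everything after the first '=' is the value, so values may contain '='.
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Placeholder header; a real one would be copied from a logged-in browser.
        Map<String, String> cookies = parse("session=abc123; lang=en");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With HtmlUnit available, each parsed pair could then be registered before requesting a page, for example with `webClient.getCookieManager().addCookie(new Cookie("weibo.cn", name, value))`, where `Cookie` is `com.gargoylesoftware.htmlunit.util.Cookie` and the domain `weibo.cn` is an assumption chosen to match the site being crawled.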
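Sina Weibo enforces login, so its data cannot be crawled directly; the home page of the mobile version (weibo.cn) can be crawled directly instead.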