  • A brief introduction to fetching web pages with HttpClient

    Version: HttpClient 3.1

    1. GET method

    Step 1: Create a client. This is roughly like opening a browser before you visit a page.

    HttpClient httpClient = new HttpClient();

    Step 2: Create a GET method for the URL of the page you want to fetch.

    GetMethod getMethod = new GetMethod("http://www.baidu.com");

    Step 3: Execute the request and get the response status code. A code of 200 means the request succeeded.

    int statusCode = httpClient.executeMethod(getMethod);
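    executeMethod sends the request and returns the status code. 200 (HttpStatus.SC_OK) is the normal success code, but HTTP groups codes into classes: any 2xx code means success, 3xx a redirect, 4xx a client error and 5xx a server error. The plain-JDK sketch below classifies a code along those lines; the class and method names are hypothetical and not part of HttpClient 3.1.

```java
// Hypothetical helper, not part of HttpClient 3.1: classifies an HTTP status
// code such as the value returned by httpClient.executeMethod(getMethod).
public class StatusCheck {

    static String classify(int statusCode) {
        if (statusCode >= 100 && statusCode < 200) {
            return "informational"; // 100 Continue, ...
        } else if (statusCode >= 200 && statusCode < 300) {
            return "success";       // 200 OK, 204 No Content, ...
        } else if (statusCode >= 300 && statusCode < 400) {
            return "redirect";      // 301, 302, ...
        } else if (statusCode >= 400 && statusCode < 500) {
            return "client error";  // 404 Not Found, ...
        } else if (statusCode >= 500 && statusCode < 600) {
            return "server error";  // 500, 503, ...
        }
        return "unknown";           // outside the ranges defined by HTTP
    }

    public static void main(String[] args) {
        System.out.println(classify(200)); // success
        System.out.println(classify(302)); // redirect
        System.out.println(classify(404)); // client error
    }
}
```

    For a simple crawler that only wants page bodies, comparing against SC_OK is usually enough; checking the whole class matters when other 2xx codes are expected.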

    Step 4: Get the page source (as raw bytes).

    byte[] responseBody = getMethod.getResponseBody();

    Those are the four main steps. In practice there is much more to handle, such as the page's character encoding.
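    getResponseBody() returns raw bytes, so turning them into text requires the right charset. One common approach is to read the charset from the response's Content-Type header (for example text/html; charset=gbk) and fall back to a default when it is missing or unsupported. The sketch below uses only the JDK; the class and method names and the UTF-8 fallback are assumptions for illustration, and a real crawler may also need to check the page's meta charset tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of HttpClient 3.1): decodes a response body
// using the charset named in a Content-Type header value.
public class BodyDecoder {

    // Returns the charset from e.g. "text/html; charset=gbk", or the fallback
    // when the header is absent, has no charset, or names an unusable one.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        String text = decode(body, "text/html; charset=ISO-8859-1");
        System.out.println(text.equals("caf\u00e9")); // true
    }
}
```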

    
    
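    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;

    public class Spider {
        public static String spiderHtml() throws Exception {
            HttpClient client = new HttpClient();
            GetMethod method = new GetMethod("http://top.baidu.com/buzz?b=1");
            try {
                // executeMethod sends the request and returns the HTTP status code
                int statusCode = client.executeMethod(method);
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: " + method.getStatusLine());
                }
                // The body comes back as raw bytes; this page is GBK-encoded
                byte[] body = method.getResponseBody();
                return new String(body, "gbk");
            } finally {
                // Always hand the connection back to the connection manager
                method.releaseConnection();
            }
        }
    }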


    2. POST method





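    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    // UrlPath: the URL to post the form to (defined elsewhere)
    HttpClient httpClient = new HttpClient();
    PostMethod postMethod = new PostMethod(UrlPath);
    // Retry requests that fail with an I/O error, using the default retry handler
    postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
    // Form fields, sent as name=value pairs in the request body
    NameValuePair[] postData = new NameValuePair[2];
    postData[0] = new NameValuePair("username", "xkey");
    postData[1] = new NameValuePair("userpass", "********");
    postMethod.setRequestBody(postData);
    try {
        int statusCode = httpClient.executeMethod(postMethod);
        if (statusCode == HttpStatus.SC_OK) {
            byte[] responseBody = postMethod.getResponseBody();
            // new String(byte[]) uses the platform default charset; pass one explicitly if needed
            String html = new String(responseBody);
            System.out.println(html);
        }
    } catch (Exception e) {
        System.err.println("The page could not be accessed");
    } finally {
        // Release the connection whether or not the request succeeded
        postMethod.releaseConnection();
    }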
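    setRequestBody(NameValuePair[]) sends the pairs as an application/x-www-form-urlencoded body, the same name=value&name=value string a browser submits for an HTML form. As an illustration only (this is not HttpClient's own code), the JDK sketch below builds such a body with URLEncoder; the class name, the map-based input and the choice of UTF-8 are assumptions of the sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: builds an application/x-www-form-urlencoded body
// similar to the one PostMethod sends for an array of NameValuePair objects.
public class FormBody {

    // Percent-encodes one name or value; spaces become '+'
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&'); // separate fields
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("username", "xkey");
        fields.put("userpass", "a b&c");
        System.out.println(encode(fields)); // username=xkey&userpass=a+b%26c
    }
}
```

    HttpClient 3.1 picks the body's charset from its own settings, so UTF-8 here is only the sketch's choice.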






    Related links: http://blog.csdn.net/acceptedxukai/article/details/7030700
    http://www.cnblogs.com/modou/articles/1325569.html
     
  • Original article: https://www.cnblogs.com/wq920/p/3522753.html