  • HttpClient: Detailed Introduction and Usage

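    I: Introduction to HttpClient

    HttpClient began as a subproject of Apache Jakarta Commons and is now part of the Apache HttpComponents project. It is an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol, and it supports the latest versions and recommendations of the HTTP standard.

    The Hypertext Transfer Protocol (HTTP) is perhaps the most significant protocol used on the Internet today. The growth of web services, network-enabled appliances, and network computing continues to expand the role of HTTP beyond user-driven web browsers, while increasing the number of applications that require HTTP support. Although the java.net package provides basic functionality for accessing resources via HTTP, it does not provide the full flexibility or functionality that many applications need. HttpClient seeks to fill this void by providing an efficient, up-to-date, and feature-rich package that implements the client side of the most recent HTTP standards and recommendations. Designed for extension while providing robust support for the base HTTP protocol, HttpClient may be of interest to anyone building HTTP-aware client applications such as web browsers, web service clients, or systems that leverage or extend the HTTP protocol for distributed communication.

    HttpClient home page: http://hc.apache.org/

    HttpClient downloads: http://hc.apache.org/downloads.cgi

    Latest version at the time of writing, 4.5: http://hc.apache.org/httpcomponents-client-4.5.x/

    Official documentation: http://hc.apache.org/httpcomponents-client-4.5.x/tutorial/html/index.html

    Maven coordinates: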

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.2</version>
    </dependency>
    

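    II: HttpClient usage flow

    Sending a request and receiving a response with HttpClient is straightforward and usually takes the following steps.

    • Create an HttpClient object.
    • Create an instance of the request method and specify the request URL: an HttpGet object for a GET request, an HttpPost object for a POST request.
    • If request parameters are needed, append them to the URL's query string for HttpGet (for example with URIBuilder), or call setEntity(HttpEntity entity) on HttpPost to send them in the request body. The setParams(HttpParams params) method shared by HttpGet and HttpPost is deprecated since 4.3 and carries protocol settings rather than query or form parameters.
    • Call execute(HttpUriRequest request) on the HttpClient object to send the request; the method returns an HttpResponse.
    • Call getAllHeaders(), getHeaders(String name), and similar methods on HttpResponse to read the response headers; call getEntity() to obtain the HttpEntity object that wraps the response body, and read the content through that object.
    • Release the connection. The connection must be released whether or not the request succeeded.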

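    III: HelloWorld program

    1. Create the HelloWorld program

    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class HelloWorld2 {
        public static void main(String[] args) throws ClientProtocolException, IOException {
            // 1. Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();
            // 2. Create the HttpGet instance (the request)
            HttpGet httpGet = new HttpGet("http://www.java1234.com");
            // 3. Execute the GET request with the client
            CloseableHttpResponse response = httpClient.execute(httpGet);
            // 4. Get the returned entity
            HttpEntity entity = response.getEntity();
            String content = EntityUtils.toString(entity, "utf-8");    // read the page content
            System.out.println("Page content: " + content);
            // 5. Close resources
            response.close();      // close the response
            httpClient.close();    // close the client
        }
    }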
    

    2. Create an HttpGet request

    Add the dependencies

    <dependency>
      	<groupId>org.apache.httpcomponents</groupId>
      	<artifactId>fluent-hc</artifactId>
      	<version>4.5.5</version>
    </dependency>
    <dependency>
     	<groupId>org.apache.httpcomponents</groupId>
      	<artifactId>httpmime</artifactId>
      	<version>4.5.5</version>
    </dependency>
    
    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class MyTest {
        public static void main(String[] args) {
            get();
        }

        private static void get() {
            // Create the HttpClient (like opening a browser)
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // Create the HttpGet request (like typing in the URL)
            HttpGet httpGet = new HttpGet("http://localhost:8080/content/page?draw=1&start=0&length=10");
            // Ask for a persistent connection
            httpGet.setHeader("Connection", "keep-alive");
            // Set the User-Agent (mimic a browser version)
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36");
            // Set the Cookie
            httpGet.setHeader("Cookie", "UM_distinctid=16442706a09352-0376059833914f-3c604504-1fa400-16442706a0b345; CNZZDATA1262458286=1603637673-1530123020-%7C1530123020; JSESSIONID=805587506F1594AE02DC45845A7216A4");

            // Send the request (like pressing Enter)
            CloseableHttpResponse httpResponse = null;
            try {
                // Execute the request and obtain the response
                httpResponse = httpClient.execute(httpGet);
                HttpEntity httpEntity = httpResponse.getEntity();
                // Print the response body
                System.out.println(EntityUtils.toString(httpEntity));
            } catch (IOException e) {
                e.printStackTrace();
            } finally {    // the connection must be closed in every case
                if (httpResponse != null) {
                    try {
                        httpResponse.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (httpClient != null) {
                    try {
                        httpClient.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
    

    3. Create an HttpPost request

    import java.io.IOException;
    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    public class MyTest {
        public static void main(String[] args) {
            post();
        }

        private static void post() {
            // Create the HttpClient
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // Create the HttpPost request
            HttpPost httpPost = new HttpPost("http://localhost:8080/content/page");
            // Ask for a persistent connection
            httpPost.setHeader("Connection", "keep-alive");
            // Set the User-Agent (mimic a browser version)
            httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36");
            // Set the Cookie
            httpPost.setHeader("Cookie", "UM_distinctid=16442706a09352-0376059833914f-3c604504-1fa400-16442706a0b345; CNZZDATA1262458286=1603637673-1530123020-%7C1530123020; JSESSIONID=805587506F1594AE02DC45845A7216A4");

            // Build the form parameters
            List<BasicNameValuePair> params = new ArrayList<BasicNameValuePair>();
            params.add(new BasicNameValuePair("draw", "1"));    // key-value pairs of the request parameters
            params.add(new BasicNameValuePair("start", "0"));
            params.add(new BasicNameValuePair("length", "10"));

            CloseableHttpResponse httpResponse = null;
            try {
                // Attach the form parameters as the request body
                httpPost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
                httpResponse = httpClient.execute(httpPost);
                HttpEntity httpEntity = httpResponse.getEntity();
                // Print the response body
                System.out.println(EntityUtils.toString(httpEntity));
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            } catch (ClientProtocolException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {    // the connection must be closed in every case
                try {
                    if (httpResponse != null) {
                        httpResponse.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
                try {
                    if (httpClient != null) {
                        httpClient.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
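
    To make the request body of the POST example more concrete, the sketch below builds an application/x-www-form-urlencoded string for the same three parameters using only the JDK. FormBodySketch and encodeForm are hypothetical names chosen for illustration; this is not how HttpClient implements UrlEncodedFormEntity, only an approximation of the format it sends.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBodySketch {
    // Percent-encodes one key or value as UTF-8 form data.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins the pairs as key=value separated by '&', in insertion order.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("draw", "1");
        params.put("start", "0");
        params.put("length", "10");
        System.out.println(encodeForm(params)); // prints draw=1&start=0&length=10
    }
}
```

    The same string is what the GET example carries in its query string, which is why the two requests are interchangeable for a server that accepts both.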
    

    IV: Fetching web pages while mimicking a browser

    1. Set the User-Agent request header to mimic a browser (Chrome here)

    httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");
    

    2. Get the response Content-Type

    // Get the response Content-Type; getName() returns the header name, getValue() returns the header value
    entity.getContentType().getValue();
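
    The Content-Type value often carries the charset that should be used to decode the body, for example text/html; charset=utf-8. The sketch below pulls that charset out of the header value with plain JDK string handling. ContentTypeSketch and charsetOf are hypothetical names for illustration; HttpClient has its own ContentType class for this job, which the sketch does not use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeSketch {
    // Returns the charset named in a Content-Type value,
    // or the fallback when none is present or the name is not supported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("image/jpeg", StandardCharsets.ISO_8859_1));
    }
}
```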
    

    3. Get the response status

    response.getStatusLine().getStatusCode();
    
    200 -- OK
    403 -- Forbidden (access denied)
    500 -- Internal Server Error
    404 -- Not Found (page not found)
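
    A crawler usually branches on the class of the status code rather than on single values. The sketch below groups a code by its first digit, following the standard HTTP ranges. StatusSketch and category are hypothetical names for illustration and are not part of HttpClient.

```java
public class StatusSketch {
    // Maps an HTTP status code to its standard class.
    static String category(int statusCode) {
        if (statusCode >= 100 && statusCode < 200) return "informational";
        if (statusCode >= 200 && statusCode < 300) return "success";
        if (statusCode >= 300 && statusCode < 400) return "redirection";
        if (statusCode >= 400 && statusCode < 500) return "client error";
        if (statusCode >= 500 && statusCode < 600) return "server error";
        return "unknown";
    }

    public static void main(String[] args) {
        int[] codes = {200, 403, 404, 500};
        for (int code : codes) {
            System.out.println(code + " -> " + category(code));
        }
    }
}
```

    In the examples of this article the argument would be the value returned by response.getStatusLine().getStatusCode().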
    

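    4. Example

    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class Demo2 {
        public static void main(String[] args) throws ClientProtocolException, IOException {
            // 1. Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // 2. Create the HttpGet instance (the request)
            HttpGet httpGet = new HttpGet("http://www.tuicool.com");
            // Set the User-Agent request header to mimic a browser (Chrome here)
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

            // 3. Execute the GET request with the client
            CloseableHttpResponse response = httpClient.execute(httpGet);
            System.out.println("Status:" + response.getStatusLine().getStatusCode());    // response status code

            // 4. Get the returned entity
            HttpEntity entity = response.getEntity();
            // Get the response Content-Type; getName() returns the header name, getValue() returns the header value
            System.out.println("Content-Type:" + entity.getContentType().getValue());
            // Read the page content (needs org.apache.http.util.EntityUtils)
    //        String content = EntityUtils.toString(entity, "utf-8");
    //        System.out.println("Page content: " + content);

            // 5. Close resources
            response.close();      // close the response
            httpClient.close();    // close the client
        }
    }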
    

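    V: Downloading an image with HttpClient

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.commons.io.FileUtils;    // from Apache Commons IO, a separate dependency
    import org.apache.http.HttpEntity;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class Demo1 {
        public static void main(String[] args) throws ClientProtocolException, IOException {
            // 1. Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // 2. Create the HttpGet instance (the request)
            HttpGet httpGet = new HttpGet("http://www.java1234.com/uploads/allimg/161105/1-161105150121954.jpg");
            // Set the User-Agent request header to mimic a browser (Chrome here)
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

            // 3. Execute the GET request with the client
            CloseableHttpResponse response = httpClient.execute(httpGet);

            // 4. Get the returned entity
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Print the entity's content type
                System.out.println("Content-Type:" + entity.getContentType().getValue());
                // Get the entity's input stream
                InputStream inputStream = entity.getContent();
                // Copy the input stream into a newly created file
                FileUtils.copyToFile(inputStream, new File("E://mysource/picture/aaa.jpg"));
            }

            // 5. Close resources
            response.close();      // close the response
            httpClient.close();    // close the client
        }
    }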
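
    The image example relies on FileUtils from Apache Commons IO to write the stream to disk. If that extra dependency is not wanted, the JDK's java.nio.file.Files can do the same copy. The sketch below is one way to write it, with an in-memory stream standing in for entity.getContent(); StreamToFileSketch, save, and saveAndReadBack are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFileSketch {
    // Copies the stream to the target file, replacing it if it exists,
    // and returns the number of bytes written. The stream is closed afterwards.
    static long save(InputStream in, Path target) {
        try (InputStream source = in) {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // make sure the folder exists
            }
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes to a temporary file, reads them back, and deletes the file.
    static byte[] saveAndReadBack(byte[] data) {
        try {
            Path target = Files.createTempFile("picture", ".jpg");
            save(new ByteArrayInputStream(data), target);
            byte[] result = Files.readAllBytes(target);
            Files.delete(target);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] copy = saveAndReadBack(new byte[] {1, 2, 3});
        System.out.println(copy.length + " bytes round-tripped");
    }
}
```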
    

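    VI: Using a proxy IP with HttpClient

    When crawling web pages, some target sites have anti-crawler mechanisms and take measures to block IPs that access the site frequently or in a regular pattern.

    Proxy IPs come in several kinds: transparent proxies, anonymous proxies, distorting proxies, and high-anonymity (elite) proxies.

    1. Transparent proxy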

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Your IP
    

    Although a transparent proxy can "hide" your IP address directly, who you are can still be found from HTTP_X_FORWARDED_FOR.

    2. Anonymous proxy

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Proxy IP
    

    An anonymous proxy goes one step further than a transparent proxy: others can tell that you are using a proxy, but not who you are.

    3. Distorting proxy

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Random IP address
    

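    As shown above, the situation is the same as with an anonymous proxy: others can still tell that you are using a proxy, but they get a fake IP address, so the disguise is more convincing.

    4. High-anonymity proxy (elite proxy)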

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = not determined
    HTTP_X_FORWARDED_FOR = not determined
    

    As you can see, a high-anonymity proxy gives others no way to detect that you are using a proxy, so it is the best choice.
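
    The four header patterns above can be turned into a small check, for example to grade a proxy against a test endpoint that echoes the headers it receives. The sketch below follows the tables in this section and nothing more; ProxyTypeSketch and classify are hypothetical names, the IP addresses are documentation-range placeholders, and it assumes the real client IP is known to the tester.

```java
public class ProxyTypeSketch {
    enum ProxyType { TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    // remoteAddr:   REMOTE_ADDR as seen by the server (the proxy's IP)
    // via:          HTTP_VIA value, or null when the header is absent
    // forwardedFor: HTTP_X_FORWARDED_FOR value, or null when absent
    // clientIp:     the real client IP, known only in a test setup
    static ProxyType classify(String remoteAddr, String via, String forwardedFor, String clientIp) {
        if (via == null && forwardedFor == null) {
            return ProxyType.ELITE;        // no trace of a proxy in the headers
        }
        if (clientIp.equals(forwardedFor)) {
            return ProxyType.TRANSPARENT;  // real IP is leaked
        }
        if (forwardedFor == null || remoteAddr.equals(forwardedFor)) {
            return ProxyType.ANONYMOUS;    // proxy is visible, client is not
        }
        return ProxyType.DISTORTING;       // some other, fake IP is reported
    }

    public static void main(String[] args) {
        String proxy = "203.0.113.10";
        String client = "198.51.100.7";
        System.out.println(classify(proxy, proxy, client, client));
        System.out.println(classify(proxy, proxy, proxy, client));
        System.out.println(classify(proxy, proxy, "192.0.2.55", client));
        System.out.println(classify(proxy, null, null, client));
    }
}
```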

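    Where do proxy IPs come from? A quick search on Baidu turns up plenty of proxy IP sites. Most offer some free proxies, but paying a little for a commercial API is more convenient; for example http://www.66ip.cn/

    5. Example

    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpHost;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class Demo1 {
        public static void main(String[] args) throws ClientProtocolException, IOException {
            // 1. Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // 2. Create the HttpGet instance (the request)
            HttpGet httpGet = new HttpGet("http://www.tuicool.com");
            // Set the proxy IP and port
            HttpHost proxy = new HttpHost("42.121.15.99", 3128);
            RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
            httpGet.setConfig(config);
            // Set the User-Agent request header to mimic a browser (Chrome here)
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

            // 3. Execute the GET request with the client
            CloseableHttpResponse response = httpClient.execute(httpGet);

            // 4. Get the returned entity
            HttpEntity entity = response.getEntity();
            String content = EntityUtils.toString(entity, "utf-8");    // read the page content
            System.out.println("Page content: " + content);

            // 5. Close resources
            response.close();      // close the response
            httpClient.close();    // close the client
        }
    }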
    

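    VII: HttpClient connect timeout and read timeout

    When HttpClient executes an HTTP request, two time spans are involved: the time to connect and the time to read the content.

    Connect time is the time from the moment HttpClient starts sending the request until a connection to the target host is established.

    Read time is the time spent waiting for response data once HttpClient is already connected to the target server.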
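
    The difference between the two timeouts is easier to see at the socket level. The sketch below uses plain java.net sockets rather than HttpClient: it connects to a local server that never sends anything, so the connect succeeds quickly and the read then times out. TimeoutSketch and readTimesOut are hypothetical names; as far as the 4.x documentation describes it, HttpClient's socket timeout corresponds to the same SO_TIMEOUT setting used here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutSketch {
    // Connects to a local server that never writes and reports
    // whether the read gave up after readTimeoutMs.
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: upper bound on establishing the TCP connection.
            client.connect(
                    new InetSocketAddress(silentServer.getInetAddress(), silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Read timeout: upper bound on each wait for incoming data.
            client.setSoTimeout(readTimeoutMs);
            try {
                client.getInputStream().read(); // blocks, the server sends nothing
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

    Note that the read timeout limits each individual wait for data, not the total download time, so a slow but steady response does not trigger it.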

    Maven repository hosted overseas: http://central.maven.org/maven2/

    Example:

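    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class Demo1 {
        public static void main(String[] args) throws ClientProtocolException, IOException {
            // 1. Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();

            // 2. Create the HttpGet instance (the request)
            HttpGet httpGet = new HttpGet("http://central.maven.org/maven2/");
            // Set the connect timeout and the read timeout
            RequestConfig config = RequestConfig.custom()
                    .setConnectTimeout(1000)    // connect timeout in milliseconds
                    .setSocketTimeout(1000)     // read timeout in milliseconds
                    .build();
            httpGet.setConfig(config);
            // Set the User-Agent request header to mimic a browser (Chrome here)
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

            // 3. Execute the GET request with the client
            CloseableHttpResponse response = httpClient.execute(httpGet);

            // 4. Get the returned entity
            HttpEntity entity = response.getEntity();
            String content = EntityUtils.toString(entity, "utf-8");    // read the page content
            System.out.println("Page content: " + content);

            // 5. Close resources
            response.close();      // close the response
            httpClient.close();    // close the client
        }
    }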
  • Original article: https://www.cnblogs.com/itzlg/p/10699496.html