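Contents
1. Introduction to Apache HttpClient
2. Quick Start
2.1 Adding the dependency
2.2 A first example
2.3 Setting request headers
2.4 Setting the request entity (form data)
2.5 Sending JSON in a POST request
3. Further topics
3.1 IP proxies
3.2 Client connection pool
3.3 Setting timeouts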
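1. Introduction to Apache HttpClient
If you are familiar with Linux, you know that the curl and wget commands can both issue HTTP requests; curl is the more powerful of the two and lets you set many request parameters. Apache HttpClient serves a similar purpose: it is a tool for sending HTTP requests and processing the responses.
Compared with the curl command, HttpClient offers more. For example, it can maintain a connection pool, much like a database connection pool, which improves efficiency when a large number of requests are sent. Its advantages do not end there; see the official documentation for details.
Official documentation: https://hc.apache.org/httpcomponents-client-4.5.x/index.html
This article is based mainly on the official documentation.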
2. Quick Start
2.1 Adding the dependency
This article uses the 4.5.x release line:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.10</version>
</dependency>
2.2 A first example
Below is a simple HttpClient example that sends an HTTP request to a URL and does some basic processing of the response:
package cn.ganlixin.httpclient;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.StatusLine;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class HttpClientDemo {

    @Test
    public void firstDemo() throws IOException {
        // The URL to request, using GET as the example
        String url = "https://www.cnblogs.com/-beyond/p/11207100.html";
        HttpGet httpGet = new HttpGet(url);

        // Create an httpClient; think of it as a bare-bones browser (no UI rendering, no JS engine).
        // The client and the response must both be closed after use, so use try-with-resources.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {

            // Status line
            StatusLine statusLine = httpResponse.getStatusLine();
            System.out.println(statusLine); // HTTP/1.1 200 OK

            // Reason phrase and status code
            String reasonPhrase = statusLine.getReasonPhrase();
            System.out.println(reasonPhrase); // OK
            int statusCode = statusLine.getStatusCode();
            System.out.println(statusCode); // 200

            // Response headers
            Header[] allHeaders = httpResponse.getAllHeaders();
            for (Header header : allHeaders) {
                System.out.println(header.getName() + ":" + header.getValue());
            }
            /*
            Date:Tue, 18 Jul 2019 07:43:14 GMT
            Content-Type:text/html; charset=utf-8
            Transfer-Encoding:chunked
            Connection:keep-alive
            Vary:Accept-Encoding
            */

            System.out.println("--------------------------------------------------");

            // Response entity, parsed with the utility class that HttpClient provides
            HttpEntity entity = httpResponse.getEntity();
            String content = EntityUtils.toString(entity);
            System.out.println(content);
            // .......
        }
    }
}
2.3 Setting request headers
Headers are commonly used to carry values such as a token or a secretKey.
package cn.ganlixin.httpclient;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class RequestWithParams {

    @Test
    public void testRequestWithParam() {
        HttpPost httpPost = new HttpPost("http://localhost/test");

        // Set a request header
        httpPost.setHeader("token", "1234567890");

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpresponse = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = httpresponse.getEntity();
            System.out.println(EntityUtils.toString(responseEntity));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
2.4 Setting the request entity (form data)
Request parameters such as form fields can be set as follows:
package cn.ganlixin.httpclient;

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RequestWithParams {

    @Test
    public void testWithRequestParam() {
        HttpPost httpPost = new HttpPost("http://localhost/test");

        // Set the request entity (the POST body; do not set a body on GET requests)
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("name", "你好"));
        params.add(new BasicNameValuePair("age", "99"));
        params.add(new BasicNameValuePair("addr", "Beijing"));
        UrlEncodedFormEntity requestEntity = new UrlEncodedFormEntity(params, Consts.UTF_8);
        httpPost.setEntity(requestEntity);

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpresponse = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = httpresponse.getEntity();
            System.out.println(EntityUtils.toString(responseEntity));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
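To make it easier to picture what a form entity puts on the wire, here is a minimal JDK-only sketch of the application/x-www-form-urlencoded format. FormBodySketch and its encode method are hypothetical names used only for illustration; they are not part of HttpClient, and the library's own encoder may differ in edge cases.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of HttpClient: shows the shape of a form-encoded body.
class FormBodySketch {

    // Joins name=value pairs with '&', percent-encoding both sides as UTF-8.
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "\u4f60\u597d"); // the two characters used in the example above
        params.put("age", "99");
        params.put("addr", "Beijing");
        System.out.println(encode(params));
        // prints name=%E4%BD%A0%E5%A5%BD&age=99&addr=Beijing
    }
}
```

The non-ASCII value is sent as percent-encoded UTF-8 bytes, which is why the charset passed to the form entity matters.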
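2.5 Sending JSON in a POST request
JSON is now a common way to pass parameters. When sending request parameters as JSON, you need to set the type of the request entity, that is, set Content-Type to application/json.
Below is an example of a request that carries JSON parameters: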
package cn.ganlixin.httpclient;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;
import java.io.UnsupportedEncodingException;

public class RequestWithParams {

    @Test
    public void testRequestJson() {
        HttpPost httpPost = new HttpPost("http://localhost/test/json");
        // The inner double quotes must be escaped inside a Java string literal
        String json = "{\"name\":\"beyond\", \"age\":99, \"addr\":\"Beijing\"}";

        // Option 1: put the content type and charset on the entity
        // (also needs imports for org.apache.http.Consts and org.apache.http.entity.ContentType)
        // ContentType contentType = ContentType.create(ContentType.APPLICATION_JSON.getMimeType(), Consts.UTF_8);
        // StringEntity requestEntity = new StringEntity(json, contentType);

        // Option 2: set the Content-Type header on the request.
        // Note: new StringEntity(String) encodes the text as ISO-8859-1, so prefer option 1 for non-ASCII JSON.
        httpPost.setHeader("Content-Type", "application/json");
        StringEntity requestEntity = null;
        try {
            requestEntity = new StringEntity(json);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return;
        }

        // Set the request entity
        httpPost.setEntity(requestEntity);

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = response.getEntity();
            String content = EntityUtils.toString(responseEntity);
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
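The charset used to encode the JSON body matters once the payload contains non-ASCII text. In HttpClient 4.x the single-argument StringEntity constructor falls back to ISO-8859-1, so the JDK-only sketch below shows what that encoding does to such a string compared with UTF-8. JsonCharsetSketch and roundTrip are hypothetical names for illustration, and the JSON value is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares how two charsets handle the same JSON text.
class JsonCharsetSketch {

    // Encodes the text to bytes with the given charset, then decodes it with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // JSON whose "name" value is the two Chinese characters U+4F60 U+597D
        String json = "{\"name\":\"\u4f60\u597d\"}";

        // ISO-8859-1 cannot represent these characters; each one becomes '?'
        System.out.println(roundTrip(json, StandardCharsets.ISO_8859_1)); // prints {"name":"??"}

        // UTF-8 preserves the text unchanged
        System.out.println(roundTrip(json, StandardCharsets.UTF_8).equals(json)); // prints true
    }
}
```

For pure ASCII payloads, such as the one in the example above, both encodings produce identical bytes, so the difference only shows up with non-ASCII values.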
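3. Further topics
This section collects a few practical topics, such as IP proxies, client connection pools, Basic authentication, and so on.
3.1 IP proxies
If you use HttpClient to crawl data, you will often get blocked with a 403 (Forbidden) response. This typically happens because frequent requests from a single IP address are treated as abusive and the address gets blacklisted.
The usual workaround is a proxy IP: the request is sent through another IP address rather than from your own.
Proxy IPs can be found online and usually come in ip+port form. Below is an example of HttpClient using an IP proxy: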
package cn.ganlixin.httpclient;

import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class UseHttpClientProxy {

    @Test
    public void testProxy() {
        HttpGet httpGet = new HttpGet("https://www.so.com/");

        // Set the proxy IP and port
        HttpHost proxy = new HttpHost("112.85.161.145", 9999);
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        httpGet.setConfig(config);

        // Execute the request
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            // Process the response
            EntityUtils.toString(httpResponse.getEntity());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
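The example above uses a single proxy IP. In practice, if you rely on IP proxies you will usually keep a list of them on hand and switch between them with a strategy of your own.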
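One simple switching strategy is round-robin. The sketch below is one possible approach under stated assumptions, not something HttpClient provides: ProxyRotator, nextProxy, hostOf and portOf are hypothetical names, proxies are kept as "ip:port" strings, and 192.0.2.10:8080 is a placeholder documentation address rather than a real proxy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out proxies from a fixed list in round-robin order.
class ProxyRotator {
    private final List<String> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    ProxyRotator(List<String> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        this.proxies = new ArrayList<>(proxies); // defensive copy
    }

    // Returns the next "ip:port" entry; safe to call from several threads.
    String nextProxy() {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    // Splits an "ip:port" entry into its two parts.
    static String hostOf(String proxy) {
        return proxy.substring(0, proxy.lastIndexOf(':'));
    }

    static int portOf(String proxy) {
        return Integer.parseInt(proxy.substring(proxy.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        ProxyRotator rotator = new ProxyRotator(
                Arrays.asList("112.85.161.145:9999", "192.0.2.10:8080"));
        for (int i = 0; i < 3; i++) {
            String proxy = rotator.nextProxy();
            System.out.println(hostOf(proxy) + " / " + portOf(proxy));
        }
    }
}
```

For each request, the host and port returned here would be passed to new HttpHost(host, port) and then to RequestConfig.custom().setProxy(...), exactly as in the single-proxy example. A real crawler would probably also drop proxies that keep failing, which this sketch leaves out.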
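3.2 Client connection pool
An HttpClient connection pool works on the same principle as the pools found in other technologies (database connection pools, Thrift connection pools, Redis connection pools, and so on).
Almost all of the code so far contains this line:
CloseableHttpClient httpClient = HttpClients.createDefault()
You can think of this line as creating a "browser" (a client). Creating a client has a cost; if you need to send many requests and create a new client for each one, only to throw it away afterwards, efficiency suffers.
The inefficiency comes from creating httpClient instances over and over. An httpClient is reusable: a single instance can send any number of requests. What gets pooled is the underlying connections. A connection manager keeps a set of connections in a pool; when a request needs one, it takes an idle connection from the pool (if one is available) and returns it to the pool once it is done.
Below is an example of creating an HttpClient connection pool: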
package cn.ganlixin.httpclient;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

/**
 * Description:
 * HttpClient connection pool
 *
 * @author ganlixin
 * @create 2020-01-28
 */
public class HttpClientPool {

    @Test
    public void testCreate() throws IOException {
        PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
        // Allow at most 2 pooled connections in total
        connectionManager.setMaxTotal(2);

        // Do not use createDefault() here; build the client on top of the connectionManager above
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(connectionManager)
                .build();

        // Send requests exactly as before, but do not close the httpClient while it is still being reused.
        // Do close each response after consuming its entity, so that its connection returns to the pool.
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet("https://www.cnblogs.com/-beyond"))) {
            EntityUtils.consume(response.getEntity());
        }

        // httpClient.close(); // do not close here
    }
}
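The take-one, use-it, put-it-back idea behind any pool can be sketched with the JDK alone. This is a conceptual illustration only and not how PoolingHttpClientConnectionManager is implemented; SimplePool, borrow and release are hypothetical names, and the pooled items are plain strings standing in for connections.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Conceptual sketch of a fixed-size pool; the real connection manager is far more elaborate.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(Collection<T> resources) {
        // A fair FIFO queue pre-filled with every pooled resource
        this.idle = new ArrayBlockingQueue<>(resources.size(), true, resources);
    }

    // Waits up to timeoutMillis for an idle resource; returns null if none becomes free in time.
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    // Puts a resource back so that other callers can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }

    public static void main(String[] args) {
        SimplePool<String> pool = new SimplePool<>(Arrays.asList("conn-1", "conn-2"));
        String first = pool.borrow(100);
        String second = pool.borrow(100);
        System.out.println(pool.borrow(100)); // prints null: both resources are in use
        pool.release(first);
        System.out.println(pool.borrow(100)); // prints conn-1: it was returned and is idle again
    }
}
```

The bounded wait in borrow is the same kind of limit that applies when a request has to wait for a free connection from a pool.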
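3.3 Setting timeouts
Several stages of an HttpClient request can time out, for example:
1. The connection pool has no idle connection, so waiting for one times out;
2. A connection slot is obtained from the pool, but establishing the connection to the target host times out;
3. The connection to the host is established, but the data transfer times out.
Each of these three timeouts can be set as follows: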
package cn.ganlixin.httpclient;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;

public class SetTimeOut {

    public void testTimeout() {
        HttpGet httpGet = new HttpGet("https://www.cnblogs.com/-beyond");

        // Set the timeouts (in milliseconds)
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(2 * 1000)           // timeout for establishing the connection
                .setConnectionRequestTimeout(2 * 1000) // timeout for obtaining a connection from the pool
                .setSocketTimeout(5 * 1000)            // timeout for data transfer
                .build();
        httpGet.setConfig(config);

        // Create an HttpClient and execute httpGet as usual
    }
}