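Contents
1. Introduction to Apache HttpClient
2. Quick Start
2.1 Importing the Dependency
2.2 A First Example
2.3 Setting Request Headers
2.4 Setting the Request Entity (Form Data)
2.5 Sending JSON in a POST Request
3. Further Topics
3.1 IP Proxies
3.2 Client Connection Pool
3.3 Setting Timeouts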
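1. Introduction to Apache HttpClient
If you are familiar with Linux, you probably know that the curl and wget commands can both send HTTP requests, and that curl is the more powerful of the two, with options for setting many HTTP request parameters. Apache HttpClient plays a similar role: it is a library for sending HTTP requests and processing the responses.
Compared with the curl command, HttpClient offers more features. For example, it can maintain a pool of connections, much like a database connection pool, which improves efficiency when sending a large number of requests. Its advantages go well beyond this; see the official documentation for details.
Official documentation: https://hc.apache.org/httpcomponents-client-4.5.x/index.html
This article is based mainly on the official documentation.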
2. Quick Start
2.1 Importing the Dependency
This article uses version 4.5.10, the latest 4.5.x release at the time of writing:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.10</version>
</dependency>
2.2 A First Example
Below is a simple example of using HttpClient: it sends an HTTP request to a URL and then does some basic processing of the response:
package cn.ganlixin.httpclient;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.StatusLine;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HttpClientDemo {

    @Test
    public void firstDemo() throws IOException {
        // Create an HttpClient; think of it as a simple browser (without UI rendering or a JS engine).
        // The client must be closed when you are done with it, so use try-with-resources.
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // The URL to request; GET is used as the example
            String url = "https://www.cnblogs.com/-beyond/p/11207100.html";
            HttpGet httpGet = new HttpGet(url);

            // Execute the request; the response must be closed as well, so it also goes in try-with-resources
            try (CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
                // Status line
                StatusLine statusLine = httpResponse.getStatusLine();
                System.out.println(statusLine); // HTTP/1.1 200 OK

                // Reason phrase
                String reasonPhrase = statusLine.getReasonPhrase();
                System.out.println(reasonPhrase); // OK

                // Status code
                int statusCode = statusLine.getStatusCode();
                System.out.println(statusCode); // 200

                // Response headers
                Header[] allHeaders = httpResponse.getAllHeaders();
                for (Header header : allHeaders) {
                    System.out.println(header.getName() + ":" + header.getValue());
                }
                /*
                Date:Tue, 18 Jul 2019 07:43:14 GMT
                Content-Type:text/html; charset=utf-8
                Transfer-Encoding:chunked
                Connection:keep-alive
                Vary:Accept-Encoding
                */
                System.out.println("--------------------------------------------------");

                // Response entity (the body)
                HttpEntity entity = httpResponse.getEntity();
                // Read the entity into a String with the EntityUtils helper provided by HttpClient;
                // UTF-8 is used only if the response does not declare a charset
                String content = EntityUtils.toString(entity, StandardCharsets.UTF_8);
                System.out.println(content);
                // .......
            }
        }
    }
}
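Because every response has to be released, HttpClient 4.x also accepts a ResponseHandler in execute(); the client then releases the response itself once the handler returns. The sketch below assumes a throwaway local server built with the JDK's com.sun.net.httpserver, so it runs without internet access; the class name, the /hello path and the "hello" body are illustrative only.

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class ResponseHandlerDemo {

    // Starts a local test server on a free port, fetches /hello through a
    // ResponseHandler and returns the body
    public static String fetchHello() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/hello";
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // BasicResponseHandler turns the entity into a String and throws
            // HttpResponseException for status codes >= 300; execute() then
            // releases the connection, so there is no response object to close
            return httpClient.execute(new HttpGet(url), new BasicResponseHandler());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchHello()); // hello
    }
}
```

The handler style is convenient when you only need the body; the explicit CloseableHttpResponse style shown above is still needed when you want the status line and headers as well.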
2.3 Setting Request Headers
Commonly used custom headers include things like a token or a secretKey.
package cn.ganlixin.httpclient;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class RequestWithParams {

    @Test
    public void testRequestWithParam() {
        HttpPost httpPost = new HttpPost("http://localhost/test");
        // Set a request header
        httpPost.setHeader("token", "1234567890");

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpresponse = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = httpresponse.getEntity();
            System.out.println(EntityUtils.toString(responseEntity));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
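The example above uses setHeader. The request classes also have addHeader, and the two behave differently when a header name repeats: setHeader replaces, addHeader appends. The sketch below only builds a request and inspects it, so nothing is sent; the class name and the "old-token" value are hypothetical.

```java
import org.apache.http.Header;
import org.apache.http.client.methods.HttpPost;

public class HeaderDemo {

    // Builds a request like the one above, then shows setHeader vs addHeader
    public static HttpPost buildRequest() {
        HttpPost httpPost = new HttpPost("http://localhost/test");
        // setHeader overwrites an existing header with the same name...
        httpPost.setHeader("token", "old-token");
        httpPost.setHeader("token", "1234567890");
        // ...while addHeader appends, so the same name can appear more than once
        httpPost.addHeader("Accept", "application/json");
        httpPost.addHeader("Accept", "text/plain");
        return httpPost;
    }

    public static void main(String[] args) {
        for (Header header : buildRequest().getAllHeaders()) {
            System.out.println(header.getName() + ": " + header.getValue());
        }
        // token: 1234567890
        // Accept: application/json
        // Accept: text/plain
    }
}
```

If every request from a client should carry the same header (a token, for instance), HttpClientBuilder also provides setDefaultHeaders(Collection<? extends Header>), used as HttpClients.custom().setDefaultHeaders(...).build().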
2.4 Setting the Request Entity (Form Data)
Request parameters, such as form fields, can be set as follows:
package cn.ganlixin.httpclient;

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RequestWithParams {

    @Test
    public void testWithRequestParam() {
        HttpPost httpPost = new HttpPost("http://localhost/test");

        // Set the request entity (the POST body; GET requests do not carry a body)
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("name", "你好"));
        params.add(new BasicNameValuePair("age", "99"));
        params.add(new BasicNameValuePair("addr", "Beijing"));
        UrlEncodedFormEntity requestEntity = new UrlEncodedFormEntity(params, Consts.UTF_8);
        httpPost.setEntity(requestEntity);

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpresponse = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = httpresponse.getEntity();
            System.out.println(EntityUtils.toString(responseEntity));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
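It can help to see what UrlEncodedFormEntity actually puts in the body. The sketch below builds the same three parameters as above without sending a request and reads the entity back; because Consts.UTF_8 is passed to the constructor, the non-ASCII value is percent-encoded as UTF-8. The class and method names are illustrative.

```java
import org.apache.http.Consts;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class FormBodyDemo {

    // Builds the same form entity as above; no request is sent
    public static UrlEncodedFormEntity buildEntity() {
        List<NameValuePair> params = new ArrayList<>();
        // Two Chinese characters ("ni hao"), written as Unicode escapes
        params.add(new BasicNameValuePair("name", "\u4f60\u597d"));
        params.add(new BasicNameValuePair("age", "99"));
        params.add(new BasicNameValuePair("addr", "Beijing"));
        return new UrlEncodedFormEntity(params, Consts.UTF_8);
    }

    // The body that would be sent, read back as text
    public static String body() {
        try {
            return EntityUtils.toString(buildEntity());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The Content-Type the entity contributes to the request
    public static String contentType() {
        return buildEntity().getContentType().getValue();
    }

    public static void main(String[] args) {
        System.out.println(contentType()); // application/x-www-form-urlencoded; charset=UTF-8
        System.out.println(body());        // name=%E4%BD%A0%E5%A5%BD&age=99&addr=Beijing
    }
}
```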
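2.5 Sending JSON in a POST Request
JSON is now one of the most common ways to pass parameters. When request parameters are sent as JSON, the type of the request entity must be set, that is, the Content-Type must be application/json.
Here is an example of a request that carries JSON: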
package cn.ganlixin.httpclient;

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class RequestWithParams {

    @Test
    public void testRequestJson() {
        HttpPost httpPost = new HttpPost("http://localhost/test/json");
        // The double quotes inside the JSON must be escaped in Java source
        String json = "{\"name\":\"beyond\", \"age\":99, \"addr\":\"Beijing\"}";

        // Option 1: let the entity carry the Content-Type (application/json; charset=UTF-8)
        // ContentType contentType = ContentType.create(ContentType.APPLICATION_JSON.getMimeType(), Consts.UTF_8);
        // StringEntity requestEntity = new StringEntity(json, contentType);

        // Option 2: set the Content-Type header explicitly and encode the body as UTF-8
        // (new StringEntity(json) on its own would encode it as ISO-8859-1)
        httpPost.setHeader("Content-Type", "application/json; charset=UTF-8");
        StringEntity requestEntity = new StringEntity(json, Consts.UTF_8);

        // Set the request entity
        httpPost.setEntity(requestEntity);

        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            HttpEntity responseEntity = response.getEntity();
            String content = EntityUtils.toString(responseEntity);
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
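The charset matters as soon as the JSON contains non-ASCII text. The sketch below builds both kinds of entity without sending anything and compares them: ContentType.APPLICATION_JSON encodes the body as UTF-8, while the one-argument StringEntity constructor falls back to text/plain with ISO-8859-1, which cannot represent the characters and replaces each with '?'. The class name and sample JSON are illustrative.

```java
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

public class JsonCharsetDemo {

    // JSON with two non-ASCII characters, written as Unicode escapes
    public static final String JSON = "{\"name\":\"\u4f60\u597d\"}";

    // application/json; charset=UTF-8: the body is encoded as UTF-8
    public static StringEntity utf8Entity() {
        return new StringEntity(JSON, ContentType.APPLICATION_JSON);
    }

    // One-argument constructor: text/plain; charset=ISO-8859-1
    public static StringEntity defaultEntity() {
        try {
            return new StringEntity(JSON);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the entity bytes using the charset from the entity's own Content-Type
    public static String decode(StringEntity entity) {
        try {
            return EntityUtils.toString(entity);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringEntity utf8 = utf8Entity();
        StringEntity latin1 = defaultEntity();
        System.out.println(utf8.getContentType().getValue() + ", " + utf8.getContentLength() + " bytes");
        System.out.println(latin1.getContentType().getValue() + ", " + latin1.getContentLength() + " bytes");
        System.out.println(decode(latin1)); // the two characters have turned into "??"
    }
}
```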
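3. Further Topics
Below are examples of some features I have used, such as IP proxies, the client connection pool, Basic authentication, and so on.
3.1 IP Proxies
When HttpClient is used to crawl data, you will often run into 403 (Forbidden) responses. This happens because a single IP accessing a resource too frequently is treated as abusive, so the IP gets blocked.
In this situation, a proxy IP is usually used: the request is sent from a different IP instead of your own.
Proxy IPs can be found online, usually in ip:port form. Here is an example of using a proxy with HttpClient: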
package cn.ganlixin.httpclient;

import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class UseHttpClientProxy {

    @Test
    public void testProxy() {
        HttpGet httpGet = new HttpGet("https://www.so.com/");

        // Set the proxy IP and port
        HttpHost proxy = new HttpHost("112.85.161.145", 9999);
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        httpGet.setConfig(config);

        // Execute the request
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            // Handle the response
            String content = EntityUtils.toString(httpResponse.getEntity());
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The example above uses a single proxy. In practice, if you rely on proxies you usually keep a batch of spare IPs and switch between them in some way.
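One simple way to switch between proxies is round-robin. The sketch below is a hypothetical helper (ProxyRotator is not part of HttpClient) that hands out a RequestConfig pointing at the next proxy on each call; the addresses are placeholders from the documentation-only IP ranges, not working proxies.

```java
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyRotator {
    private final List<HttpHost> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    public ProxyRotator(List<HttpHost> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("need at least one proxy");
        }
        this.proxies = Collections.unmodifiableList(new ArrayList<>(proxies));
    }

    // Round-robin: each call returns the next proxy; floorMod keeps the index
    // non-negative even after the counter overflows
    public HttpHost next() {
        return proxies.get(Math.floorMod(counter.getAndIncrement(), proxies.size()));
    }

    // A RequestConfig that routes the next request through the next proxy
    public RequestConfig nextConfig() {
        return RequestConfig.custom().setProxy(next()).build();
    }

    public static void main(String[] args) {
        // Placeholder addresses, not real proxies
        ProxyRotator rotator = new ProxyRotator(Arrays.asList(
                new HttpHost("192.0.2.10", 8080),
                new HttpHost("198.51.100.20", 3128)));
        for (int i = 0; i < 3; i++) {
            System.out.println(rotator.nextConfig().getProxy());
        }
    }
}
```

Before each request you would call httpGet.setConfig(rotator.nextConfig()). The AtomicInteger makes the rotator safe to share between threads; a real crawler would probably also drop proxies that keep failing, which this sketch does not attempt.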
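3.2 Client Connection Pool
An HttpClient connection pool works on the same principle as connection pools in other technologies (database connection pools, Thrift connection pools, Redis connection pools, and so on).
Almost all of the code so far contains this line:
CloseableHttpClient httpClient = HttpClients.createDefault();
You can think of this line as creating a "browser" (a client). Creating a client has a cost, and a brand-new client also has to open brand-new TCP connections (plus a TLS handshake for HTTPS). If you send many requests and create a new client for each one, throwing it away afterwards, efficiency drops noticeably.
The fix is reuse: one HttpClient can send any number of requests and can be shared between threads. What is actually pooled is the underlying connections. A client built with PoolingHttpClientConnectionManager keeps a pool of connections: when a request needs to be sent, it takes an idle connection from the pool (if one is available), and once the response has been consumed and closed, the connection goes back into the pool. (HttpClients.createDefault() also uses a pool internally, but with small default limits; configuring the manager yourself lets you tune them.)
Below is an example of creating an HttpClient backed by a connection pool: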
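package cn.ganlixin.httpclient;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

/**
 * Description:
 * HttpClient connection pool
 *
 * @author ganlixin
 * @create 2020-01-28
 */
public class HttpClientPool {

    @Test
    public void testCreate() throws IOException {
        PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
        // Maximum number of pooled connections in total: 100
        connectionManager.setMaxTotal(100);
        // Maximum number of pooled connections per route (target host); the default is only 2
        connectionManager.setDefaultMaxPerRoute(20);

        // Build the client with the connectionManager above instead of createDefault()
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(connectionManager)
                .build();

        // Use the client exactly as before. The client is shared, so do NOT close it after each request,
        // but do consume and close each response so that its connection goes back to the pool
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet("https://www.cnblogs.com/-beyond"))) {
            EntityUtils.consume(response.getEntity());
        }
        // httpClient.close(); // do not close it here; close it only when the application shuts down
    }
}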
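To make the reuse point concrete, here is a sketch in which several threads share ONE pooled client. It assumes a throwaway local server built with the JDK's com.sun.net.httpserver (the /ping path and "pong" body are made up), a deliberately small pool of 2 connections, and that each response is read and closed so its connection returns to the pool; the class and method names are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SharedPooledClient {

    // Sends `requests` GETs from `threads` worker threads through ONE shared client
    // and returns the response bodies in submission order
    public static List<String> run(int requests, int threads) throws Exception {
        // Local test server on a free port, so no internet access is needed
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/ping";

        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(2);           // at most 2 connections overall
        cm.setDefaultMaxPerRoute(2); // at most 2 connections to any one host
        ExecutorService workers = Executors.newFixedThreadPool(threads);

        // One client for the whole job; closed only when everything is finished
        try (CloseableHttpClient client = HttpClients.custom().setConnectionManager(cm).build()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(workers.submit(() -> {
                    // Reading the entity and closing the response hands the connection back;
                    // without this, workers would block once both connections were leased
                    try (CloseableHttpResponse response = client.execute(new HttpGet(url))) {
                        return EntityUtils.toString(response.getEntity());
                    }
                }));
            }
            List<String> bodies = new ArrayList<>();
            for (Future<String> f : futures) {
                bodies.add(f.get(10, TimeUnit.SECONDS));
            }
            // Every connection has been returned: nothing should still be leased
            if (cm.getTotalStats().getLeased() != 0) {
                throw new IllegalStateException("connection leak");
            }
            return bodies;
        } finally {
            workers.shutdownNow();
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(8, 4)); // 8 bodies, each "pong"
    }
}
```

Eight requests complete over only two connections because each worker returns its connection before the next one needs it; the getTotalStats() check is a cheap way to spot a forgotten close().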
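3.3 Setting Timeouts
While HttpClient is sending a request, a timeout can occur at several points, for example:
1. There is no idle connection in the pool, and waiting for one times out;
2. A connection is obtained from the pool, but establishing the connection to the target host times out;
3. The connection to the host is established, but waiting for data during the transfer times out (this limits the gap between two consecutive data packets, not the duration of the whole transfer).
Each of these three timeouts can be configured as follows: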
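package cn.ganlixin.httpclient;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

import java.io.IOException;

public class SetTimeOut {

    @Test
    public void testTimeout() throws IOException {
        HttpGet httpGet = new HttpGet("https://www.cnblogs.com/-beyond");
        // Set the timeouts (in milliseconds)
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(2 * 1000)           // timeout for establishing the connection (case 2)
                .setConnectionRequestTimeout(2 * 1000) // timeout for getting a connection from the pool (case 1)
                .setSocketTimeout(5 * 1000)            // max wait for data between packets (case 3)
                .build();
        httpGet.setConfig(config);

        // Then execute httpGet with an HttpClient as usual
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}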
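Instead of setting a RequestConfig on every request, the same settings can be installed once as the client's default with HttpClientBuilder.setDefaultRequestConfig. The sketch below also shows the socket timeout (case 3) firing; it assumes a local JDK com.sun.net.httpserver server that deliberately waits before answering, and the class name, /slow path and delays are illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class DefaultTimeoutDemo {

    // Returns true if a GET to a server that answers after serverDelayMs
    // fails with a socket (read) timeout of socketTimeoutMs
    public static boolean readTimesOut(int socketTimeoutMs, int serverDelayMs) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/slow", exchange -> {
            try {
                Thread.sleep(serverDelayMs); // simulate a slow backend
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Configure the timeouts once on the client; every request it sends inherits them
        RequestConfig defaults = RequestConfig.custom()
                .setConnectTimeout(2000)
                .setConnectionRequestTimeout(2000)
                .setSocketTimeout(socketTimeoutMs)
                .build();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/slow";
        try (CloseableHttpClient client = HttpClients.custom().setDefaultRequestConfig(defaults).build();
             CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            EntityUtils.consume(response.getEntity());
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(300, 1500)); // true: no data arrived within 300 ms
        System.out.println(readTimesOut(3000, 0));   // false: the answer arrived in time
    }
}
```

A RequestConfig set on an individual request (as in the example above) takes precedence over the client default, so the two approaches can be combined: sensible defaults on the client, overrides for the few slow endpoints.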