  • Using a Proxy IP with HttpClient

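    When crawling web pages, some sites apply anti-crawler measures and the server rejects the requests. Sending the requests through a proxy IP can work around the rejection.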

    Proxy IPs fall into four types: transparent, anonymous, distorting, and high-anonymity (elite) proxies.

      1. Transparent proxy (Transparent Proxy): it "hides" your IP address in that REMOTE_ADDR shows the proxy, but your real IP can still be read from HTTP_X_FORWARDED_FOR
        REMOTE_ADDR = Proxy IP
        HTTP_VIA = Proxy IP
        HTTP_X_FORWARDED_FOR = Your IP
      2. Anonymous proxy (Anonymous Proxy): a step up from a transparent proxy. Others can tell that you are using a proxy, but not who you are
        REMOTE_ADDR = Proxy IP
        HTTP_VIA = Proxy IP
        HTTP_X_FORWARDED_FOR = Proxy IP
      3. Distorting proxy (Distorting Proxies): others can still tell that you are using a proxy, but they see a fake IP address, which makes the disguise more convincing
        REMOTE_ADDR = Proxy IP
        HTTP_VIA = Proxy IP
        HTTP_X_FORWARDED_FOR = Random IP address
      4. High-anonymity proxy (Elite Proxy or High Anonymity Proxy): others cannot tell from these values that you are using a proxy at all
        REMOTE_ADDR = Proxy IP
        HTTP_VIA = not determined
        HTTP_X_FORWARDED_FOR = not determined
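
    The four header patterns above can be turned into a small check, for example to sort candidate proxies by anonymity level. This is only a sketch under assumptions: the class ProxyAnonymity, the Level enum, and the classify method are hypothetical names, "not determined" is modeled as null, and the client's real IP is assumed to be known (as it would be when you test a proxy against your own echo endpoint).

```java
/** Hypothetical helper: maps the header values listed above to a proxy type. */
final class ProxyAnonymity {

    enum Level { TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    /**
     * @param clientIp     the client's real IP (assumed known in a test setup)
     * @param proxyIp      the proxy's IP, i.e. the value the server sees in REMOTE_ADDR
     * @param via          HTTP_VIA as seen by the server, or null if absent
     * @param forwardedFor HTTP_X_FORWARDED_FOR as seen by the server, or null if absent
     */
    static Level classify(String clientIp, String proxyIp, String via, String forwardedFor) {
        if (forwardedFor == null) {
            // No forwarded address: if Via is missing too, nothing reveals the proxy.
            return via == null ? Level.ELITE : Level.ANONYMOUS;
        }
        if (forwardedFor.equals(clientIp)) {
            return Level.TRANSPARENT; // the real IP leaks through
        }
        if (forwardedFor.equals(proxyIp)) {
            return Level.ANONYMOUS;   // only the proxy's own IP is exposed
        }
        return Level.DISTORTING;      // some other (fake) address is reported
    }

    public static void main(String[] args) {
        String me = "203.0.113.7";       // documentation-range addresses, not real hosts
        String proxy = "198.51.100.20";
        System.out.println(classify(me, proxy, proxy, me));           // TRANSPARENT
        System.out.println(classify(me, proxy, proxy, proxy));        // ANONYMOUS
        System.out.println(classify(me, proxy, proxy, "192.0.2.99")); // DISTORTING
        System.out.println(classify(me, proxy, null, null));          // ELITE
    }
}
```

    A proxy that sends Via but no forwarded address does not appear in the list above; the sketch treats it as anonymous, since the proxy is revealed but the client is not.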

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpHost;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;
    import org.junit.Test;
    /**
     * @author test
     * @Title: JunitHttpClient
     * @ProjectName JunitHttpClient
     * @Description: Sends a GET request through an HTTP proxy with Apache HttpClient
     * @date 2018/12/12 16:07
     */
    public class JunitHttpClient {
    
        @Test
        public void test() throws Exception {
            // Create the HttpGet instance
            HttpGet httpGet = new HttpGet("https://www.****.com");
            // Set the request headers
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36");
            // try-with-resources closes the response and then the client, even if execute() throws
            try (CloseableHttpClient client = setProxy(httpGet, "192.168.1.1", 8888);
                 // Execute the HTTP GET request (a POST can be sent the same way)
                 CloseableHttpResponse response = client.execute(httpGet)) {
                // Read the response entity
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    System.out.println("Page content: " + EntityUtils.toString(entity, "utf-8"));
                }
            }
        }
        /**
         * Configures the request to go through a proxy
         * @param httpGet   the request that receives the proxy configuration
         * @param proxyIp   proxy IP address
         * @param proxyPort proxy port
         * @return a default HttpClient; the proxy itself is set on httpGet
         */
        public CloseableHttpClient setProxy(HttpGet httpGet, String proxyIp, int proxyPort) {
            // Create the HttpClient instance
            CloseableHttpClient httpClient = HttpClients.createDefault();
            // Set the proxy IP and port
            HttpHost proxy = new HttpHost(proxyIp, proxyPort, "http");
            // Timeouts (in milliseconds) can also be set here:
            // RequestConfig requestConfig = RequestConfig.custom().setProxy(proxy).setConnectTimeout(3000).setSocketTimeout(3000).setConnectionRequestTimeout(3000).build();
            RequestConfig requestConfig = RequestConfig.custom().setProxy(proxy).build();
            httpGet.setConfig(requestConfig);
            return httpClient;
        }
    }
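
    For comparison, the same idea (route a GET request through a proxy, with a User-Agent header and 3-second timeouts) can be sketched with the HTTP client built into JDK 11 and later instead of Apache HttpClient. This is a swapped-in alternative, not the method used above; the class name JdkProxyClientSketch and the methods newProxiedClient and newGet are hypothetical, and https://example.com/ is a placeholder URL.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.List;

/** Hypothetical sketch: proxy configuration with the JDK's java.net.http client. */
final class JdkProxyClientSketch {

    /** Builds a client that sends HTTP and HTTPS requests through the given proxy. */
    static HttpClient newProxiedClient(String proxyIp, int proxyPort) {
        ProxySelector selector = ProxySelector.of(new InetSocketAddress(proxyIp, proxyPort));
        return HttpClient.newBuilder()
                .proxy(selector)
                .connectTimeout(Duration.ofSeconds(3))
                .build();
    }

    /** Builds a GET request with a User-Agent header and a response timeout. */
    static HttpRequest newGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .timeout(Duration.ofSeconds(3))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newProxiedClient("192.168.1.1", 8888);
        HttpRequest request = newGet("https://example.com/");
        // Show which proxy the client would pick for this request, without sending it.
        List<Proxy> proxies = client.proxy().orElseThrow().select(request.uri());
        System.out.println(proxies);
        // To perform the call:
        // client.send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

    Here the proxy is attached to the client rather than to each request, so every request sent through that client uses the same proxy.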
  • Original article: https://www.cnblogs.com/qinxu/p/10109156.html