  • HttpClient: using a proxy

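    When crawling web pages, some target sites use anti-crawler mechanisms: clients that hit the site too frequently, or in a regular pattern, get their IP address blocked.
    This is where proxy IPs come in handy.
    Types of proxy
    Transparent proxy
    Anonymous proxy
    Distorting proxy
    High-anonymity proxy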
    

    ***Transparent Proxy***

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Your IP
    A transparent proxy keeps your IP address out of REMOTE_ADDR, but the server can still find out who you are from HTTP_X_FORWARDED_FOR.
    

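    ***Anonymous Proxy***

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Proxy IP
    An anonymous proxy is one step better than a transparent proxy: the server can tell that you are using a proxy, but it cannot tell who you are.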
    

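    ***Distorting Proxy***

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = Proxy IP
    HTTP_X_FORWARDED_FOR = Random IP address
    As with an anonymous proxy, the server can still tell that you are using a proxy, but it sees a fake IP address in HTTP_X_FORWARDED_FOR, which makes the disguise more convincing.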
    

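    ***High-Anonymity Proxy (Elite Proxy)***

    REMOTE_ADDR = Proxy IP
    HTTP_VIA = not determined
    HTTP_X_FORWARDED_FOR = not determined
    As shown, a high-anonymity proxy gives the server no indication that a proxy is in use, which makes it the best choice for crawlers.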
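The four header patterns above can be turned into a small decision function. The sketch below is illustrative only: `ProxyClassifier`, its `Level` enum and the `classify` signature are hypothetical names, not part of any library, and it assumes the checking server already knows the client's real IP (for example, a test endpoint you run yourself). The IP addresses come from the reserved documentation ranges.

```java
/** Hypothetical helper that maps the header patterns above to a proxy type. */
final class ProxyClassifier {

    enum Level { DIRECT, TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    /**
     * @param realClientIp  the client's true IP, assumed to be known to the checker
     * @param remoteAddr    REMOTE_ADDR as seen by the server
     * @param via           HTTP_VIA, or null/empty when absent
     * @param xForwardedFor HTTP_X_FORWARDED_FOR, or null/empty when absent
     */
    static Level classify(String realClientIp, String remoteAddr,
                          String via, String xForwardedFor) {
        if (isBlank(via) && isBlank(xForwardedFor)) {
            // No proxy headers: either a direct connection or a high-anonymity proxy
            return remoteAddr.equals(realClientIp) ? Level.DIRECT : Level.ELITE;
        }
        if (realClientIp.equals(xForwardedFor)) {
            return Level.TRANSPARENT;   // the real IP leaks through X-Forwarded-For
        }
        if (isBlank(xForwardedFor) || remoteAddr.equals(xForwardedFor)) {
            return Level.ANONYMOUS;     // proxy use is visible, the real IP is not
        }
        return Level.DISTORTING;        // X-Forwarded-For carries an unrelated address
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String me = "203.0.113.7";       // placeholder client IP
        String proxy = "198.51.100.20";  // placeholder proxy IP
        System.out.println(classify(me, proxy, proxy, me));
        System.out.println(classify(me, proxy, proxy, proxy));
        System.out.println(classify(me, proxy, proxy, "192.0.2.99"));
        System.out.println(classify(me, proxy, null, null));
    }
}
```

Running a candidate proxy against an endpoint like this before adding it to a crawler is one way to confirm that it really behaves as a high-anonymity proxy.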
    

    Obtaining and using a proxy IP

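    // Imports needed by the enclosing test class
    import org.apache.http.HttpHost;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;
    import org.junit.Test;

    @Test
    public void testHttpProxy() throws Exception {
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // Route this request through a proxy server
        HttpHost proxy = new HttpHost("220.194.55.160", 3128);
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        httpGet.setConfig(config);
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Print the page content
            System.out.println("Page content:");
            System.out.println(EntityUtils.toString(response.getEntity(), "utf-8"));
        }
    }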
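Because a single proxy IP can be blocked just like your own, a crawler usually cycles through several. The following is a minimal sketch of a round-robin pool using only the JDK. `ProxyPool` and its methods are hypothetical names, and the `host:port` entries are placeholders to be replaced with proxies you have actually obtained.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin pool of proxy addresses given as "host:port" strings. */
final class ProxyPool {
    private final List<InetSocketAddress> proxies;
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyPool(List<String> hostPortEntries) {
        List<InetSocketAddress> parsed = new ArrayList<>();
        for (String entry : hostPortEntries) {
            int colon = entry.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + entry);
            }
            String host = entry.substring(0, colon).trim();
            int port = Integer.parseInt(entry.substring(colon + 1).trim());
            // createUnresolved avoids a DNS lookup while the pool is being built
            parsed.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("At least one proxy is required");
        }
        this.proxies = Collections.unmodifiableList(parsed);
    }

    /** Returns the next proxy in rotation; safe to call from several threads. */
    InetSocketAddress next() {
        int index = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        entries.add("220.194.55.160:3128");  // proxy used in the test above
        entries.add("198.51.100.20:8080");   // placeholder address
        ProxyPool pool = new ProxyPool(entries);
        for (int i = 0; i < 3; i++) {
            System.out.println(pool.next());
        }
    }
}
```

Each address returned by `next()` can be converted with `new HttpHost(addr.getHostString(), addr.getPort())` and passed to `RequestConfig.custom().setProxy(...)` exactly as in the test above.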
    

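    HttpClient proxy configuration

    HttpClient supports complex routing schemes and proxy chaining, as well as direct and single-hop connections.
    The simplest way to use a proxy server is to specify a default proxy: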

    HttpHost proxy = new HttpHost("someproxy", 8080);  
    DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);  
    CloseableHttpClient httpclient = HttpClients.custom()  
            .setRoutePlanner(routePlanner)  
            .build();
    

    Making HttpClient use the JRE's proxy settings

    SystemDefaultRoutePlanner routePlanner = new SystemDefaultRoutePlanner(  
            ProxySelector.getDefault());  
    CloseableHttpClient httpclient = HttpClients.custom()  
            .setRoutePlanner(routePlanner)  
            .build();  
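`SystemDefaultRoutePlanner` simply asks the given `ProxySelector` which proxy to use for each target. To show what that selector returns, here is a JDK-only sketch. `JreProxyDemo` and its methods are hypothetical names, `someproxy:8080` is the placeholder from the snippets above, and `ProxySelector.of` assumes Java 9 or later.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

/** Hypothetical demo of the JVM-wide ProxySelector that a system route planner would consult. */
final class JreProxyDemo {

    /** Installs a selector that sends all HTTP and HTTPS traffic through one proxy. */
    static void installFixedProxy(String host, int port) {
        // createUnresolved avoids a DNS lookup for the placeholder host name
        InetSocketAddress address = InetSocketAddress.createUnresolved(host, port);
        ProxySelector.setDefault(ProxySelector.of(address));
    }

    /** Asks the default selector which proxies apply to the given URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        installFixedProxy("someproxy", 8080);
        System.out.println(proxiesFor("http://www.baidu.com"));
        System.out.println(proxiesFor("ftp://example.com/"));
    }
}
```

If no custom selector is installed, the JRE's default selector is typically driven by system properties such as `http.proxyHost` and `http.proxyPort`, so the same client code can pick up a proxy set with `-D` flags at launch.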
    

    Implementing a custom RoutePlanner gives you complete control over how HTTP routes are computed:

    HttpRoutePlanner routePlanner = new HttpRoutePlanner() {
        @Override
        public HttpRoute determineRoute(
                HttpHost target,
                HttpRequest request,
                HttpContext context) throws HttpException {
            // Send every request through "someproxy"; mark the route as secure for https targets
            return new HttpRoute(target, null, new HttpHost("someproxy", 8080),
                    "https".equalsIgnoreCase(target.getSchemeName()));
        }
    };
    CloseableHttpClient httpclient = HttpClients.custom()
            .setRoutePlanner(routePlanner)
            .build();
    
  • Original article: https://www.cnblogs.com/ssgao/p/8829199.html