  • HttpClient: using a proxy


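    When crawling web pages, you will find that some target sites have anti-crawler mechanisms: if a client accesses the site too frequently or in an obviously regular pattern, the site blocks its IP address.
    This is where proxy IPs come in handy.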
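    Types of proxies
    Transparent proxy
    Anonymous proxy
    Distorting proxy
    High-anonymity (elite) proxy
    The four types differ in what the target server sees in three CGI-style variables: REMOTE_ADDR (the address that opened the connection), HTTP_VIA (the Via request header) and HTTP_X_FORWARDED_FOR (the X-Forwarded-For request header).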
    

    ***Transparent Proxy***

    REMOTE_ADDR = proxy IP
    HTTP_VIA = proxy IP
    HTTP_X_FORWARDED_FOR = your IP
    A transparent proxy hides your IP address from REMOTE_ADDR, but the server can still find out who you are from HTTP_X_FORWARDED_FOR.
    

    ***Anonymous Proxy***

    REMOTE_ADDR = proxy IP
    HTTP_VIA = proxy IP
    HTTP_X_FORWARDED_FOR = proxy IP
    An anonymous proxy is a step up from a transparent proxy: the server can tell that you are using a proxy, but not who you are.
    

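    ***Distorting Proxy***

    REMOTE_ADDR = proxy IP
    HTTP_VIA = proxy IP
    HTTP_X_FORWARDED_FOR = a random IP address
    As with an anonymous proxy, the server can still tell that you are using a proxy, but it receives a fake client IP address, which makes the disguise more convincing.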
    

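    ***High-Anonymity Proxy (Elite Proxy)***

    REMOTE_ADDR = proxy IP
    HTTP_VIA = not set
    HTTP_X_FORWARDED_FOR = not set
    As the values show, a high-anonymity proxy leaves the server no way to tell that you are using a proxy, which makes it the best choice for crawlers.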
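    To make the table above concrete, here is a minimal Java sketch that classifies a proxy from the three values a target server reports. It assumes you already have those values (for example from a hypothetical test page that echoes back REMOTE_ADDR, HTTP_VIA and HTTP_X_FORWARDED_FOR) and that you know your own public IP; the class ProxyAnonymity and its Level enum are illustrative names only. It looks only at the three variables listed above, so real anonymity checkers, which may inspect more headers, can disagree with it.

```java
import java.util.Arrays;

class ProxyAnonymity {
    // What the target server can infer, following the header table above
    enum Level { NO_PROXY, TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    /**
     * remoteAddr    : REMOTE_ADDR, the address that actually connected (must not be null)
     * via           : HTTP_VIA (the Via header), or null if absent
     * xForwardedFor : HTTP_X_FORWARDED_FOR (the X-Forwarded-For header), or null if absent
     * realIp        : your own public IP (assumed known to the tester)
     */
    static Level classify(String remoteAddr, String via, String xForwardedFor, String realIp) {
        if (realIp.equals(remoteAddr)) {
            return Level.NO_PROXY;           // the request did not go through a proxy
        }
        boolean hasVia = via != null && !via.isEmpty();
        boolean hasXff = xForwardedFor != null && !xForwardedFor.isEmpty();
        if (!hasVia && !hasXff) {
            return Level.ELITE;              // nothing in the request reveals a proxy
        }
        if (!hasXff) {
            return Level.ANONYMOUS;          // Via reveals a proxy, but no client IP is sent
        }
        // X-Forwarded-For can be a comma-separated chain such as "client, proxy1"
        String[] hops = Arrays.stream(xForwardedFor.split(","))
                .map(String::trim)
                .toArray(String[]::new);
        if (Arrays.asList(hops).contains(realIp)) {
            return Level.TRANSPARENT;        // your real IP leaks through
        }
        // Only the proxy's own address forwarded: anonymous; anything else: a fake IP
        boolean onlyProxyIp = Arrays.stream(hops).allMatch(remoteAddr::equals);
        return onlyProxyIp ? Level.ANONYMOUS : Level.DISTORTING;
    }
}
```

    For example, classify("2.2.2.2", "2.2.2.2", "9.9.9.9", "1.1.1.1") returns DISTORTING: the server sees a proxy, but the forwarded address is neither yours nor the proxy's.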
    

    Using a proxy IP for a request

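    import org.apache.http.HttpEntity;
    import org.apache.http.HttpHost;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;
    import org.junit.Test;

    @Test
    public void testHttpProxy() throws Exception {
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // Send this request through the proxy server 220.194.55.160:3128
        HttpHost proxy = new HttpHost("220.194.55.160", 3128);
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        httpGet.setConfig(config);
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            // Print the page content
            System.out.println("Page content:");
            System.out.println(EntityUtils.toString(entity, "utf-8"));
        }
    }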
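    If you are not tied to Apache HttpClient, the JDK can do the same per-connection proxying on its own through URL.openConnection(Proxy). The sketch below swaps in that standard-library technique; the class ProxyFetch and its helper toProxy, which parses a "host:port" entry like the one used above, are hypothetical, and the timeouts are arbitrary example values.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ProxyFetch {
    // Hypothetical helper: turns a "host:port" entry (e.g. "220.194.55.160:3128") into an HTTP proxy
    static Proxy toProxy(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + hostPort);
        }
        String host = hostPort.substring(0, colon).trim();
        int port = Integer.parseInt(hostPort.substring(colon + 1).trim());
        // createUnresolved skips the DNS lookup here; HttpURLConnection resolves the proxy when it connects
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Fetches one page through the given proxy and decodes the body as UTF-8
    static String fetch(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setConnectTimeout(5_000);  // fail fast if the proxy is unreachable
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

    ProxyFetch.fetch("http://www.baidu.com", ProxyFetch.toProxy("220.194.55.160:3128")) mirrors the test above; whether it succeeds depends on that proxy still being online.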
    

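    Proxy configuration in HttpClient

    HttpClient supports complex routing schemes and proxy chains, as well as direct connections and connections through a single proxy hop.
    The simplest way to use a proxy server is to set a default proxy for the whole client: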

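    import org.apache.http.HttpHost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.DefaultProxyRoutePlanner;

    HttpHost proxy = new HttpHost("someproxy", 8080); // placeholder proxy host and port
    DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
    CloseableHttpClient httpclient = HttpClients.custom()
            .setRoutePlanner(routePlanner)
            .build();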
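    The JDK 11+ java.net.http.HttpClient has a comparable client-wide setting: a ProxySelector passed to its builder is consulted for every request the client sends. This is a standard-library analogue of DefaultProxyRoutePlanner, not part of Apache HttpClient's API; the class DefaultProxyClient is a hypothetical name, and the proxy host and port are parameters you supply.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DefaultProxyClient {
    // Builds a client whose HTTP(S) requests are all routed through proxyHost:proxyPort
    static HttpClient build(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .build();
    }

    // Sends a GET through whatever proxy the client was built with
    static String get(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

    As with DefaultProxyRoutePlanner, the proxy belongs to the client, whereas RequestConfig.setProxy in the earlier test applies to a single request.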
    

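    Using the JRE's proxy settings with HttpClient

    SystemDefaultRoutePlanner delegates proxy selection to a java.net.ProxySelector. Given ProxySelector.getDefault(), it follows the JVM's standard proxy settings, such as the http.proxyHost and http.proxyPort system properties.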

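    import java.net.ProxySelector;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.SystemDefaultRoutePlanner;

    // Delegate proxy selection to the JVM-wide default ProxySelector
    SystemDefaultRoutePlanner routePlanner = new SystemDefaultRoutePlanner(ProxySelector.getDefault());
    CloseableHttpClient httpclient = HttpClients.custom()
            .setRoutePlanner(routePlanner)
            .build();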
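    To see which proxy ProxySelector.getDefault() will hand to SystemDefaultRoutePlanner, you can query it directly. This sketch assumes the proxy is configured through the standard http.proxyHost and http.proxyPort system properties (set with -D on the command line or with System.setProperty); the class JreProxyDemo and its method proxyFor are hypothetical names.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class JreProxyDemo {
    // Returns the first proxy the JVM-wide default selector would use for this URL,
    // or Proxy.NO_PROXY when it would connect directly
    static Proxy proxyFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.isEmpty() ? Proxy.NO_PROXY : proxies.get(0);
    }
}
```

    After System.setProperty("http.proxyHost", "127.0.0.1") and System.setProperty("http.proxyPort", "3128"), proxyFor("http://www.baidu.com/") yields an HTTP proxy at 127.0.0.1:3128, which is what the route planner above would then use.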
    

    Implementing HttpRoutePlanner yourself gives you full control over how HTTP routes are computed:

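    import org.apache.http.HttpException;
    import org.apache.http.HttpHost;
    import org.apache.http.HttpRequest;
    import org.apache.http.conn.routing.HttpRoute;
    import org.apache.http.conn.routing.HttpRoutePlanner;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.protocol.HttpContext;

    HttpRoutePlanner routePlanner = new HttpRoutePlanner() {
        @Override
        public HttpRoute determineRoute(
                HttpHost target,
                HttpRequest request,
                HttpContext context) throws HttpException {
            // Route every request through someproxy:8080 with no fixed local address;
            // the route is marked secure when the target uses https
            return new HttpRoute(target, null, new HttpHost("someproxy", 8080),
                    "https".equalsIgnoreCase(target.getSchemeName()));
        }
    };
    CloseableHttpClient httpclient = HttpClients.custom()
            .setRoutePlanner(routePlanner)
            .build();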
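    The JDK's counterpart to a hand-written route planner is a custom java.net.ProxySelector, which decides per URI whether to use a proxy. The sketch below swaps in that technique with a hypothetical policy: hosts in a given set connect directly and everything else goes through the proxy. The class PerHostProxySelector and its policy are illustrative assumptions; an instance can be installed JVM-wide with ProxySelector.setDefault(...) or passed to java.net.http.HttpClient.newBuilder().proxy(...).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Set;

// Hypothetical policy: hosts in directHosts bypass the proxy, all others use it
class PerHostProxySelector extends ProxySelector {
    private final Proxy proxy;
    private final Set<String> directHosts;

    PerHostProxySelector(String proxyHost, int proxyPort, Set<String> directHosts) {
        this.proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        this.directHosts = directHosts;
    }

    @Override
    public List<Proxy> select(URI uri) {
        String host = uri.getHost();
        if (host != null && directHosts.contains(host.toLowerCase())) {
            return List.of(Proxy.NO_PROXY);   // connect to the target directly
        }
        return List.of(proxy);                 // route through the proxy
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Called when the proxy cannot be reached; a real policy might drop or demote it
        System.err.println("proxy " + sa + " failed for " + uri + ": " + ioe);
    }
}
```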
    
  • Original article: https://www.cnblogs.com/ssgao/p/8829199.html