  • Using a proxy IP in a crawler

    A site that publishes free proxy IPs:

    http://www.xicidaili.com/

    One way to verify that a proxy is usable is to fetch a page that echoes your IP, e.g.:

    globalUrl = "http://ip.chinaz.com/getip.aspx"
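The snippets below all index a pair `i`, where i[0] is the proxy host and i[1] is the port, both strings scraped from the listing site. A minimal Python 3 sketch of turning raw "host:port" lines into such pairs (the helper name parse_proxy_line is my own, not from the original post):

```python
def parse_proxy_line(line):
    # Split a raw "host:port" line into the (host, port) string pair
    # that the snippets below index as i[0] and i[1].
    host, port = line.strip().split(":")
    return host, port

pairs = [parse_proxy_line(l) for l in ["1.2.3.4:8080", "5.6.7.8:3128"]]
```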

    How to use the proxy:

    1. Using requests:

    import requests

    # i is one (host, port) pair of strings scraped from the proxy list;
    # header is a dict of request headers (e.g. a User-Agent).
    ip = "http://" + i[0] + ":" + i[1]
    ipdict = {"http": ip}
    requests.get(globalUrl, headers=header, proxies=ipdict, timeout=3).text
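For reference, the same check as a self-contained Python 3 sketch: a small helper builds the scheme-to-proxy mapping that requests expects, and the network call is wrapped so a dead proxy is reported rather than raised. The names make_proxies and check_proxy are my own assumptions, and requests is imported lazily so the pure helper works even without the dependency installed:

```python
def make_proxies(host, port):
    # requests expects a {scheme: proxy_url} mapping.
    url = "http://%s:%s" % (host, port)
    return {"http": url, "https": url}

def check_proxy(host, port, test_url="http://ip.chinaz.com/getip.aspx"):
    # Returns True if the test page loads through the proxy within the
    # timeout, False otherwise.  requests is imported here so that
    # make_proxies stays usable without the dependency.
    import requests
    try:
        return requests.get(test_url, proxies=make_proxies(host, port),
                            timeout=3).ok
    except requests.RequestException:
        return False
```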

    2. Using urllib (Python 2):

    import urllib

    ip = "http://" + i[0] + ":" + i[1]
    ipdict = {"http": ip}
    try:
        print urllib.urlopen(globalUrl, proxies=ipdict).read()
    except Exception, e:
        print "%s cannot be used" % ip

    3. Using urllib2's ProxyHandler (Python 2):

    import urllib2

    proxy_info = {'host': i[0],
                  'port': i[1]}
    proxy_support = urllib2.ProxyHandler({"http": "http://%(host)s:%(port)s" % proxy_info})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)
    request = urllib2.Request(globalUrl, headers=header)
    try:
        print urllib2.urlopen(request, timeout=3).read()
    except Exception, e:
        print "%s cannot be used" % proxy_info["host"]


    If the proxy requires authentication:

    proxy_info = {"host": "xxx",
                  "port": "xxx",
                  "user": "xxx",
                  "pass": "xxx"}
    # note %(port)s, not %(port)d -- the port is stored as a string here
    proxy_support = urllib2.ProxyHandler({"http": "http://%(user)s:%(pass)s@%(host)s:%(port)s" % proxy_info})
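In Python 3 (where urllib2 became urllib.request) the same authenticated proxy URL can be assembled from the proxy_info dict with str.format_map. A sketch with hypothetical credentials:

```python
def auth_proxy_url(info):
    # Build "http://user:pass@host:port" from a proxy_info-style dict.
    # (Credentials are not URL-quoted; quote them if they contain
    # ':' or '@'.)
    return "http://{user}:{pass}@{host}:{port}".format_map(info)

url = auth_proxy_url({"host": "10.0.0.1", "port": "8080",
                      "user": "alice", "pass": "secret"})
# url == "http://alice:secret@10.0.0.1:8080"
```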

    4. Using urllib2's Request.set_proxy (Python 2):

    ip = i[0] + ":" + i[1]
    request = urllib2.Request(globalUrl, headers=header)
    request.set_proxy(ip, "http")
    try:
        print urllib2.urlopen(request, timeout=5).read()
    except Exception, e:
        print "%s cannot be used" % ip

    5. Using httplib (Python 2):

    import httplib

    conn = httplib.HTTPConnection(i[0], int(i[1]))  # the port must be an int here
    try:
        conn.connect()
        conn.request("GET", globalUrl, headers=header)
        response = conn.getresponse()
        print response.read()
    except Exception:
        print "%s cannot be used" % i[0]
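In Python 3 httplib became http.client; a modernized sketch of the same direct-connection check (the function name fetch_via_proxy is an assumption, not from the original post). Connecting to the proxy itself and then requesting an absolute URL is what makes the target act as a proxy:

```python
import http.client

def fetch_via_proxy(host, port, url="http://ip.chinaz.com/getip.aspx"):
    # Connect to the proxy itself, then request the absolute URL,
    # mirroring the httplib snippet above; the port must be an int.
    conn = http.client.HTTPConnection(host, int(port), timeout=3)
    try:
        conn.request("GET", url)
        return conn.getresponse().read()
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```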
    
    


  • Original post: https://www.cnblogs.com/gongbo/p/6343843.html