  • Scraping high-anonymity proxy IP addresses with Python 3.6

    A simple Python 3.6 script that scrapes high-anonymity proxy IP addresses from a proxy-list page using urllib and lxml

    from urllib.request import Request, urlopen
    from lxml import etree
    
    # Send a browser-like User-Agent so the site treats the request as a normal browser visit
    headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    url = "http://www.xicidaili.com/nn/1"
    req_timeout = 5  # seconds
    
    req = Request(url=url, headers=headers)
    with urlopen(req, timeout=req_timeout) as f:
        html = f.read().decode('utf-8')
    
    # ==================== Extract with lxml ====================
    selector = etree.HTML(html)
    # Assumes data rows alternate between class="odd" and class="";
    # collect the direct text of every <td> in both kinds of row
    cells = selector.xpath('//tr[@class="odd"]/td/text() | //tr[@class=""]/td/text()')
    for cell in cells:
        cell = cell.strip()
        if cell:  # skip whitespace-only text nodes
            print(cell)
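The script above prints every table cell as one flat stream of strings, so the row boundaries (which IP goes with which port) are lost. The following is a minimal sketch that keeps cells grouped per row and returns (IP, port) pairs. It swaps lxml for the standard library's `html.parser.HTMLParser` so it runs without third-party packages. It assumes the same row pattern the XPath targets (`<tr class="odd">` and `<tr class="">`). The class `ProxyTableParser`, the function `extract_proxies`, its `ip_col`/`port_col` column indices, and the sample table are all hypothetical illustrations, not taken from the live site.

```python
from html.parser import HTMLParser


class ProxyTableParser(HTMLParser):
    """Hypothetical helper: collect the text of each <td>, grouped by row.

    Only rows whose class is "odd" or "" (the assumed data-row pattern)
    are kept; rows without any <td> (e.g. a <th> header row) are skipped.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # each entry is a list of cell strings
        self._in_row = False
        self._in_cell = False
        self._cells = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class", "") in ("odd", ""):
            self._in_row = True
            self._cells = []
        elif tag == "td" and self._in_row:
            self._in_cell = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Join all text inside the cell, including text in nested tags
            self._cells.append("".join(self._buf).strip())
            self._in_cell = False
        elif tag == "tr" and self._in_row:
            if self._cells:
                self.rows.append(self._cells)
            self._in_row = False

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)


def extract_proxies(html, ip_col=0, port_col=1):
    """Return (ip, port) pairs; ip_col/port_col are assumed column positions."""
    parser = ProxyTableParser()
    parser.feed(html)
    parser.close()
    return [
        (row[ip_col], int(row[port_col]))
        for row in parser.rows
        if len(row) > max(ip_col, port_col) and row[port_col].isdigit()
    ]


# Hypothetical sample table using documentation-range IP addresses
SAMPLE = """
<table>
  <tr><th>IP</th><th>Port</th></tr>
  <tr class="odd"><td>192.0.2.1</td><td>8080</td></tr>
  <tr class=""><td>192.0.2.2</td><td>3128</td></tr>
</table>
"""

if __name__ == "__main__":
    for ip, port in extract_proxies(SAMPLE):
        print(f"http://{ip}:{port}")
```

Run as a script, this prints `http://192.0.2.1:8080` and `http://192.0.2.2:3128` for the sample table. To use it on a downloaded page, pass the decoded HTML string and set `ip_col`/`port_col` to the page's actual column layout.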
    

      

  • Original post: https://www.cnblogs.com/yongxinboy/p/7800852.html