  • Parsing with BeautifulSoup

    from urllib.request import Request, ProxyHandler
    from urllib.request import build_opener
    from bs4 import BeautifulSoup
    import redis
    urlfront = "http://www.xicidaili.com"
    url = "http://www.xicidaili.com/nn/1"
    r = redis.Redis(host='127.0.0.1', port=6379,db=0)
    
    # def spider_IP(url):
    # Fetch the whole page
    def get_allcode(url):
        # Route the request through a proxy; the key must match the URL scheme (http here)
        proxy = {'http': '110.73.0.45:8123'}
        proxy_support = ProxyHandler(proxy)
        opener = build_opener(proxy_support)
        # Set the HTTP request headers to imitate a browser
        opener.addheaders = [
            ('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6')]
        # Use a local name that does not shadow the module-level Redis client `r`
        response = opener.open(url)
        html = response.read().decode("UTF-8")
        # print(html)
        return html
    
    # Extract the IP and port from the fetched page with BeautifulSoup; this could be split into helper functions
    def find_ip(s):
        soup = BeautifulSoup(s, 'html.parser')
        rows = soup.find_all(name="tr", class_="odd")
        for row in rows:
            cells = row.find_all("td")
            # Column 1 holds the IP address, column 2 the port
            print("%s:%s" % (cells[1].get_text(), cells[2].get_text()))
    find_ip(get_allcode(url))
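
The extraction step can be tried without network access or a working proxy. The sketch below is an illustration, not part of the original script: `SAMPLE_HTML` is made-up markup that only assumes the same column order (IP in the second cell, port in the third), and `extract_proxies` is a hypothetical variant of `find_ip` that returns a list instead of printing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; it only imitates the column layout the script relies on.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>IP</th><th>Port</th></tr>
  <tr class="odd"><td>CN</td><td>10.0.0.1</td><td>8080</td></tr>
  <tr class=""><td>CN</td><td>10.0.0.2</td><td>3128</td></tr>
  <tr class="odd"><td>CN</td><td>10.0.0.3</td><td>80</td></tr>
</table>
"""

def extract_proxies(html):
    """Return "ip:port" strings for every <tr class="odd"> row."""
    soup = BeautifulSoup(html, "html.parser")
    proxies = []
    for row in soup.find_all("tr", class_="odd"):
        cells = row.find_all("td")
        # Skip rows that do not have the expected IP and port cells
        if len(cells) < 3:
            continue
        ip = cells[1].get_text(strip=True)
        port = cells[2].get_text(strip=True)
        proxies.append("%s:%s" % (ip, port))
    return proxies

print(extract_proxies(SAMPLE_HTML))
```

Note that filtering on `class_="odd"` skips any row that lacks that class, so the second data row in the sample is not returned; the same holds for `find_ip` above.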
  • Original article: https://www.cnblogs.com/qieyu/p/7846085.html