  • python---Web crawler

    I wrote a simple web crawler:

    #coding=utf-8
    from bs4 import BeautifulSoup
    import requests
    url = "http://www.weather.com.cn/textFC/hb.shtml"
    def get_temperature(url):
        headers = {
            'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
            'Upgrade-Insecure-Requests':'1',
            'Referer':'http://www.weather.com.cn/weather1d/10129160502A.shtml',
            'Host':'www.weather.com.cn'
        }
        res = requests.get(url,headers=headers)
        res.encoding = "utf-8"
        content = res.content # raw response body, as bytes
        content = content.decode('UTF-8')# decode the bytes into a UTF-8 string
        #print(content)
    
        soup = BeautifulSoup(content,'lxml')
        conMidetab = soup.find('div',class_='conMidtab')
        conMidetab2_list = conMidetab.find_all('div',class_='conMidtab2')
        for table in conMidetab2_list:
            tr_list = table.find_all('tr')[2:] # every tr after the first two
            province = ''
            min_temp = 0 # renamed from min to avoid shadowing the builtin
            for index,tr in enumerate(tr_list):
                td_list = tr.find_all('td')
                if index == 0:
                    # the first row has an extra leading cell with the province
                    province = td_list[0].text.replace('\n','')
                    city = td_list[1].text.replace('\n','')
                    min_temp = td_list[7].text.replace('\n','')
                else:
                    city = td_list[0].text.replace('\n','')
                    min_temp = td_list[6].text.replace('\n','')
                print(province,city,min_temp)
            # province_list = tr_list[2]
            # td_list = province_list.find_all('td')
            # province_td = td_list[0]
            # province = province_td.text
            # #print(province.replace('\n',''))
    get_temperature(url)
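
The row handling above can also be tried without any network access. The following is a minimal sketch, not part of the original post: `clean_cell`, `rows_to_records` and `sample_rows` are hypothetical names, and the sample values are made up. It assumes, as the crawler does, that the first row of each province table has one extra leading cell holding the province name, so the city and minimum-temperature cells sit one index later in that row.

```python
def clean_cell(text):
    """Collapse newlines, tabs and repeated spaces in a cell's text."""
    return " ".join(text.split())


def rows_to_records(rows):
    """Turn one province table's rows into (province, city, min_temp) tuples.

    `rows` is a list of rows, each a list of cell strings. The first row is
    assumed to start with the province cell (same assumption as the crawler),
    which shifts its remaining cells one position to the right.
    """
    records = []
    province = ""
    for index, cells in enumerate(rows):
        cells = [clean_cell(c) for c in cells]
        offset = 1 if index == 0 else 0
        if index == 0:
            province = cells[0]
        city = cells[offset]
        min_temp = cells[6 + offset]
        records.append((province, city, min_temp))
    return records


# Made-up sample data, for illustration only (not real forecast values).
sample_rows = [
    ["\nProvinceA\n", "\nCityA\n", "d1", "d2", "d3", "n1", "n2", "-3", "link"],
    ["\nCityB\n", "d1", "d2", "d3", "n1", "n2", "-5", "link"],
]

for record in rows_to_records(sample_rows):
    print(record)
```

With BeautifulSoup, such a `rows` list could be built from the crawler's `tr_list` as `[[td.text for td in tr.find_all('td')] for tr in tr_list]`. Using `split()` and `join()` removes tabs and stray spaces as well as the newlines that `replace('\n','')` handles.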
  • Original article: https://www.cnblogs.com/e0yu/p/9505490.html