  • Scraping Tencent real-estate news with Python. Libraries used: requests, re, time, BeautifulSoup

    import requests
    import re
    import time
    from bs4 import BeautifulSoup
    
    today = time.strftime('%Y-%m-%d')    # today's date in local time, e.g. 2017-04-26
    
    one_url = 'http://hz.house.qq.com'    # base URL used to build each article's full link
    
    url = 'http://hz.house.qq.com/zxlist/bdxw.htm'      # list page to scrape
    html = requests.get(url)
    html.encoding = html.apparent_encoding
    reg = re.compile(r'<a target="_blank" class="tit f-l f16 blue" href="(.*?)">(.*?)</a><span class="tm f-r gray">(.*?)</span>')
    html_lis = re.findall(reg,html.text)
    
    for html_li in html_lis:
        new_url = one_url + html_li[0]
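        # The first 10 characters of the timestamp are the news date (YYYY-MM-DD).
        # If it equals today's date the article is printed; otherwise the loop
        # stops at the break below.
        new_time = html_li[2][0:10]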
        if new_time == today:
            print(html_li[1],new_url)
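            new_html = requests.get(new_url)
            new_html.encoding = new_html.apparent_encoding    # decode the article page the same way as the list page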
            soup = BeautifulSoup(new_html.text,'html.parser')
            contents = soup.find_all('p',style="TEXT-INDENT: 2em")
            for content in contents:
                # .string is None when the paragraph contains nested tags; those are skipped
                if content.string is not None:
                    print(content.string)
            print('----------------------------next article----------------------------')
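        else:
            # Stop at the first item not dated today; this relies on the list
            # page showing the newest items first.
            break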
    # Functions could be defined to reduce the code duplication
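
The closing comment of the script suggests pulling the repeated steps into functions. Below is a minimal sketch of what that might look like for the list-parsing and date-filtering steps. The names `BASE_URL`, `LINK_RE`, `parse_news_list` and `todays_news` are made up for illustration and are not part of the original script; the regular expression is the one used above, and it assumes the list page still uses that markup.

```python
import re
import time
from urllib.parse import urljoin

# Hypothetical constant: same value as one_url in the script above.
BASE_URL = 'http://hz.house.qq.com'

# Same pattern as in the script above, split across two lines for readability.
LINK_RE = re.compile(
    r'<a target="_blank" class="tit f-l f16 blue" href="(.*?)">(.*?)</a>'
    r'<span class="tm f-r gray">(.*?)</span>'
)


def parse_news_list(list_html, base_url=BASE_URL):
    """Return (title, absolute_url, date) tuples found in the list page HTML."""
    items = []
    for href, title, stamp in LINK_RE.findall(list_html):
        # Keep only the YYYY-MM-DD part of the timestamp.
        items.append((title, urljoin(base_url, href), stamp[:10]))
    return items


def todays_news(items, today=None):
    """Keep only the items whose date equals `today` (YYYY-MM-DD)."""
    if today is None:
        today = time.strftime('%Y-%m-%d')
    return [item for item in items if item[2] == today]
```

With the list page already fetched as `html`, `todays_news(parse_news_list(html.text))` should yield the same articles the loop above prints. One difference is deliberate: the sketch filters every item instead of stopping at the first older one, so it does not depend on the page listing the newest items first.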
  • Original article: https://www.cnblogs.com/114811yayi/p/6767883.html