  • Scraping news from the campus news homepage

    1. Use the requests and BeautifulSoup libraries to scrape the title, link, and body text of each news item on the campus news homepage.

         

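    import requests
    from bs4 import BeautifulSoup

    newsurl = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
    res = requests.get(newsurl)  # returns a Response object
    res.encoding = 'utf-8'       # decode the page as UTF-8
    soup = BeautifulSoup(res.text, 'html.parser')

    # Only <li> elements that contain a .news-list-title are news items.
    for news in soup.select('li'):
        if len(news.select('.news-list-title')) > 0:
            d = news.select('.news-list-info')[0].text   # info line of the entry
            t = news.select('.news-list-title')[0].text  # title text, not the tag list
            a = news.select('a')[0].attrs['href']        # link to the article
            print(d, t, a)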
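
To try the selector logic without hitting the network, the same loop can be run against a small inline HTML snippet. This is only a sketch: the sample markup is made up for illustration and merely reuses the class names from the code above (`news-list-title`, `news-list-info`), so the real page may be structured differently. The helper name `extract_items` is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, NOT copied from the real page.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/a.html">
    <div class="news-list-title">First headline</div>
    <div class="news-list-info">2018-04-01 Campus Office</div>
  </a></li>
  <li>navigation entry without a title</li>
  <li><a href="http://example.com/b.html">
    <div class="news-list-title">Second headline</div>
    <div class="news-list-info">2018-04-02 Library</div>
  </a></li>
</ul>
"""


def extract_items(html):
    """Return a list of (info, title, link) tuples for news-like <li> items."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for li in soup.select('li'):
        titles = li.select('.news-list-title')
        if not titles:  # skip <li> elements that are not news entries
            continue
        info = li.select('.news-list-info')[0].text.strip()
        link = li.select('a')[0].attrs['href']
        items.append((info, titles[0].text.strip(), link))
    return items


for item in extract_items(SAMPLE_HTML):
    print(item)
```

Collecting the results in a list instead of printing inside the loop makes the extraction step easy to check on its own before any requests are sent.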

    2. Parse the strings to get each article's publication time, author, source, photographer, and similar details.

         

    
    
    import requests
    from datetime import datetime
    from bs4 import BeautifulSoup

    newsurl = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
    res = requests.get(newsurl)  # returns a Response object
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')

    for news in soup.select('li'):
        if len(news.select('.news-list-title')) > 0:
            d = news.select('.news-list-info')[0].text
            t = news.select('.news-list-title')[0].text
            a = news.select('a')[0].attrs['href']
            # print(d, t, a)

            # Fetch the article page itself.
            resd = requests.get(a)
            resd.encoding = 'utf-8'
            soupd = BeautifulSoup(resd.text, 'html.parser')
            c = soupd.select('#content')[0].text       # body text
            info = soupd.select('.show-info')[0].text  # metadata line

            # Slice each label off by its length. str.lstrip() removes a set
            # of characters rather than a prefix, so it can also eat the
            # start of the value.
            dt = info.strip()[len('发布时间:'):][:19]  # publication time
            dati = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
            ze = info[info.find('作者:'):].split()[0][len('作者:'):]  # author
            sh = info[info.find('审核:'):].split()[0][len('审核:'):]  # reviewer

            print(dati, t, a, sh)
            break  # stop after the first article while testing
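
Slicing with find() and split() assumes every label is present in the metadata line. As an alternative, the fields could be pulled out with regular expressions, which simply skip a label that is missing. The following is a sketch under assumptions: the sample line is invented (with placeholder names), the helper `parse_info` is hypothetical, and accepting both an ASCII and a full-width colon after each label is a guess about how the page may be written.

```python
import re
from datetime import datetime

# Hypothetical metadata line with placeholder names; the real text may differ.
SAMPLE_INFO = '发布时间:2018-04-01 11:57:00 作者:张三 审核:李四 来源:某部门'

# Labels from the task description: publication time, author, reviewer,
# source, photographer.
LABELS = ['发布时间', '作者', '审核', '来源', '摄影']


def parse_info(info):
    """Return a dict mapping each label found in `info` to its value."""
    fields = {}
    for label in LABELS:
        # Assumption: the label is followed by an ASCII or full-width colon
        # and then directly by a value without spaces.
        m = re.search(label + r'[:：](\S+)', info)
        if m:
            fields[label] = m.group(1)
    # The timestamp contains a space, so it gets its own pattern.
    m = re.search(r'发布时间[:：](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})', info)
    if m:
        fields['发布时间'] = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
    return fields


result = parse_info(SAMPLE_INFO)
print(result.get('作者'), result.get('摄影'), result['发布时间'])
```

Using `dict.get` on the result returns None for a field such as the photographer when the article does not list one, instead of raising an error.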
    
    
    
    

    3. Convert the publication time from str to the datetime type.

         

    dati=datetime.strptime(dt,'%Y-%m-%d %H:%M:%S')
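
As a small illustration of this conversion, the sketch below wraps `datetime.strptime` in a helper that returns None when the text does not match the format, which may be convenient if some pages carry a malformed timestamp. The helper name `to_datetime` and the sample timestamp are assumptions made for the example.

```python
from datetime import datetime

FORMAT = '%Y-%m-%d %H:%M:%S'


def to_datetime(text):
    """Parse `text` with FORMAT; return None if it does not match."""
    try:
        return datetime.strptime(text, FORMAT)
    except ValueError:
        return None


dati = to_datetime('2018-04-01 11:57:00')  # illustrative timestamp
print(type(dati).__name__, dati.year, dati.strftime('%Y/%m/%d'))
print(to_datetime('not a date'))
```

Once the value is a datetime object, its parts (year, month, hour, and so on) are available as attributes, and `strftime` can format it back into any string layout.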

  • Original post: https://www.cnblogs.com/wwc000/p/8710024.html