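  • Scraping the news on the campus news homepage

    1. Use the requests and BeautifulSoup libraries to scrape the title, link, and body text of each news item on the campus news homepage.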

    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime
    ww = requests.get('http://news.gzcc.cn/html/xiaoyuanxinwen/')
    ww.encoding = 'utf-8'
    soup = BeautifulSoup(ww.text, 'html.parser')
    for news in soup.select('li'):
        # Only <li> elements that contain a news title are news entries
        if len(news.select('.news-list-title')) > 0:
            t = news.select('.news-list-title')[0].text  # title
            a = news.select('a')[0].attrs['href']  # link
            resd = requests.get(a)
            resd.encoding = 'utf-8'
            soupd = BeautifulSoup(resd.text, 'html.parser')  # parse the news detail page
            c = soupd.select('#content')[0].text  # body text
            print(t, a, c)
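
    The list-page parsing in step 1 can be tried without network access by separating it from the download. The following is a minimal sketch: the HTML snippet, the example.com URLs, and the helper name extract_news_links are made up for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating the list-page markup (an assumption, not the real page).
SAMPLE_LIST_HTML = """
<ul>
  <li><a href="http://example.com/news/1.html"><div class="news-list-title">First headline</div></a></li>
  <li><a href="http://example.com/about.html">About</a></li>
  <li><a href="http://example.com/news/2.html"><div class="news-list-title">Second headline</div></a></li>
</ul>
"""


def extract_news_links(html):
    """Return (title, url) pairs for every <li> that contains a news title."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.select('li'):
        title_tag = item.select_one('.news-list-title')
        link_tag = item.select_one('a')
        if title_tag is None or link_tag is None:
            continue  # navigation or other non-news list item
        results.append((title_tag.get_text(strip=True), link_tag.get('href')))
    return results


pairs = extract_news_links(SAMPLE_LIST_HTML)
print(pairs)
```

    Using select_one and checking for None avoids the IndexError that [0] raises when a selector matches nothing.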

    2. Parse the string to get each article's publication time, author, source, photographer, and other details.

    info = soupd.select('.show-info')[0].text
    print(info)
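
    One possible way to split that string into separate fields is a regular expression keyed on the labels. This is only a sketch: the sample text, the names in it, and the label list (including 审核 and 点击) are assumptions about the page layout, not values taken from the site.

```python
import re

# Hypothetical sample of the '.show-info' text; the real layout may differ.
sample_info = '发布时间:2018-04-01 11:57:00  作者:张三  审核:李四  来源:新闻中心  摄影:王五  点击:次'

# Assumed labels mapped to English keys.
LABELS = {
    '发布时间': 'published',
    '作者': 'author',
    '审核': 'reviewer',
    '来源': 'source',
    '摄影': 'photographer',
}
# A value ends where the next label (or the end of the text) begins.
NEXT_LABEL = '|'.join(list(LABELS) + ['点击'])
# Accept both the ASCII colon and the full-width colon after a label.
FIELD_RE = re.compile(
    r'(%s)[::]\s*(.*?)\s*(?=(?:%s)[::]|$)' % ('|'.join(LABELS), NEXT_LABEL),
    re.S,
)


def parse_info(info):
    """Return a dict of the labelled fields found in the info string."""
    fields = {}
    for label, value in FIELD_RE.findall(info.strip()):
        fields[LABELS[label]] = value
    return fields


fields = parse_info(sample_info)
print(fields['published'])     # 2018-04-01 11:57:00
print(fields['photographer'])  # 王五
```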

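    3. Convert the publication time from str to the datetime type.

    # str.lstrip removes any leading characters found in the given set, not a literal prefix
    ws = info.lstrip('发布时间:')[:19]  # publication time as a string
    da = datetime.strptime(ws, '%Y-%m-%d %H:%M:%S')
    print(da)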
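
    Because str.lstrip treats its argument as a set of characters, the conversion depends on the exact leading text. A slightly more defensive sketch searches for the timestamp pattern instead. The helper name parse_publish_time and the sample string are made up for illustration.

```python
import re
from datetime import datetime

# Matches a timestamp such as 2018-04-01 11:57:00 anywhere in the text.
TIMESTAMP_RE = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')


def parse_publish_time(info):
    """Return the first timestamp in info as a datetime, or None if absent."""
    match = TIMESTAMP_RE.search(info)
    if match is None:
        return None
    return datetime.strptime(match.group(), '%Y-%m-%d %H:%M:%S')


# Hypothetical sample text, not taken from the real page.
sample = '发布时间:2018-04-01 11:57:00  作者:张三'
published = parse_publish_time(sample)
print(published)       # 2018-04-01 11:57:00
print(published.year)  # 2018
```

    Returning None for a missing timestamp lets the caller decide whether to skip the article or report an error.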

    4. Post the complete code and a screenshot of the output with the assignment.

    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime
    ww = requests.get('http://news.gzcc.cn/html/xiaoyuanxinwen/')
    ww.encoding = 'utf-8'
    soup = BeautifulSoup(ww.text, 'html.parser')
    for news in soup.select('li'):
        # Only <li> elements that contain a news title are news entries
        if len(news.select('.news-list-title')) > 0:
            t = news.select('.news-list-title')[0].text  # title
            a = news.select('a')[0].attrs['href']  # link
            resd = requests.get(a)
            resd.encoding = 'utf-8'
            soupd = BeautifulSoup(resd.text, 'html.parser')  # parse the news detail page
            c = soupd.select('#content')[0].text  # body text
            info = soupd.select('.show-info')[0].text  # publication time, author, source, etc.
            ws = info.lstrip('发布时间:')[:19]  # publication time as a string
            da = datetime.strptime(ws, '%Y-%m-%d %H:%M:%S')
            print(t, a, c, info, da)
            break  # stop after the first news item

  • Original article: https://www.cnblogs.com/candyxue/p/8717731.html