  • Scraping news from the campus news front page

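    1. Use the requests and BeautifulSoup libraries to scrape the title, link, and body text of each news item on the campus news front page.

    2. Parse the info string to get each article's publish time, author, source, and photographer.

    3. Convert the publish time from str to datetime.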

    import requests
    from bs4 import BeautifulSoup
    
    url="http://news.gzcc.cn/html/xiaoyuanxinwen/"
    res=requests.get(url)
    res.encoding="utf-8"
    
    # First attempt: print the title elements of every news item.
    # soup=BeautifulSoup(res.text,"html.parser")
    # for news in soup.select("li"):
    #     if len(news.select(".news-list-title")) > 0:
    #         print(news.select(".news-list-title"))
    
    # Second attempt: print the title, the first info field (dt) and the link.
    # for news in soup.select("li"):
    #     if len(news.select(".news-list-title")) > 0:
    #         t=news.select('.news-list-title')[0].text
    #         dt=news.select('.news-list-info')[0].contents[0].text
    #         a = news.select('a')[0].attrs['href']
    #         print(t,dt,a)
    
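    soup = BeautifulSoup(res.text, "html.parser")
    for news in soup.select("li"):
        # Only <li> elements that contain a title are news entries.
        if len(news.select(".news-list-title")) > 0:
            t = news.select('.news-list-title')[0].text    # title
            a = news.select('a')[0].attrs['href']          # link
            print(t, a)
            # Fetch the article page to read the body and the info line.
            resd = requests.get(a)
            resd.encoding = 'utf-8'
            soupd = BeautifulSoup(resd.text, 'html.parser')
            d = soupd.select('#content')[0].text           # body text
            info = soupd.select('.show-info')[0].text      # info line
            print(info)
            # Publish time: the 19 characters after the '发布时间:' label.
            # Slice by label length; str.lstrip removes a set of characters, not a prefix.
            p = info.find('发布时间:')
            if p >= 0:
                dt = info[p + len('发布时间:'):][:19]
                print(dt)
    
            # Source (find returns -1 when the label is missing, 0 when it comes first)
            i = info.find('来源:')
            if i >= 0:
                s = info[i:].split()[0][len('来源:'):]
                print(s)
    
            # Author
            z = info.find('作者:')
            if z >= 0:
                author = info[z:].split()[0][len('作者:'):]
                print(author)
    
            # Photographer
            y = info.find('摄影:')
            if y >= 0:
                photographer = info[y:].split()[0][len('摄影:'):]
                print(photographer)
    
            # Only process the first article while testing.
            break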
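
The field extraction in step 2 can also be written with regular expressions, which avoids one pitfall of str.lstrip: it removes any of the given characters, not a prefix, so '来源:来宾处'.lstrip('来源:') returns '宾处'. The following is only a minimal sketch. The sample string, the helper names extract_field and parse_info, and the choice to accept both ':' and ':' after a label are assumptions for illustration; the text of the real '.show-info' element may be laid out differently.

```python
import re

# Hypothetical sample of the '.show-info' text (placeholder names, not real data).
SAMPLE_INFO = "发布时间:2018-03-30 17:10:12  作者:张三  来源:学校办公室  摄影:王五  点击:12次"


def extract_field(info, label):
    """Return the text after `label` and a colon, up to the next whitespace.

    Returns None when the label does not occur in `info`.
    """
    # Accept an ASCII or a full-width colon after the label.
    match = re.search(re.escape(label) + r"[::]\s*(\S+)", info)
    return match.group(1) if match else None


def parse_info(info):
    """Collect the fields asked for in step 2 into a dict."""
    # The timestamp contains a space, so it needs its own pattern.
    time_match = re.search(
        r"发布时间[::]\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", info
    )
    return {
        "time": time_match.group(1) if time_match else None,
        "author": extract_field(info, "作者"),
        "source": extract_field(info, "来源"),
        "photographer": extract_field(info, "摄影"),
    }


print(parse_info(SAMPLE_INFO))
```

A missing field comes back as None instead of raising an error, so articles without a photographer, for example, need no special handling.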

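    from datetime import datetime

    # Convert the publish time from str to datetime.
    # %m is the month and %d the day; %M means minutes.
    time_str = '2018-03-30 17:10:12'
    dt1 = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S')
    print(dt1.year)
    now = datetime.now()
    print(type(now))
    print(now.strftime('%Y-%m-%d %H:%M:%S'))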
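
Step 3 depends on the format codes: %m is the month, %d is the day of the month, %M is minutes, and %D is not a directive that strptime accepts. The sketch below converts the example timestamp to a datetime and back. The constant TIME_FORMAT and the helper parse_publish_time are names made up for this example.

```python
from datetime import datetime

# %Y year, %m month, %d day, %H hour, %M minute, %S second
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_publish_time(text):
    """Convert a string such as '2018-03-30 17:10:12' into a datetime."""
    return datetime.strptime(text, TIME_FORMAT)


published = parse_publish_time("2018-03-30 17:10:12")
print(type(published))                   # <class 'datetime.datetime'>
print(published.year, published.month)   # 2018 3
print(published.strftime(TIME_FORMAT))   # 2018-03-30 17:10:12
```

Using the same format string for strptime and strftime keeps parsing and printing consistent with each other.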
  • Original article: https://www.cnblogs.com/ming-z/p/8691905.html