  • Getting All the Information for a News Article

    Assignment requirements: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/2894

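    Given the link newsUrl of a news article, retrieve all of that article's information:

    Title, author, publishing unit, reviewer, source

    Publication time: convert it to the datetime type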
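The datetime conversion can be sketched offline as follows. The sample timestamp is hypothetical and is only assumed to follow the 'YYYY-MM-DD HH:MM:SS' layout; parse_publish_time is an illustrative name, not part of the assignment.

```python
from datetime import datetime


def parse_publish_time(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime object."""
    return datetime.strptime(text, '%Y-%m-%d %H:%M:%S')


sample = '2019-04-03 11:57:00'  # hypothetical timestamp, for illustration only
dt = parse_publish_time(sample)
print(type(dt).__name__, dt.year, dt.month, dt.day)
```

Once the value is a datetime, its parts (year, hour, and so on) are available as attributes, and two timestamps can be compared directly.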

    Click count:

    • newsUrl
    • newsId (extracted with a regular expression, re)
    • clickUrl (built with str.format(newsId))
    • requests.get(clickUrl)
    • newsClick (extracted with string processing or a regular expression)
    • int()
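The steps above can be tried without any network access, roughly as sketched below. The response body is a made-up sample, assumed to end with a call of the form .html('<count>'); the helper names extract_news_id and parse_click_count are illustrative only.

```python
import re

CLICK_API = 'http://oa.gzcc.cn/api.php?op=count&id={}&modelid=80'


def extract_news_id(news_url):
    """Return the digits just before '.html' at the end of the URL."""
    match = re.search(r'(\d+)\.html$', news_url)
    if match is None:
        raise ValueError('no news id found in ' + news_url)
    return match.group(1)


def parse_click_count(response_text):
    """Return the integer inside the last .html('...') call of the response."""
    counts = re.findall(r"\.html\('(\d+)'\)", response_text)
    if not counts:
        raise ValueError('no click count found')
    return int(counts[-1])


news_url = 'http://news.gzcc.cn/html/2019/tongzhigonggao_0403/11141.html'
news_id = extract_news_id(news_url)
click_url = CLICK_API.format(news_id)

# Hypothetical response body, assumed for illustration only
sample_response = "$('#todaydowns').html('3');$('#hits').html('128');"
print(news_id, click_url, parse_click_count(sample_response))
```

In a real run, sample_response would be replaced by requests.get(click_url).text.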

    Wrap the whole process in a simple, clear function.

    import re
    from datetime import datetime

    import requests
    from bs4 import BeautifulSoup


    def field_value(field):
        """Drop the 'label:' prefix (ASCII or full-width colon) from a field."""
        return re.split('[::]', field, maxsplit=1)[-1]


    # Click count
    def click(url):
        # The news ID is the last run of digits in the URL, e.g. 11141
        news_id = re.findall(r'(\d{1,5})', url)[-1]
        clickUrl = 'http://oa.gzcc.cn/api.php?op=count&id={}&modelid=80'.format(news_id)
        resClick = requests.get(clickUrl)
        # Keep what follows the last ".html" and strip the surrounding (' and ');
        newsClick = int(resClick.text.split('.html')[-1].lstrip("('").rstrip("');"))
        return newsClick


    # Publication time
    def newsdt(showinfo):
        fields = showinfo.split()
        newsDate = field_value(fields[0])  # date part, after the label
        newsTime = fields[1]               # time part, e.g. 11:57:00
        return datetime.strptime(newsDate + ' ' + newsTime, '%Y-%m-%d %H:%M:%S')


    # News content
    def anews(url):
        res = requests.get(url)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        newsTitle = soup.select('.show-title')[0].text
        showinfo = soup.select('.show-info')[0].text
        fields = showinfo.split()
        newsDT = newsdt(showinfo)
        newsAuthor = field_value(fields[2])    # author
        newsAuditing = field_value(fields[3])  # reviewer
        newsSource = field_value(fields[4])    # source
        newsClick = click(url)
        newsDetail = soup.select('.show-content')[0].text
        print('Title: ' + newsTitle,
              '\nPublished:', newsDT,
              '\nAuthor: ' + newsAuthor,
              '\nReviewer: ' + newsAuditing,
              '\nSource: ' + newsSource,
              '\nClicks:', newsClick,
              '\nContent: ' + newsDetail)
        # Return the fields so callers can reuse them
        return {'title': newsTitle, 'time': newsDT, 'author': newsAuthor,
                'auditing': newsAuditing, 'source': newsSource,
                'click': newsClick, 'detail': newsDetail}


    newsUrl = 'http://news.gzcc.cn/html/2019/tongzhigonggao_0403/11141.html'
    anews(newsUrl)
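When removing a label such as '作者:' from a field, keep in mind that str.lstrip treats its argument as a set of characters rather than as a prefix, so it can also eat the start of the value. The sketch below illustrates this with a made-up field value; strip_label is a hypothetical helper, not part of the original code.

```python
def strip_label(field, label):
    """Remove an exact leading label; leave the field unchanged if absent."""
    if field.startswith(label):
        return field[len(label):]
    return field


field = '作者:作家协会'  # hypothetical field value, for illustration only

# lstrip removes every leading character that appears in '作者:'
print(field.lstrip('作者:'))
# strip_label removes only the exact prefix
print(strip_label(field, '作者:'))
```

The first call also drops the leading '作' of the value, while the second keeps the value intact.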

  • Original article: https://www.cnblogs.com/mofan2233/p/10648719.html