  • [Python] Counting the posts on my own blog to see which year and month had the most

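    The code is simple. It uses requests for network access, BeautifulSoup for parsing the page text, and re for extracting text with regular expressions. The first two must be installed with pip install <name> (the packages are requests and beautifulsoup4), while re is part of the standard library and needs no installation. The code is below, a mere twenty-seven lines: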

    #encoding=utf-8

    from bs4 import BeautifulSoup
    import requests
    import re

    user_agent='Mozilla/4.0 (compatible;MEIE 5.5;windows NT)'
    headers={'User-Agent':user_agent}
    dic={}  # dictionary mapping each year-month to its post count

    for i in range(1,90):
        html=requests.get('http://www.cnblogs.com/xiandedanteng/p/?page='+str(i),headers=headers)
        # html.text is already Unicode, so bs4 ignores from_encoding and emits a UserWarning
        soup=BeautifulSoup(html.text,'html.parser',from_encoding='utf-8')
        for descDiv in soup.find_all(class_="postDesc2"):
            rawInfo=descDiv.text  # text of the div with class="postDesc2"
            yearMonth=re.search(r'\d{4}-\d{2}',rawInfo).group()  # match the year-month with a regular expression and take its value
            # store the year-month in the dictionary; if it already exists, add one to its count
            if yearMonth in dic:
                dic[yearMonth]=dic[yearMonth]+1
            else:
                dic[yearMonth]=1

    ranked=sorted(dic.items(),key=lambda x:x[1])  # sort the (year-month, count) pairs by count into a list
    ranked.reverse()  # largest count first
    for item in ranked:
        print(item)

    The result is as follows:

    ('2017-09', 80) 
    ('2019-10', 66) 
    ('2018-04', 56) 
    ('2018-05', 45) 
    ('2013-09', 43) 
    ('2019-09', 42) 
    ('2017-08', 38) 
    ('2019-03', 37)
    ('2013-08', 32)
    ('2017-11', 32)
    ('2014-07', 26)
    ('2014-12', 22)
    ('2017-06', 21)
    ('2017-12', 21)
    ('2017-01', 20)
    ('2018-03', 19)
    ('2019-08', 18)
    ('2016-07', 17)
    ('2013-11', 15)
    ('2014-08', 15)
    ('2016-03', 15)
    ('2013-10', 14)
    ('2014-04', 14)
    ('2014-05', 14)
    ('2015-01', 14)
    ('2019-11', 13)
    ('2014-11', 12)
    ('2016-08', 12)
    ('2015-07', 10)
    ('2016-02', 9)
    ('2017-07', 9)
    ('2014-01', 8)
    ('2014-10', 7)
    ('2015-08', 7)
    ('2018-01', 7)
    ('2015-04', 6)
    ('2014-02', 5)
    ('2015-06', 5)
    ('2017-10', 5)
    ('2013-12', 4)
    ('2015-02', 4)
    ('2015-05', 4)
    ('2014-03', 3)
    ('2017-02', 3)
    ('2014-09', 2)
    ('2015-12', 2)
    ('2017-03', 2)
    ('2018-06', 2)
    ('2018-07', 2)
    ('2019-05', 2)
    ('2014-06', 1)
    ('2015-11', 1)
    ('2016-05', 1)
    ('2016-06', 1)
    ('2016-10', 1)
    ('2017-04', 1)
    ('2017-05', 1)
    ('2019-04', 1)
    ('2019-07', 1)
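The list above ranks months only. To also answer the "which year" half of the question, the monthly counts could be summed per year. The following is a minimal sketch, not part of the original script: `totals_by_year` is a hypothetical helper, and `monthly` holds just the top six pairs copied from the output above, so the yearly totals it prints are partial.

```python
from collections import Counter

# The top six (year-month, count) pairs copied from the output above.
# The full list is longer, so these yearly totals are only partial.
monthly = [
    ('2017-09', 80),
    ('2019-10', 66),
    ('2018-04', 56),
    ('2018-05', 45),
    ('2013-09', 43),
    ('2019-09', 42),
]

def totals_by_year(pairs):
    """Sum monthly counts into per-year counts, keyed by the 4-digit year."""
    yearly = Counter()
    for year_month, count in pairs:
        yearly[year_month[:4]] += count  # '2019-10' -> '2019'
    return yearly

yearly = totals_by_year(monthly)
# most_common() returns the pairs sorted by count, largest first
for item in yearly.most_common():
    print(item)
```

In the real script the same helper could be fed `dic.items()` directly once the page loop has finished.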

    Playing with Python every now and then is quite fun. This is a skill I must not let slip.

    --END-- 2019-11-03 15:26:38

    This is the result of a run on January 31, 2020:

    C:personalprogramspython>python 1.py
    C:\Users\ufo\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py:203: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
      warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
    ('2017-09', 79)
    ('2020-01', 79)
    ('2019-11', 76)
    ('2019-12', 66)
    ('2019-10', 65)
    ('2018-04', 55)
    ('2018-05', 45)
    ('2019-09', 42)
    ('2019-03', 37)
    ('2017-11', 32)
    ('2014-12', 22)
    ('2017-06', 21)
    ('2017-12', 21)
    ('2017-01', 20)
    ('2018-03', 19)
    ('2017-08', 18)
    ('2016-07', 17)
    ('2019-08', 17)
    ('2016-03', 15)
    ('2015-01', 14)
    ('2014-11', 12)
    ('2016-08', 12)
    ('2014-08', 10)
    ('2015-07', 10)
    ('2016-02', 9)
    ('2017-07', 9)
    ('2014-10', 7)
    ('2015-08', 7)
    ('2018-01', 7)
    ('2015-04', 6)
    ('2015-06', 5)
    ('2017-10', 5)
    ('2015-02', 4)
    ('2015-05', 4)
    ('2017-02', 3)
    ('2014-09', 2)
    ('2015-12', 2)
    ('2017-03', 2)
    ('2018-06', 2)
    ('2018-07', 2)
    ('2019-05', 2)
    ('2015-11', 1)
    ('2016-05', 1)
    ('2016-06', 1)
    ('2016-10', 1)
    ('2017-04', 1)
    ('2017-05', 1)
    ('2019-04', 1)
    ('2019-07', 1)
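The 2020 run no longer lists any month from 2013. One possible explanation is that the script only scans list pages 1 to 89, so the oldest posts drop off the end as new ones are published. The following is a minimal sketch of a loop that keeps going until a page has no posts instead of using a fixed page count. It rests on assumptions: `fetch_footers` is a hypothetical callable that returns the "postDesc2" texts of one list page, a page past the end is assumed to return none, and `fake_site` is made-up data so the sketch runs offline.

```python
import re
from collections import Counter

YEAR_MONTH = re.compile(r'\d{4}-\d{2}')

def count_all_pages(fetch_footers, max_pages=1000):
    """Count posts per year-month, stopping at the first page without posts.

    fetch_footers(page) is a hypothetical callable that returns the list of
    footer texts found on one list page (an empty list past the last page).
    max_pages is a safety cap so the loop always terminates.
    """
    counts = Counter()
    for page in range(1, max_pages + 1):
        footers = fetch_footers(page)
        if not footers:
            break  # assumed signal that the last page has been passed
        for text in footers:
            match = YEAR_MONTH.search(text)
            if match:  # skip footers that carry no date
                counts[match.group()] += 1
    return counts

# Made-up stand-in for the blog: two pages of footer texts, then nothing.
fake_site = {
    1: ['posted @ 2020-01-31 10:00', 'posted @ 2020-01-30 09:00'],
    2: ['posted @ 2019-12-25 08:00'],
}

def fake_fetch(page):
    return fake_site.get(page, [])

counts = count_all_pages(fake_fetch)
for item in counts.most_common():
    print(item)
```

With the real site, `fetch_footers` would wrap the `requests.get` and `soup.find_all(class_="postDesc2")` calls from the script. Whether a page past the end really comes back without any such element would need to be checked first.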
  • Original article: https://www.cnblogs.com/heyang78/p/11787380.html