  • 02_Simple Python crawler (Panda TV LOL streamers: who is the strongest!)

    
    

     Disclaimer:

     This article is only a Python practice exercise; no malicious attack of any kind is intended!


    # Import the request module
    from urllib import request
    # Import the re module
    import re


    class Spider():
        # The URL must start with http or https.
        # Page to scrape: the LOL category on the Panda TV live-streaming
        # platform (streamer name and viewer count).
        url_to_run = r'https://www.panda.tv/cate/lol'
        htmls = None  # Holds the fetched HTML content

        # Non-greedy match up to the nearest </div>; this is the parent tag
        # of both the streamer-name tag and the viewer-count tag.
        root_pattern = '<div class="video-info">(.*?)</div>'
        # Non-greedy match up to the </span> nearest to </i>: the streamer name.
        name_pattern = '</i>(.*?)</span>'
        # Non-greedy match up to the nearest </span>: the viewer count.
        number_pattern = '<span class="video-number">(.*?)</span>'
        # Final results; each element is
        # {'name': streamer name, 'number': viewer count}
        result_list = []

        @classmethod
        def fetch_content(cls):
            """
            Simulate a browser: request the page from the server and
            store the returned HTML in Spider.htmls as a string.
            :return: None
            """
            # request.urlopen wraps the server's reply in a file-like object,
            # in essence a response instance.
            result = request.urlopen(cls.url_to_run)
            # print(result.getcode())  # HTTP status code; 200 means success
            # print(result.geturl())   # URL actually fetched; reveals redirects
            cls.htmls = result.read()  # Raw HTML content, as bytes
            cls.htmls = str(cls.htmls, encoding='utf-8')  # Decode bytes to str

        @classmethod
        def analysis(cls):
            """
            Parse the HTML saved in Spider.htmls and extract
            1) the streamer name
            2) the viewer count
            Each name/count pair becomes one dict appended to cls.result_list.
            :return: None
            """
            # root_pattern uses a group, so the results no longer include the
            # outer video-info tag.
            video_info_lst = re.findall(cls.root_pattern, cls.htmls, flags=re.S)
            for video in video_info_lst:
                up_host = re.findall(cls.name_pattern, video, flags=re.S)
                video_number = re.findall(cls.number_pattern, video, flags=re.S)
                # Keep only the first name match and strip the newlines and
                # spaces around it.
                up_host = up_host[0].strip()
                # Take video_number out of its list.
                video_number = video_number[0]
                # Combine name and viewer count into a dict and store it.
                dic = {'name': up_host, 'number': video_number}
                cls.result_list.append(dic)

        @classmethod
        def sort_seed(cls, item):
            """
            The elements of result_list are dicts, which cannot be compared
            directly. sorted() passes each dict to sort_seed, and the numeric
            value of its 'number' field is used as the sort key.
            :return: the viewer count in item['number'], as a float
            """
            # Match the decimal part too; an integer-only pattern would treat
            # 1.6万 and 1.9万 as equal.
            r = re.findall(r'\d+(?:\.\d+)?', item['number'])
            number = float(r[0])
            # Convert counts given in units of 万 (10,000).
            if '万' in item['number']:
                number *= 10000
            return number

        @classmethod
        def sort_result(cls):
            """
            Sort the elements of cls.result_list by viewer count.
            :return: None
            """
            # sorted(iterable, key=None, reverse=False)
            cls.result_list = sorted(cls.result_list, key=cls.sort_seed,
                                     reverse=True)

        @classmethod
        def show(cls):
            print("Total Uphost: " + str(len(cls.result_list)))
            print('=' * 45)
            for rank, item in enumerate(cls.result_list, start=1):
                print('Uphost:' + item['name'] + " ," + "Rank: " + str(rank)
                      + ' Video Watched: ' + item['number'])

        @classmethod
        def go(cls):
            cls.fetch_content()
            cls.analysis()
            cls.sort_result()
            cls.show()


    # Class test code
    Spider.go()
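
The extraction step can be tried offline, without requesting the site. The sketch below runs the same three patterns against a hand-written HTML fragment. The fragment itself (the tag layout, the `video-nickname` class, the `<i class="icon"></i>` element and the sample names) is an assumption made up for illustration and may not match the real page markup; the function name `extract` is also hypothetical.

```python
import re

# Hypothetical fragment imitating the structure the patterns expect.
SAMPLE_HTML = '''
<div class="video-info">
  <span class="video-nickname"><i class="icon"></i>
    streamer_a </span>
  <span class="video-number">1.9万</span>
</div>
<div class="video-info">
  <span class="video-nickname"><i class="icon"></i>
    streamer_b </span>
  <span class="video-number">804</span>
</div>
'''

ROOT_PATTERN = '<div class="video-info">(.*?)</div>'
NAME_PATTERN = '</i>(.*?)</span>'
NUMBER_PATTERN = '<span class="video-number">(.*?)</span>'


def extract(html):
    """Return a list of {'name': ..., 'number': ...} dicts found in html."""
    results = []
    # re.S lets '.' match newlines, so one block may span several lines.
    for block in re.findall(ROOT_PATTERN, html, flags=re.S):
        names = re.findall(NAME_PATTERN, block, flags=re.S)
        numbers = re.findall(NUMBER_PATTERN, block, flags=re.S)
        if not names or not numbers:
            continue  # skip blocks that lack either field
        results.append({'name': names[0].strip(),
                        'number': numbers[0].strip()})
    return results


print(extract(SAMPLE_HTML))
```

Unlike `Spider.analysis`, this sketch skips a block when either field is missing instead of indexing `[0]` unconditionally, which avoids an `IndexError` if the page layout differs from what the patterns assume.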

    Part of the actual test output:

    Total Uphost: 118
    =============================================
    Uphost:即将拥有人鱼线的PDD ,Rank: 1 Video Watched: 283.7万
    Uphost:RNG丶MLXG ,Rank: 2 Video Watched: 23.5万
    Uphost:熊猫伏念 ,Rank: 3 Video Watched: 9.7万
    Uphost:药水哥s ,Rank: 4 Video Watched: 9.3万
    Uphost:WE丶Mystic丶 ,Rank: 5 Video Watched: 8.0万
    Uphost:叫我官人 ,Rank: 6 Video Watched: 5.5万
    Uphost:冠军锐雯 ,Rank: 7 Video Watched: 4.5万
    Uphost:熊猫丶蛮神 ,Rank: 8 Video Watched: 2.3万
    Uphost:起飛的辛德浪 ,Rank: 9 Video Watched: 1.6万
    Uphost:善言_ ,Rank: 10 Video Watched: 1.9万
    Uphost:左手QAQ ,Rank: 11 Video Watched: 1.3万
    Uphost:S7全球总决赛 ,Rank: 12 Video Watched: 1.2万
    Uphost:Pino一米八 ,Rank: 13 Video Watched: 1.2万
    Uphost:金三炮o金三岁 ,Rank: 14 Video Watched: 9494
    Uphost:挽神z ,Rank: 15 Video Watched: 7025
    Uphost:易小埋l ,Rank: 16 Video Watched: 6228
    Uphost:主播毕老实 ,Rank: 17 Video Watched: 5941
    Uphost:一剑西来QAQ ,Rank: 18 Video Watched: 5897
    Uphost:英雄联盟活动直播间 ,Rank: 19 Video Watched: 4239
    Uphost:超级提莫丶牛腩君 ,Rank: 20 Video Watched: 4125
    Uphost:mid六安王 ,Rank: 21 Video Watched: 3555
    Uphost:熊猫丶乐鱼阿卡丽 ,Rank: 22 Video Watched: 3184
    Uphost:熊猫TV一休哥 ,Rank: 23 Video Watched: 3120
    Uphost:小黑胖砸 ,Rank: 24 Video Watched: 2415
    Uphost:或许这就是离岛吧 ,Rank: 25 Video Watched: 2341
    Uphost:第一最寂寞1u ,Rank: 26 Video Watched: 2203
    Uphost:李阿特 ,Rank: 27 Video Watched: 2081
    Uphost:LOL日常活动直播间 ,Rank: 28 Video Watched: 2028
    Uphost:LPL熊猫官方直播 ,Rank: 29 Video Watched: 2003
    Uphost:熊猫TV丶小青龙 ,Rank: 30 Video Watched: 1996
    Uphost:熊猫TV灬小豆豆 ,Rank: 31 Video Watched: 1957
    Uphost:小啊雅大大大 ,Rank: 32 Video Watched: 1613
    Uphost:小凯南zz ,Rank: 33 Video Watched: 1483
    Uphost:拿铁不加糖 ,Rank: 34 Video Watched: 1401
    Uphost:金克喵的猫珥朵丶 ,Rank: 35 Video Watched: 1351
    Uphost:炽天使z1 ,Rank: 36 Video Watched: 1164
    Uphost:小小小女人丶 ,Rank: 37 Video Watched: 1111
    Uphost:東東東 ,Rank: 38 Video Watched: 1081
    Uphost:纯纯小流_氓 ,Rank: 39 Video Watched: 1077
    Uphost:熊猫tv芭比公主 ,Rank: 40 Video Watched: 1070
    Uphost:big火鸡 ,Rank: 41 Video Watched: 979
    Uphost:机器猫mmm ,Rank: 42 Video Watched: 944
    Uphost:大家都叫我冷爷丶 ,Rank: 43 Video Watched: 915
    Uphost:栗子菌i ,Rank: 44 Video Watched: 879
    Uphost:星矢魔术 ,Rank: 45 Video Watched: 845
    Uphost:唐人leo ,Rank: 46 Video Watched: 842
    Uphost:十级浪 ,Rank: 47 Video Watched: 829
    Uphost:筱兮QAQ ,Rank: 48 Video Watched: 829
    Uphost:酥软迷妹小慢慢Zz ,Rank: 49 Video Watched: 817
    Uphost:小凡Aaaaaa ,Rank: 50 Video Watched: 804
    Uphost:小丸子爱吃樱桃丶 ,Rank: 51 Video Watched: 803
    Uphost:爱流血的兔斯基 ,Rank: 52 Video Watched: 803
    Uphost:凶残的喵绵绵 ,Rank: 53 Video Watched: 800
    Uphost:别叫凯隐叫隐神 ,Rank: 54 Video Watched: 799
    Uphost:Panda初心2018 ,Rank: 55 Video Watched: 793
    Uphost:熊猫丶大风6 ,Rank: 56 Video Watched: 792
    Uphost:顽皮ssssssssssss ,Rank: 57 Video Watched: 790
    Uphost:大表哥响尾蛇 ,Rank: 58 Video Watched: 789
    Uphost:告白White ,Rank: 59 Video Watched: 788
    Uphost:牌面之王丶火影劫 ,Rank: 60 Video Watched: 775
    Uphost:西湖仙境 ,Rank: 61 Video Watched: 775
    Uphost:飞不起来1 ,Rank: 62 Video Watched: 774
    Uphost:逗了个蛋 ,Rank: 63 Video Watched: 773
    Uphost:瓜皮球球 ,Rank: 64 Video Watched: 770
    Uphost:竹蜻蜓呀 ,Rank: 65 Video Watched: 761
    Uphost:少年阿超和阿斌 ,Rank: 66 Video Watched: 760
    Uphost:刚出土的i帕帕 ,Rank: 67 Video Watched: 753
    Uphost:小主播安旭 ,Rank: 68 Video Watched: 747
    Uphost:西决哟 ,Rank: 69 Video Watched: 737
    Uphost:Panda丶夏木 ,Rank: 70 Video Watched: 733
    Uphost:冰雪丶狐狸 ,Rank: 71 Video Watched: 730
    Uphost:夜魅丝 ,Rank: 72 Video Watched: 730
    Uphost:熊猫丶皮皮瓜 ,Rank: 73 Video Watched: 725
    Uphost:Panda灬刀刀 ,Rank: 74 Video Watched: 721
    Uphost:莫莫莫夏夏夏 ,Rank: 75 Video Watched: 694
    Uphost:皮皮翔i ,Rank: 76 Video Watched: 646
    Uphost:南表妹QAQ ,Rank: 77 Video Watched: 644
    Uphost:青蛙OB ,Rank: 78 Video Watched: 633
    Uphost:_Infi_ ,Rank: 79 Video Watched: 631
    Uphost:暴躁茹阿姨 ,Rank: 80 Video Watched: 627
    Uphost:整天打碟的DJ胖丶 ,Rank: 81 Video Watched: 625
    Uphost:熊猫丶一百 ,Rank: 82 Video Watched: 623
    Uphost:全蛋狮子喵 ,Rank: 83 Video Watched: 622
    Uphost:熊猫TV丶小66 ,Rank: 84 Video Watched: 620
    Uphost:电竞张全蛋长长 ,Rank: 85 Video Watched: 596
    Uphost:熊猫第一不亏哥 ,Rank: 86 Video Watched: 536
    Uphost:叫我东邪 ,Rank: 87 Video Watched: 513
    Uphost:熊猫TV丶一手绝 ,Rank: 88 Video Watched: 499
    Uphost:熊猫TV丶别勉强 ,Rank: 89 Video Watched: 485
    Uphost:提莫的小女朋友 ,Rank: 90 Video Watched: 480
    Uphost:王者蕾 ,Rank: 91 Video Watched: 471
    Uphost:日暮哟 ,Rank: 92 Video Watched: 470
    Uphost:颖妹er超甜的 ,Rank: 93 Video Watched: 464
    Uphost:熊猫TV丶成小七 ,Rank: 94 Video Watched: 441
    Uphost:熊猫tv丶马小越 ,Rank: 95 Video Watched: 405
    Uphost:柒柒天 ,Rank: 96 Video Watched: 397
    Uphost:Panda电竞白子画 ,Rank: 97 Video Watched: 395
    Uphost:熊猫TV_苏璞 ,Rank: 98 Video Watched: 388
    Uphost:你的小老虎哥哥 ,Rank: 99 Video Watched: 362
    Uphost:门徒zzzz ,Rank: 100 Video Watched: 359
    Uphost:李易钧 ,Rank: 101 Video Watched: 352
    Uphost:熊猫TV丶农药术士 ,Rank: 102 Video Watched: 346
    Uphost:熊猫贝乐 ,Rank: 103 Video Watched: 320
    Uphost:李小青盲僧 ,Rank: 104 Video Watched: 309
    Uphost:刘慕宸 ,Rank: 105 Video Watched: 307
    Uphost:寒风强袭 ,Rank: 106 Video Watched: 305
    Uphost:会蛙泳的饼干0 ,Rank: 107 Video Watched: 300
    Uphost:阿四德莱文丶 ,Rank: 108 Video Watched: 275
    Uphost:知道神龙摆尾吗 ,Rank: 109 Video Watched: 275
    Uphost:瓦罗兰的未来丶尨 ,Rank: 110 Video Watched: 260
    Uphost:JO丶欣欣 ,Rank: 111 Video Watched: 253
    Uphost:123ivan456 ,Rank: 112 Video Watched: 250
    Uphost:only丶提莫 ,Rank: 113 Video Watched: 240
    Uphost:情话好听但不暖心 ,Rank: 114 Video Watched: 230
    Uphost:小丸子真好吃 ,Rank: 115 Video Watched: 218
    Uphost:一只提莫送你回家 ,Rank: 116 Video Watched: 213
    Uphost:请叫我大腿岩丶 ,Rank: 117 Video Watched: 188
    Uphost:伊人芳泽瑞尔心i ,Rank: 118 Video Watched: 183
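
In the output above, 1.6万 (rank 9) is listed ahead of 1.9万 (rank 10). That ordering is consistent with a sort key that keeps only the integer part of the count: both values become 10000, and the stable sort leaves them in page order. Below is a minimal sketch of a key that keeps the decimal part. The function name `viewer_count` and the fallback to zero for unparseable text are assumptions for illustration, not part of the original code.

```python
import re


def viewer_count(text):
    """Convert a count such as '283.7万' or '9494' to a float."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        return 0.0  # assumption: treat an unparseable count as zero
    number = float(match.group())
    # Counts ending in 万 are given in units of 10,000.
    if '万' in text:
        number *= 10000
    return number


# Three rows taken from the sample output above.
rows = [
    {'name': '起飛的辛德浪', 'number': '1.6万'},
    {'name': '善言_', 'number': '1.9万'},
    {'name': '金三炮o金三岁', 'number': '9494'},
]
rows.sort(key=lambda row: viewer_count(row['number']), reverse=True)
for rank, row in enumerate(rows, start=1):
    print(rank, row['name'], row['number'])
```

With this key, the 1.9万 row sorts ahead of the 1.6万 row, and counts without a 万 suffix are compared as plain numbers.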
  • Original article: https://www.cnblogs.com/shay-zhangjin/p/7863539.html