  • Multithreading: a Xiaomi App Store crawler

    # Today's goal
    
    **Multithreading: a Xiaomi App Store crawler**
    
    Crawl every app in the social category of the Xiaomi App Store.
    
    ```
    import requests
    import time
    from threading import Thread, Lock
    from queue import Queue, Empty
    import json
    
    class XiaoAppSpider(object):
        def __init__(self):
            self.url='http://app.mi.com/categotyAllListApi?page={}&categoryId=2&pageSize=30'
            # User-Agent and X-Requested-With are two separate headers
            self.headers={
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
                'X-Requested-With': 'XMLHttpRequest',
            }
            self.url_queue=Queue()
            self.n=0
            # Protects self.n, which several threads update
            self.lock=Lock()
        # Fill the URL queue
        def url_in(self):
            for i in range(67):
                url=self.url.format(i)
                # Put the URL on the queue
                self.url_queue.put(url)
        # Thread worker function
        def get_data(self):
            while True:
                # get_nowait() avoids the race between empty() and get():
                # another thread may take the last URL between the two calls
                try:
                    url=self.url_queue.get_nowait()
                except Empty:
                    break
                # Request + parse + save
                html=requests.get(url=url,headers=self.headers).content.decode('utf-8')
                html=json.loads(html)
                #with open('xiao.json','a') as f:
                    #app_dict={}
                for app in html['data']:
                    app_name=app['displayName']
                    # The details page takes the package name as the id parameter
                    app_link='http://app.mi.com/details?id='+app['packageName']
                    print(app_name,app_link)
                    with self.lock:
                        self.n +=1
        # Main function
        def main(self):
            # Put the URLs on the queue
            self.url_in()
            # Create the threads
            t_list=[]
            for i in range(5):
                t=Thread(target=self.get_data)
                t_list.append(t)
                t.start()
            for t in t_list:
                t.join()
            # Print the total once, after every thread has finished
            print('Number of apps:',self.n)
    
    
    
    if __name__ == '__main__':
        start=time.time()
        spider=XiaoAppSpider()
        spider.main()
        end=time.time()
        print('Elapsed time: {}'.format(end-start))
    
    
    ```
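
The worker pattern used above (several threads draining one shared queue and updating shared state) can be tried without touching the network. The following is a minimal sketch, not the original author's code: `fake_fetch`, `parse`, and `crawl` are hypothetical names, the response shape only mirrors the `data` / `displayName` / `packageName` fields read by the spider, and the `details?id=` link format is an assumption.

```python
import json
import threading
from queue import Queue, Empty


def fake_fetch(page):
    """Hypothetical stand-in for the HTTP request: returns a JSON string
    shaped like the response the spider reads (3 made-up apps per page)."""
    apps = [
        {"displayName": "app-{}-{}".format(page, k),
         "packageName": "com.example.p{}.a{}".format(page, k)}
        for k in range(3)
    ]
    return json.dumps({"data": apps})


def parse(text):
    """Turn one response body into a list of (name, link) pairs."""
    return [
        (app["displayName"],
         "http://app.mi.com/details?id=" + app["packageName"])  # assumed link format
        for app in json.loads(text)["data"]
    ]


def crawl(pages, workers=5, fetch=fake_fetch):
    """Drain a queue of page numbers with `workers` threads."""
    tasks = Queue()
    for page in pages:
        tasks.put(page)

    results = []
    lock = threading.Lock()  # guards `results`, shared by all workers

    def worker():
        while True:
            try:
                # Non-blocking get: a worker exits as soon as the queue is
                # empty instead of blocking forever on get().
                page = tasks.get_nowait()
            except Empty:
                return
            rows = parse(fetch(page))
            with lock:
                results.extend(rows)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    rows = crawl(range(4))
    print("Number of apps:", len(rows))
```

Passing a real fetch function (for example one wrapping `requests.get`) as `fetch` would turn the sketch into a crawler; keeping the fetch step injectable makes the threading logic testable on its own.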
  • Original article: https://www.cnblogs.com/cxiaolong/p/11273314.html