  • Python example: scraping the Maoyan Top 100 movies

    A small Python example that scrapes the Maoyan Top 100 board.

    import json
    import requests
    import re
    from multiprocessing import Pool  # process pool
    from requests.exceptions import RequestException
    # Fetch a single page; returns the HTML text, or None on any failure
    def get_one_page(url):
        try:
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36'
            }
            response = requests.get(url,headers=headers)
            if response.status_code == 200:
                return response.text
            return None
        except RequestException:
            return None
    
    
    # Parse one page of HTML and yield one dict per movie
    def parse_one_page(html):
        # Raw string so that \d is the digit class (the rank number)
        pattern = re.compile(r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',re.S)
        items = re.findall(pattern, html)
        print(items)
        for item in items:
            yield {
                'index':item[0],
                'image': item[1],
                'title': item[2],
                'actor': item[3].strip()[3:],
                'time':item[4].strip()[5:],
                'score': item[5]+item[6]
            }
    
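    def main(offset):
        url = 'https://maoyan.com/board/4?offset='+str(offset)
        html = get_one_page(url)
        if html is None:
            # The request failed or returned a non-200 status; skip this page
            return
        for item in parse_one_page(html):
            print(item)
            write_to_file(item)  # append the record to the output file
    
    
    # Append one record to result.txt as a single line of JSON
    def write_to_file(content):
        # utf-8 with ensure_ascii=False keeps the Chinese text readable in the file
        with open('result.txt','a',encoding='utf-8') as f:
            f.write(json.dumps(content,ensure_ascii=False)+'\n')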
    
    
    # Entry point
    if __name__ =='__main__':
        # Sequential version: 10 pages, 10 movies per page
        for item in range(10):
            main(item*10)
    
        # Process pool version
        # pool = Pool()
        # pool.map(main,[i*10 for i in range(10)])
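
    To see what the parsing step extracts without hitting the network, the sketch below runs the same regular expression against a small hand-written fragment. The markup in SAMPLE_HTML is hypothetical: it is shaped only to contain the class names the pattern relies on (board-index, name, star, releasetime, integer, fraction) and is not a copy of the real page, which may differ or change. The helper name parse_sample is also just for illustration.

```python
import re

# Same pattern as parse_one_page, split across lines for readability.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>'
    r'.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,  # let "." match newlines, since each <dd> spans several lines
)

# Hypothetical markup (assumption): just enough structure for the pattern to match.
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <img data-src="https://example.com/poster.jpg" class="board-img" />
  <p class="name"><a href="/films/1" title="Sample Movie">Sample Movie</a></p>
  <p class="star">
    主演：Actor A,Actor B
  </p>
  <p class="releasetime">上映时间：1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
'''


def parse_sample(html):
    """Yield one dict per <dd> block, mirroring parse_one_page."""
    for item in PATTERN.findall(html):
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],  # drop the 3-character label "主演："
            'time': item[4].strip()[5:],   # drop the 5-character label "上映时间："
            'score': item[5] + item[6],    # integer part + fractional digit
        }


rows = list(parse_sample(SAMPLE_HTML))
for row in rows:
    print(row)
```

    The fixed slices [3:] and [5:] only work while the labels keep exactly those lengths, so they are the first thing to check if the actor or time fields come back truncated.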
  • Original article: https://www.cnblogs.com/youmingkuang/p/7879991.html