  • Scraping the Maoyan Movie Rankings with Python

    import requests
    import pyquery
    
    
    def crawl_page(url: str) -> None:
        # Send a browser-like User-Agent; the default requests one may be rejected.
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
        }
        response = requests.get(url, headers=headers)
        parse_page(response.text)
    
    
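    def parse_page(source_code: str) -> None:
        html = pyquery.PyQuery(source_code)
        # Each <dd> under .board-wrapper holds one movie entry.
        dd_elements = html('.board-wrapper dd')
        for dd_element in dd_elements.items():
            # The cast and release-date text carry a label prefix before a colon.
            # The page may use a full-width colon, so normalize it before splitting.
            star = dd_element.find('p.star').text().replace('：', ':')
            release = dd_element.find('p.releasetime').text().replace('：', ':')
            data = {
                '排名': dd_element.find('i.board-index').text(),        # rank
                '电影名': dd_element.find('a.image-link').attr('title'),  # title
                '主演': star.split(':', 1)[-1],                          # cast
                '上映时间': release.split(':', 1)[-1],                    # release date
                '评分': dd_element.find('p.score').text(),               # score
            }
            print(data)
            save_data(data)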
    
    
    def save_data(data: dict) -> None:
        # Append each record to MaoYan.txt, one dict per line.
        with open('MaoYan.txt', 'a', encoding='utf8') as f:
            f.write(str(data) + '\n')
    
    
    def main():
        # The board shows 10 movies per page; offsets 0, 10, ..., 90 cover the top 100.
        for i in range(0, 100, 10):
            url = 'https://maoyan.com/board/4?offset={}'.format(i)
            crawl_page(url)
    
    
    if __name__ == '__main__':
        main()
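
The script above strips the label prefix from the cast and release-date text inline and writes each record as the `str()` of a dict. As a rough sketch only, those two steps could be factored into small helpers. The names `strip_label` and `to_json_line` are hypothetical and not part of the original script, and the assumption that the label separator can be either an ASCII or a full-width colon is mine, not something the original post confirms.

```python
import json
import re

# Hypothetical helpers, not part of the original script.
# Assumption: the label separator is an ASCII colon or a full-width colon.
_LABEL_SEPARATOR = re.compile(r'[:：]')


def strip_label(text: str) -> str:
    """Return the text after the first colon, or the whole text if there is none."""
    parts = _LABEL_SEPARATOR.split(text, maxsplit=1)
    return parts[-1].strip()


def to_json_line(record: dict) -> str:
    """Serialize one record as a single JSON line, keeping Chinese text readable."""
    return json.dumps(record, ensure_ascii=False) + '\n'
```

With these, `parse_page` could call `strip_label(dd_element.find('p.star').text())`, and `save_data` could write `to_json_line(data)` instead of `str(data)`, so that each line of the output file can be read back with `json.loads`.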
    
  • Original article: https://www.cnblogs.com/malinqing/p/11318341.html