  • bs4-mysql-Wandoujia

    Scrape the Wandoujia rankings

    Parse the pages with bs4 (BeautifulSoup) and save the results to MySQL.
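    The bs4 step can be tried offline before hitting the live site. The sketch below runs the same CSS selectors that get_detail uses against a hypothetical HTML fragment. The fragment, `SAMPLE_DETAIL_HTML` and `parse_detail` are assumptions made up for illustration, and Wandoujia's real markup may differ. It uses the built-in "html.parser" instead of lxml, so only bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment shaped the way the scraper's CSS selectors
# expect a detail page to look; the real page markup may differ.
SAMPLE_DETAIL_HTML = """
<div class="app-icon"><img src="https://example.com/icon.png"></div>
<p class="app-name"><span>Example Game</span></p>
<div class="app-info-data">
  <span class="item install"><i>1.2M</i> installs</span>
  <span class="item love"><i>85%</i> positive</span>
  <a class="item comment-open"><i>3,456</i> comments</a>
</div>
"""


def parse_detail(html):
    """Extract the five fields get_detail() stores, using the same selectors."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed

    def first(selector):
        # select_one returns None when nothing matches; raise a clear error
        # instead of an opaque IndexError/TypeError further down.
        node = soup.select_one(selector)
        if node is None:
            raise ValueError(f"selector matched nothing: {selector}")
        return node

    return (
        first(".app-icon>img")["src"],
        first(".app-name>span").get_text(strip=True),
        first(".app-info-data>span>i").get_text(strip=True),
        first(".app-info-data>.love>i").get_text(strip=True),
        first(".app-info-data>.comment-open>i").get_text(strip=True),
    )


if __name__ == "__main__":
    print(parse_detail(SAMPLE_DETAIL_HTML))
```

    If a selector stops matching (for example after a site redesign), this version reports which selector failed, which is easier to debug than the `[0]` indexing in the full script.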

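    import requests
    import pymysql
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    
    # Collect the links to each app's detail page from the ranking page
    def get_urls(url):
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, 'lxml')
        # each app icon links to that app's detail page; urljoin also handles relative hrefs
        return [urljoin(url, a['href']) for a in soup.select('.icon-wrap>a')]
    
    # Scrape one detail page and insert one row into the `spider` table
    def get_detail(url, conn, cursor):
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, 'lxml')
    
        img = soup.select('.app-icon>img')[0]['src']                      # icon URL
        name = soup.select('.app-name>span')[0].text                       # app name
        count = soup.select('.app-info-data>span>i')[0].text               # first value in the info bar
        ping_size = soup.select('.app-info-data>.love>i')[0].text          # value in the .love item
        ping_num = soup.select('.app-info-data>.comment-open>i')[0].text   # value in the .comment-open item
    
        # %s placeholders let pymysql escape the values safely
        sql = 'insert into spider values (%s,%s,%s,%s,%s)'
        cursor.execute(sql, (img, name, count, ping_size, ping_num))
        conn.commit()
    
    
    if __name__ == '__main__':
        url_home = 'https://www.wandoujia.com/top/game'
        urls = get_urls(url_home)
    
        # expects a local MySQL database t5 that already has a 5-column table `spider`
        conn = pymysql.connect(host='localhost', user='root', password='root',
                               database='t5', charset='utf8mb4')
        cursor = conn.cursor()
        try:
            for detail_url in urls:
                get_detail(detail_url, conn, cursor)
        finally:
            # release the cursor and connection even if a page fails to parse
            cursor.close()
            conn.close()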
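    The post does not show the schema of the `spider` table. The sketch below assumes a five-column table whose column names (hypothetical) mirror the variables in get_detail. It uses Python's built-in sqlite3 in place of MySQL/pymysql purely so it runs without a database server. The parameterized-insert pattern is the same, but sqlite3 uses `?` placeholders where pymysql uses `%s`. `SCHEMA` and `save_rows` are illustrative names, not part of the original script.

```python
import sqlite3

# Hypothetical five-column table mirroring the values get_detail() inserts;
# the real `spider` table's column names and types are not given in the post.
SCHEMA = """
CREATE TABLE spider (
    img TEXT,
    name TEXT,
    count TEXT,
    ping_size TEXT,
    ping_num TEXT
)
"""


def save_rows(conn, rows):
    """Insert scraped rows with a parameterized query, committing once."""
    # sqlite3 uses "?" placeholders; pymysql uses "%s" for the same purpose.
    sql = "INSERT INTO spider VALUES (?, ?, ?, ?, ?)"
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(sql, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_rows(conn, [("https://example.com/icon.png", "Example Game",
                      "1.2M", "85%", "3,456")])
    print(conn.execute("SELECT name, ping_num FROM spider").fetchall())
    conn.close()
```

    Committing once per batch, as here, is an alternative to the script's commit after every page. The per-page commit keeps the rows scraped before a failure, at the cost of one round trip per row.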
    
  • Original post: https://www.cnblogs.com/zx125/p/12129826.html