  • Batch-scraping list-page data with Python (a practice exercise)

    A practice script in Python that batch-scrapes the articles linked from a list page. Pagination will come next time.

    #!/usr/bin/env python
    # coding=utf-8
    
    import requests
    from bs4 import BeautifulSoup
    import pymysql
    
    import sys, io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8') # Change default encoding to utf8
    
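    print('Connecting to the MySQL server...')
    # Keyword arguments: PyMySQL 1.0+ no longer accepts positional connection parameters.
    db = pymysql.connect(host="localhost", user="root", password="root", database="python")
    print('Connected!')
    cursor = db.cursor()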
    
    hdrs = {'User-Agent':'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'}
    
    url = "http://www.zztez.com/tezgl/"
    
    r = requests.get(url, headers = hdrs)
    soup = BeautifulSoup(r.content.decode('gbk', 'ignore'), 'lxml')
    
    
    def is_article_link(tag):
        # Article links on the list page carry title and href attributes but no target.
        return tag.has_attr('title') and tag.has_attr('href') and not tag.has_attr('target')
    
    for link in soup.find_all(is_article_link):
        # Fetch and parse each detail page; separate names leave the list-page soup untouched.
        page_url = "http://www.zztez.com" + link.get('href')
        page = requests.get(page_url, headers = hdrs)
        page_soup = BeautifulSoup(page.content.decode('gbk', 'ignore'), 'lxml')
    
        # get_text() returns str, so pymysql receives text rather than UTF-8 bytes.
        title = page_soup.find("h1").get_text(strip=True)
        intro = page_soup.select(".intro")[0].get_text(strip=True)
        # Keep the HTML markup of the article body.
        content = str(page_soup.select(".content")[0])
    
        # Check whether an article with this exact title is already stored.
        sql = "SELECT count(*) AS total FROM article WHERE title = %s"
        cursor.execute(sql, (title,))
        one = cursor.fetchone()
    
        # Insert only when no matching row exists.
        if one[0] == 0:
            insert = "INSERT INTO article(title, intro, content) VALUES (%s, %s, %s)"
            cursor.execute(insert, (title, intro, content))
            db.commit()
    
    print('Finished scraping and inserting into the MySQL database.')
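
The check-then-insert step at the end of the script can be tried without a MySQL server. The following is a minimal sketch that swaps in the standard-library `sqlite3` module for `pymysql`, so the placeholders are `?` rather than `%s`. The helper names `insert_if_new` and `make_demo_db` are hypothetical, and the table layout is an assumption: only the column names `title`, `intro`, and `content` come from the script's INSERT statement.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with an assumed article table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (title TEXT, intro TEXT, content TEXT)")
    return conn


def insert_if_new(conn, title, intro, content):
    """Insert an article unless its title is already stored.

    Returns True when a row was inserted, False when it was skipped.
    """
    cur = conn.cursor()
    # Parameters are always passed as a tuple, even for a single value.
    cur.execute("SELECT count(*) FROM article WHERE title = ?", (title,))
    if cur.fetchone()[0] > 0:
        return False
    cur.execute(
        "INSERT INTO article(title, intro, content) VALUES (?, ?, ?)",
        (title, intro, content),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = make_demo_db()
    print(insert_if_new(conn, "Sample title", "Sample intro", "<div>body</div>"))
    print(insert_if_new(conn, "Sample title", "Other intro", "<div>other</div>"))
```

Running the sketch inserts the first article and skips the second, because both share a title. A unique index on `title` would enforce the same rule inside the database, at the cost of handling the resulting constraint error.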
  • Original post: https://www.cnblogs.com/baker95935/p/7744168.html