  • Top-Conference Hot Words: Data Crawling

    import requests
    from bs4 import BeautifulSoup
    import pymysql
    
    # Request headers: a desktop-browser User-Agent string
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
    url = 'http://openaccess.thecvf.com/CVPR2018.py'
    print(url)
    r = requests.get(url, headers=headers)
    content = r.content.decode('utf-8')
    
    soup = BeautifulSoup(content, 'html.parser')
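    # Each paper title is an <a> inside <dt class="ptitle">
    dts = soup.find_all('dt', class_='ptitle')
    print(len(dts))  # number of papers found on the listing page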
    hts = 'http://openaccess.thecvf.com/'
    # Crawl data: visit each paper's detail page
    alllist = []
    for i, dt in enumerate(dts):
        print('Processing paper ' + str(i))
        title = dt.a.text.strip()
        href = hts + dt.a['href']
        r = requests.get(href, headers=headers)
        content = r.content.decode('utf-8')
        soup = BeautifulSoup(content, 'html.parser')
        # The abstract lives in <div id="abstract">; skip pages without one
        divabstract = soup.find(name='div', attrs={"id": "abstract"})
        if divabstract is None:
            continue
        abstract = divabstract.text.strip()
        # Assumes the 5th <a> on the page is the PDF link ("../../content_..."):
        # strip the leading "../../" (6 characters) and prepend the site root
        alllink = soup.select('a')
        link = hts + alllink[4]['href'][6:]
        # The "keywords" are simply the title's words, comma-separated
        keywords = ','.join(title.split(' '))
        alllist.append((title, abstract, link, keywords))
    print(alllist)
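    # Save data (PyMySQL 1.0+ accepts only keyword arguments in connect())
    db = pymysql.connect(host="localhost", user="root", password="123456",
                         database="lunwen", charset='utf8')
    cursor = db.cursor()
    sql_cvpr = "INSERT INTO lunwens(title, abstract, link, keywords) values (%s,%s,%s,%s)"
    try:
        # executemany() takes the list of row tuples directly
        cursor.executemany(sql_cvpr, alllist)
        db.commit()
    except pymysql.MySQLError as e:
        print('Insert failed, rolling back:', e)
        db.rollback()
    finally:
        db.close()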
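The two most fragile steps in the script are building the PDF link by slicing `[6:]` off the href and the batch insert with rollback. The sketch below isolates both so they can be tried offline. It assumes the detail page's PDF href has the form `../../content_cvpr_2018/papers/<name>.pdf` (the `[6:]` slice implies a leading `../../`) and resolves it with the standard library's `urllib.parse.urljoin` instead of string slicing. It also swaps in an in-memory `sqlite3` database as a stand-in for MySQL; note that sqlite3 uses `?` placeholders where pymysql uses `%s`. The helper names `make_keywords`, `resolve_pdf_link` and `save_rows`, and the sample URLs and title, are hypothetical.

```python
import sqlite3
from urllib.parse import urljoin


def make_keywords(title: str) -> str:
    """Same result as the script: the title's words joined by commas."""
    return ','.join(title.split(' '))


def resolve_pdf_link(page_url: str, href: str) -> str:
    """Resolve a relative href (e.g. '../../...') against the page it came from."""
    return urljoin(page_url, href)


def save_rows(conn: sqlite3.Connection, rows) -> bool:
    """Batch-insert rows in one transaction; roll everything back on failure."""
    sql = "INSERT INTO lunwens(title, abstract, link, keywords) VALUES (?, ?, ?, ?)"
    try:
        conn.executemany(sql, rows)
        conn.commit()
        return True
    except sqlite3.Error as e:
        print('Insert failed, rolling back:', e)
        conn.rollback()
        return False


if __name__ == '__main__':
    # Hypothetical detail-page URL and PDF href, shaped like the CVPR 2018 pages.
    page = 'http://openaccess.thecvf.com/content_cvpr_2018/html/Example_Paper_CVPR_2018_paper.html'
    href = '../../content_cvpr_2018/papers/Example_Paper_CVPR_2018_paper.pdf'
    title = 'Example Paper Title'

    row = (title, 'An abstract.', resolve_pdf_link(page, href), make_keywords(title))
    print(row[2])  # http://openaccess.thecvf.com/content_cvpr_2018/papers/Example_Paper_CVPR_2018_paper.pdf
    print(row[3])  # Example,Paper,Title

    # In-memory table with the same columns as the script's lunwens table
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE lunwens(title TEXT, abstract TEXT, link TEXT, keywords TEXT)')
    print(save_rows(conn, [row]))  # True
    print(conn.execute('SELECT COUNT(*) FROM lunwens').fetchone()[0])  # 1
    conn.close()
```

Resolving against the page URL keeps working if the href depth changes, whereas a fixed `[6:]` slice silently produces a broken link.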
  • Original post: https://www.cnblogs.com/jz-no-bug/p/14908396.html