  • Scraping the CVPR 2018 papers with Python

    Summary: scrape the CVPR 2018 papers and store each paper's title, abstract, keywords, and paper link.

    1. Creating the database table (MySQL)

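    Note: the length of abstract varies from paper to paper, so declare that column as TEXT; a short fixed-length type is a pitfall here.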
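
The table definition itself is not reproduced in the post. A minimal sketch of what it might look like follows, assuming the table name `lunwens` and the four column names used by the INSERT statement in the scraper. The `id` column, the `VARCHAR` lengths, and the helper `create_table` are assumptions made for illustration; only the TEXT type for `abstract` comes from the note above.

```python
# Hypothetical schema: the table and column names come from the scraper's
# INSERT statement; the id column and the VARCHAR lengths are assumptions.
CREATE_LUNWENS = """
CREATE TABLE IF NOT EXISTS lunwens (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    abstract TEXT,
    link VARCHAR(255),
    keywords VARCHAR(255)
) DEFAULT CHARSET = utf8
"""


def create_table(conn):
    """Run the DDL on an open DB-API connection, e.g. one from pymysql.connect."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_LUNWENS)
        conn.commit()
    finally:
        cursor.close()
```

Calling `create_table(db)` once on a connection to the `lunwen` database, before the scraper runs, should be enough; `IF NOT EXISTS` makes the call safe to repeat.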

    2. Scraping with Python

    import requests
    from bs4 import BeautifulSoup
    import pymysql
    
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}  # build the request headers
    url = 'http://openaccess.thecvf.com/CVPR2018.py'
    print(url)
    r = requests.get(url, headers=headers)
    content = r.content.decode('utf-8')
    
    soup = BeautifulSoup(content, 'html.parser')
    dts = soup.find_all('dt', class_='ptitle')
    print(dts)
    hts = 'http://openaccess.thecvf.com/'
    # Scrape the data
    alllist = []
    for i in range(len(dts)):
        print('Paper #' + str(i))
        title = dts[i].a.text.strip()
        href = hts + dts[i].a['href']
        r = requests.get(href, headers=headers)
        content = r.content.decode('utf-8')
        soup = BeautifulSoup(content, 'html.parser')
        # print(title,href)
        divabstract = soup.find(name='div', attrs={"id": "abstract"})
        abstract = divabstract.text.strip()
        # print('No. ' + str(i) + ':', abstract)
        alllink = soup.select('a')
        link = hts + alllink[4]['href'][6:]
        # Use the words of the title as comma-separated keywords
        keywords = ','.join(str(title).split(' '))
        value = (title, abstract, link, keywords)
        alllist.append(value)
    print(alllist)
    tuplist = tuple(alllist)
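    # Save the data (keyword arguments: recent PyMySQL releases no longer accept positional ones)
    db = pymysql.connect(host="localhost", user="root", password="123456", database="lunwen", charset='utf8')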
    cursor = db.cursor()
    sql_cvpr = "INSERT INTO lunwens(title, abstract, link, keywords) values (%s,%s,%s,%s)"
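    try:
        cursor.executemany(sql_cvpr, tuplist)
        db.commit()
    except Exception as e:
        # Report the error instead of swallowing it, then undo the partial insert
        print('Insert failed, rolling back:', e)
        db.rollback()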
    db.close()
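
The slice `[6:]` in the script presumably strips a leading `../../` from the PDF href, which only works while every detail page uses exactly that prefix. A minimal sketch of an alternative is to let the standard library's `urllib.parse.urljoin` resolve the href against the URL of the page it was found on. The helper name `resolve_pdf_link` and the sample paths are hypothetical, made up for illustration rather than taken from the site.

```python
from urllib.parse import urljoin


def resolve_pdf_link(page_url, href):
    """Resolve a (possibly relative) href against the URL of the page it appeared on."""
    return urljoin(page_url, href)


# Hypothetical paths, for illustration only.
page_url = 'http://openaccess.thecvf.com/content_cvpr_2018/html/Example_Paper_CVPR_2018_paper.html'
href = '../../content_cvpr_2018/papers/Example_Paper_CVPR_2018_paper.pdf'
print(resolve_pdf_link(page_url, href))
```

In the loop above this would read `link = urljoin(href, alllink[4]['href'])`, since `href` already holds the detail page URL. The lookup `alllink[4]` still depends on the PDF link being the fifth anchor on the page, so that part remains position-dependent.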
  • Original article: https://www.cnblogs.com/MoooJL/p/12782860.html