  • Saving Python crawler data in CSV format

    Saving from the command line (Scrapy feed export, where ju is the spider name):
    scrapy crawl ju -o ju.csv
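The same export can also be declared in the project settings instead of on the command line. A minimal sketch, assuming a Scrapy version that supports the FEEDS setting and reusing the ju.csv file name from the command above; the encoding entry is an illustrative choice, not something the original specifies:

```python
# settings.py (sketch): export scraped items to ju.csv in CSV format
FEEDS = {
    "ju.csv": {
        "format": "csv",
        "encoding": "utf-8",  # assumption: UTF-8 output is wanted
    },
}
```

With this in place, a plain `scrapy crawl ju` should write the feed without the -o option.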

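    Method 1: format each record as a comma-separated line and write it directly:
    with open("F:/book_top250.csv", "w") as f:
        # One record per line; fields that themselves contain commas will break the columns
        f.write("{},{},{},{},{}\n".format(book_name, rating, rating_num, comment, book_link))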
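One pitfall of formatting the line by hand is that a field containing a comma is split into two columns. The sketch below illustrates this with a hypothetical record (the title, numbers, and URL are made up) and compares it with the csv module used in the second method, which quotes such fields:

```python
import csv
import io

# Hypothetical record whose title contains a comma
row = ["Gone, Girl", "8.5", "1234", "a thriller", "https://example.com/book/1"]

# Hand-formatted line: the comma inside the title creates an extra column
naive_line = "{},{},{},{},{}".format(*row)
naive_fields = naive_line.split(",")

# csv.writer quotes the title, so the row reads back unchanged
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue(), newline="")))

print(len(naive_fields))  # 6
print(parsed == row)      # True
```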


    Method 2: use the csv module:
    with open("F:/book_top250.csv", "w", newline="") as f:  # without newline="", the rows come out separated by blank lines
        w = csv.writer(f)
        w.writerow([book_name, rating, rating_num, comment, book_link])
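The reason newline="" matters is that csv.writer ends each row with "\r\n" by default. If the file is opened in text mode without newline="", Windows translates the "\n" again, producing "\r\r\n", which spreadsheet programs show as an empty row. A small sketch using an in-memory buffer (the sample values are placeholders) makes the terminator visible:

```python
import csv
import io

# newline="" disables newline translation, as open(..., newline="") does
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(["book_name", "rating"])
writer.writerow(["Example Title", "9.0"])

text = buf.getvalue()
print(repr(text))  # 'book_name,rating\r\nExample Title,9.0\r\n'
```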


    Full code for method 1:
    import requests
    from lxml import etree
    import time

    # The 10 result pages of the Douban Books Top 250 list (start=0, 25, ..., 225)
    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv", "w") as f:
        for url in urls:
            r = requests.get(url)
            selector = etree.HTML(r.text)

            books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
            for book in books:
                book_name = book.xpath('./div[1]/a/@title')[0]
                rating = book.xpath('./div[2]/span[2]/text()')[0]
                # Strip leading/trailing "(", ")", newline, and space characters
                rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('()\n ')
                try:
                    comment = book.xpath('./p[2]/span/text()')[0]
                except IndexError:  # the XPath matched nothing, so there is no comment
                    comment = ""
                book_link = book.xpath('./div[1]/a/@href')[0]
                f.write("{},{},{},{},{}\n".format(book_name, rating, rating_num, comment, book_link))

            time.sleep(1)  # pause between pages


    Full code for method 2:
    import requests
    from lxml import etree
    import time
    import csv

    # The 10 result pages of the Douban Books Top 250 list (start=0, 25, ..., 225)
    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv", "w", newline='') as f:
        w = csv.writer(f)  # create the writer once and reuse it for every row
        for url in urls:
            r = requests.get(url)
            selector = etree.HTML(r.text)

            books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
            for book in books:
                book_name = book.xpath('./div[1]/a/@title')[0]
                rating = book.xpath('./div[2]/span[2]/text()')[0]
                # Strip leading/trailing "(", ")", newline, and space characters
                rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('()\n ')
                try:
                    comment = book.xpath('./p[2]/span/text()')[0]
                except IndexError:  # the XPath matched nothing, so there is no comment
                    comment = ""
                book_link = book.xpath('./div[1]/a/@href')[0]

                w.writerow([book_name, rating, rating_num, comment, book_link])

            time.sleep(1)  # pause between pages
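Neither method writes a header row or sets the file encoding. One possible extension, sketched here under assumptions that are not in the original, is to use csv.DictWriter with the same five field names and the utf-8-sig encoding, which adds a byte order mark that helps Excel detect UTF-8. The record and the temporary output path below are placeholders standing in for scraped data and for F:/book_top250.csv:

```python
import csv
import os
import tempfile

FIELDS = ["book_name", "rating", "rating_num", "comment", "book_link"]

# Hypothetical record standing in for one scraped book
records = [
    {"book_name": "示例书名", "rating": "9.0", "rating_num": "1000",
     "comment": "", "book_link": "https://example.com/book/1"},
]

# Temporary path used only so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), "book_top250.csv")

# utf-8-sig writes a BOM at the start of the file
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    w = csv.DictWriter(f, fieldnames=FIELDS)
    w.writeheader()       # first row: the column names
    w.writerows(records)  # one row per record

# Read the file back to check the header and the data row
with open(path, newline="", encoding="utf-8-sig") as f:
    rows = list(csv.DictReader(f))

print(len(rows), rows[0]["rating"])  # 1 9.0
```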

  • Original article: https://www.cnblogs.com/duanlinxiao/p/9820685.html