  • Saving Python crawler data in CSV format

    Saving from the command line with Scrapy (here ju is the spider name):
    scrapy crawl ju -o ju.csv

    Method one (write a formatted string for each row):
    with open("F:/book_top250.csv","w") as f:
        f.write("{},{},{},{},{}\n".format(book_name, rating, rating_num, comment, book_link))
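    One caveat with method one is worth illustrating: because the fields are joined by hand, a comma inside any field shifts the columns when the file is read back. The sketch below uses a made-up sample row (the values and the example.com link are assumptions, not scraped data) and parses the hand-built line with the standard csv module.

```python
import csv
import io

# Hypothetical sample row; note the comma inside the comment field.
row = ["Book A", "9.0", "12345", "short, sharp and moving", "https://example.com/book/1"]

# Method one: join the five fields by hand.
manual_line = "{},{},{},{},{}\n".format(*row)

# Parsing the line back splits the comment into two columns.
parsed = next(csv.reader(io.StringIO(manual_line)))
print(len(parsed))  # 6 fields instead of 5
```

    If the scraped text can contain commas, quotes, or line breaks, the csv module in method two is the safer choice because it quotes such fields automatically.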


    Method two (use the csv module):
    with open("F:/book_top250.csv","w",newline="") as f:  # without newline="", the scraped rows are separated by blank lines
        w = csv.writer(f)
        w.writerow([book_name, rating, rating_num, comment, book_link])
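    To see what csv.writer and newline="" actually do, here is a minimal sketch that writes one row to an in-memory buffer instead of a file. The sample row is hypothetical, and io.StringIO(newline="") stands in for open(path, "w", newline="").

```python
import csv
import io

# Hypothetical sample row with a comma inside the comment field.
row = ["Book A", "9.0", "12345", "short, sharp and moving", "https://example.com/book/1"]

# newline="" disables newline translation, just as it does for open().
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(row)

written = buffer.getvalue()
print(repr(written))  # the comment is quoted and the row ends with "\r\n"

# Reading the text back yields the original five fields.
restored = next(csv.reader(io.StringIO(written, newline="")))
print(restored == row)
```

    csv.writer ends each row with "\r\n" by default. On Windows, a file opened in text mode without newline="" translates the "\n" again, so every row ends in "\r\r\n", which shows up as a blank line between rows.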


    Full code for method one:
    import requests
    from lxml import etree
    import time

    # The Top 250 list spans 10 pages of 25 books each
    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv", "w") as f:
        for url in urls:
            r = requests.get(url)
            selector = etree.HTML(r.text)

            books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
            for book in books:
                book_name = book.xpath('./div[1]/a/@title')[0]
                rating = book.xpath('./div[2]/span[2]/text()')[0]
                # Strip leading and trailing "(", ")", newline and space characters
                rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('()\n ')
                try:
                    comment = book.xpath('./p[2]/span/text()')[0]
                except IndexError:  # the book has no one-line comment
                    comment = ""
                book_link = book.xpath('./div[1]/a/@href')[0]
                f.write("{},{},{},{},{}\n".format(book_name, rating, rating_num, comment, book_link))

            time.sleep(1)  # pause between pages


    Full code for method two:
    import requests
    from lxml import etree
    import time
    import csv

    # The Top 250 list spans 10 pages of 25 books each
    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv", "w", newline='') as f:
        w = csv.writer(f)  # create the writer once, before the loops
        for url in urls:
            r = requests.get(url)
            selector = etree.HTML(r.text)

            books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
            for book in books:
                book_name = book.xpath('./div[1]/a/@title')[0]
                rating = book.xpath('./div[2]/span[2]/text()')[0]
                # Strip leading and trailing "(", ")", newline and space characters
                rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('()\n ')
                try:
                    comment = book.xpath('./p[2]/span/text()')[0]
                except IndexError:  # the book has no one-line comment
                    comment = ""
                book_link = book.xpath('./div[1]/a/@href')[0]

                w.writerow([book_name, rating, rating_num, comment, book_link])

            time.sleep(1)  # pause between pages
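
    As an optional variation on method two, the following sketch adds a header row and an explicit encoding. The helper name save_books, the FIELDS list, the temporary output path, and the sample record are all assumptions made for illustration; only the column names come from the variables used above. The utf-8-sig encoding writes a byte order mark, which helps Excel display Chinese titles correctly.

```python
import csv
import os
import tempfile

# Column names taken from the variables used in the scripts above.
FIELDS = ["book_name", "rating", "rating_num", "comment", "book_link"]

def save_books(path, books):
    """Write an iterable of dicts to a CSV file with a header row (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(books)

# Hypothetical record standing in for one scraped book.
sample = [{
    "book_name": "示例书名",
    "rating": "9.0",
    "rating_num": "12345",
    "comment": "",
    "book_link": "https://example.com/book/1",
}]

# Write to a temporary directory rather than a fixed drive path.
out_path = os.path.join(tempfile.mkdtemp(), "book_top250.csv")
save_books(out_path, sample)

# Read the file back to check that the round trip preserves the data.
with open(out_path, newline="", encoding="utf-8-sig") as f:
    loaded = list(csv.DictReader(f))
print(loaded == sample)
```

    Collecting the scraped fields into dicts and writing them in one place also keeps the parsing loop separate from the file handling.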

  • Original article: https://www.cnblogs.com/duanlinxiao/p/9820685.html