zoukankan      html  css  js  c++  java
  • python 爬虫数据存入csv格式方法

    python 爬虫数据存入csv格式方法

    命令存储方式:
    scrapy crawl ju -o ju.csv

    第一种方法:
    with open("F:/book_top250.csv","w") as f:
    f.write("{},{},{},{},{} ".format(book_name ,rating, rating_num,comment, book_link))
    复制代码


    第二种方法:
    with open("F:/book_top250.csv","w",newline="") as f: ##如果不添加newline="",爬取信息会隔行显示
    w = csv.writer(f)
    w.writerow([book_name ,rating, rating_num,comment, book_link])
    复制代码


    方法一的代码:
    import requests
    from lxml import etree
    import time

    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv","w") as f:
    for url in urls:
    r = requests.get(url)
    selector = etree.HTML(r.text)

    books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
    for book in books:
    book_name = book.xpath('./div[1]/a/@title')[0]
    rating = book.xpath('./div[2]/span[2]/text()')[0]
    rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('() ') #去除包含"(",")"," "," "的首尾字符
    try:
    comment = book.xpath('./p[2]/span/text()')[0]
    except:
    comment = ""
    book_link = book.xpath('./div[1]/a/@href')[0]
    f.write("{},{},{},{},{} ".format(book_name ,rating, rating_num,comment, book_link))

    time.sleep(1)
    复制代码


    方法二的代码:
    import requests
    from lxml import etree
    import time
    import csv

    urls = ['https://book.douban.com/top250?start={}'.format(i * 25) for i in range(10)]
    with open("F:/book_top250.csv","w",newline='') as f:
    for url in urls:
    r = requests.get(url)
    selector = etree.HTML(r.text)

    books = selector.xpath('//*[@id="content"]/div/div[1]/div/table/tr/td[2]')
    for book in books:
    book_name = book.xpath('./div[1]/a/@title')[0]
    rating = book.xpath('./div[2]/span[2]/text()')[0]
    rating_num = book.xpath('./div[2]/span[3]/text()')[0].strip('() ') #去除包含"(",")"," "," "的首尾字符
    try:
    comment = book.xpath('./p[2]/span/text()')[0]
    except:
    comment = ""
    book_link = book.xpath('./div[1]/a/@href')[0]

    w = csv.writer(f)
    w.writerow([book_name ,rating, rating_num,comment, book_link])
    time.sleep(1)

  • 相关阅读:
    gojs常用API-画布定义
    页面开发的标准
    iis7.5做反向代理配置方法实例图文教程
    Tomcat实现反向代理
    nodejs的package.json依赖dependencies中 ^ 和 ~ 的区别
    dependencies与devDependencies的区别
    常见的cmd命令
    解决SecureCRT中文显示乱码
    ASP防XSS代码
    Android页面之间进行数据回传
  • 原文地址:https://www.cnblogs.com/duanlinxiao/p/9820685.html
Copyright © 2011-2022 走看看