  • Saving data with sqlite3 in Scrapy

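    This example crawls a Ganji (赶集网) listing page: http://bj.ganji.com/fang1/chaoyang/

    The title and price of each listing are extracted with XPath.

    The code for the spider, items, and pipelines is posted below, in that order.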

    # -*- coding: utf-8 -*-
    import scrapy
    from ..items import RenthouseItem


    class GanjiSpider(scrapy.Spider):
        name = 'ganji'
        # allowed_domains = ['bj.ganji.com']
        start_urls = ['http://bj.ganji.com/fang1/chaoyang/']

        def parse(self, response):
            # Each XPath returns a flat list of strings, one entry per listing.
            title_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[1]/a/text()').extract()
            price_list = response.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd[5]/div[1]/span[1]/text()').extract()
            # Pair titles with prices by position; zip stops at the shorter list.
            for title, price in zip(title_list, price_list):
                # Create a fresh item per listing so yielded items do not share state.
                rh = RenthouseItem()
                rh['title'] = title
                rh['price'] = price
                yield rh

    # -*- coding: utf-8 -*-

    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://doc.scrapy.org/en/latest/topics/items.html

    import scrapy


    class RenthouseItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        title = scrapy.Field()
        price = scrapy.Field()

    # -*- coding: utf-8 -*-

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
    import sqlite3


    class RenthousePipeline(object):
        def open_spider(self, spider):
            # Called once when the spider starts: connect and create a cursor.
            # The renthouse table is assumed to exist already.
            self.con = sqlite3.connect('renthouse.sqlite')
            self.cu = self.con.cursor()

        def process_item(self, item, spider):
            # Use ? placeholders so quotes in a title cannot break the statement.
            insert_sql = 'insert into renthouse (title, price) values (?, ?)'
            self.cu.execute(insert_sql, (item['title'], item['price']))
            self.con.commit()
            return item

        def close_spider(self, spider):
            # Scrapy calls close_spider (not spider_close) when the spider finishes.
            self.con.close()

    In the spider, rh = RenthouseItem() creates an item instance, rh, which is what gets passed on to pipelines for processing.
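
A side note on yielding items in a loop: if one item object is created once and then mutated on every pass, each yield hands out the same object, so any code that keeps references to earlier items ends up seeing only the last values. The sketch below illustrates the difference with plain dicts standing in for RenthouseItem. The function names and sample data are made up for illustration and are not part of the original project.

```python
def shared_item(titles, prices):
    """Reuse one dict for every yield (the pattern to avoid)."""
    row = {}
    for title, price in zip(titles, prices):
        row['title'] = title
        row['price'] = price
        yield row


def fresh_item(titles, prices):
    """Build a new dict for every yield."""
    for title, price in zip(titles, prices):
        yield {'title': title, 'price': price}


titles = ['Room A', 'Room B']   # hypothetical sample data
prices = ['3000', '4500']

# Collecting the results keeps references to every yielded object.
shared = list(shared_item(titles, prices))
fresh = list(fresh_item(titles, prices))
```

With the shared version, both list entries are the same dict and hold the values of the last listing; with the fresh version, each entry keeps its own values.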

    So each time, rh carries a dict-like record (title and price) to pipelines, where an SQL statement inserts it into sqlite3.
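
One way to write that insert is with the ? placeholders of the sqlite3 module, which let the driver handle quoting instead of building the statement with string formatting. This is a minimal sketch: the in-memory database, the column types, the helper name insert_listing, and the sample record are all assumptions made so the example runs on its own.

```python
import sqlite3


def insert_listing(con, item):
    """Insert one scraped record using ? placeholders."""
    con.execute(
        'insert into renthouse (title, price) values (?, ?)',
        (item['title'], item['price']),
    )
    con.commit()


# In-memory database and schema are assumptions made for this sketch.
con = sqlite3.connect(':memory:')
con.execute('create table renthouse (title text, price text)')

# A title containing a double quote would break a statement built with
# str.format and hand-written quotes; placeholders store it unchanged.
item = {'title': 'Sunny 2-room "near subway"', 'price': '5200'}
insert_listing(con, item)
rows = con.execute('select title, price from renthouse').fetchall()
```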

    open_spider runs when the spider is opened, so that is where we connect to the database. I found this article's explanation of cursors and sqlite usage very clear: https://www.cnblogs.com/qq78292959/archive/2013/04/01/2993327.html
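
The post does not show how the renthouse table itself was created, so the insert only works if the table already exists. One option is to create it in the same step that opens the connection. The sketch below assumes two text columns and uses a hypothetical helper name, open_database; neither comes from the original code.

```python
import sqlite3


def open_database(path):
    """Connect, create the table if missing, and return (connection, cursor)."""
    con = sqlite3.connect(path)
    cu = con.cursor()
    # Column types are an assumption; the post does not show the schema.
    cu.execute(
        'create table if not exists renthouse (title text, price text)'
    )
    con.commit()
    return con, cu


con, cu = open_database(':memory:')
# Running the statement again is harmless because of "if not exists".
cu.execute('create table if not exists renthouse (title text, price text)')
cu.execute("select name from sqlite_master where type = 'table'")
tables = [name for (name,) in cu.fetchall()]
con.close()
```

Inside the pipeline, the same two statements could go at the top of open_spider, with 'renthouse.sqlite' as the path.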

    Note: after execute() runs a data-modifying statement such as insert, you must call commit(), otherwise the change is not saved!
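
The effect of a missing commit can be checked with a small experiment. The sketch below, which relies on the sqlite3 module's default transaction handling, inserts one row into a throwaway database file, closes the connection with or without committing, then reopens the file and counts the rows. The helper name rows_after_reopen, the file name, and the sample row are made up for illustration.

```python
import os
import sqlite3
import tempfile


def rows_after_reopen(commit):
    """Insert one row, optionally commit, close, reopen, and count rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.sqlite')  # throwaway file for the demo

        con = sqlite3.connect(path)
        con.execute('create table renthouse (title text, price text)')
        con.commit()
        con.execute(
            'insert into renthouse (title, price) values (?, ?)',
            ('Room A', '3000'),
        )
        if commit:
            con.commit()
        con.close()  # closing does not commit pending changes

        # Reopen the file to see what was actually written to disk.
        con = sqlite3.connect(path)
        (count,) = con.execute('select count(*) from renthouse').fetchone()
        con.close()
        return count


without_commit = rows_after_reopen(commit=False)
with_commit = rows_after_reopen(commit=True)
```

Here without_commit ends up as 0 and with_commit as 1: the uncommitted insert is discarded when the connection closes.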

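    close_spider runs when the spider is closed, so that is where we close the database connection. The method must be named close_spider for Scrapy to call it.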
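
Scrapy only calls these hooks if the pipeline is enabled in the project's settings.py, which the post does not show. A sketch of that setting follows; the package name renthouse is an assumption based on the database and item names, so substitute the real module path of your project.

```python
# settings.py (sketch)
# "renthouse" as the project package name is an assumption; use your own
# project's module path. The number (conventionally 0-1000) sets the order
# in which pipelines run, lower values first.
ITEM_PIPELINES = {
    'renthouse.pipelines.RenthousePipeline': 300,
}
```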

    The insert succeeded.
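
A quick way to confirm this is to read a few rows back. The sketch below uses a hypothetical helper, read_listings, and runs against an in-memory stand-in database with made-up rows so that it is self-contained.

```python
import sqlite3


def read_listings(con, limit=5):
    """Return up to `limit` (title, price) rows from the renthouse table."""
    cur = con.execute(
        'select title, price from renthouse order by rowid limit ?',
        (limit,),
    )
    return cur.fetchall()


# Stand-in database with hypothetical rows, so the sketch runs on its own.
con = sqlite3.connect(':memory:')
con.execute('create table renthouse (title text, price text)')
con.executemany(
    'insert into renthouse (title, price) values (?, ?)',
    [('Room A', '3000'), ('Room B', '4500')],
)
con.commit()

listings = read_listings(con)
for title, price in listings:
    print(title, ':', price)
con.close()
```

To check the real crawl output, pass sqlite3.connect('renthouse.sqlite') instead of the in-memory connection.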

  • Original article: https://www.cnblogs.com/ducklu/p/9029993.html