  • Scrapy framework workflow

    At the command prompt:
    scrapy startproject anjuke1                                        # create the project
    cd anjuke1                                                         # switch into the project directory
    scrapy genspider anjuke tianjin.anjuke.com/sale/?from=navigation   # generate the spider
    In PyCharm, data is extracted with Scrapy selector syntax, for example (a fuller spider sketch follows below):
    base = response.xpath('').extract()   # Scrapy XPath syntax
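    For reference, a minimal sketch of what anjuke.py could look like once parse() is filled in. The field names a, b, c, c1, c2, c3, e, f, g match the columns the pipeline below inserts, but the XPath expressions for the listing page are placeholders, not the original author's selectors:
    import scrapy

    class AnjukeSpider(scrapy.Spider):
        name = 'anjuke'
        allowed_domains = ['tianjin.anjuke.com']
        start_urls = ['https://tianjin.anjuke.com/sale/?from=navigation']

        def parse(self, response):
            # one entry per listing; the XPath below is a placeholder for the real page structure
            for house in response.xpath('//ul[@class="houselist-mod-new"]/li'):
                item = {}
                # field names follow the columns used in the pipeline's INSERT statement
                item['a'] = house.xpath('.//div[@class="house-title"]/a/text()').extract_first(default='').strip()
                item['b'] = house.xpath('.//span[@class="price-det"]/strong/text()').extract_first(default='')
                item['c'] = house.xpath('.//div[@class="details-item"]/span[1]/text()').extract_first(default='')
                item['c1'] = house.xpath('.//div[@class="details-item"]/span[2]/text()').extract_first(default='')
                item['c2'] = house.xpath('.//div[@class="details-item"]/span[3]/text()').extract_first(default='')
                item['c3'] = house.xpath('.//div[@class="details-item"]/span[4]/text()').extract_first(default='')
                item['e'] = house.xpath('.//span[@class="comm-address"]/@title').extract_first(default='')
                item['f'] = house.xpath('.//span[@class="unit-price"]/text()').extract_first(default='')
                item['g'] = response.url
                yield item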
    Create a run.py file in the project root so the crawl can be started from PyCharm; its content:
    from scrapy.cmdline import execute
    execute(['scrapy', 'crawl', 'anjuke'])   # equivalent to running `scrapy crawl anjuke` from the project root
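    The pipeline in the next step inserts into a MySQL table named anjuke1 with columns a, b, c, c1, c2, c3, e, f, g, so a table along these lines has to exist first (the VARCHAR column types are an assumption, not from the original post):
    CREATE TABLE IF NOT EXISTS anjuke1 (
        a  VARCHAR(255),
        b  VARCHAR(255),
        c  VARCHAR(255),
        c1 VARCHAR(255),
        c2 VARCHAR(255),
        c3 VARCHAR(255),
        e  VARCHAR(255),
        f  VARCHAR(255),
        g  VARCHAR(255)
    ) DEFAULT CHARSET=utf8;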
    The content of pipelines.py, which writes each item into MySQL:
    import pymysql

    class Anjuke1Pipeline:
        def __init__(self, user, password, host, db_name):
            self.user = user          # MySQL user, e.g. root
            self.password = password  # MySQL password
            self.host = host          # e.g. 127.0.0.1
            self.db_name = db_name    # database name

        @classmethod
        def from_crawler(cls, crawler):
            # pull the database settings (DB_USER, DB_PWD, HOST, DB_NAME) from settings.py
            return cls(user=crawler.settings.get('DB_USER'),
                       password=crawler.settings.get('DB_PWD'),
                       host=crawler.settings.get('HOST'),
                       db_name=crawler.settings.get('DB_NAME'))

        def open_spider(self, spider):
            # open the MySQL connection once, when the spider starts
            self.db = pymysql.connect(user=self.user, password=self.password,
                                      host=self.host, db=self.db_name, charset='utf8')
            self.cursor = self.db.cursor()

        def process_item(self, item, spider):
            # parameterised INSERT; safer than building the SQL string with str.format()
            sql = ("insert into anjuke1(a, b, c, c1, c2, c3, e, f, g) "
                   "values (%s, %s, %s, %s, %s, %s, %s, %s, %s)")
            self.cursor.execute(sql, (item['a'], item['b'], item['c'], item['c1'], item['c2'],
                                      item['c3'], item['e'], item['f'], item['g']))
            self.db.commit()  # persist the row
            return item

        def close_spider(self, spider):
            # close the MySQL connection when the spider finishes
            self.db.close()
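    from_crawler reads the connection parameters from settings.py, and the pipeline itself has to be enabled there as well. A sketch of the relevant settings.py entries (the setting names DB_USER, DB_PWD, HOST and DB_NAME come from the pipeline above; the concrete values are placeholders):
    # settings.py (excerpt)

    # database settings read by Anjuke1Pipeline.from_crawler
    DB_USER = 'root'              # placeholder
    DB_PWD = 'your_password'      # placeholder
    HOST = '127.0.0.1'
    DB_NAME = 'anjuke'            # placeholder database name

    # register the pipeline so process_item() is actually called
    ITEM_PIPELINES = {
        'anjuke1.pipelines.Anjuke1Pipeline': 300,
    }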

  • Original post: https://www.cnblogs.com/wbf980728/p/14583086.html