  • Request in Scrapy

    Initialization parameters
    class scrapy.http.Request(
    url [ ,
    callback,
    method='GET',
    headers,
    body,
    cookies,
    meta,
    encoding='utf-8',
    priority=0,
    dont_filter=False,
    errback ] )
    
    
    1. Creating a Request
    def parse_page1(self, response):
        return scrapy.Request("http://www.example.com/some_page.html",
                              callback=self.parse_page2)
    
    def parse_page2(self, response):
        # this would log http://www.example.com/some_page.html
        self.logger.info("Visited %s", response.url)
    
    2. Passing data through a Request
    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        request.meta['item'] = item
        yield request
    
    def parse_page2(self, response):
        item = response.meta['item']
        item['other_url'] = response.url
        yield item
    
    3. Special keys in Request.meta
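
Scrapy and its built-in middlewares give a handful of Request.meta keys a special meaning, for example download_timeout, proxy, handle_httpstatus_list, dont_redirect and cookiejar. The sketch below only collects such keys into a plain dict. The helper build_request_meta and its parameter names are hypothetical and not part of Scrapy; only the dictionary keys are names Scrapy recognizes.

```python
def build_request_meta(timeout_seconds=None, proxy_url=None,
                       allowed_statuses=(), follow_redirects=True,
                       cookiejar_id=None):
    """Collect some of Scrapy's special Request.meta keys into one dict.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    meta = {}
    if timeout_seconds is not None:
        # seconds the downloader waits before timing out
        meta["download_timeout"] = timeout_seconds
    if proxy_url is not None:
        # proxy to use for this request, e.g. "http://host:port"
        meta["proxy"] = proxy_url
    if allowed_statuses:
        # non-2xx status codes the callback is still allowed to receive
        meta["handle_httpstatus_list"] = list(allowed_statuses)
    if not follow_redirects:
        # tell the redirect middleware to leave this request alone
        meta["dont_redirect"] = True
    if cookiejar_id is not None:
        # keep a separate cookie session per identifier
        meta["cookiejar"] = cookiejar_id
    return meta


meta = build_request_meta(timeout_seconds=10,
                          allowed_statuses=(404,),
                          follow_redirects=False)
print(meta)
```

The resulting dict would then be passed along as scrapy.Request(url, meta=meta, callback=...).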
    
    
    4. The main subclass, FormRequest, used for logging in
    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]
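
When formdata is given, FormRequest sends a POST request whose body is the URL-encoded form. The standard-library sketch below approximates what that body looks like for the formdata above. It is not Scrapy's own code, and encode_form is a hypothetical name used only for illustration.

```python
from urllib.parse import urlencode


def encode_form(formdata):
    """URL-encode a dict of form fields the way an HTML form POST would."""
    return urlencode(formdata)


body = encode_form({'name': 'John Doe', 'age': '27'})
print(body)  # name=John+Doe&age=27
```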
    
    A more detailed login example
    import scrapy
    
    class LoginSpider(scrapy.Spider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']
    
        def parse(self, response):
            return scrapy.FormRequest.from_response(
                response,
                formdata={'username': 'john', 'password': 'secret'},
                callback=self.after_login
            )
    
        def after_login(self, response):
            # check that login succeeded before going on
            # (response.body is bytes on Python 3, so compare against bytes)
            if b"authentication failed" in response.body:
                self.logger.error("Login failed")
                return
    
            # continue scraping with authenticated session...
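
On Python 3, response.body is bytes, so the failure marker has to be bytes as well; testing a str against bytes raises TypeError. A minimal sketch of such a check as a standalone function follows. The name login_failed and the default marker are assumptions for illustration; a real site would need its own marker.

```python
def login_failed(body, marker=b"authentication failed"):
    """Return True if the failure marker occurs in the response body.

    Accepts bytes (as in response.body) or str (as in response.text).
    """
    if isinstance(body, str):
        # normalize text to bytes so one comparison works for both inputs
        body = body.encode("utf-8")
    return marker in body


print(login_failed(b"<p>authentication failed</p>"))
print(login_failed(b"<p>Welcome back, john</p>"))
```

Inside the spider, after_login could call login_failed(response.body) and return early when it is true.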
  • Original article: https://www.cnblogs.com/themost/p/7106250.html