  • Request in Scrapy

    Constructor parameters
    class scrapy.http.Request(
        url [,
        callback,
        method='GET',
        headers,
        body,
        cookies,
        meta,
        encoding='utf-8',
        priority=0,
        dont_filter=False,
        errback ] )
    
    
    1. Creating a Request
    def parse_page1(self, response):
        return scrapy.Request("http://www.example.com/some_page.html",
                              callback=self.parse_page2)
    
    def parse_page2(self, response):
        # this would log http://www.example.com/some_page.html
        self.logger.info("Visited %s", response.url)
    
    2. Passing data through a Request
    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        request.meta['item'] = item
        yield request
    
    def parse_page2(self, response):
        item = response.meta['item']
        item['other_url'] = response.url
        yield item
    
    3. Special keys in Request.meta
    
    
    4. The main subclass, FormRequest, used for logging in
    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]
    
    A more detailed login example
    import scrapy
    
    class LoginSpider(scrapy.Spider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']
    
        def parse(self, response):
            return scrapy.FormRequest.from_response(
                response,
                formdata={'username': 'john', 'password': 'secret'},
                callback=self.after_login
            )
    
        def after_login(self, response):
            # check that the login succeeded before going on
            # (response.text is str; response.body is bytes in Python 3)
            if "authentication failed" in response.text:
                self.logger.error("Login failed")
                return
    
            # continue scraping with authenticated session...
  • Original post: https://www.cnblogs.com/themost/p/7106250.html