class scrapy.http.Request(url[, callback, method="GET", headers, body, cookies, meta, encoding='utf-8', priority=0, dont_filter=False, errback])
参数详解:
- url : 目标请求地址
- callback : 指定处理该请求响应的回调函数, 默认为 spider 的 parse 方法
- method : 指定http方法, 默认为GET
- meta : request.meta 可以传一些键值对
- body : 请求正文, 二进制内容
- headers : http请求头
- cookies: 附带在请求中要一起发出的cookies对象
- encoding : 当前请求的编码方式, 默认为'utf-8'
- priority : 设置请求的优先级, 默认为0, 这个优先级是scheduler在线程中用于定义处理请求的顺序
- dont_filter : 默认为False, 设置为True则不过滤请求
- errback: 当请求发生任何异常时就会调用此回调函数
import scrapy
from scrapy.linkextractors import LinkExtractor


class DeepInSpider(scrapy.Spider):
    """Spider that extracts every link from the start page and follows each one.

    Links whose URL contains 'detail' are routed to :meth:`parse_detail`;
    all other links are requested with no callback (Scrapy then falls back
    to the default ``parse``).
    """

    name = 'example.com'
    start_urls = ['https://www.baidu.com']

    def parse(self, response):
        """Extract links from *response* and yield one Request per unique link.

        Fixes over the original draft:
        - ``senn`` was an undefined name (typo for ``seen``).
        - ``link.contains('detail)`` had an unterminated string and ``Link``
          objects have no ``contains`` method; replaced with a substring
          test on ``link.url``.
        - ``scrapy.Request(url=link, ...)`` passed a ``Link`` object where a
          URL string is required, and the same request was yielded twice;
          a single request with ``link.url`` is yielded instead.
        """
        link_extractor = LinkExtractor()
        seen = set()  # de-duplicate links within this response
        extracted = link_extractor.extract_links(response)
        links = [link for link in extracted if link not in seen]
        for link in links:
            print(link.url)
            seen.add(link)
            callback = None  # None -> Scrapy uses the default parse callback
            if 'detail' in link.url:
                callback = self.parse_detail
            yield scrapy.Request(url=link.url, callback=callback)

    def parse_detail(self, response):
        """Placeholder callback for detail pages; intentionally does nothing yet."""
        pass