  • Web Scraping Notes 2

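    In Scrapy, POST requests can be sent with FormRequest, but by default it sends Content-Type: application/x-www-form-urlencoded, so it does not work for sites that do not accept that Content-Type (for example, APIs that expect a JSON body).

    In that case, send the POST request with a plain Request instead: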

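    import json
    from copy import deepcopy

    from scrapy.http import Request

    # Inside a spider callback: send post_data as a JSON request body
    yield Request(
        url=self.search_url,
        method='POST',
        body=json.dumps(post_data),
        headers={'Content-Type': 'application/json'},
        callback=self.parse,
        meta={'item': deepcopy(item)},  # each request carries its own copy of item
    )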
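    To see why the Content-Type matters, the sketch below builds the same payload both ways with only the standard library: a form-encoded body like the one FormRequest sends, and the JSON body that `json.dumps` produces in the Request above. The payload fields (`key`, `page`, `size`) are hypothetical; string values are used because FormRequest expects strings in `formdata`. Newer Scrapy releases also provide `scrapy.http.JsonRequest`, which serializes a `data=` dict to JSON and sets the JSON Content-Type for you.

```python
import json
from urllib.parse import urlencode

# Hypothetical search payload; real field names depend on the target site.
post_data = {"key": "phone", "page": "1", "size": "10"}

# Body style sent by FormRequest(formdata=...): application/x-www-form-urlencoded
form_body = urlencode(post_data)

# Body style sent by Request(body=json.dumps(...)): application/json
json_body = json.dumps(post_data)

print(form_body)  # key=phone&page=1&size=10
print(json_body)  # {"key": "phone", "page": "1", "size": "10"}
```

    A server that parses only JSON will reject or misread the first body, which is why the Request above sets both the body and the header explicitly.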

    Running Scrapy from a script

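    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from neuSpider.spiders.zcy import SnSpider

    # Load the project's settings.py (run from the project root so scrapy.cfg is found)
    settings = get_project_settings()
    # Override the dupefilter with the custom class defined below
    settings['DUPEFILTER_CLASS'] = "neuSpider.dupefilters.NODupeFilter"
    process = CrawlerProcess(settings)
    # Keyword arguments are passed to the spider (key='手机' means "mobile phone")
    process.crawl(SnSpider, key='手机', max_page=1, page_size=10, idle_num=2, clean_redis='yes')
    process.start()  # blocks until the crawl finishes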
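    The keyword arguments given to `process.crawl` (`key`, `max_page`, and so on) are passed to the spider's constructor, and Scrapy's base `Spider.__init__` copies them onto the instance, so the spider reads them as `self.key`, `self.max_page`, etc. The sketch below models that with a plain class; `SpiderLike` and its name `"demo"` are hypothetical stand-ins, not Scrapy code. It shows one practical consequence: values passed from a script keep their Python types, while `scrapy crawl ... -a max_page=1` on the command line delivers strings.

```python
class SpiderLike:
    """Simplified stand-in for scrapy.Spider's constructor (illustration only)."""

    name = "demo"

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # scrapy.Spider does the same: every extra kwarg becomes an attribute
        self.__dict__.update(kwargs)


# Same arguments as process.crawl(SnSpider, key=..., max_page=1, ...)
spider = SpiderLike(key="手机", max_page=1, page_size=10, idle_num=2, clean_redis="yes")
print(spider.key, spider.max_page, type(spider.max_page).__name__)  # 手机 1 int

# Simulating `-a max_page=1` on the command line: the value arrives as str
cli_spider = SpiderLike(max_page="1")
print(type(cli_spider.max_page).__name__)  # str
```

    If a spider may be started both ways, converting inside the spider (for example `int(self.max_page)`) covers both cases.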

    Custom dupefilter class (DUPEFILTER_CLASS)

    # neuSpider/dupefilters.py
    from scrapy_redis.dupefilter import RFPDupeFilter


    class NODupeFilter(RFPDupeFilter):
        """Dupefilter that never filters: every request is treated as new."""

        def request_seen(self, request):
            return False
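    A dupefilter's `request_seen` returns True for a request it has already seen, and the scheduler then drops that request; returning False unconditionally, as NODupeFilter does, lets every request through. The sketch below models the idea in memory with a simplified fingerprint (SHA-1 over method, URL and body). That fingerprint is an assumption for illustration: Scrapy's real fingerprinting canonicalizes the URL and differs in detail, and scrapy_redis stores fingerprints in a Redis set rather than a Python set. If only a few requests need to skip deduplication, passing `dont_filter=True` to those Requests is a lighter alternative to replacing the filter.

```python
import hashlib


def fingerprint(method: str, url: str, body: bytes = b"") -> str:
    """Simplified request fingerprint (illustrative, not Scrapy's algorithm)."""
    h = hashlib.sha1()
    h.update(method.upper().encode())
    h.update(url.encode())
    h.update(body)
    return h.hexdigest()


class SetDupeFilter:
    """In-memory model of a fingerprint-based dupefilter."""

    def __init__(self):
        self.fingerprints = set()

    def request_seen(self, method, url, body=b""):
        fp = fingerprint(method, url, body)
        if fp in self.fingerprints:
            return True  # duplicate: the scheduler would drop it
        self.fingerprints.add(fp)
        return False  # first time: let it through


class NoDupeFilterModel(SetDupeFilter):
    """Model of NODupeFilter: never reports a duplicate."""

    def request_seen(self, method, url, body=b""):
        return False


f = SetDupeFilter()
print(f.request_seen("POST", "https://example.com/search", b'{"page": 1}'))  # False
print(f.request_seen("POST", "https://example.com/search", b'{"page": 1}'))  # True
print(f.request_seen("POST", "https://example.com/search", b'{"page": 2}'))  # False

n = NoDupeFilterModel()
print(n.request_seen("POST", "https://example.com/search"))  # False
print(n.request_seen("POST", "https://example.com/search"))  # False
```

    Because the body is part of the fingerprint in this model, POST requests to the same URL with different JSON bodies are not treated as duplicates; a filter like NODupeFilter matters when identical requests must be sent again.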

    Captcha recognition

    Screenshot:

    Action chains:

    ======= Sending POST requests

    ======= Cookies

    ======= Simulated login

    ======= Getting cookie information

    ======= Logging in to Renren with Scrapy

  • Original post: https://www.cnblogs.com/testzcy/p/13768442.html