  • Basic Scrapy deployment with Docker

    1. A simple project
    pip install scrapy 
    scrapy startproject appdemo
     
     
    2. Project code
    a. Project layout
    
    ├── Dockerfile
    ├── README.md
    ├── appdemo
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── items.py
    │   ├── middlewares.py
    │   ├── pipelines.py
    │   ├── settings.py
    │   └── spiders
    │       ├── __init__.py
    │       ├── __pycache__
    │       └── book_spider.py
    └── scrapy.cfg
    
    b. The main code is in book_spider.py
    
    import scrapy


    class BookSpider(scrapy.Spider):
        name = "appdemo"
        start_urls = ["http://books.toscrape.com/"]

        def parse(self, response):
            # One item per book card on the current listing page
            for book in response.css("article.product_pod"):
                name = book.xpath("./h3/a/@title").extract_first()
                price = book.css("p.price_color::text").extract_first()
                yield {
                    "name": name,
                    "price": price,
                }
            # Follow the "next" link once per page, after the loop
            next_url = response.css("ul.pager li.next a::attr(href)").extract_first()
            if next_url:
                next_url = response.urljoin(next_url)
                yield scrapy.Request(next_url, callback=self.parse)
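    The pagination step in parse can be checked without touching the network. For an HTML page without a <base> tag, response.urljoin resolves a relative href against the page URL, much like the standard urllib.parse.urljoin. The sketch below assumes, for illustration only, that the first page's "next" link is "catalogue/page-2.html" and later ones are bare "page-N.html"; next_page is a hypothetical helper, not part of Scrapy.

```python
from typing import Optional
from urllib.parse import urljoin


def next_page(current_url: str, next_href: Optional[str]) -> Optional[str]:
    """Mirror the spider's pagination step (hypothetical helper).

    Returns None on the last page (no "next" link); otherwise resolves the
    relative href against the current page URL, as response.urljoin does
    for an HTML page that has no <base> tag.
    """
    if not next_href:
        return None
    return urljoin(current_url, next_href)


if __name__ == "__main__":
    home = "http://books.toscrape.com/"
    page2 = next_page(home, "catalogue/page-2.html")  # assumed href
    page3 = next_page(page2, "page-3.html")           # assumed href
    print(page2)  # http://books.toscrape.com/catalogue/page-2.html
    print(page3)  # http://books.toscrape.com/catalogue/page-3.html
    print(next_page(page3, None))  # None
```

    Resolving against the current page (not the start URL) is what makes the bare "page-3.html" land under /catalogue/.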
    c. Dockerfile
    
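    FROM python:3.5
    RUN pip install scrapy
    # Crawl output is written here; mount a host directory at run time
    VOLUME [ "/data" ]
    # scrapy crawl looks for scrapy.cfg starting from the working directory
    WORKDIR /myapp
    COPY . /myapp
    ENTRYPOINT [ "scrapy","crawl","appdemo","-o","/data/appdemo.csv" ]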
    Note: for simplicity this uses the python:3.5 base image; the alpine images run into package dependency problems.
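    Why WORKDIR /myapp matters: scrapy crawl only works inside a project, which Scrapy detects by looking for scrapy.cfg from the current directory upwards and then reading its [settings] default entry (appdemo.settings for a project created with startproject appdemo). The sketch below is a simplified stand-in for that lookup; closest_scrapy_cfg and settings_module are hypothetical helpers, and real Scrapy also honours the SCRAPY_SETTINGS_MODULE environment variable and user-level config files.

```python
import configparser
import os
from typing import Optional


def closest_scrapy_cfg(start: str = ".") -> Optional[str]:
    """Return the nearest scrapy.cfg at or above `start`, or None.

    Simplified illustration of how a Scrapy project is located.
    """
    path = os.path.abspath(start)
    while True:
        candidate = os.path.join(path, "scrapy.cfg")
        if os.path.exists(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent


def settings_module(cfg_path: str) -> str:
    """Read the [settings] default entry, e.g. "appdemo.settings"."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    return parser.get("settings", "default")


if __name__ == "__main__":
    # Inside the container WORKDIR is /myapp, so this should find /myapp/scrapy.cfg.
    cfg = closest_scrapy_cfg()
    print(cfg, settings_module(cfg) if cfg else "not inside a Scrapy project")
```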
     
    3. Running
    a. Run from the command line
    
    scrapy crawl appdemo -o myinfo.csv
    
    b. Build and run with Docker
    
    docker build -t myscrapy .
    
    docker run -it -v $PWD/mydata:/data myscrapy
    cat $PWD/mydata/appdemo.csv
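    To use the exported file from Python rather than cat, the standard csv module is enough. This sketch assumes a name,price header (the keys the spider yields), UTF-8 encoding, and prices with a leading pound sign such as "£10.50"; load_books is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from decimal import Decimal
from typing import Iterable, List, Tuple


def load_books(lines: Iterable[str]) -> List[Tuple[str, Decimal]]:
    """Parse rows written by the spider's CSV export (assumed format)."""
    books = []
    for row in csv.DictReader(lines):
        # Skip a repeated header line, which can show up when several runs
        # appended to the same output file.
        if row["name"] == "name" and row["price"] == "price":
            continue
        # Drop the currency symbol; Decimal keeps the price exact.
        price = Decimal(row["price"].strip().lstrip("£"))
        books.append((row["name"], price))
    return books


if __name__ == "__main__":
    # Made-up sample in the assumed format; for the real file use
    # open("mydata/appdemo.csv", newline="", encoding="utf-8").
    sample = io.StringIO("name,price\nBook A,£10.50\nBook B,£3.25\n")
    books = load_books(sample)
    print(len(books), max(price for _, price in books))  # 2 10.50
```

    Decimal is used instead of float so that prices compare and sum without binary rounding artifacts.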
    
    c. Run the prebuilt image from Docker Hub directly
    docker run -it -v $PWD/mydata:/data dalongrong/scrapydockerdemo
     
    4. References
    https://docs.scrapy.org/en/latest/
    https://github.com/rongfengliang/scrapydockerdemo
  • Original article: https://www.cnblogs.com/rongfengliang/p/8447574.html