  • Basic deployment and usage of Scrapy with Docker

    1. A simple project
    pip install scrapy 
    scrapy startproject appdemo
     
     
    2. Project code
    a. Project structure
    
    ├── Dockerfile
    ├── README.md
    ├── appdemo
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── items.py
    │   ├── middlewares.py
    │   ├── pipelines.py
    │   ├── settings.py
    │   └── spiders
    │       ├── __init__.py
    │       ├── __pycache__
    │       └── book_spider.py
    └── scrapy.cfg
    
    b. The main code is in book_spider.py
    
    import scrapy

    class BookSpider(scrapy.Spider):
        name = "appdemo"
        start_urls = ["http://books.toscrape.com/"]

        def parse(self, response):
            # Each book on a listing page is an <article class="product_pod">.
            for book in response.css("article.product_pod"):
                name = book.xpath("./h3/a/@title").extract_first()
                price = book.css("p.price_color::text").extract_first()
                yield {
                    "name": name,
                    "price": price,
                }
            # Follow the "next" link once per page, outside the per-book loop.
            next_url = response.css("ul.pager li.next a::attr(href)").extract_first()
            if next_url:
                next_url = response.urljoin(next_url)
                yield scrapy.Request(next_url, callback=self.parse)
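
The pagination step depends on `response.urljoin`, which turns the relative `href` of the "next" link into an absolute URL based on the page being parsed. A minimal sketch of that resolution using the standard library's `urllib.parse.urljoin`, which behaves much the same way for plain pages. The helper name `resolve_next` and the sample `href` values are assumptions for illustration, not taken from the original project:

```python
from urllib.parse import urljoin


def resolve_next(page_url, href):
    """Resolve a relative "next page" href against the current page URL."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Hypothetical href values, for illustration only.
    print(resolve_next("http://books.toscrape.com/", "catalogue/page-2.html"))
    print(resolve_next("http://books.toscrape.com/catalogue/page-2.html", "page-3.html"))
```

Because the link is resolved against the current page rather than the site root, the same `parse` callback can keep following "next" links from any depth.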
    c. Dockerfile
    
    FROM python:3.5
    RUN  pip install scrapy
    VOLUME [ "/data" ]
    WORKDIR /myapp
    COPY . /myapp
    ENTRYPOINT [ "scrapy","crawl","appdemo","-o","/data/appdemo.csv" ]
    Note: the python:3.5 base image is used for simplicity; Alpine-based images run into package dependency problems.
     
    3. Running
    a. Run from the command line
    
    scrapy crawl appdemo -o myinfo.csv
    
    b. Build and run with Docker
    
    docker build -t myscrapy .
    
    docker run -it -v $PWD/mydata:/data myscrapy
    cat $PWD/mydata/appdemo.csv
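
Beyond `cat`, the exported file can be loaded for further processing. A rough sketch using only the standard library, assuming the CSV has the `name` and `price` columns yielded by the spider and that prices look like `£51.77`. The helpers `parse_price` and `load_books` are hypothetical names introduced here:

```python
import csv
import re


def parse_price(text):
    """Extract the numeric part of a price string such as '£51.77'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def load_books(path):
    """Read the exported CSV and return a list of (name, price) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["name"], parse_price(row["price"])) for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Path assumes the mydata volume mapping used with docker run.
    for name, price in load_books("mydata/appdemo.csv"):
        print(name, price)
```

The regular expression keeps only the digits, so a stray currency symbol or encoding artifact in front of the number does not break the conversion.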
    
    c. Run directly from the Docker Hub image
    docker run -it -v $PWD/mydata:/data dalongrong/scrapydockerdemo
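
To start the container from a script instead of a shell, the same volume mapping can be assembled in Python. This is only a sketch: the helper `docker_run_args` is a hypothetical name, and it swaps `-it` for `--rm` on the assumption that a script has no interactive terminal attached:

```python
import subprocess
from pathlib import Path


def docker_run_args(image, host_dir="mydata"):
    """Build the argument list for running the crawler image with /data mounted."""
    # Use an absolute path: a bare name like "mydata" would be treated
    # by docker as a named volume rather than a host directory.
    host_path = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host_path}:/data", image]


if __name__ == "__main__":
    Path("mydata").mkdir(exist_ok=True)
    args = docker_run_args("dalongrong/scrapydockerdemo")
    subprocess.run(args, check=True)  # requires a working Docker installation
```

Passing `myscrapy` as the image name runs the locally built image in the same way.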
     
    4. References
    https://docs.scrapy.org/en/latest/
    https://github.com/rongfengliang/scrapydockerdemo
  • Original article: https://www.cnblogs.com/rongfengliang/p/8447574.html