  • scrapy

    pip install Sphinx
    cd /d E:\worksoft\selenium-2.48.0\py\selenium
    sphinx-quickstart

    sphinx-apidoc -o outputdir packagedir
    sphinx-apidoc -s txt -o E:\worksoft\selenium-2.48.0\source E:\worksoft\selenium-2.48.0\py

    http://blog.csdn.net/pleasecallmewhy/article/details/19642329

    # -*- coding: utf-8 -*-
    import time

    from scrapy.selector import Selector
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.contrib.spiders import CrawlSpider, Rule

    # Selenium RC client (the old pre-WebDriver API); requires a running
    # Selenium Server on localhost:4444.
    from selenium import selenium

    from shiyifang.items import ShiyifangItem

    class ShiyifangSpider(CrawlSpider):
        name = "shiyifang"
        allowed_domains = ["taobao.com"]
        start_urls = [
            "http://www.taobao.com"
        ]

        rules = (
            # allow= takes regular expressions, so '?' and '.' must be escaped
            Rule(SgmlLinkExtractor(allow=(r'/market/nvzhuang/index\.php\?spm=a217f\.7297021\.a214d5w\.2\.tvAive', )),
                 callback='parse_page', follow=True),
        )

        def __init__(self):
            CrawlSpider.__init__(self)
            self.verificationErrors = []
            self.selenium = selenium("localhost", 4444, "*firefox", "http://www.taobao.com")
            self.selenium.start()

        def __del__(self):
            self.selenium.stop()
            print self.verificationErrors

        def parse_page(self, response):
            # Re-open the URL in the browser so JavaScript-rendered content loads
            sel = self.selenium
            sel.open(response.url)
            sel.wait_for_page_to_load("30000")
            time.sleep(2.5)  # give AJAX content a little extra time

            # Parse the browser-rendered source rather than the raw response
            hxs = Selector(text=sel.get_html_source())
            # ... extract fields from hxs into a ShiyifangItem here
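A note on the `Rule` above: `SgmlLinkExtractor`'s `allow=` patterns are regular expressions, so the `?` and `.` in the Taobao URL are metacharacters unless escaped. A quick stdlib check illustrates the difference:

```python
import re

# The query URL from the Rule above; '?' and '.' are regex metacharacters.
raw = '/market/nvzhuang/index.php?spm=a217f.7297021.a214d5w.2.tvAive'

# Unescaped, 'p?' means "optional p", so the pattern fails to match
# the literal URL once it reaches the real '?' character.
assert re.search(raw, raw) is None

# re.escape() turns every metacharacter into a literal.
assert re.search(re.escape(raw), raw).group(0) == raw

print("escaped pattern matches literally")
```

This is why the spider's `allow=` pattern uses a raw string with `\.` and `\?`.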
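The fixed `time.sleep(2.5)` in `parse_page` is a blunt instrument: too short and the AJAX content is missed, too long and every page wastes time. A generic polling helper is the usual alternative; the sketch below is not part of the original spider, and the `loaded()` predicate is a stand-in for a real readiness check (e.g. testing for an element in the rendered source):

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Stand-in readiness check: becomes true on the third poll.
state = {"polls": 0}
def loaded():
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_until(loaded, timeout=5.0))  # True
```

With a helper like this, the spider could poll the page source for the content it needs instead of sleeping a fixed 2.5 seconds.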

  • Original post: https://www.cnblogs.com/zhang-pengcheng/p/5017893.html