  • Anemone

    An easy-to-use Ruby web spider framework

    What is it?

    Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
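
    To make the "shortest path to a given page" idea concrete, here is a minimal pure-Ruby sketch of a breadth-first search over a link graph. It is an illustration of the concept only, not Anemone's own implementation; the LINKS hash and the shortest_path helper are hypothetical names introduced for this example.

```ruby
# Hypothetical link graph: each key is a page, each value lists the pages it links to.
LINKS = {
  "/"           => ["/about", "/products"],
  "/about"      => ["/contact"],
  "/products"   => ["/products/a", "/contact"],
  "/products/a" => [],
  "/contact"    => []
}

# Breadth-first search from +start+. Returns the pages on a shortest click
# path to +target+ (inclusive), or nil if +target+ cannot be reached.
def shortest_path(links, start, target)
  previous = { start => nil } # page => the page we arrived from
  queue = [start]

  until queue.empty?
    page = queue.shift
    break if page == target

    links.fetch(page, []).each do |neighbor|
      next if previous.key?(neighbor) # already reached by a path at least as short
      previous[neighbor] = page
      queue << neighbor
    end
  end

  return nil unless previous.key?(target)

  # Walk the back-pointers from the target to the start to rebuild the path.
  path = []
  node = target
  while node
    path.unshift(node)
    node = previous[node]
  end
  path
end

p shortest_path(LINKS, "/", "/contact")
```

    Because breadth-first search visits pages in order of click distance from the start page, the first path found to the target is a shortest one.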

    The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.

    What's new?

     

    • 02/17/2011 - Version 0.6.0 released. Added support for proxies, HTTP Basic Auth, and HTTP read timeout. Fixed a bug with double-encoding links and with erroring on a read timeout.
    • 09/01/2010 - GitHub Issue Tracker - The Anemone project issue tracker has moved from Lighthouse to GitHub Issues.
    • 09/01/2010 - Version 0.5.0 released. Added Redis and MongoDB page storage engines, and skip_query_strings option.
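
    The options added in these releases are passed as a hash to Anemone.crawl. The sketch below assumes the option names :proxy_host, :proxy_port, :read_timeout, and :skip_query_strings; only skip_query_strings is named on this page, so check the others against the RDoc documentation before relying on them. The proxy host, port, and timeout values, and the crawl_with_options helper, are hypothetical.

```ruby
# Crawl options corresponding to the features listed above.
# The option names are an assumption; confirm them in the RDoc.
CRAWL_OPTIONS = {
  :proxy_host         => "proxy.example.com", # hypothetical proxy hostname
  :proxy_port         => 8080,                # hypothetical proxy port
  :read_timeout       => 10,                  # HTTP read timeout, in seconds
  :skip_query_strings => true                 # skip links such as /page?id=1
}

# Hypothetical helper: crawl +url+ with the given options and print each URL.
def crawl_with_options(url, options = CRAWL_OPTIONS)
  require 'anemone' # loaded lazily, so the options can be inspected without the gem
  Anemone.crawl(url, options) do |anemone|
    anemone.on_every_page { |page| puts page.url }
  end
end

# Uncomment to run a real crawl (needs the anemone gem and network access):
# crawl_with_options("http://www.example.com/")
```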

     

    Where do I get it?

    $ gem install anemone

    You can also browse the code on GitHub.

    How do I use it?

    To get the most out of Anemone, read through the technical information and examples and the RDoc documentation.

    You can use Anemone to write tasks to gather useful statistics on your websites. Just point Anemone at a URL, and it will crawl every page in that domain. You can also tell Anemone to skip pages that match certain regular expressions. Using blocks, you tell Anemone what code to run on every page, or after it's done crawling.

    For example, to print the URL of every page on a site:

    require 'anemone'
    
    Anemone.crawl("http://www.example.com/") do |anemone|
      anemone.on_every_page do |page|
        puts page.url
      end
    end
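
    The skip patterns and the "after it's done crawling" block mentioned above can be combined in the same way. This is a minimal sketch: the patterns and the list_pages helper are hypothetical, and it assumes that skip_links_like accepts one or more regular expressions and that the object passed to after_crawl responds to size, so check both against the RDoc documentation.

```ruby
# Hypothetical patterns for links the crawl should not follow.
SKIP_PATTERNS = [%r{/admin/}, /\.pdf$/i]

# Hypothetical helper: crawl +root_url+, skip matching links, print every
# page URL, and report the page count once the crawl has finished.
def list_pages(root_url)
  require 'anemone' # loaded lazily, so the patterns can be tested without the gem
  Anemone.crawl(root_url) do |anemone|
    anemone.skip_links_like(*SKIP_PATTERNS)
    anemone.on_every_page { |page| puts page.url }
    anemone.after_crawl { |pages| puts "Crawled #{pages.size} pages" }
  end
end

# Uncomment to run a real crawl (needs the anemone gem and network access):
# list_pages("http://www.example.com/")
```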

    Anemone also comes with a command-line frontend for several web-spider tasks. Just run 'anemone' on the command line. The source for several example programs is in the lib/anemone/cli directory of the project.

    Who wrote it?

    Anemone is written and maintained by Chris Kite. Development is sponsored by Vertive, Inc., the creator of Offers.com. The Anemone logo was created by Ismael Ayala.

    Anemone is free to use under the terms of the MIT License.

    I have a problem or a suggestion!

    Check out the Anemone issue tracker, or contact the author.

  • Original article: https://www.cnblogs.com/lexus/p/2429088.html