  • drillercppwebcrawler

    Who am I:
    My name is Meir Yanovich and I'm a C++/Java developer, mostly doing cross-platform (Unix/Linux/Windows) server infrastructure work in my day job,
    but sometimes I like to experiment with stuff in my spare time.
    Also, if you are interested in the Facebook API and ways to interact with it, this project may interest
    you:
    http://code.google.com/p/facebook-cpp-graph-api/
    Or, if you have young kids:
    http://code.google.com/p/kidsbrowser/


    You can find my online profile here:
    http://il.linkedin.com/in/meiryanovich
    If you have any cool ideas on how to use this code and you need help, please email me.
    Email: meiry242@gmail.com

    Implementation of a web crawler / spider in C++
    ------------------------------------------------------------------------------------

    A web crawler / spider used for web data mining or data aggregation.

    • Uses regular expression rules to collect data.
    • Written in pure C++ (STL) plus a bunch of open source libraries.
    • Web spider that can follow links within a single domain.
    • Outputs to an XML file with configurable tags.
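
To make the "follow links within a single domain" idea concrete, here is a minimal sketch. It is not Driller's actual code: it uses std::regex from the standard library instead of PCRE so that it builds without extra dependencies, and the names extract_host and same_domain_links are hypothetical. It only handles absolute http(s) links and skips relative ones.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Return the lower-cased host of an absolute http(s) URL, or "" if there is none.
// (Hypothetical helper, not part of Driller.)
std::string extract_host(const std::string& url) {
    static const std::regex host_re(R"re(^https?://([^/:?#]+))re", std::regex::icase);
    std::smatch m;
    if (!std::regex_search(url, m, host_re)) return std::string();
    std::string host = m[1].str();
    std::transform(host.begin(), host.end(), host.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return host;
}

// Collect the href targets in `html` that point to the same host as `page_url`.
// Relative links have no host of their own, so this sketch skips them.
std::vector<std::string> same_domain_links(const std::string& page_url,
                                           const std::string& html) {
    static const std::regex href_re(R"re(href\s*=\s*["']([^"']+)["'])re",
                                    std::regex::icase);
    std::vector<std::string> links;
    const std::string host = extract_host(page_url);
    if (host.empty()) return links;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        const std::string target = (*it)[1].str();
        if (extract_host(target) == host) links.push_back(target);
    }
    return links;
}
```

A real spider would also resolve relative links against the page URL and remember which pages it has already visited; both are left out here to keep the sketch short.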

     

    I tried to stick to the "keep it simple, keep it clean" rule, using ready-made open source C/C++ libraries as much as possible.

    How to build it:
    The application has only been tested on Windows XP 32-bit, although I took care to use only cross-platform libraries
    and not to write OS-dependent code.
    The libraries Driller depends on are:

    • PCRE: for regular expressions.
    • Pthreads: for a cross-platform threads wrapper.
    • cURL + c-ares: for HTTP requests / responses.
    In the Driller source code I supply Visual Studio Express 2008 solution and project files, and all the libraries are already built in debug mode. All you have to do is configure it and build it;
    this will save you time on configuring and compiling when you test the application.
    For more information see *how_to_build_drill*.

    How to configure it:
    The Driller web spider doesn't come with a fancy configuration GUI or a configuration file.
    All configuration must be done in code: edit it, compile it, then run it and watch the results come in.
    The reason is that I wrote it for my personal use without much time on my hands and didn't plan to open source it. Anyway, all those features will be added later.
    A step-by-step guide can be found in how_to_configure_drill.
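
To give a rough idea of what "configuration in code" combined with regex rules and configurable XML tags could look like, here is a small sketch. It is an assumption for illustration only and does not reproduce Driller's real configuration code: the CrawlConfig and ExtractionRule structs, the function names, and the example URL are all hypothetical, and std::regex stands in for PCRE.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule: the XML tag to emit and the regex whose first
// capture group supplies the tag's text.
struct ExtractionRule {
    std::string tag;
    std::string pattern;
};

// Hypothetical in-code configuration: edit the values, recompile, run.
struct CrawlConfig {
    std::string start_url;
    std::string root_tag;
    std::vector<ExtractionRule> rules;
};

// Escape the characters that cannot appear verbatim in XML text.
std::string xml_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Apply every rule to `html` and render each match as <tag>text</tag>.
// A pattern without a capture group yields empty elements.
std::string extract_to_xml(const CrawlConfig& cfg, const std::string& html) {
    std::ostringstream xml;
    xml << "<" << cfg.root_tag << ">\n";
    for (const ExtractionRule& rule : cfg.rules) {
        const std::regex re(rule.pattern, std::regex::icase);
        for (std::sregex_iterator it(html.begin(), html.end(), re), end;
             it != end; ++it) {
            xml << "  <" << rule.tag << ">" << xml_escape((*it)[1].str())
                << "</" << rule.tag << ">\n";
        }
    }
    xml << "</" << cfg.root_tag << ">\n";
    return xml.str();
}

// Example configuration; every value here is illustrative.
CrawlConfig make_example_config() {
    CrawlConfig cfg;
    cfg.start_url = "http://example.com/";
    cfg.root_tag = "page";
    cfg.rules.push_back({"title", "<title>([^<]*)</title>"});
    cfg.rules.push_back({"heading", "<h1>([^<]*)</h1>"});
    return cfg;
}
```

Changing what gets collected then means editing make_example_config and rebuilding, which mirrors the edit, compile, run workflow described above.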


    If you find this useful, please consider donating.
    All donations will go to charity.

  • Original article: https://www.cnblogs.com/lexus/p/2559700.html