  • drillercppwebcrawler

    Who am I:
    My name is Meir Yanovich. I am a C++/Java developer, mostly doing cross-platform (Unix/Linux/Windows) server infrastructure work in my day job,
    but sometimes I like to experiment in my spare time.
    If you are interested in the Facebook API and ways to interact with it, this project may interest
    you:
    http://code.google.com/p/facebook-cpp-graph-api/
    or if you have young kids:
    http://code.google.com/p/kidsbrowser/


    You can find my online profile here:
    http://il.linkedin.com/in/meiryanovich 
    If you have any cool ideas on how to use this code and you need help, please email me.
    Email: meiry242@gmail.com

    Implementation of a web crawler / spider in C++
    ------------------------------------------------------------------------------------

    A web crawler / spider for web data mining and data aggregation.

    • Uses regular expression rules to collect data.
    • Written in pure C++ (STL) with a handful of open source libraries.
    • Can follow links within a single domain.
    • Outputs to an XML file with configurable tags.
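
    The link-following rule in the list above can be sketched roughly as follows. This is a minimal illustration, not Driller's actual code: it uses std::regex from the standard library in place of PCRE so the example stays self-contained, and the function names host_of and same_domain_links are made up for this sketch. It assumes links appear as quoted href attributes and only handles absolute http/https URLs.

```cpp
#include <regex>
#include <string>
#include <vector>

// Extract the host part of an absolute http/https URL, or "" if there is none.
// (Hypothetical helper, not part of Driller.)
std::string host_of(const std::string& url) {
    static const std::regex host_re(R"(^https?://([^/:?#]+))", std::regex::icase);
    std::smatch m;
    return std::regex_search(url, m, host_re) ? m[1].str() : std::string();
}

// Collect every quoted href value whose host equals the host of base_url.
// Relative links have no host here, so they are skipped in this sketch.
std::vector<std::string> same_domain_links(const std::string& html,
                                           const std::string& base_url) {
    static const std::regex href_re(R"rx(href\s*=\s*["']([^"']+)["'])rx",
                                    std::regex::icase);
    const std::string base_host = host_of(base_url);
    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        const std::string url = (*it)[1].str();
        if (!base_host.empty() && host_of(url) == base_host) {
            links.push_back(url);
        }
    }
    return links;
}
```

    Calling same_domain_links(page_html, "http://example.com/") would keep only the absolute links whose host is example.com. A real spider would also need to resolve relative links, compare hosts case-insensitively, and remember which pages it has already visited.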

     

    I tried to follow the "keep it simple, keep it clean" rule, using ready-made open source C/C++ libraries as much as possible.

    How to build it:
    The application has only been tested on Windows XP 32-bit, although I took care to use only cross-platform libraries
    and not to write OS-dependent code.
    Driller depends on the following libraries:

    • PCRE: regular expressions.
    • Pthreads: cross-platform thread wrapper.
    • Curl + c-ares: HTTP requests / responses.
    The Driller source code includes Visual Studio Express 2008 solution and project files, and all the libraries are already built in debug mode. All you have to do is configure it and build it;
    this saves you the time of configuring and compiling the libraries before you can test the application.
    For more information see *how_to_build_drill*.

    How to configure it:
    The Driller web spider doesn't come with a fancy configuration GUI or a configuration file.
    All configuration must be done in code; then compile it, run it, and watch the results come in.
    The reason is that I wrote it for my personal use, without much time on my hands, and didn't plan to open source it. Anyway, all those features will be added later.
    A step-by-step guide can be found in how_to_configure_drill.
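
    To give a feel for what "configuration in code" with configurable XML tags might look like, here is a small sketch. It is not taken from the Driller sources: the CrawlConfig struct, its field names, the example URL and pattern, and the to_xml helper are all assumptions made up for illustration. The real settings are described in how_to_configure_drill.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-code settings; the names are illustrative, not Driller's own.
struct CrawlConfig {
    std::string start_url = "http://example.com/";            // placeholder seed URL
    std::string data_pattern = "<title>([^<]*)</title>";      // regex rule for the data to collect
    std::string root_tag = "results";                         // XML element wrapping all records
    std::string item_tag = "item";                            // XML element wrapping one record
};

// Escape the characters that cannot appear verbatim in XML text content.
std::string xml_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Render the collected values as XML, using the tag names from the config.
std::string to_xml(const CrawlConfig& cfg, const std::vector<std::string>& values) {
    std::ostringstream xml;
    xml << "<" << cfg.root_tag << ">\n";
    for (const std::string& v : values) {
        xml << "  <" << cfg.item_tag << ">" << xml_escape(v)
            << "</" << cfg.item_tag << ">\n";
    }
    xml << "</" << cfg.root_tag << ">\n";
    return xml.str();
}
```

    With this shape, changing the output format means editing a field such as cfg.item_tag and recompiling, which mirrors the edit, compile, run cycle described above.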


    If you find this useful, please consider donating.
    All donations will go to charity.

  • Original article: https://www.cnblogs.com/lexus/p/2559700.html