  • Compiling Anthelion

    Building the project

    $ cd ./anthelion/anthelion/target/classes
    $ java -Xmx15G -cp ../Anthelion-1.0.0-jar-with-dependencies.jar com.yahoo.research.robme.anthelion.simulation.CCFakeCrawler ./index ./network ./label ../../config/baseline.properties result.log

    Necessary files:

    • index: the mapping between ID and URL
    • network: the graph including the IDs from the index
    • label: list of the IDs which fulfil the target function
    • properties: configuration file (a set of configuration files can be found in the resource folder of the distribution)
    • result: the file where information about the performance and the crawling process is stored
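The launch command above takes its inputs by position, so a missing or misplaced file only shows up once the JVM is running. A minimal pre-flight sketch follows. It assumes the same relative paths as the command above; `check_inputs` and `run_simulation` are hypothetical helper names, not part of Anthelion.

```shell
#!/bin/sh
# Sketch only: check_inputs and run_simulation are illustrative helpers,
# not part of the Anthelion distribution.

# Report every path that does not exist; return non-zero if any is missing.
check_inputs() {
    ci_missing=0
    for ci_path in "$@"; do
        if [ ! -e "$ci_path" ]; then
            echo "missing input: $ci_path" >&2
            ci_missing=1
        fi
    done
    return "$ci_missing"
}

# Paths copied from the command above; adjust them to your checkout.
INDEX=./index
NETWORK=./network
LABEL=./label
PROPERTIES=../../config/baseline.properties
RESULT=result.log

# Launch the simulation only when all four inputs are present.
# Argument order: index, network, label, properties, result.
run_simulation() {
    check_inputs "$INDEX" "$NETWORK" "$LABEL" "$PROPERTIES" || return 1
    java -Xmx15G -cp ../Anthelion-1.0.0-jar-with-dependencies.jar \
        com.yahoo.research.robme.anthelion.simulation.CCFakeCrawler \
        "$INDEX" "$NETWORK" "$LABEL" "$PROPERTIES" "$RESULT"
}
```

Calling `run_simulation` from `target/classes` starts the simulator if every input exists; otherwise it prints the missing paths and returns without starting Java.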

    The files which we used to measure the performance when crawling for HTML pages including Microdata, Microformats and RDFa can be found on the dedicated page of the WebDataCommons project: http://webdatacommons.org/structureddata/anthelion/

    Available actions within the simulation process:

    • Run "init" to initialize the crawler (this loads the network and labels and creates the features).
    • Run "start" to start the crawler and simulate a crawl. Output is written to result.log.
    • Run "stop" to stop the simulation.
    • Run "exit" to shut down.
    • Run "status" to observe the crawling process.
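A typical session runs the actions in the order init, start, status, stop, exit. The sketch below scripts that sequence. It assumes the simulator reads these commands line by line from standard input, which the text above does not confirm; `crawl_commands` and the 600-second pause are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes CCFakeCrawler accepts its actions on standard input.
# crawl_commands is an illustrative helper, not part of Anthelion.

# Emit the actions in order, pausing between "start" and "status"
# so the simulated crawl has time to make progress.
crawl_commands() {
    cc_seconds=$1
    printf '%s\n' init start
    sleep "$cc_seconds"
    printf '%s\n' status stop exit
}

# Usage (not executed here; the pause length is arbitrary):
#   crawl_commands 600 | java -Xmx15G \
#       -cp ../Anthelion-1.0.0-jar-with-dependencies.jar \
#       com.yahoo.research.robme.anthelion.simulation.CCFakeCrawler \
#       ./index ./network ./label ../../config/baseline.properties result.log
```

If the simulator turns out not to read from a pipe, the same five commands can be typed by hand at its prompt in the same order.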
  • Original article: https://www.cnblogs.com/timssd/p/5171148.html