  • Crawler with data analysis (Hadoop, MapReduce, HBase) - Phase I - Data Modeling

    http://www.donanza.com/jobs/p3315101-crawler_with_data_analysis_hadoop_mapreduce_hbase_phase_i


    Goal for Phase I: given a topic in English (e.g. "skiing"), crawl the web (sites, blogs, social media) and collect 1 million relevant articles, pages, posts and documents. Analyze them and generate meaningful reports on the topic, potentially including top keywords and concepts, and related topics or concepts.

    Optional task (bonus): add "intelligence" to the analysis by determining rank/reputation, sentiment (negative vs. positive), and type (opinion article vs. advertisement vs. for-sale ad vs. wanted ad). We are flexible and open to ideas.

    Development/staging environment: a 3-node cluster running CentOS 5.6 and Cloudera CDH3 (Hadoop, MapReduce, Hue, Pig, Flume, HBase), plus one management machine with CDH.

    If you bid on this job, please describe your prior experience with Big Data and tell us how you would approach this problem, including a high-level overview of the steps you will need to perform. It's important for us to see the way you approach problems. We speak English and Russian fluently. Depending on your approach, we will define milestones and a timeline together. This is Phase I of the project, so do your best!

    Desired Skills: Data Modeling, Scripts & Utilities, CentOS, Hadoop, MapReduce
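
    As a rough sketch of the "top keywords" report, assuming the crawled pages are already stored as plain text, the map, shuffle and reduce steps of a MapReduce job can be simulated locally in Python. The relevance test (the topic word must appear in the page), the stop-word list and all function names below are hypothetical illustrations, not part of the posting; on the CDH3 cluster the mapper and reducer would instead run as separate tasks, for example through Hadoop Streaming.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; a real job would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on", "for", "with", "at"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def is_relevant(text, topic):
    """Naive relevance filter (an assumption): the topic word occurs in the page."""
    return topic.lower() in tokenize(text)


def map_phase(text):
    """Mapper: emit (word, 1) for every token that is not a stop word."""
    for word in tokenize(text):
        if word not in STOP_WORDS:
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the partial counts for a single word."""
    return word, sum(counts)


def top_keywords(docs, topic, k):
    """Filter relevant pages, count words MapReduce-style, return the k most frequent."""
    relevant = [text for text in docs.values() if is_relevant(text, topic)]
    pairs = (pair for text in relevant for pair in map_phase(text))
    totals = dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    docs = {
        "d1": "Skiing in the Alps: fresh powder and steep slopes",
        "d2": "Best skiing boots for steep slopes",
        "d3": "Stock market update for investors",
    }
    print(top_keywords(docs, "skiing", 3))
```

    With the three sample pages above, "d3" is dropped by the relevance filter and the call returns `[('skiing', 2), ('slopes', 2), ('steep', 2)]`. Keeping the mapper and reducer as pure functions makes it straightforward to later move them into separate Streaming scripts without changing the counting logic.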

  • Original post: https://www.cnblogs.com/lexus/p/2196481.html