  • Analyzing Logs for a Small Website on the Hadoop Platform


    0. Upload the log files to the Linux host and use Flume to collect them into HDFS.
    Run the following command:
    /home/cloud/flume/bin/flume-ng agent -n a4 -c conf -f /home/cloud/flume/conf/a4.conf -Dflume.root.logger=DEBUG,console
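
The original post does not show the contents of a4.conf. As a rough sketch only, an agent definition for this step might look like the one written by the hypothetical helper below. The component names (r1, c1, k1), the spooling directory /home/cloud/logs, and the channel sizes are assumptions chosen for illustration; the sink path /flume/%Y%m%d is picked so that it lines up with the /flume/$CURRENT directory that daily.sh reads later.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Flume agent definition to the file given as $1.
# Everything inside the heredoc is an illustrative guess, not the author's real a4.conf.
write_a4_conf() {
  cat > "$1" <<'EOF'
# Agent a4: one source, one channel, one sink (names are illustrative)
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Pick up completed log files dropped into a local directory (path is an assumption)
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/cloud/logs
a4.sources.r1.channels = c1

# Buffer events in memory between source and sink
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Write raw text into one HDFS directory per day, e.g. /flume/20150415
a4.sinks.k1.type = hdfs
a4.sinks.k1.channel = c1
a4.sinks.k1.hdfs.path = /flume/%Y%m%d
a4.sinks.k1.hdfs.fileType = DataStream
a4.sinks.k1.hdfs.writeFormat = Text
a4.sinks.k1.hdfs.useLocalTimeStamp = true
EOF
}

# Example (not run here): write_a4_conf /home/cloud/flume/conf/a4.conf
```

The useLocalTimeStamp setting lets the sink expand %Y%m%d from the agent's clock, so events do not need a timestamp header.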


    1. Create the Hive table
    create external table bbslog (ip string,logtime string,url string) partitioned by (logdate string) row format delimited fields terminated by ' ' location '/cleaned';
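
Because the table splits each row on a single space into exactly three columns (ip, logtime, url), any cleaned line with a different number of fields will load with shifted or NULL columns. The cleaner's exact output format is not shown in the post, so the following is only a sketch of a sanity check under that three-field assumption; count_bad_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count input lines that do NOT have exactly three
# single-space-separated fields, matching the bbslog column layout.
# Reads the files named as arguments, or standard input when none are given.
count_bad_lines() {
  # -F'[ ]' splits on every single space, the same way Hive treats the delimiter
  awk -F'[ ]' 'NF != 3 { bad++ } END { print bad + 0 }' "$@"
}

# Example (not run here), checking one day's cleaned output in HDFS:
#   /home/cloud/hadoop/bin/hadoop fs -cat "/cleaned/20150415/*" | count_bad_lines
```

A result of 0 suggests the cleaned files match the table definition; anything else is worth inspecting before adding the partition.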


    2. Create the shell script
    touch daily.sh
    Add execute permission:
    chmod +x daily.sh
    Contents of daily.sh:
    CURRENT=`date +%Y%m%d`
    # Clean the data and save it under /cleaned, in a directory named for the current date
    /home/cloud/hadoop/bin/hadoop jar /home/cloud/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT
    # Alter the Hive table to add a partition for the current date
    /home/cloud/hive/bin/hive -e "alter table bbslog add partition (logdate=$CURRENT) location '/cleaned/$CURRENT'"
    # Run the analyses in Hive; adjust these queries to your business requirements
    # Count page views (PV) and store the result in the daily PV table
    /home/cloud/hive/bin/hive -e "create table pv_$CURRENT row format delimited fields terminated by ' ' as select count(*) from bbslog where logdate=$CURRENT;"
    # Find potential VIP users with more than 20 hits
    /home/cloud/hive/bin/hive -e "create table vip_$CURRENT row format delimited fields terminated by ' ' as select $CURRENT,ip,count(*) as hits from bbslog where logdate=$CURRENT group by ip having hits > 20 order by hits desc"
    # Count unique visitors (UV)
    /home/cloud/hive/bin/hive -e "create table uv_$CURRENT row format delimited fields terminated by ' ' as select count(distinct ip) from bbslog where logdate=$CURRENT"
    # Count the day's registrations
    /home/cloud/hive/bin/hive -e "create table reg_$CURRENT row format delimited fields terminated by ' ' as select count(*) from bbslog where logdate=$CURRENT AND instr(url,'member.php?mod=register')>0"
    # Export the Hive table data to MySQL
    /home/cloud/sqoop/bin/sqoop export --connect jdbc:mysql://cloud3:3306/jchubby --username root --password JChubby123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by ' '
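
Since daily.sh interpolates the date straight into HiveQL and HDFS paths, one possible refinement is to validate the date and build the partition statement in a small function, so the script can also be rerun for a past day. The sketch below is an illustration, not part of the original script: valid_day and partition_sql are hypothetical helper names, and the "if not exists" clause is added on the assumption that rerunning a day should not fail when the partition is already registered.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is exactly eight digits (YYYYMMDD).
valid_day() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 8 ]
}

# Hypothetical helper: print the HiveQL that registers one day's partition
# of bbslog, pointing at that day's directory under /cleaned.
partition_sql() {
  if ! valid_day "$1"; then
    echo "expected YYYYMMDD, got: $1" >&2
    return 1
  fi
  printf "alter table bbslog add if not exists partition (logdate='%s') location '/cleaned/%s'\n" "$1" "$1"
}

# Example (not run here): use today's date unless a day is passed as the first argument
#   /home/cloud/hive/bin/hive -e "$(partition_sql "${1:-$(date +%Y%m%d)}")"
```

Keeping the statement in one function also makes it easy to print and inspect the generated HiveQL before handing it to hive -e.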
  • Original article: https://www.cnblogs.com/jchubby/p/4429684.html