zoukankan      html  css  js  c++  java
  • 大数据(日志分析)项目

    需求:统计一个网页的访问次数pv,及其用户访问量uv

    环境:hadoop-cdh伪分布式,mysql,hive,sqoop,springboot

    思路:

    编码开始:

    1.将爬取的元数据上传至hdfs文件系统

    bin/hdfs dfs -mkdir /project
    bin/hdfs dfs -put /usr/local/2015082818 /project
    bin/hdfs dfs -put /usr/local/2015082819 /project

    2.hive创建源文件表

    create table yhd_source(
    id               string ,
    url              string ,
    referer          string ,
    keyword          string ,
    type             string ,
    guid             string ,
    pageId           string ,
    moduleId         string ,
    linkId           string ,
    attachedInfo     string ,
    sessionId        string ,
    trackerU         string ,
    trackerType      string ,
    ip               string ,
    trackerSrc       string ,
    cookie           string ,
    orderCode        string ,
    trackTime        string ,
    endUserId        string ,
    firstLink        string ,
    sessionViewNo    string ,
    productId        string ,
    curMerchantId    string ,
    provinceId       string ,
    cityId           string ,
    fee              string ,
    edmActivity      string ,
    edmEmail         string ,
    edmJobId         string ,
    ieVersion        string ,
    platform         string ,
    internalKeyword  string ,
    resultSum        string ,
    currentPage      string ,
    linkPosition     string ,
    buttonPosition   string 
    )row format delimited fields terminated by '	' location '/project';

    3.建一张清洗表,将时间字段清洗,提取部分的时间字段出来

    create table yhd_qingxi(
    id string,
    url string,
    guid string,
    date string,
    hour string
    )
    row format delimited fields terminated by '	';

    4.字段截取,天&小时

    insert into table yhd_qingxi select id,url,guid,substring(trackTime,9,2) date,substring(trackTime,12,2) hour from yhd_source;

    5.分区的方式:hive静态分区

    create table yhd_part1(
    id string,
    url string,
    guid string
    )
    partitioned by (date string,hour string)
    row format delimited fields terminated by '	';

    6.加载数据,来源于source源表

    insert into table yhd_part1 partition (date='20150828',hour='18') select id,url,guid from yhd_qingxi where date='28' and hour='18';
    insert into table yhd_part1 partition (date='20150828',hour='19') select id,url,guid from yhd_qingxi where date='28' and hour='19';

    7.PV实现:

    select date,hour,count(url) PV from yhd_part1 group by date,hour;
    -》按照天和小时进行分区
    -》结果:
    +-----------+-------+--------+--+
    |   date    | hour  |   pv   |
    +-----------+-------+--------+--+
    | 20150828  | 18    | 64972  |
    | 20150828  | 19    | 61162  |
    +-----------+-------+--------+--+

    8.uv实现

    select date,hour,count(distinct guid) UV from yhd_part1 group by date,hour; 
    
    -》结果:
    +-----------+-------+--------+--+
    |   date    | hour  |   uv   |
    +-----------+-------+--------+--+
    | 20150828  | 18    | 23938  |
    | 20150828  | 19    | 22330  |
    +-----------+-------+--------+--+

    9.pv与uv整合

    create table if not exists result as select date,hour,count(url) PV ,count(distinct guid) UV from yhd_part1 group by date,hour; 
    
    -》结果:
    +--------------+--------------+------------+------------+--+
    | result.date  | result.hour  | result.pv  | result.uv  |
    +--------------+--------------+------------+------------+--+
    | 20150828     | 18           | 64972      | 23938      |
    | 20150828     | 19           | 61162      | 22330      |
    +--------------+--------------+------------+------------+--+

    10.将结果导出到mysql表中

    将结果导出到mysql表中
    
    先在mysql建表:用于保存结果集
    create table if not exists save(
    date varchar(30) not null,
    hour varchar(30) not null,
    pv varchar(30) not null,
    uv varchar(30) not null
    );
    
    使用sqoop实现导出到mysql
    
    bin/sqoop export 
    --connect 
    jdbc:mysql://hadoop5.lhcedu.com:3306/sqoop 
    --username root 
    --password 1234456 
    --table save 
    --export-dir /user/hive/warehouse/lhc125.db/result 
    --num-mappers 1 
    --input-fields-terminated-by '01'
    
    +----------+------+-------+-------+
    | date     | hour | pv    | uv    |
    +----------+------+-------+-------+
    | 20150828 | 18   | 64972 | 23938 |
    | 20150828 | 19   | 61162 | 22330 |
    +----------+------+-------+-------+
    
    
    hive默认的分隔符:001

    11.

    动态分区

    分区的方式:动态分区

    hive-site.xml
    
    <property>
      <name>hive.exec.dynamic.partition</name>
      <value>true</value>
      <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
    </property>
    -》默认值是true,代表允许使用动态分区实现
    
    <property>
      <name>hive.exec.dynamic.partition.mode</name>
      <value>strict</value>
      <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
    </property>
    
    
    -set hive.exec.dynamic.partition.mode=nonstrict;  使用非严格模式
    
    建表:
    
    create table yhd_part2(
    id string,
    url string,
    guid string
    )
    partitioned by (date string,hour string)
    row format delimited fields terminated by '	';
    
    执行动态分区:
    insert into table yhd_part2 partition (date,hour) select * from yhd_qingxi;

    spring-boot整合Echarts完成图形化报表,

    echarts官网案例:https://echarts.baidu.com/examples/editor.html?c=area-stack

     常规项目环境下:

    1.在jsp页面引入echarts-min.js库

    <%@ page contentType="text/html;charset=UTF-8"  pageEncoding="UTF-8" %>
    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta content="always" name="referrer">
    <title>file</title>
    <script type="text/javascript" src="${pageContext.request.contextPath}/jquery.min.js"></script>
    <script src="${pageContext.request.contextPath}/echarts.min.js"></script>
    
    <body>
    
    <!-- 为 ECharts 准备一个具备大小(宽高)的 DOM -->
    <div id="ec" style=" 600px;height:400px;"></div>
    <script type="text/javascript">
        var myChart = echarts.init(document.getElementById('ec'));
    
        option = {
            title: {
                text: '一号店pv/uv统计图'
            },
            tooltip : {
                trigger: 'axis',
                axisPointer: {
                    type: 'cross',
                    label: {
                        backgroundColor: '#6a7985'
                    }
                }
            },
            legend: {
                data:['PV','UV']
            },
            toolbox: {
                feature: {
                    saveAsImage: {}
                }
            },
            grid: {
                left: '3%',
                right: '4%',
                bottom: '3%',
                containLabel: true
            },
            xAxis : [
                {
    
                }
            ],
            yAxis : [
                {
                    /*type : 'value'*/
                }
            ],
           series : [
    
    
            ]
        };
        myChart.setOption(option);
        $.ajax({
            type: "get",
            url: "${pageContext.request.contextPath}/re/query",
            dataType: "JSON",
            success: function (data) {
                myChart.setOption({
                    xAxis : [
                        {
                            type : 'category',
                            boundaryGap : false,
                            data : data.h
                        }
                    ],
                    series: [ {
                        name:'PV',
                        type:'line',
                        stack: '总量',
                        areaStyle: {},
                        data:data.pv
                    },
                        {
                            name:'UV',
                            type:'line',
                            stack: '总量',
                            areaStyle: {},
                            data:data.uv
                        }
                    ]
                })
            }
        })
    </script>
    </body>
    </html>

    2.controller封装完成echarts需要的数据结构的封装

    package com.lhc.controller;
    
    import com.lhc.entity.Result;
    import com.lhc.service.ResultService;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.ResponseBody;
    
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    @Controller
    @RequestMapping("re")
    public class ResultController {
        @Autowired
        ResultService resultService;
    
        @RequestMapping("query")
        @ResponseBody
        public Map query(){
            List<Result> results = resultService.queryResult();
            List<String> pv = new ArrayList();
            List<String> uv = new ArrayList();
            List<String> date = new ArrayList<>();
            for (Result result : results) {
                pv.add(result.getPv());
                uv.add(result.getUv());
                date.add(result.getHour());
            }
            Map map = new HashMap();
            map.put("pv",pv);
            map.put("uv",uv);
            map.put("h",date);
            return map;
        }
    }

    结果,完成图形化报表

  • 相关阅读:
    bzoj3653: 谈笑风生
    bzoj1858: [Scoi2010]序列操作
    bzoj1857: [Scoi2010]传送带
    bzoj1856: [Scoi2010]字符串
    bzoj1855: [Scoi2010]股票交易
    bzoj1854: [Scoi2010]游戏
    bzoj1853: [Scoi2010]幸运数字
    斜堆,非旋转treap,替罪羊树
    NOI2003 文本编辑器
    A_star poj2449 k短路
  • 原文地址:https://www.cnblogs.com/lhc-hhh/p/10386100.html
Copyright © 2011-2022 走看看