  • MapReduce_PVUV

    On the web, most sites need PV and UV statistics. Roughly, PV (page views) is the number of times a URL was accessed, and UV (unique visitors) is the number of distinct IPs that accessed it.

    In short, PV is the raw hit count: no deduplication, so the same IP visiting several times counts several times.

    UV is the visitor count: each IP counts once, i.e. the IPs are deduplicated.

    PV:

    Test data:

    192.168.1.1 aa
    192.168.1.2 bb
    192.168.1.3 cc
    192.168.1.1 dd
    192.168.1.1 ee
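    Before going near the cluster, the expected answers for these five lines can be sanity-checked in plain Java (a local sketch only, not part of the MapReduce jobs; the class name is made up):

    ```java
    import java.util.*;

    public class PvUvCheck {
        // pv = total hits (no dedup); uv = number of distinct source IPs
        static int pv(List<String> lines) {
            return lines.size();
        }

        static int uv(List<String> lines) {
            Set<String> ips = new HashSet<>();
            for (String line : lines) {
                // the first whitespace-separated field is the ip
                ips.add(line.split(" ", 2)[0].trim());
            }
            return ips.size();
        }

        public static void main(String[] args) {
            List<String> lines = Arrays.asList(
                    "192.168.1.1 aa", "192.168.1.2 bb", "192.168.1.3 cc",
                    "192.168.1.1 dd", "192.168.1.1 ee");
            System.out.println("pv=" + pv(lines) + " uv=" + uv(lines)); // pv=5 uv=3
        }
    }
    ```

    Three of the five hits come from 192.168.1.1, so the jobs below should report pv = 5 and uv = 3.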

    package MapReduce;

    import java.io.IOException;
    import java.net.URI;
    import java.net.URISyntaxException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IpPv {
        private static final String INPUT_PATH = "hdfs://h201:9000/user/hadoop/input";
        private static final String OUTPUT_PATH = "hdfs://h201:9000/user/hadoop/output";

        public static class IpPvUvMap extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final IntWritable one = new IntWritable(1);

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Every log line is one hit, so emit (constant key, 1) per record.
                // (Keying by ip instead of "pv" would let you compute a top-100 by hits.)
                context.write(new Text("pv"), one);
            }
        }

        public static class IpPvUvReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args)
                throws IOException, URISyntaxException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            conf.set("mapred.jar", "pv.jar"); // declare the jar name as pv.jar

            // The output path must not exist when the job starts, so delete it first.
            final FileSystem fileSystem = FileSystem.get(new URI(OUTPUT_PATH), conf);
            fileSystem.delete(new Path(OUTPUT_PATH), true);

            Job job = new Job(conf, "PV");
            job.setJarByClass(IpPv.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);

            // set mapper & reducer class
            job.setMapperClass(IpPvUvMap.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            // Summing is associative, so the reducer can double as a combiner.
            job.setCombinerClass(IpPvUvReduce.class);
            job.setReducerClass(IpPvUvReduce.class);

            // set output key class
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
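    The inline comment about a top-100 hints at a useful variant: key the map output by ip instead of the constant "pv", so the reducer receives a per-ip hit count and can keep only the N busiest IPs. A plain-Java sketch of the selection step (hypothetical names, no Hadoop dependency; in a real reducer you would typically accumulate into a bounded TreeMap and emit in cleanup()):

    ```java
    import java.util.*;

    public class TopNIps {
        // Given per-ip hit counts, return the n IPs with the most hits.
        static List<String> topN(Map<String, Integer> counts, int n) {
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
            entries.sort((a, b) -> b.getValue() - a.getValue()); // descending by hits
            List<String> out = new ArrayList<>();
            for (int i = 0; i < Math.min(n, entries.size()); i++) {
                out.add(entries.get(i).getKey());
            }
            return out;
        }

        public static void main(String[] args) {
            Map<String, Integer> counts = new HashMap<>();
            counts.put("192.168.1.1", 3);
            counts.put("192.168.1.2", 1);
            counts.put("192.168.1.3", 1);
            System.out.println(topN(counts, 1)); // [192.168.1.1]
        }
    }
    ```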

    [hadoop@h201 IpPv]$ /usr/jdk1.7.0_25/bin/javac IpPv.java
    Note: IpPv.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    [hadoop@h201 IpPv]$ /usr/jdk1.7.0_25/bin/jar cvf pv.jar IpPv*class
    added manifest
    adding: IpPv.class(in = 2207) (out= 1116)(deflated 49%)
    adding: IpPv$IpPvUvMap.class(in = 1682) (out= 658)(deflated 60%)
    adding: IpPv$IpPvUvReduce.class(in = 1743) (out= 752)(deflated 56%)
    [hadoop@h201 IpPv]$ hadoop jar pv.jar IpPv
    18/04/22 20:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/04/22 20:08:13 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
    18/04/22 20:08:13 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    18/04/22 20:08:14 INFO input.FileInputFormat: Total input paths to process : 1
    18/04/22 20:08:14 INFO mapreduce.JobSubmitter: number of splits:1
    18/04/22 20:08:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    18/04/22 20:08:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0053
    18/04/22 20:08:14 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0053
    18/04/22 20:08:14 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0053/
    18/04/22 20:08:14 INFO mapreduce.Job: Running job: job_1516635595760_0053
    18/04/22 20:08:21 INFO mapreduce.Job: Job job_1516635595760_0053 running in uber mode : false
    18/04/22 20:08:21 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/22 20:08:29 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/22 20:08:36 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/22 20:08:36 INFO mapreduce.Job: Job job_1516635595760_0053 completed successfully
    18/04/22 20:08:36 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=15
                    FILE: Number of bytes written=219335
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=182
                    HDFS: Number of bytes written=5
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=5545
                    Total time spent by all reduces in occupied slots (ms)=3564
                    Total time spent by all map tasks (ms)=5545
                    Total time spent by all reduce tasks (ms)=3564
                    Total vcore-seconds taken by all map tasks=5545
                    Total vcore-seconds taken by all reduce tasks=3564
                    Total megabyte-seconds taken by all map tasks=5678080
                    Total megabyte-seconds taken by all reduce tasks=3649536
            Map-Reduce Framework
                    Map input records=5
                    Map output records=5
                    Map output bytes=35
                    Map output materialized bytes=15
                    Input split bytes=107
                    Combine input records=5
                    Combine output records=1
                    Reduce input groups=1
                    Reduce shuffle bytes=15
                    Reduce input records=1
                    Reduce output records=1
                    Spilled Records=2
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=677
                    CPU time spent (ms)=1350
                    Physical memory (bytes) snapshot=224731136
                    Virtual memory (bytes) snapshot=2147983360
                    Total committed heap usage (bytes)=136712192
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=75
            File Output Format Counters
                    Bytes Written=5

    Result:

    [hadoop@h201 ~]$ hadoop fs -cat /user/hadoop/output/part-r-00000
    18/04/22 20:08:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    pv      5

    UV:

    package MapReduce;

    import java.io.IOException;
    import java.net.URI;
    import java.net.URISyntaxException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IpUv {
        private static final String INPUT_PATH = "hdfs://h201:9000/user/hadoop/input2";
        private static final String OUTPUT_PATH = "hdfs://h201:9000/user/hadoop/output";
        private final static IntWritable one = new IntWritable(1);

        // Job 1: key by ip so the shuffle groups duplicate IPs together.
        public static class IpUvMapper1 extends Mapper<LongWritable, Text, Text, IntWritable> {
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String ip = value.toString().split(" ", 5)[0];
                context.write(new Text(ip.trim()), one); // trim() strips surrounding whitespace
            }
        }

        // Job 1's reducer emits each distinct ip exactly once: this is the deduplication.
        public static class IpUvReducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, new IntWritable(1));
            }
        }

        // Job 2: each line of job 1's output ("ip\t1") is one distinct visitor,
        // so emit ("uv", 1) per line and sum.
        public static class IpUvMapper2 extends Mapper<LongWritable, Text, Text, IntWritable> {
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text("uv"), one);
            }
        }

        public static class IpUvReducer2 extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // values is uv -> [1, 1, 1, ...], one per distinct ip
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException, URISyntaxException {
            Configuration conf = new Configuration();
            conf.set("mapred.jar", "uv.jar");
            final FileSystem fileSystem = FileSystem.get(new URI(OUTPUT_PATH), conf);
            fileSystem.delete(new Path(OUTPUT_PATH), true);

            Job job = new Job(conf, "UV");
            job.setJarByClass(IpUv.class);
            FileInputFormat.setInputPaths(job, INPUT_PATH);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // set mapper & reducer class
            job.setMapperClass(IpUvMapper1.class);
            job.setReducerClass(IpUvReducer1.class);
            FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

            // Chain job 2 onto job 1's output once job 1 succeeds.
            if (job.waitForCompletion(true)) {
                Configuration conf1 = new Configuration();
                final FileSystem fileSystem1 = FileSystem.get(new URI(OUTPUT_PATH + "-2"), conf1);
                fileSystem1.delete(new Path(OUTPUT_PATH + "-2"), true);

                Job job1 = new Job(conf1, "UV");
                job1.setJarByClass(IpUv.class);
                FileInputFormat.setInputPaths(job1, OUTPUT_PATH);
                job1.setOutputKeyClass(Text.class);
                job1.setOutputValueClass(IntWritable.class);

                // set mapper & reducer class
                job1.setMapperClass(IpUvMapper2.class);
                job1.setReducerClass(IpUvReducer2.class);

                FileOutputFormat.setOutputPath(job1, new Path(OUTPUT_PATH + "-2"));
                System.exit(job1.waitForCompletion(true) ? 0 : 1);
            }
        }
    }
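    The handoff between the two jobs relies on job 1's TextOutputFormat, which writes each record as "key<TAB>value", so job 1 produces lines like `192.168.1.1	1`. Job 2 only needs the number of such lines; if it ever needed the ip back, it would split on the tab. A minimal sketch of that handoff in plain Java (no Hadoop dependency; the class and method names are illustrative):

    ```java
    import java.util.Arrays;
    import java.util.List;

    public class JobChainSketch {
        // TextOutputFormat writes "key<TAB>value"; splitting on \t recovers the key.
        static String keyOf(String line) {
            return line.split("\t")[0];
        }

        // Job 2's logic in miniature: each line of job 1's output is one distinct ip.
        static int uvFromJob1Output(List<String> lines) {
            int sum = 0;
            for (String line : lines) {
                sum += 1;
            }
            return sum;
        }

        public static void main(String[] args) {
            List<String> job1Output = Arrays.asList(
                    "192.168.1.1\t1", "192.168.1.2\t1", "192.168.1.3\t1");
            System.out.println(keyOf(job1Output.get(0)));          // 192.168.1.1
            System.out.println("uv=" + uvFromJob1Output(job1Output)); // uv=3
        }
    }
    ```

    Chaining two jobs keeps memory usage bounded; the alternative of emitting every ip to a single reducer and deduplicating in a HashSet works too, but only while the distinct-ip set fits in one reducer's heap.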

    [hadoop@h201 IpUv]$ /usr/jdk1.7.0_25/bin/javac IpUv.java
    Note: IpUv.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    [hadoop@h201 IpUv]$ /usr/jdk1.7.0_25/bin/jar cvf uv.jar IpUv*class
    added manifest
    adding: IpUv.class(in = 2656) (out= 1323)(deflated 50%)
    adding: IpUv$IpUvMapper1.class(in = 1563) (out= 609)(deflated 61%)
    adding: IpUv$IpUvMapper2.class(in = 1569) (out= 616)(deflated 60%)
    adding: IpUv$IpUvReducer1.class(in = 1344) (out= 513)(deflated 61%)
    adding: IpUv$IpUvReducer2.class(in = 1624) (out= 684)(deflated 57%)
    [hadoop@h201 IpUv]$ hadoop jar uv.jar IpUv
    18/04/22 20:20:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/04/22 20:20:07 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
    18/04/22 20:20:07 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    18/04/22 20:20:08 INFO input.FileInputFormat: Total input paths to process : 1
    18/04/22 20:20:08 INFO mapreduce.JobSubmitter: number of splits:1
    18/04/22 20:20:08 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    18/04/22 20:20:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0054
    18/04/22 20:20:08 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0054
    18/04/22 20:20:08 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0054/
    18/04/22 20:20:08 INFO mapreduce.Job: Running job: job_1516635595760_0054
    18/04/22 20:20:15 INFO mapreduce.Job: Job job_1516635595760_0054 running in uber mode : false
    18/04/22 20:20:15 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/22 20:20:23 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/22 20:20:29 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/22 20:20:29 INFO mapreduce.Job: Job job_1516635595760_0054 completed successfully
    18/04/22 20:20:29 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=96
                    FILE: Number of bytes written=218531
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=182
                    HDFS: Number of bytes written=42
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=5370
                    Total time spent by all reduces in occupied slots (ms)=3060
                    Total time spent by all map tasks (ms)=5370
                    Total time spent by all reduce tasks (ms)=3060
                    Total vcore-seconds taken by all map tasks=5370
                    Total vcore-seconds taken by all reduce tasks=3060
                    Total megabyte-seconds taken by all map tasks=5498880
                    Total megabyte-seconds taken by all reduce tasks=3133440
            Map-Reduce Framework
                    Map input records=5
                    Map output records=5
                    Map output bytes=80
                    Map output materialized bytes=96
                    Input split bytes=107
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=3
                    Reduce shuffle bytes=96
                    Reduce input records=5
                    Reduce output records=3
                    Spilled Records=10
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=295
                    CPU time spent (ms)=1240
                    Physical memory (bytes) snapshot=224301056
                    Virtual memory (bytes) snapshot=2147659776
                    Total committed heap usage (bytes)=136712192
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=75
            File Output Format Counters
                    Bytes Written=42
    18/04/22 20:20:29 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
    18/04/22 20:20:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    18/04/22 20:20:29 INFO input.FileInputFormat: Total input paths to process : 1
    18/04/22 20:20:29 INFO mapreduce.JobSubmitter: number of splits:1
    18/04/22 20:20:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0055
    18/04/22 20:20:29 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0055
    18/04/22 20:20:29 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0055/
    18/04/22 20:20:29 INFO mapreduce.Job: Running job: job_1516635595760_0055
    18/04/22 20:20:36 INFO mapreduce.Job: Job job_1516635595760_0055 running in uber mode : false
    18/04/22 20:20:36 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/22 20:20:42 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/22 20:20:47 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/22 20:20:48 INFO mapreduce.Job: Job job_1516635595760_0055 completed successfully
    18/04/22 20:20:48 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=33
                    FILE: Number of bytes written=218409
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=155
                    HDFS: Number of bytes written=5
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=2962
                    Total time spent by all reduces in occupied slots (ms)=3019
                    Total time spent by all map tasks (ms)=2962
                    Total time spent by all reduce tasks (ms)=3019
                    Total vcore-seconds taken by all map tasks=2962
                    Total vcore-seconds taken by all reduce tasks=3019
                    Total megabyte-seconds taken by all map tasks=3033088
                    Total megabyte-seconds taken by all reduce tasks=3091456
            Map-Reduce Framework
                    Map input records=3
                    Map output records=3
                    Map output bytes=21
                    Map output materialized bytes=33
                    Input split bytes=113
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=1
                    Reduce shuffle bytes=33
                    Reduce input records=3
                    Reduce output records=1
                    Spilled Records=6
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=218
                    CPU time spent (ms)=890
                    Physical memory (bytes) snapshot=224432128
                    Virtual memory (bytes) snapshot=2147622912
                    Total committed heap usage (bytes)=136712192
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=42
            File Output Format Counters
                    Bytes Written=5

    Result:

    [hadoop@h201 ~]$ hadoop fs -cat /user/hadoop/output-2/part-r-00000
    18/04/22 20:21:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    uv      3

  • Original post: https://www.cnblogs.com/jieran/p/8909253.html