  • Word frequency counting with Hive

    Contents of the input file:

    $ /opt/cdh-5.3.6/hadoop-2.5.0/bin/hdfs dfs -text /user/hadoop/wordcount/input/wc.input
    hadoop spark
    spark hadoop
    oracle mysql postgresql
    postgresql oracle mysql
    mysql mongodb
    hdfs yarn mapreduce
    yarn hdfs
    zookeeper

    Use Hive to count word frequencies in the file above:

    create table docs (line string);

    load data inpath '/user/hadoop/wordcount/input/wc.input' into table docs;

    create table word_counts as
    select word,count(1) as count from
    (select explode(split(line,' ')) as word from docs) word
    group by word
    order by word;
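
    Note that load data inpath (without local) moves the HDFS file into the table's warehouse directory, so wc.input is no longer at its original path after the load. If the file should stay where it is, one option is an external table over the input directory. The following is a minimal sketch, assuming the directory contains only wc.input; the table name docs_ext is hypothetical and not part of the original post:

    ```sql
    -- Hypothetical alternative to "create table docs" + "load data inpath":
    -- an external table reads the files in place and leaves them there.
    -- Assumes /user/hadoop/wordcount/input holds only the word count input.
    create external table docs_ext (line string)
    location '/user/hadoop/wordcount/input';

    -- Quick check that the rows are visible through the table.
    select line from docs_ext limit 3;
    ```

    Dropping an external table removes only the table metadata, not the underlying files.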

    Step-by-step explanation:

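    -- Use the split function to split each row of the table on spaces.
    -- The empty string at the end of the first array comes from a trailing space on the first line of the file: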

    select split(line,' ') from docs;
    ["hadoop","spark",""]
    ["spark","hadoop"]
    ["oracle","mysql","postgresql"]
    ["postgresql","oracle","mysql"]
    ["mysql","mongodb"]
    ["hdfs","yarn","mapreduce"]
    ["yarn","hdfs"]
    ["zookeeper"]

    -- Use the explode function to expand each array returned by split into one row per element:

    select explode(split(line,' ')) as word from docs;
    word
    hadoop
    spark

    spark
    hadoop
    oracle
    mysql
    postgresql
    postgresql
    oracle
    mysql
    mysql
    mongodb
    hdfs
    yarn
    mapreduce
    yarn
    hdfs
    zookeeper
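
    The same expansion can also be written with lateral view. When explode is called directly in the select list, Hive does not allow other expressions beside it, whereas lateral view lets the source columns be selected together with the exploded value. A sketch against the same docs table; the alias t is arbitrary:

    ```sql
    -- Each input line is repeated once per token it contains.
    -- "t" is the alias of the virtual table produced by explode,
    -- "word" is the name given to its single column.
    select line, word
    from docs
    lateral view explode(split(line, ' ')) t as word;
    ```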

    -- The output above is now in a shape that can be aggregated, so count the words with SQL:

    select word,count(1) as count from
    (select explode(split(line,' ')) as word from docs) word
    group by word
    order by word;

    word    count
         1
    hadoop    2
    hdfs    2
    mapreduce    1
    mongodb    1
    mysql    3
    oracle    2
    postgresql    2
    spark    2
    yarn    2
    zookeeper    1
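
    The first result row counts an empty string, which comes from the trailing space in the input rather than from a real word. One possible variant drops such tokens and also splits on any run of whitespace. This is a sketch under those assumptions, not part of the original post; the aliases w and cnt are illustrative:

    ```sql
    -- split takes a regular expression; '\\s+' matches one or more
    -- whitespace characters. Trailing whitespace can still yield an
    -- empty token, so empty strings are filtered out explicitly.
    select word, count(1) as cnt
    from (select explode(split(line, '\\s+')) as word from docs) w
    where word != ''
    group by word
    order by word;
    ```

    Filtering in the outer query keeps the subquery identical in structure to the one used above, so the two results are easy to compare.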

  • Original article: https://www.cnblogs.com/wcwen1990/p/7116041.html