  • Ma Shibing Hadoop 2.7.3: Getting Started with Hive

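      • Getting started with Hive
      • Extract Hive into /usr/local and rename (mv) the extracted directory to hive.
        Set the environment variables HADOOP_HOME and HIVE_HOME, and add the bin directories to PATH.
        1. cd /usr/local/hive/conf
        2. cp hive-default.xml.template hive-site.xml
        3. In hive-site.xml, set hive.metastore.schema.verification to false
        4. Create the directory /usr/local/hive/tmp and replace every ${system:java.io.tmpdir} with that path
        5. Replace every ${system:user.name} with root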
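
The setup steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the Hadoop path /usr/local/hadoop and the helper name patch_hive_site are not from the original post, and the sed rule assumes the <value> element sits on the line directly after its <name> element. The demonstration runs on a two-property sample file rather than the real template.

```shell
# Environment variables (the Hadoop install path is an assumption).
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH="$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin"

# Hypothetical helper that applies steps 3-5 to a hive-site.xml file.
#   $1 = path to hive-site.xml, $2 = replacement temp dir, $3 = user name
# The replacement values must not contain '#' or '&' (sed metacharacters here).
patch_hive_site() {
    site_file=$1
    tmp_dir=$2
    user_name=$3
    sed -e '/<name>hive\.metastore\.schema\.verification<\/name>/{n;s#<value>.*</value>#<value>false</value>#;}' \
        -e 's#\${system:java\.io\.tmpdir}#'"$tmp_dir"'#g' \
        -e 's#\${system:user\.name}#'"$user_name"'#g' \
        "$site_file" > "$site_file.new" && mv "$site_file.new" "$site_file"
}

# Demonstration on a small sample excerpt in a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
  </property>
</configuration>
EOF
patch_hive_site "$demo_dir/hive-site.xml" /usr/local/hive/tmp root
cat "$demo_dir/hive-site.xml"

# On a real installation the equivalent calls would be:
#   mkdir -p "$HIVE_HOME/tmp"
#   patch_hive_site "$HIVE_HOME/conf/hive-site.xml" "$HIVE_HOME/tmp" root
```

Writing to a new file and moving it into place avoids sed -i, whose syntax differs between GNU and BSD sed.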
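      • schematool -initSchema -dbType derby
        This creates a Derby database named metastore_db in the current directory.
        Note: the next time you run hive, do it from this same directory, because by default it looks for the metastore in the current directory.
        If something goes wrong, delete metastore_db and run the command again.
        In real production environments, MySQL is commonly used as the metastore database.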
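
Because the Derby metastore is looked up relative to the current directory, one way to avoid accidentally creating a second metastore_db is to always launch the tools from one fixed directory. The helper below is a hypothetical sketch (run_from is not a Hive command, and /usr/local/hive is an arbitrary choice of working directory); the demonstration only runs pwd so that it works without Hive installed.

```shell
# Hypothetical helper: run a command from a fixed working directory
# without changing the caller's current directory.
#   $1 = working directory, remaining arguments = command to run
run_from() {
    work_dir=$1
    shift
    # The subshell keeps the cd local to this call.
    ( cd "$work_dir" && "$@" )
}

# Intended use (requires Hive on PATH):
#   run_from /usr/local/hive schematool -initSchema -dbType derby
#   run_from /usr/local/hive hive

# Harmless demonstration with a scratch directory:
demo_dir=$(mktemp -d)
run_from "$demo_dir" pwd
```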
      • Start hive
      • Observe the directories created under /tmp/hive with hadoop fs -ls /tmp/hive
      • show databases;
        use default;
        create table doc(line string);
        show tables;
        desc doc;
        select * from doc;
        drop table doc;
      • Check the output of hadoop fs -ls /user
      • Start YARN
      • load data inpath '/wcinput' overwrite into table doc;
        select * from doc;
        select split(line, ' ') from doc;
        select explode(split(line, ' ')) from doc;
        select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word;
        select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
        create table word_counts as select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
        select * from word_counts;

        1. dfs -ls /user/hive/...
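
To sanity-check the Hive word count on a small local sample, roughly the same computation can be done with standard Unix tools. This is a sketch, not what Hive does internally: the function name word_count and the sample input are made up for illustration, and unlike the Hive query it explicitly drops empty tokens.

```shell
# Hypothetical local counterpart of the word-count query:
# split each line on spaces, group identical words, count them,
# and print "word<TAB>count" ordered by word.
word_count() {
    tr ' ' '\n' |            # one word per line (like explode(split(line, ' ')))
        grep -v '^$' |       # drop empty tokens
        sort |               # group identical words together
        uniq -c |            # count each group
        awk '{print $2 "\t" $1}'
}

# Made-up sample input:
printf 'hello world\nhello hive\n' | word_count
```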
      • Experiment with the Sogou search logs
      • Upload the log file to HDFS, then start hive
      • create table sougou (qtime string, qid string, qword string, url string) row format delimited fields terminated by ',';
        load data inpath '/sougou.dic' into table sougou;
        select count(*) from sougou;
        create table sougou_results as select keyword, count(1) as count from (select qword as keyword from sougou) t group by keyword order by count desc;
        select * from sougou_results limit 10;
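
For a quick local cross-check of the keyword ranking, the same idea can be approximated on a small extract of the log with standard tools. This is a sketch under assumptions: the function name top_keywords and the sample rows are invented for illustration, and the input is assumed to be comma-delimited with the keyword in the third field, matching the table definition above.

```shell
# Hypothetical local counterpart of the sougou_results query:
# take the third comma-separated field, count occurrences,
# and print the most frequent keywords as "keyword<TAB>count".
#   $1 = number of rows to print (default 10)
top_keywords() {
    limit=${1:-10}
    cut -d',' -f3 |
        sort |
        uniq -c |
        sort -k1,1nr |       # highest count first
        head -n "$limit" |
        awk '{c = $1; sub(/^ *[0-9]+ /, ""); print $0 "\t" c}'
}

# Made-up sample rows in the (qtime, qid, qword, url) layout:
printf '%s\n' \
    '00:00:01,u1,hive,http://a.example' \
    '00:00:02,u2,hadoop,http://b.example' \
    '00:00:03,u3,hive,http://c.example' | top_keywords 2
```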
  • Original post: https://www.cnblogs.com/Jxiaobai/p/6669028.html