  • Hive 11: Embedding Python in Hive

    Embedding Python in Hive

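    Both the input to the Python script and its output must use the tab character (\t) as the field delimiter; otherwise the query fails. The script reads rows from standard input and prints rows in that format.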
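
    As a minimal sketch of that tab-delimited contract, the snippet below reads rows, splits them on tabs, and writes them back joined by tabs. The names parse_row, format_row and passthrough are hypothetical helpers introduced only for illustration, and an in-memory stream stands in for the pipe Hive would provide.

```python
import io

def parse_row(line):
    """Split one input line into its tab-separated fields."""
    return line.rstrip("\n").split("\t")

def format_row(fields):
    """Join output fields with tabs, one row per line."""
    return "\t".join(str(f) for f in fields)

def passthrough(stdin, stdout):
    """Echo every row unchanged, keeping the tab-delimited format."""
    for line in stdin:
        stdout.write(format_row(parse_row(line)) + "\n")

# Demonstration with in-memory streams (an assumption for this sketch);
# a real script would call passthrough(sys.stdin, sys.stdout) instead.
demo_in = io.StringIO("tom\tshu fa,wei qi,chang ge\n")
demo_out = io.StringIO()
passthrough(demo_in, demo_out)
print(repr(demo_out.getvalue()))
```

    Keeping the parsing and formatting in small functions makes it easy to check the delimiter handling before the script is ever run from Hive.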

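    To use it, first register the script with add file, then call it with the syntax TRANSFORM (name, items) USING 'python test.py' AS (name string, item1 string, item2 string, item3 string). The columns listed after AS correspond, in order, to the tab-separated fields that the Python script prints.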
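
    The positional mapping between the script's output and the AS columns can be pictured with a short sketch. Hive performs this mapping itself; AS_COLUMNS and label_fields are hypothetical names used here only to illustrate the idea.

```python
# Column names copied from the AS clause in the text.
AS_COLUMNS = ["name", "item1", "item2", "item3"]

def label_fields(output_line, columns=AS_COLUMNS):
    """Pair each tab-separated output field with the AS column at the same position."""
    fields = output_line.rstrip("\n").split("\t")
    return dict(zip(columns, fields))

# One line as the script might print it for the first sample row.
row = label_fields("tom\tshu fa\twei qi\tchang ge\n")
print(row["item2"])
```

    The point of the sketch is only that order matters: the first printed field lands in name, the second in item1, and so on.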

    Below is a small example that splits one column into multiple columns:

    create table test (name string, items string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
    

      

    LOAD DATA local INPATH '/opt/data/tt.txt' OVERWRITE INTO TABLE test ;

    Contents of tt.txt (fields separated by a tab):

    tom	shu fa,wei qi,chang ge
    jack	game,kan shu,shang wang
    lusi	lv you,guang jie,gou wu
    

      Table 2:

    create table test2 (name string, item1 string, item2 string, item3 string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
    

      

    -- Add the Python script to the Hive session
    hive> add file /root/test.py;
    

      

    -- Write the results into test2
    INSERT OVERWRITE TABLE test2  
    
    SELECT  TRANSFORM (name, items)  
    USING 'python test.py'  
    AS (name string, item1 string,item2 string,item3 string)  
    FROM test;
    

      

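    #!/usr/bin/python
    # test.py: split the comma-separated items column into exactly three columns

    import sys

    for line in sys.stdin:
        # Each input row arrives as tab-separated text: name <TAB> items
        line = line.strip()
        name, it = line.split('\t')
        # Pad with the literal string 'NULL' when there are fewer than three items
        count = it.count(',') + 1
        for i in range(0, 3 - count):
            it = it + ',NULL'
        # Keep at most three items
        result = it.split(',')[0:3]
        # A single-argument print() works under both Python 2 and Python 3
        print('%s\t%s' % (name, '\t'.join(result)))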
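
    Before running the query, the script's logic can be checked locally on rows that have fewer or more than three items. The sketch below is written for Python 3 and restates the same padding and truncation rule as a function; split_items, run and the width parameter are hypothetical names, and in-memory streams stand in for the input Hive would send.

```python
import io

def split_items(line, width=3):
    """Turn 'name<TAB>a,b,c' into 'name<TAB>a<TAB>b<TAB>c', padding or truncating to width."""
    name, items = line.strip().split("\t")
    parts = items.split(",")
    # Pad with the literal string 'NULL', then keep only the first `width` items.
    parts = (parts + ["NULL"] * width)[:width]
    return "%s\t%s" % (name, "\t".join(parts))

def run(stdin, stdout):
    """Apply split_items to every input row."""
    for line in stdin:
        stdout.write(split_items(line) + "\n")

# Sample rows: one with exactly three items, one with a single item.
sample = io.StringIO("tom\tshu fa,wei qi,chang ge\njack\tgame\n")
out = io.StringIO()
run(sample, out)
print(out.getvalue(), end="")
```

    Note that the padding value is the text 'NULL', so rows with missing items would store that string rather than a SQL NULL.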
    

      

    Results:
    -- Table 1
    hive> select * from test;
    OK
    tom    shu fa,wei qi,chang ge
    jack    game,kan shu,shang wang
    lusi    lv you,guang jie,gou wu
    Time taken: 0.07 seconds, Fetched: 3 row(s)
    
    
     hive> desc test2;
     OK
     name                	string              	                    
     item1               	string              	                    
     item2               	string              	                    
     item3               	string              	                    
     Time taken: 0.141 seconds, Fetched: 4 row(s)
    -- Table 2
    hive> select * from test2;
    OK
    tom    shu fa    wei qi    chang ge
    jack    game    kan shu    shang wang
    lusi    lv you    guang jie    gou wu
    Time taken: 1.368 seconds, Fetched: 3 row(s)
    

      

  • Original article: https://www.cnblogs.com/tesla-turing/p/11509344.html