  • Working with the Hive data warehouse from Python

    Using the PyHive library

    Install: pip install pyhive -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

    from pyhive import hive
    conn = hive.Connection(host='****', port=****, username='****', database='****')
    cursor = conn.cursor()  # a cursor must be created before executing statements
    cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
    for i in range(****):
        # value and username are assumed to be defined by the caller
        sql = "INSERT INTO **** VALUES ({},'username{}')".format(value, str(username))
        cursor.execute(sql)
     
     
    # The following is the example from the official documentation:
    from pyhive import presto  # or import hive
    cursor = presto.connect('localhost').cursor()
    cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
    print(cursor.fetchone())
    print(cursor.fetchall())
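
The INSERT loop above splices values into the SQL text with str.format, which breaks on values that contain quotes and is open to SQL injection. Below is a minimal sketch of the alternative. It assumes the PyHive cursor accepts pyformat-style `%s` placeholders together with a parameter tuple (its DB-API `paramstyle` is `pyformat`). The helper name `insert_users` and the two-column row layout are hypothetical, chosen only for illustration.

```python
def insert_users(cursor, table, rows):
    """Insert (id, username) pairs, one statement per row.

    `cursor` is any DB-API cursor whose paramstyle is 'pyformat'
    (assumed here for PyHive). `table` must be a trusted name, because
    identifiers cannot be bound as parameters.
    """
    sql = "INSERT INTO {} VALUES (%s, %s)".format(table)
    count = 0
    for user_id, username in rows:
        # The driver escapes the values; no manual quoting is needed.
        cursor.execute(sql, (user_id, username))
        count += 1
    return count
```

With a live connection this would be called as `insert_users(conn.cursor(), table_name, rows)`, where `rows` is an iterable of `(id, username)` tuples. Hive executes each single-row INSERT as its own job, so loading a file in bulk is usually far faster for large volumes.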
    

      

    impyla

    Install:

    pip install impyla -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

    from impala.dbapi import connect 
    conn = connect(host ='****',port = ****)
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.description)   # prints the schema of the result set
    results = cursor.fetchall()
    

      Using Hive with pandas (PyHive or impyla)

    from pyhive import hive
    import pandas as pd
    def LinkHive(sql_select):
        connection = hive.Connection(host='localhost')
        cur = connection.cursor()
        cur.execute(sql_select)
        # Use cur consistently; the name cursor is not defined in this function
        columns = [col[0] for col in cur.description]
        result = [dict(zip(columns, row)) for row in cur.fetchall()]
        # Passing columns keeps the column order and handles empty result sets
        Main = pd.DataFrame(result, columns=columns)
        return Main
     
    sql = "select * from database_name.table_name"
    df  = LinkHive(sql)
    Or:

    from impala.dbapi import connect
    from impala.util import as_pandas
    conn = connect(host='10.161.20.11', port=21050)
    cur = conn.cursor()
    cur.execute('SHOW TABLES')
    cur.execute('SELECT * FROM businfo')
    data = as_pandas(cur)
    print(data)
    print(type(data))
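
Neither snippet above closes its cursor or connection. Below is a minimal sketch of the same row-to-dict step with explicit cleanup. It is written against the generic DB-API, so it should apply to PyHive and impyla connections alike. The helper name `fetch_dicts` is hypothetical, and the standard-library `sqlite3` module stands in for a Hive connection so that the example runs without a cluster.

```python
import sqlite3
from contextlib import closing


def fetch_dicts(connection, sql):
    """Run `sql` and return each row as a {column_name: value} dict."""
    with closing(connection.cursor()) as cur:
        cur.execute(sql)
        # description holds one 7-item tuple per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]


# sqlite3 is only a stand-in for hive.Connection(...) or impala's connect(...).
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE businfo (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO businfo VALUES (1, 'line-1'), (2, 'line-2')")
    records = fetch_dicts(conn, "SELECT * FROM businfo ORDER BY id")

print(records)
```

The resulting list of dicts can be passed directly to `pd.DataFrame(records)`.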

     

    Usage

    Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

    from impala.dbapi import connect
    conn = connect(host='my.host.com', port=21050)
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.description)  # prints the result set's schema
    results = cursor.fetchall()
    

    The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

    cursor.execute('SELECT * FROM mytable LIMIT 100')
    for row in cursor:
        process(row)
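
When the result set is large, fetching in explicit batches makes that buffering visible. The sketch below uses the DB-API `fetchmany` call, which returns `cursor.arraysize` rows per call when no size is given. It assumes the impyla cursor follows PEP 249 on this point. The generator name `iter_batches` is hypothetical, and the standard-library `sqlite3` module stands in for an impyla cursor so that the code runs locally.

```python
import sqlite3


def iter_batches(cursor, batch_size=None):
    """Yield lists of rows until the result set is exhausted."""
    if batch_size is not None:
        cursor.arraysize = batch_size  # rows returned per fetchmany() call
    while True:
        batch = cursor.fetchmany()
        if not batch:
            break
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (n INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])

cursor = conn.cursor()
cursor.execute("SELECT n FROM mytable ORDER BY n")
batch_sizes = [len(batch) for batch in iter_batches(cursor, batch_size=4)]
print(batch_sizes)
conn.close()
```

Ten rows fetched four at a time arrive as batches of 4, 4 and 2 rows.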
    

    The Cursor object also describes the columns returned by the query through cursor.description, which is useful when exporting the data to a CSV file:

    import csv
    
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    columns = [datum[0] for datum in cursor.description]
    targetfile = '/tmp/foo.csv'
    
    with open(targetfile, 'w', newline='') as outcsv:
        writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
        writer.writerow(columns)
        for row in cursor:
            writer.writerow(row)
    

    You can also get the result back as a pandas DataFrame object:

    from impala.util import as_pandas
    df = as_pandas(cursor)  # cursor is the Cursor object used in the examples above
    # carry df through scikit-learn, for example

     

  • Original article: https://www.cnblogs.com/SunshineKimi/p/12969751.html