zoukankan      html  css  js  c++  java
  • Apache Drill – Querying Complex Data & Data Definition Statements & Querying Data

    8. Apache Drill – Querying Complex Data

    In this chapter, we will discuss in detail about which all composite data types does Apache Drill supports.
     Array - An array is a repeated list of values.

    A value in an array can be a scalar type, such as string or int, or an array can be a complex type, such as a map or another array.
     Map - A map is a set of name/value pairs. A value in a map can be a scalar type, such as string or int, or a complex type, such as an array or another map.


    Apache Drill uses map and array data types internally for reading complex and nested data structures from data sources.

    FLATTEN
    FLATTEN separates the elements in a repeated field into individual records.
    Syntax:
    FLATTEN(x)
    Where,
    x - JSON array.

    {
    "num1" : 10,
    "num2" : [10,20,30],
    "num3" : " simple json array",
    "num4" : 50.5
    }

    Now we can execute this query in Drill.
    The following program shows the query for this function:
    0: jdbc:drill:zk=local> select *,flatten(num2) as flatten from dfs.`/Users/../workspace/Drill-samples/array.json`;

    KVGEN
    This function returns a list of the keys that exist in the map.
    Syntax:
    KVGEN(column name)

    {
    "student_ID" : {
    "001" : "Adam",
    "002" : "Amit"
    }
    }
    {
    "student_ID" : {
    "003" : "Bob",
    "004" : "David"
    }
    }

    The following program shows the query for this function:
    0: jdbc:drill:zk=local> select kvgen(student_ID) from dfs.`/Users/../workspace/Drill-samples/student_map.json`;

    66
    Result:
    num1 num2 num3 num4 flatten
    10 [10,20,30] simple json array 50.5 10
    10 [10,20,30] simple json array 50.5 20
    10 [10,20,30] simple json array 50.5 30
    KVGEN
    This function returns a list of the keys that exist in the map.
    Syntax:
    KVGEN(column name)
    Create a simple JSON map file named “student_map” as shown in the following program.
    {
    "student_ID" : {
    "001" : "Adam",
    "002" : "Amit"
    }
    }
    {
    "student_ID" : {
    "003" : "Bob",
    "004" : "David"
    }
    }
    The following program shows the query for this function:
    0: jdbc:drill:zk=local> select kvgen(student_ID) from dfs.`/Users/../workspace/Drill-samples/student_map.json`;

    Result:
    EXPR$0
    [{"key":"001","value":"Adam"},{"key":"002","value":"Amit"}]
    [{“key":"003","value":"Bob"},{"key":"004","value":"David"}]


    REPEATED_COUNT (array)
    The following program shows the query for this function:
    0: jdbc:drill:zk=local> select REPEATED_COUNT(num2) from dfs.`/Users/../workspace/Drill-samples/array.json`;
    Result:
    EXPR$0
    3

    REPEATED CONTAINS
    Searches for a keyword in an array. If the keyword is present in an array, the result will be true otherwise false.
    Syntax:
    REPEATED_CONTAINS(array_name, keyword)
    The array_name is a simple array. The following program shows the query for this function:
    0: jdbc:drill:zk=local> select REPEATED_CONTAINS(num2,10) from dfs.`/Users/../workspace/Drill-samples/array.json`;
    Result:
    true

    Query:
    0: jdbc:drill:zk=local> select REPEATED_CONTAINS(num2,40) from dfs.`/Users/deiva/workspace/Drill-samples/array.json`;
    Result:
    false

    9. Apache Drill – Data Definition Statements

    Create Statement
    You can create tables in Apache Drill by using the following two CTAS commands.
    Method 1
    Syntax:
    CREATE TABLE name [ (column list) ] AS query;

    Where,
    Query - select statement.

    Method 2
    Syntax:
    CREATE TABLE name [ ( <column list> ) ] [ PARTITION BY ( <column_name> [ , ... ] ) ] AS <select statement>
    Where,
     name - unique directory name
     column list - optional list of column names or aliases in the new table.
     PARTITION BY - partitions the data by the first column_name

    To create a table, you should adhere to the following steps:
     Set the workspace to a writable workspace.
     You can only create new tables in df.tmp workspace. You cannot create tables using storage plugins, such as Hive and HBase

    For example:
    "tmp": {
    "location": "/tmp",
    "writable": true,
    }

    Example Query:
    0: jdbc:drill:zk=local> use dfs.tmp;
    Result:
    +-------+--------------------------------------+
    | ok | summary |
    +-------+--------------------------------------+
    | true | Default schema changed to [dfs.tmp] |
    +-------+--------------------------------------+

    Query:
    0: jdbc:drill:zk=local> create table students as select * from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
    Result:
    +-----------+----------------------------+
    | Fragment | Number of records written |
    +-----------+----------------------------+
    | 0_0 | 10 |
    +-----------+----------------------------+

    To view records –
    0: jdbc:drill:zk=local> select * from students;
    Result:
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | ID | name | age | gender | standard | mark1 | mark2 | mark3 | addr | pincode |
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | 001 | Adam | 12 | male | six | 70 | 50 | 60 | 23 new street | 111222 |
    | 002 | Amit | 12 | male | six | 40 | 50 | 40 | 12 old street | 111222 |
    | 003 | Bob | 12 | male | six | 60 | 80 | 70 | 10 cross street | 111222 |

    ……

    The following program shows the query for this function:
    0: jdbc:drill:zk=local> create table student_new partition by (gender) as select * from dfs.`/Users/../workspace/Drill-samples/student_list.json`;

    Result:
    +-----------+----------------------------+
    | Fragment | Number of records written |
    +-----------+----------------------------+
    | 0_0 | 10 |
    +-----------+----------------------------+
    To view the records of the table –
    0: jdbc:drill:zk=local> select * from student_new;

    Result:
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | ID | name | age | gender | standard | mark1 | mark2 | mark3 | addr | pincode |
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+

    | 005 | Esha | 12 | female | six | 70 | 60 | 65 | 20 garden street | 111222 |
    | 006 | Ganga | 12 | female | six | 100 | 95 | 98 | 25 north street | 111222 |
    | 008 | Leena | 12 | female | six | 90 | 85 | 95 | 24 south street | 111222 |
    | 009 | Mary | 13 | female | six | 75 | 85 | 90 | 5 west street | 111222 |
    | 010 | Peter | 13 | female | six | 80 | 85 | 88 | 16 park avenue | 111222 |
    | 001 | Adam | 12 | male | six | 70 | 50 | 60 | 23 new street | 111222 |
    | 002 | Amit | 12 | male | six | 40 | 50 | 40 | 12 old street | 111222 |
    | 003 | Bob | 12 | male | six | 60 | 80 | 70 | 10 cross street | 111222 |
    | 004 | David | 12 | male | six | 50 | 70 | 70 | 15 express avenue | 111222 |
    | 007 | Jack | 13 | male | six | 55 | 45 | 45 | 2 park street | 111222 |

    Here the table records are partitioned by gender.

    Alter Statement
    The ALTER SYSTEM command permanently changes a system setting.

    Syntax:
    ALTER SYSTEM SET `option_name` = value;
    To reset the system settings, use the following syntax.
    ALTER SYSTEM RESET `option_name`;
    ALTER SYSTEM RESET ALL;

    Query:
    Here is the sample query that enables the Decimal data type –
    0: jdbc:drill:zk=local> ALTER SYSTEM SET `planner.enable_decimal_data_type` = true;

    Result:
    +-------+--------------------------------------------+
    | ok | summary |
    +-------+--------------------------------------------+
    | true | planner.enable_decimal_data_type updated. |
    +-------+--------------------------------------------+
    By default, Apache Drill disables the decimal data type.

    To reset all the changes, you will need to key-in the following command –
    0: jdbc:drill:zk=local> ALTER SYSTEM RESET all;

    Result:
    +-------+---------------+
    | ok | summary |
    +-------+---------------+
    | true | ALL updated. |
    +-------+---------------+

    Create View Statement
    The CREATE VIEW command creates a virtual structure for the result set of a stored query. A view can combine data from multiple underlying data sources and provide the illusion that all of the data is from one source.
    Syntax:
    CREATE [OR REPLACE] VIEW [workspace.]view_name [ (column_name [, ...]) ] AS query;

    Where,
     workspace - The location where you want the view to exist. By default, the view can be created in “dfs.tmp”.
     view_name - The name that you give to the view. This view must have a unique name.

    Query:
    0: jdbc:drill:zk=local> create view student_view as select * from dfs.`/Users/../workspace/Drill-samples/student_list.json`;

    Result:
    +-------+--------------------------------------------------------------+
    | ok | summary |
    +-------+--------------------------------------------------------------+
    | true | View 'student_view' created successfully in 'dfs.tmp' schema |
    +-------+--------------------------------------------------------------+
    To see the records, you can use the following query.
    0: jdbc:drill:zk=local> select * from student_view;

    Result:
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | ID | name | age | gender | standard | mark1 | mark2 | mark3 | addr | pincode |
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | 001 | Adam | 12 | male | six | 70 | 50 | 60 | 23 new street | 111222 |
    | 002 | Amit | 12 | male | six | 40 | 50 | 40 | 12 old street | 111222 |
    | 003 | Bob | 12 | male | six | 60 | 80 | 70 | 10 cross street | 111222 |
    | 004 | David | 12 | male | six | 50 | 70 | 70 | 15 express avenue | 111222 |
    | 005 | Esha | 12 | female | six | 70 | 60 | 65 | 20 garden street | 111222 |

    ……

    Drop Table
    The drop table statement is used to drop the table from a DFS storage plugin.
    Syntax:
    DROP TABLE [workspace.]name;
    Query:
    0: jdbc:drill:zk=local> drop table student_new;
    Result:
    +-------+------------------------------+
    | ok | summary |
    +-------+------------------------------+
    | true | Table [student_new] dropped |
    +-------+------------------------------+

    Drop View
    Similar to the table, a view can be dropped by using the following command –
    0: jdbc:drill:zk=local> drop view student_view;
    Result:
    +-------+-----------------------------------------------------------------+
    | ok | summary |
    +-------+-----------------------------------------------------------------+
    | true | View [student_view] deleted successfully from schema [dfs.tmp]. |
    +-------+-----------------------------------------------------------------+

     10. Apache Drill – Querying Data

    In this chapter, we will learn about how Apache Drill allows us to query plain text files such as CSV, TSV and PSV.

    CSV File
    Create a CSV file named “students.csv” as shown in the following program:
    001,Adam,23 new street
    002,Amit,12 old street
    003,Bob,10 cross street
    004,David,15 express avenue
    005,Esha,20 garden street
    006,Ganga,25 north street
    007,Jack,2 park street
    008,Leena,24 south street
    009,Mary,5 west street
    010,Peter,16 park avenue

    After saving the file, you can return to the terminal again and type the following query to view that CSV file.
    0: jdbc:drill:zk=local> select * from dfs.`/Users/../workspace/Drill-samples/students.csv`;
    Result:
    +--------------------------------------+
    | columns |
    +--------------------------------------+
    | ["001","Adam","23 new street"] |
    | ["002","Amit","12 old street"] |
    | ["003","Bob","10 cross street"] |
    | ["004","David","15 express avenue"] |
    | ["005","Esha","20 garden street"] |
    | ["006","Ganga","25 north street"] |
    | ["007","Jack","2 park street"] |
    | ["008","Leena","24 south street"] |

    ……

    From this output we can conclude that, Apache Drill recognizes each row as an array of values and returns one column for each row.

    Finding Columns[n]
    The COLUMNS[n] syntax is used to return CSV rows in a column by the column format, where n starts from 0 to n-1.
    Query:
    0: jdbc:drill:zk=local>select columns[0],columns[1],columns[2] from dfs.`/Users/../workspace/Drill-samples/students.csv`;

    Result:
    +---------+---------+--------------------+
    | EXPR$0 | EXPR$1 | EXPR$2 |
    +---------+---------+--------------------+
    | 001 | Adam | 23 new street |
    | 002 | Amit | 12 old street |
    | 003 | Bob | 10 cross street |
    | 004 | David | 15 express avenue |
    | 005 | Esha | 20 garden street |
    | 006 | Ganga | 25 north street |
    | 007 | Jack | 2 park street |
    | 008 | Leena | 24 south street |
    | 009 | Mary | 5 west street |
    | 010 | Peter | 16 park avenue |
    +---------+---------+--------------------+


    If you want to assign an alias name for columns, use the following query :

    0: jdbc:drill:zk=local>select columns[0] as ID,columns[1] as Name,columns[2] as Address from dfs.`/Users/../workspace/Drill-samples/students.csv`;

    Result:
    +------+--------+--------------------+
    | ID | Name | Address |
    +------+--------+--------------------+
    | 001 | Adam | 23 new street |
    | 002 | Amit | 12 old street |
    | 003 | Bob | 10 cross street |
    | 004 | David | 15 express avenue |
    | 005 | Esha | 20 garden street |
    | 006 | Ganga | 25 north street |
    | 007 | Jack | 2 park street |
    | 008 | Leena | 24 south street |
    | 009 | Mary | 5 west street |
    | 010 | Peter | 16 park avenue |
    +------+--------+--------------------+

    Create Table
    Like in JSON, you can create table for plain text files. Following is a sample query:
    0: jdbc:drill:zk=local> create table CSV as select * from dfs.`/Users/../workspace/Drill-samples/students.csv`;
    Result:
    +-----------+----------------------------+
    | Fragment | Number of records written |
    +-----------+----------------------------+
    | 0_0 | 10 |
    +-----------+----------------------------+
    To view the file contents, type the following query:
    0: jdbc:drill:zk=local> select * from CSV;

    Result:
    +--------------------------------------+
    | columns |
    +--------------------------------------+
    | ["001","Adam","23 new street"] |
    | ["002","Amit","12 old street"] |
    | ["003","Bob","10 cross street"] |
    | ["004","David","15 express avenue"] |
    | ["005","Esha","20 garden street"] |
    | ["006","Ganga","25 north street"] |
    | ["007","Jack","2 park street"] |
    | ["008","Leena","24 south street"] |
    | ["009","Mary","5 west street"] |
    | ["010","Peter","16 park avenue"] |
    +--------------------------------------+

    TSV File
    Create a TSV file named “students.tsv” as shown in the following program:

    Now we can execute this TSV file in Apache Drill by using the following query:
    0: jdbc:drill:zk=local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;
    Result:
    +--------------------------------------------------------------------------------------------+
    | columns |
    +--------------------------------------------------------------------------------------------+
    | ["ID","Name","Age","Standard","Marks","Addr","pincode"] |

    | ["001 id ","name is adam","for the age of 12","studying sxith std" "Having good marks"] |
    +--------------------------------------------------------------------------------------------+

    Create Table
    As shown in the CSV file above, you can also create a table for the TSV file.
    Query:
    0: jdbc:drill:zk=local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;
    Result:
    +-----------+----------------------------+
    | Fragment | Number of records written |
    +-----------+----------------------------+
    | 0_0 | 2 |
    +-----------+----------------------------+
    1 row selected (0.347 seconds)
    Query:
    0: jdbc:drill:zk=local> select * from TSV;

    Result:
    +--------------------------------------------------------------------------------------------+
    | columns |
    +--------------------------------------------------------------------------------------------+
    | ["ID","Name","Age","Standard","Marks","Addr","pincode"] |
    | ["001 id ","name is adam","for the age of 12","studying sxith std" "Having good marks"] |
    +--------------------------------------------------------------------------------------------+

    PSV (Pipe Separated Value) File
    Create a psv file named “sample.tbl” as shown in the following program.
    Tutorialspoint|Apache|Drill|article
    Now we can execute this PSV file in Drill,
    Query:
    0: jdbc:drill:zk=local> select * from dfs.`/Users/../workspace/Drill-samples/sample.tbl`;
    Result:
    +------------------------------------------------+
    | columns |
    +------------------------------------------------+
    | ["Tutorialspoint","Apache","Drill","article"] |
    +------------------------------------------------+
    Now, similar to the CSV and TSV files, try for yourself to create a table for PSV file.

    11. Apache Drill – Querying Data Using HBase

    ……

    12. Apache Drill – Querying Data Using Hive

    ……

    13. Apache Drill – Querying Parquet Files

    Parquet is a columnar storage format. Apache Drill uses Parquet format for easy, fast and efficient access.


    Create a Table
    Before moving to create a table in parquet, you must change the Drill storage format using the following command.
    0: jdbc:drill:zk=local> alter session set `store.format`= 'parquet';
    Result:
    +-------+------------------------+
    | ok | summary |
    +-------+------------------------+
    | true | store.format updated. |
    +———+------------------------+


    You can create a table using the following syntax.

    0: jdbc:drill:zk=local> create table dfs.tmp.`/Users/../workspace` as select * from dfs.`/Users/../workspace/Drill-samples/student_list.json`;

    Result:
    +-----------+----------------------------+
    | Fragment | Number of records written |
    +-----------+----------------------------+
    | 0_0 | 10 |
    +-----------+----------------------------+
    To see the table contents, type-in the following query –
    0: jdbc:drill:zk=local> select * from dfs.tmp.`/Users/../workspace`;

    Result:
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | ID | name | age | gender | standard | mark1 | mark2 | mark3 | addr | pincode |
    +------+--------+------+---------+-----------+--------+--------+--------+--------------------+----------+
    | 001 | Adam | 12 | male | six | 70 | 50 | 60 | 23 new street | 111222 |
    | 002 | Amit | 12 | male | six | 40 | 50 | 40 | 12 old street | 111222 |
    | 003 | Bob | 12 | male | six | 60 | 80 | 70 | 10 cross street | 111222 |
    | 004 | David | 12 | male | six | 50 | 70 | 70 | 15 express avenue | 111222 |
    | 005 | Esha | 12 | female | six | 70 | 60 | 65 | 20 garden street | 111222 |

    ……

  • 相关阅读:
    存储过程、触发器、数据完整性
    结构化查询语句
    HDU 1205 鸽巢原理
    ZOJ 2386 容斥原理
    poj 2356 抽屉原理
    HDU 2048 错排
    数据库原理及应用——关系数据库
    数据库原理及应用——绪论
    UVA
    汇编试验十一:编写子程序
  • 原文地址:https://www.cnblogs.com/panpanwelcome/p/13446235.html
Copyright © 2011-2022 走看看