  • How to use Sqoop to import and export data

    Sqoop is a tool for efficiently transferring data between relational databases (RDBMS) and HDFS. It can import data from MySQL, Oracle, and other databases into HDFS, and it can export data from HDFS back into a database. For details, refer to the Sqoop documentation.

    Before using Sqoop, follow the setup steps to install and configure it correctly.

    Sqoop - Import

    The general syntax of the import command is:

    sqoop import (generic-args) (import-args)

    The examples below use a table named stock_info; its last_chg_date column is used later for incremental imports.

    Case 1: import the stock_info table into HDFS:

    sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1

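    Sqoop runs the import as a MapReduce job. Because no target directory is given, the output files are written to a directory named after the table under the user's HDFS home directory. We can verify the result by running:

    hadoop fs -cat stock_info/part-m-*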
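The import commands in this post repeat the same connection arguments each time. One way to avoid that, sketched below, is Sqoop's --options-file argument, here combined with -P, which prompts for the password on the console instead of taking it on the command line. The file name sqoop-import.opts is an arbitrary choice for illustration, and host:port/dbname and loginuser are the same placeholders used above.

```shell
# Write the shared arguments to an options file: one option or value per line.
# OPTS_FILE is a hypothetical location; override it as needed.
OPTS_FILE="${OPTS_FILE:-./sqoop-import.opts}"

cat > "$OPTS_FILE" <<'EOF'
# Tool name first, then the arguments shared by every import
import
--connect
jdbc:mysql://host:port/dbname
--username
loginuser
EOF

# Then run the import, adding only what differs per table
# (-P asks for the password interactively):
#   sqoop --options-file ./sqoop-import.opts --table stock_info -m 1 -P
```

Arguments given on the command line are combined with those read from the file, so the per-table options stay visible in the command itself.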

    Case 2: specify the target directory in HDFS with --target-dir:

    sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1 --target-dir /temp

    We can verify the result with the same command as above, pointed at the target directory (hadoop fs -cat /temp/part-m-*).

    Case 3: incremental import, using the --incremental, --check-column and --append arguments. Replace 'last_chg_date' with the appropriate timestamp column when applying this to other tables.

    sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append
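In lastmodified mode Sqoop imports rows whose check column is newer than the value given with --last-value, so that value has to be carried from one run to the next. A saved job is one way to do this: sqoop job stores the import definition and updates the last value after each successful run. The following is a minimal sketch rather than a definitive setup; the function names, the SQOOP_* variables, and the use of --password-file (instead of the --password argument used above) are assumptions for illustration.

```shell
# Create a saved job once. Sqoop's metastore then remembers the last imported
# value of the check column between runs.
# Arguments: job name, table, check column, HDFS target directory.
# SQOOP_BIN, SQOOP_CONNECT, SQOOP_USER and SQOOP_PASSWORD_FILE are
# hypothetical variables that the caller is expected to set.
create_incremental_job() {
  "${SQOOP_BIN:-sqoop}" job --create "$1" -- import \
    --connect "$SQOOP_CONNECT" \
    --username "$SQOOP_USER" \
    --password-file "$SQOOP_PASSWORD_FILE" \
    --table "$2" -m 1 \
    --target-dir "$4" \
    --incremental lastmodified \
    --check-column "$3" \
    --append
}

# Execute the saved job; each run picks up where the previous one stopped.
run_incremental_job() {
  "${SQOOP_BIN:-sqoop}" job --exec "$1"
}
```

For example, after setting the variables, `create_incremental_job stock_info_incr stock_info last_chg_date /temp` defines the job once and `run_incremental_job stock_info_incr` runs it; the job name stock_info_incr is made up for this example.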

    Case 4: write the output as Parquet files by adding the --as-parquetfile argument:

    sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

    Case 5: import all tables in the database:

    sqoop import-all-tables --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser

    Sqoop - Export

    Export copies data from HDFS back into MySQL, Oracle, or another database. The general syntax is:

    sqoop export (generic-args) (export-args)

    Suppose the stock_info directory contains many Parquet files produced by the incremental sqoop import above. To load that data back into the MySQL database (the target table must already exist), run:

    sqoop export --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --export-dir /user/hlli/stock_info

    Finally, verify the data from the mysql command line:

    select * from stock_info;
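The same check can be run without opening a MySQL session: the sqoop eval tool (listed among the commands further down) sends a SQL statement to the database and prints the result. A small sketch, assuming the same placeholder connection values as above; the function name count_rows and the SQOOP_* variables are hypothetical.

```shell
# Print the row count of a table via `sqoop eval`.
# Argument: table name.
# -P prompts for the database password on the console.
# SQOOP_BIN, SQOOP_CONNECT and SQOOP_USER are hypothetical variables
# that the caller is expected to set.
count_rows() {
  "${SQOOP_BIN:-sqoop}" eval \
    --connect "$SQOOP_CONNECT" \
    --username "$SQOOP_USER" \
    -P \
    --query "SELECT COUNT(*) FROM $1"
}
```

Calling `count_rows stock_info` before and after the export gives a quick indication of whether the expected number of rows arrived.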

    Scheduling incremental imports

    We can use cron, the Linux job scheduler, to run the import periodically. Create a crontab file for your user (replace hlli with your user name):

    cd /var/spool/cron

    touch hlli

    vi hlli

    and add an entry that runs the import every five minutes:

    */5 * * * * /usr/lib/sqoop/bin/sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

    If the job runs, cron mails its output to '/var/spool/mail/hlli'. We can also verify the imported data by listing the target directory:

    hadoop fs -ls /temp
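Instead of editing files under /var/spool/cron by hand, the schedule can be kept in a file and installed with the crontab command. The sketch below also wraps the job in flock so that a run is skipped while the previous one is still going, and redirects output to a log file instead of mail. The file name stock_info.cron and the /tmp lock and log paths are assumptions, and flock is assumed to be available (it ships with util-linux on most Linux distributions).

```shell
# Write the schedule to a file. CRON_FILE is a hypothetical location.
CRON_FILE="${CRON_FILE:-./stock_info.cron}"

cat > "$CRON_FILE" <<'EOF'
# Every 5 minutes: skip if the previous run still holds the lock, append output to a log
*/5 * * * * flock -n /tmp/stock_info_import.lock /usr/lib/sqoop/bin/sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info -m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile >> /tmp/stock_info_import.log 2>&1
EOF

# Install it for the current user.
# Note: this REPLACES the user's existing crontab.
#   crontab ./stock_info.cron
```

Running `crontab -l` afterwards shows the installed entry.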

    Commonly used Sqoop commands

    sqoop help import

    sqoop help export

    sqoop help job

    sqoop help codegen

    sqoop help eval

    sqoop help list-tables

    sqoop help list-databases

    sqoop help import-all-tables

    References:

    1. http://sqoop.apache.org/
    2. http://man.linuxde.net/crontab
  • Original article: https://www.cnblogs.com/allanli/p/how_to_use_sqoop.html