  • Sqoop: parameter settings that prevent inconsistent data exports

    Source of the problem

    The official documentation puts it this way:

    Since Sqoop breaks down export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database.
    This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others.
    You can overcome this problem by specifying a staging table via the --staging-table option which acts as an auxiliary table that is used to stage exported data.
    The staged data is finally moved to the destination table in a single transaction.

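    In short: Sqoop splits an export into multiple transactions, so a failed job can leave only part of the data committed to the database.
    A later run may then fail on insert collisions, or it may write duplicate rows.
    Pointing --staging-table at an auxiliary table avoids this: the export writes to the staging table first, and the staged rows are moved to the destination table in a single transaction.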
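    The final move can be pictured as one transaction that copies the staged rows into the target table and then empties the staging table. A rough sketch under that assumption: the helper name `migrate_sql` is hypothetical, the table names come from the export command in this post, and the exact statements Sqoop issues depend on the connector in use.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that illustrates an all-or-nothing move
# from a staging table to its target table. This is a conceptual picture,
# not a dump of what Sqoop itself executes.
migrate_sql() {
  target="$1"
  staging="$2"
  printf 'START TRANSACTION;\n'
  printf 'INSERT INTO %s SELECT * FROM %s;\n' "$target" "$staging"
  printf 'DELETE FROM %s;\n' "$staging"
  printf 'COMMIT;\n'
}

migrate_sql app_cource_study_report app_cource_study_report_tmp
```

    Because the copy and the cleanup commit together, the target table sees either all of the exported rows or none of them.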

    Solution

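    # --staging-table: rows are written to this table first and moved to the
    # target table only after the whole export has succeeded.
    # --clear-staging-table: delete any rows left in the staging table before the export starts.
    sqoop export \
    --connect jdbc:mysql://192.168.137.10:3306/user_behavior \
    --username root \
    --password 123456 \
    --table app_cource_study_report \
    --columns watch_video_cnt,complete_video_cnt,dt \
    --fields-terminated-by " " \
    --export-dir "/user/hive/warehouse/tmp.db/app_cource_study_analysis_${day}" \
    --staging-table app_cource_study_report_tmp \
    --clear-staging-table \
    --input-null-string 'N'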
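    Sqoop does not create the staging table itself. According to the Sqoop user guide, the table must exist before the export runs and must be structurally identical to the target table. A minimal sketch of preparing it, assuming a MySQL target (where `CREATE TABLE ... LIKE` is available). The helper name `staging_ddl` is hypothetical, and the `mysql` invocation is left commented out because it needs a live database.

```shell
#!/bin/sh
# Hypothetical helper: print the MySQL DDL that clones the target table's
# structure into an empty staging table (no rows are copied).
staging_ddl() {
  target="$1"
  staging="$2"
  printf 'CREATE TABLE IF NOT EXISTS %s LIKE %s;\n' "$staging" "$target"
}

staging_ddl app_cource_study_report app_cource_study_report_tmp

# To apply it (assumes the mysql client is installed; -p prompts for the password):
# staging_ddl app_cource_study_report app_cource_study_report_tmp \
#   | mysql -h 192.168.137.10 -P 3306 -u root -p user_behavior
```

    Running this once before the first export is enough. After that, --clear-staging-table keeps the table empty between runs.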
  • Original article: https://www.cnblogs.com/yangxusun9/p/13022535.html