zoukankan      html  css  js  c++  java
  • Vertica 项目常用代码

    1.查看目录下面有多少文件数

    ls -l |grep "^-"|wc -l

    思路很明显了,ls后通过grep进行过滤判断是文件还是文件夹,

    如果是判断文件夹,可以使用ls -l |grep "^d"|wc -l

    2.copy命令连接数据库 并分批导入文件到数据库中

    下面是我的示例代码,可以分批把数据导入到数据库中.同时捕获错误的信息与错误的记录.

    date

    totalcount=1315890897;

    onetimecount=100000000;

    for (( c=1100000001; c<$totalcount; c=c+$onetimecount))

    do

    date;

    sdate=`date +%y%m%d%H%M%S`

    echo "$c rows";

    cat /CLQR/Factory/FlowTest/Cimation/outbound/201401/part-r-00000 | tail -n+$c | head -$onetimecount |/opt/vertica/bin/vsql -h yourhostname -d yourdbname -U accountname -w your!password -c "copy clqr_az.STG_INKJET_TEST_DATA201401 from local stdin direct no escape null as 'N' DELIMITER E'02' REJECTED DATA '201401_excpata_$sdate.txt' exceptions '201401$sdate.err' ";

    date;

    done;

    使用copy的时候,需要注意的两点是,如果你的列分隔符 有问题,譬如说有些列内部有列分隔符 本身,这样就会造成找到的列比表本身多,会有问题.

    如果你的行分隔符有问题,默认是 ,就会找到的列比设计的列少,都会被拒绝载入的.

    我们可以加上exception选项和rejected data选项捕获这些错误的信息与错误的数据,以方便查找原因.

    我的项目中出现过类似的问题,我的解决方案是使用不可见字符譬如0102分别做列与列的分隔符 .

    注意:如果你的密码中含有特殊字符,譬如说!之类的,是需要进行转义的 加上

    3.

    vertica优化代码

    --optimize

    select dbd_create_workspace('Test')

    select dbd_create_design('Test','Test');

    select dbd_add_design_tables('Test','clqr_az.STG_INKJET_TEST_DATA');

    select dbd_populate_design('Test','Test');

       

    --Execute

    select dbd_create_deployment('Test', 'Test');

    select dbd_add_deployment_design('Test', 'Test', 'Test');

    select dbd_add_deployment_drop('Test', 'Test');

    select dbd_execute_deployment('Test', 'Test');

       

    --Cleanup

    select dbd_drop_deployment('Test', 'Test');

    select dbd_remove_design('Test', 'Test');

    select dbd_drop_design('Test', 'Test');

    select dbd_drop_workspace('Test');

       

    当然你也可以手动去启动database desinger.然后针对特定的query设计projections.

    4.

    给表重命名的步骤:

    Alter table schemaname.tablename rename to newtablenamewithoutschema ;

    select analyze_statistics ('schema.tablename');

    select refresh ('schema.tablename');

    这样也会保持projection的数据up to date.

    5.如何查看被删除的数据

    可以使用at epoch at timestamp 'sometime'之类的查询.

    可以在select语句之前加上 at epoch latest来读取当前时间的记录,类似于sqlserver中的snapshot.

    这样不阻塞其他进程写数据.

    在 Merge ,INSERT 后面加入/*direct*/,可以防止WOS full之类的错误,大数据时直接写入ROS.

    有时候表join会把内存爆掉,可以加入: /*+set_vertica_options(EE, ENABLE_JOIN_SPILL)*/

    允许其外部排序.

    6.创建projections

    一开始我们并不太熟悉如何创建合适的projections,推荐使用database desinger自动生成,然后学习这个生成的规则,我使用的感觉就是非常像使用sqlserver或其他关系数据库系统中的覆盖索引(视图).

    但是它往往需要你创建一个buddy projections.

    下面是一个示例:

    CREATE PROJECTION TableName_DBD_2_seg_evan_dbd_b0 /*+basename(TableName_DBD_2_seg_evan_dbd),createtype(D)*/
    
    (
    
    Test_Step_Fact_Key ENCODING DELTARANGE_COMP,
    
    Serial_Number ENCODING AUTO,
    
    Test_Run_Number ENCODING COMMONDELTA_COMP,
    
    Project_Name ENCODING RLE,
    
    Sequence_Name ENCODING RLE,
    
    Step_Start_Local_Timestamp ENCODING COMMONDELTA_COMP,
    
    Step_Start_UTC_Timestamp ENCODING COMMONDELTA_COMP,
    
    Step_Start_Local_Day_Dimension_Key ENCODING COMMONDELTA_COMP,
    
    Step_Number ENCODING BLOCKDICT_COMP,
    
    Manufacturer_Site_Key ENCODING RLE
    
    )
    
    AS
    
    SELECT Test_Step_Fact_Key,
    
    Serial_Number,
    
    Test_Run_Number,
    
    Project_Name,
    
    Sequence_Name,
    
    Step_Start_Local_Timestamp,
    
    Step_Start_UTC_Timestamp,
    
    Step_Start_Local_Day_Dimension_Key,
    
    Step_Number,
    
    Manufacturer_Site_Key
    
    FROM TableName
    
    ORDER BY Manufacturer_Site_Key,
    
    Project_Name,
    
    Sequence_Name,
    
    Serial_Number
    
    SEGMENTED BY MODULARHASH (Serial_Number, Project_Name, Sequence_Name, Manufacturer_Site_Key) ALL NODES OFFSET 0;
    
       
    
    CREATE PROJECTION TableName_DBD_2_seg_evan_dbd_b1 /*+basename(TableName_DBD_2_seg_evan_dbd),createtype(D)*/
    
    (
    
    Test_Step_Fact_Key ENCODING DELTARANGE_COMP,
    
    Serial_Number ENCODING AUTO,
    
    Test_Run_Number ENCODING COMMONDELTA_COMP,
    
    Project_Name ENCODING RLE,
    
    Sequence_Name ENCODING RLE,
    
    Step_Start_Local_Timestamp ENCODING COMMONDELTA_COMP,
    
    Step_Start_UTC_Timestamp ENCODING COMMONDELTA_COMP,
    
    Step_Start_Local_Day_Dimension_Key ENCODING COMMONDELTA_COMP,
    
    Step_Number ENCODING BLOCKDICT_COMP,
    
    Manufacturer_Site_Key ENCODING RLE
    
    )
    
    AS
    
    SELECT Test_Step_Fact_Key,
    
    Serial_Number,
    
    Test_Run_Number,
    
    Project_Name,
    
    Sequence_Name,
    
    Step_Start_Local_Timestamp,
    
    Step_Start_UTC_Timestamp,
    
    Step_Start_Local_Day_Dimension_Key,
    
    Step_Number,
    
    Manufacturer_Site_Key
    
    FROM TableName
    
    ORDER BY Manufacturer_Site_Key,
    
    Project_Name,
    
    Sequence_Name,
    
    Serial_Number
    
    SEGMENTED BY MODULARHASH (Serial_Number, Project_Name, Sequence_Name, Manufacturer_Site_Key) ALL NODES OFFSET 1;
    View Code
    View Code
    Looking for a job working at Home about MSBI
  • 相关阅读:
    InitializingBean
    执行jar的记事本
    vue中$forceUpdate()事件
    帆软时间检索限制90天
    vue清楚子组件v-model绑定的值
    Intellij IDEA中启动多个微服务(开启Run Dashboard管理)
    Java将CST的时间字符串转换成需要的日期格式字符串
    axios发送命令如何实现同步
    利用tomcat启动web前端
    vue生命周期
  • 原文地址:https://www.cnblogs.com/huaxiaoyao/p/3689378.html
Copyright © 2011-2022 走看看