zoukankan      html  css  js  c++  java
  • 关于DataPump的external_table模式

    在pre10g的很长时间内,Oracle仅提供exp/imp导入导出工具,虽然这2个实用程序十分有效(现在也是如此),但因为它们受限于client/server模式工具自身的限制,以普通用户程序的身份来运转数以TB计的数据,其才不堪大用!DataPump是10g以后主推的数据抽取/导入工具,不同于exp/imp工具,DataPump是一个服务器端的实用程序,因为运行在服务器上故而DataPump进程可以直接访问数据文件与SGA(无需借shadow进程之手),与exp/imp工具相比使用DataPump可以获得显著的性能改善。DataPump可以通过直接路径或外部表路径这两种方法导出数据;其中直接路径避开了数据库高速缓存。当使用直接路径模式抽取数据时,DataPump从磁盘直接读取数据文件,抽取和格式化文件内容,最后将内容写出到转储文件。该种模式和SGA交互等待少,其导入导出速度直接取决于数据库所在磁盘速度和cpu;因此,直接路径极为快速。 外部表路径模式将使用到数据库的高速缓存buffer cache,通过外部表路径方法导出数据时,DataPump使用普通的SELECT操作将数据块从数据文件中读入buffer cache,为了写出转储文件,数据会在缓存中被格式化。通过外部表路径导入数据时,DataPump根据转储文件的内容构造标准的插入语句,并且通过将数据块从数据文件读至缓存来执行这些语句,插入操作按照标准的样式在缓存中完成;如同任何普通DML操作一样,外部表路径也会同时产生撤销和重做。 DataPump自身会根据对象的复杂性作出使用直接路径还是外部表路径的选择;对于较复杂的对象(后文将列出)而言,为了分解复杂性而必须同SGA进行交互,此情况下Data Pump只能采用外部表模式。我们还可以通过使用access_method参数来控制其行为,当然这仅在我们确认Data Pump作出了错误选择时才有必要。
    若满足右列条件EXPDP将采用direct_path即直接路径模式 表结构允许使用直接路径模式,举例而言: 表上没有启用针对SELECT操作的fine-grained access control 非队列表(queue table) 表上没有BFILE和opaque类型的列,或包含有opaque列的对象类型 表上没有加密列 表上没有被废弃的旧类型列 若表上存在LONG或LONG RAW类型列,则此列只能是最后一列 使用Expdp执行导出任务时没有为相关表指定QUERY, SAMPLE, or REMAP_DATA等参数 需要导出的表或分区相对较少(多达250M),亦或者表或分区其实很大,但导出任务无法工作在并行模式(未指定parallel参数,或parallel参数设置为1)
    若满足右列条件EXPDP将采用external_table即外部表模式 数据结构不满足在直接路径模式下抽取的条件,举例而言: 表上启用了针对SELECT操作的精细粒度控制 队列表 表上包含了BFILE或opaque类型列,或者包含有opaque列的对象类型 表上存在加密列 表上存在被废弃的旧类型列 表上存在LONG或LONG RAW类型列,且不是最后列 数据结构满足使用直接路径模式的条件,但执行导出任务时相关表上指定了QUERY, SAMPLE, or REMAP_DATA等参数 数据结构满足使用直接路径模式的条件,但相关的表或分区相对较大(大于250M),此时并行SQL可以用来加速数据抽取
    若满足右列条件IMPDP将采用direct_path即直接路径模式 数据结构满足使用直接路径模式的条件,举例而言: 当导入某单一表分区时该分区表上没有建立全局索引,这一点也包括分区的对象表 没有基于LOB列建立的域索引(domain index) 非cluster表 表上没有BFILE列或opaque类型列 表上没有嵌入了opaque类型的VARRAY列 表上没有加密列 没有启用补全日志(Supplemental logging)且表上没有LOB类型列 若导入表已预先建立了表建构,则需满足以下条件: 表上没有激活的触发器 并且 若是分区表则应有索引 并且 表上上没有启用针对INSERT操作的精细粒度控制 并且 表上除去check类型约束不存在其他类型约束 并且 表上没有unique的索引 执行导入任务时没有为相关表指定QUERY, REMAP_DATA等参数 需要导入的表或分区相对较小(少于250M),或者表或分区其实很大,但无法以并行模式导入(未指定parallel参数或指定其为1)
    若满足右列条件IMPDP将采用external_table即外部表路径 当数据结构不满足在直接路径模式下导入的条件,举例而言: 当导入某单一表分区时该分区表上建有全局索引,这一点也包括分区的对象表 表上有基于LOB列建立的域索引(domain index) cluster表 表上有BFILE列或opaque类型列 表上有嵌入了opaque类型的VARRAY列 表上有加密列 启用了补全日志且表上有至少一个LOB列 若导入表已预先建立了表建构,且满足以下条件: 表上有激活的触发器 或者 是分区表且没有任何索引 或者 表上启用了针对INSERT操作的精细粒度控制 或者 表上除去check类型约束还还有其他类型约束 或者 表上有unique的索引 执行导入任务时有为相关表指定QUERY, REMAP_DATA等参数 数据结构满足使用直接路径模式的条件,但相关的表或分区相对较大(大于250M),此时并行SQL可以用来加速数据导入
    我们有必要深入了解一下外部表路径究竟是如何工作的:
    SQL> select * from v$version;
    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
    PL/SQL Release 10.2.0.4.0 - Production
    CORE    10.2.0.4.0      Production
    TNS for Linux: Version 10.2.0.4.0 - Production
    NLSRTL Version 10.2.0.4.0 - Production
    
    SQL> exec dbms_workload_repository.create_snapshot;
    PL/SQL procedure successfully completed.
    
    /*  执行expdp导出任务,TCACHE表大小为465M,这里显示指定了外部表路径 */
    
    [maclean@rh2 ~]$ expdp maclean/maclean directory=dumps tables=TCACHE dumpfile=tcache1.dmp access_method=external_table
    Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 24 August, 2010 20:31:48
    Copyright (c) 2003, 2007, Oracle.  All rights reserved.
    Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    Starting "MACLEAN"."SYS_EXPORT_TABLE_07":  maclean/******** directory=dumps tables=TCACHE dumpfile=tcache1.dmp access_method=external_table
    Estimate in progress using BLOCKS method...
    Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
    Total estimation using BLOCKS method: 544 MB
    Processing object type TABLE_EXPORT/TABLE/TABLE
    Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
    . . exported "MACLEAN"."TCACHE"                          465.6 MB 4999999 rows
    Master table "MACLEAN"."SYS_EXPORT_TABLE_07" successfully loaded/unloaded
    ******************************************************************************
    Dump file set for MACLEAN.SYS_EXPORT_TABLE_07 is:
      /s01/dump/tcache1.dmp
    Job "MACLEAN"."SYS_EXPORT_TABLE_07" successfully completed at 20:32:18
    
    /*  这次我们指定了并行度为4  */
    
    [maclean@rh2 ~]$ expdp maclean/maclean directory=dumps tables=TCACHE dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp,tcache5.dmp access_method=external_table parallel=4
    Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 24 August, 2010 20:32:49
    Copyright (c) 2003, 2007, Oracle.  All rights reserved.
    Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    Starting "MACLEAN"."SYS_EXPORT_TABLE_07":  maclean/******** directory=dumps tables=TCACHE dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp,tcache5.dmp access_method=external_table parallel=4
    Estimate in progress using BLOCKS method...
    Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
    Total estimation using BLOCKS method: 544 MB
    Processing object type TABLE_EXPORT/TABLE/TABLE
    Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
    . . exported "MACLEAN"."TCACHE"                          465.6 MB 4999999 rows
    Master table "MACLEAN"."SYS_EXPORT_TABLE_07" successfully loaded/unloaded
    ******************************************************************************
    Dump file set for MACLEAN.SYS_EXPORT_TABLE_07 is:
      /s01/dump/tcache2.dmp
      /s01/dump/tcache3.dmp
      /s01/dump/tcache4.dmp
    Job "MACLEAN"."SYS_EXPORT_TABLE_07" successfully completed at 20:33:04
    
    SQL> exec dbms_workload_repository.create_snapshot;
    PL/SQL procedure successfully completed.
    
    SQL> @?/rdbms/admin/awrrpt
    
    接着我们分析该时段内的awr报告!
    
    从AWR报告中,我们不难找到以下语句:
    SQL ID 3qwsywpvtdyjp:
    CREATE TABLE "ET$088200010001" ( "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" ) ORGANIZATION EXTERNAL ( TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY DUMPS ACCESS PARAMETERS (DEBUG =0 DATAPUMP INTERNAL TABLE "MACLEAN"."TCACHE" JOB ( "MACLEAN", "SYS_EXPORT_TABLE_07", 1) WORKERID 1 PARALLEL 1 VERSION COMPATIBLE ENCRYPTPASSWORDISNULL ) LOCATION ('bogus.dat') ) PARALLEL 1 REJECT LIMIT UNLIMITED AS SELECT /*+ PARALLEL(KU$, 1) */ "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" FROM RELATIONAL("MACLEAN"."TCACHE" ) KU$
    
    SQL ID 1rxax27p7anr3:
    CREATE TABLE "ET$088000020001" ( "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" ) ORGANIZATION EXTERNAL ( TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY DUMPS ACCESS PARAMETERS (DEBUG =0 DATAPUMP INTERNAL TABLE "MACLEAN"."TCACHE" JOB ( "MACLEAN", "SYS_EXPORT_TABLE_07", 1) WORKERID 2 PARALLEL 2 VERSION COMPATIBLE ENCRYPTPASSWORDISNULL ) LOCATION ('bogus.dat') ) PARALLEL 2 REJECT LIMIT UNLIMITED AS SELECT /*+ PARALLEL(KU$, 2) */ "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" FROM RELATIONAL("MACLEAN"."TCACHE" ) KU$
    
    以上就是Expdp在导出数据时使用的创建组织类型为ORACLE_DATAPUMP的外部表的SQL语句;后者语句中指定的并行度为2,而非我所指定的4,因该是Oracle考虑到所需导出表的具体大小,设置过高的并行度反而会适得其反,所以对设定值做了修正。
    
    启用了并行的SQL 1rxax27p7anr3执行时间为22.63s,反而要比不使用并行慢;可见在TCACHE表这个数量级,完全没有必要使用并行导出。 接着我们来探究一下外部表路径的导入:
    SQL> truncate table tcache;
    Table truncated.
    SQL> set pagesize 1400;
    SQL> set linesize 140;
    
    /* 执行外部表路径的数据导入,并行度1  */
    
    [maclean@rh2 admin]$ impdp maclean/maclean directory=dumps tables=TCACHE content=data_only  dumpfile=tcache1.dmp access_method=external_table
    
    /* 在以上SQL执行期间查询V$SQL动态视图的相关语句*/
    
    select sql_text,
           sql_id,
           cpu_time,
           elapsed_time,
           disk_reads,
           buffer_gets,
           rows_processed
      from v$sql
     where sql_text like '%TCACHE%'
       and sql_text not like '%like%';
    
    SQL_TEXT
    --------------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID          CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
    ------------- ---------- ------------ ---------- ----------- --------------
    INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
    _TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
    ULL,NULL,NULL,NULL,NULL)
    6tn47a220d34q        408          408          0           0              0
    
    INSERT /*+APPEND+*/  /*+PARALLEL("TCACHE",1)+*/ INTO RELATIONAL("MACLEAN"."TCACHE" ) ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID",
     "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY")     SELECT "OWNE
    R", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPO
    RARY", "GENERATED", "SECONDARY"     FROM "ET$087B00060001" KU$
    66q4w8c4ak0wf      31576        31576          0         608              0
    
    /* 当导入job完成后再次查询 */
    select sql_text,
           sql_id,
           cpu_time,
           elapsed_time,
           disk_reads,
           buffer_gets,
           rows_processed
      from v$sql
     where sql_text like '%TCACHE%'
       and sql_text not like '%like%';
    
    SQL_TEXT
    --------------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID          CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
    ------------- ---------- ------------ ---------- ----------- --------------
    INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
    _TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
    ULL,NULL,NULL,NULL,NULL)
    6tn47a220d34q        408          408          0           0              0
    
    /* 执行完成后一句SQL立马消失了!? Oracle好像很不情愿我们看到外部表路径使用的INSERT语句 :) */
    
    INSERT /*+APPEND+*/ /*+PARALLEL("TCACHE",1)+*/
    INTO RELATIONAL
      ("MACLEAN"."TCACHE")
      ("OWNER",
       "OBJECT_NAME",
       "SUBOBJECT_NAME",
       "OBJECT_ID",
       "DATA_OBJECT_ID",
       "OBJECT_TYPE",
       "CREATED",
       "LAST_DDL_TIME",
       "TIMESTAMP",
       "STATUS",
       "TEMPORARY",
       "GENERATED",
       "SECONDARY")
      SELECT "OWNER",
             "OBJECT_NAME",
             "SUBOBJECT_NAME",
             "OBJECT_ID",
             "DATA_OBJECT_ID",
             "OBJECT_TYPE",
             "CREATED",
             "LAST_DDL_TIME",
             "TIMESTAMP",
             "STATUS",
             "TEMPORARY",
             "GENERATED",
             "SECONDARY"
        FROM "ET$087F00230001" KU$
    
    /* 这里的KU$代表的就是以ORACLE_DATAPUMP转储文件形式存放在数据库外的外部表 */
    
    /* 可以看到这里尝试使用了APPEND插入模式,以减少undo和redo的产生*/
    
    SQL> truncate table tcache;
    Table truncated.
    
    /* 若启用并行导入呢? */
    
    [maclean@rh2 admin]$ impdp maclean/maclean directory=dumps tables=TCACHE content=data_only  dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp access_method=external_table parallel=3
    ........
    
    select sql_text,
           sql_id,
           cpu_time,
           elapsed_time,
           disk_reads,
           buffer_gets,
           rows_processed
      from v$sql
     where sql_text like '%TCACHE%'
       and sql_text not like '%like%';
    SQL_TEXT
    --------------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID          CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
    ------------- ---------- ------------ ---------- ----------- --------------
    INSERT /*+APPEND+*/  /*+PARALLEL("TCACHE",1)+*/ INTO RELATIONAL("MACLEAN"."TCACHE" ) ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID",
     "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY")     SELECT "OWNE
    R", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPO
    RARY", "GENERATED", "SECONDARY"     FROM "ET$087A00040001" KU$
    2mvs15623ssvy    5910289      5910289          0       73578              0
    
    INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
    _TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
    ULL,NULL,NULL,NULL,NULL)
    6tn47a220d34q       1167         1167          0           0              0
    
    /* 似乎因为沿用了APPEND插入模式,插入语句的并行度并未上升 */
    
    /* 因为Oracle不想同我们分享它的小秘密,所以awr和ash报告中都不记录外部表路径插入SQL的相关信息 */
    
    /* 好吧,它有权保持沉默!  */
    
    That's great!
    
  • 相关阅读:
    swift mac 使用git, 并使用osc, 打开当前目录命令在终端输入 open . windows 下为start .
    压力测试-jmeter
    nginx在线与离线安装
    数据库架构简要解析
    maven-build-downloading
    springboot-权限控制shiro(二)
    mysql8报错解析
    springboot-权限控制shiro(一)
    pycharm问题解析(connecting to console)
    springboot-热部署Jrebel
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2967510.html
Copyright © 2011-2022 走看看