  • LOAD DATA INFILE – performance case study

    Reposted from:

    http://venublog.com/2007/11/07/load-data-infile-performance/

    I have often noticed people complaining about LOAD DATA performance when loading a table with a large number of rows. Just today I saw a case where LOAD DATA on a simple 3-column table with about 5 million rows took ~15 minutes. This is because the server had not been tuned at all for bulk insertion.

    Consider the following simple MyISAM table on 32-bit Red Hat Linux.

     
     
     
    CREATE TABLE load1 (
      `col1` varchar(100) NOT NULL default '',
      `col2` int(11) default NULL,
      `col3` char(1) default NULL,
      PRIMARY KEY  (`col1`)
    ) ENGINE=MyISAM;
     

    The table has a string primary key column. Here is the data file that I used for testing:

     
     
     
    [vanugant@escapereply:t55 tmp]$ wc loaddata.csv
      5164946   5164946 227257389 loaddata.csv
    [vanugant@escapereply:t55 tmp]$ ls -alh loaddata.csv
    -rw-r--r--  1 vanugant users 217M Nov  6 14:42 loaddata.csv
    [vanugant@escapereply:t55 tmp]$
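
As a rough sanity check on the `wc` and `ls` output above, the byte count can be converted to MiB and divided by the line count to get an average row size. This is only a sketch: the names `LINES`, `BYTES` and `to_mib` are illustrative, and the numbers are copied from the listing.

```python
# Figures copied from the `wc loaddata.csv` output above.
LINES = 5_164_946        # one CSV row per line
BYTES = 227_257_389      # total file size in bytes

def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

size_mib = to_mib(BYTES)
avg_row_bytes = BYTES / LINES   # includes the trailing newline of each row

print(f"file size: {size_mib:.1f} MiB")
print(f"average row: {avg_row_bytes:.1f} bytes")
```

The result is a little under 217 MiB, consistent with the 217M that `ls -alh` reports, and the rows are short (a few dozen bytes each), so the load cost is dominated by per-row and index work rather than raw I/O volume.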
     

    Here are the default MySQL system variables related to LOAD DATA:

     
     
     
    mysql> show variables;
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | bulk_insert_buffer_size | 8388608  |
    | myisam_sort_buffer_size | 16777216 |
    | key_buffer_size         | 33554432 |
    +-------------------------+----------+
     

    And here is the actual LOAD DATA query that loads all ~5 million rows (~217M of data) into the table, along with its timing:

     
     
     
    mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
    Query OK, 4675823 rows affected (14 min 56.84 sec)
    Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
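
The status lines above can be cross-checked mechanically. The sketch below parses them and confirms that the rows affected equal the records read minus the rows skipped. The function name `parse_load_summary` is hypothetical, and the comment about why rows are skipped is an assumption based on how IGNORE treats duplicate keys, not something the output itself states.

```python
import re

# Status lines copied verbatim from the mysql client output above.
SUMMARY = """Query OK, 4675823 rows affected (14 min 56.84 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0"""

def parse_load_summary(text: str) -> dict:
    """Extract row counts and elapsed seconds from a LOAD DATA status message."""
    result = {"affected": int(re.search(r"(\d+) rows? affected", text).group(1))}
    for name, value in re.findall(r"(Records|Deleted|Skipped|Warnings): (\d+)", text):
        result[name.lower()] = int(value)
    # The timing is printed as "(M min S sec)" or just "(S sec)".
    timing = re.search(r"\((?:(\d+) min )?([\d.]+) sec\)", text)
    result["seconds"] = int(timing.group(1) or 0) * 60 + float(timing.group(2))
    return result

stats = parse_load_summary(SUMMARY)
# Assumption: with IGNORE, rows that collide on the primary key are skipped.
assert stats["records"] - stats["skipped"] == stats["affected"]
print(f"{stats['skipped'] / stats['records']:.1%} of input rows were skipped")
print(f"{stats['affected'] / stats['seconds']:.0f} rows loaded per second")
```

Expressing the run as rows per second makes it easier to compare against the later runs, which load the same file.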
     

    Now, let's experiment by raising the bulk insert buffer and disabling the keys on the table before running LOAD DATA:

     
     
     
    mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=314572800;
    Query OK, 0 rows affected (0.00 sec)
     
    mysql> alter table load1 disable keys;
    Query OK, 0 rows affected (0.00 sec)
     
    mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
    Query OK, 4675823 rows affected (13 min 47.50 sec)
    Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
     

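    No real use: the load is only about 8% faster. That is not surprising for this table, because DISABLE KEYS only affects non-unique indexes and load1 has nothing but a primary key. Now let's raise the other MyISAM buffer sizes as well and try again: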

     
     
     
    mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
    Query OK, 0 rows affected (0.00 sec)
     
    mysql> set session MYISAM_SORT_BUFFER_SIZE=256217728;
    Query OK, 0 rows affected (0.00 sec)
     
    mysql> set global KEY_BUFFER_SIZE=256217728;
    Query OK, 0 rows affected (0.05 sec)
     
    mysql> alter table load1 disable keys;
    Query OK, 0 rows affected (0.00 sec)
     
    mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
    Query OK, 4675823 rows affected (1 min 55.05 sec)
    Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
     
    mysql> alter table load1 enable keys;
    Query OK, 0 rows affected (0.00 sec)
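
The SET statements above take sizes in bytes, which makes them hard to compare at a glance. A minimal sketch that converts them to MiB follows; the dictionary and variable names are only for illustration, and the numbers are copied from the sessions above.

```python
MIB = 1024 * 1024

# Byte values copied from the SHOW VARIABLES output earlier in the post.
defaults = {
    "bulk_insert_buffer_size": 8_388_608,
    "myisam_sort_buffer_size": 16_777_216,
    "key_buffer_size": 33_554_432,
}
tuned_bytes = 256_217_728   # value used for all three buffers in the tuned run

for name, value in defaults.items():
    print(f"{name}: {value / MIB:.0f} MiB -> {tuned_bytes / MIB:.1f} MiB "
          f"({tuned_bytes / value:.1f}x)")
```

Note that 256217728 bytes is about 244 MiB, slightly less than an exact 256 MiB (268435456 bytes).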
     

    Wow, that's almost a 90% reduction in load time. So disabling the keys is not the whole story for MyISAM; tuning the buffer sizes plays a big role too, depending on the input data.
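
To put numbers on the three MyISAM runs, the timings printed by the mysql client can be converted to seconds and compared against the first run. This is a sketch; the run labels and the helper `to_seconds` are illustrative names, and only the timings come from the sessions above.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert an 'M min S sec' timing from the mysql client into seconds."""
    return minutes * 60 + seconds

# Elapsed times copied from the three MyISAM runs above.
runs = {
    "defaults": to_seconds(14, 56.84),
    "disable keys + bulk insert buffer": to_seconds(13, 47.50),
    "disable keys + all three buffers": to_seconds(1, 55.05),
}

baseline = runs["defaults"]
for label, elapsed in runs.items():
    reduction = (baseline - elapsed) / baseline   # fraction of load time saved
    speedup = baseline / elapsed                  # how many times faster
    print(f"{label:<36} {elapsed:8.2f} s  {reduction:6.1%} less time  {speedup:4.1f}x")
```

The second run saves well under 10% of the load time, while the fully tuned run saves close to 90%, which works out to a speedup of nearly 8x.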

    For the same case with InnoDB, here is the result after setting innodb_buffer_pool_size=1G and innodb_log_file_size=256M, along with innodb_flush_log_at_trx_commit=1:

     
     
     
    mysql> show variables like '%innodb%size';
    +---------------------------------+------------+
    | Variable_name                   | Value      |
    +---------------------------------+------------+
    | innodb_additional_mem_pool_size | 26214400   |
    | innodb_buffer_pool_size         | 1073741824 |
    | innodb_log_buffer_size          | 8388608    |
    | innodb_log_file_size            | 268435456  |
    +---------------------------------+------------+
     
    mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
    Query OK, 4675823 rows affected (2 min 37.53 sec)
    Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
     

    With innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT and innodb_doublewrite=0, throughput improves by roughly another 40% (use these settings with caution, and only if you know what you are doing):

     
     
     
    mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
    Query OK, 4675823 rows affected (1 min 53.69 sec)
    Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
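
A percentage difference between two runs can be stated as time saved or as extra throughput, and the two figures are not the same. The sketch below computes both for the two InnoDB runs above; the variable names are illustrative, and the timings and row count are copied from the output.

```python
# Elapsed times of the two InnoDB runs above, in seconds.
strict = 2 * 60 + 37.53    # innodb_flush_log_at_trx_commit=1
relaxed = 1 * 60 + 53.69   # relaxed log flushing, O_DIRECT, no doublewrite

ROWS_LOADED = 4_675_823    # "rows affected" in both runs

time_saved = (strict - relaxed) / strict   # fraction of wall-clock time saved
throughput_gain = strict / relaxed - 1     # relative increase in rows per second

print(f"strict:  {ROWS_LOADED / strict:,.0f} rows/s")
print(f"relaxed: {ROWS_LOADED / relaxed:,.0f} rows/s")
print(f"time saved: {time_saved:.1%}, throughput gain: {throughput_gain:.1%}")
```

Measured as throughput the relaxed settings gain a bit under 40%, while the wall-clock time drops by a bit under 30%.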
  • Original article: https://www.cnblogs.com/yuyue2014/p/5542284.html