zoukankan      html  css  js  c++  java
  • 批量导入数据(ADO.NET)

    如果你需要在程序中批量插入成千上万行的数据,你会怎么编写代码呢?最近在帮朋友调优这个的时候,总结了几种方法,并对其进行比较。

    大概的界面如下,我模拟了一个客户资料表.

    image

    数据我是放在一个XML文件的,大约6734行。类似下面的格式

    <?xml version="1.0" encoding="utf-8" ?> 
    <root>
    <Customers>
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Sina</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
    </Customers>
    <Customers>
      <CustomerID>ANATR</CustomerID>
      <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
      <ContactName>Ana Trujillo</ContactName>
      <ContactTitle>Owner</ContactTitle>
      <Address>Avda. de la Constitución 2222</Address>
      <City>México D.F.</City>
      <PostalCode>05021</PostalCode>
      <Country>Mexico</Country>
      <Phone>(5) 555-4729</Phone>
      <Fax>(5) 555-3745</Fax>
    </Customers>
    <Customers>
      <CustomerID>ANTON</CustomerID>
      <CompanyName>Antonio Moreno Taquería</CompanyName>
      <ContactName>Antonio Moreno</ContactName>
      <ContactTitle>Owner</ContactTitle>
      <Address>Mataderos  2312</Address>
      <City>México D.F.</City>
      <PostalCode>05023</PostalCode>
      <Country>Mexico</Country>
      <Phone>(5) 555-3932</Phone>
    </Customers>
    </root>

    下面首先在服务器稍微准备一下环境

    USE [tempdb]
    GO
    
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Customers](
        [CustomerID] [nchar](5) NOT NULL,
        [CompanyName] [nvarchar](40) NOT NULL,
        [ContactName] [nvarchar](30) NULL,
        [ContactTitle] [nvarchar](30) NULL,
        [Address] [nvarchar](60) NULL,
        [City] [nvarchar](15) NULL,
        [Region] [nvarchar](15) NULL,
        [PostalCode] [nvarchar](10) NULL,
        [Country] [nvarchar](15) NULL,
        [Phone] [nvarchar](24) NULL,
        [Fax] [nvarchar](24) NULL
    ) ON [PRIMARY]
    
    
    
    CREATE PROCEDURE [dbo].[usp_InsertCustomer]
        @CustomerID nchar(5),
        @CompanyName nvarchar(40),
        @ContactName nvarchar(30),
        @ContactTitle nvarchar(30),
        @Address nvarchar(60),
        @City nvarchar(15),
        @Region nvarchar(15),
        @PostalCode nvarchar(10),
        @Country nvarchar(15),
        @Phone nvarchar(24),
        @Fax nvarchar(24)
    AS
    
    SET NOCOUNT ON
    
    INSERT INTO [dbo].[Customers] (
        [CustomerID],
        [CompanyName],
        [ContactName],
        [ContactTitle],
        [Address],
        [City],
        [Region],
        [PostalCode],
        [Country],
        [Phone],
        [Fax]
    ) VALUES (
        @CustomerID,
        @CompanyName,
        @ContactName,
        @ContactTitle,
        @Address,
        @City,
        @Region,
        @PostalCode,
        @Country,
        @Phone,
        @Fax
    )
     
     
    我们在tempdb中创建了一个表和一个存储过程
     
    首先,我们把数据加载到一个DataSet
            DataSet ds = new DataSet();
            private void btLoadData_Click(object sender, EventArgs e)
            {
                string dataFile = "CustomersData.xml";
                ds.ReadXml(dataFile);
                bindingSource1.DataSource = ds;
                bindingSource1.DataMember = "Customers";
                dataGridView1.DataSource = bindingSource1;
                
            }

    然后,我们第一个测试代码是遍历这个DataSet,每一行提交一次

            private string GetConnectionString()
            {
                return "server=(local);database=tempdb;integrated security=true;";
            }
            /// <summary>
            /// 直接遍历,一个一个的提交给服务器。时间为265毫秒左右
            /// 每一行都需要写日志
            /// </summary>
            /// <param name="sender"></param>
            /// <param name="e"></param>
            private void btOneByOne_Click(object sender, EventArgs e)
            {
                SqlConnection conn = new SqlConnection(GetConnectionString());
                SqlCommand cmd = conn.CreateCommand();
                cmd.CommandText = "usp_InsertCustomer";
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
    
                TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;
    
                foreach (DataRow row in ds.Tables[0].Rows)
                {
                    cmd.Parameters.Clear();
                    SqlParameter[] param = new SqlParameter[]{
                        new SqlParameter("@CustomerID",row[0].ToString()),
                        new SqlParameter("@CompanyName",row[1].ToString()),
                        new SqlParameter("@ContactName",row[2].ToString()),
                        new SqlParameter("@ContactTitle",row[3].ToString()),
                        new SqlParameter("@Address",row[4].ToString()),
                        new SqlParameter("@City",row[5].ToString()),
                        new SqlParameter("@Region",row[6].ToString()),
                        new SqlParameter("@PostalCode",row[7].ToString()),
                        new SqlParameter("@Country",row[8].ToString()),
                        new SqlParameter("@Phone",row[9].ToString()),
                        new SqlParameter("@Fax",row[10].ToString())
                    };
                    cmd.Parameters.AddRange(param);
                    cmd.ExecuteNonQuery();
                }
                conn.Close();
    
    
                TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);
                MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");
            }

     

    接下来,我们使用ADO.NET内置的一个DataAdapter来提交

            /// <summary>
            /// 这是使用Adapter的方式,其实还是遍历,而且语法也没有简单
            /// 同时,速度甚至更慢。时间为650毫秒左右
            /// 每一行都需要写日志
            /// </summary>
            /// <param name="sender"></param>
            /// <param name="e"></param>
            private void btUseAdapter_Click(object sender, EventArgs e)
            {
                SqlDataAdapter adapter = new SqlDataAdapter();
                adapter.AcceptChangesDuringUpdate = false;//为了演示目的,把这个开关关掉,以免它在更新完成后把数据集标记为未更改
                adapter.UpdateBatchSize = 10;//这个好像也没有什么用
                
                SqlConnection conn = new SqlConnection(GetConnectionString());
                SqlCommand insertCommand = conn.CreateCommand();
                insertCommand.CommandText = "usp_InsertCustomer";
                insertCommand.CommandType = CommandType.StoredProcedure;
    
                insertCommand.UpdatedRowSource = UpdateRowSource.None;
    
                insertCommand.Parameters.Add("@CustomerID", SqlDbType.NChar, 5, "CustomerID");
                insertCommand.Parameters.Add("@CompanyName", SqlDbType.NVarChar, 40, "CompanyName");
                insertCommand.Parameters.Add("@ContactName", SqlDbType.NVarChar, 30, "ContactName");
                insertCommand.Parameters.Add("@ContactTitle", SqlDbType.NVarChar, 30, "ContactTitle");
                insertCommand.Parameters.Add("@Address", SqlDbType.NVarChar, 60, "Address");
                insertCommand.Parameters.Add("@City", SqlDbType.NVarChar, 15, "City");
                insertCommand.Parameters.Add("@Region", SqlDbType.NVarChar, 15, "Region");
                insertCommand.Parameters.Add("@PostalCode", SqlDbType.NVarChar, 10, "PostalCode");
                insertCommand.Parameters.Add("@Country", SqlDbType.NVarChar, 15, "Country");
                insertCommand.Parameters.Add("@Phone", SqlDbType.NVarChar, 24, "Phone");
                insertCommand.Parameters.Add("@Fax", SqlDbType.NVarChar, 24, "Fax");
    
    
                adapter.InsertCommand = insertCommand;
                TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;
                adapter.Update(ds,"Customers");
                TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);
                MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");
    
                
                
            }

     

    最后,我们找到了最快的方法

            /// <summary>
            /// 使用新的API,批量导入,这个速度很快,大约26毫秒,很显然,这种方式只写一次日志,不会为每一行写日志
            /// </summary>
            /// <param name="sender"></param>
            /// <param name="e"></param>
            private void btBCP_Click(object sender, EventArgs e)
            {
                using (SqlConnection conn = new SqlConnection(GetConnectionString()))
                {
                    SqlBulkCopy bcp = new SqlBulkCopy(conn);
                    bcp.DestinationTableName = "Customers";
                    bcp.BatchSize = 100;//这是批尺寸可以调整
                    for (int i = 0; i < 11; i++)
                    {
                        bcp.ColumnMappings.Add(i, i);
                    }
                    TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;
                    conn.Open();
                    bcp.WriteToServer(ds.Tables[0]);
                    TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);
                    MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");
    
                }
            }

     

    还有一种办法是通过在服务器OPENXML,因为XML反复处理效率很差,所以就没有测试了,可以断定它肯定比其他几种还要慢。

    另外提示一下,如果不用编程的方式,那么有其他三个可能的途径去做这个事情

    1. BCP工具(这是一个命令行,可以做导入和导出,不过来源文件如果不规范,那么可能很费劲)

    2. BULK INSERT语句(这是一个T-SQL语句,只能做导入,我们上面使用的SQLBULKCopy应该和他很类似)

    3. XML Bulk Load(这是一套COM的对象模型,适合导入XML文档)

     

    还有,在做大量的数据导入和导出时,可以考虑微软为SQL Server配套的SSIS(Integration Service)

     

     

     
     
  • 相关阅读:
    寻找水王
    学习进度条(第十一周)
    学习进度条(第十周)
    学习进度条(第九周)
    团队项目:个人工作总结10
    团队项目:个人工作总结09
    团队项目:个人工作总结08
    深度学习之--深度置信网络DBN
    神经网络学习之----受限玻尔兹曼机RBM(代码实现)
    神经网络学习之----受限玻尔兹曼机RBM
  • 原文地址:https://www.cnblogs.com/chenxizhang/p/1331060.html
Copyright © 2011-2022 走看看