Microsoft Data Development Technologies: Past, Present, and Future
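Over the past 20-plus years, Microsoft has developed many data access solutions. This article reviews how those technologies evolved.
In the 1990s, Microsoft offered two main database products: Access and SQL Server. Access is a desktop database that presents data through a tabular interface. Users never have to think about how the application talks to the database, because Access handles all of it, which makes many simple applications easy to build. Many other applications, however, have to deal with data access explicitly. Enterprise applications, for example, need high concurrency and high performance, and these are the problems SQL Server set out to solve from the start. The technologies described here are the many ways of implementing that data access.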
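Data access technologies on the Win32 (native) platform
When Microsoft released SQL Server 1.0 in 1989, it shipped with a compact API library, DB-Library. Its 150 functions let applications perform create, read, update, and delete (CRUD) operations on data. Microsoft also provided a precompiler, Embedded SQL for C (ESQL for C), which allowed SQL statements to be embedded directly in source code, an idea that foreshadowed the later LINQ. During this period every database offered its own API library, so the landscape was fragmented, with each vendor going its own way.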
Figure 1: Data access technologies circa 1990; Microsoft’s offerings were only those for SQL Server.
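In September 1992, Microsoft released the Open Database Connectivity (ODBC) specification. It defines 50 API functions that provide a uniform calling interface to any database that supplies an ODBC driver.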
Figure 2: Data access technologies in September 1992.
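A few years later, as object technologies and programming languages (COM, C++, Visual Basic) became mainstream, new object-oriented data access technologies appeared. The first was Data Access Objects (DAO), introduced in VB3 (November 1992). VB4, released in August 1995, superseded it with Remote Data Objects (RDO), a VB-compatible object layer on top of ODBC. A year later, in August 1996, Microsoft released OLE DB, a more general object-oriented data access technology. Like ODBC, OLE DB wraps the providers supplied by each data store, and it was widely adopted, with providers for more kinds of data (such as spreadsheets and text files). It does not offer a higher level of abstraction than ODBC, though; it only offers an object-oriented programming model.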
Figure 3: Data access technologies in August 1996.
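During this period, as network technology advanced, web-based applications grew in importance. OLE DB, however, targets languages that support pointers, such as C and C++, so scripting languages such as VBScript and JavaScript cannot use it. ActiveX Data Objects (ADO) was created to meet that need: it adds an abstraction on top of OLE DB that serves both pointer-based and pointerless languages. ADO was released in October 1996, so work on it had clearly started while OLE DB was still in development.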
Figure 4: Data Access in October 1996.
By this point six data access technologies had appeared. The table below shows where each one applied.
1996 Choices | SQL Server DBs | DBs w/ ODBC driver | DBs w/ OLE DB driver |
Apps written in C/C++ | DB-Library, ESQL/C, ODBC, OLE DB, ADO | ODBC | OLE DB, ADO |
Apps written in VB | RDO | RDO | ADO |
Web applications | ADO | | ADO |
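From then on, some technologies continued to evolve and others faded away. DB-Library, for example, was no longer supported after SQL Server 2000, and SQL Server 2005 brought a better API, SQL Server Native Client. Likewise, by SQL Server 2008, LINQ had taken the place of ESQL for C.
ODBC, OLE DB, and ADO remain the primary data access technologies on the Win32 platform today. Together they are now usually called the Microsoft Data Access Components (MDAC) or the Windows Data Access Components (WDAC).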
Figure 5: Data Access Technologies for Unmanaged Code, current to September 2010.
Current Choice Matrix (unmanaged code) | SQL Server DBs | DBs w/ ODBC driver | DBs w/ OLE DB driver |
Apps written in C/C++ | ODBC, OLE DB, ADO, SQL Native Client | ODBC | OLE DB, ADO |
Web applications | ADO, PHP Driver, JDBC Driver | Accessible through the Microsoft OLE DB Provider for ODBC (MSDASQL) | ADO |
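Data access today: direct data access (the .NET managed platform)
To meet data access needs on the .NET platform, Microsoft developed ADO.NET. It is the sibling of ADO and implements a similar conceptual model; its core classes include SqlConnection, SqlCommand, DataReader, DataSet, and DataTable. ADO.NET provides a uniform abstraction on top of the various ADO.NET data providers, as the figure below shows. SQL Server has its own dedicated provider, and ADO.NET also has good support for XML.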
Figure 6: The introduction of ADO.NET in 2002/2003.
The data access technologies covered so far differ in many ways, but they all follow the same programming model:
1. Open a database connection
2. Execute a query in the database
3. Get back a set of results
4. Process those results
5. Release the results
6. Close the connection
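The six steps above can be sketched with the ADO.NET classes named earlier. This is a minimal illustration, not a reference implementation: the connection string, the local SQL Server instance, and the Northwind Customers table are assumptions chosen for the example, so it only runs where such a database exists.

```csharp
using System;
using System.Data.SqlClient;

class SixStepDemo
{
    static void Main()
    {
        // Assumption: a local SQL Server instance hosting the Northwind sample database.
        const string connectionString =
            "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=true;";

        // 1. Open a database connection.
        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // 2. Execute a query in the database.
            using (SqlCommand command = new SqlCommand(
                "SELECT CustomerID, CompanyName FROM Customers WHERE City = @city",
                connection))
            {
                command.Parameters.AddWithValue("@city", "London");

                // 3. Get back a set of results.
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    // 4. Process those results.
                    while (reader.Read())
                    {
                        Console.WriteLine("{0}: {1}",
                            reader.GetString(0), reader.GetString(1));
                    }
                } // 5. Release the results: the reader is disposed here.
            }
        } // 6. Close the connection: disposing the connection closes it.
    }
}
```

The using blocks take care of steps 5 and 6 even when an exception is thrown, which is why they are preferred over explicit Close calls.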
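On top of ADO.NET, the two most important directions today are LINQ (.NET 3.5) and the ADO.NET Entity Framework (.NET 3.5 SP1). LINQ provides one consistent query interface over many kinds of data: any type that implements IEnumerable<T> or IQueryable<T> can be queried with LINQ.
Figure 7: The introduction of LINQ in November 2007 including LINQ to Objects, LINQ to XML, LINQ to DataSet, and LINQ to SQL.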
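As a small illustration of that consistent query interface, the sketch below runs the same query expression over an array and over a List<string>, both of which implement IEnumerable<T>. The class name, method name, and sample data are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqToObjectsDemo
{
    // Returns the cities that start with the given letter, sorted alphabetically.
    // Works for any source that implements IEnumerable<string>.
    public static List<string> CitiesStartingWith(IEnumerable<string> cities, char first)
    {
        var query = from city in cities
                    where city.Length > 0 && city[0] == first
                    orderby city
                    select city;
        return query.ToList();
    }

    public static void Main()
    {
        string[] cityArray = { "London", "Berlin", "Lyon", "Madrid", "Lisbon" };
        List<string> cityList = new List<string>(cityArray);

        // The same query runs unchanged against both collection types.
        foreach (string city in CitiesStartingWith(cityArray, 'L'))
        {
            Console.WriteLine(city);
        }
        foreach (string city in CitiesStartingWith(cityList, 'M'))
        {
            Console.WriteLine(city);
        }
    }
}
```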
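Whether it is LINQ or the earlier ODBC, OLE DB, ADO, and ADO.NET, the data abstraction on offer is still relational: what the application sees is still a set of tables. In a simple application the conceptual model and the storage model may look the same, but as an application grows more complex the domain model and the storage model drift apart. For example, the database may split an order into an Order table and an OrderDetail table while the object model still works with a single Order object. This gap is the object-relational impedance mismatch. Bridging it requires mapping, and the Entity Framework is such a mapping tool (an ORM).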
Figure 8: The Entity Framework, first released in August 2008, automates the hard work of conceptual mapping. An Entity Data Model is used at compile time to generate classes for a mapping layer.
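To make the mismatch concrete, here is a hand-written version of the mapping for the Order and OrderDetail example. It is only a sketch of the kind of code an ORM generates for you, not how the Entity Framework itself is implemented; the classes, column names, and in-memory tables are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical domain classes: a single Order object that owns its detail lines.
public class OrderLine
{
    public int ProductID { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public List<OrderLine> Lines { get; private set; }

    public Order()
    {
        Lines = new List<OrderLine>();
    }
}

public static class ManualMapping
{
    // Builds one Order object per row of the Order table and attaches the
    // matching OrderDetail rows to it.
    public static List<Order> Map(DataTable orders, DataTable details)
    {
        var result = new List<Order>();
        foreach (DataRow orderRow in orders.Rows)
        {
            var order = new Order { OrderID = (int)orderRow["OrderID"] };
            foreach (DataRow detailRow in details.Rows)
            {
                if ((int)detailRow["OrderID"] == order.OrderID)
                {
                    order.Lines.Add(new OrderLine
                    {
                        ProductID = (int)detailRow["ProductID"],
                        Quantity = (int)detailRow["Quantity"]
                    });
                }
            }
            result.Add(order);
        }
        return result;
    }

    public static void Main()
    {
        // Relational shape: two tables linked by OrderID.
        var orders = new DataTable("Order");
        orders.Columns.Add("OrderID", typeof(int));
        orders.Rows.Add(1);
        orders.Rows.Add(2);

        var details = new DataTable("OrderDetail");
        details.Columns.Add("OrderID", typeof(int));
        details.Columns.Add("ProductID", typeof(int));
        details.Columns.Add("Quantity", typeof(int));
        details.Rows.Add(1, 10, 2);
        details.Rows.Add(1, 11, 1);
        details.Rows.Add(2, 10, 5);

        // Object shape: each Order carries its own lines.
        foreach (Order order in Map(orders, details))
        {
            Console.WriteLine("Order {0} has {1} line(s)",
                order.OrderID, order.Lines.Count);
        }
    }
}
```

Every new table or column means more code like Map, which is the repetitive work a mapping layer is meant to remove.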
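Data development today: data in the cloud
Everything discussed so far involves direct access to data. In some scenarios, such as web applications, the client cannot reach the data directly and needs a REST-based data service instead, namely WCF Data Services (formerly ADO.NET Data Services, code name "Astoria"). A data service uses URIs to address data and JSON and XML to represent it.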
Figure 9: WCF Data Services facilitates creating and consuming REST-based data services.
Entity Framework
Entity Framework architecture for accessing data.
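LINQ to DataSet
LINQ to DataSet simplifies querying a DataSet. To make DataSet work with LINQ, the System.Data.DataSetExtensions assembly provides the extension methods defined in DataRowExtensions and DataTableExtensions.
LINQ requires the data source to implement IEnumerable<T> or IQueryable<T>. DataTable implements neither, so you must call the AsEnumerable extension method before you can query it with LINQ.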
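A minimal sketch of AsEnumerable on an in-memory DataTable, so it runs without a database. The table contents and method names are invented for the example, and on the .NET Framework the project needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Linq;

public static class AsEnumerableDemo
{
    // Builds a small in-memory table that stands in for real query results.
    public static DataTable BuildOrders()
    {
        var orders = new DataTable("SalesOrderHeader");
        orders.Columns.Add("SalesOrderID", typeof(int));
        orders.Columns.Add("OnlineOrderFlag", typeof(bool));
        orders.Rows.Add(1, true);
        orders.Rows.Add(2, false);
        orders.Rows.Add(3, true);
        return orders;
    }

    public static int[] OnlineOrderIds(DataTable orders)
    {
        // DataTable is not IEnumerable<T>; AsEnumerable() exposes its rows as
        // IEnumerable<DataRow> so that the LINQ query operators apply.
        var query = from row in orders.AsEnumerable()
                    where (bool)row["OnlineOrderFlag"]
                    select (int)row["SalesOrderID"];
        return query.ToArray();
    }

    public static void Main()
    {
        foreach (int id in OnlineOrderIds(BuildOrders()))
        {
            Console.WriteLine(id);
        }
    }
}
```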
Preparing the data
Fill the DataSet with a SqlDataAdapter:
    // Requires: using System; using System.Data; using System.Data.SqlClient;
    static void FillDataSet(DataSet ds)
    {
        try
        {
            // Create an adapter whose batch of queries fetches sales order, contact,
            // address, and product information for sales in the year 2002.
            string connectionString =
                "Data Source=localhost;Initial Catalog=AdventureWorks;" +
                "Integrated Security=true;";

            SqlDataAdapter da = new SqlDataAdapter(
                "SELECT SalesOrderID, ContactID, OrderDate, OnlineOrderFlag, " +
                "TotalDue, SalesOrderNumber, Status, ShipToAddressID, BillToAddressID " +
                "FROM Sales.SalesOrderHeader " +
                "WHERE DATEPART(YEAR, OrderDate) = @year; " +

                "SELECT d.SalesOrderID, d.SalesOrderDetailID, d.OrderQty, " +
                "d.ProductID, d.UnitPrice " +
                "FROM Sales.SalesOrderDetail d " +
                "INNER JOIN Sales.SalesOrderHeader h " +
                "ON d.SalesOrderID = h.SalesOrderID " +
                "WHERE DATEPART(YEAR, OrderDate) = @year; " +

                "SELECT p.ProductID, p.Name, p.ProductNumber, p.MakeFlag, " +
                "p.Color, p.ListPrice, p.Size, p.Class, p.Style, p.Weight " +
                "FROM Production.Product p; " +

                "SELECT DISTINCT a.AddressID, a.AddressLine1, a.AddressLine2, " +
                "a.City, a.StateProvinceID, a.PostalCode " +
                "FROM Person.Address a " +
                "INNER JOIN Sales.SalesOrderHeader h " +
                "ON a.AddressID = h.ShipToAddressID OR a.AddressID = h.BillToAddressID " +
                "WHERE DATEPART(YEAR, OrderDate) = @year; " +

                "SELECT DISTINCT c.ContactID, c.Title, c.FirstName, " +
                "c.LastName, c.EmailAddress, c.Phone " +
                "FROM Person.Contact c " +
                "INNER JOIN Sales.SalesOrderHeader h " +
                "ON c.ContactID = h.ContactID " +
                "WHERE DATEPART(YEAR, OrderDate) = @year;",
                connectionString);

            // Bind the @year parameter used by the queries.
            da.SelectCommand.Parameters.AddWithValue("@year", 2002);

            // Add table mappings: the result sets arrive as Table, Table1, Table2, ...
            da.TableMappings.Add("Table", "SalesOrderHeader");
            da.TableMappings.Add("Table1", "SalesOrderDetail");
            da.TableMappings.Add("Table2", "Product");
            da.TableMappings.Add("Table3", "Address");
            da.TableMappings.Add("Table4", "Contact");

            // Fill the DataSet.
            da.Fill(ds);

            // Add data relations.
            DataTable orderHeader = ds.Tables["SalesOrderHeader"];
            DataTable orderDetail = ds.Tables["SalesOrderDetail"];
            DataRelation order = new DataRelation("SalesOrderHeaderDetail",
                orderHeader.Columns["SalesOrderID"],
                orderDetail.Columns["SalesOrderID"], true);
            ds.Relations.Add(order);

            DataTable contact = ds.Tables["Contact"];
            DataTable orderHeader2 = ds.Tables["SalesOrderHeader"];
            DataRelation orderContact = new DataRelation("SalesOrderContact",
                contact.Columns["ContactID"],
                orderHeader2.Columns["ContactID"], true);
            ds.Relations.Add(orderContact);
        }
        catch (SqlException ex)
        {
            Console.WriteLine("SQL exception occurred: " + ex.Message);
        }
    }
Source: <https://msdn.microsoft.com/en-us/library/bb399340(v=vs.110).aspx>
Cross-table queries
The query below uses the Field<T> extension method (in System.Data.DataSetExtensions.dll), which takes care of both the null check and the type conversion.
    // Requires: using System; using System.Data; using System.Globalization; using System.Linq;
    // Fill the DataSet.
    DataSet ds = new DataSet();
    ds.Locale = CultureInfo.InvariantCulture;
    FillDataSet(ds);

    DataTable orders = ds.Tables["SalesOrderHeader"];
    DataTable details = ds.Tables["SalesOrderDetail"];

    // Join order headers to their details, keeping online orders placed in August.
    var query =
        from order in orders.AsEnumerable()
        join detail in details.AsEnumerable()
            on order.Field<int>("SalesOrderID") equals detail.Field<int>("SalesOrderID")
        where order.Field<bool>("OnlineOrderFlag") == true
            && order.Field<DateTime>("OrderDate").Month == 8
        select new
        {
            SalesOrderID = order.Field<int>("SalesOrderID"),
            SalesOrderDetailID = detail.Field<int>("SalesOrderDetailID"),
            OrderDate = order.Field<DateTime>("OrderDate"),
            ProductID = detail.Field<int>("ProductID")
        };

    foreach (var order in query)
    {
        Console.WriteLine("{0} {1} {2:d} {3}",
            order.SalesOrderID,
            order.SalesOrderDetailID,
            order.OrderDate,
            order.ProductID);
    }
Source: <https://msdn.microsoft.com/en-us/library/bb386969(v=vs.110).aspx>
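The null handling of Field<T> is easiest to see with a nullable type argument. In the sketch below, which uses a made-up in-memory Product table, Field<decimal?> returns null for a DBNull cell, whereas a direct cast such as (decimal)row["Weight"] would throw for that row.

```csharp
using System;
using System.Data;
using System.Linq;

public static class FieldDemo
{
    // Hypothetical sample data: the second product has no weight recorded.
    public static DataTable BuildProducts()
    {
        var products = new DataTable("Product");
        products.Columns.Add("ProductID", typeof(int));
        products.Columns.Add("Weight", typeof(decimal));
        products.Rows.Add(1, 2.5m);
        products.Rows.Add(2, DBNull.Value);
        return products;
    }

    public static decimal TotalWeight(DataTable products)
    {
        // Field<decimal?> converts DBNull to null, so a missing value can be
        // handled with the ordinary ?? operator.
        return products.AsEnumerable()
            .Select(row => row.Field<decimal?>("Weight") ?? 0m)
            .Sum();
    }

    public static void Main()
    {
        Console.WriteLine(TotalWeight(BuildProducts()));
    }
}
```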
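LINQ to SQL
LINQ to SQL was added in .NET 3.5. It maps a database to an object model: at query time it translates the LINQ expression into a SQL query, then maps the returned rows back to objects for convenient access. See the white paper: LINQ to SQL: .NET Language-Integrated Query for Relational Data.
Building the model
The model can be generated automatically in the IDE with the steps below, or with the SqlMetal.exe code generation tool.
Add a LINQ to SQL Classes item
Configure the tables to map
Below is an excerpt of the generated code.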
    [global::System.Data.Linq.Mapping.DatabaseAttribute(Name = "Northwind")]
    public partial class LingToSqlDataContext : System.Data.Linq.DataContext
    {
        public System.Data.Linq.Table<Products> Products
        {
            get { return this.GetTable<Products>(); }
        }

        public System.Data.Linq.Table<Categories> Categories
        {
            get { return this.GetTable<Categories>(); }
        }
    }

    [global::System.Data.Linq.Mapping.TableAttribute(Name = "dbo.Categories")]
    public partial class Categories : INotifyPropertyChanging, INotifyPropertyChanged
    {
        private static PropertyChangingEventArgs emptyChangingEventArgs =
            new PropertyChangingEventArgs(String.Empty);

        private int _CategoryID;
        private string _CategoryName;
        private string _Description;
        private System.Data.Linq.Binary _Picture;
        private EntitySet<Products> _Products;

        [global::System.Data.Linq.Mapping.ColumnAttribute(Storage = "_CategoryID",
            AutoSync = AutoSync.OnInsert, DbType = "Int NOT NULL IDENTITY",
            IsPrimaryKey = true, IsDbGenerated = true)]
        public int CategoryID
        {
            get { return this._CategoryID; }
            set
            {
                if ((this._CategoryID != value))
                {
                    this.OnCategoryIDChanging(value);
                    this.SendPropertyChanging();
                    this._CategoryID = value;
                    this.SendPropertyChanged("CategoryID");
                    this.OnCategoryIDChanged();
                }
            }
        }
    }
    // Requires: using System; using System.Configuration; using System.Data.Linq;
    //           using System.Data.Linq.Mapping; using System.Linq;
    public static void TestLinqToSqlSimple()
    {
        DataContext dc = new DataContext(
            ConfigurationManager.ConnectionStrings["Northwind"].ConnectionString);
        Table<Category> categoryTable = dc.GetTable<Category>();

        // Log the generated SQL to the console.
        dc.Log = Console.Out;

        IQueryable<Category> query =
            from r in categoryTable
            where r.CategoryID > 1
            select r;

        foreach (var item in query)
        {
            Console.WriteLine("{0}, {1}", item.CategoryID, item.CategoryName);
        }

        // Insert a new row, then run the query again to see it.
        Category newCat = new Category();
        newCat.CategoryName = "test";
        categoryTable.InsertOnSubmit(newCat);
        dc.SubmitChanges();

        foreach (var item in query)
        {
            Console.WriteLine("{0}, {1}", item.CategoryID, item.CategoryName);
        }
    }

    // Hand-written mapping class for the Categories table.
    [Table(Name = "Categories")]
    class Category
    {
        private int _CategoryID;
        private string _CategoryName;

        [Column(IsPrimaryKey = true, Storage = "_CategoryID", IsDbGenerated = true)]
        public int CategoryID
        {
            get { return _CategoryID; }
            set { _CategoryID = value; }
        }

        [Column(Storage = "_CategoryName")]
        public string CategoryName
        {
            get { return _CategoryName; }
            set { _CategoryName = value; }
        }
    }
CRUD operations
Querying records
    // Northwnd inherits from System.Data.Linq.DataContext.
    Northwnd nw = new Northwnd(@"northwnd.mdf");
    // or, if you are not using SQL Server Express
    // Northwnd nw = new Northwnd("Database=Northwind;Server=server_name;Integrated Security=SSPI");

    var companyNameQuery =
        from cust in nw.Customers
        where cust.City == "London"
        select cust.CompanyName;

    foreach (var customer in companyNameQuery)
    {
        Console.WriteLine(customer);
    }
Adding records
    // Northwnd inherits from System.Data.Linq.DataContext.
    Northwnd nw = new Northwnd(@"northwnd.mdf");

    Customer cust = new Customer();
    cust.CompanyName = "SomeCompany";
    cust.City = "London";
    cust.CustomerID = "98128";
    cust.PostalCode = "55555";
    cust.Phone = "555-555-5555";
    nw.Customers.InsertOnSubmit(cust);

    // At this point, the new Customer object is added in the object model.
    // In LINQ to SQL, the change is not sent to the database until
    // SubmitChanges is called.
    nw.SubmitChanges();
Updating records
    Northwnd nw = new Northwnd(@"northwnd.mdf");

    var cityNameQuery =
        from cust in nw.Customers
        where cust.City.Contains("London")
        select cust;

    // Change the tracked objects, then send the updates in one SubmitChanges call.
    foreach (var customer in cityNameQuery)
    {
        if (customer.City == "London")
        {
            customer.City = "London - Metro";
        }
    }
    nw.SubmitChanges();
Deleting records
    Northwnd nw = new Northwnd(@"northwnd.mdf");

    var deleteIndivCust =
        from cust in nw.Customers
        where cust.CustomerID == "98128"
        select cust;

    // Fetch the row once; FirstOrDefault returns null when nothing matches.
    var target = deleteIndivCust.FirstOrDefault();
    if (target != null)
    {
        nw.Customers.DeleteOnSubmit(target);
        nw.SubmitChanges();
    }
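Typed DataSet
A strongly typed DataSet is more convenient to call and gives you compile-time type checking. You can define it in an XML schema file and generate the code with xsd.exe, or configure it in the Visual Studio IDE: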
xsd.exe /d /l:CS XSDSchemaFileName.xsd /eld /n:XSDSchema.Namespace
Source: <https://msdn.microsoft.com/en-us/library/wha85tzb(v=vs.110).aspx>
Add a DataSet item
After configuration
Excerpt of the generated code:
    public partial class DataSet1 : global::System.Data.DataSet
    {
        private ProductsDataTable tableProducts;
        private CategoriesDataTable tableCategories;
        private global::System.Data.DataRelation relationFK_Products_Categories;
    }

    public partial class CategoriesDataTable : global::System.Data.TypedTableBase<CategoriesRow>
    {
        private global::System.Data.DataColumn columnCategoryID;
        private global::System.Data.DataColumn columnCategoryName;
        private global::System.Data.DataColumn columnDescription;
        private global::System.Data.DataColumn columnPicture;
    }

    public partial class CategoriesRow : global::System.Data.DataRow
    {
        private CategoriesDataTable tableCategories;

        internal CategoriesRow(global::System.Data.DataRowBuilder rb) : base(rb)
        {
            this.tableCategories = ((CategoriesDataTable)(this.Table));
        }

        public int CategoryID
        {
            get { return ((int)(this[this.tableCategories.CategoryIDColumn])); }
            set { this[this.tableCategories.CategoryIDColumn] = value; }
        }
    }
Retrieving Database Schema Information
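This section shows how to read a database's schema information, that is, its metadata. The connection object's GetSchema method returns information about a named collection such as Tables or Views; calling it with no arguments returns the list of supported collection names. The DataReader's GetSchemaTable method returns information about the columns of a result set.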
    // Requires: using System; using System.Configuration; using System.Data;
    //           using System.Data.SqlClient;
    public static void TestSchema()
    {
        var settings = ConfigurationManager.ConnectionStrings["Northwind"];
        using (SqlConnection conn = new SqlConnection(settings.ConnectionString))
        {
            conn.Open();

            // No arguments: list the schema collections this provider supports.
            var res = conn.GetSchema();
            PrintTable(res);

            // Named collection: list the tables in the database.
            res = conn.GetSchema("Tables");
            PrintTable(res);

            // SchemaOnly returns column metadata without fetching any rows.
            using (SqlCommand command = new SqlCommand("select * from dbo.customers", conn))
            using (SqlDataReader read = command.ExecuteReader(CommandBehavior.SchemaOnly))
            {
                res = read.GetSchemaTable();
                PrintTable(res);
            }
        }
    }

    // Prints the column names of a DataTable followed by every row.
    private static void PrintTable(DataTable dt)
    {
        Console.WriteLine("*** {0} *****************************************", dt.TableName);
        foreach (DataColumn column in dt.Columns)
        {
            Console.Write(column.ColumnName + " ");
        }
        Console.WriteLine();

        foreach (DataRow dataRow in dt.Rows)
        {
            foreach (DataColumn dataColumn in dt.Columns)
            {
                Console.Write(dataRow[dataColumn] + " ");
            }
            Console.WriteLine();
        }
    }
Retrieving Binary Data
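A DataReader normally reads one row at a time, which means it loads each entire row into memory. If a row contains a large object column (a BLOB, possibly several gigabytes), you do not want all of it in memory at once. In that case use SequentialAccess mode, which lets you read the column in chunks with GetBytes.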
    // Assumes that connection is a valid SqlConnection object.
    // Requires: using System.Data; using System.Data.SqlClient; using System.IO;
    SqlCommand command = new SqlCommand(
        "SELECT pub_id, logo FROM pub_info", connection);

    FileStream stream;                      // Output file for one logo.
    BinaryWriter writer;                    // Writes the chunks to the file.
    int bufferSize = 100;                   // Chunk size in bytes.
    byte[] outByte = new byte[bufferSize];  // Buffer filled by GetBytes.
    long retval;                            // Bytes returned by the last GetBytes call.
    long startIndex = 0;                    // Current offset within the BLOB.
    string pubID = "";

    connection.Open();
    SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);

    while (reader.Read())
    {
        // With SequentialAccess, columns must be read in order: pub_id, then logo.
        pubID = reader.GetString(0);

        stream = new FileStream(
            "logo" + pubID + ".bmp", FileMode.OpenOrCreate, FileAccess.Write);
        writer = new BinaryWriter(stream);

        startIndex = 0;
        retval = reader.GetBytes(1, startIndex, outByte, 0, bufferSize);

        // Keep reading while full buffers come back.
        while (retval == bufferSize)
        {
            writer.Write(outByte);
            writer.Flush();
            startIndex += bufferSize;
            retval = reader.GetBytes(1, startIndex, outByte, 0, bufferSize);
        }

        // Write the final partial buffer: exactly retval bytes.
        writer.Write(outByte, 0, (int)retval);
        writer.Flush();

        writer.Close();
        stream.Close();
    }

    reader.Close();
    connection.Close();