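Handling document relationships
One of the design principles RavenDB adheres to is that documents are independent: all the data required to process a document is stored within the document itself. However, this does not mean that there should be no relationships between objects.
In many scenarios we need to define relationships between objects. In doing so we face one main problem: whenever we load the containing entity, we also need data from the referenced entities (unless we are not interested in that data). Storing every referenced entity in full inside the document that references it seems cheap at first, but it is actually expensive in terms of database resources and network traffic.
RavenDB offers three elegant approaches to solve this problem. Each scenario will call for one or more of them. When applied correctly, they can significantly improve performance, reduce network bandwidth usage, and speed up development.
1. Denormalization
The simplest solution is to denormalize the data: in addition to (or instead of) the foreign key, the document also stores the actual data of the referenced entity.
Take the following JSON document as an example: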
    // Order document with id: orders/1234
    {
        "Customer": {
            "Name": "Itamar",
            "Id": "customers/2345"
        },
        "Items": [
            {
                "Product": {
                    "Id": "products/1234",
                    "Name": "Milk",
                    "Cost": 2.3
                },
                "Quantity": 3
            }
        ]
    }
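As you can see, the Order document contains part of the data from the Customer and Product documents, whose full versions are stored elsewhere. Note that we have not copied all the customer properties into the order; we copied only the ones we care about when displaying or processing an order. This approach is called a denormalized reference.
Denormalization avoids cross-document lookups and transfers only the necessary data, but it makes some other scenarios harder to handle. For example, consider the following entity structure:
    public class Order
    {
        public string CustomerId { get; set; }
        public Guid[] SupplierIds { get; set; }
        public Referral Refferal { get; set; }
        public LineItem[] LineItems { get; set; }
        public double TotalPrice { get; set; }
    }
    public class Customer
    {
        public string Id { get; set; }
        public string Name { get; set; }
    }
When we load an Order from the database we need to know the customer's name and address, so we decide to create a denormalized field, Order.Customer, that stores these details directly in the Order object. Obviously, the password and other irrelevant details are not included:
    public class DenormalizedCustomer
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }
There is no direct reference between the Order and the Customer. Instead, the Order holds a DenormalizedCustomer, which contains the Customer information we need when processing Order objects.
But what happens when the customer's address changes? We have to perform a cascading operation to update all of that customer's orders. What if the customer has a lot of orders and changes address frequently? Keeping these details in sync can put a heavy load on the server. And what if another order-processing operation needs a different set of customer properties? DenormalizedCustomer would have to be expanded, until most of the customer record's fields are copied.
Tip
Denormalization is an effective solution for data that rarely changes, or for data that must stay the same even when the underlying referenced data changes.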
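To make the cost of the cascading update concrete, here is a minimal in-memory sketch. It does not use the RavenDB client at all. The Denormalization helper, the PasswordHash field, and the Address field on Customer are assumptions made for illustration, and Order is reduced to the two fields the sketch needs.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string PasswordHash { get; set; } // hypothetical field that must never be copied
}

public class DenormalizedCustomer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public DenormalizedCustomer Customer { get; set; }
}

// Hypothetical helper, not part of any RavenDB API.
public static class Denormalization
{
    // Copy only the fields that order processing needs.
    public static DenormalizedCustomer ToReference(Customer customer)
    {
        return new DenormalizedCustomer
        {
            Id = customer.Id,
            Name = customer.Name,
            Address = customer.Address
        };
    }

    // Push an address change into every order of the customer.
    // Returns how many orders had to be rewritten.
    public static int CascadeAddress(Customer customer, IEnumerable<Order> orders)
    {
        int touched = 0;
        foreach (Order order in orders.Where(o => o.Customer.Id == customer.Id))
        {
            order.Customer.Address = customer.Address;
            touched++;
        }
        return touched;
    }
}
```

The value returned by CascadeAddress grows with the number of orders the customer has, and in a real database each touched order is a document write. That is the load the paragraph above warns about.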
2. Includes
The include feature addresses precisely these shortcomings of denormalization. Instead of containing copies of another object's properties, an object only needs to hold a reference to that object. The referenced object can then be preloaded when the root object is retrieved. The following code shows how to do this:
Session:
    Order order = session
        .Include<Order>(x => x.CustomerId)
        .Load("orders/1234");
    // this will not require querying the server!
    Customer customer = session.Load<Customer>(order.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
In the code above, we retrieve the Order with the key "orders/1234" and at the same time "include" the Customer referenced by the Order.CustomerId property. The second Load() call is resolved entirely on the client (in other words, no second request is sent to the RavenDB server), because the related Customer object has already been retrieved. This is the complete Customer object, not the denormalized version shown earlier.
Loading multiple documents at the same time is also possible.
Session:
    Order[] orders = session
        .Include<Order>(x => x.CustomerId)
        .Load("orders/1234", "orders/4321");
    foreach (Order order in orders)
    {
        // this will not require querying the server!
        Customer customer = session.Load<Customer>(order.CustomerId);
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234", "orders/4321" }, includes: new[] { "CustomerId" });
    List<RavenJObject> orders = result.Results;
    List<RavenJObject> customers = result.Includes;
Include can also be used with queries:
Query:
    IList<Order> orders = session
        .Query<Order, Orders_ByTotalPrice>()
        .Customize(x => x.Include<Order>(o => o.CustomerId))
        .Where(x => x.TotalPrice > 100)
        .ToList();
    foreach (Order order in orders)
    {
        // this will not require querying the server!
        Customer customer = session.Load<Customer>(order.CustomerId);
    }
DocumentQuery:
    IList<Order> orders = session
        .Advanced
        .DocumentQuery<Order, Orders_ByTotalPrice>()
        .Include(x => x.CustomerId)
        .WhereGreaterThan(x => x.TotalPrice, 100)
        .ToList();
    foreach (Order order in orders)
    {
        // this will not require querying the server!
        Customer customer = session.Load<Customer>(order.CustomerId);
    }
Commands:
    // TotalPrice is a double, so the range value uses the Dx prefix
    QueryResult result = store
        .DatabaseCommands
        .Query(
            "Orders/ByTotalPrice",
            new IndexQuery { Query = "TotalPrice_Range:{Dx100 TO NULL}" },
            includes: new[] { "CustomerId" });
    List<RavenJObject> orders = result.Results;
    List<RavenJObject> customers = result.Includes;
Index:
    public class Orders_ByTotalPrice : AbstractIndexCreationTask<Order>
    {
        public Orders_ByTotalPrice()
        {
            Map = orders => from order in orders
                            select new { order.TotalPrice };
        }
    }
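Under the hood, this works because RavenDB returns the result of a load request through two channels. The first is the Results channel, which returns the root object retrieved by the Load() call. The second is the Includes channel, which returns any included documents. On the client side, included documents are not returned from the Load() call; they are added to the session's unit of work (UoW), and subsequent requests to load them are served directly by the session without any additional queries to the server.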
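The two-channel mechanism can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not how the RavenDB client is implemented: FakeServer, FakeSession, and LoadResponse are hypothetical names, and documents are reduced to flat string dictionaries.

```csharp
using System.Collections.Generic;
using Doc = System.Collections.Generic.Dictionary<string, string>;

// Hypothetical response shape: root documents and included documents travel separately.
public class LoadResponse
{
    public Dictionary<string, Doc> Results = new Dictionary<string, Doc>();
    public Dictionary<string, Doc> Includes = new Dictionary<string, Doc>();
}

// Hypothetical in-memory stand-in for the server; it counts incoming requests.
public class FakeServer
{
    private readonly Dictionary<string, Doc> docs;
    public int Requests { get; private set; }

    public FakeServer(Dictionary<string, Doc> docs) { this.docs = docs; }

    public LoadResponse Get(string id, string includeProperty)
    {
        Requests++;
        var response = new LoadResponse();
        if (!docs.TryGetValue(id, out Doc root)) return response;
        response.Results[id] = root;

        // Follow the reference stored in the include property, if any.
        if (includeProperty != null
            && root.TryGetValue(includeProperty, out string referencedId)
            && docs.TryGetValue(referencedId, out Doc referenced))
        {
            response.Includes[referencedId] = referenced;
        }
        return response;
    }
}

// Hypothetical session: everything that arrives on either channel is tracked locally.
public class FakeSession
{
    private readonly FakeServer server;
    private readonly Dictionary<string, Doc> unitOfWork = new Dictionary<string, Doc>();

    public FakeSession(FakeServer server) { this.server = server; }

    public Doc Load(string id, string includeProperty = null)
    {
        // Already tracked documents are served without a server round trip.
        if (unitOfWork.TryGetValue(id, out Doc tracked)) return tracked;

        LoadResponse response = server.Get(id, includeProperty);
        foreach (var pair in response.Results) unitOfWork[pair.Key] = pair.Value;
        foreach (var pair in response.Includes) unitOfWork[pair.Key] = pair.Value;

        return unitOfWork.TryGetValue(id, out Doc loaded) ? loaded : null;
    }
}
```

Calling Load("orders/1234", "CustomerId") and then Load on the returned CustomerId value leaves FakeServer.Requests at 1, which mirrors the "this will not require querying the server" comments in the examples above.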
One-to-many includes
Includes can be used with one-to-many relationships. In the classes above, Order has a SupplierIds property that holds a collection of references to Supplier documents. The following code causes those suppliers to be preloaded.
Session:
    Order order = session
        .Include<Order>(x => x.SupplierIds)
        .Load("orders/1234");
    foreach (Guid supplierId in order.SupplierIds)
    {
        // this will not require querying the server!
        Supplier supplier = session.Load<Supplier>(supplierId);
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "SupplierIds" });
    RavenJObject order = result.Results[0];
    RavenJObject supplier = result.Includes[0];
Again, the Load() calls inside the foreach loop do not need to call the server, because the Supplier objects have already been loaded into the session cache.
Loading multiple documents works as well:
Session:
    Order[] orders = session
        .Include<Order>(x => x.SupplierIds)
        .Load("orders/1234", "orders/4321");
    foreach (Order order in orders)
    {
        foreach (Guid supplierId in order.SupplierIds)
        {
            // this will not require querying the server!
            Supplier supplier = session.Load<Supplier>(supplierId);
        }
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234", "orders/4321" }, includes: new[] { "SupplierIds" });
    List<RavenJObject> orders = result.Results;
    List<RavenJObject> suppliers = result.Includes;
Secondary level includes
Includes are not limited to the values of top-level document properties; they can also load values from second-level properties. In the classes above, Order contains a Refferal property of the following type:
    public class Referral
    {
        public string CustomerId { get; set; }
        public double CommissionPercentage { get; set; }
    }
This class contains the id of a Customer. The following code includes the document referenced by that second-level property:
Session:
    Order order = session
        .Include<Order>(x => x.Refferal.CustomerId)
        .Load("orders/1234");
    // this will not require querying the server!
    Customer customer = session.Load<Customer>(order.Refferal.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "Refferal.CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
Alternatively, you can provide a string-based path to the property:
Session:
    Order order = session
        .Include("Refferal.CustomerId")
        .Load<Order>("orders/1234");
    // this will not require querying the server!
    Customer customer = session.Load<Customer>(order.Refferal.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "Refferal.CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
Secondary level includes also work for second-level properties inside collections. The Order.LineItems property is a collection of LineItem objects, each of which holds a reference to a Product:
    public class LineItem
    {
        public Guid ProductId { get; set; }
        public string Name { get; set; }
        public int Quantity { get; set; }
    }
The Product documents can be included using the following syntax:
Session:
    Order order = session
        .Include<Order>(x => x.LineItems.Select(li => li.ProductId))
        .Load("orders/1234");
    foreach (LineItem lineItem in order.LineItems)
    {
        // this will not require querying the server!
        Product product = session.Load<Product>(lineItem.ProductId);
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "LineItems.,ProductId" });
    RavenJObject order = result.Results[0];
    RavenJObject product = result.Includes[0];
This form is the one to use when a single include has to load multiple documents. The Select() inside the Include method tells RavenDB which second-level property holds the reference.
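Conventions
When using string-based property paths, as in the following:
Session:
    Order order = session
        .Include("Refferal.CustomerId")
        .Load<Order>("orders/1234");
    // this will not require querying the server!
    Customer customer = session.Load<Customer>(order.Refferal.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "Refferal.CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
you must remember that the provided string path has to follow these rules:
A dot separates properties. For example, "Refferal.CustomerId" in the example above means that Order contains a Refferal property, which in turn contains a property named CustomerId.
A comma indicates that a property is a collection type, such as a List. If our Order has a list of LineItems, each with a ProductId property, we can create the following string path: "LineItems.,ProductId".
A prefix identifies the id prefix of document identifiers that are not strings. For example, if the CustomerId property is an integer, we need to add the customers/ prefix to the path "Refferal.CustomerId", so the final path is "Refferal.CustomerId(customers/)". In the collection example, if ProductId is not a string, the path should be "LineItems.,ProductId(products/)".
Note
For string identifier properties the prefix is not required, because the value already contains it.
Learning the string path rules is helpful when querying the database through the HTTP API.
    curl -X GET "http://localhost:8080/databases/Northwind/queries/?include=LineItems.,ProductId(products/)&id=orders/1"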
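The three path rules can be exercised with a small resolver that walks an in-memory object graph. This is one simplified reading of the rules above, not RavenDB's own parser. IncludePath and Resolve are hypothetical names, and documents are modelled as nested Dictionary<string, object> and List<object> values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any RavenDB API.
public static class IncludePath
{
    // Returns the document ids that a path such as
    // "LineItems.,ProductId(products/)" refers to inside the given document.
    public static List<string> Resolve(Dictionary<string, object> doc, string path)
    {
        // Rule 3: an optional "(prefix)" suffix carries the id prefix for non-string ids.
        string prefix = "";
        int open = path.IndexOf('(');
        if (open >= 0 && path.EndsWith(")"))
        {
            prefix = path.Substring(open + 1, path.Length - open - 2);
            path = path.Substring(0, open);
        }

        var current = new List<object> { doc };
        // Rule 2: a comma marks the step into the items of a collection.
        foreach (string segment in path.Split(','))
        {
            // Rule 1: dots separate nested property names.
            string[] names = segment.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string name in names)
            {
                current = current.Select(node => Get(node, name))
                                 .Where(value => value != null)
                                 .ToList();
            }
            // Replace each collection value with its items.
            current = current.SelectMany(Flatten).ToList();
        }
        return current.Select(value => prefix + value).ToList();
    }

    private static object Get(object node, string name)
    {
        var map = node as Dictionary<string, object>;
        object value;
        return map != null && map.TryGetValue(name, out value) ? value : null;
    }

    private static IEnumerable<object> Flatten(object value)
    {
        var list = value as List<object>;
        return list ?? new List<object> { value };
    }
}
```

For a document whose LineItems hold integer ProductId values 1 and 2, resolving "LineItems.,ProductId(products/)" yields products/1 and products/2, while a string-valued "Refferal.CustomerId" is returned unchanged because it already carries its prefix.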
Value-type identifiers
The Include examples above assume that the id property used to resolve a reference is a string containing the full identifier of the referenced document (for example, a CustomerId value of "customers/5678"). Includes can also work with value-type identifiers. Given the following entities:
    public class Order2
    {
        public int CustomerId { get; set; }
        public Guid[] SupplierIds { get; set; }
        public Referral2 Refferal { get; set; }
        public LineItem[] LineItems { get; set; }
        public double TotalPrice { get; set; }
    }
    public class Customer2
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
    public class Referral2
    {
        public int CustomerId { get; set; }
        public double CommissionPercentage { get; set; }
    }
the examples above can be rewritten as:
Session:
    Order2 order = session
        .Include<Order2, Customer2>(x => x.CustomerId)
        .Load("order2s/1234");
    // this will not require querying the server!
    Customer2 customer = session.Load<Customer2>(order.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "order2s/1234" }, includes: new[] { "CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
Query:
    IList<Order2> orders = session
        .Query<Order2, Order2s_ByTotalPrice>()
        .Customize(x => x.Include<Order2, Customer2>(o => o.CustomerId))
        .Where(x => x.TotalPrice > 100)
        .ToList();
    foreach (Order2 order in orders)
    {
        // this will not require querying the server!
        Customer2 customer = session.Load<Customer2>(order.CustomerId);
    }
DocumentQuery:
    IList<Order2> orders = session
        .Advanced
        .DocumentQuery<Order2, Order2s_ByTotalPrice>()
        .Include("CustomerId")
        .WhereGreaterThan(x => x.TotalPrice, 100)
        .ToList();
    foreach (Order2 order in orders)
    {
        // this will not require querying the server!
        Customer2 customer = session.Load<Customer2>(order.CustomerId);
    }
Commands:
    // TotalPrice is a double, so the range value uses the Dx prefix
    QueryResult result = store
        .DatabaseCommands
        .Query(
            "Order2s/ByTotalPrice",
            new IndexQuery { Query = "TotalPrice_Range:{Dx100 TO NULL}" },
            includes: new[] { "CustomerId" });
    List<RavenJObject> orders = result.Results;
    List<RavenJObject> customers = result.Includes;
Index:
    public class Order2s_ByTotalPrice : AbstractIndexCreationTask<Order2>
    {
        public Order2s_ByTotalPrice()
        {
            Map = orders => from order in orders
                            select new { order.TotalPrice };
        }
    }
Session:
    Order2 order = session
        .Include<Order2, Supplier>(x => x.SupplierIds)
        .Load("order2s/1234");
    foreach (Guid supplierId in order.SupplierIds)
    {
        // this will not require querying the server!
        Supplier supplier = session.Load<Supplier>(supplierId);
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "order2s/1234" }, includes: new[] { "SupplierIds" });
    RavenJObject order = result.Results[0];
    List<RavenJObject> suppliers = result.Includes;
Session:
    Order2 order = session
        .Include<Order2, Customer2>(x => x.Refferal.CustomerId)
        .Load("order2s/1234");
    // this will not require querying the server!
    Customer2 customer = session.Load<Customer2>(order.Refferal.CustomerId);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "order2s/1234" }, includes: new[] { "Refferal.CustomerId" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
Session:
    Order2 order = session
        .Include<Order2, Product>(x => x.LineItems.Select(li => li.ProductId))
        .Load("order2s/1234");
    foreach (LineItem lineItem in order.LineItems)
    {
        // this will not require querying the server!
        Product product = session.Load<Product>(lineItem.ProductId);
    }
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "order2s/1234" }, includes: new[] { "LineItems.,ProductId" });
    RavenJObject order = result.Results[0];
    List<RavenJObject> products = result.Includes;
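The second generic parameter of Include<T, TInclude> indicates the type of the document that the reference property points to. RavenDB combines the type name of the referenced document with the value of the reference property to obtain the full identifier of the referenced document. For example, in the first example, if the Order2.CustomerId property has a value of 56, the client will use customer2s/56 as the key to load the referenced document from the database. The session.Load<Customer2>() method receives 56 as its argument and then uses customer2s/56 as the key to look up and load the document from the session cache.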
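The key construction described above can be sketched as follows. This is a deliberately naive assumption for illustration: it lower-cases the type name and appends "s", which happens to reproduce customer2s/56, whereas the real client derives the prefix from its own configurable conventions and pluralization rules. IdConvention and ToDocumentKey are hypothetical names.

```csharp
using System;

public class Customer2
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, not part of any RavenDB API.
public static class IdConvention
{
    // Simplified assumption: key prefix = lower-cased type name + "s".
    public static string ToDocumentKey(Type type, object value)
    {
        string prefix = type.Name.ToLowerInvariant() + "s";
        return prefix + "/" + value;
    }
}
```

With this sketch, ToDocumentKey(typeof(Customer2), 56) produces "customer2s/56", the same key the text says is used both for the server request and for the session cache lookup.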
Dictionary includes
Dictionary keys and values can also be used when including related documents. Consider the following example:
    public class Person
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public Dictionary<string, string> Attributes { get; set; }
    }
    session.Store(
        new Person
        {
            Id = "people/1",
            Name = "John Doe",
            Attributes = new Dictionary<string, string>
            {
                { "Mother", "people/2" },
                { "Father", "people/3" }
            }
        });
    session.Store(
        new Person
        {
            Id = "people/2",
            Name = "Helen Doe",
            Attributes = new Dictionary<string, string>()
        });
    session.Store(
        new Person
        {
            Id = "people/3",
            Name = "George Doe",
            Attributes = new Dictionary<string, string>()
        });
We can include all the documents referenced by the dictionary values:
Session:
    var person = session
        .Include<Person>(x => x.Attributes.Values)
        .Load("people/1");
    var mother = session.Load<Person>(person.Attributes["Mother"]);
    var father = session.Load<Person>(person.Attributes["Father"]);
    Assert.Equal(1, session.Advanced.NumberOfRequests);
Commands:
    var result = store
        .DatabaseCommands
        .Get(new[] { "people/1" }, new[] { "Attributes.$Values" });
    var include1 = result.Includes[0];
    var include2 = result.Includes[1];
The documents referenced by the dictionary keys can of course be included too:
Session:
    var person = session
        .Include<Person>(x => x.Attributes.Keys)
        .Load("people/1");
Commands:
    var result = store
        .DatabaseCommands
        .Get(new[] { "people/1" }, new[] { "Attributes.$Keys" });
Complex types
If the values in the dictionary are complex types, for example:
    public class PersonWithAttribute
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public Dictionary<string, Attribute> Attributes { get; set; }
    }
    public class Attribute
    {
        public string Ref { get; set; }
    }
    session.Store(
        new PersonWithAttribute
        {
            Id = "people/1",
            Name = "John Doe",
            Attributes = new Dictionary<string, Attribute>
            {
                { "Mother", new Attribute { Ref = "people/2" } },
                { "Father", new Attribute { Ref = "people/3" } }
            }
        });
    session.Store(
        new Person
        {
            Id = "people/2",
            Name = "Helen Doe",
            Attributes = new Dictionary<string, string>()
        });
    session.Store(
        new Person
        {
            Id = "people/3",
            Name = "George Doe",
            Attributes = new Dictionary<string, string>()
        });
the include can be performed on a specific property:
Session:
    var person = session
        .Include<PersonWithAttribute>(x => x.Attributes.Values.Select(v => v.Ref))
        .Load("people/1");
    var mother = session.Load<Person>(person.Attributes["Mother"].Ref);
    var father = session.Load<Person>(person.Attributes["Father"].Ref);
    Assert.Equal(1, session.Advanced.NumberOfRequests);
Commands:
    var result = store
        .DatabaseCommands
        .Get(new[] { "people/1" }, new[] { "Attributes.$Values,Ref" });
    var include1 = result.Includes[0];
    var include2 = result.Includes[1];
3. Combining both approaches
The two techniques described above can be combined. The new order class below uses the DenormalizedCustomer class from earlier:
    public class Order3
    {
        public DenormalizedCustomer Customer { get; set; }
        public string[] SupplierIds { get; set; }
        public Referral Refferal { get; set; }
        public LineItem[] LineItems { get; set; }
        public double TotalPrice { get; set; }
    }
We get the benefits of denormalization: a lightweight Order that loads quickly, along with the fairly static Customer details needed for order processing. When necessary, we can still load the full Customer object easily and efficiently:
Session:
    Order3 order = session
        .Include<Order3, Customer>(x => x.Customer.Id)
        .Load("orders/1234");
    // this will not require querying the server!
    Customer customer = session.Load<Customer>(order.Customer.Id);
Commands:
    MultiLoadResult result = store
        .DatabaseCommands
        .Get(ids: new[] { "orders/1234" }, includes: new[] { "Customer.Id" });
    RavenJObject order = result.Results[0];
    RavenJObject customer = result.Includes[0];
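Combining denormalization and includes also works with lists of denormalized objects.
Includes can also be used with Live Projection queries. Includes are evaluated after TransformResults has been evaluated. This makes tertiary includes possible, that is, loading documents that are referenced by documents that are themselves referenced by the root document.
Although RavenDB supports tertiary includes, you should re-evaluate your document model before using this feature. Needing tertiary includes suggests that your document design has gone down a "relational" route.
Summary
There are no strict rules for when to use which approach, but in general you should give it careful thought and evaluate the implications of each one.
For example, in an e-commerce application it is best to denormalize the product name and price into the order line item, because you want the customer to see in their order history the same price and product name they saw when ordering. The customer name and address, however, are better referenced than denormalized into the order entity.
In most situations where denormalization is not an option, includes are usually a workable solution.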