Distributed Databases

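1. The CAP theorem for distributed databases

Consistency: data is updated consistently; every change is applied to all copies in sync, so all nodes see the same data.
Availability: requests get a response, with good response times.
Partition tolerance: reliability. A single piece of data is stored on 3 nodes; if 1 node fails or becomes unreachable, the other 2 nodes can still work. This is implemented via replication or duplication. In other words, there is no single point of failure.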
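
The three-node replication example above can be sketched in a few lines. This is only an illustrative toy, not how any particular database implements replication: the class name `ReplicatedStore`, the node names, and the "write to every live replica" rule are all assumptions made for the example.

```python
class ReplicatedStore:
    """Toy key-value store that keeps a full copy of the data on every node."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}  # one dict per node
        self.alive = set(node_names)

    def write(self, key, value):
        # Assumption: a write is copied to every node that is currently alive.
        for name in self.alive:
            self.replicas[name][key] = value

    def fail(self, name):
        # Simulate a node crash or a node that can no longer be reached.
        self.alive.discard(name)

    def read(self, key):
        # Any surviving replica that holds the key can serve the read.
        for name in sorted(self.alive):
            if key in self.replicas[name]:
                return self.replicas[name][key]
        raise KeyError(key)


store = ReplicatedStore(["n1", "n2", "n3"])
store.write("item", "v1")
store.fail("n1")            # 1 of the 3 nodes is lost
print(store.read("item"))   # still served by n2 or n3
```

With a single node the same failure would make the data unreachable, which is the single point of failure that replication removes.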

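Theorem: a distributed system can satisfy at most two of these three properties at the same time; it cannot have all three.
Advice: architects should not waste effort trying to design a perfect distributed system that satisfies all three, and should make trade-offs instead.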

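2. Why is partition tolerance mandatory?

Compare the availability of a serial system with that of a parallel system (partition tolerance).
For application servers, parallel means a cluster of identical application servers, usually with a load balancer in front of it. At eBay such a cluster is called a pool.
For database servers, parallel means several hot-standby database servers (replication), usually at least two (master and failover server).
A large system typically has more than 30 stages in series. If every stage is 99% reliable, the whole system is about 74% reliable (0.99^30); if every stage is 98% reliable, the whole system is only about 54.5% reliable (0.98^30).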
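
The serial figures can be checked with a short calculation. This is a minimal sketch that assumes the stages fail independently of each other; the function name `serial_availability` is made up for the example.

```python
def serial_availability(per_stage: float, stages: int) -> float:
    """Probability that every stage of a serial chain works.

    Assumes the stages fail independently, so the probabilities multiply.
    """
    return per_stage ** stages


for p in (0.99, 0.98):
    total = serial_availability(p, 30)
    print(f"{p:.0%} per stage, 30 stages in series -> {total:.1%} overall")
```

Each extra stage multiplies the total by a number below 1, so the overall figure can only go down as the chain gets longer.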

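For a parallel system, the whole system fails only if every node fails, so the reliability follows this formula:

P(system works) = 1 – P(individual node failing)^{number of nodes}

For example, if each module is 70% reliable, 3 modules in parallel give an overall reliability of 1 - 0.3^3 = 97.3%, and 4 in parallel give 1 - 0.3^4 = 99.19%. I suspect this is the math behind why load balancing over redundant servers is dependable.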
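
The same arithmetic as a small script, which also counts how many redundant nodes a target such as five nines would need. It is a sketch under the assumption that nodes fail independently; `parallel_availability` and `nodes_for_target` are hypothetical helper names.

```python
def parallel_availability(per_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` redundant nodes works.

    Assumes independent failures: the system is down only if all nodes are down.
    """
    return 1 - (1 - per_node) ** nodes


def nodes_for_target(per_node: float, target: float) -> int:
    """Smallest number of redundant nodes whose combined availability reaches target."""
    if not (0 < per_node < 1 and 0 < target < 1):
        raise ValueError("per_node and target must be strictly between 0 and 1")
    nodes = 1
    while parallel_availability(per_node, nodes) < target:
        nodes += 1
    return nodes


for n in (3, 4):
    print(f"{n} nodes at 70% each -> {parallel_availability(0.7, n):.2%}")

print("nodes needed for five nines:", nodes_for_target(0.7, 0.99999))
```

The failure probability shrinks exponentially with the number of nodes, which is why even unreliable nodes can add up to a very reliable system.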

Five-nines or six-nines QoS can only come from exponential thinking; linear thinking is a death sentence.
Reference: http://blog.sina.com.cn/s/blog_5459f60d01016ntb.html

3. Given that partition tolerance is mandatory, why can we choose only one of consistency and availability?
You cannot, however, choose both consistency and availability in a distributed system.

As a thought experiment, imagine a distributed system which keeps track of a single piece of data using three nodes—A, B, and C—and which claims to be both consistent and available in the face of network partitions. Misfortune strikes, and that system is partitioned into two components: {A,B} and {C}. In this state, a write request arrives at node C to update the single piece of data.

That node only has two options:

Accept the write, knowing that neither A nor B will know about this new data until the partition heals.
Refuse the write, knowing that the client might not be able to contact A or B until the partition heals.

You either choose availability (Door #1) or you choose consistency (Door #2). You cannot choose both.
Reference: http://codahale.com/you-cant-sacrifice-partition-tolerance/
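
The thought experiment can be replayed as a toy simulation. It is a sketch only: the `Node` class, the `policy` flag, and the rule that a consistency-first node refuses a write whenever it cannot reach every other replica are assumptions for illustration, not part of the referenced article.

```python
class Node:
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy     # "AP": always accept; "CP": refuse when partitioned
        self.value = "v0"
        self.peers = []          # every other replica in the cluster
        self.reachable = set()   # names of peers this node can currently contact

    def write(self, value):
        partitioned = any(p.name not in self.reachable for p in self.peers)
        if partitioned and self.policy == "CP":
            return False         # Door #2: stay consistent, give up availability
        # Door #1: accept the write; unreachable peers keep the old value.
        self.value = value
        for p in self.peers:
            if p.name in self.reachable:
                p.value = value
        return True


def build_partitioned_cluster(policy):
    a, b, c = (Node(n, policy) for n in "ABC")
    for node in (a, b, c):
        node.peers = [p for p in (a, b, c) if p is not node]
    # The network is split into {A, B} and {C}.
    a.reachable, b.reachable, c.reachable = {"B"}, {"A"}, set()
    return a, b, c


for policy in ("AP", "CP"):
    a, b, c = build_partitioned_cluster(policy)
    accepted = c.write("v1")
    print(policy, "accepted:", accepted, "| A, B, C =", a.value, b.value, c.value)
```

In the "AP" run the write succeeds but A and B still hold the old value; in the "CP" run all three nodes agree but the client's write is rejected.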

4. Advantages and disadvantages of distributed databases

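Advantages:
- Higher reliability and availability: when one site fails, the system can operate on an identical replica at another site, so a single failure does not bring down the whole system.
- Better performance: the system can pick the replica closest to the user, which reduces communication cost and improves overall performance.
- Easy to scale: if the server software supports transparent horizontal scaling, more servers can be added to spread the data and share the processing load further. (For more on horizontal scaling see http://xuezhongfeicn.blog.163.com/blog/static/22460141201201153456711/; eBay's database storage is horizontally scaled.)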
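
One common way to spread data across servers is to route each key by a hash. The sketch below is only an assumption about what horizontal scaling can look like; it does not describe eBay's actual scheme, and the server names and the `shard_for` helper are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


servers = ["db0", "db1", "db2", "db3"]  # hypothetical database servers

for user_id in ("user-1001", "user-1002", "user-1003"):
    print(user_id, "->", servers[shard_for(user_id, len(servers))])
```

Plain modulo routing like this moves most keys to a different server when the number of servers changes, so real systems often add a lookup table or consistent hashing on top.
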
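Disadvantages:
- Transaction management costs more than in a centralized database, and strong consistency is hard to guarantee.
- High system overhead, most of it spent on communication.
- Complex access structures: techniques that accessed data efficiently in a centralized system ...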
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.

Original article: https://www.cnblogs.com/significantfrank/p/4875845.html