Why one grouping of attributes into a relation schema may be better than another
two levels we can discuss the goodness of relation schemas.
logical(conceptual) level - how users interpret the relation schemas and the meaning of their attributes.
implementation (physical storage) level - how the tuples in a base relation are stored and updated.
Top-down design approach and as such is more appropriate when performing design of database by analysis and decomposition of sets of attributes
Relational database design ultimately produces a set of relations. The implicit goals of the design activity are information preservation and minimum redundancy.
Minimizing redundancy = minimize redundant storage of same information + reduce the need for multiple updates to maintain consistency across multiple copies of the same information.
functional dependency: a formal constraint among attributes that is the main tool for formally measuring the appropriateness of attribute groupings into relation schemas.
normalization 与 functional dependency的关系:
the process of normalization using functional dependencies.
1NF: 列的原子性,即列不能够再分成其他几列 联系电话包括: 家庭电话 和 公司电话, 需要拆分为两列在表里进行储存
2NF: 完全依赖主键 OrderID 和 ProductID 也就是说, 2NF在1NF的基础之上,消除了非主属性对于码的部分函数依赖。
3NF: 任何非主属性不依赖于其它非主属性。第三范式(3NF)是第二范式(2NF)的一个子集,即满足第三范式(3NF)必须满足第二范式(2NF)。
Update Anomaly(更新异常): 修改一个列会 影响到其它的列 主要包括 insertion anomalies, deletion anomalies, and modification anomalies
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly: change the name of project number P1 from "Billing" to "Customer-Accounting" may cause this update to be made for all 100 employees working on project P1
Insert Anomaly(插入异常): Can not insert a project unless an employee is assigned to it. 混在一起, 必须要处理不同的信息
Delete Anomaly(删除异常): When a project is deleted, it will result in deleting all the employees who work on the project 会有多个行存一条Project信息
改变Entity 的状态: 不应该受到影响
Modification Anomaly(修改异常):
Guidelines:
非异常的情况: Design a schema that does not suffer from the insertion, deletion and update anomalies
Relations should be designed such that their tuples will have as few NULL values as possible
The relations should be designed to satisfy the losslee join condition:
Candidate Key: 表中的一个属性或属性组,若除K之外的所有属性都完全依赖于K
- 如何判断一个Relation 是否符合2NF?
- 数据表中所有的candidate key
- 根据candidate key, 找出主属性
- 找到所有的非主属性
- 查看是否存在非主属性对码的部分函数依赖
完全函数依赖: 在一张表中,如果有X -> Y, 那么对于X的任何一个真子集, X' -> Y都不成立, 则Y对于X完全函数依赖。 也就是说X当中少了任何一个attribute都不能够唯一确定Y
部分函数依赖: 是完全函数依赖的取反, X当中的部分的属性就可以确定Y
传递函数依赖: Z函数依赖于Y, 且Y函数依赖于X,那么我们就称Z传递函数依赖于X
NULL值用在什么地方?
- not applicable or invalid
- Attribute value unknown
- Value known to exist, but unavailable
Closure of Attributes
Given a relation, FD, a set of attributes A, find all B such that A -> B