Database Normalization

Database normalization is a technique for organizing the data in a database. It is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics such as insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form and removes duplicated data from the relation tables.

Normalization is used for two main purposes:

Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.

Problems Without Normalization

If a table is not properly normalized and has data redundancy, it will not only take up extra storage space but will also make the database difficult to handle and update without risking data loss. Insertion, update and deletion anomalies are very frequent if a database is not normalized. To understand these anomalies, let us take the example of a Student table.

In the Student table, we have data for 4 Computer Science students. As we can see, the data in the fields branch, hod (Head of Department) and office_tel is repeated for students who are in the same branch of the college; this is data redundancy.
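The redundancy described here can be made concrete with a small sketch using Python's built-in sqlite3 module. The column names follow the text (a rollno and name column are assumed), while the student names and the phone number are hypothetical placeholders, not the values from the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "branch TEXT, hod TEXT, office_tel TEXT)"
)
# Hypothetical rows: names and phone number are placeholders.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Student A", "CSE", "Mr. X", "00000"),
        (2, "Student B", "CSE", "Mr. X", "00000"),
        (3, "Student C", "CSE", "Mr. X", "00000"),
        (4, "Student D", "CSE", "Mr. X", "00000"),
    ],
)

# Count how many rows exist versus how many distinct branch facts they carry.
total_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
distinct_branch_facts = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT branch, hod, office_tel FROM student)"
).fetchone()[0]
print(f"{total_rows} rows store {distinct_branch_facts} distinct branch fact(s)")
```

Four rows repeat a single (branch, hod, office_tel) combination, which is exactly the repetition that normalization aims to remove.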

Insertion Anomaly

Suppose a new student is admitted. Until the student opts for a branch, the student's data cannot be inserted, or else we have to set the branch information to NULL.

Also, if we have to insert data for 100 students of the same branch, the branch information will be repeated for all 100 students.

These scenarios are insertion anomalies.
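Both outcomes of the first scenario can be sketched with sqlite3. The two table variants below (student_strict and student_nullable) and the row for "Student E" are assumptions made for illustration: one schema forbids NULL branch data and so rejects the new student, the other accepts the student but stores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Variant 1 (assumed): branch details are mandatory.
conn.execute(
    "CREATE TABLE student_strict (rollno INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, branch TEXT NOT NULL, hod TEXT NOT NULL, "
    "office_tel TEXT NOT NULL)"
)
# Variant 2 (assumed): branch details may be left empty.
conn.execute(
    "CREATE TABLE student_nullable (rollno INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, branch TEXT, hod TEXT, office_tel TEXT)"
)

# A student who has not chosen a branch cannot be stored at all...
try:
    conn.execute(
        "INSERT INTO student_strict (rollno, name) VALUES (5, 'Student E')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# ...or can only be stored with NULL branch information.
conn.execute(
    "INSERT INTO student_nullable (rollno, name) VALUES (5, 'Student E')"
)
null_branch_row = conn.execute(
    "SELECT branch, hod, office_tel FROM student_nullable WHERE rollno = 5"
).fetchone()
print(insert_rejected, null_branch_row)
```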

Update Anomaly

What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case all the student records will have to be updated, and if we miss any record by mistake, the data becomes inconsistent. This is an update anomaly.
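A minimal sketch of such a missed update, again with sqlite3. The replacement name "Mr. Y", the student names and the phone number are hypothetical; the faulty WHERE clause stands in for any mistake that skips a record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "branch TEXT, hod TEXT, office_tel TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Student A", "CSE", "Mr. X", "00000"),
        (2, "Student B", "CSE", "Mr. X", "00000"),
        (3, "Student C", "CSE", "Mr. X", "00000"),
        (4, "Student D", "CSE", "Mr. X", "00000"),
    ],
)

# The HOD changes, but the update accidentally skips rollno 4.
conn.execute("UPDATE student SET hod = 'Mr. Y' WHERE rollno <= 3")

# The same branch now reports two different heads of department.
hods = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT hod FROM student WHERE branch = 'CSE' ORDER BY hod"
    )
]
print(hods)
```

If the HOD were stored once in a separate branch table, the change would be a single-row update and this inconsistency could not arise.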

Deletion Anomaly

In our Student table, two different kinds of information are kept together: student information and branch information. Hence, at the end of the academic year, if the student records are deleted, we also lose the branch information. This is a deletion anomaly.
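The following sketch contrasts the single combined table with one possible decomposition into separate student and branch tables. The table names student_flat, student and branch, and all row values other than "Mr. X", are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Combined design: student and branch facts in one table.
    CREATE TABLE student_flat (rollno INTEGER PRIMARY KEY, name TEXT,
                               branch TEXT, hod TEXT, office_tel TEXT);
    INSERT INTO student_flat VALUES (1, 'Student A', 'CSE', 'Mr. X', '00000');
    INSERT INTO student_flat VALUES (2, 'Student B', 'CSE', 'Mr. X', '00000');

    -- Decomposed design: branch facts live in their own table.
    CREATE TABLE branch (branch TEXT PRIMARY KEY, hod TEXT, office_tel TEXT);
    CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT,
                          branch TEXT REFERENCES branch(branch));
    INSERT INTO branch VALUES ('CSE', 'Mr. X', '00000');
    INSERT INTO student VALUES (1, 'Student A', 'CSE');
    INSERT INTO student VALUES (2, 'Student B', 'CSE');
    """
)

# End of the academic year: remove all student records in both designs.
conn.execute("DELETE FROM student_flat")
conn.execute("DELETE FROM student")

# The combined table has lost every trace of the branch...
flat_branches_left = conn.execute(
    "SELECT COUNT(DISTINCT branch) FROM student_flat"
).fetchone()[0]
# ...while the separate branch table still holds it.
branches_left = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
print(flat_branches_left, branches_left)
```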

Normalization Rules

Normalization rules are divided into the following normal forms:

First Normal Form
Second Normal Form
Third Normal Form
BCNF

First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow these 4 rules:

It should only have single-valued (atomic) attributes/columns.
Values stored in a column should be of the same domain.
All the columns in a table should have unique names.
The order in which data is stored does not matter.
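As a rough illustration of the first rule, the sketch below takes hypothetical rows whose subjects column packs several values into one cell and rewrites them so each cell holds a single value. The column names, the sample data and the helper to_first_normal_form are all assumptions for this example.

```python
# Hypothetical un-normalized rows: 'subjects' holds several values per cell.
unnormalized = [
    {"rollno": 1, "name": "Student A", "subjects": "OS, DBMS"},
    {"rollno": 2, "name": "Student B", "subjects": "Java"},
]


def to_first_normal_form(rows):
    """Emit one row per (student, subject) so every cell is atomic."""
    result = []
    for row in rows:
        for subject in row["subjects"].split(","):
            result.append(
                {
                    "rollno": row["rollno"],
                    "name": row["name"],
                    "subject": subject.strip(),
                }
            )
    return result


atomic_rows = to_first_normal_form(unnormalized)
for row in atomic_rows:
    print(row)
```

The result repeats the student name across rows, which is acceptable for 1NF; removing that repetition is the job of the higher normal forms.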

In the next tutorial, we will discuss the First Normal Form in detail.

Second Normal Form (2NF)

For a table to be in the Second Normal Form,

It should be in the First Normal Form.
And, it should not have any partial dependency.
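A partial dependency exists when a non-key column depends on only part of a composite primary key. The sketch below assumes a hypothetical scores table keyed by (student_id, subject_id) and uses a small helper, fd_holds, to test whether a dependency is consistent with the sample rows. Sample data can only refute a dependency, never prove it in general, so treat this as an illustration rather than a design tool.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table with composite key (student_id, subject_id).
scores = [
    {"student_id": 1, "subject_id": 10, "marks": 70, "teacher": "Teacher P"},
    {"student_id": 1, "subject_id": 20, "marks": 85, "teacher": "Teacher Q"},
    {"student_id": 2, "subject_id": 10, "marks": 60, "teacher": "Teacher P"},
]

# teacher is determined by subject_id alone: a partial dependency.
teacher_partial = fd_holds(scores, ["subject_id"], ["teacher"])
# marks needs the whole key: neither half of the key determines it.
marks_by_subject = fd_holds(scores, ["subject_id"], ["marks"])
marks_by_student = fd_holds(scores, ["student_id"], ["marks"])
print(teacher_partial, marks_by_subject, marks_by_student)
```

Moving teacher into a separate table keyed by subject_id would remove the partial dependency.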

To understand what partial dependency is and how to normalize a table to the Second Normal Form, see the Second Normal Form tutorial.

Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,

It is in the Second Normal Form.
And, it doesn't have any transitive dependency.
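A transitive dependency arises when a non-key column depends on the key only through another non-key column. In the Student example, rollno determines branch, and branch determines hod and office_tel. The sketch below, using hypothetical rows (only "Mr. X" comes from the text), splits the table along that chain and checks that joining the pieces gives back the original rows.

```python
# Hypothetical flat rows: (rollno, name, branch, hod, office_tel).
students_flat = [
    (1, "Student A", "CSE", "Mr. X", "00000"),
    (2, "Student B", "CSE", "Mr. X", "00000"),
    (3, "Student C", "ECE", "Mr. Z", "11111"),
]

# rollno -> branch and branch -> (hod, office_tel), so hod and office_tel
# depend on rollno only transitively. Split along that chain.
student = [(rollno, name, branch) for rollno, name, branch, _, _ in students_flat]
branch = {b: (hod, tel) for _, _, b, hod, tel in students_flat}

# Joining the two pieces on branch reconstructs the original rows.
rejoined = [(r, n, b) + branch[b] for r, n, b in student]
print(rejoined == students_flat, len(branch))
```

Each branch fact is now stored once, and no information is lost by the split.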

Here is the Third Normal Form tutorial. But we suggest you first study the Second Normal Form and then head over to the Third Normal Form.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form is a stricter version of the Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF. For a table to be in BCNF, the following conditions must be satisfied:

The table (relation R) must be in the Third Normal Form.
And, for each non-trivial functional dependency ( X → Y ), X should be a super key.
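The super key condition can be checked mechanically by computing the attribute closure of X: X is a super key when its closure covers every attribute of the relation. The sketch below is one straightforward way to do this; the helper names closure and bcnf_violations and the example relation (student, subject, professor) are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)  # skip trivial dependencies
        and not set(relation) <= closure(lhs, fds)
    ]


# Hypothetical relation: each professor teaches exactly one subject.
relation = ("student", "subject", "professor")
fds = [
    (("student", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]
violations = bcnf_violations(relation, fds)
print(violations)
```

Here professor → subject is reported because professor alone does not determine student, so it is not a super key. This relation has overlapping candidate keys, (student, subject) and (student, professor), which is the situation in which 3NF and BCNF can differ.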

That last normal form looks like a bit of a headache.

Database Design (4): Database Normalization