Normalisation in detail with Types and example
Normalization is a database design technique used to organize data in a relational database to reduce data redundancy and improve data integrity. It involves breaking down large tables into smaller, related tables and establishing relationships between them. There are several normal forms (NF) to guide this process, with the most common ones being the First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Here's a detailed explanation of each:
-
First Normal Form (1NF):
- In 1NF, each column in a table must contain atomic (indivisible) values, and each value must be of the same data type.
- It eliminates repeating groups and ensures that each cell in a table contains a single, discrete piece of data.
Example: Consider a table for storing information about students and their enrolled courses:
StudentID StudentName Courses 1 Alice Math, Physics 2 Bob Chemistry, Biology To bring it to 1NF, you would split the Courses column into a separate table:
Students Table:
StudentID StudentName 1 Alice 2 Bob Courses Table:
StudentID Course 1 Math 1 Physics 2 Chemistry 2 Biology -
Second Normal Form (2NF):
- A table is in 2NF if it is in 1NF and all non-key attributes (attributes not part of the primary key) are fully functionally dependent on the entire primary key.
Example: Continuing from the previous example, let's say the primary key for the Students Table is (StudentID) and the Courses Table has a composite primary key (StudentID, Course). This is already in 2NF because there are no partial dependencies.
-
Third Normal Form (3NF):
- A table is in 3NF if it is in 2NF and there are no transitive dependencies. In other words, non-key attributes should not depend on other non-key attributes.
Example: Extending the previous example, suppose we add an attribute for the Professor of each course:
Courses Table:
StudentID Course Professor 1 Math Dr. Smith 1 Physics Dr. Johnson 2 Chemistry Dr. Brown 2 Biology Dr. Davis In this case, Professor depends on Course, but Course depends on StudentID. To achieve 3NF, you would split the table again:
Courses Table:
Course Professor Math Dr. Smith Physics Dr. Johnson Chemistry Dr. Brown Biology Dr. Davis Professors Table:
Course StudentID Math 1 Physics 1 Chemistry 2 Biology 2
By normalizing the database to 3NF, you reduce data redundancy and maintain data integrity, ensuring that updates, inserts, and deletes do not result in inconsistencies. However, it's important to note that normalization should be done judiciously, as excessive normalization can lead to complex queries and decreased performance. The level of normalization depends on the specific requirements of your application.