SQL Database Optimization| Normalization by Abhilasha

SQL Database Optimization| Normalization

Normalization

Normalization is a fundamental concept in database design aimed at organizing data in a database efficiently to reduce redundancy and improve data integrity. It involves structuring tables and their relationships according to a set of rules known as "normal forms."

We can assume it as organizing your digital files on a computer. Instead of having the same document saved in multiple folders, you keep one master copy and create shortcuts or links to it from other locations.

Why is Normalization Important?

Reduces Data Redundancy: Avoids storing the same information in multiple places, saving storage space and reducing the chance of inconsistencies.

Improves Data Integrity: Ensures that data is accurate and consistent across the database. Changes to a piece of information only need to be made in one place.

Eliminates Data Anomalies: Prevents issues that can arise during data modification,
such as:

Insertion Anomalies: Difficulty in adding new data without also adding redundant information.
Update Anomalies: Needing to update the same information in multiple rows.
Deletion Anomalies: Unintentionally losing related data when deleting a record.

Simplifies Database Design: Provides a structured approach to organizing data, making the database easier to understand and maintain.

Enhances Query Performance: Well-normalized databases often lead to more efficient querying.

Supports Scalability: Makes it easier to adapt the database to future needs.

Types of Normalization (Normal Forms):

Normalization is a progressive process, with each normal form building upon the previous one. The most common normal forms are:

First Normal Form (1NF):

  • Each column in a table contains atomic (indivisible) values.
  • There are no repeating groups of columns within a table.
  • Each row is uniquely identified by a primary key.

Original Table (Not in 1NF): Students

StudentID StudentName CoursesEnrolled
101 Alice Math, Physics
102 Bob Chemistry
103 Carol History, English, Art

Normalized Table (in 1NF): Students

StudentID StudentName
101 Alice
102 Bob
103 Carol

Normalized Table (in 1NF): StudentCourses

EnrollmentID StudentID CourseName
1 101 Math
2 101 Physics
3 102 Chemistry
4 103 History
5 103 English
6 103 Art

By breaking down the multi-valued Courses Enrolled into individual rows in a separate table, we've successfully brought the data into First Normal Form, laying the groundwork for further normalization if needed

Second Normal Form (2NF):

  • The table must already be in 1NF.
  • All non-key attributes (columns that are not part of the primary key) must be fully functionally dependent on the entire primary key. This means a non-key attribute should depend on all parts of a composite primary key, not just a part of it.

Original Table (in 1NF, but Not in 2NF): OrderDetails

This table stores details about customer orders, including the product details.

OrderID ProductID ProductName ProductPrice Quantity
1 101 Laptop 1200.00 1
1 102 Mouse 25.00 2
2 101 Laptop 1200.00 1
3 103 Keyboard 75.00 1

Normalizing to 2NF:

To bring this table into 2NF, we need to remove the partial dependencies by creating new tables for the attributes that depend only on a part of the primary key.

Normalized Tables (in 2NF):

  1. OrderItems (Contains the full functional dependency)
OrderID ProductID Quantity
1 101 1
1 102 2
2 101 1
3 103 1
  • Primary Key: (OrderID, ProductID)

  • Non-Key Attribute: Quantity (fully dependent on (OrderID, ProductID))

  1. Products (Contains the partial dependencies)
ProductID ProductName ProductPrice
101 Laptop 1200.00
102 Mouse 25.00
103 Keyboard 75.00
  • Primary Key: ProductID

  • Non-Key Attributes: ProductName, ProductPrice (both fully dependent on ProductID)

By doing this, we eliminate redundancy (e.g., "Laptop" and "1200.00" are no longer repeated for every order that includes a laptop) and improve data integrity. If the price of a laptop changes, we only need to update it in one place (the Products table) instead of potentially many rows in the OrderDetails table, thus preventing update anomalies.

Third Normal Form (3NF):

The table must already be in 2NF.
All non-key attributes are not transitively dependent on the primary key. In other words, a non-key attribute should depend directly on the primary key and not on another non-key attribute.

Transitive Dependency: A transitive dependency exists when a non-key attribute is indirectly dependent on the primary key through another non-key attribute. If we have A→B and B→C, then A→C is a transitive dependency. 3NF aims to eliminate this type of dependency.

Original Table (in 2NF, but Not in 3NF): Employees

This table stores employee information, including their department details.

EmployeeID EmployeeName DepartmentName DepartmentLocation
1 Alice HR New York
2 Bob Sales London
3 Carol HR New York
4 David Marketing Paris

Normalized Tables (in 3NF):

  1. Employees
EmployeeID EmployeeName DepartmentName
1 Alice HR
2 Bob Sales
3 Carol HR
4 David Marketing
  • Primary Key: EmployeeID

  • Non-Key Attributes: EmployeeName, DepartmentName (where DepartmentName now acts as a foreign key referencing the Departments table).

  1. Departments
DepartmentName DepartmentLocation
HR New York
Sales London
Marketing Paris
  • Primary Key: DepartmentName

  • Non-Key Attribute: DepartmentLocation (directly dependent on DepartmentName)

By doing this, we eliminate redundancy (e.g., "New York" is no longer repeated for every HR employee) and prevent update anomalies. If the location of the HR department changes, we only need to update it in one place (the Departments table) instead of potentially many rows in the Employees table. This ensures data consistency and integrity.

Fourth Normal Form (4NF)

Beyond 3NF, there are higher normal forms like Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF), which address more complex dependencies and are less commonly encountered in typical database design. Achieving 3NF or BCNF is often sufficient for most applications, providing a good balance between data integrity and performance.

To be in 4NF, a table must satisfy two conditions:

  • (BCNF is a stricter version of 3NF. If a table is in BCNF, it's also in 3NF. BCNF states that for every non-trivial functional dependency X→Y, X must be a super key. This means that if an attribute or set of attributes determines another attribute, then the determining attribute(s) must be a candidate key.)

  • It must not contain any non-trivial multi-valued dependencies. A multi-valued dependency A↠B exists if, for a given value of A, there is a fixed set of values for B, and this set of B values is independent of any other attributes in the table.

You might also like

Understanding Data Warehouses, Data Lakes, and Data Marts

By Abhilasha · Aug 5, 2025

SQL Denormalization| A Key to High-Performance OLAP Systems

By Abhilasha · Aug 1, 2025

Leveraging SQL SET Operators for Advanced Data Manipulation

By Abhilasha · Jul 29, 2025

OLTP vs OLAP||A Deep Dive into Transactional and Analytical Data Processing

By Abhilasha · Jul 24, 2025

Understanding SQL Indexes Types B Trees and Performance Benefits

By Abhilasha · Jul 18, 2025

Scenario based SQL interview question how to extract domain from an email

By Abhilasha · Jul 15, 2025

Understanding Views and Materialized Views in SQL

By Abhilasha · Jul 10, 2025

Databases and SQL Organizing and Accessing Data in the Digital Age

By Abhilasha · Jul 1, 2025

Mastering the Basics | 10 Essential Unix Commands for Beginners

By Abhilasha · Jul 1, 2025

Mastering SQL windows function |RANK, DENSE_RANK, ROW_NUMBER

By Abhilasha · Jul 1, 2025

Practical use of Database In Business and Industries

By Abhilasha · Jun 15, 2025