Normalization in DBMS: 1NF, 2NF, 3NF, and BCNF [Examples]

When developing the schema of a relational database, one of the most important aspects to be taken into account is to ensure that the duplication of data is minimized. We do this by carrying out database normalization, an important part of the database schema design process.
Here, we explain normalization in DBMS, covering 1NF, 2NF, 3NF, and BCNF with examples. First, let’s take a look at what normalization is and why it is important.
- What is Normalization in DBMS?
Database normalization is a technique for designing the schema of a database in an optimal way. The core idea of database normalization is to divide large tables into smaller subtables and store references to data rather than replicating it.
Why Do We Carry out Database Normalization?

There are two primary reasons why database normalization is used. First, it helps reduce the amount of storage needed to store the data. Second, it prevents data conflicts that may creep in because of the existence of multiple copies of the same data.
If a database isn’t normalized, then it can result in less efficient and generally slower systems, and potentially even inaccurate data. It may also lead to excessive disk I/O usage and bad performance.
- What is a Key?
You should also be aware of what a key is. A key is an attribute, or set of attributes, that helps identify a row in a table. There are several types, including the five below, which you’ll see used in the explanations of the various normal forms:
- Candidate Key
- Primary Key
- Foreign Key
- Alternate Key
- Composite Key
- Database Normalization Example
To understand DBMS normalization with example tables, let's assume that we are storing the details of courses and instructors in a university. Here is what a sample database could look like:
Here, the data stores the course code, course venue, instructor name, and instructor’s phone number. At first, this design seems fine. However, issues start to develop once we need to modify information. For instance, suppose Prof. George changes his mobile number. In that case, we have to make edits in 2 places.
What if someone just edited the mobile number against CS101, but forgot to edit it for CS154? This will lead to stale/wrong information in the database. This problem can be easily tackled by dividing our table into 2 simpler tables:
Table 1 (Instructor):
- Instructor ID
- Instructor Name
- Instructor mobile number
Table 2 (Course):
- Course code
- Course venue
Now, our data will look like the following:
Basically, we store the instructors separately and in the course table, we do not store the entire data of the instructor. Rather, we store the ID of the instructor. Now, if someone wants to know the mobile number of the instructor, they can simply look up the instructor table. Also, if we were to change the mobile number of Prof. George, it can be done in exactly one place. This avoids the stale/wrong data problem.
Further, if you observe, the mobile number now need not be stored 2 times. We have stored it in just 1 place. This also saves storage. This may not be obvious in the above simple example. However, think about the case when there are hundreds of courses and instructors and for each instructor, we have to store not just the mobile number, but also other details like office address, email address, specialization, availability, etc. In such a situation, replicating so much data will increase the storage requirement unnecessarily.
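To make this concrete, here is a minimal SQL sketch of the split described above. The table names, column names, and types are illustrative assumptions, not taken from the original example:

```sql
-- Instructor details live in exactly one place.
CREATE TABLE instructor (
    instructor_id INT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    mobile_number VARCHAR(20)
);

-- Each course stores only a reference to its instructor.
CREATE TABLE course (
    course_code   VARCHAR(10) PRIMARY KEY,
    venue         VARCHAR(100),
    instructor_id INT NOT NULL REFERENCES instructor (instructor_id)
);

-- Changing Prof. George's number now touches exactly one row:
UPDATE instructor SET mobile_number = '555-0199' WHERE instructor_id = 1;
```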
- Types of DBMS Normalization
There are various normal forms in DBMS. Each normal form has an importance that helps optimize the database to save storage and reduce redundancies. We explain normalization in DBMS with examples below.
First Normal Form (1NF)
The first normal form simply says that each cell of a table should contain exactly one value. Assume we are storing the courses that a particular instructor takes, we can store it like this:
Here, the issue is that in the first row, we are storing 2 courses against Prof. George. This isn’t optimal, since that’s not how SQL databases are designed to be used. A better method would be to store the courses separately. For instance:
This way, if we want to edit some information related to CS101, we do not have to touch the data corresponding to CS154. Also, observe that each row stores unique information. There is no repetition. This is the First Normal Form.
Data redundancy is higher in 1NF because there can be multiple columns holding the same data across multiple rows. 1NF is not so much focused on eliminating redundancy as it is on eliminating repeating groups.
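A minimal SQL sketch of the 1NF fix, using illustrative names: each (professor, course) pair becomes its own row instead of packing several course codes into one cell.

```sql
-- 1NF: one value per cell, so one row per course taught.
CREATE TABLE teaches (
    professor_name VARCHAR(100) NOT NULL,
    course_code    VARCHAR(10)  NOT NULL,
    PRIMARY KEY (professor_name, course_code)
);

INSERT INTO teaches VALUES ('Prof. George', 'CS101');
INSERT INTO teaches VALUES ('Prof. George', 'CS154');
```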
Second Normal Form (2NF)
For a table to be in second normal form, the following 2 conditions must be met:
- The table should be in the first normal form.
- Every non-key column must depend on the whole of the primary key, not on just part of it (no partial dependency). A simple way to guarantee this is to give the table a single-column primary key.
The first point is straightforward since we just studied 1NF. Let us understand the second point. A primary key is a set of columns that uniquely identifies a row; no 2 rows share the same primary key value.
In this table, the course code is unique so that becomes our primary key. Let us take another example of storing student enrollment in various courses. Each student may enroll in multiple courses. Similarly, each course may have multiple enrollments. A sample table may look like this (student name and course code):
Here, the first column is the student name and the second column is the course taken by the student.
Clearly, the student name column isn’t unique as we can see that there are 2 entries corresponding to the name ‘Rahul’ in row 1 and row 3. Similarly, the course code column is not unique as we can see that there are 2 entries corresponding to course code CS101 in row 2 and row 4.
However, the tuple (student name, course code) is unique since a student cannot enroll in the same course more than once. So, these 2 columns when combined form the primary key for the database.
Our enrollment table above relies on a composite primary key, which makes it easy for partial dependencies to creep in as columns are added, violating the second normal form. To move from 1NF to 2NF, we can instead break it into 2 tables:
Here the second column is unique and it indicates the enrollment number for the student. Clearly, the enrollment number is unique. Now, we can attach each of these enrollment numbers with course codes.
These 2 tables together provide us with the exact same information as our original table.
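Sketched as SQL (names and types are assumptions for illustration), the two tables could be declared like this:

```sql
-- Each enrollment gets a unique surrogate number.
CREATE TABLE enrollment (
    enrollment_number INT PRIMARY KEY,
    student_name      VARCHAR(100) NOT NULL
);

-- The enrollment number is then attached to a course code.
CREATE TABLE enrollment_course (
    enrollment_number INT PRIMARY KEY REFERENCES enrollment (enrollment_number),
    course_code       VARCHAR(10) NOT NULL
);
```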
Third Normal Form (3NF)
Before we delve into the details of third normal form, let us understand the concept of a functional dependency on a table.
Column A is said to be functionally dependent on column B if each value of B determines exactly one value of A. In other words, changing the value of B may require a change in the value of A. As an example, consider the following table:
Here, the department column is dependent on the professor name column. This is because if in a particular row, we change the name of the professor, we will also have to change the department value. As an example, suppose MA214 is now taken by Prof. Ronald who happens to be from the mathematics department, the table will look like this:
Here, when we changed the name of the professor, we also had to change the department column. This is not desirable since someone who is updating the database may remember to change the name of the professor, but may forget updating the department value. This can cause inconsistency in the database.
Third normal form avoids this by breaking this into separate tables:
Here, the third column is the ID of the professor who’s taking the course.
Here, in the above table, we store the details of the professor against his/her ID. This way, whenever we want to reference the professor somewhere, we don’t have to put the other details of the professor in that table again. We can simply use the ID.
Therefore, in the third normal form, the following conditions are required:
- The table should be in the second normal form.
- There should be no transitive functional dependency: no non-key column should depend on another non-key column.
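As a hedged SQL sketch of that decomposition (an independent illustration; names and types are assumed):

```sql
-- Professor details, including department, live in one table...
CREATE TABLE professor (
    professor_id INT PRIMARY KEY,
    name         VARCHAR(100) NOT NULL,
    department   VARCHAR(100) NOT NULL
);

-- ...and each course references the professor by ID only.
CREATE TABLE course (
    course_code  VARCHAR(10) PRIMARY KEY,
    professor_id INT NOT NULL REFERENCES professor (professor_id)
);
```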
Boyce-Codd Normal Form (BCNF)
The Boyce-Codd Normal Form is a slightly stronger version of the third normal form. A table is in Boyce-Codd Normal Form if and only if at least one of the following conditions is met for each functional dependency A → B:
- A is a superkey
- It is a trivial functional dependency.
Let us first understand what a superkey means. To understand BCNF in DBMS, consider the following BCNF example table:
Here, the first column (course code) is unique across various rows. So, it is a superkey. Consider the combination of columns (course code, professor name). It is also unique across various rows. So, it is also a superkey. A superkey is basically a set of columns such that the value of that set of columns is unique across various rows. That is, no 2 rows have the same set of values for those columns. Some of the superkeys for the table above are:
- Course code
- Course code, professor name
- Course code, professor mobile number
A minimal superkey, one from which no column can be removed without losing uniqueness, is called a candidate key. For instance, the first superkey above has just 1 column, while the other two have 2 columns each. So, the first superkey (course code) is a candidate key.
Boyce-Codd Normal Form says that if there is a functional dependency A → B, then either A is a superkey or it is a trivial functional dependency. A trivial functional dependency means that all columns of B are contained in the columns of A. For instance, (course code, professor name) → (course code) is a trivial functional dependency because when we know the value of course code and professor name, we do know the value of course code and so, the dependency becomes trivial.
Let us understand what’s going on:
A is a superkey: other columns should depend only on a superkey. Basically, if a set of columns (B) can be determined by knowing some other set of columns (A), then A should be a superkey. A superkey determines each row uniquely.
It is a trivial functional dependency: if the left-hand side of a dependency is not a superkey, the dependency must be trivial. For instance, we saw how the professor’s department was dependent on the professor’s name. This may create integrity issues, since someone may edit the professor’s name without changing the department, leading to an inconsistent database.
Another example: suppose a company has employees who work in more than one department, with a single table storing the employee ID, employee name, and department. Employee ID determines the name, but employee ID is not a superkey of that table (the same employee appears once per department), so the table violates BCNF. It can be decomposed so that every determinant is a superkey.
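A sketch of that decomposition in SQL (all names here are invented for illustration):

```sql
-- employee_id -> name holds, and employee_id is the key of this table.
CREATE TABLE employee (
    employee_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

-- One row per (employee, department) assignment; the whole row is the key,
-- so every determinant in this table is a superkey.
CREATE TABLE employee_department (
    employee_id INT NOT NULL REFERENCES employee (employee_id),
    department  VARCHAR(100) NOT NULL,
    PRIMARY KEY (employee_id, department)
);
```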
Fourth normal form
A table is said to be in fourth normal form if it is in Boyce-Codd Normal Form and does not store two or more independent, multivalued facts about the same entity in one table.
Fifth normal form
A table is in fifth normal form if:
- It is in its fourth normal form.
- It cannot be subdivided into any smaller tables without losing some form of information.
- Normalization is Important for Database Systems
Normalization in DBMS is useful for designing the schema of a database such that there is no data replication, which could otherwise lead to inconsistencies. While designing the schema for applications, we should always think about how we can make use of these normal forms.
If you want to learn more about SQL, check out our post on the best SQL certifications. You can also read about SQL vs MySQL to learn how the two compare. To become a data engineer, you’ll need to learn about normalization and a lot more, so get started today.
- Frequently Asked Questions
1. Does database normalization reduce the database size?
Generally, yes. Normalization removes redundant data, so the database typically uses less disk storage.
2. Which normal form can remove all the anomalies in DBMS?
5NF will remove all anomalies. However, generally, most 3NF tables will be free from anomalies.
3. Can database normalization reduce the number of tables?
Database normalization increases the number of tables. This is because we split tables into sub-tables in order to eliminate redundant data.
4. What is the Difference between BCNF and 3NF?
BCNF is a stricter version of 3NF. 3NF removes transitive dependencies from a relation, while BCNF goes further and requires that the determinant of every non-trivial functional dependency be a superkey.

Database Normalization: A Step-By-Step-Guide With Examples
Database normalisation is a concept that can be hard to understand.
But it doesn’t have to be.
In this article, I’ll explain what normalisation in a DBMS is and how to do it, in simple terms.
By the end of the article, you’ll know all about it and how to do it.
So let’s get started.
What Is Database Normalization?
Database normalisation, or just normalisation as it’s commonly called, is a process used for data modelling or database creation, where you organise your data and tables so it can be added and updated efficiently.
It’s something a person does manually, as opposed to a system or a tool doing it. It’s commonly done by database developers and database administrators.
It can be done on any relational database, where data is stored in tables that are linked to each other. This means that normalization in a DBMS (Database Management System) can be done in Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and any other type of relational database.
To perform the normalization process, you start with a rough idea of the data you want to store, and apply certain rules to it in order to get it to a more efficient form.
I’ll show you how to normalise a database later in this article.
Why Normalize a Database?
So why would anyone want to normalize their database?
Why do we want to go through this manual process of rearranging the data?
There are a few reasons we would want to go through this process:
- Make the database more efficient
- Prevent the same data from being stored in more than one place (called an “insert anomaly”)
- Prevent updates being made to some data but not others (called an “update anomaly”)
- Prevent data not being deleted when it is supposed to be, or from data being lost when it is not supposed to be (called a “delete anomaly”)
- Ensure the data is accurate
- Reduce the storage space that a database takes up
- Ensure the queries on a database run as fast as possible
Normalization in a DBMS is done to achieve these points. Without normalization on a database, the data can be slow, incorrect, and messy.
Data Anomalies
Some of these points above relate to “anomalies”.
An anomaly is where there is an issue in the data that is not meant to be there. This can happen if a database is not normalised.
Let’s take a look at the different kinds of data anomalies that can occur and that can be prevented with a normalised database.
Our Example
We’ll be using a student database as an example in this article, which records student, class, and teacher information.
Let’s say our student database looks like this:
This table keeps track of a few pieces of information:
- The student names
- The fees a student has paid
- The classes a student is taking, if any
This is not a normalised table, and there are a few issues with this.
Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without knowing all the data we need to know.
For example, if we wanted to add a new student but did not know their course name.
The new record would look like this:
We would be adding incomplete data to our table, which can cause issues when trying to analyse this data.
Update Anomaly
An update anomaly happens when we want to update data, and we update some of the data but not other data.
For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would have to find every row that stores this class value and rename each one that was found.
There’s a risk that we miss out on a value, which would cause issues.
Ideally, we would only update the value once, in one location.
Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up deleting more than what we intended.
For example, let’s say Susan Johnson quits and her record needs to be deleted from the system. We could delete her row:
But, if we delete this row, we lose the record of the Biology 2 class, because it’s not stored anywhere else. The same can be said for the Medicine course.
We should be able to delete one type of data or one record without having impacts on other records we don’t want to delete.
What Are The Normal Forms?
The process of normalization involves applying rules to a set of data. Each of these rules transforms the data to a certain structure, called a normal form.
There are three main normal forms that you should consider (Actually, there are six normal forms in total, but the first three are the most common).
Whenever the first rule is applied, the data is in “first normal form”. Then, the second rule is applied and the data is in “second normal form”. The third rule is then applied and the data is in “third normal form”.
Fourth and fifth normal forms are then achieved from their specific rules.
Alright, so there are three main normal forms that we’re going to look at. I’ve written a post on designing a database, but let’s see what is involved in getting to each of the normal forms in more detail.
What Is First Normal Form?
First normal form is the way that your data is represented after it has the first rule of normalization applied to it. Normalization in DBMS starts with the first rule being applied – you need to apply the first rule before applying any other rules.
Let’s start with a sample database. In this case, we’re going to use a student and teacher database at a school. We mentioned this earlier in the article when we spoke about anomalies, but here it is again.
Our Example Database
We have a set of data we want to capture in our database, and this is how it currently looks. It’s a single table called “student” with a lot of columns.
Everything is in one table.
How can we normalise this?
We start with getting the data to First Normal Form.
To apply first normal form to a database, we look at each table, one by one, and ask ourselves the following questions of it:
Does the combination of all columns make a unique row every single time?
What field can be used to uniquely identify the row?
Let’s look at the first question.
No. Two rows could contain exactly the same combination of values and still represent different real-world records (even though this is rare).
The second question says:
Is this the student name? No, as there could be two students with the same name.
Address? No, this isn’t unique either.
Any other field?
We don’t have a field that can uniquely identify the row.
If there is no unique field, we need to create a new field. This is called a primary key, and it’s a database term for a field that is unique to a single row. (Related: The Complete Guide to Database Keys)
When we create a new primary key, we can call it whatever we like, but it should be obvious and consistently named between tables. I prefer using the ID suffix, so I would call it student ID.
This is our new table:
Student ( student ID , student name, fees paid, date of birth, address, subject 1, subject 2, subject 3, subject 4, teacher name, teacher address, course name)
This can also be represented in an Entity Relationship Diagram (ERD):

The way I have written this is a common way of representing tables in text format. The table name is written, and all of the columns are shown in brackets, with the primary key underlined.
This data is now in first normal form.
This example is still in one table, but it’s been made a little better by adding a unique value to it.
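As a rough SQL sketch (column types here are assumptions for illustration), the first normal form table might be declared like this:

```sql
CREATE TABLE student (
    student_id      INT PRIMARY KEY,   -- the new unique identifier
    student_name    VARCHAR(100),
    fees_paid       DECIMAL(10, 2),
    date_of_birth   DATE,
    address         VARCHAR(200),
    subject_1       VARCHAR(50),
    subject_2       VARCHAR(50),
    subject_3       VARCHAR(50),
    subject_4       VARCHAR(50),
    teacher_name    VARCHAR(100),
    teacher_address VARCHAR(200),
    course_name     VARCHAR(100)
);
```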
What Is Second Normal Form?
The rule of second normal form on a database can be described as:
- Fulfil the requirements of first normal form
- Each non-key attribute must be functionally dependent on the primary key
What does this even mean?
It means that the first normal form rules have been applied. It also means that each field that is not the primary key is determined by that primary key, so it is specific to that record. This is what “functional dependency” means.
Let’s take a look at our table.
Are all of these columns dependent on and specific to the primary key?
The primary key is student ID, which represents the student. Let’s look at each column:
- student name: Yes, this is dependent on the primary key. A different student ID means a different student name.
- fees paid: Yes, this is dependent on the primary key. Each fees paid value is for a single student.
- date of birth: Yes, it’s specific to that student.
- address: Yes, it’s specific to that student.
- subject 1: No, this column is not dependent on the student. More than one student can be enrolled in one subject.
- subject 2: As above, more than one subject is allowed.
- subject 3: No, same rule as subject 2.
- subject 4: No, same rule as subject 2
- teacher name: No, the teacher name is not dependent on the student.
- teacher address: No, the teacher address is not dependent on the student.
- course name: No, the course name is not dependent on the student.
We have a mix of Yes and No here. Some fields are dependent on the student ID, and others are not.
How can we resolve those we marked as No?
Let’s take a look.
First, the subject 1 column. It is not dependent on the student, as more than one student can have a subject, and the subject isn’t a part of the definition of a student.
So, we can move it to a new table:
Subject (subject name)
I’ve called it subject name because that’s what the value represents. When we are writing queries on this table or looking at diagrams, it’s clearer what subject name is instead of using subject.
Now, is this field unique? Not necessarily. Two subjects could have the same name and this would cause problems in our data.
So, what do we do? We add a primary key column, just like we did for student. I’ll call this subject ID, to be consistent with the student ID.
Subject ( subject ID , subject name)
This means we have a student table and a subject table. We can do this for all four of our subject columns in the student table, removing them from the student table so it looks like this:
Student ( student ID , student name, fees paid, date of birth, address, teacher name, teacher address, course name)
But they are in separate tables. How do we link them together?
We’ll cover that shortly. For now, let’s keep going with our student table.
The next column we marked as No was the Teacher Name column. The teacher is separate to the student so should be captured separately. This means we should move it to its own table.
Teacher (teacher name)
We should also move the teacher address to this table, as it’s a property of the teacher. I’ll also rename teacher address to be just address.
Teacher (teacher name, address)
Just like with the subject table, the teacher name and address are not unique. Sure, in most cases they would be, but to avoid duplication we should add a primary key. Let’s call it teacher ID.
Teacher ( teacher ID , teacher name, address)
The last column we have to look at was the Course Name column. This indicates the course that the student is currently enrolled in.
While the course is related to the student (a student is enrolled in a course), the name of the course itself is not dependent on the student.
So, we should move it to a separate table. This is so any changes to courses can be made independently of students.
The course table would look like this:
Course (course name)
Let’s also add a primary key called course ID.
Course ( course ID , course name)
We now have our tables created from columns that were in the student table. Our database so far looks like this:
Student ( student ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name) Teacher ( teacher ID , teacher name, address) Course ( course ID , course name)
Using the data from the original table, our data could look like this:
How do we link these tables together? We still need to know which subjects a student is taking, which course they are in, and who their teachers are.
Foreign Keys in Tables
We have four separate tables, capturing different pieces of information. We need to capture that students are taking certain courses, have teachers, and subjects. But the data is in different tables.
How can we keep track of this?
We use a concept called a foreign key.
A foreign key is a column in one table that refers to the primary key in another table . Related: The Complete Guide to Database Keys .
It’s used to link one record to another based on its unique identifier, without having to store the additional information about the linked record.
Here are our two tables so far:
Student ( student ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name) Teacher ( teacher ID , teacher name, teacher address) Course ( course ID , course name)
To link the two tables using a foreign key, we need to put the primary key (the underlined column) from one table into the other table.
Let’s start with a simple one: students taking courses. For our example scenario, a student can only be enrolled in one course at a time, and a course can have many students.
We need to either:
- Add the course ID from the course table into the student table
- Add the student ID from the student table into the course table
But which one is it?
In this situation, I ask myself a question to work out which way it goes:
Does a table1 have many table2s, or does a table2 have many table1s?
If it’s the first, then table1 ID goes into table 2, and if it’s the second then table2 ID goes into table1.
So, if we substitute table1 and table2 for course and student:
Does a course have many students, or does a student have many courses?
Based on our rules, the first statement is true: a course has many students.
This means that the course ID goes into the student table.
Student ( student ID , course ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name) Teacher ( teacher ID , teacher name, teacher address) Course ( course ID , course name)
I’ve italicised it to indicate it is a foreign key – a value that links to a primary key in another table.
When we actually populate our tables, instead of having the course name in the student table, the course ID goes in the student table. The course name can then be linked using this ID.
This also means that the course name is stored in one place only, and can be added/removed/updated without impacting other tables.
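A hedged SQL sketch of this relationship, using the table and column names from the text (types assumed):

```sql
CREATE TABLE course (
    course_id   INT PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

CREATE TABLE student (
    student_id    INT PRIMARY KEY,
    course_id     INT REFERENCES course (course_id),  -- foreign key to course
    student_name  VARCHAR(100),
    fees_paid     DECIMAL(10, 2),
    date_of_birth DATE,
    address       VARCHAR(200)
);
```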
We’ve linked the student to the course. Now let’s look at the teacher.
How are teachers related? Depending on the scenario, they could be related in one of a few ways:
- A student can have one teacher that teaches them all subjects
- A subject could have a teacher that teaches it
- A course could have a teacher that teaches all subjects in a course
In our scenario, a teacher is related to a course. We need to relate these two tables using a foreign key.
Does a teacher have many courses, or does a course have many teachers?
In our scenario, the first statement is true. So the teacher ID goes into the course table:
Student ( student ID , course ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name) Teacher ( teacher ID , teacher name, teacher address) Course ( course ID , teacher ID , course name)
The table data would look like this:
This allows us to change the teacher’s information without impacting the courses or students.
Student and Subject
So we’ve linked the course, teacher, and student tables together so far.
What about the subject table?
Does a subject have many students, or does a student have many subjects?
The answer is both.
How is that possible?
A student can be enrolled in many subjects at a time, and a subject can have many students in it.
How can we represent that? We could try to put one table’s ID in the other table:
But if we do this, we’re storing many pieces of information in one column, possibly separated by commas.
This makes it hard to maintain and is very prone to errors.
If we have this kind of relationship, one that goes both ways, it’s called a many to many relationship. It means that many rows of one table can be related to many rows of the other.
Many to Many Relationships
A many to many relationship is common in databases. Some examples where it can happen are:
- Students and subjects
- Employees and companies (an employee can have many jobs at different companies, and a company has many employees)
- Actors and movies (an actor is in multiple movies, and a movie has multiple actors)
If we can’t represent this relationship by putting a foreign key in each table, how can we represent it?
We use a joining table.
This is a table that is created purely for storing the relationships between the two tables.
It works like this. Here are our two tables:
Student ( student ID , course ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name)
And here is our joining table:
Subject_Student ( student ID , subject ID )
It has two columns. Student ID is a foreign key to the student table, and subject ID is a foreign key to the subject table.
Each record in the row would look like this:
Each row represents a relationship between a student and a subject.
Student 1 is linked to subject 1.
Student 1 is linked to subject 2.
Student 2 is linked to subject 2.
This has several advantages:
- It allows us to store many subjects for each student, and many students for each subject.
- It separates the data that describes the records (subject name, student name, address, etc.) from the relationship of the records (linking ID to ID).
- It allows us to add and remove relationships easily.
- It allows us to add more information about the relationship. We could add an enrolment date, for example, to this table, to capture when a student enrolled in a subject.
You might be wondering, how do we see the data if it’s in multiple tables? How can we see the student name and the name of the subjects they are enrolled in?
Well, that’s where the magic of SQL comes in. We use a SELECT query with JOINs to show the data we need. Writing those queries is outside the scope of this article; you can read the articles on my Oracle Database page to find out more about writing SQL.
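Without going into detail, here is one hedged example of what such a query could look like, using the illustrative table and column names from this article:

```sql
-- List each student alongside the subjects they are enrolled in.
SELECT s.student_name, sub.subject_name
FROM student s
JOIN subject_student ss ON ss.student_id = s.student_id
JOIN subject sub        ON sub.subject_id = ss.subject_id
ORDER BY s.student_name, sub.subject_name;
```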
One final thing I have seen added to these joining tables is a primary key of its own. An ID field that represents the record. This is an optional step – a primary key on a single new column works in a similar way to defining the primary key on the two ID columns. I’ll leave it out in this example.
So, our final table structure looks like this:
Student ( student ID , course ID , student name, fees paid, date of birth, address) Subject ( subject ID , subject name) Subject Enrolment ( student ID , subject ID ) Teacher ( teacher ID , teacher name, teacher address) Course ( course ID , teacher ID , course name)
I’ve called the table Subject Enrolment. I could have left it as the concatenation of both of the related tables (student subject), but I feel it’s better to rename the table to what it actually captures – the fact a student has enrolled in a subject. This is something I recommend in my SQL Best Practices post .
I’ve also underlined both columns in this table, as they represent the primary key. They can also represent a foreign key, which is why they are also italicised.
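In SQL, the joining table might be sketched like this (names adapted from the text; types are assumptions):

```sql
-- Both columns together form the primary key,
-- and each column is a foreign key to its parent table.
CREATE TABLE subject_enrolment (
    student_id INT NOT NULL REFERENCES student (student_id),
    subject_id INT NOT NULL REFERENCES subject (subject_id),
    PRIMARY KEY (student_id, subject_id)
);
```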
An ERD of these tables looks like this:

This database structure is in second normal form. We almost have a normalised database.
Now, let’s take a look at third normal form.
What Is Third Normal Form?
Third normal form is the final stage of the most common normalization process. The rule for this is:
- Fulfils the requirements of second normal form
- Has no transitive functional dependency
What does this even mean? What is a transitive functional dependency?
It means that every attribute that is not the primary key must depend on the primary key and the primary key only .
For example:
- Column A determines column B
- Column B determines column C
- Therefore, column A determines C
This means that column A determines column B which determines column C . This is a transitive functional dependency, and it should be removed. Column C should be in a separate table.
We need to check if this is the case for any of our tables.
Student ( student ID , course ID , student name, fees paid, date of birth, address)
Do any of the non-primary-key fields depend on something other than the primary key?
No, none of them do. However, if we look at the address, we can see something interesting:
We can see that there is a relationship between the ZIP code and the city or suburb. This is common with addresses, and you may have noticed this if you have filled out any forms for addresses online recently.
How are they related? The ZIP code, or postal code, determines the city, state, and suburb.
In this case, 56128 is South Boston, and 56125 is North Boston. (I just made this up so this is probably inaccurate data).
This falls into the pattern we mentioned earlier: A determines B which determines C.
Student determines the address ZIP code which determines the suburb.
So, how can we improve this?
We can move the ZIP code to another table, along with everything it identifies, and link to it from the student table.
Our table could look like this:
Student ( student ID , course ID , student name, fees paid, date of birth, street address, address code ID ) Address Code ( address code ID , ZIP code, suburb, city, state)
I’ve created a new table called Address Code, and linked it to the student table. I created a new column for the address code ID, because the ZIP code may refer to more than one suburb. This way we can capture that fact, and it’s in a separate table to make sure it’s only stored in one place.
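A hedged sketch of the new table and the link in SQL (types assumed for illustration):

```sql
CREATE TABLE address_code (
    address_code_id INT PRIMARY KEY,
    zip_code        VARCHAR(10) NOT NULL,
    suburb          VARCHAR(100),
    city            VARCHAR(100),
    state           VARCHAR(50)
);

-- The student table then stores only the street address
-- plus a reference to the address code record:
ALTER TABLE student ADD COLUMN address_code_id INT REFERENCES address_code (address_code_id);
```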
Let’s take a look at the other tables:
Subject ( subject ID , subject name) Subject Enrolment ( student ID , subject ID )
Both of these tables have no columns that aren’t dependent on the primary key.
Teacher ( teacher ID , teacher name, teacher address)
The teacher table also has the same issue as the student table when we look at the address. We can, and should use the same approach for storing address.
So our table would look like this:
Teacher ( teacher ID , teacher name, street address, address code ID ) Address Code ( address code ID , ZIP code, suburb, city, state)
It uses the same Address Code table as mentioned above. We aren’t creating a new address code table.
Finally, the course table:
Course ( course ID , teacher ID , course name)
This table is OK. The course name is dependent on the course ID.
So, what does our database look like now?
Student ( student ID , course ID , student name, fees paid, date of birth, street address, address code ID ) Address Code ( address code ID , ZIP code, suburb, city, state) Subject ( subject ID , subject name) Subject Enrolment ( student ID , subject ID ) Teacher ( teacher ID , teacher name, street address, address code ID ) Course ( course ID , teacher ID , course name)
So, that’s how third normal form could look if we had this example.
An ERD of third normal form would look like this:

Stopping at Third Normal Form
For most database normalisation exercises, stopping after achieving Third Normal Form is enough.
It satisfies good relational design rules and will greatly improve your data structure compared to having no normalisation at all.
There are a couple of steps after third normal form that are optional. I’ll explain them here so you can learn what they are.
Fourth Normal Form and Beyond
Fourth normal form is the next step after third normal form.
What does it mean?
It needs to satisfy two conditions:
- Meet the criteria of third normal form.
- There are no non-trivial multivalued dependencies other than those on a candidate key.
So, what does this mean?
A multivalued dependency is probably better explained with an example, which I’ll show you shortly. It means that a single value in the table independently determines several values of another attribute, and those values can be moved to another table.
Our database looks like this:
This meets the third normal form rules.
However, let’s take a look at the address fields: street address and address code.
- Both the student and teacher table have these
- What if a student moves addresses? Do we update the address in this field? If we do, then we lose the old address.
- If an address is updated, is it because they moved? Or is it because there was an error in the old address?
- What if two students have the same street address? Are they actually at the same address? What if we update one and not the other?
- What if a teacher and a student are at the same address?
- What if we want to capture a student or a teacher having multiple addresses (for example, postal and residential)?
There are a lot of “what if” questions here. There is a way we can resolve them and improve the quality of the data.
This is a multivalued dependency.
We can solve this by moving the address to a separate table.
The address can then be linked to the teacher and student tables.
Let’s start with the address table.
Address ( address ID , street address, address code ID )
In this table, we have a primary key of address ID, and we have stored the street address here. The address code table stays the same.
We need to link this to the student and teacher tables. How do we do this?
Do we also want to capture the fact that a student or teacher can have multiple addresses? It may be a good idea to future proof the design. It’s something you would want to confirm in your organisation.
For this example, we will design it so there can be multiple addresses for a single student.
Our tables could look like this:
Student ( student ID , course ID , student name, fees paid, date of birth) Address ( address ID , street address, address code ID ) Address Code ( address code ID , ZIP code, suburb, city, state) Student Address ( address ID, student ID ) Subject ( subject ID , subject name) Subject Enrolment ( student ID , subject ID ) Teacher ( teacher ID , teacher name) Teacher Address ( teacher ID, address ID ) Course ( course ID , teacher ID , course name)
An ERD would look like this:

A few changes have been made here:
- The address code ID has been removed from the Student table, because the relationship between student and address is now captured in the joining table called Student Address.
- The teacher’s address is also captured in the joining table Teacher Address, and the address code ID has been removed from the Teacher table. I couldn’t think of a better name for each of these tables.
- Address still links to address code ID
This design will let you do a few things:
- Store multiple addresses for each student or teacher
- Store additional information about an address relationship to a teacher or student, such as an effective date (to track movements over time) or an address type (postal, residential)
- Determine which students or teachers live at the same address with certainty (it’s linked to the same record).
So, that’s how you can achieve fourth normal form on this database.
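A hedged SQL sketch of the address table and the two joining tables described above, reusing the address_code sketch from earlier (types are illustrative assumptions):

```sql
CREATE TABLE address (
    address_id      INT PRIMARY KEY,
    street_address  VARCHAR(200) NOT NULL,
    address_code_id INT REFERENCES address_code (address_code_id)
);

-- Joining tables allow many addresses per person, and many people per address.
CREATE TABLE student_address (
    student_id INT NOT NULL REFERENCES student (student_id),
    address_id INT NOT NULL REFERENCES address (address_id),
    PRIMARY KEY (student_id, address_id)
);

CREATE TABLE teacher_address (
    teacher_id INT NOT NULL REFERENCES teacher (teacher_id),
    address_id INT NOT NULL REFERENCES address (address_id),
    PRIMARY KEY (teacher_id, address_id)
);
```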
There are a few enhancements you can make to this design, but it depends on your business rules:
- Combine the student and teacher tables into a person table, as they are both effectively people, but teachers teach a class and students take a class. This table could then link to subjects and define that relationship as “teaches” or “is enrolled in”, to cater for this.
- Relate a course to a subject, so you can see which subjects are in a course
- Split the address into separate fields for unit number, street number, address line 1, address line 2, and so on.
- Split the student name and teacher name into first and last names to help with displaying data and sorting.
These changes could improve the design, but I haven’t detailed them in any of these steps as they aren’t required for fourth normal form.
I hope this explanation has helped you understand what the normal forms are and what normalization in DBMS is. Do you have any questions on this process? Share them in the section below.

What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF Example

What is Database Normalization?
Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships. The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.
The inventor of the relational model, Edgar Codd, proposed the theory of data normalization with the introduction of the First Normal Form, and he continued to extend the theory with the Second and Third Normal Forms. Later he joined Raymond F. Boyce to develop the theory of the Boyce-Codd Normal Form.
Database Normal Forms
Here is a list of Normal Forms in SQL:
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
- 4NF (Fourth Normal Form)
- 5NF (Fifth Normal Form)
- 6NF (Sixth Normal Form)
The theory of data normalization in SQL is still being developed further. For example, there are discussions even on the 6th Normal Form. However, in most practical applications, normalization achieves its best in the 3rd Normal Form.
Database Normalization With Examples
Database normalization can be easily understood with the help of a case study. Assume a video library maintains a database of movies rented out. Without any normalization in the database, all of the information is stored in one table, and the Movies Rented column holds multiple values for a single member. Now let's move into the First Normal Form:
1NF (First Normal Form) Rules
- Each table cell should contain a single value.
- Each record needs to be unique.
1NF Example
Converting the above table to 1NF means giving each rented movie its own row, so every cell holds a single value.
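As a rough SQL equivalent of this 1NF table (column types and sample titles are illustrative, not from the original article), each rented movie gets its own row so every cell is atomic:

```sql
CREATE TABLE MoviesRented (
    full_name    VARCHAR(100),
    address      VARCHAR(200),
    movie_rented VARCHAR(100),  -- exactly one movie title per row
    salutation   VARCHAR(10)
);

-- One member renting two movies becomes two rows:
INSERT INTO MoviesRented VALUES
    ('Robert Phil', '3rd Street, 34', 'Movie A', 'Mr.'),
    ('Robert Phil', '3rd Street, 34', 'Movie B', 'Mr.');
```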
Before we proceed, let's understand a few things:
What is a KEY in SQL
A KEY in SQL is a value used to identify records in a table uniquely. An SQL KEY is a single column or combination of multiple columns used to uniquely identify rows or tuples in the table. SQL Key is used to identify duplicate information, and it also helps establish a relationship between multiple tables in the database.
Note: Columns in a table that are NOT used to identify a record uniquely are called non-key columns.
What is a Primary Key?
A primary key is a single column value used to identify a database record uniquely.
It has the following attributes:
- A primary key cannot be NULL
- A primary key value must be unique
- The primary key values should rarely be changed
- The primary key must be given a value when a new record is inserted.
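These rules map directly onto a PRIMARY KEY constraint in SQL; here is a minimal sketch, assuming illustrative table and column names:

```sql
CREATE TABLE Members (
    membership_id INT NOT NULL PRIMARY KEY,  -- cannot be NULL, must be unique
    full_name     VARCHAR(100),
    address       VARCHAR(200)
);
```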
What is a Composite Key?
A composite key is a primary key composed of multiple columns used to identify a record uniquely.
In our database, we have two people with the same name Robert Phil, but they live in different places.
Hence, we require both Full Name and Address to identify a record uniquely. That is a composite key.
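In SQL, a composite key is declared as a PRIMARY KEY over several columns; a minimal sketch with assumed types:

```sql
CREATE TABLE Members (
    full_name VARCHAR(100),
    address   VARCHAR(200),
    -- Neither column is unique on its own, but the combination is:
    PRIMARY KEY (full_name, address)
);
```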
Let's move on to the Second Normal Form (2NF).
2NF (Second Normal Form) Rules
- Rule 1 – Be in 1NF
- Rule 2 – Have no partial dependency; that is, no non-key attribute may depend on only a subset of a candidate key
It is clear that we can't move our simple database into the 2nd Normal Form unless we partition the table above.
We have divided our 1NF table into two tables: Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies rented. We have introduced a new column called Membership_id, which is the primary key for Table 1. Records can be uniquely identified in Table 1 using the membership ID.
Database – Foreign Key
In Table 2, Membership_ID is the foreign key. A foreign key references the primary key of another table; it helps connect your tables:
- A foreign key can have a different name from its primary key
- It ensures rows in one table have corresponding rows in another
- Unlike the Primary key, they do not have to be unique. Most often they aren’t
- Foreign keys can be null even though primary keys cannot
Why do you need a foreign key?
Suppose a novice inserts a record into Table 2 with a membership ID that does not exist in Table 1. This problem is overcome by declaring the membership ID of Table 2 as a foreign key referencing the membership ID of Table 1. You will then only be able to insert values into your foreign key that exist in the unique key of the parent table; this is what enforces referential integrity. Now, if somebody tries to insert a value in the membership ID field that does not exist in the parent table, an error will be shown!
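A minimal sketch of this constraint in SQL (column types are assumptions):

```sql
CREATE TABLE Table1 (
    membership_id INT PRIMARY KEY,   -- the parent table's unique key
    full_name     VARCHAR(100)
);

CREATE TABLE Table2 (
    membership_id INT,
    movie_rented  VARCHAR(100),
    FOREIGN KEY (membership_id) REFERENCES Table1 (membership_id)
);

-- Rejected with a referential-integrity error if no member 99 exists in Table1:
INSERT INTO Table2 (membership_id, movie_rented) VALUES (99, 'Some Movie');
```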
What are transitive functional dependencies?
A transitive functional dependency is when changing a non-key column might cause another non-key column to change.
Consider Table 1: changing the non-key column Full Name may change Salutation.
Let’s move into 3NF
3NF (Third Normal Form) Rules
- Rule 1- Be in 2NF
- Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we need to divide our table once again.
3NF Example
We have divided our tables again and created a new table that stores salutations. There are now no transitive functional dependencies, so our design is in 3NF. In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key of Table 3.
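A hedged SQL sketch of this 3NF design (types are assumptions; table names are illustrative):

```sql
CREATE TABLE Salutations (
    salutation_id INT PRIMARY KEY,
    salutation    VARCHAR(10)        -- e.g. 'Mr.', 'Ms.', 'Mrs.'
);

CREATE TABLE Members (
    membership_id INT PRIMARY KEY,
    full_name     VARCHAR(100),
    address       VARCHAR(200),
    salutation_id INT,
    FOREIGN KEY (salutation_id) REFERENCES Salutations (salutation_id)
);
```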
Our small example is now at a level that cannot be decomposed further into higher normal forms. In fact, it is already in the higher normal forms. Complex databases usually need separate efforts to move their data into the next levels of normalization; we discuss those levels briefly below.
BCNF (Boyce-Codd Normal Form)
Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one candidate key. BCNF, which addresses this, is sometimes referred to as 3.5 Normal Form.
4NF (Fourth Normal Form) Rules
If no table instance contains two or more independent, multivalued facts describing the relevant entity, then it is in 4th Normal Form.
5NF (Fifth Normal Form) Rules
A table is in the 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.
6NF (Sixth Normal Form) Proposed
6th Normal Form is not standardized yet; however, it has been discussed by database experts for some time. Hopefully, we will have a clear and standardized definition for 6th Normal Form in the near future.
That's all there is to SQL normalization! To summarize:
- Database design is critical to the successful implementation of a database management system that meets the data requirements of an enterprise system.
- Normalization in DBMS is a process that helps produce database systems that are cost-effective and have better security models.
- Functional dependencies are a very important component of the data normalization process.
- Most database systems are normalized up to the third normal form.
- A primary key uniquely identifies a record in a table and cannot be null.
- A foreign key helps connect tables and references a primary key.
Advantages of Normalization
- Normalization helps to minimize data redundancy.
- Greater overall database organization.
- Data consistency within the database.
- Much more flexible database design.
- Enforces the concept of relational integrity.
Disadvantages of Normalization
- You cannot start building the database before knowing what the user needs.
- The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
- It is very time-consuming and difficult to normalize relations of a higher degree.
- Careless decomposition may lead to a bad database design, leading to serious problems.
Normalization in DBMS
Normalization in DBMS is a technique for organizing the data in database tables so that:
- there is less repetition of data,
- a large set of data is structured into a bunch of smaller tables,
- and the tables have a proper relationship between them.
DBMS normalization is a systematic approach to decompose (break down) tables to eliminate data redundancy (repetition) and undesirable characteristics like insertion, update, and delete anomalies in DBMS. It is a multi-step process that puts data into tabular form, removes duplicate data, and sets up the relationships between tables.
Why Do We Need Normalization in DBMS?
Normalization is required for:
- Eliminating redundant (useless) data and thereby maintaining data integrity, because repeated data increases the chance of inconsistency. Normalization helps keep data consistent by storing it in one table and referencing it everywhere else.
- Storage optimization, although that is less of an issue these days because database storage is cheap.
- Breaking down large tables into smaller tables with relationships, which makes the database structure more scalable and adaptable.
- Ensuring data dependencies make sense, i.e., that data is logically stored.
This video will give you a good overview of Database Normalization. If you want you can skip the video, as the concept is covered in this tutorial as well - Normalization in DBMS (YouTube Video) .
Problems without Normalization in DBMS
If a table is not properly normalized and has data redundancy (repetition), it will not only eat up extra memory space but will also make it difficult to handle and update the data in the database without losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not normalized.
To understand these anomalies let us take an example of a Student table.
In the table above, we have data for four Computer Science students. As we can see, the data for the fields branch, hod (Head of Department), and office_tel is repeated for students who are in the same branch of the college; this is data redundancy.
1. Insertion Anomaly in DBMS
Suppose that for a new admission, the student's data cannot be inserted until the student opts for a branch; otherwise, we would have to set the branch information to NULL.
Also, if we have to insert data for 100 students of the same branch, then the branch information will be repeated for all those 100 students.
These scenarios are nothing but insertion anomalies.
If you have to repeat the same data in every row of data, it's better to keep the data separately and reference that data in each row.
So in the above table, we can keep the branch information separately, and just use the branch_id in the student table, where branch_id can be used to get the branch information.
2. Updation Anomaly in DBMS
What if Mr. X leaves the college? or Mr. X is no longer the HOD of the computer science department? In that case, all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency.
This is an Updation anomaly because you need to update all the records in your table just because one piece of information got changed.
3. Deletion Anomaly in DBMS
In our Student table, two different pieces of information are kept together, the Student information and the Branch information .
So if only a single student is enrolled in a branch, and that student leaves the college, or for some reason, the entry for the student is deleted, we will lose the branch information too.
In DBMS design, we should never keep two different entities together; in the example above, those entities are Student and Branch. The solution for all three anomalies described above is to keep the student information and the branch information in two different tables, and to use branch_id in the student table to reference the branch.
Primary Key and Non-key attributes
Before we move on to learn different Normal Forms in DBMS, let's first understand what is a primary key and what are non-key attributes.
As you can see in the table above, the student_id column is a primary key, because using the student_id value we can uniquely identify each row of data; the remaining columns are therefore the non-key attributes.
Types of DBMS Normal forms
Normalization rules are divided into the following normal forms:
- First Normal Form
- Second Normal Form
- Third Normal Form
- Boyce-Codd Normal Form
- Fourth Normal Form
- Fifth Normal Form
Let's cover all the Database Normal forms one by one with some basic examples to help you understand the DBMS normal forms.
1. First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow these 4 rules:
- It should only have single (atomic) valued attributes/columns.
- Values stored in a column should be of the same domain.
- All the columns in a table should have unique names.
- And the order in which data is stored should not matter.
Watch this YouTube video to understand First Normal Form (if you like videos) - DBMS First Normal Form 1NF with Example
Let's see an example.
If we have an Employee table in which we store the employee information along with the employee skill set, the table will look like this:
The above table has 4 columns:
All the columns have different names.
All the columns hold values of the same type like emp_name has all the names, emp_mobile has all the contact numbers, etc.
The order in which we save data doesn't matter
But the emp_skills column holds multiple comma-separated values, while as per the First Normal Form, each column should have a single value.
Hence the above table fails to pass the First Normal form.
So how do you fix the above table? There are two ways to do this:
- Remove the emp_skills column from the Employee table and keep it in some other table.
- Or add multiple rows for the employee, with each row linked to one skill.
1. Create Separate tables for Employee and Employee Skills
So the Employee table will look like this,
And the new Employee_Skill table:
2. Add Multiple rows for Multiple skills
You can also simply add multiple rows to add multiple skills. This will lead to repetition of the data, but that can be handled as you further Normalize your data using the Second Normal form and the Third Normal form.
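To make the first option concrete, here is a minimal SQL sketch (column types are assumptions): skills move to their own table, one skill per row, linked back by emp_id:

```sql
CREATE TABLE Employee (
    emp_id     INT PRIMARY KEY,
    emp_name   VARCHAR(100),
    emp_mobile VARCHAR(15)
);

CREATE TABLE Employee_Skill (
    emp_id    INT,
    emp_skill VARCHAR(50),   -- one skill per row: atomic values
    FOREIGN KEY (emp_id) REFERENCES Employee (emp_id)
);
```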
If you want to learn about the First Normal Form in detail, check out DBMS First Normal Form tutorial.
2. Second Normal Form (2NF)
For a table to be in the Second Normal Form:
- It should be in the First Normal Form.
- And, it should not have Partial Dependency.
Watch this YouTube video to understand Second Normal Form (if you like videos) - DBMS Second Normal Form 2NF with Example
Let's take an example to understand Partial dependency and the Second Normal Form.
What is Partial Dependency?
When a table has a primary key that is made up of two or more columns, all the columns not included in the primary key should depend on the entire primary key and not on a part of it. If any column that is not in the primary key depends on just a part of the primary key, then we say we have a partial dependency in the table.
Confused? Let's take an example.
If we have two tables Students and Subjects, to store student information and information related to subjects.
Student table:
Subject Table:
And we have another table Score to store the marks scored by students in any subject like this,
Now in the above table, the primary key is student_id + subject_id, because both pieces of information are required to select any row of data. But in the Score table we have a column teacher_name, which depends only on the subject information (just the subject_id), so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. And then the entire system will be Normalized as per the Second Normal Form.
Updated Subject table:
Updated Score table:
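Here is a rough SQL sketch of the two normalized tables (column names follow the article; types and the marks column are assumptions):

```sql
CREATE TABLE Subject (
    subject_id   INT PRIMARY KEY,
    subject_name VARCHAR(50),
    teacher_name VARCHAR(100)   -- depends only on the subject, so it lives here
);

CREATE TABLE Score (
    student_id INT,
    subject_id INT,
    marks      INT,
    PRIMARY KEY (student_id, subject_id),  -- composite key; no partial dependency remains
    FOREIGN KEY (subject_id) REFERENCES Subject (subject_id)
);
```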
To understand what is Partial Dependency and how you can normalize a table to 2nd normal form, jump to the DBMS Second Normal Form tutorial.
3. Third Normal Form (3NF)
A table is said to be in the Third Normal Form when:
- It satisfies the First Normal Form and the Second Normal Form.
- And, it doesn't have Transitive Dependency.
Watch this YouTube video to understand the Third Normal Form (if you like videos) - DBMS Third Normal Form 3NF with Example
What is Transitive Dependency?
In a table, some column acts as the primary key and the other columns depend on it. But what if a column that is not the primary key depends on another column that is also not the primary key (or part of it)? Then we have a transitive dependency in our table.
Let's take an example. We had the Score table in the Second Normal Form above. If we have to store some extra information in it, like,
total_marks
To store the type of exam and the total marks in the exam so that we can later calculate the percentage of marks scored by each student.
The Score table will look like this,
In the table above, the column exam_type depends on both student_id and subject_id, because:
a student can be in the CSE branch or the Mechanical branch,
and based on that they may have different exam types for different subjects.
The CSE students may have both Practical and Theory for Compiler Design,
whereas Mechanical branch students may only have Theory exams for Compiler Design.
But the column total_marks depends only on the exam_type column, and exam_type is not part of the primary key (the primary key is student_id + subject_id); hence we have a transitive dependency here.
How to Fix a Transitive Dependency?
You can create a separate table for ExamType and use it in the Score table.
New ExamType table,
We have created a new table ExamType and we have added more related information in it like duration (duration of exam in mins.), and now we can use the exam_type_id in the Score table.
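A sketch of that 3NF fix in SQL (types are assumptions; the Score table shown here is the revised version):

```sql
CREATE TABLE ExamType (
    exam_type_id INT PRIMARY KEY,
    exam_type    VARCHAR(20),   -- e.g. 'Theory', 'Practical'
    total_marks  INT,           -- depends only on the exam type, not on student/subject
    duration     INT            -- duration of the exam in minutes
);

-- Score, revised for 3NF: it now references the exam type instead of storing it
CREATE TABLE Score (
    student_id   INT,
    subject_id   INT,
    exam_type_id INT,
    marks        INT,
    PRIMARY KEY (student_id, subject_id, exam_type_id),
    FOREIGN KEY (exam_type_id) REFERENCES ExamType (exam_type_id)
);
```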
Here is the DBMS Third Normal Form tutorial. But we suggest you first study the second normal form and then head over to the third normal form.
4. Boyce-Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third Normal Form.
This form deals with a certain type of anomaly that is not handled by 3NF.
A 3NF table that does not have multiple overlapping candidate keys is said to be in BCNF.
For a table R to be in BCNF, the following conditions must be satisfied:
- R must be in the Third Normal Form.
- And, for each functional dependency (X → Y), X should be a super key.
You can also watch our YouTube video to learn about BCNF - DBMS BCNF with Example
To learn about BCNF in detail with a very easy-to-understand example, head to the Boyce-Codd Normal Form tutorial.
5. Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when:
- It is in the Boyce-Codd Normal Form.
- And, it doesn't have Multi-Valued Dependency.
You can also watch our YouTube video to learn about Fourth Normal Form - DBMS Fourth Normal Form 4NF with Example
Here is the Fourth Normal Form tutorial. But we suggest you understand other normal forms before you head over to the fourth normal form.
6. Fifth Normal Form (5NF)
The Fifth Normal Form is also called PJNF – Project-Join Normal Form.
It is the most advanced level of Database Normalization.
Using Fifth Normal Form you can fix Join dependency and reduce data redundancy.
It also helps in fixing Update anomalies in DBMS design.
We have an amazing video that showcases the Fifth Normal Form with the help of examples and explains when join dependency occurs and how you can fix it; check out the video on YouTube – Fifth Normal Form in DBMS.
Here are some frequently asked questions related to normalization in DBMS.
Q. Why do we need Normalization in DBMS?
Database Normalization helps you design and structure your table properly so that you have proper relationships between tables. It helps you with the following:
- Data integrity
- Data consistency
- Better relationships between tables
- More scalable design for tables
- No large tables; small tables with proper relationships
- Removing dependencies, like Partial Dependency, Transitive Dependency, Join Dependency, etc.
Q. What are the different Normal Forms in DBMS?
Following are the different Database Normal Forms:
- First Normal Form, also known as 1NF
- Second Normal Form, or 2NF
- Third Normal Form, or 3NF
- Boyce-Codd Normal Form, or BCNF
- Fourth Normal Form, or 4NF
- Fifth Normal Form, or 5NF or PJNF (Project-Join Normal Form)
Q. What is a Primary Key in DBMS?
A Primary key is a column that can be used to uniquely identify each row in a table. It can be a single column, or it can be multiple columns together. Yes, a primary key can have two columns or even more than two columns in it.
Q. What are non-key attributes in a table?
All the columns that are not a primary key, or not a part of the primary key, are called non-key columns in a table.
For example, if we have a table Students with columns student_id, student_name, and student_address, and student_id is the primary key in this table, then student_name and student_address will be the non-key attributes.
Q. What is the full form of BCNF?
BCNF stands for Boyce-Codd Normal Form. BCNF is a higher version of the Third Normal Form.
Q. Is BCNF and Third Normal Form the same?
No. BCNF is a higher version of the Third Normal Form. The purpose of the Third Normal Form or 3NF is to remove Transitive dependency whereas BCNF is more strict than 3NF, and it focuses on removing all non-trivial functional dependencies.
Q. What is PJNF?
PJNF stands for Project-Join Normal Form. This is a name given to the Fifth Normal Form because the Fifth Normal Form or 5NF is used to fix Join dependency in tables.
Q. Which Normal Form is called PJNF?
The Fifth Normal Form is also known as PJNF or Project-Join Normal Form. The fifth normal form fixes the Join dependency in tables hence it is called PJNF. This is an advanced Normal form that helps in reducing Data redundancy and Updation anomaly.
Chapter 12 Normalization
Adrienne Watt
Normalization should be part of the database design process. However, it is difficult to separate the normalization process from the ER modelling process so the two techniques should be used concurrently.
Use an entity relation diagram (ERD) to provide the big picture, or macro view, of an organization’s data requirements and operations. This is created through an iterative process that involves identifying relevant entities, their attributes and their relationships.
Normalization procedure focuses on characteristics of specific entities and represents the micro view of entities within the ERD.
What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the process of determining how much redundancy exists in a table. The goals of normalization are to:
- Be able to characterize the level of redundancy in a relational schema
- Provide mechanisms for transforming schemas in order to remove redundancy
Normalization theory draws heavily on the theory of functional dependencies. Normalization theory defines six normal forms (NF). Each normal form involves a set of dependency properties that a schema must satisfy and each normal form gives guarantees about the presence and/or absence of update anomalies. This means that higher normal forms have less redundancy, and as a result, fewer update problems.
Normal Forms
All the tables in any database can be in one of the normal forms we will discuss next. Ideally we only want minimal redundancy for PK to FK. Everything else should be derived from other tables. There are six normal forms, but we will only look at the first four, which are:
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
- Boyce-Codd normal form (BCNF)
BCNF is rarely used.
First Normal Form (1NF)
In the first normal form , only single values are permitted at the intersection of each row and column; hence, there are no repeating groups.
To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.
The PK of the new relation is a combination of the PK of the original relation plus an attribute from the newly created relation for unique identification.
Process for 1NF
We will use the Student_Grade_Report table below, from a School database, as our example to explain the process for 1NF.
- In the Student Grade Report table, the repeating group is the course information. A student can take many courses.
- Remove the repeating group. In this case, it’s the course information for each student.
- Identify the PK for your new table.
- The PK must uniquely identify the attribute value (StudentNo and CourseNo).
- After removing all the attributes related to the course and student, you are left with the student course table ( StudentCourse ).
- The Student table ( Student ) is now in first normal form with the repeating group removed.
- The two new tables are shown below.
How to update 1NF anomalies
StudentCourse ( StudentNo, CourseNo , CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
- To add a new course, we need a student.
- When course information needs to be updated, we may have inconsistencies.
- To delete a student, we might also delete critical information about a course.
Second Normal Form (2NF)
For the second normal form , the relation must first be in 1NF. The relation is automatically in 2NF if, and only if, the PK comprises a single attribute.
If the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK and not on a subset of the PK (i.e., there must be no partial dependency or augmentation).
Process for 2NF
To move to 2NF, a table must first be in 1NF.
- The Student table is already in 2NF because it has a single-column PK.
- When examining the Student Course table, we see that not all the attributes are fully dependent on the PK; specifically, all course information. The only attribute that is fully dependent is grade.
- Identify the new table that contains the course information.
- Identify the PK for the new table.
- The three new tables are shown below.
How to update 2NF anomalies
- When adding a new instructor, we need a course.
- Updating course information could lead to inconsistencies for instructor information.
- Deleting a course may also delete instructor information.
Third Normal Form (3NF)
To be in third normal form , the relation must be in second normal form. Also all transitive dependencies must be removed; a non-key attribute may not be functionally dependent on another non-key attribute.
Process for 3NF
- Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have a transitive relationship.
- Create new table(s) with removed dependency.
- Check new table(s) as well as table(s) modified to make sure that each table has a determinant and that no table contains inappropriate dependencies.
- See the four new tables below.
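As a hedged sketch, the four 3NF tables could be declared in SQL like this (attribute names follow the chapter's schemas; types are assumptions):

```sql
CREATE TABLE Student (
    StudentNo   INT PRIMARY KEY,
    StudentName VARCHAR(100),
    Major       VARCHAR(50)
);

CREATE TABLE Instructor (
    InstructorNo       INT PRIMARY KEY,
    InstructorName     VARCHAR(100),
    InstructorLocation VARCHAR(50)
);

CREATE TABLE Course (
    CourseNo     VARCHAR(10) PRIMARY KEY,
    CourseName   VARCHAR(100),
    InstructorNo INT,    -- instructor details now live only in Instructor
    FOREIGN KEY (InstructorNo) REFERENCES Instructor (InstructorNo)
);

CREATE TABLE StudentCourse (
    StudentNo INT,
    CourseNo  VARCHAR(10),
    Grade     CHAR(2),    -- the only attribute fully dependent on the composite PK
    PRIMARY KEY (StudentNo, CourseNo),
    FOREIGN KEY (StudentNo) REFERENCES Student (StudentNo),
    FOREIGN KEY (CourseNo)  REFERENCES Course (CourseNo)
);
```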
At this stage, there should be no anomalies in third normal form. Let’s look at the dependency diagram (Figure 12.1) for this example. The first step is to remove repeating groups, as discussed above.
Student (StudentNo, StudentName, Major)
StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
To recap the normalization process for the School database, review the dependencies shown in Figure 12.1.
The abbreviations used in Figure 12.1 are as follows:
- PD: partial dependency
- TD: transitive dependency
- FD: full dependency (Note: FD typically stands for functional dependency. Using FD as an abbreviation for full dependency is only used in Figure 12.1.)
Boyce-Codd Normal Form (BCNF)
When a table has more than one candidate key, anomalies may result even though the relation is in 3NF. Boyce-Codd normal form is a special case of 3NF. A relation is in BCNF if, and only if, every determinant is a candidate key.
BCNF Example 1
Consider the following table ( St_Maj_Adv ).
The semantic rules (business rules applied to the database) for this table are:
- Each Student may major in several subjects.
- For each Major, a given Student has only one Advisor.
- Each Major has several Advisors.
- Each Advisor advises only one Major.
- Each Advisor advises several Students in one Major.
The functional dependencies for this table are listed below. The first one is a candidate key; the second is not.
- Student_id, Major ——> Advisor
- Advisor ——> Major
Anomalies for this table include:
- Delete – student deletes advisor info
- Insert – a new advisor needs a student
- Update – inconsistencies
Note: No single attribute is a candidate key. The PK can be either (Student_id, Major) or (Student_id, Advisor).
To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
- St_Adv ( Student_id, Advisor )
- Adv_Maj ( Advisor , Major)
St_Adv table
Adv_Maj table
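A minimal SQL sketch of the two BCNF relations (types are assumptions); note that Advisor is now a determinant that is also a key:

```sql
CREATE TABLE Adv_Maj (
    Advisor VARCHAR(100) PRIMARY KEY,  -- Advisor -> Major no longer violates BCNF
    Major   VARCHAR(50)
);

CREATE TABLE St_Adv (
    Student_id INT,
    Advisor    VARCHAR(100),
    PRIMARY KEY (Student_id, Advisor),
    FOREIGN KEY (Advisor) REFERENCES Adv_Maj (Advisor)
);
```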
BCNF Example 2
Consider the following table ( Client_Interview) .
FD1 – ClientNo, InterviewDate –> InterviewTime, StaffNo, RoomNo (PK)
FD2 – StaffNo, InterviewDate, InterviewTime –> ClientNo (candidate key: CK)
FD3 – RoomNo, InterviewDate, InterviewTime –> StaffNo, ClientNo (CK)
FD4 – StaffNo, InterviewDate –> RoomNo
A relation is in BCNF if, and only if, every determinant is a candidate key. We need to create a table that incorporates the first three FDs ( Client_Interview2 table) and another table ( StaffRoom table) for the fourth FD.
Client_Interview2 table
StaffRoom table
Normalization and Database Design
During the normalization process of database design, make sure that proposed entities meet the required normal form before table structures are created. Many real-world databases have been improperly designed, or have become burdened with anomalies after being improperly modified over time. You may be asked to redesign and modify existing databases. This can be a large undertaking if the tables are not properly normalized.
Key Terms and Abbreviations
first normal form (1NF): only single values are permitted at the intersection of each row and column so there are no repeating groups
normalization : the process of determining how much redundancy exists in a table
second normal form (2NF) : the relation must be in 1NF and the PK comprises a single attribute
semantic rules : business rules applied to the database
Complete chapters 11 and 12 before doing these exercises.
- What is normalization?
- When is a table in 1NF?
- When is a table in 2NF?
- When is a table in 3NF?
- Using the dependency diagram you just drew, show the tables (in their third normal form) you would create to fix the problems you encountered. Draw the dependency diagram for the fixed table.
- This table is susceptible to update anomalies. Provide examples of insertion, deletion and update anomalies.
- Normalize this table to third normal form. State any assumptions.
- ____________________ produces a lower normal form.
- Any attribute whose value determines other values within a row is called a(n) ____________________.
- An attribute that cannot be further divided is said to display ____________________.
- ____________________ refers to the level of detail represented by the values stored in a table’s row.
- A relational table must not contain ____________________ groups.
Also see Appendix B: Sample ERD Exercises
Database Normalization – Normal Forms 1nf 2nf 3nf Table Examples
In relational databases, especially large ones, you need to arrange entries so that other maintainers and administrators can read them and work on them. This is why database normalization is important.
In simple words, database normalization entails organizing a database into several tables in order to reduce redundancy. You can design the database to follow any of the types of normalization such as 1NF, 2NF, and 3NF.
In this article, we’ll look at what database normalization is in detail and its purpose. We’ll also take a look at the types of normalization – 1NF, 2NF, 3NF – with examples.
What We'll Cover
- What is database normalization?
- What is the purpose of normalization?
- The First Normal Form – 1NF
- The Second Normal Form – 2NF
- The Third Normal Form – 3NF
- Examples of 1NF, 2NF, and 3NF
Database normalization is a database design principle for organizing data in an organized and consistent way.
It helps you avoid redundancy and maintain the integrity of the database. It also helps you eliminate undesirable characteristics associated with insertion, deletion, and updating.
The main purpose of database normalization is to avoid complexities, eliminate duplicates, and organize data in a consistent way. In normalization, the data is divided into several tables linked together with relationships.
Database administrators are able to achieve these relationships by using primary keys, foreign keys, and composite keys.
To get this done, a primary key in one table, for example employee_wages, is related to a value in another table, for instance employee_data.
N.B.: A primary key is a column that uniquely identifies the rows of data in that table. It's a unique identifier such as an employee ID, student ID, voter's identification number (VIN), and so on.
A foreign key is a field that relates to the primary key in another table.
A composite key is just like a primary key, but instead of a single column, it is made up of multiple columns.
What are 1NF, 2NF, and 3NF?
1NF, 2NF, and 3NF are the first three types of database normalization. They stand for first normal form , second normal form , and third normal form , respectively.
There are also 4NF (fourth normal form) and 5NF (fifth normal form). There’s even 6NF (sixth normal form), but the commonest normal form you’ll see out there is 3NF (third normal form).
All the types of database normalization are cumulative – meaning each one builds on top of those beneath it. So all the concepts in 1NF also carry over to 2NF, and so on.
For a table to be in the first normal form, it must meet the following criteria:
- a single cell must not hold more than one value (atomicity)
- there must be a primary key for identification
- no duplicated rows or columns
- each column must have only one value for each row in the table
The 1NF only eliminates repeating groups, not redundancy. That’s why there is 2NF.
A table is said to be in 2NF if it meets the following criteria:
- it’s already in 1NF
- has no partial dependency. That is, all non-key attributes are fully dependent on a primary key.
When a table is in 2NF, it eliminates repeating groups and redundancy, but it does not eliminate transitive partial dependency.
This means a non-prime attribute (an attribute that is not part of the candidate key) is dependent on another non-prime attribute. This is what the third normal form (3NF) eliminates.
So, for a table to be in 3NF, it must:
- already be in 2NF
- have no transitive partial dependency.
Database normalization is quite technical, but we will illustrate each of the normal forms with examples.
Imagine we're building a restaurant management application. That application needs to store data about the company's employees and it starts out by creating the following table of employees:
All the entries are atomic and there is a composite primary key (employee_id, job_code) so the table is in the first normal form (1NF) .
But if you only know someone's employee_id, you can already determine their name, home_state, and state_code (because they refer to the same person). This means name, home_state, and state_code depend on employee_id, which is only a part of the composite primary key. So the table is not in 2NF. We should separate those columns into a different table to make it 2NF.
Example of Second Normal Form (2NF)
Employee_roles table:
Employees table:
home_state is now dependent on state_code. So, if you know the state_code, then you can find the home_state value. To take this a step further, we should separate them again into a different table to make it 3NF.
Example of Third Normal Form (3NF)
States table:
Now our database is in 3NF.
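Putting the three normalized tables into SQL gives a sketch like this (column names follow the article; types are assumptions):

```sql
CREATE TABLE states (
    state_code CHAR(2) PRIMARY KEY,
    home_state VARCHAR(50)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name        VARCHAR(100),
    state_code  CHAR(2),
    FOREIGN KEY (state_code) REFERENCES states (state_code)
);

CREATE TABLE employee_roles (
    employee_id INT,
    job_code    VARCHAR(10),
    PRIMARY KEY (employee_id, job_code),   -- one row per employee/role pair
    FOREIGN KEY (employee_id) REFERENCES employees (employee_id)
);
```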
This article took you through what database normalization is, its purpose, and its types. We also looked at those types of normalization and the criteria a table must meet before it can be certified as being in any of them.
It is worth noting that most tables don’t exceed the 3NF limit, but you can also take them to 4NF and 5NF, depending on requirements and the size of the data at hand.
An Introduction to Database Normalization
Data has become the currency of marketing. And thanks to modern technology, marketers have more data-driven insights than ever when it comes to learning about audiences, buying behaviors, and the impacts of their campaigns. However, only data that is normalized can go beyond the database to provide those insights.
Database normalization adds context and structure to your data. Rather than having bits of information stashed here and there, data is arranged in a logical, usable format. Users can more quickly find information within the database without it feeling like a game of hide-and-seek. And when data can be more easily found, it can also be put to better use.
How can you transform database normalization from a problem into a top priority? Use this guide to get started.
Table of Contents:
- What Is Database Normalization?
- Why Is Normalized Data Important?
- Types of Data Normalization
- How to Normalize Data
Let’s get some clarity on database normalization: What exactly is it?
Unlike the Marie Kondo approach, where you only keep what brings you joy, this type of organization focuses on arranging data in a logical manner.
Normalizing data is the next logical step after creating a database. It is where you remove any potential anomaly, error, or redundancy, set up a rule to link certain data together, and test your rules to make sure they work.
The end results are simplicity and power. When you add structure and logic to your data, you can maintain a smaller database that’s accurate and easier to use. If that’s the case, you’re inherently able to do more with your data .
Normalized data has become increasingly important as marketing data proliferates. Marketers are collecting information from more sources than ever, including email, social media, search engines, and market research. If you're using multiple Software-as-a-Service tools, those apps may not always support each other.
By normalizing how all of your data is stored and accessed, you can expect multiple benefits:
- Eliminate data redundancy
- Focus only on necessary values
- Store related data together
- Resolve conflicting data
- Reduce the complexity of databases
- Analyze data faster and more accurately
Normalizing data is also an opportunity to address anomalies that might cloud your analysis. For instance, anomalies might appear over time due to data insertion, deletion, or alteration. Once you find these anomalies and discover how to fix them, your data will ultimately do more for you.
The key thing to remember is that the ultimate benefit isn’t data normalization itself. Rather, companies need to focus on the benefits that lie behind the action. Without going through this process, a lot of the valuable data you collect will never get used — at least not to its full potential. Normalizing data is a great way to get rid of unnecessary or inaccurate data that affects big picture insights.
Normalization has many faces, depending on the database. What are the data normalization types? Let’s look at some examples:
- First Normal Form: The first normal form (abbreviated as 1NF ) is the simplest and most basic form of normalization. This process focuses on removing redundant data and separating related data into their own tables.
- Second Normal Form: An extension of 1NF, the second normal form ( 2NF ) eliminates subgroups of data that are displayed in multiple rows on a table. New tables are created and connections between them are added.
- Third Normal Form: The third normal form ( 3NF ) satisfies all the rules from 2NF and then some. It eliminates columns that have no intra-table dependencies on the main key value.
- Fourth Normal Form and Fifth Normal Form: The most complex normal form is the fifth normal form ( 5NF ), with the fourth normal form ( 4NF ) following closely behind. These are rarely used, as the first three normal form types are most common.
- Boyce-Codd Normal Form: The Boyce-Codd normal form (abbreviated as BCNF ) is a happy medium between 3NF and 4NF (call it a 3.5NF). After meeting 3NF guidelines, BCNF can determine a superkey value if all other values are functionally dependent on it.
With each increase in the level of normalization, data receive a more technical makeover. For example, in 4NF, all dependencies between multiple values are deleted. However, technical and complex processes are not to be confused with “better” processes. A lower level of technicality can work just fine. In some cases, you might choose a combination of rules from different normalization types.
Now for the big question: How do you normalize data with all of the above information?
Normalization is a process, one that’s best broken down into phases. In a typical normalization project, you’ll need to do the following:
- Create tables for each value
- Connect values between tables
- Connect main keys with non-key values
Let’s do a real-world sample project together.
Sample Normalization Use Case
Let’s say that you want to create a relational database for your marketing agency. You offer multiple marketing services for a variety of industries. So far, your relational database only has tables for campaigns, employees assigned to the campaigns, and projects within the campaign.
But you’d like to learn more. That’s why, after your initial database design, you’d like to add more details to your database. This will usually require normalization, since adding new values that weren’t part of the existing schema may not be properly connected to the other values.
Phase One: Create Your Tables
The first step is to make sure your table schema supports all of your values. It helps to start by creating a map of all the relational data you need to collect.
In the above use case, you’d have one table for listing your marketing services, one for employees, and one for projects.
Now, let’s say you’d like to add a customer table to add an extra layer of visibility. This new table will contain key customer details, such as company names, contact person, industry, phone number, address, and ZIP code.
Your table should be broken down into the simplest information possible. For example, you would not want to combine your contact person and job title into the same table column. Assign only one value for each column and row. Here’s a basic example:
Customers Table
You also need to give the table a primary key (chosen from among the table's candidate keys). The primary key is how users can identify unique data that is not the same as anything else. In this use case, we could assign project IDs as the primary key, which requires a separate column.
Also, make sure there is no repeating information in the table. For example, if you have multiple columns for project ratings to account for multiple projects for the same client, you should move these into a new table. This reduces the complexity of the customers table without getting rid of important feedback. From there, you can link the project ratings table to the customers table and projects table (which comes into play in the next phase).
Doing this should satisfy all 1NF requirements.
Phase Two: Connect Values Between Tables
You could stop right here and consider your database normalized. But going the extra mile to finish 2NF doesn’t require much extra work. Plus, it can help you gain better control of your data.
In this phase, you need to make sure that all the columns in the table relate to the primary key. Those that don’t directly relate to the primary key should branch off into another table.
In our sample use case, we created a column for project IDs and made it our primary key. This unique identifier helps us see which projects belong to each client.
However, we wouldn’t be able to relate the contact person, job title, or phone number values to the primary key because this information might change. This adds some risk to your database accuracy: What if a new employee of that company becomes your new contact person? That might mean the phone number and job title values will also change. That’s a lot of extra fields to update when a point of contact changes.
So, since these values can’t relate to the primary key, we need to shift them to their own table. This new table will hold all the data related to the contact person. Then, we can connect the table to the customers table by using a contact person ID as the primary key. The new tables might look something like this:
Customers Table 2
ContactPerson Table
Now, when you get a new contact person for a client, you can simply update the "Contact Person ID" in the customers table. Looking at this ID will pull up all details for the current point of contact.
If the contact person changes, simply create a new record in the contact person table.
Rinse and repeat to ensure all columns in a table are assigned to a primary key and consider 2NF complete.
Phase Three: Connect Main Keys with Non-Key Values
To satisfy 3NF, we need to make sure that the columns only rely on the primary key and no other columns. Having functional dependency on more than one column in a table can impact data accuracy as values change over time.
Using our example customers table above, we see functional dependency at work in the city, state, and ZIP code columns. These not only relate to the customer’s primary key, but also to each other. For instance, if a company were to change locations, updating the city would most certainly affect the ZIP code.
Why doesn’t this apply to address or state, you might wonder? That’s because these values only affect each other some of the time. If a company moved down the street, it likely wouldn’t affect the city, state, or ZIP. Changing an address may or may not change the state. But changing cities or states will always affect the ZIP code.
We can align with 3NF by (you guessed it) making a separate table. This table should contain the city, state, and ZIP code(s) associated with each client. You can link it to the customers table by creating a foreign key. In this case, it’s the "ZIPCodeID".
Our new tables will look something like this:
Customers Table 3
ZIP Codes Table
When your customer changes their address, you can create a new record in the ZIP codes table to update the city, state, and ZIP, then update the address in the customers table.
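As a rough SQL sketch of the final split (table and column names paraphrase the example above and are assumptions):

```sql
CREATE TABLE zip_codes (
    zip_code_id INT PRIMARY KEY,
    city        VARCHAR(50),
    state       VARCHAR(20),
    zip         VARCHAR(10)
);

CREATE TABLE customers (
    customer_id  INT PRIMARY KEY,
    company_name VARCHAR(100),
    address      VARCHAR(200),
    zip_code_id  INT,   -- foreign key: city/state/ZIP live only in zip_codes
    FOREIGN KEY (zip_code_id) REFERENCES zip_codes (zip_code_id)
);

-- When a customer moves (illustrative IDs): add a zip_codes row, then repoint:
UPDATE customers SET address = '12 New Street', zip_code_id = 2 WHERE customer_id = 1;
```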
Turning your big data into a normalized database can be quite the undertaking, but it also helps you get the most from data science practices. If machine learning, data model automation, or data mining for deep learning is on your radar in the future, you need to standardize your dataset organization. Good database management will only help you.
To learn how Meltwater uses database normalization on your behalf, contact us today.
Introduction of Database Normalization
Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places).
Problems because of data redundancy: Data redundancy unnecessarily increases the size of the database as the same data is repeated in many places. Inconsistency problems also arise during insert, delete and update operations.
Functional Dependency: A functional dependency is a constraint between two sets of attributes in a relation of a database. A functional dependency is denoted by an arrow (→). If an attribute A functionally determines B, then it is written as A → B.
For example, employee_id → name means employee_id functionally determines the name of the employee. As another example in a timetable database, {student_id, time} → {lecture_room}, student ID and time determine the lecture room where the student should be.
Advantages of Functional Dependency
- The database’s data quality is maintained using it.
- It communicates the database design’s facts.
- It aids in precisely outlining the limitations and implications of databases.
- It is useful to recognize poor designs.
- Finding the potential keys in the relation is the first step in the normalization procedure; identifying potential keys and normalizing the database is impossible without functional dependencies.
What does functionally dependent mean?
A functional dependency A → B means that for every instance of a particular value of A, there is the same value of B. For example, in the table below, A → B holds, but B → A does not, because there are different values of A for B = 3.
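Since the original table did not survive extraction, here is a minimal table consistent with that claim:

| A | B |
|---|---|
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |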
Trivial Functional Dependency
X → Y is trivial only when Y is a subset of X. For example, {A, B} → A is trivial, because A is contained in {A, B}.
Non-Trivial Functional Dependency
X → Y is non-trivial when Y is not a subset of X. For example, A → B is non-trivial, because B is not contained in {A}.
Semi Non-Trivial Functional Dependency
X → Y is semi non-trivial when X ∩ Y is non-empty but Y is still not a subset of X. For example, {A, B} → {B, C} is semi non-trivial, because B appears on both sides while C lies outside {A, B}.
The features of database normalization are as follows:
Elimination of Data Redundancy: One of the main features of normalization is to eliminate the data redundancy that can occur in a database. Data redundancy refers to the repetition of data in different parts of the database. Normalization helps in reducing or eliminating this redundancy, which can improve the efficiency and consistency of the database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in the database is consistent and accurate. By eliminating redundancy, normalization helps in preventing inconsistencies and contradictions that can arise due to different versions of the same data.
Simplification of Data Management: Normalization simplifies the process of managing data in a database. By breaking down a complex data structure into simpler tables, normalization makes it easier to manage the data, update it, and retrieve it.
Improved Database Design: Normalization helps in improving the overall design of the database. By organizing the data in a structured and systematic way, normalization makes it easier to design and maintain the database. It also makes the database more flexible and adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update anomalies, which can occur when updating a single record in a table affects multiple records in other tables. Normalization ensures that each table contains only one type of data and that the relationships between the tables are clearly defined, which helps in avoiding such anomalies.
Standardization: Normalization helps in standardizing the data in the database. By organizing the data into tables and defining relationships between them, normalization helps in ensuring that the data is stored in a consistent and uniform manner.
Normalization is an important process in database design that helps in improving the efficiency, consistency, and accuracy of the database. It makes it easier to manage and maintain the data and ensures that the database is adaptable to changing business needs.
Database Normalization – in Easy to Understand English
Database normalization is a process used to organize a database into tables and columns. There are three main forms: first normal form, second normal form, and third normal form. The main idea is that each table should be about a specific topic, and only include columns that support that topic. As an example, take a spreadsheet containing salespeople and customers that serves several purposes:
- Identify salespeople in your organization
- List all customers your company calls upon to sell a product
- Identify which salespeople call on specific customers.
By limiting a table to one purpose you reduce the amount of duplicate data contained within your database. This eliminates some issues stemming from database modifications.
To achieve these objectives, we’ll use some established rules. This is called database normalization. As you apply these rules, new tables are formed. The progression from unruly to optimized passes through several normal forms: first, second, and third normal form.
As tables satisfy each successive database normalization form, they become less prone to database modification anomalies and more focused toward a sole purpose or topic. Before we move on, be sure you understand the definition of a database table.
Table of contents
- Reasons for Database Normalization
- Data Duplication and Modification Anomalies
- Database Normalization Definition
- First Normal Form (1NF) Database Normalization
- Second Normal Form (2NF) Database Normalization
- Third Normal Form (3NF) Database Normalization
- Conclusion – Can Database Normalization Get Out of Hand?
- Next Steps
There are three main reasons to normalize a database. The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries.
As we go through the various states of normalization we’ll discuss how each form addresses these issues, but to start, let’s look at some data which hasn’t been normalized and discuss some potential pitfalls.
I think once you understand the issues, you’ll better appreciate normalization. Consider the following table:

The first thing to notice is this table serves many purposes including:
- Identifying the organization’s salespeople
- Listing the sales offices and phone numbers
- Associating a salesperson with a sales office
- Showing each salesperson’s customers
As a DBA this raises a red flag. In general I like to see tables that have one purpose. Having the table serve many purposes introduces many challenges; namely, data duplication, data update issues, and increased effort to query data.

Notice that for each SalesPerson we have listed both the SalesOffice and OfficeNumber, so the office data is duplicated for every salesperson in the same office. Duplicated information presents two problems:
- It increases storage and decreases performance.
- It becomes more difficult to maintain data changes.
For example:
Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table, we need to update the entries for all the SalesPersons currently in Chicago. Our table is a small example, but you can see that if it were larger, this could potentially involve hundreds of updates.
These situations are modification anomalies. Database normalization fixes them. There are three modification anomalies that can occur:
Insert Anomaly

There are facts we cannot record until we know information for the entire row. In our example we cannot record a new sales office until we also know the sales person.
Why? Because in order to create the record, we need to provide a primary key. In our case this is the EmployeeID.
Update Anomaly

In this case we have the same information in several rows.
For instance if the office number changes, then there are multiple updates that need to be made. If we don’t update all rows, then inconsistencies appear.
Deletion Anomaly

Removal of a row causes removal of more than one set of facts. For instance, if John Hunt retires, then deleting that row causes us to lose information about the New York office.
Database Normalization Helps with Search and Sort Issues
The last reason we’ll consider is making it easier to search and sort your data. In the SalesStaff table, if you want to search for a specific customer such as Ford, you would have to write a query that checks each of the three customer columns in turn.
Clearly, if the customer were somehow in one column, our query would be simpler. Also, consider if you want to run a query and sort by customer.

Our current table makes this tough. You would have to use three separate UNION queries! You can eliminate or reduce these anomalies by separating the data into different tables. This puts the data into tables serving a single purpose.
The process to redesign the table is database normalization.
There are three common forms of database normalization: 1st, 2nd, and 3rd normal form. They are also abbreviated as 1NF, 2NF, and 3NF respectively.
There are several additional forms, such as BCNF, but I consider those advanced, and not too necessary to learn in the beginning.
The forms are progressive, meaning that to qualify for 3rd normal form a table must first satisfy the rules for 2nd normal form, and 2nd normal form must adhere to those for 1st normal form. Before we discuss the various forms and rules in detail, let’s summarize the various forms:
- First Normal Form – The information is stored in a relational table with each column containing atomic values. There are no repeating groups of columns.
- Second Normal Form – The table is in first normal form and all the columns depend on the table’s primary key.
- Third Normal Form – The table is in second normal form and none of its columns are transitively dependent on the primary key.
If the rules don’t make too much sense, don’t worry; we take a deep dive into them here!
Our Sample Data for Normalization
Before we go too much further let’s look at the sample table we’ll use to demonstrate database normalization.

Before we get into the definition see if you can find some potential problems with this table’s setup. At a glance, here is what worries me:
- There are several “topics” covered in one table: salespeople, offices, and customers.
- The Customers are repeated as columns.
- The customer addresses are within one text field.
Let’s learn more about the first normal form of database normalization to see what we can do to make our table better.
The first step to constructing the right SQL table is to ensure that the information is in its first normal form. When a table is in its first normal form, searching, filtering, and sorting information is easier.

The rules to satisfy the 1st normal form are:
- The data is stored in a database table, which holds information in rows and columns. One or more columns, called the primary key, uniquely identify each row.
- Each column contains atomic values, and there are no repeating groups of columns.
Tables cannot have sub-columns in the first normal form. That is, you cannot list multiple cities in one column and separate them with a semi-colon.
Atomic Values
When a value is atomic, we don’t further subdivide the value. For example, the value “Chicago” is atomic; whereas “Chicago; Los Angeles; New York” is not.
Related to this requirement is the concept that a table should not have repeating groups of columns such as Customer1Name, Customer2Name, and Customer3Name.
Read More: Primary Key vs Foreign Key – Data Modeling Tips >>
First Normal Form Case Study
Let’s take a look at our sample data. Notice there are two “topics” covered within the table: sales staff and related information, and customer.

Let’s use those two topics as our basis to design our initial tables. If we leave the table as-is and don’t apply database normalization rules, we end up with these issues:

Check out the example below. Here you see we removed the offending repeating column groups and replaced them with a new table to house one or more customers.

The repeated column groups in the Customer Table are now linked to the EmployeeID Foreign Key. As described in the Data Modeling lesson, a foreign key is a value that matches the primary key of another table.
In this case, the customer table holds the corresponding EmployeeID for the SalesStaffInformation row. Here is our data in the first normal form.

First Normal Form (1NF) Improvements
This design is superior to our original table in several ways:
- The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each salesperson is practically unlimited.
- The customer data in our original design was nearly impossible to sort. You could do it with UNION statements, but it would be cumbersome. Now, it is simple to sort customers.
- The same holds true for filtering on the customer table. It is much easier to filter on one customer-name column than three.
- The design eliminates the insert and deletion anomalies for Customer. You can remove all the customers for a SalesPerson without having to remove the entire SalesStaffInformation row.
Modification anomalies still exist in both tables, but these are fixed once we reorganize them into 2nd normal form.
Now it is time to look at the second normal form. I like to think the reason we place tables in 2nd normal form is to narrow them to a single purpose. Doing so brings clarity to the database design, makes it easier for us to describe and use a table, and tends to remove modification anomalies.
This stems from the primary key identifying the main topic at hand, such as identifying buildings, employees, or classes, and the columns, serving to add meaning through descriptive attributes.
An EmployeeID isn’t much on its own, but add a name, height, hair color and age, and now you’re starting to describe a real person.
So what is the definition of 2nd normal form?
Second Normal Form (2NF) Definition
A table is in 2nd Normal Form if:
- The table is in 1st normal form, and
- All the non-key columns are dependent on the table’s primary key.
We already know about the 1st normal form, but what about the second requirement? Let me try to explain.

The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that to find a particular value, such as what color is Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.
Key Question to Ask Yourself
Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself, “Does this column serve to describe what the primary key identifies?”

- If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
- If you answer “no,” then you should move the column to a different table.
When all the columns relate to the primary key, they naturally share a common purpose, such as describing an employee. That is why I say that when a table is in second normal form, it has a single purpose, such as storing employee information.
Issues with our Example Data Model
So far, we have taken our example to the first normal form, and it has several issues.
The first issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.
For a given customer, it doesn’t make sense that you should have to know both the CustomerID and EmployeeID to find the customer.

It stands to reason you should only need to know the CustomerID. Given this, the Customer table isn’t in 2nd normal form as there are columns that aren’t dependent on the full primary key. We should move them to another table.
The second issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID. Though the SalesOffice and OfficeNumber columns describe which office the SalesPerson is based out of, they don’t serve to describe the employee.
You can see these issues outlined in red below.

Fix the Model to Second Normal Form (2NF) Standards
Since the columns outlined in red aren’t completely dependent on the table’s primary key, it stands to reason they belong elsewhere. In both cases, move the columns to new tables.
In the case of SalesOffice and OfficeNumber, let’s create the SalesOffice table. I added a foreign key to SalesStaffInformation so we can still describe in which office a salesperson is based.
The changes to make Customer a second normal form table are a little trickier. Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to a new table, recognize that the issue is EmployeeID!
The three columns don’t depend on this part of the key. Really this table is trying to serve two purposes:
- To show which customers each employee calls to make sales.
- To find customers and their locations.
For the moment, remove EmployeeID from the table. Now the table’s purpose is clear: it is to find and describe each customer.
Move Sample Data to Second Normal Form
Now let’s create a table named SalesStaffCustomer to describe which customers a salesperson calls upon. This table has two columns CustomerID and EmployeeID. Together, they form a primary key. Separately, they are foreign keys to the Customer and SalesStaffInformation tables, respectively.
The data model is shown below in second normal form.

To better visualize this, here are the tables with data.

Second Normal Form Benefits
As you review the data in the tables, notice that moving the data to second normal form mostly removed the redundancy. Also, see if you can find any update, insert, or deletion anomalies. Those too are gone. You can now remove all the salespeople yet keep customer records. Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing salespeople.

The SalesStaffCustomer table is a strange one. It’s just all keys! Database architects call this an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
Each column is a foreign key. If you look at the data model, you’ll notice that there is a one-to-many relationship to this table from SalesStaffInformation and another from Customer. In effect the table allows you to bridge the two tables together.

For all practical purposes this is a workable database. Three out of the four tables are even in third normal form, but there is one table which still has a minor issue, preventing it from being so.
Once a table is in second normal form, the design guarantees that every column is dependent on the primary key, or as I like to say, the table serves a single purpose. But what about relationships among the columns? Could there be dependencies between columns that could cause an inconsistency?
A table having columns for both an employee’s age and birth date spells trouble: there lurks an opportunity for data inconsistency!
How are these addressed? By the third normal form.
A table is in third normal form if:
- The table is in 2nd normal form, and
- It contains only columns that are non-transitively dependent on the primary key.
Wow! That’s a mouthful. What does non-transitively dependent mean? Let’s break it down.
When something is transitive, a meaning or relationship that holds across the whole also holds in the middle. If it helps, think of the prefix trans as meaning “across”: when something is transitive, whatever applies from the beginning to the end also applies from the middle to the end.

Since ten is greater than five, and five is greater than three, you can infer that ten is greater than three.
In this case, the greater than comparison is transitive. In general, if A is greater than B, and B is greater than C, then it follows that A is greater than C.
If you’re having a hard time wrapping your head around “transitive,” I think for our purpose it is safe to think “through,” as we’ll be reviewing how one column in a table may be related to others through a second column.
An object has a dependence on another object when it relies upon it. In the case of databases, when we say that a column has a dependence on another column, we mean that the value can be derived from the other. For example, my age is dependent on my birthday. Dependence also plays an important role in the definition of the second normal form.
Transitive Dependence
Now let’s put the two words together to formulate a meaning for transitive dependence that we can understand and use for database columns.

I think it is simplest to think of transitive dependence to mean a column’s value relies upon another column through a second intermediate column.
Consider three columns: AuthorNationality, Author, and Book. Column values for AuthorNationality and Author rely on the Book; once the book is known, you can find out the Author or AuthorNationality. But also notice that the AuthorNationality relies upon Author. That is, once you know the Author, you can determine their nationality. In this sense then, the AuthorNationality relies upon Book, via Author. This is a transitive dependence.
This can be generalized as being three columns: A, B and PK. If the value of A relies on PK, and B relies on PK, and A also relies on B, then you can say that A relies on PK through B. That is, A is transitively dependent on PK.
Transitive Dependence Examples
Let’s look at some examples to understand further.
To be non-transitively dependent, then, means that all the columns are dependent on the primary key (a criterion for 2nd normal form) and on no other columns in the table.
Let’s review what we have done so far with our database. You’ll see that I’ve found one transitive dependency:

CustomerCity relies on CustomerPostalCode which relies on CustomerID
Generally speaking a postal code applies to one city. Although all the columns are dependent on the primary key, CustomerID, there is an opportunity for an update anomaly as you could update the CustomerPostalCode without making a corresponding update to the CustomerCity.
We’ve identified this issue in red.

Fix the Model to Third Normal Form (3NF) Standards
In order for our model to be in third normal form, we need to remove the transitive dependencies. As we stated, our dependency is: CustomerCity relies on CustomerPostalCode, which relies on CustomerID.
It is OK that CustomerPostalCode relies on CustomerID; however, we break 3NF by including CustomerCity in the table. To fix this we’ll create a new table, PostalCode, which includes PostalCode as the primary key and City as its sole column.
The CustomerPostalCode remains in the customer table. The CustomerPostalCode can then be designated a foreign key. In this way, through the relation, the city and postal code are still known for each customer. In addition, we’ve eliminated the update anomaly.
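Here is a minimal sqlite3 sketch of the fix (again with illustrative types): City lives only in the PostalCode table, and Customer carries just the postal code as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City depends on the postal code, not on CustomerID, so it moves out.
CREATE TABLE PostalCode (
    PostalCode TEXT PRIMARY KEY,
    City       TEXT
);

CREATE TABLE Customer (
    CustomerID         INTEGER PRIMARY KEY,
    CustomerName       TEXT,
    CustomerPostalCode TEXT REFERENCES PostalCode(PostalCode)
);
""")
```

Updating a postal code’s city is now a single-row change, so the update anomaly is gone.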

To better visualize this, here are the Customer and PostalCode tables with data.

Now each column in the customer table is dependent on the primary key. Also, the columns don’t rely on one another for values. Their only dependency is on the primary key.

The same holds true for the PostalCode table.
At this point our data model fulfills the requirements for the third normal form. For most practical purposes this is usually sufficient; however, there are cases where even further data model refinements can take place. If you are curious to know about these advanced normalization forms, I would encourage you to read about BCNF (Boyce-Codd Normal Form) and more!
Can database normalization be taken too far? You bet! There are times when it isn’t worth the time and effort to fully normalize a database. In our example you could argue for keeping the database in second normal form, on the grounds that the CustomerCity-to-CustomerPostalCode dependency isn’t a deal breaker.

I think you should normalize if you feel that introducing update or insert anomalies can severely impact the accuracy or performance of your database application. If not, then determine whether you can rely on the user to recognize and update the fields together.
There are times when you’ll intentionally denormalize data. If you need to present summarized or compiled data to a user, and that data is very time consuming or resource intensive to create, it may make sense to maintain this data separately.
One of My Experiences with Database Normalization
Several years ago I developed a large engineering change control system which, on the home page, showed each engineer the parts, issues, and tasks requiring their attention. It was a database-wide task list. The task list was rebuilt on-the-fly in real time using views. Performance was fine for a couple of years, but as the user base grew, more and more DB resources were being spent to rebuild the list each time the user visited the home page.
I finally had to redesign the DB. I replaced the view with a separate table that was initially populated with the view data and then maintained with code to avoid anomalies. We needed to create complicated application code to ensure it was always up-to-date.
For the user experience it was worth it. We traded off complexity in dealing with update anomalies for improved user experience.
If you’re looking to learn more about database design concepts, I would recommend the following articles:
- Easy To Understand Data Modeling Concepts
- Data Modeling Principles in Action
- What is the difference between a primary and unique key?
I think that it would be better to have all attributes included from the beginning of the case study. – Customer Postcode and City fields were added while introducing 1NF. – City Population field is added in 3NF. This can be a bit confusing for some.
Question: Which of the following is not true about normalization? a.) produces relations with a single theme b.) may create referential integrity constraints c.) reduces the number of functional dependencies in the schema d.) reduces anomalies e.) splits a poorly structured relation into two or more well-structured relations
c.) reduces the number of functional dependencies in the schema
I compare two tables and take the results into one table. But issue is, in the result table the entries are repeated. I think the issue is a one table which we have taken to compare is not a fixed table. It is automatically filled with a RFID tag reader. So, can I solve this repeating issue in my resultant table from normalization methods???
It sounds like you have a one-to-many join. That is something that happens when you normalize. Without knowing the table structures you’re comparing, it is difficult to know the exact issue. I would suggest looking at the join condition you’re using in the compare.
Also, when you say you take the result into one table, are you combining columns from the two tables’ rows into one row (the desired result), or are you trying to compare the tables and only keep the rows that don’t match?
I need to normalize these sets of attributes to 3NF
Owner #, Owner Name, Owner Phone, Pet # Pet Name, Walk #, Walk Date,Duration,Paid, Walk Notes
While multiple pets can be walked at the same time, it is recorded as a separate walk for each pet so, that details of duration, payment and notes can be recorded separately.
Normalise this table to the third normal form, clearly showing the stages of 0NF, 1NF, 2NF and 3NF.
Hi Bradley,
I would recommend joining the EssentialSQL learning group and asking your question there!
https://www.facebook.com/groups/EssentialSQLLearning/
does 3rd normal form resolve modification anomalies?
Mostly, but it is not guaranteed. That is why we sometimes need BCNF.
Thank You for this info! I am finding myself struggling with an online course :). I really want to learn this topic, but unfortunately my primary instructor had an emergency and the one who took over is not very helpful, so your resource is a life saver! So I have a question: if I start with Person (an employee or customer, via inheritance), and then employees belong to different departments, would that be another step down the inheritance, or what is the best way to tie it all together to get to third normal form?
This is really helpful to me, thank you very much. What should I focus on to become a new DBA? I am currently looking for a job.
There are some core skills you should know and master in order to become a DBA. Top among them is being comfortable backing up and restoring databases.
I have several resources you can review to help. Here are some DBA articles I’ve tagged.
Let me know if you have further questions.
This was hugely helpful. I’ve been reading blogs and articles about this topic, but struggling to understand why the normalization rules are actually important and how they affect the database long term so I can make my own educated decision — thanks for this!
I strongly disagree with you that BCNF is “too advanced”. It essentially states that if an attribute determines value for other attribute(s) it must be a candidate key. As such I find it easier to understand and explain to others than 2NF and 3NF. I usually put only very little attention to 3NF and spend the time on BCNF.
I also disagree with your first reason for normalization: duplicated data. It makes students believe that we should do a lot to save space (e.g. only storing 2 digits for the year, which gave us the Y2K problem). It is the anomalies that are the real sinners, and as long as we get rid of them (by normalizing) we also get rid of the duplicated data, so there is no need to fight that on its own.
Hi, The reason to avoid duplicated data isn’t to save space, it is to avoid update issues. Why write to disk more than you need to? As for BCNF, most readers enjoy the simplicity my articles bring to the topic.
For that, I’m less apt to tinker with the articles.
Hello Kris, thank you for your help in explaining normalization in a very SIMPLE manner. However, what would a normalized database design for a bank look like?
Hi – check out this article. It should help!
https://www.essentialsql.com/data-modeling-principles-action/
Please note that a slight denormalisation **CAN** be a freaking speedup for your system… Lookup tables tend to be logical-read hogs, even when they’re small. You’ll just need to look them up **a lot**!
So your reason number 1 for “Data Duplication and Modification Anomalies” is not a rule written in marble.
I agree that the rule shouldn’t be written in marble (good metaphor btw).
There are definitely times when I take a step back and don’t follow strict normalization. I’ve done this many times in the applications I’ve written over the years; however, I think in general normalization does help performance more than it hinders it.
Check out the end of my post on the third normal form . I talk about cases when normalization can get out of hand.
Database normalization is a fascinating topic, although somewhat disconnected from querying. A typical introductory database course teaches students what Relational Algebra and SQL are, and then switches gears and talks about normalization with no leverage of the skills developed in the first half of the course!
Speaking of design with Customer1, Customer2, Customer3 attributes: http://www.orafaq.com/usenet/comp.databases.theory/2001/05/17/0244.htm
Thanks for your comments and I’m glad you are finding my topics fascinating.
My target audience are people that are just starting to learn SQL. I’m trying to get across important principles using simple and easy to understand English. I’m trying to keep it non-technical, which is hard when you are dealing with a technical subject!
Regarding normalization, the reason I did it before exploring joins is that I thought it would be good to give my readers an appreciation of why their data is spread across so many tables and why they have to go through the hassle to learn how to stitch the data back together.
Hi Kris, I think you’re right to talk about NFs right from the beginning. The NFs define the main principles on which relational databases are built and should be treated “with respect” :) As others have pointed out, there are times when it is possible and necessary to ignore some of the NFs, usually when building search tables, but in general, they are the most important thing to know about when learning about RDBMS.
Question on normalization in SQL relational dbs.
Let’s say I have an application where a given person can be included as a customer, an employee, and/or a referrer.
In the customer table you have last_name, first_name, middle_name, dob, gender, and staff id as a pk identity, plus fk employee_id and fk referrer_id.
The employee table has name, dob, gender, and employee_id as a pk identity.
The referrer table has name and referrer_id as a pk identity.
It seems like it might be better to use a person table, with columns for employee_id and referrer_id that link to tables that don’t hold repetitive information like names or DOB, so that when you query the data you don’t get a weird name or somehow conflicting data. Duplication isn’t covered under normalization, but it seems like a bad idea to have duplicated data in a relational database. What is the rule/practice for removing duplicated data from the db?
Hi Chris, Your hunch is correct. It would be better to have a person table to house customers, employees, and referrers. In this case normalization doesn’t directly speak towards having duplicate data in different tables. I think you could consider the table normalized, but of course it isn’t the design we’re looking for. We can look to generalization and specialization concepts to guide our design. Specifically, the common attributes can be placed in a generalized entity, which in our case is a person. The specialized attributes are then placed in their own entities: customer, employee, and referrer.
This is how I would lay out the tables. person: (personID [pk], last_name, first_name, middle_name, dob, gender)
customer: (customerID [pk], last_purchase_date) • FK: customerID to person.personID
employee: (employeeID [pk], date_hired) • FK: employeeID to person.personID
referrer: (referrerID [pk], referral_date) • FK: referrerID to person.personID
Notes: [pk] = primary key.
Data Normalization Explained: Types, Examples, & Methods

Table of contents
- What Are Data Anomalies?
- Exploring Data Anomalies: A Focus on Databases, Data Analysis & Machine Learning
- Data Normalization: Understanding the Basics
- Who Needs Data Normalization?
- 4 Types of Data Normalization in Databases: 1NF, 2NF, 3NF, and Beyond 3NF
- 3 Examples of Data Normalization in Databases
- Data Normalization in Data Analysis & Machine Learning
- 3 Normalization Techniques & Formulas
- 3 Examples of Data Normalization in Data Analysis & Machine Learning
- How Estuary Can Help With Data Normalization
Data quality is the key to success for any organization. If you don't have good data, you're essentially flying blind. Before you know it, productivity takes a hit, systems fail, and costs start to soar. IBM uncovered that poor data quality carries an annual price tag of $3.1 trillion for the U.S. economy. To combat these issues and boost your data quality, one of the most effective strategies is data normalization.
Data normalization is a versatile process aimed at minimizing errors and inconsistencies in data that can significantly undermine the efficiency and accuracy of data systems. It reduces redundancy and standardizes data to promote integrity and consistency across various fields, from database management to data analysis and machine learning.
In this guide, we'll break down the complex concept of data normalization and explore its types and applications to help you handle your data more effectively. But first, let's begin by discussing data anomalies.
Data anomalies refer to inconsistencies or errors that occur when you deal with stored data. These anomalies can compromise the integrity of the data and cause inaccuracies that do not reflect the real-world scenario the data is meant to represent.
In databases, anomalies typically result from redundancy or poor table construction. In data analysis and machine learning, anomalies can arise from missing values, incorrect data types, or unrealistic values.
Regardless of the context, anomalies can significantly impact the consistency and integrity of data. They can cause inaccurate analyses, misleading results, and poor decision-making . Therefore, identifying and addressing data anomalies is a crucial step in any data-driven process.
Exploring Data Anomalies: A Focus on Databases, Data Analysis & Machine Learning
Data anomalies can originate from a range of sources and their impact can vary, often causing substantial complications if not addressed properly. Let’s talk about 2 broad categories where these anomalies are most prevalent and can cause major issues.
Anomalies In Databases

When it comes to databases, 3 primary types of data anomalies result from update, insertion, and deletion operations.
- Insertion anomalies: These occur when the addition of new data to the database is hindered because of the absence of other necessary data. This situation often arises in systems where specific dependencies between data elements exist.
- Update anomalies: This type of anomaly happens when modifications to the data end up causing inconsistencies. This usually occurs when the same piece of data is stored in multiple locations and changes aren't reflected uniformly across all instances (a small sketch after this list demonstrates the problem).
- Deletion anomalies: You encounter these anomalies when you unintentionally lose other valuable information while removing certain data. This typically happens when multiple pieces of information are stored together and the deletion of one affects the others.
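As a quick illustration of an update anomaly, here is a small Python sketch using the built-in sqlite3 module; the table and values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesStaff (SalesPerson TEXT, SalesOffice TEXT, OfficeNumber TEXT);
INSERT INTO SalesStaff VALUES
    ('Mary', 'Chicago', '312-555-1212'),
    ('John', 'Chicago', '312-555-1212');
""")

# The office number is stored twice. Updating only one copy leaves the two
# rows disagreeing with each other: a classic update anomaly.
conn.execute(
    "UPDATE SalesStaff SET OfficeNumber = '312-555-9999' WHERE SalesPerson = 'Mary'"
)
print(conn.execute("SELECT * FROM SalesStaff").fetchall())
```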
While the above anomalies are mainly related to the operations in databases and their design flaws, understand that anomalies are not limited to these aspects alone. They can very well be present in the data itself and can be a source of misleading analysis and interpretations. Let’s discuss these next.
Anomalies In Data Analysis & Machine Learning
In data analysis and machine learning, data anomalies can manifest as discrepancies in the values, types, or completeness of data which can significantly impact the outcome of analyses or predictive models. Let's examine some of the key anomalies that occur in this context:
- Missing values: These happen when data is not available for certain observations or variables.
- Incorrect data types: These anomalies occur when the data type of a variable does not match the expected data type. For example, a numeric variable might be recorded as a string.
- Unrealistic values: This type of anomaly arises when variables contain values that are not physically possible or realistic. For example, a variable representing human age might contain a value of 200.
Now, let's look at the basics of data normalization and its significance in handling data anomalies. This vital aspect of data management and analysis ensures a more efficient process by standardizing data, eliminating redundancies, and addressing undesirable input errors. By mitigating the impact of anomalies, data normalization plays an important role in achieving reliable results.
Data normalization is an important aspect of data management and analysis that plays a crucial role in both data storage and data analysis. It is a systematic approach to decompose data tables to eliminate redundant data and undesirable characteristics.
The primary goal of data normalization is to allow data to be added, deleted, and modified without causing inconsistencies. It ensures that each data item is stored in only one place, which reduces the overall disk space requirement and improves the consistency and reliability of the system.
In databases, normalization organizes fields and tables; in data analysis and machine learning, it is used to preprocess data before any analysis.
Data normalization has applications in a wide array of fields and professions. Its ability to streamline data storage, reduce data input error, and ensure consistency makes it an invaluable asset for anyone dealing with large datasets. Let’s discuss some of its use cases.
Data Normalization In Machine Learning
Data normalization is a standard preprocessing step in machine learning. ML engineers use it to standardize and scale their data which is very important to ensure that every feature has an equal impact on the prediction.
Data Normalization In Research
Researchers, particularly those in the field of science and engineering, often use data normalization in their work. Whether they're dealing with experimental data or large datasets, normalization helps to simplify their data , making it easier to analyze and interpret. They use it to eliminate potential distortions caused by differing scales or units and ensure that their findings are accurate and reliable.
Data Normalization In Business
In the business world, data normalization is often used in business intelligence and decision-making . Business analysts use normalization to prepare data for analysis , helping them to identify trends, make comparisons, and draw meaningful conclusions.
This helps in more informed business decisions and strategies to drive growth and success. Normalization also improves data consistency which results in better collaboration between different teams within the company.

Data normalization in databases is a multi-stage process that involves the application of a series of rules known as 'normal forms'. Each normal form represents a level of normalization and comes with its own set of conditions that a database should meet.
These normal forms give a set of rules that a database should adhere to for achieving a certain level of normalization. The process starts with the first normal form (1NF) and can go up to the fifth normal form (5NF), each level addressing a specific type of data redundancy or anomaly. Let’s take a look at each one of them.
The first normal form (1NF) is the initial stage of data normalization. A database is in 1NF if it contains atomic values. This means that each cell in the database holds a single value and each record is unique . This stage eliminates duplicate data and ensures that each entry in the database has a unique identifier, enhancing data consistency.
A database reaches the second normal form (2NF) if it is already in 1NF and all non-key attributes are fully functionally dependent on the primary key . In other words, there should be no partial dependencies in the database. This stage further reduces redundancy and ensures that each piece of data in the database is associated with the primary key which uniquely identifies each record.
The third normal form (3NF) is achieved if a database is in 2NF and there are no transitive dependencies. This means that no non-primary key attribute should depend on another non-primary key attribute . This stage ensures that each non-key attribute of a table is directly dependent on the primary key and not on other non-key attributes.
While most databases are considered normalized after reaching 3NF, there are further stages of normalization, including the fourth normal form (4NF) and fifth normal form (5NF). These stages deal with more complex types of data dependencies and are used when dealing with more complex datasets. In most cases, however, ensuring data integrity and efficiency at the 3NF level is already sufficient.
To better understand these concepts, let's look into some practical examples of data normalization in databases.
A. First Normal Form (1NF) Example
Consider an employee table where employees are associated with multiple departments. The table is not in 1NF because it contains non-atomic values, i.e., cells with more than one value.
Employee Table

To achieve 1NF, we must split the data in the 'Department' column into 2 records , one for each department, so that each record in the table contains a single value.
Employee Table in 1NF

B. Second Normal Form (2NF) Example
Consider a Products table where the 'product' and 'brand' attributes are not fully dependent on the composite primary key 'productID' and 'brandID'.
Products Table

To bring the table to 2NF, we split the table into 3, where each non-key attribute is fully functionally dependent on the primary key.
Products Category Table

Brand Table

Products Brand Table

In the 'Product-Brand Link' table, each row represents a product-brand pair from the original table. This table effectively links the 'Products Category' table and the 'Brand' table, ensuring that the relationship between products and brands is maintained . Here’s the schema:

C. Third Normal Form (3NF) Example
Consider an Employee table where 'Salary' is dependent on 'Salary Slip No', which is not a primary key. This is a transitive dependency that is not allowed in 3NF.

To bring the table to 3NF, we split it into 2 tables so that, in each table, no non-primary-key attribute depends on another non-primary-key attribute.
Employee Table in 3NF

Salary Table

Data Normalization In Data Analysis & Machine Learning

In data analysis and machine learning workflows, data normalization is a pre-processing step. It adjusts the scale of data and ensures that all variables in a dataset are on a similar scale. This uniformity is important as it prevents any single variable from overshadowing others .
For machine learning algorithms that rely on distance or gradient-based methods, normalized data is especially key. It helps these algorithms to function optimally and leads to the creation of models that are accurate, reliable, and unbiased. This ultimately enhances the quality of insights derived from the data.
3 Normalization Techniques & Formulas
Data analysis and machine learning use several techniques for normalizing data. Let’s discuss the 3 most commonly used methods.
Min-Max Normalization
This technique performs a linear transformation on the original data . Each value is replaced according to a formula that considers the minimum and maximum values of the data. The goal is to scale the data to a specific range, such as [0.0, 1.0]. The formula for min-max normalization is:

Z-Score Normalization
Also known as Zero mean normalization or standardization, this technique normalizes values based on the mean and standard deviation of the data . Each value is replaced by a score that indicates how many standard deviations it is from the mean. You can apply Z-score normalization using the following formula:

Decimal Scaling Normalization
This technique normalizes by moving the decimal point of the values of the data. Each value is divided by 10^j, where j is the order of magnitude of the largest absolute value, so the results typically fall in the range of -1 to 1. The formula for this simple normalization technique is:
x′ = x / 10^j, where j is the smallest integer such that max(|x′|) < 1.
3 Examples Of Data Normalization In Data Analysis & Machine Learning
Let’s apply the normalization techniques discussed above to real-world data. This can help us uncover the tangible effects they have on data transformation . We will use the Iris dataset which is a popular dataset in the field of machine learning. This dataset consists of 150 samples from 3 species of Iris flowers.
Here’s how you can import the data in Python:
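A minimal sketch using scikit-learn’s bundled copy of the dataset:

```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the four Iris measurements into a DataFrame, one column per feature.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(df.head())
```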
Here’s a sample of the dataset:

Min-Max Normalization Example
Min-Max normalization is a simple yet effective method to rescale features to a specific range , typically 0 to 1. Here is how you can perform Min-Max normalization using Python and Scikit-learn:
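One way to do it with MinMaxScaler, reloading the dataset so the snippet stands alone:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# x' = (x - min) / (max - min): every feature lands in [0, 1].
df_minmax = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
print(df_minmax.head())
```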
When we apply Min-Max normalization to the Iris dataset, we get:

Z-score Normalization Example
Z-score normalization, or standardization, centers the data with a mean of 0 and a standard deviation of 1 . Here's an example of how to perform Z-score normalization:
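A sketch using scikit-learn’s StandardScaler:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# z = (x - mean) / std: each feature ends up with mean 0 and std 1.
df_z = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
print(df_z.head())
```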
Z-score normalization of the Iris dataset gives:

Decimal Scaling Normalization Example
Decimal scaling normalization is particularly useful when the maximum absolute value of a feature is known . Here's a simple Python example of decimal scaling normalization:
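A sketch with NumPy, matching the description below: find the order of magnitude of the largest absolute value, then divide the whole dataset by that power of ten.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# j is the number of digits in the largest absolute value (7.9 -> j = 1),
# so dividing by 10**j brings every value into the (-1, 1) range.
j = int(np.ceil(np.log10(df.abs().max().max())))
df_decimal = df / (10 ** j)
print(df_decimal.head())
```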
The decimal scaling normalization code above first checks the order of the largest value in the dataset and then divides the entire dataset by it. Here’s the result:

The scales of the features in each of the 3 normalized datasets are much closer to each other than in the original dataset. This helps to ensure that all features contribute equally to the final result.

Estuary Flow is a real-time data pipeline platform designed to facilitate the seamless ingestion, integration, and transformation of data in real time. It provides a robust infrastructure that lets users build and manage data pipelines with ease, ensuring that data is always up-to-date and readily available for analysis.
Flow is built on a foundation of open-source technologies and offers a unique approach to data management that combines the best aspects of traditional databases and modern streaming systems. Flow's architecture is designed to handle both batch and real-time data , making it a versatile tool for a wide range of applications.
One of the key ingredients of Flow is its use of collections , which are essentially real-time data lakes of JSON documents stored in cloud storage. These collections can either be captured from an external system or derived as a transformation of one or more other collections. This provides a flexible and efficient way to manage and normalize data.
Here are some of the key features of Flow that can support data normalization:
- Default annotations: It uses default annotations to prevent null values from being materialized to your endpoint system, ensuring data consistency.
- Real-time transformations: Flow supports SQL and Typescript for data manipulation, including functions like AVG(), MIN(), MAX(), and STDDEV() that can be used for data normalization.
- Projections: It uses projections to translate between the documents of a collection and a table representation. This feature is particularly useful when dealing with systems that model flat tables of rows and columns.
- Logical partitions: It allows you to logically partition a collection, isolating the storage of documents by their differing values for partitioned fields. This can help improve the efficiency of data storage and retrieval.
- Real-time data processing: It processes data in real-time which ensures that your normalized data is always up-to-date. This is particularly useful for applications that require immediate insights from the data.
- Reductions: Flow can merge multiple documents with a common key into a single document using customizable reduction strategies.
- Schema management: It uses JSON Schema to define your data’s structure, representation, and constraints. This allows for robust data validation and ensures that your data is clean and valid before it's stored or processed.
- Flexible data ingestion: Flow allows for the ingestion of data from a wide array of sources, including databases, cloud storage, and message queues. This flexibility makes it easier to bring in data from various sources for normalization.
Data normalization is a critical process in data management and analysis that ensures the integrity and reliability of data. However, the process can be complex and time-consuming, especially when dealing with large datasets and various types of anomalies.
This is where Estuary Flow comes in. It facilitates seamless real-time data operations and ensures that your data is always up-to-date and ready for analysis. With features like schema management and support for data manipulation functions, Flow can streamline the data normalization process.
So, if you're looking for a platform to simplify your data normalization process, you can explore Estuary Flow for free by signing up here or reaching out to our team to discuss your specific needs.

Doctoral Conference on Computing, Electrical and Industrial Systems
DoCEIS 2016: Technological Innovation for Cyber-Physical Systems pp 261–269
Normalization Techniques for Multi-Criteria Decision Making: Analytical Hierarchy Process Case Study
- Nazanin Vafaei,
- Rita A. Ribeiro &
- Luis M. Camarinha-Matos
- Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT,volume 470)
Multi-Criteria Decision Making (MCDM) methods use normalization techniques to allow aggregation of criteria with numerical and comparable data. With the advent of Cyber Physical Systems, where big data is collected from heterogeneous sensors and other data sources, finding a suitable normalization technique is also a challenge to enable data fusion (integration). Therefore, data fusion and aggregation of criteria are similar processes of combining values either from criteria or from sensors to obtain a common score. In this study, our aim is to discuss metrics for assessing which are the most appropriate normalization techniques in decision problems, specifically for the Analytical Hierarchy Process (AHP) multi-criteria method. AHP uses a pairwise approach to evaluate the alternatives regarding a set of criteria and then fuses (aggregation) the evaluations to determine the final ratings (scores).
- Normalization
- Rank reversal
- Cyber-physical systems (CPSs)
1 Introduction
Everybody makes decisions in their daily lives, as for example: “Should I take an umbrella today”? “Where should I go for lunch”? To make decisions we need access to information (or data) and to reach a decision we need to combine the data to obtain a final score for each candidate decision alternative (e.g. combining food prices and service of restaurants to recommend). The aim of Multi-Criteria Decision Making (MCDM) methods is to rate and prioritize a set of alternatives that best satisfy a given set of criteria [ 1 ]. Criteria are a set of requirements or independent attributes that have to be satisfied by several alternatives. Each criterion may be measured in different units, for example, degrees, kilograms or meters; but they all have to be normalized to obtain dimensionless classifications, i.e. a common numeric range/scale, to allow aggregation into a final score. Hence, data normalization is an essential part of any decision making process because it transforms the input data into numerical and comparable data, allowing using MCDM methods to rate and rank alternatives [ 2 , 3 ].
In this work, the main research question that we address is: which normalization technique is most suitable for use with the AHP method?
The motivation for carrying out this work includes four interconnected issues: (a) the importance of data normalization for decision problems where we need to fuse or aggregate data to obtain a final score per alternative; (b) the small number of research studies available on this topic; (c) continuation of previous work on the suitability of normalization techniques for well-known MCDM methods (e.g., TOPSIS) [4]; (d) contributing to advances in Cyber-Physical Systems research [5], where huge amounts of data from heterogeneous sensors need to be fused (aggregated) to determine a combined view. In this study we focus on AHP because it is a well-known and widely used MCDM method [6-13], but we plan to perform the same study for data fusion problems as well as other MCDM methods in the future, to determine which technique is most suitable for any decision problem that requires combining (fusing) data.
The Analytic Hierarchy Process (AHP) was introduced by Saaty [6, 7] to solve unstructured problems in economics, social sciences, and management [8]. AHP has been used in a vast range of problems, from simple ones (e.g., selecting a school) to harder ones (e.g., budget allocation and energy problems) [8]. When applying the AHP method, the decision maker structures the decision problem and breaks it down into a hierarchical top-down process. Then, he/she performs pairwise matrix comparisons of criteria using a 1-9 scale (corresponding to semantic interpretations such as "A is much more important than B" regarding a criterion). After normalization, the priorities are determined using either eigenvectors or a simplified weighted-sum version (SAW) [9, 10].
AHP involves five main steps [13]:

Step 1: Decompose the problem into a hierarchical structure.

Step 2: Employ pairwise comparisons. A pairwise comparison is the process of comparing the relative importance, preference, or likelihood of two elements (objectives) with respect to another element (the goal). Pairwise comparisons are carried out to establish priorities: decision elements at each hierarchy level are compared pairwise and the reciprocal matrix is completed.

Step 3: Check logical consistency and, if the consistency measure exceeds 10 %, revise the pairwise judgments until it falls below 10 %. When applying AHP we may face inconsistent judgments in the input data, which can harm the decision process. For example, A1 may be preferred to A2 and A2 to A3, yet A3 may be preferred to A1. Saaty [7] therefore defined a measure of deviation from consistency, called the consistency index: \( CI = (\lambda_{max} - N)/(N - 1) \), where N is the dimension of the matrix A and \( \lambda_{max} \) is its largest eigenvalue. Saaty then calculated a consistency ratio (C.R.) as the ratio of the C.I. to a random index (R.I.), which is the average C.I. of sets of judgments (on the 1-9 scale) for randomly generated reciprocal matrices [7].

Step 4: Estimate the relative weights by combining the individual subjective judgments. We can use the eigenvalue method: the relative weights of the decision elements in a matrix satisfy \( A \cdot W = \lambda_{max} W \), where W is the vector of criterion weights [13].

Step 5: Determine the priority of the alternatives by aggregating the relative weights, i.e., combining the criterion priorities with the priorities of each decision alternative relative to each criterion.

Since in this work we discuss the suitability of normalization techniques for the AHP method, we focus on steps 4 and 5.
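To make steps 3 and 4 concrete, the following minimal sketch estimates the priority vector of a pairwise matrix with the eigenvalue method and computes Saaty's consistency index and ratio. It is written in Python/NumPy (not part of the paper); the matrix values are invented, and the RI table is Saaty's standard published one.

```python
import numpy as np

# Saaty's random index (RI) values for matrix dimensions 1..9
# (standard published table, included here for convenience).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights_and_consistency(A):
    """Estimate priority weights from a reciprocal pairwise matrix A
    (eigenvalue method, step 4) and compute Saaty's consistency index
    CI = (lambda_max - N) / (N - 1) and ratio CR = CI / RI (step 3)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))      # index of the principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # priorities normalized to sum to 1
    ci = (lam_max - n) / (n - 1)
    ri = RI[n]
    cr = ci / ri if ri > 0 else 0.0       # 1x1 and 2x2 matrices are consistent
    return w, ci, cr

# Illustrative 3x3 pairwise matrix on Saaty's 1-9 scale (values invented).
A = [[1.0, 3.0, 5.0],
     [1/3, 1.0, 2.0],
     [1/5, 1/2, 1.0]]
weights, ci, cr = ahp_weights_and_consistency(A)
print(weights, ci, cr)   # CR < 0.10 means acceptably consistent judgments
```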
In this work we propose an assessment approach for evaluating five common normalization techniques (see Table 1), using an illustrative example solved with the AHP method [1, 2]. Our assessment approach calculates the Pearson correlation on the global weights of the alternatives and the Spearman correlation on their ranks (both borrowed from [14]) and uses their mean values to ensure a more robust evaluation and selection of the best normalization technique for AHP. The novelty of this study is the adaptation of this assessment process to AHP in order to find the best normalization technique for the method. Section 2 discusses the relationship of this work to cyber-physical systems, and the subsequent sections present the experimental study performed.
2 Relationship to Cyber-Physical Systems
Cyber-physical systems (CPS) merge computation with physical processes and are often discussed together with embedded systems [15]. In most CPS, the physical inputs and outputs are designed as a network of interacting elements. This conceptual model is tied to the notions of robotics and sensor networks, whose usage keeps increasing [16]. CPS also inherit ideas from the areas of embedded and real-time systems. They have a broad scope of potential applications, such as reactive interventions (e.g., collision avoidance); precision operations (e.g., robotic surgery and nano-level manufacturing); operation in dangerous or inaccessible environments (e.g., search and rescue, firefighting, and deep-sea exploration); complex systems coordination (e.g., air traffic control, war fighting); efficiency (e.g., zero-net-energy buildings); and augmentation of human capabilities (e.g., healthcare monitoring and service delivery) [16], to name a few.
There are ongoing discussions on the relationship between Cyber-Physical Systems and the Internet of Things [17-19]. Camarinha-Matos and Afsarmanesh [5] mention that "there is a growing convergence between the two areas since CPSs are becoming more Internet-based". For example, in smart car parking, data from the parking space is transferred to car drivers with the help of CPS and IoT technologies. Data is collected from sensors installed in the parking lot and transferred to a data center, where it is processed with MCDM methods to determine a ranking of alternatives (the best parking spaces). The best parking spaces are then provided to the car drivers to support them in making more informed decisions. In the illustrative example section, we compare several normalization techniques for use with the AHP method to rank alternatives and support car drivers. The smart car parking example thus shows a strong relationship between the cyber-physical systems (CPS), Internet of Things (IoT), and multi-criteria decision making (MCDM) concepts.
3 Normalization
There are several definitions of data normalization, depending on the study domain. For example, in databases, data normalization is viewed as a process where data attributes within a data model are organized in tables to increase the cohesion and efficiency of managing data. In statistics and its applications, the most common definition is the process of adjusting values measured on different scales to a common scale, often prior to aggregating or averaging them [19]. Many other definitions exist, depending on the context or study domain (see for example [20]). Here we focus on normalization techniques for MCDM. In general, normalization in MCDM is a transformation process to obtain numerical and comparable input data on a common scale [4]. After collecting input data, we must do some pre-processing to ensure the comparability of criteria, thus making the data useful for decision modeling. Furthermore, in MCDM, normalization techniques usually map attributes (criteria) with different measurement units to a common scale in the interval [0, 1] [21, 22]. Several studies on the effects of normalization techniques on the ranking of alternatives in MCDM problems have shown that certain techniques are more suitable for specific decision methods than others [14, 23-28].
Chakraborty and Yeh [23] analyzed four normalization techniques (vector, linear max-min, linear max, and linear sum) in the simple additive weighting (SAW) MCDM method. Using a ranking consistency index (RCI), they calculated the average deviation for each normalization technique and concluded that the best normalization technique for SAW is vector normalization. The same authors later analyzed the effects of those normalizations in the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) by calculating the ranking consistency and weight sensitivity of each normalization, and showed that vector normalization is also the best technique for TOPSIS [24]. The authors [24] defined weight sensitivity as a way to analyze the sensitivity of the different normalization procedures under different problem settings: they first assumed equal attribute weights and then increased the weights to find the sensitivity of the alternatives (normalization techniques) [24].
Also, the result was further validated by Vafaei et al. [ 4 ], who used Pearson and Spearman correlation coefficients to also conclude that the best normalization technique for TOPSIS method is the vector normalization.
In this work, we selected five of the most promising normalization techniques [2, 14] (shown in Table 1) and analyzed their effect on the AHP method. In Table 1, each normalization method is divided into two formulas, one for benefit and another for cost criteria, to ensure that the final decision objective (rating) is logically correct: for a benefit criterion, high raw values correspond to high normalized values (maximization), while for a cost criterion, high raw values correspond to low normalized values (minimization).
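Table 1 itself is not reproduced here, so the sketch below implements the five techniques under the formulations usually found in this literature [2, 14]; the exact formulas in the paper's Table 1 may differ in detail, and the benefit/cost split follows the two-formula convention just described. The N1/N3/N5 labels match the names the text later uses; the data is invented.

```python
import numpy as np

def normalize(x, method, benefit=True):
    """Normalize one criterion column x to a comparable scale.
    Assumes the usual MCDM formulations; x must be strictly positive
    for the 'linear_sum' cost form and for 'logarithmic'."""
    x = np.asarray(x, dtype=float)
    if method == "linear_max":                    # N1 in the paper's naming
        r = x / x.max()
        return r if benefit else 1.0 - r
    if method == "linear_max_min":
        rng = x.max() - x.min()
        return (x - x.min()) / rng if benefit else (x.max() - x) / rng
    if method == "linear_sum":                    # N3, Saaty's usual AHP choice
        return x / x.sum() if benefit else (1.0 / x) / (1.0 / x).sum()
    if method == "vector":
        r = x / np.sqrt((x ** 2).sum())
        return r if benefit else 1.0 - r
    if method == "logarithmic":                   # N5; yields 0 for x == 1,
        return np.log(x) / np.log(x.prod())       # which breaks on pairwise data
    raise ValueError(f"unknown method: {method}")

col = np.array([1.0, 3.0, 5.0, 7.0])   # invented column of criterion values
print(normalize(col, "linear_max"))                 # benefit criterion
print(normalize(col, "linear_sum", benefit=False))  # cost criterion
```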
Summarizing, the aim of this study is to identify which normalization technique is best suited for the AHP method.
4 Experimental Study with a Smart Car Parking Example
Here, we discuss the suitability of five normalization techniques for AHP with an illustrative smart car parking example. The case consists of 3 criteria (C1, C2, C3), corresponding to time to park, distance, and size of the parking space, and 7 alternatives (A1, A2, …, A7), corresponding to candidate parking locations. The goal is to find the best place to park the car; C1 and C2 are cost criteria, where low values are better, and C3 is a benefit criterion, where high values are desirable. Following the AHP method, we defined one pairwise comparison matrix per criterion (example in Table 2) plus one pairwise comparison matrix between the criteria. To these four matrices we applied each of the five normalization techniques separately, to determine the ranking of alternatives and compare results. The pairwise decision matrix for the criterion "time to park", after steps 1, 2, and 3 of AHP, is shown in Table 2.
We started by testing the sum-based normalization (N3 in Table 1), the usual normalization technique for AHP [7], because it ensures that each column sums to one, as Saaty's method requires [7]. The other normalization techniques do not have this property, and the sum of the normalized values can be bigger than 1; hence, for comparison purposes, we re-normalized the other four techniques with N3. To illustrate the rating procedure, we show the calculation for the vector normalization of alternative A1; the final results for all alternatives are shown in Tables 3 and 4.
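A minimal sketch of this re-normalization step (again assuming the formulations above, with invented values): after any of the techniques is applied column-wise, a final pass with linear sum (N3) restores the columns-sum-to-one property that AHP's aggregation expects.

```python
import numpy as np

def renormalize_with_n3(M):
    """Divide each column by its sum so every column sums to 1 (N3)."""
    M = np.asarray(M, dtype=float)
    return M / M.sum(axis=0)

# Invented example: one column already normalized with linear max (N1).
col_n1 = np.array([[1.0], [0.6], [0.33], [0.2]])
print(renormalize_with_n3(col_n1).ravel())   # now sums to 1
```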
The global weights of the alternatives and the ranking results for the four tested normalization techniques are shown in Table 5. We discarded the logarithmic normalization technique from our results because it produced negative and infinite values (due to the characteristics of pairwise matrices), so it is not usable (appropriate) for the AHP method. As can be seen in Table 5, the four techniques agree on the ranking of alternatives A2, A3, A4, and A5, but there is no consensus for the other alternatives. Since it is not possible to distinguish the best normalization technique just by looking at the results, we used the evaluation approach proposed in [4] to make the assessment. Hence, we calculated the Pearson correlation and mean r_s values [4] on the global weights of the alternatives, and the Spearman correlation on their ranks, to assess the suitability of the four tested normalization techniques for the AHP method. Table 6 shows complete consensus between the Pearson and Spearman correlation results: the best normalization technique is clearly N1 (linear max), with the highest mean r_s value (P = 0.9606 & S = 0.9524), and the worst is N3 (linear sum), with the lowest mean r_s value (P = 0.9029 & S = 0.8413).
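The assessment itself can be sketched as follows, using SciPy's Pearson and Spearman correlations between the global weights produced by two techniques. The weights here are invented for illustration, not the paper's Table 5 data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented global weights for 7 alternatives under two techniques.
w_n1 = np.array([0.23, 0.19, 0.16, 0.13, 0.11, 0.10, 0.08])
w_n2 = np.array([0.21, 0.20, 0.15, 0.14, 0.11, 0.10, 0.09])

p, _ = pearsonr(w_n1, w_n2)     # agreement of the global weights
s, _ = spearmanr(w_n1, w_n2)    # agreement of the induced rankings
print(p, s)
# Averaging these coefficients over all pairs of techniques gives the
# mean r_s values used to elect the most suitable technique.
```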
From the example we can conclude that linear max (N1) is the best normalization technique for the AHP method and linear sum (N3) is the worst. It is interesting to note that the single normalization traditionally used in AHP (linear sum, N3) is the worst one in this comparison study. Although N1 was elected the most suitable normalization technique, it required a re-normalization with N3 because the sum of the normalized values has to be 1. Therefore, we may conclude that a combination of max normalization (N1) with linear sum (N3) seems the most appropriate for AHP.
5 Conclusion
Normalization is the first step of any decision-making process: it transforms data in different units into a common scale with comparable units. In this study we tested five common normalization techniques to assess their suitability for the AHP MCDM method. The tests showed that the logarithmic normalization technique (N5) is not usable in the AHP method because it can produce zero or infinite values in the normalized data, which the method cannot accept. Further, since AHP requires the columns of the pairwise matrices to sum to 1, the linear max, linear max-min, and vector normalization techniques had to be re-normalized with linear sum (N3) before being compared. To assess the suitability of the normalization techniques for AHP we used Pearson and Spearman correlations and mean r_s values; the results showed that the best normalization technique is N1 (linear max) combined with N3 (linear sum) to ensure the sums equal 1, while the worst is N3 alone.
In previous work we performed the same assessment study for TOPSIS, and in the future we plan to extend it to other well-known MCDM methods, with the aim of supporting decision makers by recommending the most suitable normalization techniques for use with each MCDM method.
References

Triantaphyllou, E.: Multi-criteria decision making methods. In: Multi-criteria Decision Making Methods: A Comparative Study, vol. 44, pp. 5–21. Springer, New York (2000)

Jahan, A., Edwards, K.L.: A state-of-the-art survey on the influence of normalization techniques in ranking: improving the materials selection process in engineering design. Mater. Des. 65, 335–342 (2015)

Nayak, S.C., Misra, B.B., Behera, H.S.: Impact of data normalization on stock index forecasting. Int. J. Comput. Inf. Syst. Ind. Manage. Appl. 6, 257–269 (2014)

Vafaei, N., Ribeiro, R.A., Camarinha-Matos, L.M.: Data normalization techniques in decision making: case study with TOPSIS method. Int. J. Inf. Decis. Sci. (2016, in press)

Camarinha-Matos, L.M., Afsarmanesh, H.: Collaborative systems for smart environments: trends and challenges. Collab. Syst. Smart Networked Environ. 434, 3–14 (2014)

Saaty, T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psychol. 15(3), 234–281 (1977)

Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)

Cheng, C.-H., Yang, K.-L., Hwang, C.-L.: Evaluating attack helicopters by AHP based on linguistic variable weight. Eur. J. Oper. Res. 116, 423–435 (1999)

Gaudenzi, B., Borghesi, A.: Managing risks in the supply chain using the AHP method. Int. J. Logist. Manage. 17(1), 114–136 (2006)

Zahedi, F.: The analytic hierarchy process – a survey of the method and its applications. Interfaces 16(4), 96–108 (1986)

Tuzkaya, G., Onut, S., Tuzkaya, U.R., Gulsun, B.: An analytic network process approach for locating undesirable facilities: an example from Istanbul, Turkey. J. Environ. Manage. 88, 970–983 (2008)

Saaty, T.L., Vargas, L.G.: Models, Methods, Concepts & Applications of the Analytic Hierarchy Process. Institute for Operations Research and the Management Sciences, Maryland (2006)

Tzeng, G.-H., Huang, J.-J.: Multiple Attribute Decision Making: Methods and Applications. Taylor & Francis Group, Boca Raton (2011)

Celen, A.: Comparative analysis of normalization procedures in TOPSIS method: with an application to Turkish deposit banking market. Informatica 25(2), 185–208 (2014)

Shi, J., Wan, J., Yan, H., Suo, H.: A survey of cyber-physical systems. In: 2011 International Conference on Wireless Communications and Signal Processing (WCSP), pp. 1–6 (2011)

Wiki1: Cyber-physical system. https://en.wikipedia.org/wiki/Cyber-physical_system. Accessed 02 November 2015

Camarinha-Matos, L.M., Tomic, S., Graça, P. (eds.): Technological Innovation for the Internet of Things. IFIP AICT, vol. 394. Springer, Heidelberg (2013)

Camarinha-Matos, L.M., Tomic, S., Graça, P. (eds.): DoCEIS 2013. IFIP AICT, vol. 394. Springer, Heidelberg (2013)

Jeschke, S.: Everything 4.0? Drivers and Challenges of Cyber Physical Systems (2013)

Wiki2: Normalization (statistics). https://en.wikipedia.org/wiki/Normalization_%28statistics%29. Accessed 15 October 2015

Wiki3: Normalization. https://en.wikipedia.org/wiki/Normalization. Accessed 15 October 2015

Pavlicic, D.M.: Normalization affects the results of MADM methods. Yugosl. J. Oper. Res. 11, 251–265 (2011)

Etzkorn, B.: Data normalization and standardization. http://www.benetzkorn.com/2011/11/data-normalization-and-standardization/. Accessed 28 April 2015

Chakraborty, S., Yeh, C.-H.: A simulation based comparative study of normalization procedures in multiattribute decision making. In: International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, pp. 102–109 (2007)

Chakraborty, S., Yeh, C.-H.: A simulation comparison of normalization procedures for TOPSIS. In: Computers & Industrial Engineering, pp. 1815–1820 (2009)

Chakraborty, S., Yeh, C.-H.: Rank similarity based MADM method selection. In: International Conference on Statistics in Science, Business and Engineering (ICSSBE 2012) (2012)

Milani, A.S., Shanian, A., Madoliat, R., Nemes, J.A.: The effect of normalization norms in multiple attribute decision making models: a case study in gear material selection. Struct. Multidiscip. Optim. 29(4), 312–318 (2004)

Wang, Y.-M., Luo, Y.: Integration of correlations with standard deviations for determining attribute weights in multiple attribute decision making. Math. Comput. Model. 51, 1–12 (2010)
Acknowledgements
This work was partially funded by the FCT Strategic Program UID/EEA/00066/203 of the Computational Intelligence Group of CTS/UNINOVA.
Author information

CTS/UNINOVA and Faculty of Sciences and Technology, NOVA University of Lisbon, 2829-516 Caparica, Portugal: Nazanin Vafaei, Rita A. Ribeiro & Luis M. Camarinha-Matos. Correspondence to Nazanin Vafaei.
© 2016 IFIP International Federation for Information Processing
Cite this paper: Vafaei, N., Ribeiro, R.A., Camarinha-Matos, L.M. (2016). Normalization Techniques for Multi-Criteria Decision Making: Analytical Hierarchy Process Case Study. In: Camarinha-Matos, L.M., Falcão, A.J., Vafaei, N., Najdi, S. (eds.) Technological Innovation for Cyber-Physical Systems. DoCEIS 2016. IFIP Advances in Information and Communication Technology, vol. 470. Springer, Cham. https://doi.org/10.1007/978-3-319-31165-4_26
Normalization
The goal of normalization is to transform features to be on a similar scale. This improves the performance and training stability of the model.
Normalization Techniques at a Glance
Four common normalization techniques may be useful:
- scaling to a range
- clipping
- log scaling
- z-score
The following charts show the effect of each normalization technique on the distribution of the raw feature (price). The charts are based on the data set from the 1985 Ward's Automotive Yearbook that is part of the UCI Machine Learning Repository (Automobile Data Set).
Figure 1. Summary of normalization techniques.
Scaling to a range
Recall from MLCC that scaling means converting floating-point feature values from their natural range (for example, 100 to 900) into a standard range, usually 0 to 1 (or sometimes -1 to +1). Use the following simple formula to scale to a range:
\[ x' = (x - x_{min}) / (x_{max} - x_{min}) \]
Scaling to a range is a good choice when both of the following conditions are met:
- You know the approximate upper and lower bounds on your data with few or no outliers.
- Your data is approximately uniformly distributed across that range.
A good example is age. Most age values fall between 0 and 90, and every part of the range has a substantial number of people.
In contrast, you would not use scaling on income, because only a few people have very high incomes. The upper bound of the linear scale for income would be very high, and most people would be squeezed into a small part of the scale.
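As a quick sketch of the formula above, in Python/NumPy with invented values (the helper name is illustrative):

```python
import numpy as np

def scale_to_range(x, new_min=0.0, new_max=1.0):
    """Linearly map x from [min(x), max(x)] into [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    x01 = (x - x.min()) / (x.max() - x.min())   # x' = (x - min) / (max - min)
    return new_min + x01 * (new_max - new_min)

ages = np.array([3.0, 25.0, 47.0, 62.0, 90.0])  # roughly uniform feature
print(scale_to_range(ages))                     # every value lands in [0, 1]
```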
Feature Clipping
If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain threshold to a fixed value. For example, you could clip all temperature values above 40 to be exactly 40.
You may apply feature clipping before or after other normalizations.
Figure 2. Comparing a raw distribution and its clipped version.
Another simple clipping strategy is to clip by z-score to ±Nσ (for example, limit to ±3σ), where σ is the standard deviation.
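A minimal sketch of both clipping strategies, with invented thresholds and data (the function names are illustrative):

```python
import numpy as np

def clip_fixed(x, lo=None, hi=None):
    """Cap values outside [lo, hi]; e.g. clip_fixed(temps, hi=40)."""
    return np.clip(np.asarray(x, dtype=float), lo, hi)

def clip_by_zscore(x, n_sigma=3.0):
    """Cap values more than n_sigma standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return np.clip(x, mu - n_sigma * sigma, mu + n_sigma * sigma)

temps = np.array([18.0, 22.0, 31.0, 44.0])
print(clip_fixed(temps, hi=40))   # -> [18. 22. 31. 40.]
```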
Log Scaling
Log scaling computes the log of your values to compress a wide range to a narrow range.
\[ x' = \log(x) \]
Log scaling is helpful when a handful of your values have many points, while most other values have few points. This data distribution is known as the power law distribution. Movie ratings are a good example. In the chart below, most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). Log scaling changes the distribution, helping to improve linear model performance.
Figure 3. Comparing a raw distribution to its log.
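A one-line sketch of log scaling (data invented; assumes every value is strictly positive):

```python
import numpy as np

def log_scale(x):
    """Compress a heavy-tailed (power-law) feature with x' = log(x)."""
    return np.log(np.asarray(x, dtype=float))

ratings_count = np.array([1.0, 4.0, 30.0, 900.0, 120000.0])
print(log_scale(ratings_count))   # a ~5-orders-of-magnitude range shrinks to ~0..11.7
```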
Z-Score

Z-score is a variation of scaling that represents the number of standard deviations away from the mean. You would use z-score to ensure your feature distributions have mean = 0 and std = 1. It’s useful when there are a few outliers, but not so extreme that you need clipping.
The formula for calculating the z-score of a point, x , is as follows:
\[ x' = (x - \mu) / \sigma \]
Figure 4. Comparing a raw distribution to its z-score distribution.
Notice that z-score squeezes raw values that have a range of ~40000 down into a range from roughly -1 to +4.
Suppose you're not sure whether the outliers truly are extreme. In this case, start with z-score unless you have feature values that you don't want the model to learn; for example, the values are the result of measurement error or a quirk.
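A minimal z-score sketch, again with invented values:

```python
import numpy as np

def z_score(x):
    """Standardize x to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

prices = np.array([6000.0, 10500.0, 14000.0, 45000.0])
print(z_score(prices))   # the outlier maps to a large positive z-value
```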
Case Study Analysis: Examples + How-to Guide & Writing Tips
A case study analysis is a typical assignment in business management courses. The task aims to show high school and college students how to analyze a current situation, determine what problems exist, and develop the best possible strategy to achieve the desired outcome.
Many students feel anxious about writing case analyses because being told to analyze a case study and provide a solution can seem like a daunting task, especially when working with real-life scenarios. However, you can rest assured that writing a case analysis paper is easier than you think. Just keep reading this article, and you will find case study examples for students and advice provided by Custom-writing experts!
- 👣 Main Steps
- 🕵 Preparing the Case
- 🔬 Analyzing the Case
- 📑 Format & Structure
- 🙅 Things to Avoid
- 🏁 Conclusion
- 🔗 References
👣 Writing a Case Study Analysis: Main Steps
Business management is built on case analysis. Every economic result shows that the methods and instruments employed were either well-timed and expedient, in the event of success, or not, in the case of failure. These two options indicate whether the strategy is efficient (and should be followed) or requires corrections (or a complete change). Such an approach to the case study will make your writing more professional and valuable for the reader. The following steps will direct your plan for writing a case study analysis.
Step 1: Preliminary work
- Make notes and highlight the numbers and ideas that could be quoted.
- Single out as many problems as you can, and briefly mark their underlying issues. Then make a note of those responsible. In the report, you will use two to five of the problems, so you will have a selection to choose from.
- Outline a possible solution to each of the problems you found. Course readings and outside research shall be used here. Highlight your best and worst solution for further reference.

Step 2: Drafting the Case
- Provide a general description of the situation and its history.
- Name all the problems you are going to discuss.
- Specify the theory used for the analysis.
- Present the assumptions that emerged during the analysis, if any.
- Describe the detected problems in more detail.
- Indicate their link to, and effect on, the general situation.
- Explain why the problems emerged and persist.
- List realistic and feasible solutions to the problems you outlined, in the order of importance.
- Specify your predicted results of such changes.
- Support your choice with reliable evidence (i.e., textbook readings, the experience of famous companies, and other external research).
- Define the strategies required to fulfill your proposed solution.
- Indicate the responsible people and the realistic terms for its implementation.
- Recommend the issues for further analysis and supervision.
Step 3: Finalizing the Case
Like any other piece of writing, a case analysis requires post-editing. Carefully read it through, looking for inconsistencies and gaps in meaning. Your purpose is to make it look complete, precise, and convincing.
🕵 Preparing a Case for Analysis
Your professor might give you various case study examples from which to choose, or they may just assign you a particular case study. To conduct a thorough data analysis, you must first read the case study. This might appear to be obvious. However, you’d be surprised at how many students don’t take adequate time to complete this part.
Read the case study very thoroughly, preferably several times. Highlight, underline, flag key information, and make notes to refer to later when you are writing your analysis report.
If you don’t have a complete knowledge of the case study your professor has assigned, you won’t conduct a proper analysis of it. Even if you make use of a business case study template or refer to a sample analysis, it won’t help if you aren’t intimately familiar with your case study.
You will also have to conduct research. When it comes to research, you will need to do the following:
- Gather hard, quantitative data (e.g. 67% of the staff participated in the meeting).
- Design research tools , such as questionnaires and surveys (this will aid in gathering data).
- Determine and suggest the best specific, workable solutions.
It would be best if you also learned how to analyze a case study. Once you have read through the case study, you need to determine the focus of your analysis. One good way to check that focus is the following:
Compare your chosen solutions to the solutions offered by the experts who analyzed the case study you were given or to online assignments for students who were dealing with a similar task. The experts’ solutions will probably be more advanced than yours simply because these people are more experienced. However, don’t let this discourage you; the whole point of doing this analysis is to learn. Use the opportunity to learn from others’ valuable experience, and your results will be better next time.
If you are still in doubt, the University of South Carolina offers a great guide on forming a case study analysis.
📑 Case Analysis Format & Structure
When you are learning how to write a case study analysis, it is important to get the format of your analysis right. Understanding the case study format is vital for both the professor and the student. The person planning and handing out such an assignment should ensure that the student doesn’t have to use any external sources .
In turn, students have to remember that a well-written case analysis provides all the data, making it unnecessary for the reader to go elsewhere for information.
Regardless of whether you use a case study template, you will need to follow a clear and concise format when writing your analysis report. There are several possible case study frameworks available. Still, a case study should contain eight sections laid out in the following format:

- Introduction:
  - Describe the purpose of the current case study;
  - Provide a summary of the company;
  - Briefly introduce the problems and issues found in the case study;
  - Discuss the theory you will be using in the analysis;
  - Present the key points of the study and any assumptions made during the analysis.
- Findings:
  - Present each problem you have singled out;
  - Justify your inclusion of each problem by providing supporting evidence from the case study and by discussing relevant theory and what you have learned from your course content;
  - Divide the section (and the following sections) into subsections, one for each of your selected problems.
- Discussion:
  - Present a summary of each problem you have identified;
  - Present plausible solutions for each of the problems, keeping in mind that each problem will likely have more than one possible solution;
  - Provide the pros and cons of each solution in a way that is practical.
- Conclusion. This is a summary of your findings and discussion.
- Recommendations:
  - Decide which solution best fits each of the issues you identified;
  - Explain why you chose this solution and how it will effectively solve the problem;
  - Be persuasive when you write this section so that you can drive your point home;
  - Be sure to bring together theory and what you have learned throughout your course to support your recommendations.
- Implementation:
  - Provide an explanation of what must be done, who should take action, and when the solution should be carried out;
  - Where relevant, provide an estimate of the cost of implementing the solution, including both the financial investment and the cost in terms of time.
- References. While you generally do not need to refer to many external sources when writing a case study analysis, you might use a few. When you do, you will need to properly reference these sources, which is most often done in one of the main citation styles, such as APA, MLA, or Harvard. There is plenty of help when citing references, and you can follow the APA guidelines, the MLA guidelines, or the Harvard guidelines.
- Appendices. This is the section you include after your case study analysis if you used any original data in the report. These data, presented as charts, graphs, and tables, are included here because presenting them in the main body of the analysis would be disruptive to the reader. The University of Southern California provides a great description of appendices and when to make use of them.
When you’ve finished your first draft, be sure to proofread it. Look not only for potential grammar and spelling errors but also for discrepancies or holes in your argument.
You should also know what you need to avoid when writing your analysis.
🙅 Things to Avoid in Case Analysis
Whenever you deal with a case study, remember that there are some pitfalls to avoid! Beware of the following mistakes:
- Excessive use of colloquial language . Even though it is a study of an actual case, it should sound formal.
- Lack of statistical data . Give all the important data, both in percentages and in numbers.
- Excessive details. State only the most significant facts, rather than drowning the reader in every fact you find.
- Inconsistency in the methods you have used . In a case study, theory plays a relatively small part, so you must develop a specific case study research methodology.
- Trivial means of research . It is critical that you design your own case study research method in whatever form best suits your analysis, such as questionnaires and surveys.
It is useful to see a few examples of case analysis papers. After all, a sample case study report can provide you with some context so you can see how to approach each aspect of your paper.
👀 Case Study Examples for Students
It might be easier to understand how a case study analysis works if you have an example to look at. Fortunately, examples of case studies are easy to come by. Take a look at this video for a sample case study analysis for the Coca-Cola Company.
If you want another example, then take a look at the one below!
Business Case Analysis: Example
CRM’s primary focus is customers and customer perception of the brand or the company. The focus may shift depending on customers’ needs. The main points that Center Parcs should consider are an increase in customer satisfaction and its market share. Both of these points will enhance customer perception of the product as a product of value. Increased customer satisfaction will indicate that the company provides quality services, and increased market share can reduce the number of switching (or leaving) customers, thus fostering customer loyalty.
Case Study Topics
- Equifax case study: the importance of cybersecurity measures .
- Study a case illustrating ethical issues of medical research.
- Examine the case describing the complications connected with nursing and residential care.
- Analyze the competitive strategy of Delta Airlines .
- Present a case study of an ethical dilemma showing the conflict between the spirit and the letter of the law.
- Explore the aspects of Starbucks’ marketing strategy in a case study.
- Research a case of community-based clinic organization and development.
- Customer service of United Airlines: a case study .
- Analyze a specific schizophrenia case and provide your recommendations.
- Provide a case study of a patient with hyperglycemia.
- Examine the growth strategy of United Healthcare.
- Present a case study demonstrating ethical issues in business .
- Study a case of the 5% shareholding rule application and its impact on the company.
- Case study of post-traumatic stress disorder .
- Analyze a case examining the issues of cross-cultural management .
- Write a case study exploring the ethical issues the finance manager of a long-term care facility can face and the possible reaction to them.
- Write a case study analyzing the aspects of a new president of a firm election.
- Discuss the specifics of supply chain management in the case of Tehindo company.
- Study a case of a life crisis in a family and the ways to cope with it.
- Case study of Tea Leaves and More: supply chain issues .
- Explore the case of ketogenic diet implementation among sportspeople.
- Analyze the case of Webster Jewelry shop and suggest some changes.
- Examine the unique aspects of Tea and More brand management .
- Adidas case study: an ethical dilemma .
- Research the challenges of Brazos Valley Food Bank and suggest possible solutions.
- Describe the case of dark web monitoring for business.
- Study a case of permissive parenting style .
- Case study of Starbucks employees .
- Analyze a case of workplace discrimination and suggest a strategy to avoid it.
- Examine a case of the consumer decision-making process and define the factors that influence it.
- Present a case study of Netflix illustrating the crucial role of management innovation for company development.
- Discuss a case describing a workplace ethical issue and propose ways to resolve it.
- Case study of the 2008 financial crisis: Graham’s value investing principles in the modern economic climate.
- Write a case study analyzing the harmful consequences of communication issues in a virtual team .
- Analyze a case that highlights the importance of a proper functional currency choice.
- Examine the case of Hitachi Power Systems management.
- Present a case study of medication research in a healthcare facility.
- Study the case of Fiji Water and the challenges the brand faces.
- Research a social problem case and suggest a solution.
- Analyze a case that reveals the connection between alcohol use and borderline personality disorder .
- Transglobal Airline case study: break-even analysis.
- Examine the case of Chiquita Brands International from the moral and business ethics points of view.
- Present a case study of applying for Social Security benefits.
- Study the case of a mass hacker attack on Microsoft clients and suggest possible ways to prevent future attacks.
- Case study of leadership effectiveness .
- Analyze a case presenting a clinical moral dilemma and propose ways to resolve it.
- Describe the case of Cowbell Brewing Company and discuss the strategy that made them successful.
- Write a case study of WeWork company and analyze the strengths and weaknesses of its strategy.
- Case study of medical ethical decision-making.
- Study the case of The Georges hotel and suggest ways to overcome its managerial issues.
🏁 Concluding Remarks
Writing a case study analysis can seem incredibly overwhelming, especially if you have never done it before. Just remember, you can do it provided you follow a plan, keep to the format described here, and study at least one case analysis example.
If you still need help analyzing a case study, your professor is always available to answer your questions and point you in the right direction. You can also get help with any aspect of the project from a custom writing company. Just tackle the research and hand over the writing, write a rough draft and have it checked by a professional, or completely hand the project off to an expert writer.
Regardless of the path you choose, you will turn in something of which you can be proud!
✏️ Case Study Analysis FAQ
Students (especially those who study business) often need to write a case study analysis. It is a kind of report that describes a business case. It includes multiple aspects, for example, the problems that exist, possible solutions, forecasts, etc.
There should be 3 main points covered in a case study analysis:
- The challenge(s) description,
- Possible solutions,
- Outcomes (real and/or foreseen).
Firstly, study some examples available online and in the library. Case study analysis should be a well-structured paper with all the integral components in place. Thus, you might want to use a template and/or an outline to start correctly.
A case study analysis is a popular task for business students. They typically hand it in as a paper with several integral components:
- Description of the problem
- Possible ways out
- Results and/or forecasts
Students sometimes present the outcomes of their research in an oral presentation.
- Case Study: Academia
- Windows of vulnerability: a case study analysis (IEEE)
- A (Very) Brief Refresher on the Case Study Method: SAGE
- The case study approach: Medical Research Methodology
- Strengths and Limitations of Case Studies: Stanford University
- A Sample APA Paper: Radford University
- How to Write a Case Study APA Style: Seattle PI
- The Case Analysis: GVSU
- How to Outline: Purdue OWL
- Incorporating Interview Data: UW-Madison Writing Center