It is a technique of organizing the data into multiple related tables, to minimize DATA REDUNDANCY. Repetition of similar data at multiple places occupies extra space in the memory.
It also causes other issues known as anomalies.
There are three types of anomalies that occur when the database is not normalized. They are
1.Insertion Anomaly
2.Updation Anomaly
3.Deletion Anomaly
1.Insertion Anomaly
To insert redundant data for every new row is a data insertion problem or anomaly.
2.Deletion Anomaly
Loss of related dataset when some other dataset is deleted.
3.Updation Anomaly
Because of data redundancy same data in multiple places may not be updated which leads to inconsistent data.
**Normalization breaks the existing table into multiple tables.
Normalization can be achieved in multiple ways.
1. 1st Normal Form (1NF) -------atomic values
2. 2nd Normal Form (2NF) -------X Partial Dependency
3. 3rd Normal Form (3NF) ------- X Transitive Dependency
4. BCNF(Boyce-Codd Normal Form) or 3.5 Normal Form.
5. 4th Normal Form (4NF)
6. 5th Normal Form (5NF)
1. 1st Normal Form (1NF) - 1 A
There are 4 basic rules that a table should follow to be in 1st Normal Form
1. Each column should contain atomic values
2. A column should contain values that are of the same type.
3. Each column to have a unique name
4. Order in which data is saved doesn't matter.
If all the above rules are satisfied then we say that the given table is in 1NF.
Let us consider STUDENT table
RNO NAME SUBJECT
101 SWETHA OS,CN
102 RAVI JAVA
103 KAVYA C,CPP
The above table already satisfied 3 rules of 1NF out of 4. Out of 3 students two have opted for more than one subject. we stored the subject name in a single column. To keep the table in 1NF solution is, break the subject values into atomic values.
RNO NAME SUBJECT
101 SWETHA OS
101 SWETHA CN
102 RAVI JAVA
103 KAVYA C
103 KAVYA CPP
Although few values are getting repeated values for subject column is atomic for each row. Hence the table is in 1NF.
2. 2nd Normal Form (2NF) - 2 P
Note: No Partial Dependency α(Prime) -> β (Non Prime)
For a table to be in second normal form. It should follow the following rules.
1. It should be in 1NF
eg: B ->C
A ->C
eg: AB ->C
Dependency or Functional Dependency
If an non-prime attribute (B) is dependent on a composite key (A) but not on any subset of the composite key, such dependency is known as Functional Dependency.
Composite Key: A key which has multiple attributes to uniquely identify rows in a table is called composite key.
therefore both std_id & sub_id (Primary Key 's) are called composite key
Partial Dependency
If a nonprime attribute is functionally dependent on part of a candidate key, then it is said to be Partial Dependency.
R(ABCDEF)
R2(EA)
R3(ECD)
R1(ABD)
R2(BC)
1 KAVYA 7 CSE AP
2 RAHUL 99 IT HR
3 VIJAY 68 CSE MH
4 GIRI 59 CSE TN
In above table S_ID is Primary Key
We have STUDENT table and SUBJECT table during exams consider another table SCORE is created.
SCORE(SCORE_ID,S_ID,SUB_ID,MARKS,FNAME)
1 1 1 67 RK
2 1 2 89 PK
3 2 1 99 KK
4 3 4 85 DK
5 2 3 64 JK
In SCORE table can say that the Primary Key is score_id.
If we want to fetch marks of student with s_id=1 we get two values.
Because we don't know for which subject you are asking, similarly, if we use sub_id we don’t know for which student.
But s_id+sub_id together makes a meaningful Primary Key and we can fetch all information using it. It is a composite Primary Key
Hence, s_id+sub_id can uniquely identify any row data in SCORE table.
But, in SCORE table the column tname is only dependent on sub_id and not on s_id, which is a part of Primary Key. This is known as PARTIAL DEPENDENCY.
To convert above table into 2NF remove tname from SCORE table and move it to subject table. Now the new SCORE table is
Table Name : SCORE
1 1 1 67
2 1 2 89
3 2 1 99
4 3 4 85
5 2 3 64
Table Name : SUBJECT
1 JAVA RK
2 CPP PK
3 PYTHON JK
4 ORACLE DK
Now, SCORE table is in 2NF
3. 3rd Normal Form (3NF) - 3 T
Note: No Transitive Dependency α(Non Prime) -> β (Non Prime)
For a table to be in 3NF
1. It should be in 2NF
2. It should not have Transitive Dependency i.e No non-prime should determine non-prime
Transitive Dependency
In a table, if a non-prime attribute is dependent on another non-prime attribute and not on prime attribute, such type of dependency is known as Transitive Dependency.
R(ABCD)
R1(ABC)
R2(CD)
1 1 1 67 JAVA 100
2 1 2 89 CPP 75
3 2 1 99 JAVA 100
4 3 4 85 ORACLE 200
5 2 3 64 PYTHON 150
Here Primary Key is a composite key made up of SID+SUBID
If we observe EXAMNAME, depends on SID+SUBID, but TOTMARKS is only dependent on EXAMNAME and not on SID+SUBID.
So, EXAMNAME is not part of Primary Key, which is a non-prime attribute, which is Transitive Dependency. So we need to remove it, to convert the table into 3NF.
Move EXAMNAME and TOTMARKS to a new table EXAM
now SCORE table look like this
Table Name : SCORE
1 1 1 67 JAVA
2 1 2 89 CPP
3 2 1 99 JAVA
4 3 4 85 ORACLE
5 2 3 64 PYTHON
Table Name : EXAM
JAVA 100
CPP 75
JAVA 100
ORACLE 200
PYTHON 150
Now, there is no Transitive Dependency in SCORE table, Hence it is in 3NF.
4. BCNF(Boyce-Codd Normal Form) or 3.5 Normal Form.
Note: α(Super key) -> β (Prime/Non Prime)
For a table to be in BCNF
1. It should be in 3NF
2. For any dependency A-->B, A should be a super key.
It means for, A -> B
If A is non-prime and B is a prime attribute
SID SUB PROF
101 JAVA RAM
101 C++ VIJAY
102 JAVA GURU
103 C# RAGHU
104 JAVA RAM
In this table SID+SUB is a primary key.
The above table is in 1NF,2NF and 3NF
If you observe the above table PROF can lead a SUB and SUB is a part of candidate key/primary key(SID+SUB). Hence it is a prime attribute. While is PROF is a non-prime attribute. Hence we have a dependency here, where SUB is dependent on PROF.
But, PROF is not a super key, so the table doesn’t satisfy BCNF.
To convert it into BCNF we have to break the table. We can divide COLLEGE table as STUDENT & PROFESSOR TABLE.
SID PID
101 11
101 22
102 33
103 44
104 11
PID PROF SUB
11 RAM JAVA
22 VIJAY C++
33 GURU JAVA
44 RAGHU C#
Now the above college table is converted into BCNF.
No comments:
Post a Comment