Pages

Tuesday, 9 November 2021

Normalization

It is a technique of organizing the data into multiple related tables, to minimize DATA REDUNDANCY. Repetition of similar data at multiple places occupies extra space in the memory. 

It also causes other issues known as anomalies.

There are three types of anomalies that occur when the database is not normalized. They are 

1.Insertion Anomaly

2.Updation Anomaly

3.Deletion Anomaly

1.Insertion Anomaly

To insert redundant data for every new row is a data insertion problem or anomaly.

2.Deletion Anomaly

Loss of related dataset when some other dataset is deleted.

3.Updation Anomaly 

Because of data redundancy  same data in multiple places may not be updated which leads to inconsistent data.

**Normalization breaks the existing table into multiple tables.

Normalization can be achieved in multiple ways. 

1. 1st Normal Form (1NF)  -------atomic values

2. 2nd Normal Form (2NF) -------X Partial Dependency

3. 3rd Normal Form (3NF) ------- X Transitive Dependency

4. BCNF(Boyce-Codd Normal Form) or 3.5 Normal Form.

5. 4th Normal Form (4NF)

6. 5th Normal Form (5NF)



1. 1st Normal Form (1NF)   - 1 A

There are 4 basic rules that a table should follow to be in 1st Normal Form

1. Each column should contain atomic values

2. A column should contain values that are of the same type.

3. Each column to have a unique name

4. Order in which data is saved doesn't matter.

If all the above rules are satisfied then we say that the given table is in 1NF.

Let us consider STUDENT table

RNO     NAME       SUBJECT

101    SWETHA     OS,CN

102    RAVI           JAVA

103    KAVYA        C,CPP

The above table already satisfied 3 rules of 1NF out of 4. Out of 3 students two have opted for more than one subject. we stored the subject name in a single column. To keep the table in 1NF solution is, break the subject values into atomic values.

RNO     NAME       SUBJECT

101       SWETHA   OS

101       SWETHA   CN

102       RAVI        JAVA

103       KAVYA      C

103       KAVYA      CPP

Although few values are getting repeated values for subject column is atomic for each row. Hence the table is in 1NF.

2. 2nd Normal Form (2NF)  - 2 P

Note: No Partial Dependency     α(Prime) -> β (Non Prime)

For a table to be in second normal form. It should follow the following rules.

1. It should be in 1NF

2. It should not have any "Partial Dependencies" 
eg: B ->C
      A ->C

3. It should be only "Full Dependencies"
 eg: AB ->C

Dependency or Functional Dependency

If an non-prime attribute (B) is dependent on a composite key (A) but not on any subset of  the composite key, such dependency is known as Functional Dependency.

Composite Key: A key which has multiple attributes to uniquely identify rows in a table is called composite key.

Example: std_id(PK)    sub_id(PK)    marks    exams
therefore both std_id & sub_id (Primary Key 's) are called composite key

Partial Dependency

If a nonprime attribute is functionally dependent on part of a candidate key, then it is said to be Partial Dependency.

For Example 1: 
R(ABCDEF) 
FD: C->F
      E->A
      EC ->D
      A->B
(CE) Closer =ABCDEF=R is a candidate key
C,E are the Prime Attributes (P)(It is a part of Candidate Key) 
E,A,D,B,F are the Non Prime Attributes (NP)(It is not a part of Candidate Key) 
      EC->D  Partial Dependency    L (P)  ->  R (NP)
Therefore 2NF is
R(ABCDEF) 

R1(CF) 
R2(EA) 
R3(ECD) 

For Example 2: 
R(ABCD) 
FD: AB ->D
      B->C
(AB) Closer =ABCD=R is a candidate key
A,B are the Prime Attributes (P)(It is a part of Candidate Key) 
C,D are the Non Prime Attributes (NP)(It is not a part of Candidate Key) 
      B->C  Partial Dependency    L (P)  ->  R (NP)
Therefore 2NF is
R(ABCD) 

R1(ABD) 
R2(BC) 

Example on 2NF:
Consider a table 
STUDENT (S_ID(PK),SNAME,RNO,BRANCH,ADDRESS)
S_ID       SNAME      RNO              BRANCH         ADDRESS
1             KAVYA        7                   CSE                   AP
2             RAHUL        99                 IT                     HR
3             VIJAY          68                 CSE                  MH
4             GIRI            59                 CSE                  TN

In above table S_ID is Primary Key

Consider another table SUBJECT(SUB_ID(PK),SUBNAME)
SUB_ID    SUBNAME
    1           JAVA
    2           CPP
    3           PYTHON
    4           ORACLE

We have STUDENT table  and  SUBJECT table during exams  consider another table SCORE is created.

Table Name : 
SCORE(SCORE_ID,S_ID,SUB_ID,MARKS,FNAME)
SCORE_ID      S_ID   SUB_ID     MARKS     FNAME
        1          1          1              67             RK
        2          1          2              89             PK
        3          2          1              99             KK
        4          3          4              85             DK
        5          2          3              64             JK

In SCORE table can say that the Primary Key is score_id.

If we want to fetch marks of student with s_id=1 we get two values.

Because we don't know for which subject you are asking,  similarly, if we use sub_id we don’t know for which student.

But s_id+sub_id  together makes a meaningful Primary Key and we can fetch all information using it. It is a composite Primary Key

Hence, s_id+sub_id can uniquely identify any row data in SCORE table.

But, in SCORE table the column tname is only dependent on sub_id and not on s_id, which is a part of  Primary Key. This is known as PARTIAL DEPENDENCY.

To convert above table into 2NF remove tname from SCORE table and move it to subject table. Now the new SCORE table  is

Table Name : SCORE

SCORE_ID    S_ID   SUB_ID     MARKS
        1         1          1              67             
        2         1          2              89             
        3         2          1              99             
        4         3          4              85             
        5         2          3              64


Table Name : SUBJECT            

SUB_ID         SUBNAME           TNAME
    1                JAVA                  RK
    2                CPP                   PK
    3                PYTHON             JK
    4                ORACLE             DK

Now, SCORE table is in 2NF

3. 3rd Normal Form (3NF) - 3 T

Note: No Transitive Dependency  α(Non Prime) -> β (Non Prime)

For a table to be in 3NF

1. It should be in 2NF

2. It should not have Transitive Dependency i.e  No non-prime should determine non-prime

Transitive Dependency

In a table, if a non-prime attribute is dependent on another non-prime attribute and not on prime attribute, such type of dependency is known as Transitive Dependency.

For Example 1: 
R(ABCD) 
FD: AB ->C
      C->D
(AB) Closer =ABCD=R is a candidate key
A,B are the Prime Attributes (P)(It is a part of Candidate Key) 
C,D are the Non Prime Attributes (NP)(It is not a part of Candidate Key) 
      C (NP) -> D(NP)  Transitive Dependency    L (NP)  ->  R (NP)
Therefore 3NF is
R(ABCD) 

R1(ABC) 
R2(CD) 


Example 2:
Table Name : SCORE
SCOREID   SID   SUBID  MARKS   EXAMNAME    TOTMARKS
1                1          1       67          JAVA              100
2                1          2       89          CPP                75   
3                2          1       99          JAVA              100
4                3          4       85         ORACLE          200
5                2          3       64         PYTHON          150 

Here Primary Key is a composite key made up of  SID+SUBID

If we observe EXAMNAME, depends on SID+SUBID, but TOTMARKS is only dependent on EXAMNAME and not on SID+SUBID.

So, EXAMNAME is not part of Primary Key, which is a non-prime attribute, which is Transitive Dependency. So we need to remove it, to convert the table into 3NF.

Move EXAMNAME and TOTMARKS to a new table EXAM

now SCORE table look like this

Table Name : SCORE

SCOREID           SID     SUBID       MARKS        EXAMNAME 
        1                1          1              67                JAVA        
        2                1          2              89                CPP              
        3                2          1              99                JAVA        
        4                3          4              85                ORACLE  
        5                2          3              64                PYTHON  

Table Name : EXAM

EXAMNAME        TOTMARKS
        JAVA               100
        CPP                 75   
        JAVA               100
        ORACLE          200
        PYTHON          150 

Now, there is no Transitive Dependency in SCORE table, Hence it is in 3NF.

4. BCNF(Boyce-Codd Normal Form) or 3.5 Normal Form.

Note:       α(Super key) -> β (Prime/Non Prime)

For a table to be in BCNF

1. It should be in 3NF

2. For any dependency A-->B, A should be a super key.

It means for, A -> B

If A is non-prime and B is a prime attribute

Ex: COLLEGE
SID      SUB        PROF
101      JAVA       RAM
101       C++     VIJAY
102      JAVA      GURU
103       C#        RAGHU
104      JAVA      RAM

In this table SID+SUB is a primary key.

The above table is in 1NF,2NF and 3NF

If you observe the above table PROF can lead a SUB and SUB is a part of candidate key/primary key(SID+SUB). Hence it is a prime attribute. While is PROF is a non-prime attribute. Hence we have a dependency here, where SUB is dependent on PROF.

But, PROF is not a super key, so the table doesn’t satisfy BCNF.

To convert it into BCNF we have to break the table. We can divide COLLEGE table as STUDENT & PROFESSOR TABLE.

STUDENT
SID     PID
101     11
101     22
102     33
103     44  
104     11  

PROFESSOR
PID    PROF      SUB
11      RAM       JAVA
22      VIJAY     C++
33      GURU     JAVA
44      RAGHU  C#

Now the above college table is converted into BCNF.


No comments:

Post a Comment

Conflict Serializability

Find out conflict serializability for the given transactions T1 T2 T3 R(X)     ...