Normalization
* What is Normalization?
* Why is its done?
* The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
* Transforming normal forms
* E-R modeling and normalization
* Denormalization
Database Tables and Normalization
Normalization is a process for assigning attributes to entities to:
* Reduce data redundancies
o Help eliminate data anomalies
* Produce controlled redundancies to link tables
* No information is lost in normalization
* Result will be a database that can produce the same information as the original
Normalization Process
Normalizatin works through a series of stages called normal forms:
* First Normal form (1NF)
* Second normal form (2NF)
* Third normal form (3NF)
* etc (4th and 5th)
* 2NF is better than 1NF; 3NF is better than 2NF
* For most business database design purposes, 3NF is highest we need to go in the normalization process
* Highest level of normalization is not always most desirable
The Need for Normalization
Example: company that manages building projects
* Charges its clients by billing hours spent on each contract
* Hourly billing rate is dependent on employee�s position
* Periodically, a report is generated that contains information as follows
Sample Report
Table derived form Above
The Need for Normalization
* Structure of data set in Figure 5.1 does not handle data very well
* The table structure appears to work; report is generated with ease
* Unfortunately, the report may yield different results, depending on what data anomaly has occurred
Issues
* Table entries invite data inconsistencies
* Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If employee deleted, other vital data lost: if emp 103 leaves lose info on Elect Engineers
Repeating Group
Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence
* Relational table must not contain repeating groups
* Normalizing the table structure will reduce these data redundancies
* Normalization is three-step procedure
Converting to First Normal Form
A table in a relational database must be in 1NF.
* Repeating groups must be eliminated
* Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM
Dependencies
* Dependencies can be depicted with the help of a diagram
* Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird�s-eye view of all relationships among a table�s attributes
o Use makes it much less likely that an important dependency will be overlooked
* Desirable dependencies based on entire primary key
* Less desirable dependencies
Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute
Dependency Diagram
Fig 5.3: Dependency Diagram
1NF: Definition
* Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
* All relational tables must satisfy 1NF requirements
* Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with caution
* Still subject to data redundancies
Second Normal Form
1. Identify all key components
* Write each key component on separate line
* Write original key on last line
* Write dependent attributes after each key.
2. Each line will become a new table
Second Normal Form Conversion Results
Second Normal Form Defined
Table is in second normal form (2NF) if:
* It is in 1NF and
* It includes no partial dependencies:
* No attribute is dependent on only a portion of the primary key
Converting to Third Normal Form
* Resolve transitive dependencies (attributes dependent on non-key attributes)
* Create separate table for each transitive dependency
3NF Results
Boyce-Codd Normal Form
* Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not chosen to be primary key
* If a table contains only one candidate key, the 3NF and the BCNF are equivalent
* BCNF can be violated only if the table contains more than one candidate key
BCNF (cont)
* Most designers consider the Boyce-Codd normal form (BCNF) as a special case of 3NF
* A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A table can be in 3NF and not be in BCNF
o A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute
o A nonkey attribute is the determinant of a key attribute
Table in 3nf but not BCNF
Decomposition to BCNF
Decomposition to BCNF
Fourth Normal Form
* Table is in 3NF
* Has no multiple sets of multivalued dependencies
Conversion to 4NF
* 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of each other
o No row contains two or more multivalued facts about an entity
Improving the Design
* Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
* Normalization cannot, by itself, be relied on to make good designs
* It is valuable because its use helps eliminate data redundancies
Improving the Design (cont)
The following changes were made:
* PK assignment
* Naming conventions
* Attribute atomicity
* Adding attributes
* Adding relationships
* Refining PKs
* Maintaining historical accuracy
* Dealing with derived attributes
Completed Database
Completed Database
Completed Database: Assign Table
5.6b: Assign Table
Completed Database: Employee
Final ERD for contracting company
Final ERD
Limitations on System Assigned Keys
* System-assigned primary key may not prevent confusing entries
* Data entries in Table 5.2 are inappropriate because they duplicate existing records
* Yet there has been no violation of either entity integrity or referential integrity
* Perhaps Job Description needs to be unique
Normalization and Database Design
* Normalization should be part of design process
* Make sure that proposed entities meet required normal form before table structures are created
* Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time
* You may be asked to redesign and modify existing databases
Normalization and Database Design (cont)
* E-R Diagram provides macro view, determines entities
* Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
* Difficult to separate Normalization and ER diagramming
* Extra check: No attribute that is not a (primary/foreign) key should be repeated in the database (except to record historical data)
Denormalization
* Normalization is one of many database design goals
Normalized table requirements
o Additional processing
o Loss of system speed
* Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
* Do not be too quick to denormalize
Unnormalized Table Defects
* Data updates less efficient
* Indexing more cumbersome
* No simple strategies for creating views
Overnormalization
* This is done for performance reason frequently in distributed and clustered database systems
* Splits tables beyond the pont required for normal forms.
* In horizontal partitioning rows of a single logical table are split among several physical tables (e.g. geographically)
* In vertical partitioning a table is split vertically with commonly accessed columns in separate physical tables
Subscribe to:
Post Comments (Atom)


No comments:
Post a Comment