* The basic commands and functions of SQL
* How to use SQL for data administration (to create tables, indexes, and views)
* How to use SQL for data manipulation (to add, modify, delete, and retrieve data)
* How to use SQL to query a database to extract useful information
Introduction to SQL Part I
* The relational DBMS is the standard for database management.
* The Structured Query Language, SQL, is the standard for working with them
* This chapter is an introduction to essential SQL.
SQL strengths
Covers both
*
Data definition
*
Data manipulation
SQL is relatively easy to learn.
ANSI prescribes a standard SQL.
SQL Weaknesses
*
Some eccentric notation
o
use of ' marks; in strings, rather than "
o
Wildcards: % instead of *
*
Some things are hard to do
*
Different Vendors implement different dialects
*
Not a good conceptual match to most programming language
*
Strictly DDL and DML no standard procedural language.
DB2 concepts
* DB2 consists of multiple "instances" on each server (we have one)
* Within each instance there are databases: we have two, SAMPLE and DBMS, and will be using DBMS
* Within the databases are schemas one for each user.
* Authorization for users is via the system (cs1 account). Your schema name and user name are the same as your username on CS1
Setup for demonstrations
1. Use the winsql program to connect to the datasource DBMS using your CS1 username and password
2. Perform the following commands to create the tables:
drop table vendor; drop table product; drop table customer;
create table vendor like CH06_SALESCO.vendor;
insert into vendor select * from CH06_SALESCO.vendor;
alter table vendor add primary key (v_code);
create table product like CH06_SALESCO.product;
insert into product select * from CH06_SALESCO.product;
alter table product add primary key (p_code)
add foreign key (v_code) references vendor on delete set null on update restrict;
Data Definition Commands
The Database Model
Simple Database -- PRODUCT and VENDOR tables
Each product is supplied by only a single vendor.
A vendor may supply many products.
Data Definition Commands
The Tables and Their Components
*
The VENDOR table contains vendors who are not referenced in the PRODUCT table. PRODUCT is optional to VENDOR.
*
Existing V_CODE values in the PRODUCT table must have a match in the VENDOR table.
*
A few products are supplied factory-direct, a few are made in-house, and a few may have been bought in a special warehouse sale. That is, a product is not necessarily supplied by a vendor. VENDOR is optional to PRODUCT.
Common SQL Datatypes
Data Type
SQL
Numeric
NUMBER(L,D)
DECIMAL(L,D)
INTEGER
SMALLINT
Character
CHAR (L)
VARCHAR (L)
Date
DATE
Data Definition Commands
Creating the Database Structure
This varies among databases. In DB2 there are instances
within each instance there are databases
within each database there are schemas (one for each user). DB2 users are the same as Operating System users.
Statements in DB2 referencing a table include schema (SELECT * FROM schema.tablename) (current schema are implicit).
Data Definition Commands
Creating Table Structures
CREATE TABLE
Thursday, November 6, 2008
Normalization
* What is Normalization?
* Why is its done?
* The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
* Transforming normal forms
* E-R modeling and normalization
* Denormalization
Database Tables and Normalization
Normalization is a process for assigning attributes to entities to:
* Reduce data redundancies
o Help eliminate data anomalies
* Produce controlled redundancies to link tables
* No information is lost in normalization
* Result will be a database that can produce the same information as the original
Normalization Process
Normalizatin works through a series of stages called normal forms:
* First Normal form (1NF)
* Second normal form (2NF)
* Third normal form (3NF)
* etc (4th and 5th)
* 2NF is better than 1NF; 3NF is better than 2NF
* For most business database design purposes, 3NF is highest we need to go in the normalization process
* Highest level of normalization is not always most desirable
The Need for Normalization
Example: company that manages building projects
* Charges its clients by billing hours spent on each contract
* Hourly billing rate is dependent on employee�s position
* Periodically, a report is generated that contains information as follows
Sample Report
Table derived form Above
The Need for Normalization
* Structure of data set in Figure 5.1 does not handle data very well
* The table structure appears to work; report is generated with ease
* Unfortunately, the report may yield different results, depending on what data anomaly has occurred
Issues
* Table entries invite data inconsistencies
* Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If employee deleted, other vital data lost: if emp 103 leaves lose info on Elect Engineers
Repeating Group
Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence
* Relational table must not contain repeating groups
* Normalizing the table structure will reduce these data redundancies
* Normalization is three-step procedure
Converting to First Normal Form
A table in a relational database must be in 1NF.
* Repeating groups must be eliminated
* Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM
Dependencies
* Dependencies can be depicted with the help of a diagram
* Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird�s-eye view of all relationships among a table�s attributes
o Use makes it much less likely that an important dependency will be overlooked
* Desirable dependencies based on entire primary key
* Less desirable dependencies
Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute
Dependency Diagram
Fig 5.3: Dependency Diagram
1NF: Definition
* Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
* All relational tables must satisfy 1NF requirements
* Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with caution
* Still subject to data redundancies
Second Normal Form
1. Identify all key components
* Write each key component on separate line
* Write original key on last line
* Write dependent attributes after each key.
2. Each line will become a new table
Second Normal Form Conversion Results
Second Normal Form Defined
Table is in second normal form (2NF) if:
* It is in 1NF and
* It includes no partial dependencies:
* No attribute is dependent on only a portion of the primary key
Converting to Third Normal Form
* Resolve transitive dependencies (attributes dependent on non-key attributes)
* Create separate table for each transitive dependency
3NF Results
Boyce-Codd Normal Form
* Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not chosen to be primary key
* If a table contains only one candidate key, the 3NF and the BCNF are equivalent
* BCNF can be violated only if the table contains more than one candidate key
BCNF (cont)
* Most designers consider the Boyce-Codd normal form (BCNF) as a special case of 3NF
* A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A table can be in 3NF and not be in BCNF
o A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute
o A nonkey attribute is the determinant of a key attribute
Table in 3nf but not BCNF
Decomposition to BCNF
Decomposition to BCNF
Fourth Normal Form
* Table is in 3NF
* Has no multiple sets of multivalued dependencies
Conversion to 4NF
* 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of each other
o No row contains two or more multivalued facts about an entity
Improving the Design
* Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
* Normalization cannot, by itself, be relied on to make good designs
* It is valuable because its use helps eliminate data redundancies
Improving the Design (cont)
The following changes were made:
* PK assignment
* Naming conventions
* Attribute atomicity
* Adding attributes
* Adding relationships
* Refining PKs
* Maintaining historical accuracy
* Dealing with derived attributes
Completed Database
Completed Database
Completed Database: Assign Table
5.6b: Assign Table
Completed Database: Employee
Final ERD for contracting company
Final ERD
Limitations on System Assigned Keys
* System-assigned primary key may not prevent confusing entries
* Data entries in Table 5.2 are inappropriate because they duplicate existing records
* Yet there has been no violation of either entity integrity or referential integrity
* Perhaps Job Description needs to be unique
Normalization and Database Design
* Normalization should be part of design process
* Make sure that proposed entities meet required normal form before table structures are created
* Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time
* You may be asked to redesign and modify existing databases
Normalization and Database Design (cont)
* E-R Diagram provides macro view, determines entities
* Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
* Difficult to separate Normalization and ER diagramming
* Extra check: No attribute that is not a (primary/foreign) key should be repeated in the database (except to record historical data)
Denormalization
* Normalization is one of many database design goals
Normalized table requirements
o Additional processing
o Loss of system speed
* Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
* Do not be too quick to denormalize
Unnormalized Table Defects
* Data updates less efficient
* Indexing more cumbersome
* No simple strategies for creating views
Overnormalization
* This is done for performance reason frequently in distributed and clustered database systems
* Splits tables beyond the pont required for normal forms.
* In horizontal partitioning rows of a single logical table are split among several physical tables (e.g. geographically)
* In vertical partitioning a table is split vertically with commonly accessed columns in separate physical tables
* What is Normalization?
* Why is its done?
* The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
* Transforming normal forms
* E-R modeling and normalization
* Denormalization
Database Tables and Normalization
Normalization is a process for assigning attributes to entities to:
* Reduce data redundancies
o Help eliminate data anomalies
* Produce controlled redundancies to link tables
* No information is lost in normalization
* Result will be a database that can produce the same information as the original
Normalization Process
Normalizatin works through a series of stages called normal forms:
* First Normal form (1NF)
* Second normal form (2NF)
* Third normal form (3NF)
* etc (4th and 5th)
* 2NF is better than 1NF; 3NF is better than 2NF
* For most business database design purposes, 3NF is highest we need to go in the normalization process
* Highest level of normalization is not always most desirable
The Need for Normalization
Example: company that manages building projects
* Charges its clients by billing hours spent on each contract
* Hourly billing rate is dependent on employee�s position
* Periodically, a report is generated that contains information as follows
Sample Report
Table derived form Above
The Need for Normalization
* Structure of data set in Figure 5.1 does not handle data very well
* The table structure appears to work; report is generated with ease
* Unfortunately, the report may yield different results, depending on what data anomaly has occurred
Issues
* Table entries invite data inconsistencies
* Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If employee deleted, other vital data lost: if emp 103 leaves lose info on Elect Engineers
Repeating Group
Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence
* Relational table must not contain repeating groups
* Normalizing the table structure will reduce these data redundancies
* Normalization is three-step procedure
Converting to First Normal Form
A table in a relational database must be in 1NF.
* Repeating groups must be eliminated
* Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM
Dependencies
* Dependencies can be depicted with the help of a diagram
* Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird�s-eye view of all relationships among a table�s attributes
o Use makes it much less likely that an important dependency will be overlooked
* Desirable dependencies based on entire primary key
* Less desirable dependencies
Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute
Dependency Diagram
Fig 5.3: Dependency Diagram
1NF: Definition
* Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
* All relational tables must satisfy 1NF requirements
* Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with caution
* Still subject to data redundancies
Second Normal Form
1. Identify all key components
* Write each key component on separate line
* Write original key on last line
* Write dependent attributes after each key.
2. Each line will become a new table
Second Normal Form Conversion Results
Second Normal Form Defined
Table is in second normal form (2NF) if:
* It is in 1NF and
* It includes no partial dependencies:
* No attribute is dependent on only a portion of the primary key
Converting to Third Normal Form
* Resolve transitive dependencies (attributes dependent on non-key attributes)
* Create separate table for each transitive dependency
3NF Results
Boyce-Codd Normal Form
* Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not chosen to be primary key
* If a table contains only one candidate key, the 3NF and the BCNF are equivalent
* BCNF can be violated only if the table contains more than one candidate key
BCNF (cont)
* Most designers consider the Boyce-Codd normal form (BCNF) as a special case of 3NF
* A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A table can be in 3NF and not be in BCNF
o A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute
o A nonkey attribute is the determinant of a key attribute
Table in 3nf but not BCNF
Decomposition to BCNF
Decomposition to BCNF
Fourth Normal Form
* Table is in 3NF
* Has no multiple sets of multivalued dependencies
Conversion to 4NF
* 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of each other
o No row contains two or more multivalued facts about an entity
Improving the Design
* Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
* Normalization cannot, by itself, be relied on to make good designs
* It is valuable because its use helps eliminate data redundancies
Improving the Design (cont)
The following changes were made:
* PK assignment
* Naming conventions
* Attribute atomicity
* Adding attributes
* Adding relationships
* Refining PKs
* Maintaining historical accuracy
* Dealing with derived attributes
Completed Database
Completed Database
Completed Database: Assign Table
5.6b: Assign Table
Completed Database: Employee
Final ERD for contracting company
Final ERD
Limitations on System Assigned Keys
* System-assigned primary key may not prevent confusing entries
* Data entries in Table 5.2 are inappropriate because they duplicate existing records
* Yet there has been no violation of either entity integrity or referential integrity
* Perhaps Job Description needs to be unique
Normalization and Database Design
* Normalization should be part of design process
* Make sure that proposed entities meet required normal form before table structures are created
* Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time
* You may be asked to redesign and modify existing databases
Normalization and Database Design (cont)
* E-R Diagram provides macro view, determines entities
* Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
* Difficult to separate Normalization and ER diagramming
* Extra check: No attribute that is not a (primary/foreign) key should be repeated in the database (except to record historical data)
Denormalization
* Normalization is one of many database design goals
Normalized table requirements
o Additional processing
o Loss of system speed
* Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
* Do not be too quick to denormalize
Unnormalized Table Defects
* Data updates less efficient
* Indexing more cumbersome
* No simple strategies for creating views
Overnormalization
* This is done for performance reason frequently in distributed and clustered database systems
* Splits tables beyond the pont required for normal forms.
* In horizontal partitioning rows of a single logical table are split among several physical tables (e.g. geographically)
* In vertical partitioning a table is split vertically with commonly accessed columns in separate physical tables
Entity Relationship Modeling
* How relationships between entities are defined and refined, and how such relationships are incorporated into the database design process
* How ERD components affect database design and implementation
* How to interpret the modeling symbols for the four most popular ER modeling tools
* That real-world database design often requires that you reconcile conflicting goals
Entity Relationship Model and Diagram
* ER model forms the basis of an ER diagram
* ERD represents the conceptual database as viewed by end user
* ERDs depict the ER model�s three main components:
o Entities
o Attributes
o Relationships
* Several different diagramming conventions
Entities
* Refers to the entity set and not to a single entity occurrence
* Corresponds to a table and not to a row in the relational environment
* In both the Chen and Crow�s Foot models, an entity is represented by a rectangle containing the entity�s name
* Entity name, a noun, is usually written in capital letters
Attributes
* Characteristics of entities
* Domain is set of possible values
* Primary keys underlined
Attributes (cont)
Attributes (cont)
Simple
Cannot be subdivided
Age, sex, GPA
Composite
Can be subdivided
Address: street city state zip
Single-valued
Has only a single value
Social security number
Multi-valued
Can have many values
Person may have several college degrees
Derived
Can be calculated from other information
Age can be derived from D.O.B.
Multivalued Attributes
Resolving Multivalued Attribute Problems
Although the conceptual model can handle multivalued attributes, you should not implement them in the relational DBMS
* Within original entity, create several new attributes, one for each of the original multivalued attribute�s components
o Can lead to major structural problems in the table
* Create a new entity composed of original multivalued attribute�s components
Creating New Attributes
Creating New Entity Set
Relationships
* Associations between entities
* Established by Business Rules
* Connected entities termed participants
* Connectivity describes relationship classification:
o 1:1, 1:M, M:N
* Cardinality
o Number of entity occurences associated with one occurence of related entity
Connectivity and Cardinality in an ERD
Relationship Strength
* Existence Dependent
o Entity's existence depends on existence of another related entities
o Existence-independent entities can exist apart from related entities
o Employee claims Child
Child is dependent on employee
* Weak (non-identifying)
o One entity is existence-independent on another
o PK of dependent entity doesn't contain PK component of parent entity
o Book is somewhat confused on this
* Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity
Relationship Participation
* Optional
o Entity occurrence does not require a corresponding occurrence in related entity
o Shown by drawing a small circle on side of optional entity on ERD
* Mandatory
o Entity occurrence requires corresponding occurrence in related entity
o If no optionality symbol is shown on ERD, it is mandatory
Weak Entity
* Existence-dependent on another entity
* Has primary key that is partially or totally derived from parent entity
Weak Entity
Mandatory Class Course relationship
Optional Class Entity in Professor Teaches Class
Degree of Relationship
A relationships degree indicates the number of associated entities.
Implementation of a Ternary Relationship
Composite Entity
* Used to replace M:N relationships with 1:N relationships
* Bridge entities composed of primary keys of each entity needing connection
Entity Subtypes and Supertypes
Generalization Hierarchy
* Depicts relationships between higher-level supertype and lower-level subtype entities
* Supertype has shared attributes
* Subtypes have unique attributes
* Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a `G'
* Overlapping subtypes use `Gs' Symbol
Nulls Created by Unique Attributes
Generalization Hierarchy: Disjoint
Generalization Hierarchy: Overlapping and Disjoint
Generalization
Supertype/Subtype relationship in an ERD
Comparison of ER Modeling Symbols
ER Modeling Symbols
Developing an E-R Diagram
* Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered components/relationships
* Repeat until designers and users agree E-R model comple
Dealing with Conflicting Goals in Database Design
* Database must be designed to conform to design standards
* High-speed processing may require design compromises
* Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS
* How relationships between entities are defined and refined, and how such relationships are incorporated into the database design process
* How ERD components affect database design and implementation
* How to interpret the modeling symbols for the four most popular ER modeling tools
* That real-world database design often requires that you reconcile conflicting goals
Entity Relationship Model and Diagram
* ER model forms the basis of an ER diagram
* ERD represents the conceptual database as viewed by end user
* ERDs depict the ER model�s three main components:
o Entities
o Attributes
o Relationships
* Several different diagramming conventions
Entities
* Refers to the entity set and not to a single entity occurrence
* Corresponds to a table and not to a row in the relational environment
* In both the Chen and Crow�s Foot models, an entity is represented by a rectangle containing the entity�s name
* Entity name, a noun, is usually written in capital letters
Attributes
* Characteristics of entities
* Domain is set of possible values
* Primary keys underlined
Attributes (cont)
Attributes (cont)
Simple
Cannot be subdivided
Age, sex, GPA
Composite
Can be subdivided
Address: street city state zip
Single-valued
Has only a single value
Social security number
Multi-valued
Can have many values
Person may have several college degrees
Derived
Can be calculated from other information
Age can be derived from D.O.B.
Multivalued Attributes
Resolving Multivalued Attribute Problems
Although the conceptual model can handle multivalued attributes, you should not implement them in the relational DBMS
* Within original entity, create several new attributes, one for each of the original multivalued attribute�s components
o Can lead to major structural problems in the table
* Create a new entity composed of original multivalued attribute�s components
Creating New Attributes
Creating New Entity Set
Relationships
* Associations between entities
* Established by Business Rules
* Connected entities termed participants
* Connectivity describes relationship classification:
o 1:1, 1:M, M:N
* Cardinality
o Number of entity occurences associated with one occurence of related entity
Connectivity and Cardinality in an ERD
Relationship Strength
* Existence Dependent
o Entity's existence depends on existence of another related entities
o Existence-independent entities can exist apart from related entities
o Employee claims Child
Child is dependent on employee
* Weak (non-identifying)
o One entity is existence-independent on another
o PK of dependent entity doesn't contain PK component of parent entity
o Book is somewhat confused on this
* Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity
Relationship Participation
* Optional
o Entity occurrence does not require a corresponding occurrence in related entity
o Shown by drawing a small circle on side of optional entity on ERD
* Mandatory
o Entity occurrence requires corresponding occurrence in related entity
o If no optionality symbol is shown on ERD, it is mandatory
Weak Entity
* Existence-dependent on another entity
* Has primary key that is partially or totally derived from parent entity
Weak Entity
Mandatory Class Course relationship
Optional Class Entity in Professor Teaches Class
Degree of Relationship
A relationships degree indicates the number of associated entities.
Implementation of a Ternary Relationship
Composite Entity
* Used to replace M:N relationships with 1:N relationships
* Bridge entities composed of primary keys of each entity needing connection
Entity Subtypes and Supertypes
Generalization Hierarchy
* Depicts relationships between higher-level supertype and lower-level subtype entities
* Supertype has shared attributes
* Subtypes have unique attributes
* Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a `G'
* Overlapping subtypes use `Gs' Symbol
Nulls Created by Unique Attributes
Generalization Hierarchy: Disjoint
Generalization Hierarchy: Overlapping and Disjoint
Generalization
Supertype/Subtype relationship in an ERD
Comparison of ER Modeling Symbols
ER Modeling Symbols
Developing an E-R Diagram
* Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered components/relationships
* Repeat until designers and users agree E-R model comple
Dealing with Conflicting Goals in Database Design
* Database must be designed to conform to design standards
* High-speed processing may require design compromises
* Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS
The Relational Database Model
* Tthe relational database model takes a logical view of data
* The relational model's basic components are entities, attributes, and relationships among entities
* How entities and their attributes are organized into tables
* About relational database operators, the data dictionary, and the system catalog
* How data redundancy is handled in the relational database model
* Why indexing is important
A logical View of Data
* Relational model
o Enables us to view data logically rather than physically
o Reminds us of simpler file concept of data storage or spreadsheet concept of table
* Table
o Two-dimensional structure composed of rows and columns
o Has advantages of structural and data independence
o Resembles a file from conceptual point of view
o Easier to understand than its hierarchical and network database predecessors
Tables
* Table also called a relation because the relational model's creator, Codd, used this term.
* A relation (and a relational DBMS's table) differs from a table in that there is no default order of rows or columns
o Note SQL does use a column order but this is a limitation of SQL, not the model.
* Each row (tuple) represents a single entity in an entity set
* Each column represents an attribute and each must have a distinct name
* Each row/column intersection (cell) represents a single atomic value
* All values in a column must have same data format
* Each column has a specific range of values known as a domain
* Each row must have some combination of attributes that uniquely identifies it (key)
Student Table Example
Student Table
Keys
* A key consists of one or more attributes that determine other attributes
* Primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given entity (row)
* Key's role is based on determination
o If you know the value of attribute A, you can look up (determine) the value of attribute B
Types of Keys
Composite key
Key consisting of multiple attributes
Key attribute
Any attribute that is part of a key
Superkey
Any key that uniquely identifies each entity
The entire row (all attributes) is a superkey
Candidate key
A superkey without redundancies
Null Values
* Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A 'not applicable' condition
* Not permitted in Primary key attributes
* Not the same as 0 or ""
* Can create problems in logic and using formulas
Controlled Redundancy
* Makes the relational database work
* Tables within the database share common attributes that enable us to link tables together
* Multiple occurrences of values in a table are not redundant when they are required to make the relationship work
* Uncontrolled Redundancy is unnecessary duplication of data
* In general any values duplicated between tables should be part of a Foreign Key -- Primary key relation (see below)
Foreign Keys
Foreign key (FK)
An attribute whose values match primary key values in the related table
Referential integrity
FK contains a value that refers to an existing valid tuple (row) in another relation
Secondary key
Key used strictly for data retrieval purposes
Keys (Summary)
Key Type Definition
Superkey Any attribute or combination of attributes that uniquely identifies a row in the table
Candiate Key aka Minimal Superkey. A superkey that does not contain a subset of attributes that is itself a superkey
Primary Key The candidate key selected to uniquely identify all rows
Cannot contain null values
Foreign Key An attribute (or combination of attributes) in one table that must either match the primary key of another table or be null
Secondary Key An attribute or combination of attributes used to make data retrieval more efficient.
Integrity Rules
Entity Integrity
* All entities are unique
* Each entity has unique primary key
Referential Integrity
* Foreign key must match primary key of referenced table or be null
* Impossible to delete row whose primary key has matching foreign key in another table
Relational Algebra
Key operators:
* SELECT
* PROJECT
* JOIN
Other operators
* INTERSECT
* UNION
* DIFFERENCE
* PRODUCT
* DIVIDE
SELECT
Subset of rows based on condition
Select
PROJECT
Subset of columns
PRODUCT
All possible pairings of two tables
Product
JOIN
* Combines information from two or more tables
* Real power behind the relational database, allowing the use of independent tables linked by common attributes
* Join is actually a combination of Product, Select, and Project
Fig 3-11 Join tables
Natural JOIN
Links tables by selecting rows with common values in common attribute(s) (typically equality on foreign keys).
Three stage process:
* Product creates one table
* Select yields appropriate rows
* Project yields single copy of each attribute to eliminate duplicate columns
Natural Join: Step 1 Product
Natural Join: Select
Natural Join Step 2: Select
Natural Join: Step 2 Select
Natural Join Step 3: Project
Natural Join Step 3: Project
Natural Join: Outcome
* Final outcome yields table that
o Does not include unmatched pairs
o Provides only copies of matches
* If no match is made between the table rows,
o the new table does not include the unmatched row
Other Joins
EquiJOIN
* Links tables based on equality condition that compares specified columns of tables
* Does not eliminate duplicate columns
* Join criteria must be explicitly defined
Theta JOIN
* EquiJOIN that compares specified columns of each table
Outer JOIN
* Matched pairs are retained
* Unmatched values in other tables left null
* Right and left
Intersect
Difference
Yields all rows in one table not found in the other table�that is, it subtracts one table from the other
Union
Divide
Requires user of single-column table and two-column table (I have never seen a DIVIDE)
Divide
Data Dictionary and System Catalog
Data Dictionary
* Metadata
* Detailed account of all tables in database
System Catalog
* System-created database
* Stores database characteristics and contents
* The database stores its own description
* Tables can be queried just like any other tables
* Automatically produces database documentation
Relationships within Relational Database
Classified by cardinalities of entities participating
* 1:M
o Relational modeling ideal
o Should be the norm in any relational database design
* M:N
o Must be avoided because they lead to data redundancies
o Split into a pari of 1:M Relationships
* 1:1
o Should be rare
The 1:1 Relationship
* One entity can be related to only one other entity, and vice versa
* Often means that entity components were not defined properly
* Could indicate that two entities actually belong in the same table
* Sometimes 1:1 relationships are appropriate to help avoid nulls
E-R Diagram
* Graphical representation of model
* Chen
* Crows Feet
ERD Symbols
* Rectangles represent entities
* Chen:
o diamonds represent relations
o numbers or m and n are cardinalities
* Crows Foot
o Lines represent relationships
o bar crossing line is "1" side
o 3 pronged crows foot is many side
Example 1:M Relationship
ERD diagram 1:M Relationship
Implementation of 1:M Relationship
M:N Relationships
M:N Relationship ERD
M:N Relationship
* Cannot be directly implemented in a Relational DBMS
Student and Class Tables
Linking Table
* Implementation of a composite entity
* Yields required M:N to 1:M conversion
* Composite entity table must contain at least the primary keys of original tables
* Linking table contains multiple occurrences of the foreign key values
* Additional attributes may be assigned as needed
Composite ERD
Composite Entities
Linking Tables: Implementation
Linking Tables
Data Redundancy Revisited
Foreign keys control and reduce redundancy.
It is not that there is no redundancy in a properly designed database: the redundancy there is is controlled.
Indexes
Note: I do not consider indexes part of the relational model. They are an implementation artifact to improve performance
* Arrangement used to logically access rows in a table
* Index key
o Index�s reference point
o Points to data location identified by the key
* Unique index
o Index in which the index key can only have one pointer value (row) associated with it
* Each index is associated with only one table
* Tthe relational database model takes a logical view of data
* The relational model's basic components are entities, attributes, and relationships among entities
* How entities and their attributes are organized into tables
* About relational database operators, the data dictionary, and the system catalog
* How data redundancy is handled in the relational database model
* Why indexing is important
A logical View of Data
* Relational model
o Enables us to view data logically rather than physically
o Reminds us of simpler file concept of data storage or spreadsheet concept of table
* Table
o Two-dimensional structure composed of rows and columns
o Has advantages of structural and data independence
o Resembles a file from conceptual point of view
o Easier to understand than its hierarchical and network database predecessors
Tables
* Table also called a relation because the relational model's creator, Codd, used this term.
* A relation (and a relational DBMS's table) differs from a table in that there is no default order of rows or columns
o Note SQL does use a column order but this is a limitation of SQL, not the model.
* Each row (tuple) represents a single entity in an entity set
* Each column represents an attribute and each must have a distinct name
* Each row/column intersection (cell) represents a single atomic value
* All values in a column must have same data format
* Each column has a specific range of values known as a domain
* Each row must have some combination of attributes that uniquely identifies it (key)
Student Table Example
Student Table
Keys
* A key consists of one or more attributes that determine other attributes
* Primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given entity (row)
* Key's role is based on determination
o If you know the value of attribute A, you can look up (determine) the value of attribute B
Types of Keys
Composite key
Key consisting of multiple attributes
Key attribute
Any attribute that is part of a key
Superkey
Any key that uniquely identifies each entity
The entire row (all attributes) is a superkey
Candidate key
A superkey without redundancies
Null Values
* Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A 'not applicable' condition
* Not permitted in Primary key attributes
* Not the same as 0 or ""
* Can create problems in logic and using formulas
Controlled Redundancy
* Makes the relational database work
* Tables within the database share common attributes that enable us to link tables together
* Multiple occurrences of values in a table are not redundant when they are required to make the relationship work
* Uncontrolled Redundancy is unnecessary duplication of data
* In general any values duplicated between tables should be part of a Foreign Key -- Primary key relation (see below)
Foreign Keys
Foreign key (FK)
An attribute whose values match primary key values in the related table
Referential integrity
FK contains a value that refers to an existing valid tuple (row) in another relation
Secondary key
Key used strictly for data retrieval purposes
Keys (Summary)
Key Type Definition
Superkey Any attribute or combination of attributes that uniquely identifies a row in the table
Candiate Key aka Minimal Superkey. A superkey that does not contain a subset of attributes that is itself a superkey
Primary Key The candidate key selected to uniquely identify all rows
Cannot contain null values
Foreign Key An attribute (or combination of attributes) in one table that must either match the primary key of another table or be null
Secondary Key An attribute or combination of attributes used to make data retrieval more efficient.
Integrity Rules
Entity Integrity
* All entities are unique
* Each entity has unique primary key
Referential Integrity
* Foreign key must match primary key of referenced table or be null
* Impossible to delete row whose primary key has matching foreign key in another table
Relational Algebra
Key operators:
* SELECT
* PROJECT
* JOIN
Other operators
* INTERSECT
* UNION
* DIFFERENCE
* PRODUCT
* DIVIDE
SELECT
Subset of rows based on condition
Select
PROJECT
Subset of columns
PRODUCT
All possible pairings of two tables
Product
JOIN
* Combines information from two or more tables
* Real power behind the relational database, allowing the use of independent tables linked by common attributes
* Join is actually a combination of Product, Select, and Project
Fig 3-11 Join tables
Natural JOIN
Links tables by selecting rows with common values in common attribute(s) (typically equality on foreign keys).
Three stage process:
* Product creates one table
* Select yields appropriate rows
* Project yields single copy of each attribute to eliminate duplicate columns
Natural Join: Step 1 Product
Natural Join: Select
Natural Join Step 2: Select
Natural Join: Step 2 Select
Natural Join Step 3: Project
Natural Join Step 3: Project
Natural Join: Outcome
* Final outcome yields table that
o Does not include unmatched pairs
o Provides only copies of matches
* If no match is made between the table rows,
o the new table does not include the unmatched row
Other Joins
EquiJOIN
* Links tables based on equality condition that compares specified columns of tables
* Does not eliminate duplicate columns
* Join criteria must be explicitly defined
Theta JOIN
* EquiJOIN that compares specified columns of each table
Outer JOIN
* Matched pairs are retained
* Unmatched values in other tables left null
* Right and left
Intersect
Difference
Yields all rows in one table not found in the other table�that is, it subtracts one table from the other
Union
Divide
Requires user of single-column table and two-column table (I have never seen a DIVIDE)
Divide
Data Dictionary and System Catalog
Data Dictionary
* Metadata
* Detailed account of all tables in database
System Catalog
* System-created database
* Stores database characteristics and contents
* The database stores its own description
* Tables can be queried just like any other tables
* Automatically produces database documentation
Relationships within Relational Database
Classified by cardinalities of entities participating
* 1:M
o Relational modeling ideal
o Should be the norm in any relational database design
* M:N
o Must be avoided because they lead to data redundancies
o Split into a pari of 1:M Relationships
* 1:1
o Should be rare
The 1:1 Relationship
* One entity can be related to only one other entity, and vice versa
* Often means that entity components were not defined properly
* Could indicate that two entities actually belong in the same table
* Sometimes 1:1 relationships are appropriate to help avoid nulls
E-R Diagram
* Graphical representation of model
* Chen
* Crows Feet
ERD Symbols
* Rectangles represent entities
* Chen:
o diamonds represent relations
o numbers or m and n are cardinalities
* Crows Foot
o Lines represent relationships
o bar crossing line is "1" side
o 3 pronged crows foot is many side
Example 1:M Relationship
ERD diagram 1:M Relationship
Implementation of 1:M Relationship
M:N Relationships
M:N Relationship ERD
M:N Relationship
* Cannot be directly implemented in a Relational DBMS
Student and Class Tables
Linking Table
* Implementation of a composite entity
* Yields required M:N to 1:M conversion
* Composite entity table must contain at least the primary keys of original tables
* Linking table contains multiple occurrences of the foreign key values
* Additional attributes may be assigned as needed
Composite ERD
Composite Entities
Linking Tables: Implementation
Linking Tables
Data Redundancy Revisited
Foreign keys control and reduce redundancy.
It is not that there is no redundancy in a properly designed database: the redundancy there is is controlled.
Indexes
Note: I do not consider indexes part of the relational model. They are an implementation artifact to improve performance
* Arrangement used to logically access rows in a table
* Index key
o Index�s reference point
o Points to data location identified by the key
* Unique index
o Index in which the index key can only have one pointer value (row) associated with it
* Each index is associated with only one table
Wednesday, November 5, 2008
Chapter 2: Data Models
Note There's minimal coverage of Hierarichcal and Network models as I consider them of historical importance only
Topics
* Business rules
* Data models
o About the basic data-modeling building blocks
o How the major data models evolved, and their advantages and disadvantages
o How data models can be classified by level of abstraction
Business Rules
* A policy, procedure, or principle within a specific organization's environment
* Applies to any organization that stores and uses data to generate information
* Description of operations that help to create and enforce actions within that organization's environment
* Sometimes are external to the organization
* Describe characteristics of the data as viewed by the company
Business Rules (cont)
* Must be rendered in writing
* Brief, precise, unambiguous
* Must be kept up to date
* Must be easy to understand and widely disseminated
Sources of Business Rules
* Company managers
* Policy makers
* Department managers
* Written documentation
o Procedures
o Standards
o Operations manuals
* Direct interviews with end users
Importance of Business Rules
* Promote creation of an accurate data model
* Standardize company's view of data
* Constitute a communications tool between users and designers
* Allow designer to understand the nature, role, and scope of data
* Allow designer to understand business processes
* Allow designer to develop appropriate relationship participation rules and constraints
The Importance of Data Models
* Good database design uses an appropriate data model as its foundation
* End-users have different views and needs for data
* Data model organizes data for various users
Data model
Relatively simple representation, usually graphical, of complex real-world data structures
Communications tool to facilitate interaction among the designer, the applications programmer, and the end user
Data Model Building Blocks
Entity
Anything about which data are to be collected and stored
Attribute
A characteristic of an entity (e.g. last name)
Relationship
an association among (two or more) entities
* One-to-many (1:M) relationship
* Many-to-many (M:N or M:M) relationship
* One-to-one (1:1) relationship
The Evolution of Data Models
* Hierarchical
* Network
* Relational
* Entity relationship
* Object oriented
Crucial Database Components
Schema
Conceptual organization of entire database as viewed by the database administrator
Subschema
Defines database portion "seen" by the application programs that actually produce the desired information from data contained within the database
Data Definition Language (DDL)
Define data characteristics and data structure in order to manipulate the data
DML Concepts
Schema Data Definition Language (DDL)
Enables database administrator to define schema components and relationships
Subschema DDL
Allows application programs to define database components that will be used
DML
Manipulates database contents
The Relational Model
* Developed by Codd (IBM) in 1970
* Considered ingenious but impractical in 1970
* Conceptually simple
* Computers lacked power to implement the relational model
* Today, microcomputers can run sophisticated relational database software
Relational Model: Basic Structure
* Relational Database Management System (RDBMS)
* Most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment
Relation (Table)
Matrix consisting of a series of row/column intersections
Related to each other by sharing a common entity characteristic
Relational schema
Representation of relational database's entities, attributes within those entities, and relationships between those entities
Represented as DDL or Visually
Alternative Terminology
Relational Common Traditional
relation table file
tuple row record
attribute cell field
instance value value
Relational Schema
Relational Schema
Linking Relational Tables
Fig 2.4 Linking Relations
Advantages of the Relational Model
* Structural independence
* Improved conceptual simplicity
* Easier database design, implementation, management, and use
* Ad hoc query capability
* Powerful database management system
Disadvantages of the Relational Model
* Substantial hardware and system software overhead
* May not fit all business models
* Can facilitate poor design and implementation
* May promote "islands of information" problems
The Entity Relationship Model
* Widely accepted and adapted graphical tool for data modeling
* Introduced by Chen in 1976
* Graphical representation of entities and their relationships in a database structure
Basic Structure of the Entity Relationship Model
* Entity relationship diagram (ERD)
o Uses graphic representations to model database components
o Entity is mapped to a relational table
* Entity instance (or occurrence) is row in table
* Entity set is collection of like entities
* Connectivity labels types of relationships
Relationships: the Basic Chen ERD
Note: we will be using mostly CrowsFoot notation
Fig 2.6 Chen ERD
Crow's Foot ERD
Fig 2.7 Crow's Foot ERD
Advantages of E.R. Model
* Exceptional conceptual simplicity
* Visual representation
* Effective communication tool
* Integrated with the relational data model
Disadvantages of E.R. Model
* Limited constraint representation
* Limited relationship representation
* No data manipulation language
* Loss of information content
The Object Oriented Model
* Semantic data model (SDM) developed by Hammer and McLeod in 1981
* Modeled both data and their relationships in a single structure known as an object
* Basis of object oriented data model (OODM)
* OODM becomes the basis for the object oriented database management system (OODBMS)
Object Oriented Model (cont)
* Object is described by its factual content
o Like relational model's entity
* Includes information about relationships between facts within object and relationships with other objects
o Unlike relational model's entity
* Subsequent OODM development allowed an object to also contain operations (aka methods)
* Object becomes basic building block for autonomous structures
Developments that Boosted OODM
* Growing costs put a premium on code reusability
* Complex data types and system requirements became difficult to manage with a traditional RDBMS
* Became possible to support increasingly sophisticated transaction & information requirements
* Ever-increasing computing power made it possible to support the large computing overhead required
OODM: Basic Structure
* Object: abstraction of a real-world entity
* Attributes describe the properties of an object
* Objects that share similar characteristics are grouped in classes
* Classes are organized in a class hierarchy
* Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of classes above it
Comparison of OO and ER Models
Fig 2.6 Comparison between OO and ER Model
OO Model Advantages
* Adds semantic content
* Visual presentation includes semantic content
* Database integrity
* Both structural and data independence
OO Model Disadvantages
* Slow pace of OODM standards development
* Complex navigational data access
* Steep learning curve
* High system overhead slows transactions
* Lack of market penetration
* Current relational DBMS's have many OO characteristics
Other Models: Extended Relational Data Model (ERDM)
* Semantic data model developed in response to increasing complexity of applications
* DBMS based on the ERDM often described as an object/relational database management system (O/RDBMS)
* Primarily geared to business applications
Date's Objection to ERDM
* Given proper support for domains, relational data models are quite capable of handling complex data
o Therefore, capability that is supposedly being extended is already there
* O/RDM label is not accurate because the relational data model's domain is not an object model structure
Data Models: A Summary
* Each new data model was intended to overcome the shortcomings of previous models
* Common characteristics:
o Conceptual simplicity without compromising the semantic completeness of the database
o Represent the real world as closely as possible
o Representation of real-world transformations (behavior) must be in compliance with consistency and integrity characteristics of any data model
Degrees of Abstraction
* Way of classifying data models
* Many processes begin at high level of abstraction and proceed to an ever-increasing level of detail
* Designing a usable database follows the same basic process
ANSI/SPARC Data Models
American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC)
Classified data models according to their degree of abstraction (1970s):
* Conceptual
* External
* Internal
Data Abstraction Levels
Data Abstraction Levels
The Conceptual Model
* Represents global view of the database
* Enterprise-wide representation of data as viewed by high-level managers
* Basis for identification and description of main data objects, avoiding details
* Most widely used conceptual model is the entity relationship (ER) model
Conceptual Model Tiny College
Advantages of Conceptual Model
* Provides a relatively easily understood macro level view of data environment
* Independent of both software and hardware
o Does not depend on the DBMS software used to implement the model
o Does not depend on the hardware used in the implementation of the model
o Changes in either the hardware or the DBMS software have no effect on the database design at the conceptual level
The Internal Model
* Representation of the database as �seen� by the DBMS
* Adapts the conceptual model to the DBMS
* Software dependent
* Hardware independent
The External Model
* End users� view of the data environment
* Requires that the modeler subdivide set of requirements and constraints into functional modules that can be examined within the framework of their external models
* Good design should:
o Consider such relationships between views
o Provide programmers with a set of restrictions that govern common entities
Division of Conceptual Model into External Models
Fig 2-13
Advantages of External Models
* Use of database subsets makes application program development much simpler
o Facilitates designer�s task by making it easier to identify specific data required to support each business unit�s operations
o Provides feedback about the conceptual model�s adequacy
* Creation of external models helps to ensure security constraints in the database design
External Models for Tiny College
The Physical Model
* Operates at lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes
* Software and hardware dependent
* Requires that database designers have a detailed knowledge of the hardware and software used to implement database design
Levels of Data Abstraction
Summary
* A good DBMS will perform poorly with a poorly designed database
* A data model is a (relatively) simple abstraction of a complex real-world data-gathering environment
* Basic data modeling components are
o Entities
o Attributes
o Relationships
Note There's minimal coverage of Hierarichcal and Network models as I consider them of historical importance only
Topics
* Business rules
* Data models
o About the basic data-modeling building blocks
o How the major data models evolved, and their advantages and disadvantages
o How data models can be classified by level of abstraction
Business Rules
* A policy, procedure, or principle within a specific organization's environment
* Applies to any organization that stores and uses data to generate information
* Description of operations that help to create and enforce actions within that organization's environment
* Sometimes are external to the organization
* Describe characteristics of the data as viewed by the company
Business Rules (cont)
* Must be rendered in writing
* Brief, precise, unambiguous
* Must be kept up to date
* Must be easy to understand and widely disseminated
Sources of Business Rules
* Company managers
* Policy makers
* Department managers
* Written documentation
o Procedures
o Standards
o Operations manuals
* Direct interviews with end users
Importance of Business Rules
* Promote creation of an accurate data model
* Standardize company's view of data
* Constitute a communications tool between users and designers
* Allow designer to understand the nature, role, and scope of data
* Allow designer to understand business processes
* Allow designer to develop appropriate relationship participation rules and constraints
The Importance of Data Models
* Good database design uses an appropriate data model as its foundation
* End-users have different views and needs for data
* Data model organizes data for various users
Data model
Relatively simple representation, usually graphical, of complex real-world data structures
Communications tool to facilitate interaction among the designer, the applications programmer, and the end user
Data Model Building Blocks
Entity
Anything about which data are to be collected and stored
Attribute
A characteristic of an entity (e.g. last name)
Relationship
an association among (two or more) entities
* One-to-many (1:M) relationship
* Many-to-many (M:N or M:M) relationship
* One-to-one (1:1) relationship
The Evolution of Data Models
* Hierarchical
* Network
* Relational
* Entity relationship
* Object oriented
Crucial Database Components
Schema
Conceptual organization of entire database as viewed by the database administrator
Subschema
Defines database portion "seen" by the application programs that actually produce the desired information from data contained within the database
Data Definition Language (DDL)
Define data characteristics and data structure in order to manipulate the data
DML Concepts
Schema Data Definition Language (DDL)
Enables database administrator to define schema components and relationships
Subschema DDL
Allows application programs to define database components that will be used
DML
Manipulates database contents
The Relational Model
* Developed by Codd (IBM) in 1970
* Considered ingenious but impractical in 1970
* Conceptually simple
* Computers lacked power to implement the relational model
* Today, microcomputers can run sophisticated relational database software
Relational Model: Basic Structure
* Relational Database Management System (RDBMS)
* Most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment
Relation (Table)
Matrix consisting of a series of row/column intersections
Related to each other by sharing a common entity characteristic
Relational schema
Representation of relational database's entities, attributes within those entities, and relationships between those entities
Represented as DDL or Visually
Alternative Terminology
Relational Common Traditional
relation table file
tuple row record
attribute cell field
instance value value
Relational Schema
Relational Schema
Linking Relational Tables
Fig 2.4 Linking Relations
Advantages of the Relational Model
* Structural independence
* Improved conceptual simplicity
* Easier database design, implementation, management, and use
* Ad hoc query capability
* Powerful database management system
Disadvantages of the Relational Model
* Substantial hardware and system software overhead
* May not fit all business models
* Can facilitate poor design and implementation
* May promote "islands of information" problems
The Entity Relationship Model
* Widely accepted and adapted graphical tool for data modeling
* Introduced by Chen in 1976
* Graphical representation of entities and their relationships in a database structure
Basic Structure of the Entity Relationship Model
* Entity relationship diagram (ERD)
o Uses graphic representations to model database components
o Entity is mapped to a relational table
* Entity instance (or occurrence) is row in table
* Entity set is collection of like entities
* Connectivity labels types of relationships
Relationships: the Basic Chen ERD
Note: we will be using mostly CrowsFoot notation
Fig 2.6 Chen ERD
Crow's Foot ERD
Fig 2.7 Crow's Foot ERD
Advantages of E.R. Model
* Exceptional conceptual simplicity
* Visual representation
* Effective communication tool
* Integrated with the relational data model
Disadvantages of E.R. Model
* Limited constraint representation
* Limited relationship representation
* No data manipulation language
* Loss of information content
The Object Oriented Model
* Semantic data model (SDM) developed by Hammer and McLeod in 1981
* Modeled both data and their relationships in a single structure known as an object
* Basis of object oriented data model (OODM)
* OODM becomes the basis for the object oriented database management system (OODBMS)
Object Oriented Model (cont)
* Object is described by its factual content
o Like relational model's entity
* Includes information about relationships between facts within object and relationships with other objects
o Unlike relational model's entity
* Subsequent OODM development allowed an object to also contain operations (aka methods)
* Object becomes basic building block for autonomous structures
Developments that Boosted OODM
* Growing costs put a premium on code reusability
* Complex data types and system requirements became difficult to manage with a traditional RDBMS
* Became possible to support increasingly sophisticated transaction & information requirements
* Ever-increasing computing power made it possible to support the large computing overhead required
OODM: Basic Structure
* Object: abstraction of a real-world entity
* Attributes describe the properties of an object
* Objects that share similar characteristics are grouped in classes
* Classes are organized in a class hierarchy
* Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of classes above it
Comparison of OO and ER Models
Fig 2.6 Comparison between OO and ER Model
OO Model Advantages
* Adds semantic content
* Visual presentation includes semantic content
* Database integrity
* Both structural and data independence
OO Model Disadvantages
* Slow pace of OODM standards development
* Complex navigational data access
* Steep learning curve
* High system overhead slows transactions
* Lack of market penetration
* Current relational DBMS's have many OO characteristics
Other Models: Extended Relational Data Model (ERDM)
* Semantic data model developed in response to increasing complexity of applications
* DBMS based on the ERDM often described as an object/relational database management system (O/RDBMS)
* Primarily geared to business applications
Date's Objection to ERDM
* Given proper support for domains, relational data models are quite capable of handling complex data
o Therefore, capability that is supposedly being extended is already there
* O/RDM label is not accurate because the relational data model's domain is not an object model structure
Data Models: A Summary
* Each new data model was intended to overcome the shortcomings of previous models
* Common characteristics:
o Conceptual simplicity without compromising the semantic completeness of the database
o Represent the real world as closely as possible
o Representation of real-world transformations (behavior) must be in compliance with consistency and integrity characteristics of any data model
Degrees of Abstraction
* Way of classifying data models
* Many processes begin at high level of abstraction and proceed to an ever-increasing level of detail
* Designing a usable database follows the same basic process
ANSI/SPARC Data Models
American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC)
Classified data models according to their degree of abstraction (1970s):
* Conceptual
* External
* Internal
Data Abstraction Levels
Data Abstraction Levels
The Conceptual Model
* Represents global view of the database
* Enterprise-wide representation of data as viewed by high-level managers
* Basis for identification and description of main data objects, avoiding details
* Most widely used conceptual model is the entity relationship (ER) model
Conceptual Model Tiny College
Advantages of Conceptual Model
* Provides a relatively easily understood macro level view of data environment
* Independent of both software and hardware
o Does not depend on the DBMS software used to implement the model
o Does not depend on the hardware used in the implementation of the model
o Changes in either the hardware or the DBMS software have no effect on the database design at the conceptual level
The Internal Model
* Representation of the database as �seen� by the DBMS
* Adapts the conceptual model to the DBMS
* Software dependent
* Hardware independent
The External Model
* End users� view of the data environment
* Requires that the modeler subdivide set of requirements and constraints into functional modules that can be examined within the framework of their external models
* Good design should:
o Consider such relationships between views
o Provide programmers with a set of restrictions that govern common entities
Division of Conceptual Model into External Models
Fig 2-13
Advantages of External Models
* Use of database subsets makes application program development much simpler
o Facilitates designer�s task by making it easier to identify specific data required to support each business unit�s operations
o Provides feedback about the conceptual model�s adequacy
* Creation of external models helps to ensure security constraints in the database design
External Models for Tiny College
The Physical Model
* Operates at lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes
* Software and hardware dependent
* Requires that database designers have a detailed knowledge of the hardware and software used to implement database design
Levels of Data Abstraction
Summary
* A good DBMS will perform poorly with a poorly designed database
* A data model is a (relatively) simple abstraction of a complex real-world data-gathering environment
* Basic data modeling components are
o Entities
o Attributes
o Relationships
Chapter 1
File Systems and Databases
- Database
- Persistent collection of data and
- Metadata (data about the characteristics of the data and relationships
of the data)
- Database Management System
- collection of programs to manage the database
- Manages and enforces database structure
- Includes interface to manipulate the data
- Allows data to be shared among multiple users
Data vs Information
- Data:
- Raw facts; building blocks of information
- Unprocessed information
- Raw facts; building blocks of information
- Information:
- Data processed to reveal meaning
- Data processed to reveal meaning
- Accurate, relevant, and timely information is key to good decision making
- Good decision making is key to survival in global environment
Importance of DBMS
- Makes data management more efficient and effective
- Query language allows quick answers to ad hoc queries
- Provides easier access to more and better-managed data
- Promotes an integrated view of organization�s operations
- Reduces the chance of inconsistent data
- Helps protect against loss of data
DBMS Manages Interface Between Data and Users
Database Design: Why Design is important
- The database is the foundation of the information system.
- Design should reflect the expected use
- Poor design results in unwanted redundancy
- Poor design leads to inconsistent data
- Poor design leads to poor performance
- Poor design leads to improper information systems operation
Historical Roots
- First business computer applications focused on clerical tasks
- Requests for information quickly followed
- File systems developed to address needs
- Data organized according to expected use
- Data Processing (DP) specialists computerized manual file systems
File Systems

File System Data Management
- Requires extensive programming, typically in Third Generation Language
(3GL)
- Leads to islands of information and data redundancy
- Difficult to make ad hoc queries to obtain information
- Difficult to maintain data integrity
Data and Structural Dependence
- Data characteristics are embodied in programs not stored with the data.
- Changes in data characteristics requires modifying programs
- Changes in file structures require modification of related programs
Data Redundancy
- Different and possibly conflicting versions of same data
- Results in problems during data:
- Modification (e.g. address changes)
- Insertion
- Deletion
- Modification (e.g. address changes)
- Data inconsistency: Lack of integrity
Database Systems
- Database consists of logically related data stored in a single repository
- Advantages over file system management approach:
- Eliminates inconsistency, data anomalies, data dependency, and structural
dependency problems
- Stores data structures, relationships, and access paths
- Eliminates inconsistency, data anomalies, data dependency, and structural
Database vs. File Systems
Database System Environment

Database System Types
- Scale
- Single User (desktop)
- Workgroup
- Enterprise
- Distributed or Federated
- Single User (desktop)
- Use
- Production/Transaction
- Decision Support/Data Warehouse
- Production/Transaction
Uses of Databases
- Transactional (or production):
- Supports a company�s day-to-day operations
- Supports a company�s day-to-day operations
- Data warehouse:
- Stores data used to generate information required to make tactical or strategic decisions
- Such decisions typically require �data massaging�
- Often used to store historical data
- Structure is quite different
- Stores data used to generate information required to make tactical or strategic decisions
DBMS Functions
- Metadata/Data Dictionary Management
- Data storage management
- Data transformation and presentation
- Security management and Multiuser access control
- Backup and recovery management
- Data integrity management
- Database language and application programming interfaces
- Database communication interfaces
Database Models
Collection of logical constructs used to represent data structure and relationships
- Conceptual Models: logical nature of data representation
- Implementation Models: how data are represented
Database Models
The hierarchical and network models are of historical interest only.
Database Models:
- Relational
- Entity-Relationship
- Object oriented
Relational Model
- Most common model
- Perceived by user as collection of tables containing data
- Actually has a formal definitin based on set theory
- Tables are a series of row/column intersections
- Tables related by sharing common entity characteristic(s)
Relational Database

Relational Database Model Advantages
- Structural independence
- Improved conceptual simplicity
- Easier database design, implementation, management, and use
- Ad hoc query capability with SQL (standard interface)
- Powerful database management system
Relational Database Model Disadvantages
- Substantial hardware and system software overhead
- Poor design and implementation is made easy
- Not a cure all: May promote "islands of information" problems
- SQL is not completely standardized. One DBMS is not a "drop in"
replacement for another.
- May have problems storing some types of data
Entity Relationship Database Model
- Primarily a database design tool.
- Complements the relational data model concepts
- Represented in an entity relationship diagram (ERD)
- Based on entities, attributes, and relationships
ER-Diagram

ER Model Advantages
- Conceptual simplicity
- Visual representation
- Effective communication tool
- Integrated with the relational database model
ER Model Disadvatages
- Limited constraint representation
- Limited relationship representation
- No data manipulation language
- Loss of information content
- May be overly complex for end users
Object-Oriented Model
- Objects or abstractions of real-world entities are stored
- Attributes describe properties
- Collection of similar objects is a class
- Methods represent real world actions of classes
- Classes are organized in a class hierarchy
- Objects inherit attributes and methods of classes above.
Object Oriented Model

OO Model Advantages
- Adds semantic context
- Structural and data independence
- May mesh well with Object Oriented Programming
OO Model Disadvantages
- Lack of standards in model
- Lack of standard manipulation languages
- Complex navigational data access
- Steep learning curve
- Poor performance
Subscribe to:
Posts (Atom)

