Thursday, November 6, 2008



* The basic commands and functions of SQL
* How to use SQL for data administration (to create tables, indexes, and views)
* How to use SQL for data manipulation (to add, modify, delete, and retrieve data)
* How to use SQL to query a database to extract useful information

Introduction to SQL Part I

* The relational DBMS is the standard for database management.
* The Structured Query Language, SQL, is the standard for working with them
* This chapter is an introduction to essential SQL.

SQL strengths

Covers both

*

Data definition
*

Data manipulation

SQL is relatively easy to learn.

ANSI prescribes a standard SQL.
SQL Weaknesses

*

Some eccentric notation
o

use of ' marks; in strings, rather than "
o

Wildcards: % instead of *
*

Some things are hard to do
*

Different Vendors implement different dialects
*

Not a good conceptual match to most programming language
*

Strictly DDL and DML no standard procedural language.

DB2 concepts

* DB2 consists of multiple "instances" on each server (we have one)
* Within each instance there are databases: we have two, SAMPLE and DBMS, and will be using DBMS
* Within the databases are schemas one for each user.
* Authorization for users is via the system (cs1 account). Your schema name and user name are the same as your username on CS1

Setup for demonstrations

1. Use the winsql program to connect to the datasource DBMS using your CS1 username and password
2. Perform the following commands to create the tables:

drop table vendor; drop table product; drop table customer;
create table vendor like CH06_SALESCO.vendor;
insert into vendor select * from CH06_SALESCO.vendor;
alter table vendor add primary key (v_code);

create table product like CH06_SALESCO.product;
insert into product select * from CH06_SALESCO.product;
alter table product add primary key (p_code)
add foreign key (v_code) references vendor on delete set null on update restrict;



Data Definition Commands
The Database Model

Simple Database -- PRODUCT and VENDOR tables
Each product is supplied by only a single vendor.
A vendor may supply many products.

Data Definition Commands
The Tables and Their Components

*

The VENDOR table contains vendors who are not referenced in the PRODUCT table. PRODUCT is optional to VENDOR.
*

Existing V_CODE values in the PRODUCT table must have a match in the VENDOR table.
*

A few products are supplied factory-direct, a few are made in-house, and a few may have been bought in a special warehouse sale. That is, a product is not necessarily supplied by a vendor. VENDOR is optional to PRODUCT.

Common SQL Datatypes

Data Type


SQL

Numeric


NUMBER(L,D)

DECIMAL(L,D)

INTEGER

SMALLINT

Character


CHAR (L)

VARCHAR (L)

Date


DATE
Data Definition Commands
Creating the Database Structure

This varies among databases. In DB2 there are instances

within each instance there are databases

within each database there are schemas (one for each user). DB2 users are the same as Operating System users.

Statements in DB2 referencing a table include schema (SELECT * FROM schema.tablename) (current schema are implicit).
Data Definition Commands
Creating Table Structures

CREATE TABLE

(
);

Data Definition Commands

CREATE TABLE VENDOR(
V_CODE INTEGER NOT NULL PRIMARY KEY DEFAULT 0,
V_NAME VARCHAR(15),
V_CONTACT VARCHAR(50),
V_AREACODE VARCHAR(3),
V_PHONE VARCHAR(8),
V_STATE VARCHAR(2),
V_ORDER VARCHAR(1)
)

Data Definition Commands

CREATE TABLE PRODUCT(
P_CODE VARCHAR(10) NOT NULL PRIMARY KEY,
P_DESCRIPT VARCHAR(35),
P_INDATE DATE,
P_ONHAND SMALLINT DEFAULT 0,
P_MIN SMALLINT DEFAULT 0,
P_PRICE DECIMAL(15, 2) DEFAULT 0,
P_DISCOUNT DOUBLE DEFAULT 0,
V_CODE INTEGER DEFAULT 0 REFERENCES VENDOR(V_CODE)
)

Data Definition Commands
SQL Integrity Constraints

Entity Integrity
PRIMARY KEY
NOT NULL and UNIQUE
Referential Integrity
FOREIGN KEY
ON DELETE
ON UPDATE
Check Constraint
Validates data when an attribute value is entered

Basic Data Management
Data Entry

INSERT INTO
VALUES (attribute 1 value, attribute 2 value, ... etc.);

INSERT INTO VENDOR VALUES(26000, 'Quality Tools', 'Johnson', '915','555-3234', 'TX', 'N');
INSERT INTO PRODUCT VALUES('14 ABC12', 'Concrete Saw', '09/02/1996', 2, 1, 510.99, 0.00, 26000, '');

Basic Data Management
Committing Changes

Changes do not take place until they are committed assuming autocommit is off. Many end user query environments (including ours) do not support turning autocommit off.

COMMIT ;

Listing the Table Contents

SELECT * FROM PRODUCT;

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_ONHAND,
P_MIN, P_PRICE, P_DISCOUNT, V_CODE
FROM PRODUCT;

Basic Data Management
Making a Correction

UPDATE PRODUCT SET P_INDATE = '2003-11-15'
WHERE P_CODE = '13-Q2/P2';

UPDATE PRODUCT SET P_INDATE = '2003-11-15', P_PRICE = 15.99, P_MIN = 10
WHERE P_CODE = '13-Q2/P2';

Restoring the Table Contents (assumes autocommit).

ROLLBACK

Basic Data Management
Deleting Table Rows

DELETE FROM PRODUCT WHERE P_CODE = '2238/QPD';

DELETE FROM PRODUCT WHERE P_MIN = 5;

Delete is a dangerous command. Typing:
DELETE FROM

will neatly delete all the records in the table!
Queries
Partial Listing of Table Contents

SELECT FROM
WHERE ;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE <> 21344;

SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_PRICE <= 10; Queries Using Mathematical Operators on Character Attributes SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE FROM PRODUCT WHERE P_CODE < '1558-QWI'; Using Mathematical Operators on Dates SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE, P_INDATE FROM PRODUCT WHERE P_INDATE >= '01/01/2004';

Queries
Logical Operators: AND, OR, and NOT

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 21225;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE <> '01/01/2004';

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE (P_PRICE <> '01/01/2004')
OR V_CODE = 24288;

Queries: Special Operators

*

BETWEEN - used to define range limits.
*

IS NULL - used to check whether an attribute value is null
*

LIKE - used to check for similar character strings.
*

IN - used to check whether an attribute value matches a value contained within a (sub)set of listed values.
*

EXISTS - used to check whether an attribute has a value. In effect, EXISTS is the opposite of IS NULL.

Queries: Special Operators
BETWEEN is used to define range limits.

SELECT *
FROM PRODUCT
WHERE P_PRICE BETWEEN 10.00 AND 100.00;

SELECT * FROM PRODUCT
WHERE P_PRICE > 10.00 AND P_PRICE <>
MODIFY ( );

ALTER TABLE
ADD ( );

Changing a Column's Data Type

Probably illegal in DB2

ALTER TABLE PRODUCT MODIFY (V_CODE CHAR(5));
ALTER TABLE PRODUCT MODIFY (P_PRICE DECIMAL(9,2));

Adding a New Column to the Table

ALTER TABLE PRODUCT ADD column P_SALECODE CHAR(1) ;

Updating Data

UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_CODE = '1546-QQ2';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_CODE IN ('13-Q2/P2', '2232/QTY');
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_INDATE < '01/01/2004'; UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_INDATE >= '01/01/2004'AND P_INDATE < '10/20/2004'; Copying Tables Copying table definitions and data CREATE TABLE NEWPRODUCT LIKE PRODUCT; INSERT INTO NEWPRODUCT SELECT * FROM PRODUCT; Copying Parts of Tables CREATE TABLE PART (PART_CODE CHAR(8) NOT NULL UNIQUE, PART_DESCRIPT CHAR(35), PART_PRICE DECIMAL(8,2), PRIMARY KEY(PART_CODE)); INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE) SELECT P_CODE, P_DESCRIPT, P_PRICE FROM PRODUCT; Deleting a Table from the Database DROP TABLE
;

DROP TABLE PART;

Primary and Foreign Key Designation

(Note we did these when we created the table)

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE);

ALTER TABLE PRODUCT ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE) ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;

Notes


Normalization

* What is Normalization?
* Why is its done?
* The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
* Transforming normal forms
* E-R modeling and normalization
* Denormalization

Database Tables and Normalization

Normalization is a process for assigning attributes to entities to:

* Reduce data redundancies
o Help eliminate data anomalies
* Produce controlled redundancies to link tables
* No information is lost in normalization
* Result will be a database that can produce the same information as the original

Normalization Process

Normalizatin works through a series of stages called normal forms:

* First Normal form (1NF)
* Second normal form (2NF)
* Third normal form (3NF)
* etc (4th and 5th)

* 2NF is better than 1NF; 3NF is better than 2NF
* For most business database design purposes, 3NF is highest we need to go in the normalization process
* Highest level of normalization is not always most desirable

The Need for Normalization

Example: company that manages building projects

* Charges its clients by billing hours spent on each contract
* Hourly billing rate is dependent on employee�s position
* Periodically, a report is generated that contains information as follows

Sample Report

Table derived form Above

The Need for Normalization

* Structure of data set in Figure 5.1 does not handle data very well
* The table structure appears to work; report is generated with ease
* Unfortunately, the report may yield different results, depending on what data anomaly has occurred

Issues

* Table entries invite data inconsistencies
* Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If employee deleted, other vital data lost: if emp 103 leaves lose info on Elect Engineers

Repeating Group

Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence

* Relational table must not contain repeating groups
* Normalizing the table structure will reduce these data redundancies
* Normalization is three-step procedure

Converting to First Normal Form

A table in a relational database must be in 1NF.

* Repeating groups must be eliminated
* Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM

Dependencies

* Dependencies can be depicted with the help of a diagram
* Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird�s-eye view of all relationships among a table�s attributes
o Use makes it much less likely that an important dependency will be overlooked

* Desirable dependencies based on entire primary key
* Less desirable dependencies

Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute

Dependency Diagram

Fig 5.3: Dependency Diagram
1NF: Definition

* Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
* All relational tables must satisfy 1NF requirements
* Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with caution
* Still subject to data redundancies

Second Normal Form

1. Identify all key components
* Write each key component on separate line
* Write original key on last line
* Write dependent attributes after each key.
2. Each line will become a new table

Second Normal Form Conversion Results

Second Normal Form Defined

Table is in second normal form (2NF) if:

* It is in 1NF and
* It includes no partial dependencies:
* No attribute is dependent on only a portion of the primary key

Converting to Third Normal Form

* Resolve transitive dependencies (attributes dependent on non-key attributes)
* Create separate table for each transitive dependency

3NF Results

Boyce-Codd Normal Form

* Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not chosen to be primary key
* If a table contains only one candidate key, the 3NF and the BCNF are equivalent
* BCNF can be violated only if the table contains more than one candidate key

BCNF (cont)

* Most designers consider the Boyce-Codd normal form (BCNF) as a special case of 3NF
* A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A table can be in 3NF and not be in BCNF
o A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute
o A nonkey attribute is the determinant of a key attribute

Table in 3nf but not BCNF

Decomposition to BCNF

Decomposition to BCNF
Fourth Normal Form

* Table is in 3NF
* Has no multiple sets of multivalued dependencies

Conversion to 4NF

* 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of each other
o No row contains two or more multivalued facts about an entity

Improving the Design

* Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
* Normalization cannot, by itself, be relied on to make good designs
* It is valuable because its use helps eliminate data redundancies

Improving the Design (cont)

The following changes were made:

* PK assignment
* Naming conventions
* Attribute atomicity
* Adding attributes
* Adding relationships
* Refining PKs
* Maintaining historical accuracy
* Dealing with derived attributes

Completed Database

Completed Database
Completed Database: Assign Table

5.6b: Assign Table
Completed Database: Employee

Final ERD for contracting company

Final ERD
Limitations on System Assigned Keys

* System-assigned primary key may not prevent confusing entries
* Data entries in Table 5.2 are inappropriate because they duplicate existing records
* Yet there has been no violation of either entity integrity or referential integrity
* Perhaps Job Description needs to be unique

Normalization and Database Design

* Normalization should be part of design process
* Make sure that proposed entities meet required normal form before table structures are created
* Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time
* You may be asked to redesign and modify existing databases

Normalization and Database Design (cont)

* E-R Diagram provides macro view, determines entities
* Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
* Difficult to separate Normalization and ER diagramming
* Extra check: No attribute that is not a (primary/foreign) key should be repeated in the database (except to record historical data)

Denormalization

* Normalization is one of many database design goals
Normalized table requirements
o Additional processing
o Loss of system speed
* Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
* Do not be too quick to denormalize

Unnormalized Table Defects

* Data updates less efficient
* Indexing more cumbersome
* No simple strategies for creating views

Overnormalization

* This is done for performance reason frequently in distributed and clustered database systems
* Splits tables beyond the pont required for normal forms.
* In horizontal partitioning rows of a single logical table are split among several physical tables (e.g. geographically)
* In vertical partitioning a table is split vertically with commonly accessed columns in separate physical tables
Entity Relationship Modeling

* How relationships between entities are defined and refined, and how such relationships are incorporated into the database design process
* How ERD components affect database design and implementation
* How to interpret the modeling symbols for the four most popular ER modeling tools
* That real-world database design often requires that you reconcile conflicting goals

Entity Relationship Model and Diagram

* ER model forms the basis of an ER diagram
* ERD represents the conceptual database as viewed by end user
* ERDs depict the ER model�s three main components:
o Entities
o Attributes
o Relationships
* Several different diagramming conventions

Entities

* Refers to the entity set and not to a single entity occurrence
* Corresponds to a table and not to a row in the relational environment
* In both the Chen and Crow�s Foot models, an entity is represented by a rectangle containing the entity�s name
* Entity name, a noun, is usually written in capital letters

Attributes

* Characteristics of entities
* Domain is set of possible values
* Primary keys underlined

Attributes (cont)
Attributes (cont)

Simple
Cannot be subdivided
Age, sex, GPA
Composite
Can be subdivided
Address: street city state zip
Single-valued
Has only a single value
Social security number
Multi-valued
Can have many values
Person may have several college degrees
Derived
Can be calculated from other information
Age can be derived from D.O.B.

Multivalued Attributes

Resolving Multivalued Attribute Problems

Although the conceptual model can handle multivalued attributes, you should not implement them in the relational DBMS

* Within original entity, create several new attributes, one for each of the original multivalued attribute�s components
o Can lead to major structural problems in the table
* Create a new entity composed of original multivalued attribute�s components

Creating New Attributes

Creating New Entity Set

Relationships

* Associations between entities
* Established by Business Rules
* Connected entities termed participants
* Connectivity describes relationship classification:
o 1:1, 1:M, M:N
* Cardinality
o Number of entity occurences associated with one occurence of related entity

Connectivity and Cardinality in an ERD

Relationship Strength

* Existence Dependent
o Entity's existence depends on existence of another related entities
o Existence-independent entities can exist apart from related entities
o Employee claims Child
Child is dependent on employee
* Weak (non-identifying)
o One entity is existence-independent on another
o PK of dependent entity doesn't contain PK component of parent entity
o Book is somewhat confused on this
* Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity

Relationship Participation

* Optional
o Entity occurrence does not require a corresponding occurrence in related entity
o Shown by drawing a small circle on side of optional entity on ERD
* Mandatory
o Entity occurrence requires corresponding occurrence in related entity
o If no optionality symbol is shown on ERD, it is mandatory

Weak Entity

* Existence-dependent on another entity
* Has primary key that is partially or totally derived from parent entity

Weak Entity
Mandatory Class Course relationship

Optional Class Entity in Professor Teaches Class

Degree of Relationship

A relationships degree indicates the number of associated entities.

Implementation of a Ternary Relationship

Composite Entity

* Used to replace M:N relationships with 1:N relationships
* Bridge entities composed of primary keys of each entity needing connection

Entity Subtypes and Supertypes
Generalization Hierarchy

* Depicts relationships between higher-level supertype and lower-level subtype entities
* Supertype has shared attributes
* Subtypes have unique attributes
* Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a `G'
* Overlapping subtypes use `Gs' Symbol

Nulls Created by Unique Attributes

Generalization Hierarchy: Disjoint

Generalization Hierarchy: Overlapping and Disjoint

Generalization
Supertype/Subtype relationship in an ERD

Comparison of ER Modeling Symbols

ER Modeling Symbols
Developing an E-R Diagram

* Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered components/relationships
* Repeat until designers and users agree E-R model comple

Dealing with Conflicting Goals in Database Design

* Database must be designed to conform to design standards
* High-speed processing may require design compromises
* Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS
The Relational Database Model

* Tthe relational database model takes a logical view of data
* The relational model's basic components are entities, attributes, and relationships among entities
* How entities and their attributes are organized into tables
* About relational database operators, the data dictionary, and the system catalog
* How data redundancy is handled in the relational database model
* Why indexing is important

A logical View of Data

* Relational model
o Enables us to view data logically rather than physically
o Reminds us of simpler file concept of data storage or spreadsheet concept of table
* Table
o Two-dimensional structure composed of rows and columns
o Has advantages of structural and data independence
o Resembles a file from conceptual point of view
o Easier to understand than its hierarchical and network database predecessors

Tables

* Table also called a relation because the relational model's creator, Codd, used this term.
* A relation (and a relational DBMS's table) differs from a table in that there is no default order of rows or columns
o Note SQL does use a column order but this is a limitation of SQL, not the model.
* Each row (tuple) represents a single entity in an entity set
* Each column represents an attribute and each must have a distinct name
* Each row/column intersection (cell) represents a single atomic value
* All values in a column must have same data format
* Each column has a specific range of values known as a domain
* Each row must have some combination of attributes that uniquely identifies it (key)

Student Table Example

Student Table
Keys

* A key consists of one or more attributes that determine other attributes
* Primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given entity (row)
* Key's role is based on determination
o If you know the value of attribute A, you can look up (determine) the value of attribute B

Types of Keys

Composite key
Key consisting of multiple attributes
Key attribute
Any attribute that is part of a key
Superkey
Any key that uniquely identifies each entity
The entire row (all attributes) is a superkey
Candidate key
A superkey without redundancies

Null Values

* Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A 'not applicable' condition
* Not permitted in Primary key attributes
* Not the same as 0 or ""
* Can create problems in logic and using formulas

Controlled Redundancy

* Makes the relational database work
* Tables within the database share common attributes that enable us to link tables together
* Multiple occurrences of values in a table are not redundant when they are required to make the relationship work
* Uncontrolled Redundancy is unnecessary duplication of data
* In general any values duplicated between tables should be part of a Foreign Key -- Primary key relation (see below)

Foreign Keys

Foreign key (FK)
An attribute whose values match primary key values in the related table
Referential integrity
FK contains a value that refers to an existing valid tuple (row) in another relation
Secondary key
Key used strictly for data retrieval purposes

Keys (Summary)
Key Type Definition
Superkey Any attribute or combination of attributes that uniquely identifies a row in the table
Candiate Key aka Minimal Superkey. A superkey that does not contain a subset of attributes that is itself a superkey
Primary Key The candidate key selected to uniquely identify all rows
Cannot contain null values
Foreign Key An attribute (or combination of attributes) in one table that must either match the primary key of another table or be null
Secondary Key An attribute or combination of attributes used to make data retrieval more efficient.
Integrity Rules
Entity Integrity

* All entities are unique
* Each entity has unique primary key

Referential Integrity

* Foreign key must match primary key of referenced table or be null
* Impossible to delete row whose primary key has matching foreign key in another table

Relational Algebra
Key operators:

* SELECT
* PROJECT
* JOIN

Other operators

* INTERSECT
* UNION
* DIFFERENCE
* PRODUCT
* DIVIDE

SELECT

Subset of rows based on condition

Select
PROJECT

Subset of columns

PRODUCT

All possible pairings of two tables

Product
JOIN

* Combines information from two or more tables
* Real power behind the relational database, allowing the use of independent tables linked by common attributes
* Join is actually a combination of Product, Select, and Project

Fig 3-11 Join tables
Natural JOIN

Links tables by selecting rows with common values in common attribute(s) (typically equality on foreign keys).

Three stage process:

* Product creates one table
* Select yields appropriate rows
* Project yields single copy of each attribute to eliminate duplicate columns

Natural Join: Step 1 Product

Natural Join: Select
Natural Join Step 2: Select

Natural Join: Step 2 Select
Natural Join Step 3: Project

Natural Join Step 3: Project
Natural Join: Outcome

* Final outcome yields table that
o Does not include unmatched pairs
o Provides only copies of matches
* If no match is made between the table rows,
o the new table does not include the unmatched row

Other Joins
EquiJOIN

* Links tables based on equality condition that compares specified columns of tables
* Does not eliminate duplicate columns
* Join criteria must be explicitly defined

Theta JOIN

* EquiJOIN that compares specified columns of each table

Outer JOIN

* Matched pairs are retained
* Unmatched values in other tables left null
* Right and left

Intersect

Difference

Yields all rows in one table not found in the other table�that is, it subtracts one table from the other

Union

Divide

Requires user of single-column table and two-column table (I have never seen a DIVIDE)

Divide
Data Dictionary and System Catalog
Data Dictionary

* Metadata
* Detailed account of all tables in database

System Catalog

* System-created database
* Stores database characteristics and contents
* The database stores its own description
* Tables can be queried just like any other tables
* Automatically produces database documentation

Relationships within Relational Database

Classified by cardinalities of entities participating

* 1:M
o Relational modeling ideal
o Should be the norm in any relational database design
* M:N
o Must be avoided because they lead to data redundancies
o Split into a pari of 1:M Relationships
* 1:1
o Should be rare

The 1:1 Relationship

* One entity can be related to only one other entity, and vice versa
* Often means that entity components were not defined properly
* Could indicate that two entities actually belong in the same table
* Sometimes 1:1 relationships are appropriate to help avoid nulls

E-R Diagram

* Graphical representation of model
* Chen
* Crows Feet

ERD Symbols

* Rectangles represent entities
* Chen:
o diamonds represent relations
o numbers or m and n are cardinalities
* Crows Foot
o Lines represent relationships
o bar crossing line is "1" side
o 3 pronged crows foot is many side

Example 1:M Relationship

ERD diagram 1:M Relationship
Implementation of 1:M Relationship

M:N Relationships

M:N Relationship ERD
M:N Relationship

* Cannot be directly implemented in a Relational DBMS

Student and Class Tables
Linking Table

* Implementation of a composite entity
* Yields required M:N to 1:M conversion
* Composite entity table must contain at least the primary keys of original tables
* Linking table contains multiple occurrences of the foreign key values
* Additional attributes may be assigned as needed

Composite ERD

Composite Entities
Linking Tables: Implementation

Linking Tables
Data Redundancy Revisited

Foreign keys control and reduce redundancy.

It is not that there is no redundancy in a properly designed database: the redundancy there is is controlled.
Indexes

Note: I do not consider indexes part of the relational model. They are an implementation artifact to improve performance

* Arrangement used to logically access rows in a table
* Index key
o Index�s reference point
o Points to data location identified by the key
* Unique index
o Index in which the index key can only have one pointer value (row) associated with it
* Each index is associated with only one table

Wednesday, November 5, 2008

Chapter 2: Data Models

Note There's minimal coverage of Hierarichcal and Network models as I consider them of historical importance only

Topics

* Business rules
* Data models
o About the basic data-modeling building blocks
o How the major data models evolved, and their advantages and disadvantages
o How data models can be classified by level of abstraction

Business Rules

* A policy, procedure, or principle within a specific organization's environment
* Applies to any organization that stores and uses data to generate information
* Description of operations that help to create and enforce actions within that organization's environment
* Sometimes are external to the organization
* Describe characteristics of the data as viewed by the company

Business Rules (cont)

* Must be rendered in writing
* Brief, precise, unambiguous
* Must be kept up to date
* Must be easy to understand and widely disseminated

Sources of Business Rules

* Company managers
* Policy makers
* Department managers
* Written documentation
o Procedures
o Standards
o Operations manuals
* Direct interviews with end users

Importance of Business Rules

* Promote creation of an accurate data model
* Standardize company's view of data
* Constitute a communications tool between users and designers
* Allow designer to understand the nature, role, and scope of data
* Allow designer to understand business processes
* Allow designer to develop appropriate relationship participation rules and constraints

The Importance of Data Models

* Good database design uses an appropriate data model as its foundation
* End-users have different views and needs for data
* Data model organizes data for various users

Data model
Relatively simple representation, usually graphical, of complex real-world data structures
Communications tool to facilitate interaction among the designer, the applications programmer, and the end user

Data Model Building Blocks

Entity
Anything about which data are to be collected and stored
Attribute
A characteristic of an entity (e.g. last name)
Relationship
an association among (two or more) entities

* One-to-many (1:M) relationship
* Many-to-many (M:N or M:M) relationship
* One-to-one (1:1) relationship

The Evolution of Data Models

* Hierarchical
* Network
* Relational
* Entity relationship
* Object oriented

Crucial Database Components

Schema
Conceptual organization of entire database as viewed by the database administrator
Subschema
Defines database portion "seen" by the application programs that actually produce the desired information from data contained within the database
Data Definition Language (DDL)
Define data characteristics and data structure in order to manipulate the data

DML Concepts

Schema Data Definition Language (DDL)
Enables database administrator to define schema components and relationships
Subschema DDL
Allows application programs to define database components that will be used
DML
Manipulates database contents

The Relational Model

* Developed by Codd (IBM) in 1970
* Considered ingenious but impractical in 1970
* Conceptually simple
* Computers lacked power to implement the relational model
* Today, microcomputers can run sophisticated relational database software

Relational Model: Basic Structure

* Relational Database Management System (RDBMS)
* Most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment

Relation (Table)
Matrix consisting of a series of row/column intersections
Related to each other by sharing a common entity characteristic
Relational schema
Representation of relational database's entities, attributes within those entities, and relationships between those entities
Represented as DDL or Visually

Alternative Terminology
Relational Common Traditional
relation table file
tuple row record
attribute cell field
instance value value
Relational Schema

Relational Schema
Linking Relational Tables

Fig 2.4 Linking Relations
Advantages of the Relational Model

* Structural independence
* Improved conceptual simplicity
* Easier database design, implementation, management, and use
* Ad hoc query capability
* Powerful database management system

Disadvantages of the Relational Model

* Substantial hardware and system software overhead
* May not fit all business models
* Can facilitate poor design and implementation
* May promote "islands of information" problems

The Entity Relationship Model

* Widely accepted and adapted graphical tool for data modeling
* Introduced by Chen in 1976
* Graphical representation of entities and their relationships in a database structure

Basic Structure of the Entity Relationship Model

* Entity relationship diagram (ERD)
o Uses graphic representations to model database components
o Entity is mapped to a relational table
* Entity instance (or occurrence) is row in table
* Entity set is collection of like entities
* Connectivity labels types of relationships

Relationships: the Basic Chen ERD

Note: we will be using mostly CrowsFoot notation

Fig 2.6 Chen ERD
Crow's Foot ERD

Fig 2.7 Crow's Foot ERD
Advantages of E.R. Model

* Exceptional conceptual simplicity
* Visual representation
* Effective communication tool
* Integrated with the relational data model

Disadvantages of E.R. Model

* Limited constraint representation
* Limited relationship representation
* No data manipulation language
* Loss of information content

The Object Oriented Model

* Semantic data model (SDM) developed by Hammer and McLeod in 1981
* Modeled both data and their relationships in a single structure known as an object
* Basis of object oriented data model (OODM)
* OODM becomes the basis for the object oriented database management system (OODBMS)

Object Oriented Model (cont)

* Object is described by its factual content
o Like relational model's entity
* Includes information about relationships between facts within object and relationships with other objects
o Unlike relational model's entity
* Subsequent OODM development allowed an object to also contain operations (aka methods)
* Object becomes basic building block for autonomous structures

Developments that Boosted OODM

* Growing costs put a premium on code reusability
* Complex data types and system requirements became difficult to manage with a traditional RDBMS
* Became possible to support increasingly sophisticated transaction & information requirements
* Ever-increasing computing power made it possible to support the large computing overhead required

OODM: Basic Structure

* Object: abstraction of a real-world entity
* Attributes describe the properties of an object
* Objects that share similar characteristics are grouped in classes
* Classes are organized in a class hierarchy
* Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of classes above it

Comparison of OO and ER Models

Fig 2.6 Comparison between OO and ER Model
OO Model Advantages

* Adds semantic content
* Visual presentation includes semantic content
* Database integrity
* Both structural and data independence

OO Model Disadvantages

* Slow pace of OODM standards development
* Complex navigational data access
* Steep learning curve
* High system overhead slows transactions
* Lack of market penetration
* Current relational DBMS's have many OO characteristics

Other Models: Extended Relational Data Model (ERDM)

* Semantic data model developed in response to increasing complexity of applications
* DBMS based on the ERDM often described as an object/relational database management system (O/RDBMS)
* Primarily geared to business applications

Date's Objection to ERDM

* Given proper support for domains, relational data models are quite capable of handling complex data
o Therefore, capability that is supposedly being extended is already there
* O/RDM label is not accurate because the relational data model's domain is not an object model structure

Data Models: A Summary

* Each new data model was intended to overcome the shortcomings of previous models
* Common characteristics:
o Conceptual simplicity without compromising the semantic completeness of the database
o Represent the real world as closely as possible
o Representation of real-world transformations (behavior) must be in compliance with consistency and integrity characteristics of any data model

Degrees of Abstraction

* Way of classifying data models
* Many processes begin at high level of abstraction and proceed to an ever-increasing level of detail
* Designing a usable database follows the same basic process

ANSI/SPARC Data Models

American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC)
Classified data models according to their degree of abstraction (1970s):

* Conceptual
* External
* Internal

Data Abstraction Levels

Data Abstraction Levels
The Conceptual Model

* Represents global view of the database
* Enterprise-wide representation of data as viewed by high-level managers
* Basis for identification and description of main data objects, avoiding details
* Most widely used conceptual model is the entity relationship (ER) model

Conceptual Model Tiny College

Advantages of Conceptual Model

* Provides a relatively easily understood macro level view of data environment
* Independent of both software and hardware
o Does not depend on the DBMS software used to implement the model
o Does not depend on the hardware used in the implementation of the model
o Changes in either the hardware or the DBMS software have no effect on the database design at the conceptual level

The Internal Model

* Representation of the database as �seen� by the DBMS
* Adapts the conceptual model to the DBMS
* Software dependent
* Hardware independent

The External Model

* End users� view of the data environment
* Requires that the modeler subdivide set of requirements and constraints into functional modules that can be examined within the framework of their external models
* Good design should:
o Consider such relationships between views
o Provide programmers with a set of restrictions that govern common entities

Division of Conceptual Model into External Models

Fig 2-13
Advantages of External Models

* Use of database subsets makes application program development much simpler
o Facilitates designer�s task by making it easier to identify specific data required to support each business unit�s operations
o Provides feedback about the conceptual model�s adequacy
* Creation of external models helps to ensure security constraints in the database design

External Models for Tiny College

The Physical Model

* Operates at lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes
* Software and hardware dependent
* Requires that database designers have a detailed knowledge of the hardware and software used to implement database design

Levels of Data Abstraction

Summary

* A good DBMS will perform poorly with a poorly designed database
* A data model is a (relatively) simple abstraction of a complex real-world data-gathering environment
* Basic data modeling components are
o Entities
o Attributes
o Relationships



Files and Databases




Chapter 1

File Systems and Databases



Database
Persistent collection of data and
Metadata (data about the characteristics of the data and relationships
of the data)
Database Management System
collection of programs to manage the database
Manages and enforces database structure
Includes interface to manipulate the data
Allows data to be shared among multiple users


Data vs Information



  • Data:

    • Raw facts; building blocks of information
    • Unprocessed information

  • Information:

    • Data processed to reveal meaning


  • Accurate, relevant, and timely information is key to good decision making
  • Good decision making is key to survival in global environment

Importance of DBMS



  • Makes data management more efficient and effective
  • Query language allows quick answers to ad hoc queries
  • Provides easier access to more and better-managed data
  • Promotes an integrated view of organization�s operations
  • Reduces the chance of inconsistent data
  • Helps protect against loss of data

DBMS Manages Interface Between Data and Users




Database Design: Why Design is important



  • The database is the foundation of the information system.
  • Design should reflect the expected use
  • Poor design results in unwanted redundancy
  • Poor design leads to inconsistent data
  • Poor design leads to poor performance
  • Poor design leads to improper information systems operation

Historical Roots



  • First business computer applications focused on clerical tasks
  • Requests for information quickly followed
  • File systems developed to address needs
  • Data organized according to expected use
  • Data Processing (DP) specialists computerized manual file systems


File Systems


Figure 1.5


File System Data Management



  • Requires extensive programming, typically in Third Generation Language
    (3GL)
  • Leads to islands of information and data redundancy
  • Difficult to make ad hoc queries to obtain information
  • Difficult to maintain data integrity

Data and Structural Dependence



  • Data characteristics are embodied in programs not stored with the data.
  • Changes in data characteristics requires modifying programs
  • Changes in file structures require modification of related programs


Data Redundancy



  • Different and possibly conflicting versions of same data
  • Results in problems during data:

    • Modification (e.g. address changes)
    • Insertion
    • Deletion

  • Data inconsistency: Lack of integrity

Database Systems




  • Database consists of logically related data stored in a single repository
  • Advantages over file system management approach:

    • Eliminates inconsistency, data anomalies, data dependency, and structural
      dependency problems
    • Stores data structures, relationships, and access paths


Database vs. File Systems


Figure 1.6


Database System Environment




Database System Types



  • Scale

    • Single User (desktop)
    • Workgroup
    • Enterprise
    • Distributed or Federated

  • Use

    • Production/Transaction
    • Decision Support/Data Warehouse



Uses of Databases



  • Transactional (or production):

    • Supports a company�s day-to-day operations

  • Data warehouse:

    • Stores data used to generate information required to make tactical or strategic decisions
    • Such decisions typically require �data massaging�
    • Often used to store historical data
    • Structure is quite different



DBMS Functions



  • Metadata/Data Dictionary Management
  • Data storage management
  • Data transformation and presentation
  • Security management and Multiuser access control
  • Backup and recovery management
  • Data integrity management
  • Database language and application programming interfaces
  • Database communication interfaces

Database Models


Collection of logical constructs used to represent data structure and relationships




  • Conceptual Models: logical nature of data representation
  • Implementation Models: how data are represented

Database Models


The hierarchical and network models are of historical interest only.


Database Models:



  • Relational
  • Entity-Relationship
  • Object oriented


Relational Model



  • Most common model
  • Perceived by user as collection of tables containing data
  • Actually has a formal definitin based on set theory
  • Tables are a series of row/column intersections
  • Tables related by sharing common entity characteristic(s)

Relational Database


Relational Tables


Relational Database Model Advantages



  • Structural independence
  • Improved conceptual simplicity
  • Easier database design, implementation, management, and use
  • Ad hoc query capability with SQL (standard interface)
  • Powerful database management system


Relational Database Model Disadvantages



  • Substantial hardware and system software overhead
  • Poor design and implementation is made easy
  • Not a cure all: May promote "islands of information" problems
  • SQL is not completely standardized. One DBMS is not a "drop in"
    replacement for another.
  • May have problems storing some types of data


Entity Relationship Database Model



  • Primarily a database design tool.
  • Complements the relational data model concepts
  • Represented in an entity relationship diagram (ERD)
  • Based on entities, attributes, and relationships

ER-Diagram


Figure 01-14


ER Model Advantages



  • Conceptual simplicity
  • Visual representation
  • Effective communication tool
  • Integrated with the relational database model


ER Model Disadvatages



  • Limited constraint representation
  • Limited relationship representation
  • No data manipulation language
  • Loss of information content
  • May be overly complex for end users

Object-Oriented Model



  • Objects or abstractions of real-world entities are stored
  • Attributes describe properties
  • Collection of similar objects is a class
  • Methods represent real world actions of classes
  • Classes are organized in a class hierarchy
  • Objects inherit attributes and methods of classes above.


Object Oriented Model


Fig 1.15


OO Model Advantages



  • Adds semantic context
  • Structural and data independence
  • May mesh well with Object Oriented Programming

OO Model Disadvantages



  • Lack of standards in model
  • Lack of standard manipulation languages
  • Complex navigational data access
  • Steep learning curve
  • Poor performance







Thursday, September 11, 2008

Interview questions for Linux admin

Advantages/disadvantages of script vs compiled program.
Name a replacement for PHP/Perl/MySQL/Linux/Apache and show main differences. .Why have you choosen such a combination of products?
Differences between two last MySQL versions. Which one would you choose and when/why?
Main differences between Apache 1.x and 2.x. Why is 2.x not so popular? Which one would you choose and when/why?
Which Linux distros do you have experience with?
Which distro you prefer? Why?
Which tool would you use to update Debian / Slackware / RedHat / Mandrake / SuSE ?
You’re asked to write an Apache module. What would you do?
Which tool do you prefer for Apache log reports?
Your portfolio. (even a PHP guest book may work well)
What does ‘route’ command do?
Differences between ipchains and iptables.
What’s eth0, ppp0, wlan0, ttyS0, etc.
What are different directories in / for?
Partitioning scheme for new webserver. Why?