Thursday, November 6, 2008

* The basic commands and functions of SQL
* How to use SQL for data administration (to create tables, indexes, and views)
* How to use SQL for data manipulation (to add, modify, delete, and retrieve data)
* How to use SQL to query a database to extract useful information

Introduction to SQL Part I

* The relational DBMS is the standard for database management.
* The Structured Query Language, SQL, is the standard for working with them
* This chapter is an introduction to essential SQL.

SQL strengths

Covers both

*

Data definition
*

Data manipulation

SQL is relatively easy to learn.

ANSI prescribes a standard SQL.
SQL Weaknesses

*

Some eccentric notation
o

use of ' marks; in strings, rather than "
o

Wildcards: % instead of *
*

Some things are hard to do
*

Different Vendors implement different dialects
*

Not a good conceptual match to most programming language
*

Strictly DDL and DML no standard procedural language.

DB2 concepts

* DB2 consists of multiple "instances" on each server (we have one)
* Within each instance there are databases: we have two, SAMPLE and DBMS, and will be using DBMS
* Within the databases are schemas one for each user.
* Authorization for users is via the system (cs1 account). Your schema name and user name are the same as your username on CS1

Setup for demonstrations

1. Use the winsql program to connect to the datasource DBMS using your CS1 username and password
2. Perform the following commands to create the tables:

drop table vendor; drop table product; drop table customer;
create table vendor like CH06_SALESCO.vendor;
insert into vendor select * from CH06_SALESCO.vendor;
alter table vendor add primary key (v_code);

create table product like CH06_SALESCO.product;
insert into product select * from CH06_SALESCO.product;
alter table product add primary key (p_code)
add foreign key (v_code) references vendor on delete set null on update restrict;

Data Definition Commands
The Database Model

Simple Database -- PRODUCT and VENDOR tables
Each product is supplied by only a single vendor.
A vendor may supply many products.

Data Definition Commands
The Tables and Their Components

*

The VENDOR table contains vendors who are not referenced in the PRODUCT table. PRODUCT is optional to VENDOR.
*

Existing V_CODE values in the PRODUCT table must have a match in the VENDOR table.
*

A few products are supplied factory-direct, a few are made in-house, and a few may have been bought in a special warehouse sale. That is, a product is not necessarily supplied by a vendor. VENDOR is optional to PRODUCT.

Common SQL Datatypes

Data Type

SQL

Numeric

NUMBER(L,D)

DECIMAL(L,D)

INTEGER

SMALLINT

Character

CHAR (L)

VARCHAR (L)

Date

DATE
Data Definition Commands
Creating the Database Structure

This varies among databases. In DB2 there are instances

within each instance there are databases

within each database there are schemas (one for each user). DB2 users are the same as Operating System users.

Statements in DB2 referencing a table include schema (SELECT * FROM schema.tablename) (current schema are implicit).
Data Definition Commands
Creating Table Structures

CREATE TABLE

(
);

Data Definition Commands

CREATE TABLE VENDOR(
V_CODE INTEGER NOT NULL PRIMARY KEY DEFAULT 0,
V_NAME VARCHAR(15),
V_CONTACT VARCHAR(50),
V_AREACODE VARCHAR(3),
V_PHONE VARCHAR(8),
V_STATE VARCHAR(2),
V_ORDER VARCHAR(1)
)

Data Definition Commands

CREATE TABLE PRODUCT(
P_CODE VARCHAR(10) NOT NULL PRIMARY KEY,
P_DESCRIPT VARCHAR(35),
P_INDATE DATE,
P_ONHAND SMALLINT DEFAULT 0,
P_MIN SMALLINT DEFAULT 0,
P_PRICE DECIMAL(15, 2) DEFAULT 0,
P_DISCOUNT DOUBLE DEFAULT 0,
V_CODE INTEGER DEFAULT 0 REFERENCES VENDOR(V_CODE)
)

Data Definition Commands
SQL Integrity Constraints

Entity Integrity
PRIMARY KEY
NOT NULL and UNIQUE
Referential Integrity
FOREIGN KEY
ON DELETE
ON UPDATE
Check Constraint
Validates data when an attribute value is entered

Basic Data Management
Data Entry

INSERT INTO

VALUES (attribute 1 value, attribute 2 value, ... etc.);

INSERT INTO VENDOR VALUES(26000, 'Quality Tools', 'Johnson', '915','555-3234', 'TX', 'N');
INSERT INTO PRODUCT VALUES('14 ABC12', 'Concrete Saw', '09/02/1996', 2, 1, 510.99, 0.00, 26000, '');

Basic Data Management
Committing Changes

Changes do not take place until they are committed assuming autocommit is off. Many end user query environments (including ours) do not support turning autocommit off.

COMMIT ;

Listing the Table Contents

SELECT * FROM PRODUCT;

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_ONHAND,
P_MIN, P_PRICE, P_DISCOUNT, V_CODE
FROM PRODUCT;

Basic Data Management
Making a Correction

UPDATE PRODUCT SET P_INDATE = '2003-11-15'
WHERE P_CODE = '13-Q2/P2';

UPDATE PRODUCT SET P_INDATE = '2003-11-15', P_PRICE = 15.99, P_MIN = 10
WHERE P_CODE = '13-Q2/P2';

Restoring the Table Contents (assumes autocommit).

ROLLBACK

Basic Data Management
Deleting Table Rows

DELETE FROM PRODUCT WHERE P_CODE = '2238/QPD';

DELETE FROM PRODUCT WHERE P_MIN = 5;

Delete is a dangerous command. Typing:
DELETE FROM

will neatly delete all the records in the table!
Queries
Partial Listing of Table Contents

SELECT FROM

WHERE ;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE <> 21344;

SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_PRICE <= 10; Queries Using Mathematical Operators on Character Attributes SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE FROM PRODUCT WHERE P_CODE < '1558-QWI'; Using Mathematical Operators on Dates SELECT P_DESCRIPT, P_ONHAND, P_MIN, P_PRICE, P_INDATE FROM PRODUCT WHERE P_INDATE >= '01/01/2004';

Queries
Logical Operators: AND, OR, and NOT

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 21225;

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE <> '01/01/2004';

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE (P_PRICE <> '01/01/2004')
OR V_CODE = 24288;

Queries: Special Operators

*

BETWEEN - used to define range limits.
*

IS NULL - used to check whether an attribute value is null
*

LIKE - used to check for similar character strings.
*

IN - used to check whether an attribute value matches a value contained within a (sub)set of listed values.
*

EXISTS - used to check whether an attribute has a value. In effect, EXISTS is the opposite of IS NULL.

Queries: Special Operators
BETWEEN is used to define range limits.

SELECT *
FROM PRODUCT
WHERE P_PRICE BETWEEN 10.00 AND 100.00;

SELECT * FROM PRODUCT
WHERE P_PRICE > 10.00 AND P_PRICE <>

MODIFY ( );

ALTER TABLE

ADD ( );

Changing a Column's Data Type

Probably illegal in DB2

ALTER TABLE PRODUCT MODIFY (V_CODE CHAR(5));
ALTER TABLE PRODUCT MODIFY (P_PRICE DECIMAL(9,2));

Adding a New Column to the Table

ALTER TABLE PRODUCT ADD column P_SALECODE CHAR(1) ;

Updating Data

UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_CODE = '1546-QQ2';
UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_CODE IN ('13-Q2/P2', '2232/QTY');
UPDATE PRODUCT SET P_SALECODE = '2' WHERE P_INDATE < '01/01/2004'; UPDATE PRODUCT SET P_SALECODE = '1' WHERE P_INDATE >= '01/01/2004'AND P_INDATE < '10/20/2004'; Copying Tables Copying table definitions and data CREATE TABLE NEWPRODUCT LIKE PRODUCT; INSERT INTO NEWPRODUCT SELECT * FROM PRODUCT; Copying Parts of Tables CREATE TABLE PART (PART_CODE CHAR(8) NOT NULL UNIQUE, PART_DESCRIPT CHAR(35), PART_PRICE DECIMAL(8,2), PRIMARY KEY(PART_CODE)); INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE) SELECT P_CODE, P_DESCRIPT, P_PRICE FROM PRODUCT; Deleting a Table from the Database DROP TABLE

;

DROP TABLE PART;

Primary and Foreign Key Designation

(Note we did these when we created the table)

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE);

ALTER TABLE PRODUCT ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;

ALTER TABLE PRODUCT ADD PRIMARY KEY (P_CODE) ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR;

Notes

Normalization

* What is Normalization?
* Why is its done?
* The normal forms 1NF, 2NF, 3NF, BCNF, 4NF
* Transforming normal forms
* E-R modeling and normalization
* Denormalization

Database Tables and Normalization

Normalization is a process for assigning attributes to entities to:

* Reduce data redundancies
o Help eliminate data anomalies
* Produce controlled redundancies to link tables
* No information is lost in normalization
* Result will be a database that can produce the same information as the original

Normalization Process

Normalizatin works through a series of stages called normal forms:

* First Normal form (1NF)
* Second normal form (2NF)
* Third normal form (3NF)
* etc (4th and 5th)

* 2NF is better than 1NF; 3NF is better than 2NF
* For most business database design purposes, 3NF is highest we need to go in the normalization process
* Highest level of normalization is not always most desirable

The Need for Normalization

Example: company that manages building projects

* Charges its clients by billing hours spent on each contract
* Hourly billing rate is dependent on employee�s position
* Periodically, a report is generated that contains information as follows

Sample Report

Table derived form Above

The Need for Normalization

* Structure of data set in Figure 5.1 does not handle data very well
* The table structure appears to work; report is generated with ease
* Unfortunately, the report may yield different results, depending on what data anomaly has occurred

Issues

* Table entries invite data inconsistencies
* Table displays potential data anomalies
o Update: Modifying JOB_CLASS
o Insertion: New Employee must be assigned project
o Deletion: If employee deleted, other vital data lost: if emp 103 leaves lose info on Elect Engineers

Repeating Group

Repeating group
Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence

* Relational table must not contain repeating groups
* Normalizing the table structure will reduce these data redundancies
* Normalization is three-step procedure

Converting to First Normal Form

A table in a relational database must be in 1NF.

* Repeating groups must be eliminated
* Primary key determined
o Uniquely identify attribute values (rows)
o All attributes dependent on primary key
o In example: Combination of PROJ_NUM and EMP_NUM

Dependencies

* Dependencies can be depicted with the help of a diagram
* Dependency diagram:
o Depicts all dependencies found within a given table structure
o Helpful in getting bird�s-eye view of all relationships among a table�s attributes
o Use makes it much less likely that an important dependency will be overlooked

* Desirable dependencies based on entire primary key
* Less desirable dependencies

Partial:
Based on part of composite primary key
Transitive:
One nonprime attribute depends on another nonprime attribute

Dependency Diagram

Fig 5.3: Dependency Diagram
1NF: Definition

* Tabular format in which:
o All key attributes are defined
o There are no repeating groups in the table
o All attributes are dependent on primary key
* All relational tables must satisfy 1NF requirements
* Some tables contain partial dependencies
o Dependencies based on only part of the primary key
o Sometimes used for performance reasons, but should be used with caution
* Still subject to data redundancies

Second Normal Form

1. Identify all key components
* Write each key component on separate line
* Write original key on last line
* Write dependent attributes after each key.
2. Each line will become a new table

Second Normal Form Conversion Results

Second Normal Form Defined

Table is in second normal form (2NF) if:

* It is in 1NF and
* It includes no partial dependencies:
* No attribute is dependent on only a portion of the primary key

Converting to Third Normal Form

* Resolve transitive dependencies (attributes dependent on non-key attributes)
* Create separate table for each transitive dependency

3NF Results

Boyce-Codd Normal Form

* Every determinant in the table is a candidate key
o Has same characteristics as primary key, but for some reason, not chosen to be primary key
* If a table contains only one candidate key, the 3NF and the BCNF are equivalent
* BCNF can be violated only if the table contains more than one candidate key

BCNF (cont)

* Most designers consider the Boyce-Codd normal form (BCNF) as a special case of 3NF
* A table is in 3NF if it is in 2NF and there are no transitive dependencies
o A table can be in 3NF and not be in BCNF
o A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute
o A nonkey attribute is the determinant of a key attribute

Table in 3nf but not BCNF

Decomposition to BCNF

Decomposition to BCNF
Fourth Normal Form

* Table is in 3NF
* Has no multiple sets of multivalued dependencies

Conversion to 4NF

* 4NF is largely academic if tables conform to the following two rules:
o All attributes are dependent on primary key but independent of each other
o No row contains two or more multivalued facts about an entity

Improving the Design

* Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
* Normalization cannot, by itself, be relied on to make good designs
* It is valuable because its use helps eliminate data redundancies

Improving the Design (cont)

The following changes were made:

* PK assignment
* Naming conventions
* Attribute atomicity
* Adding attributes
* Adding relationships
* Refining PKs
* Maintaining historical accuracy
* Dealing with derived attributes

Completed Database

Completed Database
Completed Database: Assign Table

5.6b: Assign Table
Completed Database: Employee

Final ERD for contracting company

Final ERD
Limitations on System Assigned Keys

* System-assigned primary key may not prevent confusing entries
* Data entries in Table 5.2 are inappropriate because they duplicate existing records
* Yet there has been no violation of either entity integrity or referential integrity
* Perhaps Job Description needs to be unique

Normalization and Database Design

* Normalization should be part of design process
* Make sure that proposed entities meet required normal form before table structures are created
* Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time
* You may be asked to redesign and modify existing databases

Normalization and Database Design (cont)

* E-R Diagram provides macro view, determines entities
* Normalization provides micro view of entities
o Focuses on characteristics of specific entities
o May yield additional entities
* Difficult to separate Normalization and ER diagramming
* Extra check: No attribute that is not a (primary/foreign) key should be repeated in the database (except to record historical data)

Denormalization

* Normalization is one of many database design goals
Normalized table requirements
o Additional processing
o Loss of system speed
* Normalization purity is difficult to sustain due to conflict in
o Design efficiency
o Information requirements
o Processing
* Do not be too quick to denormalize

Unnormalized Table Defects

* Data updates less efficient
* Indexing more cumbersome
* No simple strategies for creating views

Overnormalization

* This is done for performance reason frequently in distributed and clustered database systems
* Splits tables beyond the pont required for normal forms.
* In horizontal partitioning rows of a single logical table are split among several physical tables (e.g. geographically)
* In vertical partitioning a table is split vertically with commonly accessed columns in separate physical tables

Entity Relationship Modeling

* How relationships between entities are defined and refined, and how such relationships are incorporated into the database design process
* How ERD components affect database design and implementation
* How to interpret the modeling symbols for the four most popular ER modeling tools
* That real-world database design often requires that you reconcile conflicting goals

Entity Relationship Model and Diagram

* ER model forms the basis of an ER diagram
* ERD represents the conceptual database as viewed by end user
* ERDs depict the ER model�s three main components:
o Entities
o Attributes
o Relationships
* Several different diagramming conventions

Entities

* Refers to the entity set and not to a single entity occurrence
* Corresponds to a table and not to a row in the relational environment
* In both the Chen and Crow�s Foot models, an entity is represented by a rectangle containing the entity�s name
* Entity name, a noun, is usually written in capital letters

Attributes

* Characteristics of entities
* Domain is set of possible values
* Primary keys underlined

Attributes (cont)
Attributes (cont)

Simple
Cannot be subdivided
Age, sex, GPA
Composite
Can be subdivided
Address: street city state zip
Single-valued
Has only a single value
Social security number
Multi-valued
Can have many values
Person may have several college degrees
Derived
Can be calculated from other information
Age can be derived from D.O.B.

Multivalued Attributes

Resolving Multivalued Attribute Problems

Although the conceptual model can handle multivalued attributes, you should not implement them in the relational DBMS

* Within original entity, create several new attributes, one for each of the original multivalued attribute�s components
o Can lead to major structural problems in the table
* Create a new entity composed of original multivalued attribute�s components

Creating New Attributes

Creating New Entity Set

Relationships

* Associations between entities
* Established by Business Rules
* Connected entities termed participants
* Connectivity describes relationship classification:
o 1:1, 1:M, M:N
* Cardinality
o Number of entity occurences associated with one occurence of related entity

Connectivity and Cardinality in an ERD

Relationship Strength

* Existence Dependent
o Entity's existence depends on existence of another related entities
o Existence-independent entities can exist apart from related entities
o Employee claims Child
Child is dependent on employee
* Weak (non-identifying)
o One entity is existence-independent on another
o PK of dependent entity doesn't contain PK component of parent entity
o Book is somewhat confused on this
* Strong (identifying)
o One entity is existence-dependent on another
o PK of related entity contains PK component of parent entity

Relationship Participation

* Optional
o Entity occurrence does not require a corresponding occurrence in related entity
o Shown by drawing a small circle on side of optional entity on ERD
* Mandatory
o Entity occurrence requires corresponding occurrence in related entity
o If no optionality symbol is shown on ERD, it is mandatory

Weak Entity

* Existence-dependent on another entity
* Has primary key that is partially or totally derived from parent entity

Weak Entity
Mandatory Class Course relationship

Optional Class Entity in Professor Teaches Class

Degree of Relationship

A relationships degree indicates the number of associated entities.

Implementation of a Ternary Relationship

Composite Entity

* Used to replace M:N relationships with 1:N relationships
* Bridge entities composed of primary keys of each entity needing connection

Entity Subtypes and Supertypes
Generalization Hierarchy

* Depicts relationships between higher-level supertype and lower-level subtype entities
* Supertype has shared attributes
* Subtypes have unique attributes
* Disjoint relationships
o Unique subtypes
o Non-overlapping
o Indicated with a `G'
* Overlapping subtypes use `Gs' Symbol

Nulls Created by Unique Attributes

Generalization Hierarchy: Disjoint

Generalization Hierarchy: Overlapping and Disjoint

Generalization
Supertype/Subtype relationship in an ERD

Comparison of ER Modeling Symbols

ER Modeling Symbols
Developing an E-R Diagram

* Iterative Process
1. Develop general narrative of organizational operations
2. Draw Basic E-R Model
3. Modify E-R model to incorporate newly discovered components/relationships
* Repeat until designers and users agree E-R model comple

Dealing with Conflicting Goals in Database Design

* Database must be designed to conform to design standards
* High-speed processing may require design compromises
* Quest for timely information may be the focus of database design
Other concerns:
o Security
o Performance
o Shared access
o Integrity
o Capabilities of actual DBMS

The Relational Database Model

* Tthe relational database model takes a logical view of data
* The relational model's basic components are entities, attributes, and relationships among entities
* How entities and their attributes are organized into tables
* About relational database operators, the data dictionary, and the system catalog
* How data redundancy is handled in the relational database model
* Why indexing is important

A logical View of Data

* Relational model
o Enables us to view data logically rather than physically
o Reminds us of simpler file concept of data storage or spreadsheet concept of table
* Table
o Two-dimensional structure composed of rows and columns
o Has advantages of structural and data independence
o Resembles a file from conceptual point of view
o Easier to understand than its hierarchical and network database predecessors

Tables

* Table also called a relation because the relational model's creator, Codd, used this term.
* A relation (and a relational DBMS's table) differs from a table in that there is no default order of rows or columns
o Note SQL does use a column order but this is a limitation of SQL, not the model.
* Each row (tuple) represents a single entity in an entity set
* Each column represents an attribute and each must have a distinct name
* Each row/column intersection (cell) represents a single atomic value
* All values in a column must have same data format
* Each column has a specific range of values known as a domain
* Each row must have some combination of attributes that uniquely identifies it (key)

Student Table Example

Student Table
Keys

* A key consists of one or more attributes that determine other attributes
* Primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given entity (row)
* Key's role is based on determination
o If you know the value of attribute A, you can look up (determine) the value of attribute B

Types of Keys

Composite key
Key consisting of multiple attributes
Key attribute
Any attribute that is part of a key
Superkey
Any key that uniquely identifies each entity
The entire row (all attributes) is a superkey
Candidate key
A superkey without redundancies

Null Values

* Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A 'not applicable' condition
* Not permitted in Primary key attributes
* Not the same as 0 or ""
* Can create problems in logic and using formulas

Controlled Redundancy

* Makes the relational database work
* Tables within the database share common attributes that enable us to link tables together
* Multiple occurrences of values in a table are not redundant when they are required to make the relationship work
* Uncontrolled Redundancy is unnecessary duplication of data
* In general any values duplicated between tables should be part of a Foreign Key -- Primary key relation (see below)

Foreign Keys

Foreign key (FK)
An attribute whose values match primary key values in the related table
Referential integrity
FK contains a value that refers to an existing valid tuple (row) in another relation
Secondary key
Key used strictly for data retrieval purposes

Keys (Summary)
Key Type Definition
Superkey Any attribute or combination of attributes that uniquely identifies a row in the table
Candiate Key aka Minimal Superkey. A superkey that does not contain a subset of attributes that is itself a superkey
Primary Key The candidate key selected to uniquely identify all rows
Cannot contain null values
Foreign Key An attribute (or combination of attributes) in one table that must either match the primary key of another table or be null
Secondary Key An attribute or combination of attributes used to make data retrieval more efficient.
Integrity Rules
Entity Integrity

* All entities are unique
* Each entity has unique primary key

Referential Integrity

* Foreign key must match primary key of referenced table or be null
* Impossible to delete row whose primary key has matching foreign key in another table

Relational Algebra
Key operators:

* SELECT
* PROJECT
* JOIN

Other operators

* INTERSECT
* UNION
* DIFFERENCE
* PRODUCT
* DIVIDE

SELECT

Subset of rows based on condition

Select
PROJECT

Subset of columns

PRODUCT

All possible pairings of two tables

Product
JOIN

* Combines information from two or more tables
* Real power behind the relational database, allowing the use of independent tables linked by common attributes
* Join is actually a combination of Product, Select, and Project

Fig 3-11 Join tables
Natural JOIN

Links tables by selecting rows with common values in common attribute(s) (typically equality on foreign keys).

Three stage process:

* Product creates one table
* Select yields appropriate rows
* Project yields single copy of each attribute to eliminate duplicate columns

Natural Join: Step 1 Product

Natural Join: Select
Natural Join Step 2: Select

Natural Join: Step 2 Select
Natural Join Step 3: Project

Natural Join Step 3: Project
Natural Join: Outcome

* Final outcome yields table that
o Does not include unmatched pairs
o Provides only copies of matches
* If no match is made between the table rows,
o the new table does not include the unmatched row

Other Joins
EquiJOIN

* Links tables based on equality condition that compares specified columns of tables
* Does not eliminate duplicate columns
* Join criteria must be explicitly defined

Theta JOIN

* EquiJOIN that compares specified columns of each table

Outer JOIN

* Matched pairs are retained
* Unmatched values in other tables left null
* Right and left

Intersect

Difference

Yields all rows in one table not found in the other table�that is, it subtracts one table from the other

Union

Divide

Requires user of single-column table and two-column table (I have never seen a DIVIDE)

Divide
Data Dictionary and System Catalog
Data Dictionary

* Metadata
* Detailed account of all tables in database

System Catalog

* System-created database
* Stores database characteristics and contents
* The database stores its own description
* Tables can be queried just like any other tables
* Automatically produces database documentation

Relationships within Relational Database

Classified by cardinalities of entities participating

* 1:M
o Relational modeling ideal
o Should be the norm in any relational database design
* M:N
o Must be avoided because they lead to data redundancies
o Split into a pari of 1:M Relationships
* 1:1
o Should be rare

The 1:1 Relationship

* One entity can be related to only one other entity, and vice versa
* Often means that entity components were not defined properly
* Could indicate that two entities actually belong in the same table
* Sometimes 1:1 relationships are appropriate to help avoid nulls

E-R Diagram

* Graphical representation of model
* Chen
* Crows Feet

ERD Symbols

* Rectangles represent entities
* Chen:
o diamonds represent relations
o numbers or m and n are cardinalities
* Crows Foot
o Lines represent relationships
o bar crossing line is "1" side
o 3 pronged crows foot is many side

Example 1:M Relationship

ERD diagram 1:M Relationship
Implementation of 1:M Relationship

M:N Relationships

M:N Relationship ERD
M:N Relationship

* Cannot be directly implemented in a Relational DBMS

Student and Class Tables
Linking Table

* Implementation of a composite entity
* Yields required M:N to 1:M conversion
* Composite entity table must contain at least the primary keys of original tables
* Linking table contains multiple occurrences of the foreign key values
* Additional attributes may be assigned as needed

Composite ERD

Composite Entities
Linking Tables: Implementation

Linking Tables
Data Redundancy Revisited

Foreign keys control and reduce redundancy.

It is not that there is no redundancy in a properly designed database: the redundancy there is is controlled.
Indexes

Note: I do not consider indexes part of the relational model. They are an implementation artifact to improve performance

* Arrangement used to logically access rows in a table
* Index key
o Index�s reference point
o Points to data location identified by the key
* Unique index
o Index in which the index key can only have one pointer value (row) associated with it
* Each index is associated with only one table

Wednesday, November 5, 2008

Chapter 2: Data Models

Note There's minimal coverage of Hierarichcal and Network models as I consider them of historical importance only

Topics

* Business rules
* Data models
o About the basic data-modeling building blocks
o How the major data models evolved, and their advantages and disadvantages
o How data models can be classified by level of abstraction

Business Rules

* A policy, procedure, or principle within a specific organization's environment
* Applies to any organization that stores and uses data to generate information
* Description of operations that help to create and enforce actions within that organization's environment
* Sometimes are external to the organization
* Describe characteristics of the data as viewed by the company

Business Rules (cont)

* Must be rendered in writing
* Brief, precise, unambiguous
* Must be kept up to date
* Must be easy to understand and widely disseminated

Sources of Business Rules

* Company managers
* Policy makers
* Department managers
* Written documentation
o Procedures
o Standards
o Operations manuals
* Direct interviews with end users

Importance of Business Rules

* Promote creation of an accurate data model
* Standardize company's view of data
* Constitute a communications tool between users and designers
* Allow designer to understand the nature, role, and scope of data
* Allow designer to understand business processes
* Allow designer to develop appropriate relationship participation rules and constraints

The Importance of Data Models

* Good database design uses an appropriate data model as its foundation
* End-users have different views and needs for data
* Data model organizes data for various users

Data model
Relatively simple representation, usually graphical, of complex real-world data structures
Communications tool to facilitate interaction among the designer, the applications programmer, and the end user

Data Model Building Blocks

Entity
Anything about which data are to be collected and stored
Attribute
A characteristic of an entity (e.g. last name)
Relationship
an association among (two or more) entities

* One-to-many (1:M) relationship
* Many-to-many (M:N or M:M) relationship
* One-to-one (1:1) relationship

The Evolution of Data Models

* Hierarchical
* Network
* Relational
* Entity relationship
* Object oriented

Crucial Database Components

Schema
Conceptual organization of entire database as viewed by the database administrator
Subschema
Defines database portion "seen" by the application programs that actually produce the desired information from data contained within the database
Data Definition Language (DDL)
Define data characteristics and data structure in order to manipulate the data

DML Concepts

Schema Data Definition Language (DDL)
Enables database administrator to define schema components and relationships
Subschema DDL
Allows application programs to define database components that will be used
DML
Manipulates database contents

The Relational Model

* Developed by Codd (IBM) in 1970
* Considered ingenious but impractical in 1970
* Conceptually simple
* Computers lacked power to implement the relational model
* Today, microcomputers can run sophisticated relational database software

Relational Model: Basic Structure

* Relational Database Management System (RDBMS)
* Most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment

Relation (Table)
Matrix consisting of a series of row/column intersections
Related to each other by sharing a common entity characteristic
Relational schema
Representation of relational database's entities, attributes within those entities, and relationships between those entities
Represented as DDL or Visually

Alternative Terminology
Relational Common Traditional
relation table file
tuple row record
attribute cell field
instance value value
Relational Schema

Relational Schema
Linking Relational Tables

Fig 2.4 Linking Relations
Advantages of the Relational Model

* Structural independence
* Improved conceptual simplicity
* Easier database design, implementation, management, and use
* Ad hoc query capability
* Powerful database management system

Disadvantages of the Relational Model

* Substantial hardware and system software overhead
* May not fit all business models
* Can facilitate poor design and implementation
* May promote "islands of information" problems

The Entity Relationship Model

* Widely accepted and adapted graphical tool for data modeling
* Introduced by Chen in 1976
* Graphical representation of entities and their relationships in a database structure

Basic Structure of the Entity Relationship Model

* Entity relationship diagram (ERD)
o Uses graphic representations to model database components
o Entity is mapped to a relational table
* Entity instance (or occurrence) is row in table
* Entity set is collection of like entities
* Connectivity labels types of relationships

Relationships: the Basic Chen ERD

Note: we will be using mostly CrowsFoot notation

Fig 2.6 Chen ERD
Crow's Foot ERD

Fig 2.7 Crow's Foot ERD
Advantages of E.R. Model

* Exceptional conceptual simplicity
* Visual representation
* Effective communication tool
* Integrated with the relational data model

Disadvantages of E.R. Model

* Limited constraint representation
* Limited relationship representation
* No data manipulation language
* Loss of information content

The Object Oriented Model

* Semantic data model (SDM) developed by Hammer and McLeod in 1981
* Modeled both data and their relationships in a single structure known as an object
* Basis of object oriented data model (OODM)
* OODM becomes the basis for the object oriented database management system (OODBMS)

Object Oriented Model (cont)

* Object is described by its factual content
o Like relational model's entity
* Includes information about relationships between facts within object and relationships with other objects
o Unlike relational model's entity
* Subsequent OODM development allowed an object to also contain operations (aka methods)
* Object becomes basic building block for autonomous structures

Developments that Boosted OODM

* Growing costs put a premium on code reusability
* Complex data types and system requirements became difficult to manage with a traditional RDBMS
* Became possible to support increasingly sophisticated transaction & information requirements
* Ever-increasing computing power made it possible to support the large computing overhead required

OODM: Basic Structure

* Object: abstraction of a real-world entity
* Attributes describe the properties of an object
* Objects that share similar characteristics are grouped in classes
* Classes are organized in a class hierarchy
* Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of classes above it

Comparison of OO and ER Models

Fig 2.6 Comparison between OO and ER Model
OO Model Advantages

* Adds semantic content
* Visual presentation includes semantic content
* Database integrity
* Both structural and data independence

OO Model Disadvantages

* Slow pace of OODM standards development
* Complex navigational data access
* Steep learning curve
* High system overhead slows transactions
* Lack of market penetration
* Current relational DBMS's have many OO characteristics

Other Models: Extended Relational Data Model (ERDM)

* Semantic data model developed in response to increasing complexity of applications
* DBMS based on the ERDM often described as an object/relational database management system (O/RDBMS)
* Primarily geared to business applications

Date's Objection to ERDM

* Given proper support for domains, relational data models are quite capable of handling complex data
o Therefore, capability that is supposedly being extended is already there
* O/RDM label is not accurate because the relational data model's domain is not an object model structure

Data Models: A Summary

* Each new data model was intended to overcome the shortcomings of previous models
* Common characteristics:
o Conceptual simplicity without compromising the semantic completeness of the database
o Represent the real world as closely as possible
o Representation of real-world transformations (behavior) must be in compliance with consistency and integrity characteristics of any data model

Degrees of Abstraction

* Way of classifying data models
* Many processes begin at high level of abstraction and proceed to an ever-increasing level of detail
* Designing a usable database follows the same basic process

ANSI/SPARC Data Models

American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC)
Classified data models according to their degree of abstraction (1970s):

* Conceptual
* External
* Internal

Data Abstraction Levels

Data Abstraction Levels
The Conceptual Model

* Represents global view of the database
* Enterprise-wide representation of data as viewed by high-level managers
* Basis for identification and description of main data objects, avoiding details
* Most widely used conceptual model is the entity relationship (ER) model

Conceptual Model Tiny College

Advantages of Conceptual Model

* Provides a relatively easily understood macro level view of data environment
* Independent of both software and hardware
o Does not depend on the DBMS software used to implement the model
o Does not depend on the hardware used in the implementation of the model
o Changes in either the hardware or the DBMS software have no effect on the database design at the conceptual level

The Internal Model

* Representation of the database as �seen� by the DBMS
* Adapts the conceptual model to the DBMS
* Software dependent
* Hardware independent

The External Model

* End users� view of the data environment
* Requires that the modeler subdivide set of requirements and constraints into functional modules that can be examined within the framework of their external models
* Good design should:
o Consider such relationships between views
o Provide programmers with a set of restrictions that govern common entities

Division of Conceptual Model into External Models

Fig 2-13
Advantages of External Models

* Use of database subsets makes application program development much simpler
o Facilitates designer�s task by making it easier to identify specific data required to support each business unit�s operations
o Provides feedback about the conceptual model�s adequacy
* Creation of external models helps to ensure security constraints in the database design

External Models for Tiny College

The Physical Model

* Operates at lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes
* Software and hardware dependent
* Requires that database designers have a detailed knowledge of the hardware and software used to implement database design

Levels of Data Abstraction

Summary

* A good DBMS will perform poorly with a poorly designed database
* A data model is a (relatively) simple abstraction of a complex real-world data-gathering environment
* Basic data modeling components are
o Entities
o Attributes
o Relationships

Files and Databases

Chapter 1

File Systems and Databases

Database: Persistent collection of data and; Metadata (data about the characteristics of the data and relationships
of the data)
Database Management System: collection of programs to manage the database; Manages and enforces database structure; Includes interface to manipulate the data; Allows data to be shared among multiple users

Data vs Information

Data:
- Raw facts; building blocks of information
- Unprocessed information
Information:
- Data processed to reveal meaning
Accurate, relevant, and timely information is key to good decision making
Good decision making is key to survival in global environment

Importance of DBMS

Makes data management more efficient and effective
Query language allows quick answers to ad hoc queries
Provides easier access to more and better-managed data
Promotes an integrated view of organization�s operations
Reduces the chance of inconsistent data
Helps protect against loss of data

DBMS Manages Interface Between Data and Users

Database Design: Why Design is important

The database is the foundation of the information system.
Design should reflect the expected use
Poor design results in unwanted redundancy
Poor design leads to inconsistent data
Poor design leads to poor performance
Poor design leads to improper information systems operation

Historical Roots

First business computer applications focused on clerical tasks
Requests for information quickly followed
File systems developed to address needs
Data organized according to expected use
Data Processing (DP) specialists computerized manual file systems

File Systems

Figure 1.5

File System Data Management

Requires extensive programming, typically in Third Generation Language
(3GL)
Leads to islands of information and data redundancy
Difficult to make ad hoc queries to obtain information
Difficult to maintain data integrity

Data and Structural Dependence

Data characteristics are embodied in programs not stored with the data.
Changes in data characteristics requires modifying programs
Changes in file structures require modification of related programs

Data Redundancy

Different and possibly conflicting versions of same data
Results in problems during data:
- Modification (e.g. address changes)
- Insertion
- Deletion
Data inconsistency: Lack of integrity

Database Systems

Database consists of logically related data stored in a single repository
Advantages over file system management approach:
- Eliminates inconsistency, data anomalies, data dependency, and structural
  dependency problems
- Stores data structures, relationships, and access paths

Database vs. File Systems

Figure 1.6

Database System Environment

Database System Types

Scale
- Single User (desktop)
- Workgroup
- Enterprise
- Distributed or Federated
Use
- Production/Transaction
- Decision Support/Data Warehouse

Uses of Databases

Transactional (or production):
- Supports a company�s day-to-day operations
Data warehouse:
- Stores data used to generate information required to make tactical or strategic decisions
- Such decisions typically require �data massaging�
- Often used to store historical data
- Structure is quite different

DBMS Functions

Metadata/Data Dictionary Management
Data storage management
Data transformation and presentation
Security management and Multiuser access control
Backup and recovery management
Data integrity management
Database language and application programming interfaces
Database communication interfaces

Database Models

Collection of logical constructs used to represent data structure and relationships

Conceptual Models: logical nature of data representation
Implementation Models: how data are represented

Database Models

The hierarchical and network models are of historical interest only.

Database Models:

Relational
Entity-Relationship
Object oriented

Relational Model

Most common model
Perceived by user as collection of tables containing data
Actually has a formal definitin based on set theory
Tables are a series of row/column intersections
Tables related by sharing common entity characteristic(s)

Relational Database

Relational Tables

Relational Database Model Advantages

Structural independence
Improved conceptual simplicity
Easier database design, implementation, management, and use
Ad hoc query capability with SQL (standard interface)
Powerful database management system

Relational Database Model Disadvantages

Substantial hardware and system software overhead
Poor design and implementation is made easy
Not a cure all: May promote "islands of information" problems
SQL is not completely standardized. One DBMS is not a "drop in"
replacement for another.
May have problems storing some types of data

Entity Relationship Database Model

Primarily a database design tool.
Complements the relational data model concepts
Represented in an entity relationship diagram (ERD)
Based on entities, attributes, and relationships

ER-Diagram

Figure 01-14

ER Model Advantages

Conceptual simplicity
Visual representation
Effective communication tool
Integrated with the relational database model

ER Model Disadvatages

Limited constraint representation
Limited relationship representation
No data manipulation language
Loss of information content
May be overly complex for end users

Object-Oriented Model

Objects or abstractions of real-world entities are stored
Attributes describe properties
Collection of similar objects is a class
Methods represent real world actions of classes
Classes are organized in a class hierarchy
Objects inherit attributes and methods of classes above.

Object Oriented Model

Fig 1.15

OO Model Advantages

Adds semantic context
Structural and data independence
May mesh well with Object Oriented Programming

OO Model Disadvantages

Lack of standards in model
Lack of standard manipulation languages
Complex navigational data access
Steep learning curve
Poor performance

Thursday, September 11, 2008

Interview questions for Linux admin

Advantages/disadvantages of script vs compiled program.
Name a replacement for PHP/Perl/MySQL/Linux/Apache and show main differences. .Why have you choosen such a combination of products?
Differences between two last MySQL versions. Which one would you choose and when/why?
Main differences between Apache 1.x and 2.x. Why is 2.x not so popular? Which one would you choose and when/why?
Which Linux distros do you have experience with?
Which distro you prefer? Why?
Which tool would you use to update Debian / Slackware / RedHat / Mandrake / SuSE ?
You’re asked to write an Apache module. What would you do?
Which tool do you prefer for Apache log reports?
Your portfolio. (even a PHP guest book may work well)
What does ‘route’ command do?
Differences between ipchains and iptables.
What’s eth0, ppp0, wlan0, ttyS0, etc.
What are different directories in / for?
Partitioning scheme for new webserver. Why?

Unix sysadmin interview questions

1. How would you make the following SQL statement run faster? SELECT * FROM TABLEA WHERE COL1=’A’ AND COL2=’B'; A: Make sure that COL1 and COL2 have indexes.Find out which condition will return less values and use that as the first conditonal.
2. What is Data Mining A: Data Minig is the process of sifting through extremeley large amounts of Data to find trends or relevent information.
3. Name the Seven layers in the OSI Model. A: Appication, Presentation, Session, Transport, Network, Data Link, Phyiscal
4. What is one way to view a unix network share on a Windows computer, within explorer A: NFS, The Unix computer can be running a NFS Server Daemon.
5. How would you find all the processes running on your computer. A: Unix, is ps -ef or ps -aux depending on version.
6. What is DHCP A: DHCP is a way to dynamically assign IP address to computers. Dyanmic Host Configuration Protocol
7. What is HTTP Tunneling A: HTTP Tunneling is a security method that encryptes packets traveling throught the internet. Only the intended reciepent should be able to decrypt the packets. Can be used to Create Virtual Private Networks. (VPN)
8. Scenario: You have 9 identical looking balls, however one ball is heavier than the others. You have two chances to use a balance. How do you find out which ball is the heaviest? A: Split into groups of three, randomly choose two groups and use balance on them. If one group is heavier, then discard the other 6 balls. If the two groups are the same weight. The heavier ball must be in the group that was not on the scale. Now randomly choose two balls and test on balance. If they are the same weight, the heaviest ball is on one that was not tested. Else the heaviest ball is already known from the balance

Unix/Linux programming interview questions

1. What is the major advantage of a hash table? (Asked by Silicon Magic Corp. people) Answer: The major advantage of a hash table is its speed. Because the hash function is to take a range of key values and transform them into index values in such a way that the key values are distributed randomly across all the indices of a hash table.
2. What are the techniques that you use to handle the collisions in hash tables?(Asked by Silicon Magic Corp. people)
Answer: We can use two major techniques to handle the collisions. They are open addressing and separate chaining. In open addressing, data items that hash to a full array cell are placed in another cell in the array. In separate chaining, each array element consist of a linked list. All data items hashing to a given array index are inserted in that list.
3. In Unix OS, what is the file server? (Asked by Silicon Magic Corp. people)
Answer: The file server is a machine that shares its disk storage and files with other machines on the network.
4. What is NFS? What is its job?(Asked by Silicon Magic Corp. people)
Answer: NFS stands for Network File System. NFS enables filesystems physically residing on one computer system to be used by other computers in the network, appearing to users on the remote host as just another local disk.
5. What is CVS? List some useful CVS commands.(Asked by Silicon Magic Corp.people)
Anser: CVS is Concurrent Version System. It is the front end to the RCS revision control system which extends the notion of revision control from a collection of files in a single directory to a hierarchical collection of directories consisting of revision controlled files. These directories and files can be combined together to form a software release. There are some useful commands that are being used very often. They are
cvs checkout cvs update cvs add cvs remove cvs commit

Unix/Linux administration interview questions

1.What is LILO?
LILO stands for Linux boot loader. It will load the MBR, master boot record, into the memory, and tell the system which partition and hard drive to boot from.
2.What is the main advantage of creating links to a file instead of copies of the file?
A: The main advantage is not really that it saves disk space (though it does that too) but, rather, that a change of permissions on the file is applied to all the link access points. The link will show permissions of lrwxrwxrwx but that is for the link itself and not the access to the file to which the link points. Thus if you want to change the permissions for a command, such as su, you only have to do it on the original. With copies you have to find all of the copies and change permission on each of the copies.
3.Write a command to find all of the files which have been accessed within the last 30 days.
find / -type f -atime -30 > December.files
This command will find all the files under root, which is ‘/’, with file type is file. ‘-atime -30′ will give all the files accessed less than 30 days ago. And the output will put into a file call December.files.
4.What is the most graceful way to get to run level single user mode?
A: The most graceful way is to use the command init s.If you want to shut everything down before going to single user mode then do init 0 first and from the ok prompt do a boot -s.
5.What does the following command line produce? Explain each aspect of this line.
$ (date ; ps -ef awk ‘{print $1}’ sort uniq wc -l ) >> Activity.log
A: First let’s dissect the line: The date gives the date and time as the first command of the line, this is followed by the a list of all running processes in long form with UIDs listed first, this is the ps -ef. These are fed into the awk which filters out all but the UIDs; these UIDs are piped into sort for no discernible reason and then onto uniq (now we see the reason for the sort - uniq only works on sorted data - if the list is A, B, A, then A, B, A will be the output of uniq, but if it’s A, A, B then A, B is the output) which produces only one copy of each UID.
These UIDs are fed into wc -l which counts the lines - in this case the number of distinct UIDs running processes on the system. Finally the results of these two commands, the date and the wc -l, are appended to the file "Activity.log". Now to answer the question as to what this command line produces. This writes the date and time into the file Activity.log together with the number of distinct users who have processes running on the system at that time. If the file already exists, then these items are appended to the file, otherwise the file is created.

SQL Server interview questions

How do you read transaction logs?
How do you reset or reseed the IDENTITY column?
How do you persist objects, permissions in tempdb?
How do you simulate a deadlock for testing purposes?
How do you rename an SQL Server computer?
How do you run jobs from T-SQL?
How do you restore single tables from backup in SQL Server 7.0/2000? In SQL Server 6.5?
Where to get the latest MDAC from?
I forgot/lost the sa password. What do I do?
I have only the .mdf file backup and no SQL Server database backups. Can I get my database back into SQL Server?
How do you add a new column at a specific position (say at the beginning of the table or after the second column) using ALTER TABLE command?
How do you change or alter a user defined data type?
How do you rename an SQL Server 2000 instance?
How do you capture/redirect detailed deadlock information into the error logs?
How do you remotely administer SQL Server?
What are the effects of switching SQL Server from ‘Mixed mode’ to ‘Windows only’ authentication mode? What are the steps required, to not break existing applications?
Is there a command to list all the tables and their associated filegroups?
How do you ship the stored procedures, user defined functions (UDFs), triggers, views of my application, in an encrypted form to my clients/customers? How do you protect intellectual property?
How do you archive data from my tables? Is there a built-in command or tool for this?
How do you troubleshoot ODBC timeout expired errors experienced by applications accessing SQL Server databases?
How do you restart SQL Server service automatically at regular intervals?
What is the T-SQL equivalent of IIF (immediate if/ternary operator) function of other programming languages?
How do you programmatically find out when the SQL Server service started?
How do you get rid of the time part from the date returned by GETDATE function?
How do you upload images or binary files into SQL Server tables?
How do you run an SQL script file that is located on the disk, using T-SQL?
How do you get the complete error message from T-SQL while error handling?
How do you get the first day of the week, last day of the week and last day of the month using T-SQL date functions?
How do you pass a table name, column name etc. to the stored procedure so that I can dynamically select from a table?
Error inside a stored procedure is not being raised to my front-end applications using ADO. But I get the error when I run the procedure from Query Analyzer.
How do you suppress error messages in stored procedures/triggers etc. using T-SQL?
How do you save the output of a query/stored procedure to a text file?
How do you join tables from different databases?
How do you join tables from different servers?
How do you convert timestamp data to date data (datetime datatype)?
Can I invoke/instantiate COM objects from within stored procedures or triggers using T-SQL?
Oracle has a rownum to access rows of a table using row number or row id. Is there any equivalent for that in SQL Server? Or How do you generate output with row number in SQL Server?
How do you specify a network library like TCP/IP using ADO connect string?
How do you generate scripts for repetitive tasks like truncating all the tables in a database, changing owner of all the database objects, disabling constraints on all tables etc?
Is there a way to find out when a stored procedure was last updated?
How do you find out all the IDENTITY columns of all the tables in a given database?
How do you search the code of stored procedures?
How do you retrieve the generated GUID value of a newly inserted row? Is there an @@GUID, just like @@IDENTITY?

SQL Server, DBA interview questions

Questions are categorized under the following sections, for your convenience:

Database design (8 questions)

SQL Server architecture (12 questions)

Database administration (13 questions)

Database programming (10 questions)

Database design

What is normalization? Explain different levels of normalization?

Check out the article Q100139 from Microsoft knowledge base and of course, there’s much more information available in the net. It’ll be a good idea to get a hold of any RDBMS fundamentals text book, especially the one by C. J. Date. Most of the times, it will be okay if you can explain till third normal form.

What is denormalization and when would you go for it?

As the name indicates, denormalization is the reverse process of normalization. It’s the controlled introduction of redundancy in to the database design. It helps improve the query performance as the number of joins could be reduced.

How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables?

One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships. One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships. Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table. It will be a good idea to read up a database designing fundamentals text book.

What’s the difference between a primary key and a unique key?

Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn’t allow NULLs, but unique key allows one NULL only.

What are user defined datatypes and when you should go for them?

User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called Flight_Num which appears in many tables. In all these tables it should be varchar(8). In this case you could create a user defined datatype called Flight_num_type of varchar(8) and use it across all your tables. See sp_addtype, sp_droptype in books online.

What is bit datatype and what’s the information that can be stored inside a bit column?

Bit datatype is used to store boolean information like 1 or 0 (true or false). Untill SQL Server 6.5 bit datatype could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, bit datatype can represent a third state, which is NULL.

Define candidate key, alternate key, composite key.

A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys. A key formed by combining at least two or more columns is called composite key.

What are defaults? Is there a column to which a default can’t be bound?

A default is a value that will be used by a column, if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can’t have defaults bound to them. See CREATE DEFAULT in books online.

What is a transaction and what are ACID properties?

A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book. Explain different isolation levels An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level. Read Committed - A transaction operating at the Read Committed level cannot see changes made by other transactions until those transactions are committed. At this level of isolation, dirty reads are not possible but nonrepeatable reads and phantoms are possible. Read Uncommitted - A transaction operating at the Read Uncommitted level can see uncommitted changes made by other transactions. At this level of isolation, dirty reads, nonrepeatable reads, and phantoms are all possible. Repeatable Read - A transaction operating at the Repeatable Read level is guaranteed not to see any changes made by other transactions in values it has already read. At this level of isolation, dirty reads and nonrepeatable reads are not possible but phantoms are possible. Serializable - A transaction operating at the Serializable level guarantees that all concurrent transactions interact only in ways that produce the same effect as if each transaction were entirely executed one after the other. At this isolation level, dirty reads, nonrepeatable reads, and phantoms are not possible.

CREATE INDEX myIndex ON myTable(myColumn)What type of Index will get created after executing the above statement?

Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.

What’s the maximum size of a row?

8060 bytes. Don’t be surprised with questions like ‘what is the maximum number of columns per table’. 1024 columns per table. Check out SQL Server books online for the page titled: "Maximum Capacity Specifications". Explain Active/Active and Active/Passive cluster configurations Hopefully you have experience setting up cluster servers. But if you don’t, at least be familiar with the way clustering works and the two clusterning configurations Active/Active and Active/Passive. SQL Server books online has enough information on this topic and there is a good white paper available on Microsoft site. Explain the architecture of SQL Server This is a very important question and you better be able to answer it if consider yourself a DBA. SQL Server books online is the best place to read about SQL Server architecture. Read up the chapter dedicated to SQL Server Architecture.

What is lock escalation?

Lock escalation is the process of converting a lot of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure too many locks would mean, more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. Lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it’s dynamically managed by SQL Server.

What’s the difference between DELETE TABLE and TRUNCATE TABLE commands?

DELETE TABLE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won’t log the deletion of each row, instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back. TRUNCATE TABLE is functionally identical to DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE. The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table’s data, and only the page deallocations are recorded in the transaction log. TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. The counter used by an identity for new rows is reset to the seed for the column. If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement. You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint; instead, use DELETE statement without a WHERE clause. Because TRUNCATE TABLE is not logged, it cannot activate a trigger. TRUNCATE TABLE may not be used on tables participating in an indexed view

Explain the storage models of OLAP

Check out MOLAP, ROLAP and HOLAP in SQL Server books online for more infomation.

What are the new features introduced in SQL Server 2000 (or the latest release of SQL Server at the time of your interview)? What changed between the previous version of SQL Server and the current version?

This question is generally asked to see how current is your knowledge. Generally there is a section in the beginning of the books online titled "What’s New", which has all such information. Of course, reading just that is not enough, you should have tried those things to better answer the questions. Also check out the section titled "Backward Compatibility" in books online which talks about the changes that have taken place in the new version.

What are constraints? Explain different types of constraints.

Constraints enable the RDBMS enforce the integrity of the database automatically, without needing you to create triggers, rule or defaults. Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY. For an explanation of these constraints see books online for the pages titled: "Constraints" and "CREATE TABLE", "ALTER TABLE"

What is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. What are the advantages and disadvantages of this approach?

Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker. Indexes are of two types. Clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so do clustered indexes), with the leaf level nodes having the index key and it’s row locater. The row located could be the RID or the Clustered index key, depending up on the absence or presence of clustered index on the table. If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same t ime, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.

What is RAID and what are different types of RAID configurations?

RAID stands for Redundant Array of Inexpensive Disks, used to provide fault tolerance to database servers. There are six RAID levels 0 through 5 offering different levels of performance, fault tolerance. MSDN has some information about RAID levels and for detailed information, check out the RAID advisory board’s homepage

What are the steps you will take to improve performance of a poor performing query?

This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables. Some of the tools/ways that help you troubleshooting performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer. Download the white paper on performance tuning SQL Server from Microsoft web site. Don’t forget to check out sql-server-performance.com

What are the steps you will take, if you are tasked with securing an SQL Server?

Again this is another open ended question. Here are some things you could talk about: Preferring NT authentication, using server, databse and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc. Read the white paper on SQL Server security from Microsoft website. Also check out My SQL Server security best practices

What is a deadlock and what is a live lock? How will you go about resolving deadlocks?

Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other’s piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user’s process. A livelock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely. Check out SET DEADLOCK_PRIORITY and "Minimizing Deadlocks" in SQL Server books online. Also check out the article Q169960 from Microsoft knowledge base.

What is blocking and how would you troubleshoot it?

Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first. Read up the following topics in SQL Server books online: Understanding and avoiding blocking, Coding efficient transactions. Explain CREATE DATABASE syntax Many of us are used to creating databases from the Enterprise Manager or by just issuing the command: CREATE DATABAE MyDB.

But what if you have to create a database with two filegroups, one on drive C and the other on drive D with log on drive E with an initial size of 600 MB and with a growth factor of 15%?

That’s why being a DBA you should be familiar with the CREATE DATABASE syntax. Check out SQL Server books online for more information.

How to restart SQL Server in single user mode? How to start SQL Server in minimal configuration mode?

SQL Server can be started from command line, using the SQLSERVR.EXE. This EXE has some very important parameters with which a DBA should be familiar with. -m is used for starting SQL Server in single user mode and -f is used to start the SQL Server in minimal configuration mode. Check out SQL Server books online for more parameters and their explanations.

As a part of your job, what are the DBCC commands that you commonly use for database maintenance?

DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there are a whole load of DBCC commands which are very useful for DBAs. Check out SQL Server books online for more information.

What are statistics, under what circumstances they go out of date, how do you update them?

Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is more, as opposed to an index with non-unique values. Query optimizer uses these indexes in determining whether to choose an index or not while executing a query. Some situations under which you should update statistics: 1) If there is significant change in the key values in the index 2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated 3) Database is upgraded from a previous version. Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats

What are the different ways of moving data/databases between servers and databases in SQL Server?

There are lots of options available, you have to choose your option depending upon your requirements. Some of the options you have are: BACKUP/RESTORE, dettaching and attaching databases, replication, DTS, BCP, logshipping, INSERT…SELECT, SELECT…INTO, creating INSERT scripts to generate data.

Explain different types of BACKUPs avaialabe in SQL Server? Given a particular scenario, how would you go about choosing a backup plan?

Types of backups you can create in SQL Sever 7.0+ are Full database backup, differential database backup, transaction log backup, filegroup backup. Check out the BACKUP and RESTORE commands in SQL Server books online. Be prepared to write the commands in your interview. Books online also has information on detailed backup/restore architecture and when one should go for a particular kind of backup.

What is database replication? What are the different types of replication you can set up in SQL Server?

Replication is the process of copying/moving data between databases on the same or different servers. SQL Server supports the following types of replication scenarios: � Snapshot replication � Transactional replication (with immediate updating subscribers, with queued updating subscribers) � Merge replication See SQL Server books online for indepth coverage on replication. Be prepared to explain how different replication agents function, what are the main system tables used in replication etc.

How to determine the service pack currently installed on SQL Server?

The global variable @@Version stores the build number of the sqlservr.exe, which is used to determine the service pack installed. To know more about this process visit SQL Server service packs and versions.

What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?

Cursors allow row-by-row processing of the resultsets. Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information. Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, where as a normal SELECT query makes only one roundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (results in more IO operations). Further, there are restrictions on the SELECT statements that can be used with some types of cursors. Most of the times, set based operations can be used instead of cursors. Here is an example: If you have to give a flat hike to your employees using the following criteria: Salary between 30000 and 40000 — 5000 hike Salary between 40000 and 55000 — 7000 hike Salary between 55000 and 65000 — 9000 hike. In this situation many developers tend to use a cursor, determine each employee’s salary and update his salary according to the above formula. But the same can be achieved by multiple update statements or can be combined in a single UPDATE statement as shown below:

UPDATE tbl_emp SET salary = CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000 WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000 WHEN salary BETWEEN 55000 AND 65000 THEN salary + 10000 END

Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don’t have to use cursors for this. This can be achieved using WHILE loop, as long as there is a unique key to identify each row. For examples of using WHILE loop for row by row processing, check out the ‘My code library’ section of my site or search for WHILE. Write down the general syntax for a SELECT statements covering all the options. Here’s the basic syntax: (Also checkout SELECT in books online for advanced syntax).

SELECT select_list [INTO new_table_] FROM table_source [WHERE search_condition] [GROUP BY group_by_expression] [HAVING search_condition] [ORDER BY order_expression [ASC DESC] ]

What is a join and explain different types of joins.

Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table. Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS. For more information see pages from books online titled: "Join Fundamentals" and "Using Joins".

Can you have a nested transaction?

Yes, very much. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT

What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?

An extended stored procedure is a function within a DLL (written in a programming language like C, C++ using Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement. See books online to learn how to create extended stored procedures and how to add them to SQL Server. Yes, you can instantiate a COM (written in languages like VB, VC++) object from T-SQL by using sp_OACreate stored procedure. Also see books online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty, sp_OADestroy. For an example of creating a COM object in VB and calling it from T-SQL, see ‘My code library’ section of this site.

What is the system function to get the current user’s user id?

USER_ID(). Also check out other system functions like USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().

What are triggers? How many triggers you can have on a table? How to invoke a trigger on demand?

Triggers are special kind of stored procedures that get executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table. In SQL Server 6.5 you could define only 3 triggers per table, one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards, this restriction is gone, and you could create multiple triggers per each action. But in 7.0 there’s no way to control the order in which the triggers fire. In SQL Server 2000 you could specify which trigger fires first or fires last using sp_settriggerorder. Triggers can’t be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined. Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster. Till SQL Server 7.0, triggers fire only after the data modification operation happens. So in a way, they are called post triggers. But in SQL Server 2000 you could create pre triggers also. Search SQL Server 2000 books online for INSTEAD OF triggers. Also check out books online for ‘inserted table’, ‘deleted table’ and COLUMNS_UPDATED()

There is a trigger defined for INSERT operations on a table, in an OLTP system. The trigger is written to instantiate a COM object and pass the newly insterted rows to it for some custom processing. What do you think of this implementation? Can this be implemented better?

Instantiating COM objects is a time consuming process and since you are doing it from within a trigger, it slows down the data insertion process. Same is the case with sending emails from triggers. This scenario can be better implemented by logging all the necessary data into a separate table, and have a job which periodically checks this table and does the needful.

What is a self join? Explain it with an example.

Self join is just like any other join, except that two instances of the same table will be joined in the query. Here is an example: Employees table which contains rows for normal employees as well as managers. So, to find out the managers of all the employees, you need a self join.

CREATE TABLE emp ( empid int, mgrid int, empname char(10) )

INSERT emp SELECT 1,2,’Vyas’ INSERT emp SELECT 2,3,’Mohan’ INSERT emp SELECT 3,NULL,’Shobha’ INSERT emp SELECT 4,2,’Shridhar’ INSERT emp SELECT 5,2,’Sourabh’

SELECT t1.empname [Employee], t2.empname [Manager] FROM emp t1, emp t2 WHERE t1.mgrid = t2.empid Here’s an advanced query using a LEFT OUTER JOIN that even returns the employees without managers (super bosses)

SELECT t1.empname [Employee], COALESCE(t2.empname, ‘No manager’) [Manager] FROM emp t1 LEFT OUTER JOIN emp t2 ON t1.mgrid = t2.empid

Thursday, November 6, 2008

Wednesday, November 5, 2008

Chapter 1 File Systems and Databases

Data vs Information

Importance of DBMS

DBMS Manages Interface Between Data and Users

Database Design: Why Design is important

Historical Roots

File Systems

File System Data Management

Data and Structural Dependence

Data Redundancy

Database Systems

Database vs. File Systems

Database System Environment

Database System Types

Uses of Databases

DBMS Functions

Database Models

Database Models

Relational Model

Relational Database

Relational Database Model Advantages

Relational Database Model Disadvantages

Entity Relationship Database Model

ER-Diagram

ER Model Advantages

ER Model Disadvatages

Object-Oriented Model

Object Oriented Model

OO Model Advantages

OO Model Disadvantages

Thursday, September 11, 2008

Interview questions for Linux admin

Unix sysadmin interview questions

Unix/Linux programming interview questions

Unix/Linux administration interview questions

SQL Server interview questions

SQL Server, DBA interview questions

Blog Archive

About Me

Chapter 1

File Systems and Databases