Monday, March 28, 2011

RDBMS Fundamentals

RDBMS FUNDAMENTALS 

I. INTRODUCING DATABASES :

Concept of a Database :

Traditional Approach : In this approach, independent application programs access their own independent data files. This results in many problems in data storage and retrieval.
Database Approach : In this approach, all application access a common database, which is a centralized data storage system. This approach has the following advantages :
Redundancy of data storage is reduced, Inconsistency in data is eliminated & Data sharing between applications is possible.

Interacting with a Database :

Database Management System (DBMS) : DBMS is a software that interfaces between applications and a database for all data processing activities.
Users of a DBMS : End Users, Application Programmers and Database Administrators use a DBMS, either directly or indirectly.

How users interact with a Database :
1. End users send queries to the DBMS through applications.
2. The DBMS translates the queries.
3. The DBMS retrieves data from the database.
4. The DBMS sends data to the application, which present the data to the end users.

Functions of a DBMS :

Function of DBMS Description Provided by using
Defining the data structure Defining structure of data to be stored in database Data Definition Language (DDL)
Manipulating Data Retrieving, adding, modifying, deleting data. Data Manipulation Language (DML)
Data Security Preventing unauthorized access to data. User-ids and Passwords.
Control of Data Access Allowing users to use only relevant data Data Control Language (DCL)
Architecture of a Database :
Need for an Architecture : The details about complexity and structure of data in a database in not required by end-users. Therefore, differentiating what the end-users see and what is actually there in a database is important.
Architecture of a Database : The architecture of a database comprises a set of three levels at which a database can be viewed.
External Level or View, Conceptual Level or View & Internal Level or View.

II. USING RELATIONAL DATABASE :

Basics of Relational Database :
Relational Database Management System (RDBMS) : RDBMS is the most popular form of DBMS used in the world. It uses a relational database to organize data. A relational database comprise relations, which are represented as tables.

Relation : A relation stores information about an object in the real world. A relation is represented as a table.
Attribute : Each attribute of a relation stores a piece of information about an object. Attributes are represented as columns in a tables and can be arranged in any order. Each attribute in a relation is unique and contain atomic values i.e. Atomic value contain a single value of data and Non-Atomic values contain a set of values. The number of attributes in a relation is called the degree of the relation.
Tuple : A row in a table is called a tuple of the relation. The number of tuples in a relation is known as the cardinality of the relation. Tuples in a table are unique and can be arranged in any order.
Domain : A domain is a set of valid atomic values that an attribute can take. Within a single database, an attribute cannot have different domains associated with it. A domain can include a null value, if the value for the domain is unknown or does not exist.

Identifiers for Relations :
Primary Key : An attribute that uniquely identifies a row in a table is called its primary key. A relation can have only one primary key. The primary key cannot have any null values. In case no unique key is found in a relation, two or more attributes can be treated as the primary key. Such keys are called Composite Keys.

Candidate Key : A relation can have more than one attribute that uniquely identifies a tuple. Any one of these keys can be selected as the primary key. All such attributes are called Candidate Keys. All candidate keys that are not primary keys are called Alternate Keys.

Foreign Key : An attribute that is not a candidate key is called a Nonkey. A nonkey attribute of a relation whose value matches the primary key in some other table is called Foreign Key OR is a column in a table that uniquely identifies rows from a different table.

III. INTERPRETING DATA :

Entities and Relationships :

Entity : An entity is an object that exists in the real world and is distinguishable from other objects. Each entity is represented as a table in a relational database.

Types of Entities : Entities can be classified in two ways - based on existence and based on subsets.

Based on existence, entities can be classified as Dominant and Weak entities.
Based on subsets, entities can be classifies as Supertypes and Subtypes.

Relationships : A relationship is an association between two entities.

Types of Relationships : Relationships are classified into three types based on the occurrence of the related entities.

One-to-One(1-1), One-to-Many(1-M) & Many-to-Many(M-M).
Using E/R Diagram : A E/R diagram represent entities and relationships in a database system.
Reducing E/R Diagrams to Relations :

Mapping Entities : A dominant entity is mapped to a new relation. A weak entity is mapped to a new relation. The primary key of the corresponding dominant entity is included as the foreign key in the weak entity relation.

Supertypes and subtypes are mapped to separate relations. The primary key of the supertype becomes the primary key of the subtype.

Mapping Relationships : A 1-1 relationship is mapped using a foreign key. The primary key of either of the entities is include as a foreign key in the relation of the other entity. This relationship is rare, because data elements related in this way are normally placed in the same table.

A 1-M or M-1 is mapped by introducing a foreign key. A primary key is the "one" side of the relationship, and the foreign key is the "many" side of the relationship. This relationship are most common.

A M-M involves the creation of a new relation. M-M are problematic and cannot be adequately expressed directly in a relational db. It is expressed using intersection tables. An intersection table contains two (or more) foreign keys, relating the primary key values of two (or more) tables to each other. The role of an intersection table is to convert the M-M into two 1-M relationships that can be easily handled by the database.

IV. SIMPLIFYING DATA :
Need for Simplifying Data :

Normalization : Normalization is a formal process of developing data structures in a manner that eliminates redundancy and promotes integrity. You need to simplify structure of data in relations for easy storage and retrieval. The process of simplifying relations is called normalization. The new relations that are obtained after normalization are called normalized relations.

Normalization has three well defined steps :

The relations that you get at the end of the first step are said to be in 1NF.
The relations that you get at the end of the second step are said to be in 2NF.
The relations that you get at the end of the third step are said to be in 3NF.

Simplifying Data to 1NF (Eliminate Repeating Groups) : A repeating group is a set of columns that store similar info that repeats in the same table. To simplify data to 1NF, you ensure that all attributes values in a relation have atomic values. If there are attributes in a relation with non-atomic values, move these attributes to a new relation and choose an appropriate primary key for it. E.g. SupItem Table Item field having atomic.

Simplifying Data to 2NF (Eliminate Redundant Data) :

Redundant data is data that is expressed multiple times unnecessarily, or depends only on part of a multi-valued key.

Functionally Dependent Attributes : Functionally Dependent Attributes are those that belong to a single entity or relationship and depend on its unique identifier. To simplify data to 2NF, you ensure that all nonkey attributes in a relation are functionally dependent on the whole key and not part of the key.

Conversion from 1NF to 2NF : To convert a relation in 1NF to 2NF, move all nonkey attributes that are not wholly dependent on the primary key, to a new relation. Then, choose an appropriate primary key for the new relation. E.g. Separating Sup. table and Item table.

Simplifying Data to 3NF (Eliminate Columns not Dependent on the Key) :

Columns in each table should be a group of columns in which the data in each column contributes to the description of each row in the table.

Transitively Dependent Attributes : Transitively Dependent Attributes in a relation are those that are dependent on a nonkey attribute and not the primary key. To simplify data to 3NF, you ensure that there are no attributes in a relation that are transitively dependent on other attributes.

Conversion from 2NF to 3NF : To convert a relation in 2NF to 3NF, move all transitively dependent attributes to a new relation. Then, choose an appropriate primary key for the new relation. E.g. Status is dependent on City in Sup. table, so move those two to separate table.

Simplifying Data to 4NF (Isolate Independent Multiple Relationships) :

V. STORING & RETRIEVING DATA :
Language Support for an RDBMS :

SQL :SQL is the language that provides command to interact with the data in the database. SQL consists of three components - DDL, DML & DCL.

DDL : DDL comprises commands you can use to create and modify the database structure.
DML : DML comprises commands you can use to add, modify, delete and query data in the database.
DCL : DCL comprises commands you can use to control the user access to the database.

Organizing the Database :
Base Tables : A database comprises base tables, which have the following features :
They physically exist on the disk, Each of them has a unique name & they contain data that is crucial to an organization.

Their attributes have data types such as character, integer, decimal, date and time.
CREATE TABLE : This is a DDL command in SQL that creates a new table in a database.
Syntax : CREATE TABLE table-name (column-name data-type [[size]]
NOT NULL/DEFAULT default-value]]
CHECK (column-name > 0)
UNIQUE (column-name)
PRIMARY KEY (column-name)
FOREIGN KEY (column-name) REFERENCES table-name)
ALTER TABLE : This is a DDL command in SQL that modifies the structure of an existing table.
Syntax : ALTER TABLE table-name
ADD (column-name data-type [[size]] [[NOT NULL DEFAULT]]...)
primary key definition / foreign key definition
DROP PRIMARY KEY / DROP FOREIGN KEY)
DROP TABLE : This is DDL command in SQL that deletes the an existing table. Once you delete a table, all data contained in it is lost and cannot be recovered. The storage space used by this table is also released.
Syntax : DROP TABLE table-name
Interacting with a Database :
SELECT : This is a DML command in SQL that retrieves data from the database in the form of query results. The command supports the following keywords and clauses :
FROM This keyword specifies the name of the table.
* This keyword selects all the columns of the table.
WHERE This keyword gives the search condition that specifies the data to be retrieved.
AND This operator is used to combine two or more search conditions.
ORDER BY This keyword sorts the query result on one or more columns.
GROUP BY This keyword groups the query result and lets you generate summary result for each group.
NULL values This value indicates that the data is not present.
Subquery This is the query that is place inside the main query. It passes its query result to the main query.
INSERT : This is a DML command in SQL that you use to add data in rows of a table.
SYNTAX : INSERT INTO table-name (column-names) VALUES (constant/NULL)
UPDATE : This is a DML command in SQL that you use to change data on rows of a table.
Syntax : UPDATE table-name SET column-name-value WHERE condition
DELETE : This is a DML command in SQL that removes one or more rows of data from a table.
Syntax : DELETE FROM table-name WHERE condition.
End-user's View of a Database :
Views : Views are relations that are derived from one or more source tables. Views have the following features:
Views let you restrict the access to data so that end-users see data relevant to them.
Views do not physically exist in the database and only their definition is stored by an RDBMS.
An RDBMS accesses the source tables for data to be retrieved from a view.
Any changes that users make to views do not reflect in the source tables if the view has been created using a Join condition.

Views created WITH CHECK OPTION allows for an added measure of security in a view. For example, the user will not be able to insert or update a row that could not be selected by the view-with check option prevents this from happening.

CREATE VIEW : A view can be created using the CREATE VIEW command.
Syntax : CREATE VIEW view-name (column-names) AS query.
Retrieving Data from a View : Once you create a view, you can retrieve data from it using the SELECT command, just as you do for a table.

Restricting Access to a Database :
GRANT : This is a DCL command in SQL that you use to grant a specific set of authorities to one or more users.
Syntax : GRANT (SQL command) (column-names) ON table-name TO user-name.
REVOKE : This is a DCL command in SQL that you use to take away a specific set of authorities from one or more users.
Syntax : REVOKE (SQL command) ON table-name TO user-name.

VI. ENSURING INTEGRITY OF DATA :
The concept of Data Integrity :

Data Integrity : Data Integrity refers to the correctness and completeness of data in a database.
Integrity Constraints : Integrity constraints allows only correct changes to be made to a database. There are two types of integrity constraints - entity integrity and referential integrity.

Entity Integrity : Entity Integrity ensures that for each row in a table, the value of the primary key is unique and is not null.
Referential Integrity : Referential Integrity ensures that for each row in a table, the value of the foreign key is present in the reference table.

Grouping commands related to a task :

Transaction Processing : A transaction is a sequence of one or more SQL commands that together form a logical task. Transaction Processing ensures that when the RDBMS is making changes related to a single task, either all changes are made as a unit or no changes are made.

Commit : Commit is an SQL command that indicates the successful end of a transaction. After an RDBMS executes this command all the changes are made to the database.

Rollback : Rollback is an SQL command that cancels a transaction before it is complete. The rollback command removes the changes of all previous commands in a transaction from the buffer.
Controlling Concurrent Data Access :

Concurrency Control : All RDBMS must ensure that the transactions of concurrent users do not interfere with each other. If it does not handle the transactions properly, the problems of lost update, uncommitted data, or inconsistent data might occur.

Lost Update Problem : Lost update problem occurs when an update made by a transaction is lost due to an update made by another transaction.

Uncommitted Data Problem : Uncommitted data problem occurs when a transaction accesses data that has been updated by a previous transaction that has not yet ended.
Inconsistent Data Problem : Inconsistent data problem occurs when a transaction accesses data from the database and simultaneously another transaction is changing that data.
Locking : Locking is a facility provided by an RDBMS to ensure that a transaction does not interfere with any other transaction. Locking prevents the problem of lost update, uncommitted data and inconsistent data. An RDBMS provided two types of locks for locking a part of the database - shared locks and exclusive locks.
Shared Locks : If a transaction is only reading data from a database, it gets a shared lock on that part of the database. Other transactions can also get a shared lock on that part of the database to read data. However, they cannot change the data.

Exclusive Locks : If a transaction is updating data in a database, it gets an exclusive lock on that part of the database. No other transaction can read or change this data.

Client Server Computing Model


I. Client Server Paradigm :
Introduction : In the past decade, organizations have restructured and become global. The globalization and restructuring have led to a need for distributed and flexible information access. That traditional computing paradigms like host-based and master/slave processing do not adequately address the information requirements of modern business. That the client/server model provides an architecture that harness contemporary technology to meet the computing needs of the modern business organization. Also called Distributed Application Processing or Co-Operative Application Processing

Host Based Processing - The Centralized Paradigm : Centralized computing treated applications as an integrated unit. Applications ran on a single processor. The same processor generated the user interface and manipulated data. Dumb terminals were used for data access and display. Disadvantages are : Data was centralized and often not accessible to those who needed it. The host computed did all the work, was frequently overloaded, and response times were often poor. Hardware choices and scalability were limited by proprietary architecture's. User interfaces were unfriendly, Data access was inflexible and governed by available 3GL programs.

Master/Slave - The First Distributed Computing Paradigm : As PC's and intelligent terminals became available, a limited amount of processing was transferred to the terminal. Intelligent terminals often validated data and provided editing functions. Such terminal were called Slaves. Slaves were controlled by Master computer which did the main processing and stored all data. It had one distinct advantage. It reduced the load on the main processor. However, the other problems associated with host-based processing remained.

Client/Server - A Distributed Computing Paradigm : The client/server paradigm evolved as an attempt to address the new computing needs of business and utilize new technologies.

Advantages : Client/Server makes it possible to harness the computing power available on PCs and other workstations in an organization. Response times are improved as processors are not overloaded. Hardware and software choices can be application oriented as they do not necessarily have to run on a proprietary computer. Network traffic is reduced because the processing power available at the end user terminal makes it unnecessary to send detailed instructions over the network as in the case of host based & master/slave systems. Computing power can be added in smaller units making the system easily scaleable.
Disadvantages :The inherent heterogeneity makes System Integration, Administration & Maintenance a formidable challenge.

ORACLE 7 is an exceptional RDBMS & also an excellent DB server because it supports all major OS
for both clients & servers i.e. MSDOS, Netware, Unixware, OS/2 etc. Oracle Network software SQL*NET support all major network communication protocols TCP/IP, SPX/IPS, Named Pipes & DECNET. It has got client server feature that developers can use to minimize network traffic between clients & servers. Has features that makes it easy to administer a complicated client server system.

Client/Server Approach : The client/server paradigm optimizes the use of computer resources by distributing application processing among multiple processors. The client/server model computers are classified as Clients and Servers, where Clients are requesters of services and Servers are provider of services. Typically, clients handle user interface function & server handle data management function.

Client/Server Architecture requires Processing to be distributed over more than one computing system. Client-initiated client/server interactions. Server control over the services that can be requested by the client. Servers to arbitrate between conflicting client requests. A communication system that enables clients & servers to communicate.

Multitasking can be defines as the capability of an OS to run multiple applications concurrently. A Multitasking OS allocates a certain amount of CPU time to each task. In Preemptive multitasking, the OS controls the amount of CPU time allocated to a task. In non-preemptive multitasking, the application controls the CPU time and the CPU is released only after the task is completed.

Multithreading is an extension of multitasking. It allows multitasking within an application. Programs and subroutines within a program can execute concurrently in a multithreaded environment. Several user process for a single server process.

Database Server should have preemptive multitasking & multithreading capability. Support a standard RDBMS. Support a standard Network Operating System.

Tools such as RDBMS, Application Software, Application Program Interfaces, Stored Procedures, Remote Procedure Call (RPC) & Application Development Tools are an important part of client/server systems. Such tools improve productivity and also play a role in making client/server systems more open. An API is a set of functions that translates client requests into a format that the server can understand. RPC is essentially a messaging system which allows stored procedures to be invoked. Many RPCs allows procedures to be invoked across heterogeneous platforms and also provide the required data translation services.
There are several application designs possible in the client/server model depending on how application processing is distributed.

In Distributed Presentation, the presentation function is divided between the client & the server. Useful in situations where PC or workstations are connected to mainframes. Used to enhance the user interface of mainframe based applications.

In Remote Presentation, the entire presentation part of the application resides on a different computer than the one that has the logic function. Used in applications where user interaction is completely static and predetermined.

In Distributed Logic, the logic function is placed on more than one platform thus, improving response time by allowing the logic to execute simultaneously on several processors.

In Remote Data Management, the application logic resides on different computer than the one that has the data and the DBMS. Easy to implement and often provide end-users with totally transparent access to data.
In Distributed Data Management, the data and DBMS is distributed among multiple nodes and distribution of application logic.

Goals of Client/Server Paradigm is the end-user. Client/Server seeks to provide end-user transparent access to the computing resources of the organization. The goal is referred to as single system image. There are four attributes of single system image :

Location Transparency : Users must be able to access data without knowing the location of the data. Users should not have to learn & use different commands for accessing data from different locations.
Interoperability requires that applications and processing tasks be freely portable across heterogeneous computing resources.

Consistent User Interfaces require that applications retain the same user interface across heterogeneous computing platforms. Common computing tasks are represented consistently across applications.
Enterprise-wide Resource Sharing is the common thread that links all of an enterprise's computing resource

II. Concepts for Client/Server :

Introduction : RDBMS's standardize data storage and access and are therefore ideal for implementing client/server systems. The physical structure of a network is called Network Topology i.e. refers to the way the cabling of a network is physically structured. The rules that govern the process of data transmission in a network are collectively referred to as Network Protocol. Graphical User Interfaces improve productivity because they reduce learning time
and are easier to use.

Distributed Database Support : The capability of an RDBMS to manage databases lying at more than one location. To provide distributed database support, an RDBMS must be able to provide transparent access to data, Join tables on different platforms, Handle and manage distributed queries and Ensure that transactions are successfully completed on all relevant databases.

Network : Network has six basic functions, Naming, Segmentation, Segmenting, Flow Control, Synchronization, Priority & Error Control. There are three types of network topologies, Bus, Star & Ring. There are three types of transmission media, Twisted Pair, Coaxial Cable & Optic Fibre. There are three types of data transmission methods, Centralized, Distributed & Random.

GUIs : GUI must support mouse events, keyboard events, menu events, resizing events,
activation/deactivation events and initialize/terminate events. GUI should be portable, should support wide variety of development tools & be an industry standard.
III. Client/Server Software :

Back-end Software : It is made up of Database Servers & Network Operating Systems. The database server manages data on the system & maintains data integrity. Database server requires some special features. Compatibility i.e. must be able to work on different operating systems. SQL Implementation i.e. must support standard ANSI SQL since it can communicate with different SQL dialects. Stored Procedures i.e. must be able to use SP as they are analyzed, compiled and optimized. Referential Integrity i.e. allows the server to synchronize change to columns that are part of multiple tables. Declarative RI establishes precise rules for use of columns that are common across tables. Built into db software and enforced by default. In Procedural RI, each db command is associated with a trigger. When command is issued, the trigger sets of a series of commands that make the necessary changes. Disadvantage is programmers must write triggers leading to errors. Multithreading i.e. support execution of multiple tasks simultaneously. Distributed Database Support i.e. able to divide database tasks among multiple CPUs. Also, able to join tables located on different servers, & manage SQL queries sent to different servers. Concurrency Control i.e. support automatic escalation, the server locks a record, if a single record is being modified. A page, if several records are being modified. A table, if several pages are being modified. Transaction Control i.e. protect transactions from system failures. In two phase commit, all workstations are notified of database changes, and if system fails, all sites rollback.
NOS controls the transmission of data across the network and manages the sharing of hardware and software resource on the network. Four important features are Operating Environment Support, Workstation Support, Security & Protocol Support.

Development Tools are made up of SQL/3GL programming tools & Front-end development tools.
End User Tools are made up of Decision Support Tools and PC-based add-ons.

IV. Migrating to Client/Server :

Evaluating the Client/Server Option : To evaluate the following before deciding on client/server. Application requirements, Geographical requirements & Productivity Gains. Client/Server is most suitable for applications are decision support or on-line transaction processing (OLTP). The distances are site or city & both developer & end-user productivity are expected to increase.

Planning for Migration : Planning is needed to reduce problems with network load, training & systems maintenance. Migration plan must include analysis, selection, prototyping & implementation as activities. Planning must include end-users, developers and system administrators as resources. To specify system requirements for end-users, developers, system managers and the business as a whole. To evaluate business priorities to eliminate stagnation & disruption.

Implementing Client/Server : The Systems Integration Life Cycle (SILC) is made up of preparation, detailing and execution. Managing SILC projects involves planning, controlling & completing. System maintenance involves ensures reliability, ensuring serviceability and monitoring performance. The training must cover end-users, developers & system administrators.

No comments:

Post a Comment