A database management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.
The applications of database systems include
a) Banking
b) Airlines
c) Universities
d) Credit card transactions
e) Telecommunication
f) Finance
g) Sales
h) Manufacturing
i) Human resources
The disadvantages of file processing systems are
a) Data redundancy and inconsistency
b) Difficulty in accessing data
c) Data isolation
d) Integrity problems
e) Atomicity problems
f) Concurrent access anomalies
The advantages of using a DBMS are
a) Controlling redundancy
b) Restricting unauthorized access
c) Providing multiple user interfaces
d) Enforcing integrity constraints
e) Providing backup and recovery
The levels of data abstraction are
a) Physical level
b) Logical level
c) View level
Instance: The collection of data stored in the database at a particular moment is called an instance of the database.
Schema: The overall design of the database is called the database schema.
Physical schema: The physical schema describes the database design at the physical level, which is the lowest level of abstraction describing how the data are actually stored.
Logical schema: The logical schema describes the database design at the logical level, which describes what data are stored in the database and what relationships exist among the data.
The schemas at the view level are called subschemas that describe different views
of the database.
A data model is a collection of conceptual tools for describing data, data relationships,
data semantics and consistency constraints.
A storage manager is a program module that provides the interface between the low level data stored in a database and the application programs and queries submitted to the system.
The storage manager components include
a) Authorization and integrity manager
b) Transaction manager
c) File manager
d) Buffer manager
The storage manager is responsible for the following
a) Interaction with the file manager
b) Translation of DML commands into low-level file system commands
c) Storing, retrieving and updating data in the database
The storage manager implements the following data structures
a) Data files
b) Data dictionary
c) Indices
A data dictionary is a data structure which stores metadata about the structure of the database, i.e., the schema of the database.
The entity-relationship model is based on a collection of basic objects called entities and relationships among those objects. An entity is a thing or object in the real world that is distinguishable from other objects.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set.
Example: possible attributes of customer entity are customer name, customer id,
customer street, customer city.
A relationship is an association among several entities.
Example: A depositor relationship associates a customer with each account that he/she has.
Entity set: The set of all entities of the same type is termed as an entity set.
Relationship set: The set of all relationships of the same type is termed as a
relationship set.
Composite attributes can be divided into subparts.
In some cases a particular entity may not have an applicable value for an attribute, or the value of an attribute for a particular entity may not be known. In these cases a null value is used.
The degree of relationship type is the number of participating entity types.
Key attribute: An entity type usually has an attribute whose values are distinct for each individual entity in the collection. Such an attribute is called a key attribute.
Value set: Each simple attribute of an entity type is associated with a value set that specifies the set of values that may be assigned to that attribute for each individual entity.
Weak entity set: Entity sets that do not have key attributes of their own are called weak entity sets.
Strong entity set: An entity set that has a primary key is termed a strong entity set.
Mapping cardinalities or cardinality ratios express the number of entities to which another entity can be associated. Mapping cardinalities must be one of the following:
One to one
One to many
Many to one
Many to many
Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R.
Partial: If only some entities in E participate in relationships in R, the participation of entity set E in relationship set R is said to be partial.
DDL: A database schema is specified by a set of definitions expressed in a special language called a data definition language.
DML: A data manipulation language is a language that enables users to access or manipulate data as organized by the appropriate data model.
The relational model uses a collection of tables to represent both data and the relationships among those data. The relational model is an example of a record based model.
Attributes: column headers
Tuple: Row
A relation is a subset of a Cartesian product of a list of domains.
A tuple variable is a variable whose domain is the set of all tuples.
For each attribute there is a set of permitted values called the domain of that attribute.
Minimal super keys are called candidate keys.
Primary key is chosen by the database designer as the principal means of identifying an entity in the entity set.
A super key is a set of one or more attributes that collectively allows us to identify uniquely an entity in the entity set.
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as output.
The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek letter sigma (σ) to denote selection.
The project operation is a unary operation that returns its argument relation with certain attributes left out. Projection is denoted by pi (π).
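Example (a minimal Python sketch, not taken from the source notes): selection and projection applied to a relation held as a list of dictionaries. The account relation, its attribute names and the helper names select and project are assumptions made only for this illustration.

```python
# Hypothetical relation: one dict per tuple.
account = [
    {"account_no": "A-101", "branch": "Downtown", "balance": 500},
    {"account_no": "A-215", "branch": "Mianus", "balance": 700},
    {"account_no": "A-102", "branch": "Downtown", "balance": 400},
]


def select(relation, predicate):
    """Selection (sigma): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection (pi): keep the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        reduced = {a: t[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


downtown = select(account, lambda t: t["branch"] == "Downtown")
branches = project(account, ["branch"])
```

Projection removes duplicates because a relation is a set of tuples, so the two Downtown accounts contribute a single branch tuple.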
A functional dependency is a constraint between two sets of attributes from the database. A functional dependency, denoted by X -> Y, between two sets of attributes X and Y that are subsets of R = {A1, A2, …, An} specifies a constraint on the possible tuples that can form a relation instance r of R: for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
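Example (a hedged Python sketch, not part of the original notes): testing whether a functional dependency X -> Y holds in one relation instance. The emp relation and the function name fd_holds are hypothetical. A dependency that holds in one instance is not thereby proven for the schema; an instance can only disprove it.

```python
def fd_holds(relation, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance.

    Tuples that agree on every attribute in lhs must also agree
    on every attribute in rhs.
    """
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample instance.
emp = [
    {"eno": 1, "ename": "Smith", "dno": 5},
    {"eno": 2, "ename": "Wong", "dno": 5},
    {"eno": 3, "ename": "Zelaya", "dno": 4},
]

eno_determines_ename = fd_holds(emp, ["eno"], ["ename"])
dno_determines_ename = fd_holds(emp, ["dno"], ["ename"])
```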
Normalization of data is a process during which unsatisfactory relation schemas are decomposed by breaking up their attributes into smaller relation schemas that possess desirable properties.
1NF states that the domains of attributes must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. It disallows multivalued attributes, composite attributes and their combinations.
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key. A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X - {A}) does not functionally determine Y.
A relation schema R is in 3NF if it is in 2NF and no nonprime attribute of R is transitively dependent on the primary key. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.
A relation schema R is in BCNF if whenever a nontrivial functional dependency X -> A holds in R, then X is a superkey of R. The only difference between BCNF and 3NF is that the condition of 3NF which allows A to be prime if X is not a superkey is absent from BCNF.
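Example (a sketch under stated assumptions, not from the source notes): the BCNF condition can be tested by computing attribute closures. The schema (student, course, instructor), its two dependencies and the function names are hypothetical. The check looks only at the dependencies that are given, which is enough when testing the original schema R itself.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)


def bcnf_violations(schema, fds):
    """Nontrivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not is_superkey(lhs, schema, fds)
    ]


# Hypothetical schema and dependencies.
schema = ("student", "course", "instructor")
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]
violations = bcnf_violations(schema, fds)
```

Here instructor -> course violates BCNF because instructor alone is not a superkey, although the relation is in 3NF since course is a prime attribute.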
Multivalued dependencies are a consequence of 1NF, which disallows an attribute in a tuple from having a set of values. A multivalued dependency X ->> Y specified on relation schema R, where X and Y are subsets of R, specifies the following constraint on any relation r of R:
If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where Z = R - (X ∪ Y):
a) t3[X] = t4[X] = t1[X] = t2[X]
b) t3[Y] = t1[Y] and t4[Y] = t2[Y]
c) t3[Z] = t2[Z] and t4[Z] = t1[Z]
The lossless join property, or nonadditive join property, ensures that no spurious tuples (tuples containing wrong information) are generated when a natural join operation is applied to the relations in the decomposition.
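Example (a minimal Python sketch with hypothetical data, not from the source notes): a decomposition that does not have the lossless join property. Projecting the relation onto (ename, location) and (project, location) and joining the projections back produces spurious tuples, because location determines neither of the other attributes.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(left, right):
    """Join rows that agree on all attributes they have in common."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


# Hypothetical relation instance.
emp = [
    {"ename": "Smith", "project": "P1", "location": "Chennai"},
    {"ename": "Wong", "project": "P2", "location": "Chennai"},
]

part1 = project(emp, ["ename", "location"])
part2 = project(emp, ["project", "location"])
rejoined = natural_join(part1, part2)
spurious = [t for t in rejoined if t not in emp]
```

The rejoined relation pairs Smith with P2 and Wong with P1, which were never in the original relation.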
Boyce-Codd normal form: It is stricter than 3NF, meaning that every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. A relation is in BCNF if and only if every determinant is a candidate key, i.e., a relation schema R is in BCNF if whenever a nontrivial functional dependency X -> A holds in R, then X is a superkey of R.
A multivalued dependency X ->> Y in R is called a trivial MVD if
a) Y is a subset of X, or
b) X ∪ Y = R
e.g., the MVD Ename ->> Pname is trivial in a relation whose only attributes are Ename and Pname.
A multivalued dependency X ->> Y in R is called a nontrivial MVD if it satisfies neither of the conditions for a trivial MVD (Y is a subset of X, or X ∪ Y = R).
Ename | Eno | Dob | Dno | Dname | Dmgrno
dependencies.
A functional dependency of the form α -> β is trivial if β ⊆ α. Trivial functional dependencies are satisfied by all relations.
UNIT II
SQL & QUERY OPTIMIZATION
The SQL language has several parts:
SQL commands are divided in to the following categories:
1. Data Definition Language
2. Data Manipulation Language
3. Data Query Language
4. Data Control Language
5. Data Administration Statements
6. Transaction Control Statements
An SQL expression consists of three clauses:
Select A1, A2, …, An
From R1, R2, …, Rm
Where P
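Example (a hedged sketch, not from the source notes): a select-from-where query run through Python's standard sqlite3 module on an in-memory database. The works table follows the schema used in Part B; the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table works (empname text, companyname text, salary integer)"
)
# Hypothetical sample rows.
conn.executemany(
    "insert into works values (?, ?, ?)",
    [
        ("Arun", "First Bank Corporation", 250000),
        ("Beena", "First Bank Corporation", 180000),
        ("Chitra", "Small Bank Corporation", 300000),
    ],
)

# select lists the attributes, from names the relation, where gives predicate P.
rows = conn.execute(
    "select empname, salary from works "
    "where companyname = ? and salary > ? "
    "order by empname",
    ("First Bank Corporation", 200000),
).fetchall()
```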
The rename operation is used to rename both relations and attributes. It uses the as clause, taking the form:
Old-name as new-name
The string operations in SQL include
1) Pattern matching
2) Concatenation
3) Extracting character strings
4) Converting between uppercase and lowercase letters
The set operations in SQL are
1) Union
2) Intersect operation
3) The except operation
Union: The result of this operation includes all tuples that are either in r1 or in r2 or in both r1 and r2. Duplicate tuples are automatically eliminated.
Intersection: The result of this operation includes all tuples that are in both r1 and r2.
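Example (a minimal sketch with hypothetical tables, not from the source notes): the union, intersect and except operations run through Python's sqlite3 module. The depositor and borrower tables and the helper names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table depositor (customer_name text);
    create table borrower (customer_name text);
    insert into depositor values ('Hayes'), ('Johnson'), ('Smith');
    insert into borrower values ('Smith'), ('Jones'), ('Hayes');
    """
)


def names(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


either = names(
    "select customer_name from depositor "
    "union select customer_name from borrower"
)
both = names(
    "select customer_name from depositor "
    "intersect select customer_name from borrower"
)
only_depositor = names(
    "select customer_name from depositor "
    "except select customer_name from borrower"
)
```

Hayes and Smith appear in both tables but only once in the union, since duplicates are eliminated.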
Aggregate functions are functions that take a collection of values as input and return a single value.
Aggregate functions supported by SQL are
a) avg
b) min
c) max
d) sum
e) count
Group by clause is used to apply aggregate functions to a set of tuples. The attributes given in the group by clause are used to form groups. Tuples with the same value on all attributes in the group by clause are placed in one group.
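Example (a hedged sketch, not from the source notes): an aggregate function applied per group through Python's sqlite3 module. The account table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (account_no text, branch_name text, balance integer);
    insert into account values
        ('A-101', 'Downtown', 500),
        ('A-102', 'Downtown', 400),
        ('A-215', 'Mianus', 700);
    """
)

# Tuples with the same branch_name form one group; avg is computed per group.
avg_by_branch = conn.execute(
    "select branch_name, avg(balance) from account "
    "group by branch_name order by branch_name"
).fetchall()
```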
A subquery is a select-from-where expression that is nested within another query. A common use of subqueries is to perform tests for set membership, make set comparisons, and determine set cardinality.
Any relation that is not part of the logical model, but is made visible to a user as a virtual relation is called a view.
The create view command is
Create view v as <query expression>
The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.
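Example (a minimal sketch with hypothetical names, not from the source notes): a stored view and a with clause, both run through Python's sqlite3 module. The view downtown_account stays defined in the database, while max_balance exists only inside the one query that declares it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (account_no text, branch_name text, balance integer);
    insert into account values
        ('A-101', 'Downtown', 500),
        ('A-102', 'Downtown', 400),
        ('A-215', 'Mianus', 700);
    create view downtown_account as
        select account_no, balance from account
        where branch_name = 'Downtown';
    """
)

# The view is queried like an ordinary relation.
view_rows = conn.execute(
    "select account_no from downtown_account order by account_no"
).fetchall()

# The with clause defines a temporary view local to this query.
richest = conn.execute(
    """
    with max_balance(amount) as (select max(balance) from account)
    select account_no from account, max_balance
    where account.balance = max_balance.amount
    """
).fetchall()
```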
A transaction is a unit of program execution that accesses and possibly updates various data items.
SQL supports the following domain types:
1) char(n)
2) varchar(n)
3) int
4) numeric(p,d)
5) float(n)
6) date
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus integrity constraints guard against accidental damage to the database.
Triggers are statements that are executed automatically by the system as the side effect of a modification to the database.
A domain is a set of values that may be assigned to an attribute. All values that appear in a column of a relation must be taken from the same domain.
Referential integrity ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.
An assertion is a predicate expressing a condition that we wish the database always to satisfy.
Create assertion <assertion name> check <predicate>
Triggers are useful mechanisms for alerting humans or for starting certain tasks automatically when certain conditions are met.
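Example (a hedged sketch, not from the source notes): a trigger that records an alert automatically when an account is overdrawn, using SQLite's trigger syntax through Python's sqlite3 module. The tables, the trigger name note_overdraft and the overdraft rule are hypothetical; trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (account_no text primary key, balance integer);
    create table overdraft_alert (account_no text, balance integer);

    -- Runs automatically after any update that leaves a negative balance.
    create trigger note_overdraft
    after update of balance on account
    when new.balance < 0
    begin
        insert into overdraft_alert values (new.account_no, new.balance);
    end;

    insert into account values ('A-101', 100);
    """
)

# The application only updates the balance; the alert row is a side effect.
conn.execute(
    "update account set balance = balance - 150 where account_no = 'A-101'"
)
alerts = conn.execute(
    "select account_no, balance from overdraft_alert"
).fetchall()
```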
The requirements are
Database security refers to the protection from unauthorized access and malicious destruction or alteration.
Passing of authorization from one user to another can be represented by an authorization graph.
An audit trail is a log of all changes to the database along with information such as which user performed the change and when the change was performed.
application code.
declaratively in SQL makes it hard to ensure the absence of loopholes.
Improving the strategy for processing a query is called “query optimization”. It is the responsibility of the system to transform the query as entered by the user into an equivalent query which can be computed more efficiently.
Query processing refers to the range of activities involved in extracting data from a database.
The basic steps are:
a) Parsing and translation
b) Optimization
c) Evaluation
A relational algebra operation annotated with instructions on how to evaluate is called an evaluation primitive.
A sequence of primitive operations that can be used to evaluate a query is a query evaluation plan or a query execution plan.
The query execution engine takes a query evaluation plan, executes that plan, and returns the answers to the query.
UNIT III
TRANSACTION PROCESSING AND CONCURRENCY CONTROL
Collections of operations that form a single logical unit of work are called transactions.
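The two statements regarding a transaction are of the form:
a) Begin transaction
b) End transaction
The properties of transactions are:
a) Atomicity
b) Consistency
c) Isolation
d) Durability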
Ensuring durability is the responsibility of a software component of the database system called the recovery-management component.
Any changes that the aborted transaction made to the database must be undone. Once the changes caused by an aborted transaction have been undone, then the transaction has been rolled back.
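Example (a minimal sketch with a hypothetical account table, not from the source notes): rolling back an aborted transaction with Python's sqlite3 module. The failure is simulated with an exception raised between the debit and the matching credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_no text primary key, balance integer)"
)
conn.execute("insert into account values ('A', 100), ('B', 50)")
conn.commit()

try:
    # First half of a transfer: debit account A.
    conn.execute(
        "update account set balance = balance - 70 where account_no = 'A'"
    )
    # Simulated failure before account B is credited.
    raise RuntimeError("simulated failure")
except RuntimeError:
    # Undo every change made by the aborted transaction.
    conn.rollback()

balances = dict(conn.execute("select account_no, balance from account"))
```

After the rollback the debit is gone, so the database is back in the state it had before the transaction started.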
The states of a transaction are
a) Active
b) Partially committed
c) Failed
d) Aborted
e) Committed
The shadow-copy scheme is a simple but inefficient scheme. It is based on making copies of the database, called shadow copies, and assumes that only one transaction is active at a time. The scheme also assumes that the database is simply a file on disk.
The reason for allowing concurrency is that if transactions run serially, a short transaction may have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. Concurrent execution reduces these unpredictable delays.
The average response time is the average time for a transaction to be completed after it has been submitted.
The two types of serializability are
a) Conflict serializability
b) View serializability
The most common method used to implement mutually exclusive access to data items is to allow a transaction to access a data item only if it is currently holding a lock on that item.
The modes of lock are:
a) Shared
b) Exclusive
A situation in which neither of the transactions can ever proceed with its normal execution is called deadlock.
locks.
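Conversion of a lock from shared to exclusive mode is known as upgrade.
Conversion of a lock from exclusive to shared mode is known as downgrade.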
The partial ordering implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
The two methods for dealing with the deadlock problem are deadlock prevention, and deadlock detection combined with deadlock recovery.
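Example (a sketch under a stated assumption, not from the source notes): one common way to detect deadlock is to look for a cycle in a wait-for graph, where an edge from T1 to T2 means T1 is waiting for a lock held by T2. The notes above do not name this structure; the graph representation and the function name has_deadlock are assumptions for illustration.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(wait_for) | {v for vs in wait_for.values() for v in vs}
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in wait_for.get(n, ()):
            if color[m] == GREY:
                return True  # back edge: a cycle of waiting transactions
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


# T1 waits for T2 and T2 waits for T1: neither can proceed.
deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# A chain of waits without a cycle will eventually clear.
not_deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T3"]})
```

Once a cycle is found, recovery typically means rolling back one of the transactions on the cycle.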
An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure.
The two types of errors are:
a) Logical error
b) System error
The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records.
The immediate-modification technique allows database modifications to be output to the database while the transaction is still in the active state. Data modifications written by active transactions are called uncommitted modifications.
An alternative to log-based crash recovery techniques is shadow paging. Under certain circumstances, this technique may require fewer disk accesses than do the log-based methods.
The database is partitioned into some number of fixed-length blocks, which are referred to as pages.
protocol.
transaction commits.
UNIT IV
TRENDS IN DATABASE TECHNOLOGY
The storage types are:
The database system resides permanently on nonvolatile storage, and is partitioned into fixed-length storage units called blocks.
The input and output operations are done in block units. The blocks residing on the disk are referred to as physical blocks.
The blocks residing temporarily in main memory are referred to as buffer blocks.
The area of memory where blocks reside temporarily is called the disk buffer.
Garbage may be created also as a side effect of crashes. Periodically, it is necessary to find all the garbage pages and to add them to the list of free pages. This process is called garbage collection.
An index is a structure that helps to locate desired records of a relation quickly, without examining all records.
Query optimization refers to the process of finding the lowest-cost method of evaluating a given query.
If the controller detects that a sector is damaged when the disk is initially formatted, or when an attempt is made to write the sector, it can logically map the sector to a different physical location.
Access time is the time from when a read or write request is issued to when data transfer begins.
The time for repositioning the arm is called the seek time, and it increases with the distance that the arm must move.
The average seek time is the average of the seek times, measured over a sequence of random requests.
The time spent waiting for the sector to be accessed to appear under the head is called the rotational latency time.
The average latency time of the disk is one-half the time for a full rotation of the disk.
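Example (a small worked calculation in Python, not from the source notes): the average latency follows directly from the rotation speed. The figure of 7200 rotations per minute is an assumed value chosen only for illustration.

```python
def average_latency_ms(rpm):
    """Average rotational latency: half the time of one full rotation."""
    full_rotation_ms = 60_000 / rpm  # 60,000 ms in one minute
    return full_rotation_ms / 2


# Assumed disk speed of 7200 rpm: one rotation takes about 8.33 ms,
# so the average latency is about 4.17 ms.
latency = average_latency_ms(7200)
```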
The data-transfer rate is the rate at which data can be retrieved from or stored to the disk.
The mean time to failure is the amount of time that the system could run continuously without failure.
A block is a contiguous sequence of sectors from a single track of one platter. Each request specifies the address on the disk to be referenced. That address is in the form of a block number.
File systems that support log disks are called journaling file systems.
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), are used to improve performance and reliability.
The simplest approach to introducing redundancy is to duplicate every disk. This technique is called mirroring or shadowing.
The mean time to repair is the time it takes to replace a failed disk and to restore the data on it.
Data striping consists of splitting the bits of each byte across multiple disks. This is called bit-level striping.
Block-level striping stripes blocks across multiple disks. It treats the array of disks as a large disk, and gives blocks logical numbers.
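Example (a sketch of one common textbook convention, not stated in the source notes): with an array of n disks, logical block i is placed on disk (i mod n) + 1, at physical block floor(i / n) of that disk. The function name locate_block and the numbering of disks from 1 are assumptions.

```python
def locate_block(i, n):
    """Map logical block i to (disk number, physical block) on n disks.

    Disks are numbered from 1; logical and physical blocks from 0.
    """
    disk = (i % n) + 1
    physical_block = i // n
    return disk, physical_block


# Logical blocks 0..7 spread round-robin over a 4-disk array.
layout = [locate_block(i, 4) for i in range(8)]
```

Consecutive logical blocks land on different disks, so a large sequential read can fetch from all n disks in parallel.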
RAID can be implemented with no change at the hardware level, using only software modification. Such RAID implementations are called software RAID systems and the systems with special hardware support are called hardware RAID systems.
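Hot swapping permits the removal of faulty disks and their replacement by new ones without turning the power off. Hot swapping reduces the mean time to repair.
The slotted-page structure is used for organizing records within a single block. The header contains the following information:
a) The number of record entries in the header
b) The end of free space in the block
c) An array whose entries contain the location and size of each record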
Anchor block: Contains the first record of a chain.
Overflow block: Contains the records other than those that are the first record of a chain.
In the heap file organization, any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. There is a single file for each relation.
In the sequential file organization, the records are stored in sequential order, according to the value of a “search key” of each record.
In the hashing file organization, a hash function is computed on some attribute of each record. The result of the hash function specifies in which block of the file the record should be placed.
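Example (a minimal sketch, not from the source notes): placing records into blocks (buckets) by hashing one attribute. The use of zlib.crc32 as the hash function, the record layout and the function names are assumptions; a real system would choose its own hash function.

```python
import zlib


def bucket_for(search_key, num_buckets):
    """Hash the key value and reduce it to a bucket number."""
    return zlib.crc32(search_key.encode("utf-8")) % num_buckets


def build_hash_file(records, key, num_buckets):
    """Place each record in the bucket chosen by hashing its key attribute."""
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        buckets[bucket_for(record[key], num_buckets)].append(record)
    return buckets


# Hypothetical records hashed on branch_name into 4 buckets.
records = [
    {"account_no": "A-101", "branch_name": "Downtown"},
    {"account_no": "A-102", "branch_name": "Downtown"},
    {"account_no": "A-215", "branch_name": "Mianus"},
]
hash_file = build_hash_file(records, "branch_name", 4)
downtown_bucket = hash_file[bucket_for("Downtown", 4)]
```

A lookup hashes the search value and reads only that one bucket, which may also hold records with other key values that hash to the same bucket number.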
In the clustering file organization, records of several different relations are stored in the same file.
An attribute or set of attributes used to look up records in a file is called a search key.
A primary index is an index whose search key also defines the sequential order of the file.
The files that are ordered sequentially with a primary index on the search key, are called index-sequential files.
Indices with two or more levels are called multilevel indices.
A B-tree eliminates the redundant storage of search-key values. It allows search-key values to appear only once.
A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.
A hash index organizes the search keys, with their associated pointers, into a hash file structure.
Search algorithms that use an index are referred to as index scans.
Sorting of relations that do not fit into memory is called external sorting.
The system repeats the splitting of the input until each partition of the build input fits in the memory. Such partitioning is called recursive partitioning.
The merge operation is a generalization of the two-way merge used by the standard in-memory sort-merge algorithm. It merges N runs, so it is called an N-way merge.
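Example (a minimal in-memory sketch, not from the source notes): an N-way merge of sorted runs using heapq.merge from Python's standard library. In external sorting the runs would be files read block by block; here they are short lists so the merge itself is easy to follow.

```python
import heapq


def n_way_merge(runs):
    """Merge any number of individually sorted runs into one sorted list."""
    # heapq.merge repeatedly takes the smallest head element among the runs.
    return list(heapq.merge(*runs))


# Three hypothetical sorted runs produced by an earlier run-generation step.
runs = [[3, 9, 14], [1, 7, 20], [2, 8, 8]]
merged = n_way_merge(runs)
```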
The number of partitions is increased by a small value called the fudge factor, which is usually 20 percent of the number of hash partitions computed.
UNIT V
ADVANCED TOPICS
Data mining is the process of extracting or mining knowledge from huge amounts of data.
Statistics is used to
Data classification is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. In the second step, the model is used for classification.
Association rule mining finds interesting association or correlation relationships among a large set of data items, which are used in decision-making processes. Association rules capture buying patterns, that is, items that are frequently purchased together.
Association rule mining is a two-step process.
Data warehouse life cycle approach is essential because it ensures that the project pieces are brought together in the right order and at the right time.
ANAND INSTITUTE OF HIGHER TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CS6302/DATABASE MANAGEMENT SYSTEMS
PART-B
UNIT I
UNIT II
r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required, using each of the following join strategies for r1 ⋈ r2:
employee(empname,street,city)
works(empname,companyname,salary)
company(companyname,city)
manages(empname,management)
Give an expression in the relational algebra for each request.
1) Find the names of all employees who work for First Bank Corporation.
2) Find the names, street addresses and cities of residence of all employees who work for First Bank Corporation and earn more than 200000 per annum.
3) Find the names of all employees in this database who live in the same city as the company for which they work.
4) Find the names of all employees who earn more than every employee of Small Bank Corporation.
UNIT III
UNIT IV
UNIT V
Source: https://www.snscourseware.org/snsct/files/CW_58944fe4eb3c4/dbms-question-bank2-marks-16-marks.doc
Web site to visit: https://www.snscourseware.org
Author of the text: indicated on the source document of the above text