UNIT: 3

DATA STORAGE AND QUERY PROCESSING

1. What is an index?

An index is a structure that helps to locate desired records of a relation quickly, without examining all records.

2. Define query optimization.

Query optimization refers to the process of finding the lowest-cost method of evaluating a given query.

3. What are called jukebox systems?

Jukebox systems contain a few drives and numerous disks that can be loaded into one of the drives automatically.

4. What are the types of storage devices?

Primary storage

Secondary storage

Tertiary storage

Volatile storage

Nonvolatile storage

5. What is called remapping of bad sectors?

If the controller detects that a sector is damaged when the disk is initially formatted, or when an attempt is made to write the sector, it can logically map the sector to a different physical location.

6. Define access time.

Access time is the time from when a read or write request is issued to when data transfer begins.

7. Define seek time.

The time for repositioning the arm is called the seek time, and it increases with the distance that the arm must move.

8. Define average seek time.

The average seek time is the average of the seek times, measured over a sequence of random requests.

9. Define rotational latency time.

The time spent waiting for the sector to be accessed to appear under the head is called the rotational latency time.

10. Define average latency time.

The average latency time of the disk is one-half the time for a full rotation of the disk.
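As a rough illustration of definitions 6 to 10, the sketch below computes the average rotational latency from a spindle speed and adds it to an average seek time. The function names and sample figures are assumptions chosen for illustration, and the code uses the usual simplification that access time is the sum of seek time and rotational latency.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average latency is one-half the time for a full rotation (in ms)."""
    full_rotation_ms = 60_000.0 / rpm  # one minute is 60,000 ms
    return full_rotation_ms / 2.0


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time approximated as average seek time plus average latency."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


# Hypothetical disk: 6000 rpm, 4 ms average seek time.
latency = average_rotational_latency_ms(6000)
access = average_access_time_ms(4.0, 6000)
```

For this hypothetical disk, one rotation takes 10 ms, so the average latency is 5 ms and the estimated access time is 9 ms.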

11. What is meant by data-transfer rate?

The data-transfer rate is the rate at which data can be retrieved from or stored to the disk.

12. What is meant by mean time to failure?

The mean time to failure is the amount of time that the system could run continuously without failure.

13. What is a block and a block number?

A block is a contiguous sequence of sectors from a single track of one platter. Each request specifies the address on the disk to be referenced. That address is in the form of a block number.

14. What are called journaling file systems?

File systems that support log disks are called journaling file systems.

15. What is the use of RAID?

A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), are used to improve performance and reliability.

16. What is called mirroring?

The simplest approach to introducing redundancy is to duplicate every disk. This technique is called mirroring or shadowing.

17. What is called mean time to repair?

The mean time to repair is the time it takes to replace a failed disk and to restore the data on it.
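The effect of the mean time to repair on a mirrored pair can be estimated with a small calculation. The sketch below assumes that the two disks fail independently, in which case the mean time to data loss is commonly approximated as MTTF squared divided by twice the MTTR. The function name and sample values are illustrative only.

```python
def mean_time_to_data_loss_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a mirrored pair of disks.

    Assumes independent disk failures; data is lost only if the second
    disk fails while the first is still being repaired.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


# Hypothetical figures: each disk has an MTTF of 100,000 hours,
# and a failed disk takes 10 hours to replace and restore.
loss = mean_time_to_data_loss_hours(100_000, 10)
```

With these assumed figures the estimate is 500,000,000 hours, which shows why a shorter repair time improves reliability.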

18. What is called bit-level striping?

Data striping consists of splitting the bits of each byte across multiple disks. This is called bit-level striping.

19. What is called block-level striping?

Block level striping stripes blocks across multiple disks. It treats the array of disks as a large disk, and gives blocks logical numbers.
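The logical numbering used by block-level striping can be sketched as a simple mapping. With an array of n disks, logical block i is conventionally placed on disk (i mod n) + 1, at physical block floor(i / n) of that disk. The function name below is an assumption for illustration.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk number, physical block on that disk).

    Disks are numbered from 1, logical and physical blocks from 0.
    """
    disk = logical_block % num_disks + 1
    physical_block = logical_block // num_disks
    return disk, physical_block


# Logical blocks 0..7 spread round-robin over a 4-disk array.
layout = [locate_block(i, 4) for i in range(8)]
```

Consecutive logical blocks land on different disks, so a large sequential read can be served by all disks in parallel.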

20. What are the two main goals of parallelism?

Load-balance multiple small accesses, so that the throughput of such accesses increases.

Parallelize large accesses, so that the response time of large accesses is reduced.

21. What are the factors to be taken into account when choosing a RAID level?

o   Monetary cost of extra disk storage requirements.

o   Performance requirements in terms of number of I/O operations

o   Performance when a disk has failed.

o   Performance during rebuild.

22. What is meant by software and hardware RAID systems?

RAID can be implemented with no change at the hardware level, using only software modification. Such RAID implementations are called software RAID systems and the systems with special hardware support are called hardware RAID systems.

23. Define hot swapping?

Hot swapping permits faulty disks to be removed and replaced by new ones without turning the power off. Hot swapping reduces the mean time to repair.

24. What are the ways in which the variable-length records arise in database systems?

Storage of multiple record types in a file.

Record types that allow variable lengths for one or more fields.

Record types that allow repeating fields.

25. What is the use of a slotted-page structure and what is the information present in the header?

The slotted-page structure is used for organizing records within a single block.

The header contains the following information.

The number of record entries in the header.

The end of free space in the block.

An array whose entries contain the location and size of each record.
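A minimal sketch of a slotted page follows, assuming records are stored as raw bytes that grow from the end of the block toward the header. The class and attribute names are hypothetical, and the header is kept as ordinary Python attributes rather than being serialized into the block.

```python
class SlottedPage:
    """Toy slotted page: header fields plus a byte array for record data."""

    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.slots = []        # header array: (offset, length) of each record
        self.free_end = size   # header field: end of free space in the block

    def insert(self, record: bytes) -> int:
        """Store a record at the end of free space and return its slot number."""
        # A real page would also reserve room for the header itself.
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the block")
        start = self.free_end - len(record)
        self.data[start:self.free_end] = record
        self.free_end = start
        self.slots.append((start, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        """Fetch a record through its header entry."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(64)
first = page.insert(b"alice")
second = page.insert(b"bob")
```

Because records are reached through the header array, a record can be moved inside the block by updating only its header entry.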

26. What are the two types of blocks in the fixed-length representation? Define them.

·        Anchor block: Contains the first record of a chain.

·        Overflow block: Contains the records other than those that are the first record of a chain.

27. What is known as heap file organization?

In the heap file organization, any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. There is a single file for each relation.

28. What is known as sequential file organization?

In the sequential file organization, the records are stored in sequential order, according to the value of a “search key” of each record.

29. What is hashing file organization?

In the hashing file organization, a hash function is computed on some attribute of each record. The result of the hash function specifies in which block of the file the record should be placed.
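The placement rule can be sketched as follows. The hash function used here (sum of character codes modulo the number of blocks) is only a toy assumption, and the function and attribute names are hypothetical.

```python
def bucket_number(search_key: str, num_buckets: int) -> int:
    """Toy hash function: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in search_key) % num_buckets


def build_hash_file(records, key_attr, num_buckets):
    """Place each record in the block (bucket) chosen by the hash function."""
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        buckets[bucket_number(record[key_attr], num_buckets)].append(record)
    return buckets


records = [{"name": "Ann"}, {"name": "Bob"}]
hash_file = build_hash_file(records, "name", 4)
```

A lookup applies the same hash function to the search key and inspects only the block it names.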

30. What is known as clustering file organization?

In the clustering file organization, records of several different relations are stored in the same file.

31. What are the types of indices?

Ordered indices

Hash indices

32. What are the factors on which ordered indexing and hashing techniques are evaluated?

Access types

Access time

Insertion time

Deletion time

Space overhead

33. What is known as a search key?

An attribute or set of attributes used to look up records in a file is called a search key.

34. What is a primary index?

A primary index is an index whose search key also defines the sequential order of the file.

35. What are called index-sequential files?

Files that are ordered sequentially with a primary index on the search key are called index-sequential files.

36. What are the two types of indices?

Dense index

Sparse index
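A dense index has an entry for every search-key value in the file, while a sparse index has entries for only some of them. A sparse-index lookup can be sketched as below, assuming a file sorted on the search key with one index entry per block. The sample data and names are hypothetical.

```python
from bisect import bisect_right

# Sorted file split into blocks; each record is (search key, value).
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Sparse index: one entry per block, holding the first search key in it.
sparse_index = [block[0][0] for block in blocks]


def lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None
```

The sparse index is smaller than a dense one, at the price of a short sequential scan inside the chosen block.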

37. What are called multilevel indices?

Indices with two or more levels are called multilevel indices.

38. What is B-Tree?

A B-tree eliminates the redundant storage of search-key values. It allows search-key values to appear only once.

39. What is a B+-Tree index?

A B+-Tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.

40. What is a hash index?

A hash index organizes the search keys, with their associated pointers, into a hash file structure.

41. What is called query processing?

Query processing refers to the range of activities involved in extracting data from a database.

42. What are the steps involved in query processing?

The basic steps are:

parsing and translation

optimization

evaluation

43. What is called an evaluation primitive?

A relational algebra operation annotated with instructions on how to evaluate is called an evaluation primitive.

44. What is called a query evaluation plan?

A sequence of primitive operations that can be used to evaluate a query is a query evaluation plan or a query execution plan.

45. What is called a query-execution engine?

The query execution engine takes a query evaluation plan, executes that plan, and returns the answers to the query.

46. What are called index scans?

Search algorithms that use an index are referred to as index scans.

47. What is called external sorting?

Sorting of relations that do not fit into memory is called external sorting.

48. What is called recursive partitioning?

The system repeats the splitting of the input until each partition of the build input fits in the memory. Such partitioning is called recursive partitioning.

49. What is called an N-way merge?

The merge operation is a generalization of the two-way merge used by the standard in-memory sort-merge algorithm. It merges N runs, so it is called an N-way merge.
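The merge step can be sketched with Python's standard `heapq.merge`, which merges any number of sorted inputs. The `external_sort` helper is a hypothetical in-memory simulation: it assumes that `memory_capacity` records fit in memory at once and keeps the sorted runs in lists rather than in files on disk.

```python
import heapq


def n_way_merge(runs):
    """Merge N sorted runs into a single sorted list."""
    return list(heapq.merge(*runs))


def external_sort(records, memory_capacity):
    """Simulate external sort-merge: create sorted runs, then merge them."""
    runs = [
        sorted(records[i:i + memory_capacity])
        for i in range(0, len(records), memory_capacity)
    ]
    return n_way_merge(runs)


merged = n_way_merge([[1, 4, 9], [2, 3, 10], [0, 8]])
sorted_records = external_sort([5, 3, 8, 1, 9, 2, 7], memory_capacity=3)
```

In a real system each run would be read one block at a time, so the merge needs only one buffer block per run plus one for output.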

50. What is known as fudge factor?

The number of partitions is increased by a small value called the fudge factor, which is usually about 20 percent of the number of hash partitions computed.
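A small calculation shows how the fudge factor might be applied. The sketch assumes the base number of partitions is the build input size divided by the memory size (both in blocks), rounded up, and that the 20 percent increase is also rounded up. The function and parameter names are hypothetical.

```python
import math


def num_hash_partitions(build_blocks: int, memory_blocks: int,
                        fudge_percent: int = 20) -> int:
    """Number of hash partitions after adding the fudge factor."""
    base = math.ceil(build_blocks / memory_blocks)
    # Rounding the extra partitions up is an assumption of this sketch.
    extra = math.ceil(base * fudge_percent / 100)
    return base + extra


partitions = num_hash_partitions(build_blocks=1000, memory_blocks=100)
```

For a build input of 1000 blocks and 100 blocks of memory, the base count of 10 partitions grows to 12, leaving slack for partitions that turn out larger than average.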

TRANSACTION PROCESSING

1. What is transaction?

Collections of operations that form a single logical unit of work are called transactions.

2. What are the two statements regarding transaction?

A transaction is delimited by statements of the form:

Begin transaction

End transaction

3. What are the properties of transaction?

The properties of transactions are:

Atomicity

Consistency

Isolation

Durability

4. What is recovery management component?

Ensuring durability is the responsibility of a software component of the database system called the recovery management component.

5. When is a transaction rolled back?

Any changes that the aborted transaction made to the database must be undone. Once the changes caused by an aborted transaction have been undone, then the transaction has been rolled back.

6. What are the states of transaction?

The states of transaction are

Active

Partially committed

Failed

Aborted

Committed

Terminated

7. What is a shadow copy scheme?

The shadow copy scheme is a simple but inefficient scheme based on making copies of the database, called shadow copies. It assumes that only one transaction is active at a time. The scheme also assumes that the database is simply a file on disk.

8. Give the reasons for allowing concurrency?

If transactions run serially, a short transaction may have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. Concurrent execution reduces these unpredictable delays. It also improves throughput and resource utilization, since processor and disk activity can overlap.

9. What is average response time?

The average response time is the average time for a transaction to be completed after it has been submitted.

10. What are the two types of serializability?

The two types of serializability are:

Conflict serializability

View serializability

11. Define lock?

Locking is the most common method used to ensure that data items are accessed in a mutually exclusive manner: a transaction is allowed to access a data item only if it currently holds a lock on that item.

12. What are the different modes of lock?

The modes of lock are:

Shared

Exclusive
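The two modes differ in their compatibility: several transactions may hold shared locks on the same item, whereas an exclusive lock is incompatible with every other lock. A minimal sketch of this check follows; the table and function names are hypothetical.

```python
# COMPATIBLE[(held, requested)] is True when both locks may coexist.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested_mode, held_modes):
    """Grant a lock only if it is compatible with every lock already held
    on the item by other transactions."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)
```

For example, a shared request succeeds when only shared locks are held, while an exclusive request succeeds only when no other transaction holds any lock on the item.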

13. Define deadlock?

A situation in which neither of two transactions can ever proceed with its normal execution, because each is waiting for the other to release a lock, is called deadlock.

14. Define the phases of two phase locking protocol

Growing phase: a transaction may obtain locks but not release any lock.

Shrinking phase: a transaction may release locks but may not obtain any new locks.
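The two phases can be enforced with a single flag per transaction. The sketch below is illustrative only: the class name is hypothetical, and it tracks lock ownership for one transaction without modelling conflicts between transactions.

```python
class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two phases."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False  # False: growing phase, True: shrinking phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.locks.remove(item)
        self.shrinking = True  # the first release ends the growing phase


txn = TwoPhaseTransaction()
txn.lock("A")
txn.lock("B")
txn.unlock("A")  # the transaction is now in its shrinking phase
```

Any later call to `txn.lock(...)` raises an error, because a new lock may not be obtained once a lock has been released.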

15. Define upgrade and downgrade?

A mechanism for converting a shared lock to an exclusive lock is known as upgrade.

A mechanism for converting an exclusive lock to a shared lock is known as downgrade.

16. What is a database graph?

A partial ordering on the set D of all data items implies that D may be viewed as a directed acyclic graph, called a database graph.

17. What are the two methods for dealing with the deadlock problem?

The two methods for dealing with the deadlock problem are deadlock prevention, and deadlock detection combined with deadlock recovery.

18. What is a recovery scheme?

An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure.

19. What are the two types of errors?

The two types of errors are:

Logical error

System error

20. What are the storage types?

The storage types are:

Volatile storage

Nonvolatile storage

21. Define blocks?

The database system resides permanently on nonvolatile storage, and is partitioned into fixed-length storage units called blocks.

22. What is meant by Physical blocks?

The input and output operations are done in block units. The blocks residing on the disk are referred to as physical blocks.

23. What is meant by buffer blocks?

The blocks residing temporarily in main memory are referred to as buffer blocks.

24. What is meant by disk buffer?

The area of memory where blocks reside temporarily is called the disk buffer.

25. What is meant by log-based recovery?

The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records.

26. What are uncommitted modifications?

The immediate-modification technique allows database modifications to be output to the database while the transaction is still in the active state. Data modifications written by active transactions are called uncommitted modifications.

27. Define shadow paging.

Shadow paging is an alternative to log-based crash-recovery techniques. Under certain circumstances, it may require fewer disk accesses than the log-based methods do.

28. Define page.

The database is partitioned into some number of fixed-length blocks, which are referred to as pages.

29. Explain current page table and shadow page table.

The key idea behind the shadow paging technique is to maintain two page tables during the life of the transaction: the current page table and the shadow page table. Both the page tables are identical when the transaction starts. The current page table may be changed when a transaction performs a write operation.
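The interplay of the two page tables can be sketched with an in-memory model. It assumes a single active transaction and a "disk" represented as a Python list of pages; the class and method names are hypothetical.

```python
class ShadowPagedDB:
    """Toy model of shadow paging for one transaction at a time."""

    def __init__(self, pages):
        self.disk = list(pages)                 # simulated disk pages
        self.shadow = list(range(len(pages)))   # shadow page table
        self.current = list(self.shadow)        # current page table

    def write(self, page_no, value):
        # On the first write to a page, copy it to an unused disk page and
        # point only the current page table at the copy.
        if self.current[page_no] == self.shadow[page_no]:
            self.disk.append(self.disk[self.shadow[page_no]])
            self.current[page_no] = len(self.disk) - 1
        self.disk[self.current[page_no]] = value

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def commit(self):
        self.shadow = list(self.current)  # current table becomes the shadow

    def abort(self):
        self.current = list(self.shadow)  # discard the transaction's changes


db = ShadowPagedDB(["a", "b"])
db.write(0, "A")
after_write = db.read(0)
db.abort()
after_abort = db.read(0)
```

The shadow page table is never modified during the transaction, so an abort or crash simply falls back to it. The copied page left behind by the abort is no longer referenced by either table.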

30. What are the drawbacks of shadow-paging technique?

Commit Overhead

Data fragmentation

Garbage collection

31. Define garbage collection.

Garbage may be created also as a side effect of crashes. Periodically, it is necessary to find all the garbage pages and to add them to the list of free pages. This process is called garbage collection.

32. Differentiate strict two phase locking protocol and rigorous two phase locking protocol.

In strict two phase locking protocol, all exclusive-mode locks taken by a transaction are held until that transaction commits.

Rigorous two phase locking protocol requires that all locks be held until the transaction commits.

33. How are time stamps implemented?

o   Use the value of the system clock as the time stamp. That is a transaction’s time stamp is equal to the value of the clock when the transaction enters the system.

o   Use a logical counter that is incremented after a new timestamp has been assigned; that is the time stamp is equal to the value of the counter.

34. What are the time stamps associated with each data item?

o   W-timestamp (Q) denotes the largest time stamp of any transaction that executed WRITE (Q) successfully.

o   R-timestamp (Q) denotes the largest time stamp of any transaction that executed READ (Q) successfully.
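These two values are what a timestamp-ordering scheme checks on every access. The sketch below assumes the basic timestamp-ordering rules (a read is rejected if a younger transaction has already written the item; a write is rejected if a younger transaction has already read or written it) and uses a logical counter as the timestamp source. All names are hypothetical, and a rejected operation simply returns False where a real system would roll the transaction back.

```python
import itertools

_counter = itertools.count(1)  # logical counter used to issue timestamps


def new_timestamp():
    """Assign the next counter value to a transaction entering the system."""
    return next(_counter)


class DataItem:
    def __init__(self):
        self.r_ts = 0  # R-timestamp(Q)
        self.w_ts = 0  # W-timestamp(Q)


def read(item, ts):
    if ts < item.w_ts:  # the value needed was already overwritten
        return False
    item.r_ts = max(item.r_ts, ts)
    return True


def write(item, ts):
    if ts < item.r_ts or ts < item.w_ts:  # a younger transaction got there first
        return False
    item.w_ts = ts
    return True


q = DataItem()
t1 = new_timestamp()
t2 = new_timestamp()
```

If the younger transaction (timestamp `t2`) writes `q` first, a later read or write of `q` by the older transaction (timestamp `t1`) is rejected.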

CURRENT TRENDS

1. What is meant by object-oriented data model?

The object-oriented paradigm is based on encapsulation of data and code related to an object into a single unit, whose contents are not visible to the outside world.

2. What is the major advantage of object-oriented programming paradigm?

The ability to modify the definition of an object without affecting the rest of the system is the major advantage of object-oriented programming paradigm.

3. What are the methods used in object-oriented programming paradigm?

*read-only

*update

4. What is the main difference between read-only and update methods?

A read-only method does not affect the values of a variable in an object, whereas an update method may change the values of the variables.

5. What is the use of keyword ISA?

The use of keyword ISA is to indicate that a class is a specialization of another class.

6. Differentiate sub-class and super-class?

The specializations of a class are called subclasses. E.g., employee is a subclass of person, and teller is a subclass of employee. Conversely, employee is a superclass of teller, and person is a superclass of employee.

7. What is substitutability?

Any method of a class, say A, can equally well be invoked with any object belonging to any subclass B of A. This characteristic leads to code reuse, since the messages, methods, and functions do not have to be written again for objects of class B.

8. What is multiple inheritance?

Multiple inheritance permits a class to inherit variables and methods from multiple super classes.

9. What is DAG?

The class-subclass relationship is represented by a directed acyclic graph (DAG). E.g., employees can be temporary or permanent. We may create subclasses temporary and permanent of the class employee.

10. What is disadvantage of multiple inheritance?

There is potential ambiguity if the same variable or method can be inherited from more than one superclass. E.g., the student class may have a variable dept identifying a student's department, and the teacher class may correspondingly have a variable dept identifying a teacher's department.
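The dept ambiguity can be reproduced in a language that supports multiple inheritance. The sketch below uses Python, which resolves the clash silently by taking the first superclass in its method resolution order; other languages and data models may instead report an error or require the variables to be renamed. The class TeachingAssistant is a hypothetical example.

```python
class Student:
    dept = "department the student is enrolled in"


class Teacher:
    dept = "department the teacher works in"


class TeachingAssistant(Student, Teacher):
    """Inherits a variable named dept from both superclasses."""


# Python picks Student.dept because Student is listed first.
inherited = TeachingAssistant.dept
```

Swapping the order of the superclasses changes which dept is inherited, which illustrates why the ambiguity has to be resolved explicitly in a schema.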

11. What is object identity?

An object retains its identity even if some or all of the values of variables or definitions of methods change over time.

12. What are the several forms of identity?

*Value

*Name

*Built-in

13. What is a value?

A data value is used for identity. This form of identity is used in relational systems. E.g., the primary key value of a tuple identifies the tuple.

14. What is a Name?

A user-supplied name is used for identity. This form of identity is used for files in file systems. The user gives each file a name that uniquely identifies it, regardless of its contents.

15. What is built-in identity?

A notion of identity is built into the data model or programming language, and no user-supplied identifier is required. This form of identity is used in object-oriented systems.

16. What is meant by object identifiers?

Object-oriented systems use an object identifier to identify objects. Object identifiers are unique: that is each object has a single identifier, and no two objects have the same identifier.

17. What are composite objects?

Objects that contain other objects are called complex objects or composite objects.

18. What is object containment?

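References between objects can be used to model different real-world concepts. One such concept is object containment, in which an object contains (is composed of) other objects.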

19. Why is containment important in object-oriented systems?

Containment is an important concept in object-oriented systems because it allows different users to view data at different granularities.

20. Define object-relational systems?

Systems that provide object-oriented extensions to relational systems are called object-relational systems.

21. How persistent programming languages differ from traditional programming languages?

Database languages differ from traditional programming languages in that they directly manipulate data that are persistent, that is, data that continue to exist even after the program that created them has terminated. A relation in a database and the tuples in a relation are examples of persistent data. In contrast, the only persistent data that traditional programming languages directly manipulate are files.

22. Define atomic domains?

A domain is atomic if elements of the domain are considered to be indivisible units.

23. Define 1NF?

First normal form is one which requires that all attributes have atomic domains.

24. What is nested relational model?

The nested relational model is an extension of relational model in which domains may be either atomic or relation valued.

25. List some instances of collection types?

*sets

*arrays

*multisets

26. How to create values of structured type?

Constructor functions are used to create values of structured types. A function with the same name as a structured type is a constructor function for the structured type.

27. Write a query to define tables students and teachers as sub tables of people?

Create table students of student under people

Create table teachers of teacher under people

28. What is a homogeneous distributed database?

In homogeneous distributed databases, all sites have identical database management system software, are aware of one another, and agree to cooperate in processing users' requests.

29. What is a heterogeneous distributed database?

In a heterogeneous distributed database, different sites may use different schemas and different DBMS software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.

30. What are the two approaches to store relations in distributed database?

*Replication

*Fragmentation

31. What are the two different schemes for fragmenting a relation?

*horizontal

*vertical

32. What is horizontal fragmentation?

Horizontal fragmentation splits the relation by assigning each tuple of r to one or more fragments.

33. What is vertical fragmentation?

Vertical fragmentation splits the relation by decomposing the scheme R of relation r.
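Both schemes can be sketched on a small relation held as a list of dictionaries. The relation, its attribute names, and the helper functions are hypothetical. The vertical fragments each keep the key attribute so that the original relation can be rebuilt by a join.

```python
account = [
    {"id": 1, "branch": "Hillside", "balance": 500},
    {"id": 2, "branch": "Valleyview", "balance": 300},
    {"id": 3, "branch": "Hillside", "balance": 900},
]


def horizontal_fragment(relation, predicate):
    """Select whole tuples: each tuple goes to the fragments it satisfies."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes):
    """Project every tuple onto a subset of the schema."""
    return [{attr: row[attr] for attr in attributes} for row in relation]


hillside = horizontal_fragment(account, lambda r: r["branch"] == "Hillside")
v1 = vertical_fragment(account, ["id", "branch"])
v2 = vertical_fragment(account, ["id", "balance"])

# Rebuild the relation by joining the vertical fragments on the key.
rebuilt = [{**a, **b} for a in v1 for b in v2 if a["id"] == b["id"]]
```

Horizontal fragments are recombined with a union, and vertical fragments with a join on the shared key.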

34. What are the various forms of data transparency?

*fragmentation transparency

*replication transparency

*location transparency

35. Define decision tree classifiers?

As the name suggests decision tree classifiers use a tree: each leaf node has an associated class, and each internal node has a predicate associated with it.
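A decision tree of this shape can be represented and evaluated in a few lines. The tree below, its predicates, and its class labels are invented for illustration and are not learned from data.

```python
class Leaf:
    """Leaf node: carries the class assigned to records that reach it."""

    def __init__(self, label):
        self.label = label


class Node:
    """Internal node: carries a predicate that selects the branch to follow."""

    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false


def classify(tree, record):
    """Walk from the root to a leaf, testing one predicate per internal node."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(record) else tree.if_false
    return tree.label


# Hypothetical credit-risk tree.
credit_tree = Node(
    lambda r: r["income"] >= 50_000,
    Leaf("good"),
    Node(lambda r: r["degree"] == "masters", Leaf("average"), Leaf("bad")),
)
```

Classifying a record means following the predicates from the root until a leaf is reached and returning that leaf's class.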

studymaterials

Monday, 26 August 2013

DBMS 2 MRKS
