BC0050

8/13/2019 BC0050

Sikkim Manipal University

What are the uses of Distributed Databases?

There are several reasons why distributed databases are developed. The following is a list of the

main motivations.

Organizational and economic reasons

Usage and interconnection of existing databases

Incremental growth of an organization

Reduced communication overhead

Performance aspects

Increased reliability and availability

Organizational and economic reasons: Many organizations are decentralized, and a distributed database approach fits the structure of such an organization more naturally. With recent developments in computer technology, the economy-of-scale motivation for having large, centralized computer centers is becoming questionable. The organizational and economic motivations are probably the most important reasons for developing distributed databases.

Interconnection of existing databases: Distributed databases are the natural solution when several databases already exist in an organization and the need to run global applications arises. In this case, the distributed database is created bottom-up from the preexisting local databases. This process may require a certain degree of local restructuring; however, the effort required by this restructuring is much less than that needed to create a completely new centralized database.

Incremental growth: If an organization grows by adding new, relatively autonomous organizational

units (new branches, new warehouses, etc.), then the distributed database approach supports a

smooth incremental growth with a minimum degree of impact on the already existing units.

Reduced communication overhead: In a geographically distributed database like the database of Example 1.1, the fact that many applications are local clearly reduces the communication overhead with respect to a centralized database. Therefore, maximizing the locality of applications is one of the primary objectives in distributed database design.
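The effect of locality can be illustrated with a small sketch. The site names, data items, and access counts below are hypothetical and are not taken from Example 1.1; the sketch only counts how many accesses must cross the network under a centralized placement and under a placement that stores each item at the site that uses it most.

```python
def remote_accesses(accesses, placement):
    """Count accesses issued at a site other than the one storing the item."""
    return sum(1 for site, item in accesses if placement[item] != site)

# Hypothetical workload: each branch mostly touches its own accounts.
accesses = (
    [("branch1", "accounts1")] * 8
    + [("branch2", "accounts2")] * 8
    + [("branch1", "accounts2"), ("branch2", "accounts1")]
)

# Centralized placement: every item lives at a single central site.
centralized = {"accounts1": "center", "accounts2": "center"}
# Distributed placement: each item lives at the branch that uses it most.
distributed = {"accounts1": "branch1", "accounts2": "branch2"}

remote_centralized = remote_accesses(accesses, centralized)
remote_distributed = remote_accesses(accesses, distributed)
```

Under these assumed numbers every access is remote in the centralized case, while only the two cross-branch accesses are remote in the distributed case.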

Performance considerations: The existence of several autonomous processors increases performance through a high degree of parallelism. This consideration applies to any multiprocessor system, not only to distributed databases. However, distributed databases have the advantage that the decomposition of data reflects application-dependent criteria which maximize application locality; in this way, the mutual interference between different processors is minimized.

Reliability and availability: The distributed database approach, especially with redundant data, can also be used to obtain higher reliability and availability. However, reaching this goal is not straightforward and requires techniques which are still not completely understood. The autonomous processing capability of the different sites does not by itself guarantee a higher overall reliability of the system, but it ensures graceful degradation. In other words, failures in a distributed database can be more frequent than in a centralized one because of the greater number of components, but the effect of each failure is confined to those applications which use the data of the failed site, and a complete system crash is rare.
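Both halves of this argument can be put into numbers with a rough sketch. It assumes that sites fail independently and that each site is up with the same probability, which is a simplification not stated in the text; the function names are illustrative only.

```python
def prob_some_site_down(site_availability, sites):
    """Probability that at least one of `sites` independent sites is down."""
    return 1.0 - site_availability ** sites

def replicated_item_availability(site_availability, replicas):
    """Probability that at least one replica of an item is reachable,
    assuming a read can be served by any single replica."""
    return 1.0 - (1.0 - site_availability) ** replicas

# With more sites, some failure somewhere becomes more likely ...
one_site = prob_some_site_down(0.9, 1)
five_sites = prob_some_site_down(0.9, 5)

# ... but an item replicated at several sites is more likely to stay readable.
single_copy = replicated_item_availability(0.9, 1)
three_copies = replicated_item_availability(0.9, 3)
```

The sketch covers read availability only. Keeping replicas consistent under updates is the harder part, and it is not modeled here.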


Explain any three characteristics of a query processor.

Characterization of Query Processors

It is difficult to state the characteristics that differentiate centralized and distributed query processors, but some are listed here. The first four are common to both, and the next four are particular to distributed query processors.

1. Languages: The input language to the query processor can be based on relational calculus or relational algebra. In a distributed context, the output language is generally some form of relational algebra augmented with communication primitives.

2. Types of Optimization: Conceptually, query optimization chooses the point in the solution space of possible execution strategies that leads to the minimum cost. A popular approach is exhaustive search, which predicts the cost of every strategy and selects the cheapest. Because the solution space can be very large, heuristic techniques are often used instead to restrict it. In both centralized and distributed systems, a common heuristic is to minimize the size of intermediate relations. This can be done by performing unary operations first and ordering the binary operations by the increasing size of their intermediate relations.

3. Optimization Timing: A query may be optimized at different times relative to the actual time of query execution. Optimization can be done statically, before executing the query, or dynamically, as the query is executed. The main advantage of the latter method is that the actual sizes of the intermediate relations are available to the query processor, thereby minimizing the probability of a bad choice.

4. Statistics: The effectiveness of query optimization depends on statistics about the database.

Dynamic query optimization requires statistics in order to choose the operation that has to

be done first. Static query optimization requires statistics to estimate the size of

intermediate relations. The accuracy of the statistics can be improved by periodical

updating.

5. Decision Sites: Most systems use a centralized decision approach, in which a single site generates the strategy. However, the decision process could be distributed among the various sites participating in the elaboration of the best strategy. The centralized approach is simpler but requires knowledge of the complete distributed database, whereas the distributed approach requires only local information.

6. Exploitation of the Network Topology: The distributed query processor exploits the network topology. With wide area networks, communication cost is usually treated as the dominant cost factor. This assumption simplifies distributed query optimization, which can then be treated as two separate problems: selection of the global execution strategy, based on inter-site communication, and selection of each local execution strategy, based on centralized query processing algorithms. With local area networks, communication costs are comparable to I/O costs, so communication can no longer be treated as the only factor.
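The heuristic described under Types of Optimization, together with the role of statistics, can be sketched as follows. The sketch assumes the common estimate card(R join S) = selectivity x card(R) x card(S); the relation names, cardinalities, and selectivity factors are hypothetical values standing in for database statistics.

```python
def estimated_join_size(card_r, card_s, selectivity):
    """Estimate the cardinality of a join from statistics on its operands."""
    return selectivity * card_r * card_s

def order_joins(cardinalities, join_selectivities):
    """Return candidate joins sorted by increasing estimated result size."""
    estimates = []
    for (r, s), selectivity in join_selectivities.items():
        size = estimated_join_size(cardinalities[r], cardinalities[s], selectivity)
        estimates.append((size, r, s))
    return sorted(estimates)

# Hypothetical statistics: relation cardinalities and join selectivity factors.
cardinalities = {"R": 1000, "S": 200, "T": 50}
join_selectivities = {("R", "S"): 0.01, ("S", "T"): 0.02}

plan = order_joins(cardinalities, join_selectivities)
first_join = plan[0][1:]  # the pair with the smallest estimated result
```

A static optimizer would compute such estimates once, before execution. A dynamic optimizer could instead replace the estimate with the actual size of each intermediate relation as it becomes available.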

7. Exploitation of Replicated Fragments: For reliability purposes it is useful to have fragments

replicated at different sites. Query processors have to exploit this information either

statically or dynamically for processing the query efficiently.

8. Use of Semi-Joins: The semi-join operation reduces the size of the data that are exchanged

between the sites so that the communication cost can be reduced.
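The use of semi-joins can be illustrated with a minimal sketch. It assumes two sites, with a relation `emp` stored at site 1 and a relation `asg` stored at site 2, and a join that must be computed at site 2. The relations, attribute names, and helper functions are hypothetical, and transfer cost is approximated simply by counting rows or values shipped.

```python
def project(rows, attr):
    """Distinct values of one attribute (the only data shipped to the other site)."""
    return {row[attr] for row in rows}

def semi_join(rows, join_values, attr):
    """Rows whose join attribute matches some value from the other relation."""
    return [row for row in rows if row[attr] in join_values]

def join(left, right, attr):
    """Plain nested-loop equi-join on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Site 1 holds emp, site 2 holds asg (hypothetical data).
emp = [{"eno": f"e{i}", "name": f"emp{i}"} for i in range(1, 7)]
asg = [
    {"eno": "e2", "pno": "p1"},
    {"eno": "e2", "pno": "p2"},
    {"eno": "e5", "pno": "p1"},
]

join_values = project(asg, "eno")                 # shipped from site 2 to site 1
reduced_emp = semi_join(emp, join_values, "eno")  # shipped from site 1 to site 2
result = join(reduced_emp, asg, "eno")            # computed at site 2
```

Here two join values and two reduced rows cross the network instead of all six `emp` rows, and the final result is the same as joining the full relations. A semi-join does not always pay off: if most rows of the reduced relation survive, the extra transfer of join values is wasted.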

Explain the properties of a transaction.


A transaction is an application, or part of an application, that is characterized by the following properties.

1. Atomicity: Either all or none of the transaction's operations are performed. If a transaction is interrupted by a failure, its partial results are discarded and the whole transaction has to be repeated. Two types of problems can prevent a transaction from completing:

Transaction aborts: An abort may be requested by the transaction itself, because some of its inputs are wrong or because it has been estimated that the results produced would be useless. It may also be forced by the system for its own reasons. The activity of ensuring atomicity in the presence of transaction aborts is called transaction recovery.

System crashes: A catastrophic event crashes the system without warning. The activity of ensuring atomicity in the presence of system crashes is called crash recovery.

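The successful completion of a transaction is called commit. The primitives used to delimit a transaction are Begin_Transaction, Commit, and Abort. A transaction can end in one of three ways:

Begin_Transaction ... Commit (the transaction completes normally)

Begin_Transaction ... Abort (the transaction requests its own abort)

Begin_Transaction ... X (the system forces an abort at point X)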

2. Durability: Once a transaction is committed, the system must guarantee that the results of

operations will never be lost, independent of subsequent failures. The activity of providing

Durability of the transaction is called Database recovery.

3. Serializability: If many transactions execute concurrently, the result must be the same as if they were executed serially in some order. The activity of providing serializability of transactions is called concurrency control.

4. Isolation: This property states that an incomplete transaction cannot disclose its results to other transactions until it is committed. This property has to be strictly followed to avoid a problem called cascading aborts (the domino effect), in which all the transactions that have observed the partial results of an aborted transaction have to be aborted as well.
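The Begin_Transaction, Commit, and Abort primitives described under Atomicity can be tried out with a minimal single-site sketch. It uses Python's standard `sqlite3` module as a stand-in for a database system, so it says nothing about distributed recovery. The `account` table and the `transfer` function are hypothetical and exist only for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction.
    Returns True on commit, False if the transaction aborts itself."""
    conn.execute("BEGIN")  # Begin_Transaction
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            conn.execute("ROLLBACK")  # Abort requested by the transaction itself
            return False
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")  # Commit
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # Abort after a database error
        raise

def balances(conn):
    """Current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements above control transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
```

A call such as `transfer(conn, "A", "B", 500)` debits account A, detects the negative balance, and aborts. After the rollback, the partial debit is no longer visible in `balances(conn)`, which is the all-or-nothing behavior that atomicity requires.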
