BC0050

  • 8/13/2019 BC0050

    1/3

    Sikkim Manipal University Page No. 1

    What are the uses of Distributed Databases?

    There are several reasons why distributed databases are developed. The following is a list of the

    main motivations.

    Organizational and economic reasons

    Usage and interconnection of existing databases

    Incremental growth of an organization

    Reduced communication overhead

    Performance aspects

    Increased reliability and availability

    Organizational and economic reasons: Many organizations are decentralized, and a distributed

    database approach fits more naturally the structure of the organization. With the recent

    developments in computer technology, the economy-of-scale motivation for having large,

    centralized computer centers is becoming questionable. The organizational and economic

    motivations are probably the most important reasons for developing distributed databases.

    Interconnection of existing databases: Distributed databases are the natural solution when several

    databases already exist in an organization and the necessity of performing global applications arises.

    In this case, the distributed database is created bottom-up from the preexisting local databases. This

    process may require a certain degree of local restructuring; however, the effort which is required by

    this restructuring is much less than that needed for the creation of a completely new centralized

    database.

    Incremental growth: If an organization grows by adding new, relatively autonomous organizational

    units (new branches, new warehouses, etc.), then the distributed database approach supports a

    smooth incremental growth with a minimum degree of impact on the already existing units.

    Reduced communication overhead: In a geographically distributed database like the database of

    Example 1.1, the fact that many applications are local clearly reduces the communication overhead

    with respect to a centralized database. Therefore, the maximization of the locality of applications is

    one of the primary objectives in distributed database design.

    Performance considerations: The existence of several autonomous processors results in the

    increase of performance through a high degree of parallelism. This consideration can be applied to

    any multiprocessor system, and not only to distributed databases. However, distributed databases

    have the advantage in that the decomposition of data reflects application dependent criteria which

    maximize application locality; in this way the mutual interference between different processors is

    minimized.

    Reliability and availability: The distributed database approach, especially with redundant data, can

    also be used to obtain higher reliability and availability. However, obtaining this goal is not

    straightforward and requires the use of techniques which are still not completely understood. The

    autonomous processing capability of the different sites does not by itself guarantee a higher overall

    reliability of the system, but it ensures a graceful degradation property; in other words, failures in a

    distributed database can be more frequent than in a centralized one because of the greater number

    of components, but the effect of each failure is confined to those applications which use the data of

    the failed site, and complete system crash is rare.

    Explain any three characteristics of Query processor.

    Characterization of Query Processors

    It is difficult to give characteristics that differentiate centralized and distributed query

    processors. Still, some of them are listed here. The first four are common to both, and the

    next four are particular to distributed query processors.

    1. Languages: The input language to the query processor can be based on relational calculus or

    relational algebra. In a distributed context, the output language is generally some form of

    relational algebra augmented with communication primitives.

    2. Types of Optimization: Conceptually, query optimization chooses the point in the solution

    space that leads to the minimum cost. A popular approach is exhaustive search, which estimates

    the cost of every candidate strategy and selects the cheapest one. Because the solution space

    can be very large, heuristic techniques are often used instead to restrict the search. In both

    centralized and distributed systems, a common heuristic is to minimize the size of intermediate

    relations. This can be done by performing unary operations first and ordering the binary

    operations by the increasing size of their intermediate relations.

    3. Optimization Timing: A query may be optimized at different times relative to the actual time

    of query execution. Optimization can be done statically, before executing the query, or

    dynamically, as the query is executed. The main advantage of the latter method is that the

    actual sizes of the intermediate relations are available to the query processor, thereby

    minimizing the probability of a bad choice.

    4. Statistics: The effectiveness of query optimization depends on statistics on the database.

    Dynamic query optimization requires statistics in order to choose the operation that has to

    be done first. Static query optimization requires statistics to estimate the size of

    intermediate relations. The accuracy of the statistics can be improved by periodic

    updating.

    5. Decision Sites: Most systems use a centralized decision approach, in which a single site

    generates the strategy. However, the decision process could be distributed among various

    sites participating in the elaboration of the best strategy. The centralized approach is simpler

    but requires knowledge of the complete distributed database, whereas the distributed

    approach requires only local information.

    6. Exploitation of the Network Topology: The distributed query processor exploits the network

    topology. With wide area networks, communication cost is usually the dominant factor. This

    simplifies distributed query optimization, which can be treated as two separate problems:

    selection of the global execution strategy, based on inter-site communication, and selection of

    each local execution strategy, based on centralized query processing algorithms. With local area

    networks, communication costs are comparable to I/O costs.

    7. Exploitation of Replicated Fragments: For reliability purposes it is useful to have fragments

    replicated at different sites. Query processors have to exploit this information either

    statically or dynamically for processing the query efficiently.

    8. Use of Semi-Joins: The semi-join operation reduces the size of the data that are exchanged

    between the sites so that the communication cost can be reduced.
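
    The semi-join idea in the last item can be illustrated with a minimal sketch. The relation names

    (EMP, DEPT), their contents, and the helper functions are all hypothetical and exist only for this

    illustration; a real distributed query processor would work on stored fragments and cost estimates.

    The sketch ships only the join-attribute values from one site, reduces the other relation with them,

    and then ships the reduced relation instead of the whole one.

```python
# Hypothetical relations held at two different sites (illustrative data only).
EMP_AT_SITE_1 = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
    {"emp_id": 3, "name": "Meera", "dept_id": 10},
    {"emp_id": 4, "name": "John", "dept_id": 30},
]
DEPT_AT_SITE_2 = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 40, "dept_name": "Research"},
]


def semi_join(left, right_keys, attr):
    """Return the tuples of `left` whose `attr` value appears in `right_keys`."""
    return [row for row in left if row[attr] in right_keys]


def join(left, right, attr):
    """Plain nested-loop equijoin on `attr`."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def semi_join_strategy():
    # Step 1: site 2 projects DEPT on the join attribute and ships only those values.
    shipped_keys = {row["dept_id"] for row in DEPT_AT_SITE_2}
    # Step 2: site 1 keeps only the EMP tuples that can take part in the join.
    reduced_emp = semi_join(EMP_AT_SITE_1, shipped_keys, "dept_id")
    # Step 3: site 1 ships the reduced EMP to site 2, which computes the final join.
    result = join(reduced_emp, DEPT_AT_SITE_2, "dept_id")
    return result, len(reduced_emp)


result, emp_tuples_shipped = semi_join_strategy()
print(emp_tuples_shipped, "of", len(EMP_AT_SITE_1), "EMP tuples shipped")
```

    In this toy data set the reduction pays off because most EMP tuples have no matching DEPT tuple.

    When nearly every tuple matches, shipping the join-attribute values first is extra traffic with

    little benefit, which is why the choice is left to the optimizer.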

    Explain the properties of a Transaction.

    A transaction is an application, or part of an application, that is characterized by the following

    properties.

    1. Atomicity: Either all or none of the transaction's operations are performed. If a

    transaction is interrupted by a failure, its partial results must be undone, and the

    transaction, if still needed, has to be executed again from the beginning. The two types of

    problems that can prevent a transaction from completing are:

    Transaction aborts: An abort may be requested by the transaction itself because some of its

    inputs are wrong or because it has been estimated that the results produced would be useless.

    It may also be forced by the system for its own reasons. The activity of ensuring atomicity in the

    presence of transaction aborts is called transaction recovery.

    System crashes: These are catastrophic events that bring the system down without any

    prior warning. The activity of ensuring atomicity in the presence of system crashes is

    called crash recovery.

    The successful completion of a transaction is called Commit. The primitives that can be used for carrying out the

    transaction are:

    Begin_Transaction    Begin_Transaction    Begin_Transaction

    Commit               Abort                X (System forces Abort)

    2. Durability: Once a transaction is committed, the system must guarantee that the results of

    operations will never be lost, independent of subsequent failures. The activity of providing

    Durability of the transaction is called Database recovery.

    3. Serializability: If many transactions execute concurrently, the result must be the same as if they

    were executed serially in some order. The activity of providing Serializability of the

    transaction is called Concurrency control.

    4. Isolation: This property states that an incomplete transaction cannot disclose its results to

    other transactions until it is committed. This property has to be strictly followed to avoid a

    problem called cascading aborts (domino effect): if an incomplete transaction aborts, all the

    transactions that have observed its partial results have to be aborted as well.
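
    The Begin_Transaction, Commit, and Abort primitives and the atomicity and isolation properties

    can be sketched in a few lines. This is a minimal illustration under strong assumptions: the

    Transaction class, the transfer function, and the in-memory dictionary used as the database are

    all hypothetical, and the sketch handles neither concurrent execution nor crashes, so it says

    nothing about serializability or durability. Writes are buffered privately and reach the shared

    store only on commit, so an abort leaves no partial result behind.

```python
class Transaction:
    """Buffers writes privately and applies them to the store only on commit."""

    def __init__(self, store):
        self._store = store   # shared committed state (a plain dict in this sketch)
        self._writes = {}     # private, uncommitted updates
        self._active = True

    def read(self, key):
        # A transaction sees its own writes first, then committed data.
        if key in self._writes:
            return self._writes[key]
        return self._store[key]

    def write(self, key, value):
        self._check_active()
        self._writes[key] = value

    def commit(self):
        # All buffered writes become visible to others together.
        self._check_active()
        self._store.update(self._writes)
        self._finish()

    def abort(self):
        # Discard every buffered write; the store is left untouched.
        self._check_active()
        self._finish()

    def _check_active(self):
        if not self._active:
            raise RuntimeError("transaction already finished")

    def _finish(self):
        self._writes = {}
        self._active = False


def transfer(store, src, dst, amount):
    """Move `amount` from src to dst; abort if src would go negative."""
    txn = Transaction(store)                 # Begin_Transaction
    txn.write(src, txn.read(src) - amount)
    txn.write(dst, txn.read(dst) + amount)
    if txn.read(src) < 0:
        txn.abort()                          # Abort: no partial result survives
        return False
    txn.commit()                             # Commit
    return True


accounts = {"A": 100, "B": 50}
print(transfer(accounts, "A", "B", 30), accounts)
print(transfer(accounts, "A", "B", 500), accounts)
```

    Because other transactions read only the committed store, none of them can observe the partial

    results of an unfinished transfer, so an abort in this sketch never forces other transactions to abort.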