DW Lecture 10

  • 7/29/2019 DW Lecture 10

    1/14

    Lecture 10

    Tue, April 14, 2009, 18:00 to 21:00

    FAST NU, Karachi

  • Physical Database Design

    Logical database design

    What are the facts and dimensions?

    Goals: Simplicity, Expressiveness

    Make the database easy to understand

    Make queries easy to ask

    Physical database design

    How should the data be arranged on disk?

    Goal: performance

    Make queries run fast

    Manageability is an important secondary concern

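  • Load vs. Query Trade-off

    Trade-off between query performance and load performance

    To make queries run fast:

    Precompute as much as possible

    Build lots of data structures

    Indexes

    Materialized views

    Cube structures (MOLAP/ROLAP)

    Every such structure must be updated or rebuilt when new data is loaded, so faster queries come at the cost of slower loads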

  • A Lesson in Data Warehouse Evolution

    ROLAP, MOLAP, MATERIALIZED VIEWS, AGGREGATE TABLES? What's the difference?

    While Star Schema data warehouses originally gained high performance from well-designed database indexes, both MOLAP and ROLAP took the approach of aggregating data grouped by dimensional hierarchies to speed up queries by orders of magnitude. When this came to the attention of the database research community, it became clear that MOLAP/ROLAP efficiency, leaving specialized semantics apart, could be traced to an approach of pre-computing query results, or in database terms materializing views, and a tremendous outpouring of papers on materialized views followed.

  • Materialized Views (MVs)

    View

    A derived relation (or a function) defined in terms of base (stored) relations

    Typically recomputed every time the view is referenced

    Materialized view

    A view can be materialized by storing the tuples of the view in the database

    Works like a cache

    Why use materialized views?

    They provide fast access to data

    Critical in applications with a high query rate and complex views, where it is not possible to recompute the view for every query

    Many DBMSs support materialized views

    Goal: faster response for related queries
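    A minimal sketch of the view vs. materialized view distinction, using Python's built-in sqlite3 module. SQLite has no CREATE MATERIALIZED VIEW statement, so the materialized view is emulated here by storing the view's tuples in an ordinary table (CREATE TABLE ... AS SELECT); the sales table, its columns, and the helper totals() are hypothetical. After a base-table insert, the ordinary view reflects the change, while the stored copy, like a cache, does not.

```python
import sqlite3

# In-memory database with a hypothetical base relation sales(region, amount).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("south", 250), ("north", 50)],
)

QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Ordinary view: only the definition is stored; the query runs on every reference.
conn.execute(f"CREATE VIEW sales_by_region AS {QUERY}")

# Emulated materialized view: the view's tuples are stored in a real table.
conn.execute(f"CREATE TABLE sales_by_region_mv AS {QUERY}")


def totals(name):
    """Return {region: total} read from the named view or table."""
    return dict(conn.execute(f"SELECT region, total FROM {name}"))


# Modify the base relation after the view was materialized.
conn.execute("INSERT INTO sales VALUES ('south', 10)")

print(totals("sales_by_region"))     # reflects the new row
print(totals("sales_by_region_mv"))  # stale until it is maintained
```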

  • View Maintenance

    The process of updating a materialized view in response to changes to the underlying data

    An MV becomes dirty (out of date) whenever the underlying base relations are modified

    Incremental View Maintenance

    It is wasteful to maintain a view by recomputing it from scratch

    Often it is cheaper to use the heuristic of inertia: only a part of the view changes in response to changes in the base relations

    Compute only the changes in the view to update its materialization
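    A sketch of incremental maintenance for a hypothetical aggregate view that stores (SUM(amount), COUNT(*)) per region over a sales relation of (region, amount) tuples. apply_delta() touches only the groups named in the inserted and deleted tuples (the inertia idea), and the stored count tells it when a group's last tuple is gone. recompute() is the from-scratch alternative, included only to check that both give the same result; both function names are illustrative, not from the lecture.

```python
from collections import defaultdict


def recompute(sales):
    """From-scratch evaluation: scan every base tuple (region, amount)."""
    acc = defaultdict(lambda: [0, 0])  # region -> [sum, count]
    for region, amount in sales:
        acc[region][0] += amount
        acc[region][1] += 1
    return {region: (s, c) for region, (s, c) in acc.items()}


def apply_delta(view, inserted=(), deleted=()):
    """Incremental maintenance: update only the groups the delta touches."""
    view = dict(view)  # leave the caller's copy unchanged
    for region, amount in inserted:
        s, c = view.get(region, (0, 0))
        view[region] = (s + amount, c + 1)
    for region, amount in deleted:
        s, c = view[region]
        if c == 1:
            del view[region]  # the group's last tuple was deleted
        else:
            view[region] = (s - amount, c - 1)
    return view


sales = [("north", 100), ("south", 250), ("north", 50)]
mv = recompute(sales)
mv = apply_delta(mv, inserted=[("east", 70)], deleted=[("south", 250)])
print(mv)
```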

  • Classification of the View Maintenance Problem

    Four dimensions along which the view maintenance problem can be studied:

    Information Dimension

    Modification Dimension

    Language Dimension

    Instance Dimension

  • Information Dimension

    The amount of information available for view maintenance

    Information may include the base relations, the materialized view itself, and knowledge of constraints and keys

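  • Information Dimension (Example)

    Consider the relation part(part_num, part_cost, contract) and the view

    expensive_parts(part_num) = π_part_num(σ_part_cost > 1000(part))

    that is, the distinct part numbers of parts whose part_cost exceeds 1000

    Consider maintaining the view when a tuple p1 is inserted into relation part, and assume p1's part_cost exceeds 1000 (otherwise the insertion cannot affect the view)

    Different view maintenance algorithms can be designed depending upon the information available

    Consider the following cases: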

  • Information Dimension (Example)

    CASE 1: Only the materialized view is available

    Use the old materialized view to determine whether p1's part_num is already present in the view

    If so, the materialization does not change; otherwise, insert p1's part_num into the materialization

    CASE 2: Only the base relation part is available

    Use relation part to check whether an existing tuple has the same part_num and a greater or equal part_cost

    If such a tuple exists, the inserted tuple does not contribute to the view (its part_num is already there)

    CASE 3: It is known that part_num is the key of part

    Infer that p1's part_num cannot already be in the view, since no other tuple can share it, so it must be inserted
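    The three cases can be sketched in Python as follows, assuming the view is kept as a set of part numbers and part tuples are modeled by a hypothetical Part named tuple. Each function uses only the information its case allows: the old view, the old contents of part, or the key constraint. All three first apply the view's selection predicate, since a tuple with part_cost of 1000 or less can never change the view. The function names are illustrative.

```python
from typing import List, NamedTuple, Optional, Set

THRESHOLD = 1000  # selection constant in the definition of expensive_parts


class Part(NamedTuple):
    part_num: str
    part_cost: int
    contract: str


def qualifies(t: Part) -> bool:
    """Selection predicate of expensive_parts."""
    return t.part_cost > THRESHOLD


def case1_view_only(view: Set[str], p1: Part) -> Set[str]:
    """CASE 1: only the old materialized view is available."""
    if not qualifies(p1) or p1.part_num in view:
        return view                  # no change to the materialization
    return view | {p1.part_num}      # insert p1's part_num


def case2_base_only(part: List[Part], p1: Part) -> Optional[str]:
    """CASE 2: only relation part (before the insertion) is available.
    Returns the part_num to insert into the view, or None for 'no change'."""
    if not qualifies(p1):
        return None
    for t in part:
        if t.part_num == p1.part_num and t.part_cost >= p1.part_cost:
            return None              # t also qualifies, so part_num is in the view
    # Insert; for a set-valued view this is harmless if a cheaper
    # qualifying tuple already put part_num there.
    return p1.part_num


def case3_key_known(p1: Part) -> Optional[str]:
    """CASE 3: part_num is the key of part, so no other tuple shares
    p1's part_num; a qualifying p1 must therefore be inserted."""
    return p1.part_num if qualifies(p1) else None
```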

  • Modification Dimension

    What modifications can the view maintenance algorithm handle?

    Modifications may include insertions and deletions of tuples in base relations, direct updates, updates modeled as deletions followed by insertions, changes to the view definition, etc.
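    One way to see why the kinds of modifications matter: with a plain set of part numbers, a deletion from part cannot be handled from the view alone, because another qualifying tuple (for example, under a different contract) may still derive the same part_num. The hypothetical class below keeps, as an assumption not stated in the slides, a count of qualifying base tuples per part_num, which makes both insertions and deletions maintainable, and it handles an update as a deletion followed by an insertion.

```python
from collections import Counter

THRESHOLD = 1000  # selection constant of expensive_parts


class ExpensiveParts:
    """Hypothetical materialization of expensive_parts that stores, for each
    part_num, how many base tuples with part_cost > THRESHOLD derive it."""

    def __init__(self):
        self.counts = Counter()

    def insert(self, part_num, part_cost):
        if part_cost > THRESHOLD:
            self.counts[part_num] += 1

    def delete(self, part_num, part_cost):
        # Assumes the deleted tuple was actually present in part.
        if part_cost > THRESHOLD:
            self.counts[part_num] -= 1
            if self.counts[part_num] == 0:
                del self.counts[part_num]  # last derivation is gone

    def update(self, old, new):
        """Handle an update as a deletion followed by an insertion."""
        self.delete(*old)
        self.insert(*new)

    def rows(self):
        """The view as seen by queries: distinct part numbers."""
        return set(self.counts)


v = ExpensiveParts()
v.insert("P1", 5000)
v.insert("P1", 2000)                 # same part under another contract
v.insert("P2", 1500)
v.delete("P1", 5000)                 # P1 still qualifies through its other tuple
v.update(("P2", 1500), ("P2", 900))  # price drop: P2 leaves the view
print(v.rows())
```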

  • Language Dimension

    In what language is the view expressed: relational algebra, SQL, or a subset of SQL?

    Can it have duplicates? Can it use aggregation?
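    Whether the view may use aggregation affects what can be maintained. A small illustration with hypothetical helper functions: after a deletion, SUM can be updated from its old value alone, but MAX cannot when the deleted value is the current maximum, because the next-largest value is only available from the base relation.

```python
from typing import Optional, Sequence


def sum_after_delete(old_sum: int, deleted: int) -> int:
    """SUM can be maintained from its old value: subtract the deleted value."""
    return old_sum - deleted


def max_after_delete(old_max: int, deleted: int,
                     remaining: Optional[Sequence[int]] = None) -> Optional[int]:
    """MAX can be maintained from its old value only when the deleted value is
    below the maximum; otherwise the remaining base values must be consulted."""
    if deleted < old_max:
        return old_max  # the maximum is unaffected
    if remaining is None:
        # The deleted value was (a copy of) the maximum; the view alone
        # cannot tell what the new maximum is.
        raise LookupError("the maximum was deleted; base relation needed")
    return max(remaining, default=None)  # None if no tuples remain


costs = [1500, 5000, 2000]
old_sum, old_max = sum(costs), max(costs)
print(sum_after_delete(old_sum, 2000))                # view alone suffices
print(max_after_delete(old_max, 2000))                # view alone suffices
print(max_after_delete(old_max, 5000, [1500, 2000]))  # needs base data
```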

  • Instance Dimension

    Database instance: Does the view maintenance algorithm work for all instances of the database?

    Modification instance: Does it work for all instances of the modification?

  • Instance Dimension (Example)

    Extend the previous example with a new view

    View supplier_parts(part_num) = π_part_num(supplier ⋈ part), the equijoin on part_num of relations supplier(supplier_num, part_num, price) and part

    The view contains the distinct part numbers that are supplied by at least one supplier

    Maintenance

    Because of the join, supplier_parts cannot in general be maintained in response to insertions into part when only the view is available

    For an insertion of a part with part_num p1, supplier_parts is maintainable if the view already contains p1 (nothing changes), but not otherwise (only relation supplier can tell whether some supplier supplies p1)

    Maintainability of a view therefore also depends on the particular database and modification instances
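    A sketch of the supplier_parts example, with the view held as a set of part numbers and supplier tuples as (supplier_num, part_num, price). maintain_on_part_insert() uses only the view and returns None when it cannot decide, which happens exactly for instances where p1 is not already in the view; maintain_with_supplier() shows that consulting supplier resolves those cases. Both function names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

SupplierRow = Tuple[str, str, int]  # (supplier_num, part_num, price)


def maintain_on_part_insert(view: Set[str], p1: str) -> Optional[Set[str]]:
    """View-only maintenance of supplier_parts after a part tuple with
    part_num p1 is inserted. Returns None when the view cannot decide."""
    if p1 in view:
        return view  # some supplier already joins with p1: no change
    return None      # whether p1 joins depends on supplier, which we lack


def maintain_with_supplier(view: Set[str], p1: str,
                           supplier: Iterable[SupplierRow]) -> Set[str]:
    """Same maintenance, but with relation supplier available."""
    if any(part_num == p1 for _, part_num, _ in supplier):
        return view | {p1}
    return view
```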