DW Lecture 05

Transcript of DW Lecture 05

  • 7/29/2019 DW Lecture 05

    Lecture 05

    Tue, Feb 17, 2009, 18:00 to 21:00

    FAST NU, Karachi

  • Architectural Components: 3 Major Areas

    Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging

    Data Storage: Loading, Archiving, Management

    Information Delivery: Reports, Query Processing, Complex Analysis

    Building Blocks of the Data Warehouse: Source Data, Data Staging, Data Storage, Information Delivery, Metadata, Management and Control

  • Architectural Components (diagram)

    [Figure: Architectural Components, grouped into Data Acquisition, Data Storage and Information Delivery. Labels: Source Data (Production, Internal, Archived, External); Data Staging; Data Storage (Data Warehouse, MDDB, Data Marts); Information Delivery (Reports / Queries, OLAP, Data Mining); Metadata; Management & Control.]
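The three major areas can be read as one flow: acquire and stage data, store it, then deliver information from it. The following is a minimal Python sketch of that flow, not a real ETL tool. The source names (ORDERS_SYSTEM, ONLINE_SALES), their field names and all function names are hypothetical, and the "warehouse" is just an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical extracts from two production sources that describe the
# same customers with different field names and formatting.
ORDERS_SYSTEM = [
    {"cust_id": "C1", "name": " alice smith ", "amount": "120.50"},
    {"cust_id": "C2", "name": "BOB JONES", "amount": "80.00"},
]
ONLINE_SALES = [
    {"customer": "C1", "full_name": "Alice Smith", "total": 30.0},
    {"customer": "C3", "full_name": "carol white", "total": None},  # missing measure
]


def extract():
    """Data acquisition: read raw rows from each source, tagged with their origin."""
    for row in ORDERS_SYSTEM:
        yield "orders", row
    for row in ONLINE_SALES:
        yield "online", row


def transform_and_cleanse(source, row):
    """Map each source layout to one common layout; return None for unusable rows."""
    if source == "orders":
        cid, name, amount = row["cust_id"], row["name"], row["amount"]
    else:
        cid, name, amount = row["customer"], row["full_name"], row["total"]
    if amount is None:
        return None
    return {"customer_id": cid, "name": name.strip().title(), "amount": float(amount)}


def stage():
    """Integration and staging: collect clean, uniformly shaped rows from all sources."""
    staged = []
    for source, row in extract():
        clean = transform_and_cleanse(source, row)
        if clean is not None:
            staged.append(clean)
    return staged


def load(staged):
    """Data storage: load staged rows into a toy in-memory warehouse table."""
    warehouse = defaultdict(float)
    for row in staged:
        warehouse[(row["customer_id"], row["name"])] += row["amount"]
    return dict(warehouse)


def report(warehouse):
    """Information delivery: a simple ranked report over the loaded data."""
    return sorted(warehouse.items(), key=lambda item: item[1], reverse=True)


for (customer_id, name), total in report(load(stage())):
    print(customer_id, name, total)
```

Run as-is, it prints "C1 Alice Smith 150.5" and then "C2 Bob Jones 80.0": the two differently formatted "C1" rows are integrated into one customer, and the row with a missing amount is rejected during cleansing.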

  • Infrastructure Supporting Architecture

    Operational Infrastructure: People, Procedures, Training, Management Software

    Physical Infrastructure: Hardware, Operating System, DBMS, Network Software

  • Platform Options

    Single Platform Option, Hybrid Option

    Source Data Platforms, Staging Area Platforms

    Options for Staging Area: Source Data Platforms, Data Storage Platforms, Separate Platforms

    Data Movement Options: Shared Disk, Mass Data Transmission, Real-Time Connection, Manual Methods

  • Server Hardware

    SMP (Symmetric Multiprocessing)

    Clusters

    MPP (Massively Parallel Processing)

    ccNUMA or NUMA (Cache-coherent Nonuniform Memory Architecture)

  • Symmetric Multiprocessing

    Features: Shared-everything architecture; simplest form of parallel processing

    Benefits: Proven technology since 1970; workload balance; scalable performance; easy administration

    Limitations: Limited available memory; limited bandwidth; limited availability

    Consideration: Suitable when the data warehouse size is two to three hundred gigabytes and concurrency requirements are reasonable

  • Symmetric Multiprocessing (diagram)

    [Figure: four processors on a common bus, all using one shared memory and shared disks.]
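"Shared everything" means every processor sees the same memory and disks, so any processor can pick up any piece of work; that is where SMP's workload balance comes from. The following is a minimal Python sketch of that idea, assuming threads stand in for processors and a list stands in for a table in shared memory. The names SALES and parallel_sum are hypothetical. Because of Python's global interpreter lock, these threads do not actually speed up CPU-bound work; the sketch only shows the structure.

```python
import queue
import threading

# One in-memory table that every worker can read: "shared everything".
SALES = list(range(1, 1001))  # hypothetical fact rows (amounts only)


def parallel_sum(table, n_workers=4, chunk=100):
    """Sum a shared table with workers that pull chunks from one shared queue.

    A worker that finishes early simply takes the next chunk, so the load
    balances itself across workers.
    """
    tasks = queue.Queue()
    for start in range(0, len(table), chunk):
        tasks.put((start, min(start + chunk, len(table))))

    partials = []
    lock = threading.Lock()  # guards the shared list of partial results

    def worker():
        while True:
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return
            subtotal = sum(table[lo:hi])  # reads the shared table directly
            with lock:
                partials.append(subtotal)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


print(parallel_sum(SALES))  # 500500
```

The single shared memory that makes this easy is also the bottleneck listed under the SMP limitations: every worker competes for the same memory and bus bandwidth.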

  • Clusters

    Features: Each node has one or more processors and associated memory; memory is shared within each node only; high-speed bus communication; shared disks; a cluster of nodes

    Benefits: High availability; preserves the concept of one database; incremental growth

    Limitations: Bus bandwidth; high O/S overhead; cache consistency maintenance for inter-node synchronization

    Consideration: Suitable if the data warehouse is expected to grow in well-defined increments

  • Clusters (diagram)

    [Figure: two nodes, each with two processors and its own shared memory, connected by a common high-speed bus to shared disks.]

  • Massively Parallel Processing

    Features: Shared-nothing architecture; focus on disk access rather than memory access; works well with an O/S that supports transparent disk access; inter-node communication through processor-to-processor connections

    Benefits: Highly scalable; fast access between nodes; improved system availability; low cost per node

    Limitations: Requires rigid data partitioning; restricted data access; limited workload balance; cache consistency must be maintained

    Consideration: Suitable for a medium to large data warehouse of four to five hundred gigabytes

  • Massively Parallel Processing (diagram)

    [Figure: four independent nodes, each with its own processor, memory and disk.]
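In a shared-nothing MPP system each node owns a fixed slice of the data, which explains both "requires rigid data partitioning" and "restricted data access". The following is a minimal Python sketch, assuming rows are assigned to nodes by hashing a key. The Node class and the functions node_for, load, total and total_for_key are hypothetical. It uses zlib.crc32 instead of Python's built-in hash(), because hash() of a string is randomized per process and so would not give a stable placement.

```python
import zlib


class Node:
    """One MPP node: its own processor, memory and disk, sharing nothing."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # this node's private partition

    def local_sum(self, key_filter=None):
        """Aggregate only the rows this node owns."""
        return sum(amount for key, amount in self.rows
                   if key_filter is None or key == key_filter)


def node_for(key, n_nodes):
    """Rigid partitioning: a fixed hash of the key decides the owning node."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def load(rows, n_nodes=4):
    """Distribute (key, amount) rows across the nodes' private partitions."""
    nodes = [Node(i) for i in range(n_nodes)]
    for key, amount in rows:
        nodes[node_for(key, n_nodes)].rows.append((key, amount))
    return nodes


def total(nodes):
    """A full scan: each node aggregates locally, then partial results are combined."""
    return sum(node.local_sum() for node in nodes)


def total_for_key(nodes, key):
    """A lookup on the partitioning key touches exactly one node."""
    return nodes[node_for(key, len(nodes))].local_sum(key)


nodes = load([("C1", 10.0), ("C2", 5.0), ("C1", 2.5), ("C3", 7.0)])
print(total(nodes), total_for_key(nodes, "C1"))  # 24.5 12.5
```

Because node_for depends on n_nodes, changing the number of nodes changes the owner of many keys, so growing the system generally means redistributing data. A query that filters on something other than the partitioning key cannot be routed to one node and must visit all of them.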

  • Cache-coherent Nonuniform Memory Architecture

    Features: New architecture since the early 1990s; a big SMP broken into smaller SMPs; a single real memory address space over the entire machine

    Benefits: Maximum flexibility; overcomes the memory limitations of SMP; better scalability than SMP; partitioning with a centralized approach

    Limitations: Complex programming; limited software support; still maturing

    Consideration: For experienced technology users

  • Cache-coherent Nonuniform Memory Architecture (diagram)

    [Figure: two nodes, each with two processors, its own shared memory and its own disks.]
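The "Consideration" lines for the four server hardware options amount to rules of thumb. The following Python sketch only restates those rules as a hypothetical helper (suggest_server_hardware is not an established tool). Reading "two to three hundred gigabytes" as "up to 300 GB" for SMP and "four to five hundred gigabytes" as "400 GB and above" for MPP is an assumption; the slides give no guidance for other sizes.

```python
# Thresholds taken from the slide considerations. Extending SMP below 200 GB
# and MPP above 500 GB is an assumption, not something the slides state.
SMP_MAX_GB = 300
MPP_MIN_GB = 400


def suggest_server_hardware(size_gb, reasonable_concurrency=True,
                            grows_in_defined_increments=False,
                            experienced_team=False):
    """Return the server hardware options whose slide 'Consideration' fits."""
    suggestions = []
    if size_gb <= SMP_MAX_GB and reasonable_concurrency:
        suggestions.append("SMP")
    if grows_in_defined_increments:
        suggestions.append("Clusters")
    if size_gb >= MPP_MIN_GB:
        suggestions.append("MPP")
    if experienced_team:
        suggestions.append("ccNUMA")
    return suggestions  # an empty list means the slides give no direct guidance


print(suggest_server_hardware(250))                                    # ['SMP']
print(suggest_server_hardware(450, grows_in_defined_increments=True))  # ['Clusters', 'MPP']
```

The helper deliberately returns a list rather than a single answer, since the considerations overlap (a growing 450 GB warehouse fits both the cluster and the MPP guidance).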

  • Software Tools

    Data Modeling

    Data Extraction

    Data Transformation

    Data Loading

    Data Quality

    Queries and Reports

    OLAP

    Alert Systems

    Middleware and Connectivity

    Data Warehouse Management

    Architecture First, Then Tools

  • Metadata Definitions

    Data about data

    Table of contents for the data

    Catalog for the data

    Data warehouse atlas

    Data warehouse roadmap

    Data warehouse directory

    The nerve center

  • Metadata Example

    Entity Name: Customer

    Alias Names: Account, Client

    Definition: A person or an organization that purchases goods or services from the company

    Remarks: Includes regular, current and past customers

    Source Systems: Finished Goods Orders, Maintenance Contracts, Online Sales

    Created Date: January 15, 1999

    Last Update Date: January 21, 2001

    Update Cycle: Weekly

    Last Full Refresh: December 29, 2000

    Full Refresh Cycle: Every six months

    Data Quality Reviewed: January 25, 2001

    Last Deduplication: January 10, 2001

    Planned Archival: Every six months

    Responsible User: Jane Brown
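A metadata entry like the Customer example is a structured record, so a repository can store it and answer questions such as "is 'Client' another name for a known entity?". The following is a minimal Python sketch, assuming a dataclass whose fields mirror the slide. The class EntityMetadata and its matches method are hypothetical, not part of any metadata product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class EntityMetadata:
    """Metadata for one warehouse entity, mirroring the fields on the slide."""
    entity_name: str
    alias_names: List[str]
    definition: str
    remarks: str
    source_systems: List[str]
    created: date
    last_update: date
    update_cycle: str
    last_full_refresh: date
    full_refresh_cycle: str
    data_quality_reviewed: date
    last_deduplication: date
    planned_archival: str
    responsible_user: str

    def matches(self, term):
        """True if a user's search term is the entity name or one of its aliases."""
        wanted = term.strip().lower()
        names = [self.entity_name] + self.alias_names
        return any(wanted == name.lower() for name in names)


CUSTOMER = EntityMetadata(
    entity_name="Customer",
    alias_names=["Account", "Client"],
    definition="A person or an organization that purchases goods or services from the company",
    remarks="Includes regular, current and past customers",
    source_systems=["Finished Goods Orders", "Maintenance Contracts", "Online Sales"],
    created=date(1999, 1, 15),
    last_update=date(2001, 1, 21),
    update_cycle="Weekly",
    last_full_refresh=date(2000, 12, 29),
    full_refresh_cycle="Every six months",
    data_quality_reviewed=date(2001, 1, 25),
    last_deduplication=date(2001, 1, 10),
    planned_archival="Every six months",
    responsible_user="Jane Brown",
)

print(CUSTOMER.matches("client"))  # True
```

Storing dates as date objects rather than strings lets tools compare them directly, for example to check whether the last full refresh is older than the stated refresh cycle.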

  • Need for Metadata

    For using the data warehouse

    For building the data warehouse

    For administering the data warehouse

    Who needs it? IT professionals, power users, casual users

  • A Nerve Center (diagram)

    [Figure: Data warehouse metadata at the center, surrounded by Source Systems, Extraction Tools, Cleansing Tools, Transformation Tools, Data Load Function, External Data, Applications, Data Mining, OLAP Tool, Reporting Tool and Query Tool.]

  • Metadata by Functional Areas

    Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging

    Data Storage: Loading, Archiving, Management

    Information Delivery: Reports, Query Processing, Complex Analysis

    Business Metadata

    Technical Metadata

  • Metadata Requirements

    Capturing and Storing Data

    Variety of Metadata Sources

    Metadata Integration

    Metadata Standardization

    Rippling through Revisions

    Keeping Metadata Synchronized

    Metadata Exchange
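Several of these requirements interact: metadata arrives from many sources, must be stored under standard names, and a revision to one item has to ripple to everything derived from it until those items are synchronized again. The following is a minimal Python sketch of such a central repository. The class MetadataRepository, its methods, the naming rule in standardize and the example attribute data_type are all hypothetical illustrations, not a standard metadata interchange format.

```python
from collections import defaultdict


class MetadataRepository:
    """A toy central repository ("nerve center") for data warehouse metadata."""

    def __init__(self):
        self.entries = {}                     # standardized name -> attributes
        self.derived_from = defaultdict(set)  # name -> names derived from it
        self.stale = set()                    # entries awaiting synchronization

    @staticmethod
    def standardize(name):
        """Metadata standardization: one naming convention for every tool."""
        return name.strip().lower().replace(" ", "_")

    def capture(self, name, source, **attributes):
        """Capture and store metadata reported by any source or tool."""
        key = self.standardize(name)
        self.entries.setdefault(key, {}).update(attributes, captured_from=source)
        return key

    def link(self, source_name, target_name):
        """Record that target_name is derived from source_name."""
        self.derived_from[self.standardize(source_name)].add(self.standardize(target_name))

    def revise(self, name, source, **attributes):
        """Apply a revision and ripple it to every item derived downstream."""
        key = self.capture(name, source, **attributes)
        pending, affected = [key], set()
        while pending:
            current = pending.pop()
            for child in self.derived_from[current]:
                if child not in affected:
                    affected.add(child)
                    self.stale.add(child)
                    pending.append(child)
        return affected

    def synchronize(self, name):
        """Mark an item as brought back in line with its sources."""
        self.stale.discard(self.standardize(name))


repo = MetadataRepository()
repo.capture("Cust Name", "extraction tool", data_type="CHAR(40)")
repo.capture("Customer Name", "data load function", data_type="VARCHAR(60)")
repo.link("Cust Name", "Customer Name")
repo.link("Customer Name", "Customer Report Label")
print(sorted(repo.revise("Cust Name", "source system", data_type="VARCHAR(80)")))
# ['customer_name', 'customer_report_label']
```

Here a change captured from the source system marks both the loaded column and the report label built on it as stale, which is the "rippling through revisions" problem; each item stays flagged until synchronize is called for it.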

  • Metadata Sources

    Source Systems

    Data Extraction

    Data Transformation and Cleansing

    Data Loading

    Data Storage

    Information Delivery