HANA DB Column Store

  • 7/29/2019 HANA DB Column Store

    1/56

    Sharone Zehavi, Jordan Jordanov

  • 7/29/2019 HANA DB Column Store

    2/56

    2011 SAP AG. All rights reserved. 2

    Agenda

    Concepts of Column Store

    Structure Compared to Row Store

    Performance issues Compared to Row Store

    Go through examples to make the points

    In-Depth view of Column Store

    Architecture

    Delta Store

    Consistent View

    Data Compression

    Accessing Data

    Join Operation

  • 7/29/2019 HANA DB Column Store

    3/56

    Performance bottleneck

  • 7/29/2019 HANA DB Column Store

    4/56

    Orders of Magnitude, presented by Jeff Dean (Google)

    Activity                                Time in ns
    L1 cache reference                      0.5
    Branch mis-prediction                   5
    L2 cache reference                      7
    Mutex lock/unlock                       25
    Main memory reference                   100
    Compress 1K bytes with Zippy            3,000
    Send 2K bytes over 1 Gbps network       20,000
    Read 1 MB sequentially from memory      250,000
    Round trip within same datacenter       500,000
    Disk seek                               10,000,000
    Read 1 MB sequentially from disk        20,000,000
    Send packet CA->Netherlands->CA         150,000,000

    http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
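To make the gap between memory and disk concrete, the short sketch below divides a few of the latencies from the table above. The dictionary keys and the helper name `ratio` are chosen for illustration only.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
}

def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

# One disk seek costs as much as 100,000 main memory references.
seek_vs_memory = ratio("Disk seek", "Main memory reference")

# Reading 1 MB sequentially is 80 times slower from disk than from memory.
disk_vs_memory_read = ratio(
    "Read 1 MB sequentially from disk",
    "Read 1 MB sequentially from memory",
)
```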

  • 7/29/2019 HANA DB Column Store

    5/56

    HANA Table Types

    Column

    Row

    History Column (Temporal)

    Global Temporary

    Local Temporary

    In this presentation we will focus on Column tables only, and will mention Row tables briefly for the sake of comparison.

  • 7/29/2019 HANA DB Column Store

    6/56

    Logical Structure Of a Table

               Column 1        Column 2        Column 3        Column 4        Column 5
    Row 1      DataPage1       DataPage2       DataPage3       DataPage4       DataPage5
    Row 2      DataPage6       DataPage7       DataPage8       DataPage9       DataPage10
    ...        ...             ...             ...             ...             ...
    Row N      DataPage(5n-4)  DataPage(5n-3)  DataPage(5n-2)  DataPage(5n-1)  DataPage(5n)

  • 7/29/2019 HANA DB Column Store

    7/56

    Row Store - Physical Structure

               Column 1        Column 2        Column 3        Column 4        Column 5
    Row 1      DataPage1       DataPage2       DataPage3       DataPage4       DataPage5
    Row 2      DataPage6       DataPage7       DataPage8       DataPage9       DataPage10
    ...        ...             ...             ...             ...             ...
    Row N      DataPage(5n-4)  DataPage(5n-3)  DataPage(5n-2)  DataPage(5n-1)  DataPage(5n)

    Starting from the address of Row 1, the address of Row 2 can be calculated, and in general:

    Row n = (the size of all columns) * n
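The address calculation for a row store can be sketched as follows. This is a minimal illustration that assumes fixed-width columns; the widths in `COLUMN_SIZES` are hypothetical, and rows are counted from 0 so that the first row sits at offset 0.

```python
# Hypothetical fixed column widths in bytes for a 5-column table.
COLUMN_SIZES = [4, 8, 8, 16, 4]
ROW_SIZE = sum(COLUMN_SIZES)  # "the size of all columns"

def row_offset(n: int) -> int:
    """Byte offset of row n from the start of the table (n counted from 0)."""
    return ROW_SIZE * n

def cell_offset(n: int, col: int) -> int:
    """Byte offset of column `col` (0-based) inside row n."""
    return row_offset(n) + sum(COLUMN_SIZES[:col])
```

No search is needed to locate a row: `row_offset(2)` is simply two row sizes past the start of the table.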

  • 7/29/2019 HANA DB Column Store

    8/56

    Column Store - Physical Structure (Simplified)

               Column 1        Column 2        Column 3        Column 4        Column 5
    Row 1      DataPage1       DataPage2       DataPage3       DataPage4       DataPage5
    Row 2      DataPage6       DataPage7       DataPage8       DataPage9       DataPage10
    ...        ...             ...             ...             ...             ...
    Row N      DataPage(5n-4)  DataPage(5n-3)  DataPage(5n-2)  DataPage(5n-1)  DataPage(5n)

  • 7/29/2019 HANA DB Column Store

    9/56

    Example Logical Structure

    Table

    Country   Product   Sales
    US        Alpha     3000
    US        Beta      1250
    JP        Alpha     700
    UK        Alpha     450

    Row Store

    Row 1:    US   Alpha   3000
    Row 2:    US   Beta    1250
    Row 3:    JP   Alpha   700
    Row 4:    UK   Alpha   450

    Column Store

    Country:  US      US     JP     UK
    Product:  Alpha   Beta   Alpha  Alpha
    Sales:    3000    1250   700    450

  • 7/29/2019 HANA DB Column Store

    10/56

    Example (cont.) - For Column Store: How Is the Logical Structure Preserved?

    Row ID: the position of a value within its column identifies the row it belongs to.

    Column Store

    Country:  US (Row ID 1)      US     JP     UK
    Product:  Alpha (Row ID 1)   Beta   Alpha  Alpha
    Sales:    3000 (Row ID 1)    1250   700    450
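As a minimal sketch of this idea, the example table can be split into one list per column and a logical row rebuilt by reading the same position from every column. The names `columns` and `fetch_row` are illustrative, and plain Python lists stand in for the real storage.

```python
rows = [
    ("US", "Alpha", 3000),
    ("US", "Beta", 1250),
    ("JP", "Alpha", 700),
    ("UK", "Alpha", 450),
]
names = ("Country", "Product", "Sales")

# Column store: one list per column, values kept in row order.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

def fetch_row(row_id: int) -> tuple:
    """Rebuild a logical row from the columns; Row IDs start at 1."""
    return tuple(columns[name][row_id - 1] for name in names)
```

Reading a whole row touches every column once, while reading one column (for example `columns["Sales"]`) touches a single contiguous list.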

  • 7/29/2019 HANA DB Column Store

    11/56

    Data Dictionary

  • 7/29/2019 HANA DB Column Store

    12/56

    Column Store performing select where CITY = 'New York'

  • 7/29/2019 HANA DB Column Store

    14/56

    Row vs. Column Store

    3 Topics to Consider

    Read (Select)

    Write (Update)

    Write (Insert)

  • 7/29/2019 HANA DB Column Store

    15/56

    Row vs. Column Store Reading Data

    We will examine the pros and cons of each method by following an example.

    Let's look at the following school table:

    Family   Father    Mother     1st Grade  2nd Grade  3rd Grade  4th Grade  5th Grade  6th Grade
    Smith    Richard   Miranda    Jim        null       null       Donna      null       Kevin
    Galway   Stephen   Giselle    Gary       Eric       Alex       null       Jeffrey    null
    Bush     John      Barbara    Timothy    null       Roland     null       null       Alexis
    Brown    Jack      Abby       Sandra     Jason      null       Donald     null       Susan
    Taylor   John      Ginny      Jessica    null       David      null       Brian      Larry
    Moore    Peter     Nancy      Ronald     Angela     null       Ruth       Karen      Laura
    Harris   Clark     Ruth       Dennis     Heather    Jerry      Frank      Janet      null
    Taylor   James     Michelle   Shirley    Cynthia    null       Melissa    null       Brenda

  • 7/29/2019 HANA DB Column Store

    16/56

    Row vs. Column Store Reading Data (cont.)

    So what if we just wanted to read the entire table?

    select * from School

    Recall the physical structure discussed earlier. Which storage method will give us a faster read? Why?

    Hints:

    We are going to fully scan the table in any case. We need to read entire rows one after the other, so which physical structure allows a smooth read?

    What actions are required when performing the query with Column Store?

    What actions are required when performing the query with Row Store?

  • 7/29/2019 HANA DB Column Store

    17/56

    Row vs. Column Store Reading Data (cont.)

    Now, what if we want to get a list of all 1st grade pupils?

    select 1st_grade from School where 1st_grade is not null

    Again, recall the physical structure discussed earlier. Which storage method will give us a faster read? Why?

    Hints:

    Are we going to fully scan the table? We are going to scan all rows in any case, but only one column, so which physical structure allows a smooth read?

    What actions are required when performing the query with Column Store?

    What actions are required when performing the query with Row Store?

  • 7/29/2019 HANA DB Column Store

    18/56

    Row vs. Column Store Reading Data (cont.)

    Now, what if we want to get a list of all families who have children in 1st grade, 3rd grade and 6th grade?

    select Family
    from School
    where 1st_grade is not null
    and 3rd_grade is not null
    and 6th_grade is not null

    Again, recall the physical structure discussed earlier and try to answer: which storage method will give us a faster read?

    Can we have a definite answer here?

    What are the pros and cons?

  • 7/29/2019 HANA DB Column Store

    19/56

    Row vs. Column Store Writing Data Update

    Now, a mistake was found in the table's data: David Taylor, listed in 3rd grade, is actually in 4th grade. So we need to update the table accordingly:

    update School
    set 3rd_grade = null, 4th_grade = 'David'
    where Family = 'Taylor'
    and Father = 'John'
    and Mother = 'Ginny'

    Again, recall the physical structure discussed earlier and try to answer: which storage method will give us a faster update?

  • 7/29/2019 HANA DB Column Store

    20/56

    Row vs. Column Store Writing Data Update (cont.)

    For Column Store, we first need to search for the conditions:

    RowID  Family      RowID  Father      RowID  Mother
    4      Brown       7      Clark       4      Abby
    3      Bush        4      Jack        3      Barbara
    2      Galway      8      James       5      Ginny
    7      Harris      3      John        2      Giselle
    6      Moore       5      John        8      Michelle
    1      Smith       6      Peter       1      Miranda
    5      Taylor      1      Richard     6      Nancy
    8      Taylor      2      Stephen     7      Ruth

  • 7/29/2019 HANA DB Column Store

    21/56

    Row vs. Column Store Writing Data Update (cont.)

    We found that the row to change has Row ID 5, so now it is time for the update:

    RowID  3rd_grade      RowID  4th_grade
    2      Alex           4      Donald
    5      David          1      Donna
    7      Jerry          7      Frank
    3      Roland         8      Melissa
    1      null           6      Ruth
    4      null           2      null
    6      null           3      null
    8      null           5      null

  • 7/29/2019 HANA DB Column Store

    22/56

    Row vs. Column Store Writing Data Update (cont.)

    So we have to update the values as requested, but we also have to re-sort the columns to reflect the new order, based on the new values:

    RowID  3rd_grade      RowID  4th_grade
    2      Alex           5      David
    7      Jerry          4      Donald
    3      Roland         1      Donna
    5      null           7      Frank
    1      null           8      Melissa
    4      null           6      Ruth
    6      null           2      null
    8      null           3      null
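The update-then-re-sort step can be sketched as below. This is a simplified model, not the actual storage code: each column is a Python list of (RowID, value) pairs kept sorted by value, nulls are placed last, and ties are broken by Row ID (an assumption; the order of the null entries is otherwise arbitrary). Row IDs follow the school table.

```python
import bisect

def sort_key(entry):
    """Order by value, with null (None) last; ties broken by Row ID."""
    row_id, value = entry
    return (value is None, value or "", row_id)

def update(column, row_id, new_value):
    """Replace the entry for row_id and keep the column sorted."""
    old = next(e for e in column if e[0] == row_id)
    column.remove(old)                        # delete the old entry
    keys = [sort_key(e) for e in column]
    pos = bisect.bisect_left(keys, sort_key((row_id, new_value)))
    column.insert(pos, (row_id, new_value))   # re-insert at its sorted position

third = [(2, "Alex"), (5, "David"), (7, "Jerry"), (3, "Roland"),
         (1, None), (4, None), (6, None), (8, None)]
fourth = [(4, "Donald"), (1, "Donna"), (7, "Frank"), (8, "Melissa"),
          (6, "Ruth"), (2, None), (3, None), (5, None)]

update(third, 5, None)       # David leaves 3rd grade
update(fourth, 5, "David")   # and now sorts first in 4th grade
```

Both the removal and the insertion shift list entries, which is the per-column cost the slide refers to.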

  • 7/29/2019 HANA DB Column Store

    23/56

    Row vs. Column Store Writing Data Update (cont.)

    For Row Store, assuming no indexes are present, we simply scan the table row by row, updating every row that matches the conditions.

    The scan is a full one: the table is read to the end. On the other hand, it is read only once.

    So where did we get better performance for the update?

    Can we have a definite answer here?

  • 7/29/2019 HANA DB Column Store

    24/56

    Row vs. Column Store Writing Data Insert

    A new family has moved into town and registered their kids at the school. We want to reflect this with an insert command:

    insert into School
    values ('Donovan', 'Harry', 'Pamela', null, 'Martha', null, 'Brenda', 'Albert', 'Justin')

    How would we implement this action in both methods?

  • 7/29/2019 HANA DB Column Store

    25/56

    Row vs. Column Store Writing Data Insert (cont.)

    For Column Store, after allocating a new Row ID, we need to do the following for each column:

    1. Add the new value

    2. Re-sort the column, and possibly reorganize it, assuming we want the values to be contiguous.

    For Row Store, we simply allocate new data pages at the end of the table and write the data there. This should take O(1) time.

    So we can see the straightforward advantage of Row Store when inserting new data.
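The difference in insert cost can be illustrated with a toy model. The sketch below assumes sorted in-memory lists for the columns and only two non-null columns; the function names are hypothetical.

```python
import bisect

row_store = []                                   # rows in arrival order
column_store = {"Family": [], "Father": []}      # each column kept sorted by value

def insert_row_store(row):
    """Append at the end of the table: constant time."""
    row_store.append(row)
    return len(row_store)                        # the new Row ID

def insert_column_store(row_id, row):
    """Insert into every column at its sorted position.

    Each list insert shifts the entries behind it, so the cost grows with
    the column length and is paid once per column.
    """
    for name, value in row.items():
        bisect.insort(column_store[name], (value, row_id))

for family in ({"Family": "Donovan", "Father": "Harry"},
               {"Family": "Brown", "Father": "Jack"}):
    rid = insert_row_store(family)
    insert_column_store(rid, family)
```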

  • 7/29/2019 HANA DB Column Store

    26/56

    Advantages of Column Store

    So when does Column Store have a clear-cut advantage over Row Store?

    Calculations are typically executed on a single column or only a few columns

    The table is searched based on the values of a few columns

    The table has a large number of columns

    The table has a large number of rows, and columnar operations are required (aggregate, scan, etc.)

    High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows)

    Elimination of indexes

    Parallelization

  • 7/29/2019 HANA DB Column Store

    27/56

    Advantages of Row Store

    Row Store tables are better when:

    The application needs to process only a single record at a time (many selects and/or updates of single records).

    The application typically needs to access the complete record

    The columns contain mainly distinct values so compression rate would be low

    Neither aggregations nor fast searching are required

    The table has a small number of rows (for example configuration tables)

  • 7/29/2019 HANA DB Column Store

    28/56

    Column Store Conceptual Architecture

  • 7/29/2019 HANA DB Column Store

    29/56

    Column Store Delta Storage

    We saw that inserting a new row (and sometimes updating one) is a very expensive action for Column Store. So what do we do to ease the pain?

    A write operation (insert or update) in Column Store never modifies the compressed data directly; it goes into a separate area called the Delta Storage.

    The changes are taken over from the delta storage asynchronously at some later point in time. This action is called Delta Merge. The Delta Merge operation integrates committed changes collected in delta storage into main storage.

    The following steps are taken when a write operation occurs:

  • 7/29/2019 HANA DB Column Store

    30/56

    Write operations in a Columnar Store

  • 7/29/2019 HANA DB Column Store

    31/56

    Write operations in a Columnar Store

  • 7/29/2019 HANA DB Column Store

    32/56

    Write operations in a Columnar Store

  • 7/29/2019 HANA DB Column Store

    33/56

    Column Store Delta Storage (cont.)

    If the current transaction is not already a write transaction, the transaction manager is told to make it a write transaction and to provide an updated transaction token.

    For updates and deletes, a write lock is requested from the transaction manager for the record (identified by its key). The operation is blocked until the lock is available. The lock is held until the transaction is committed or rolled back.

    For inserts and updates, the operation inserts a new row with the updated data into the delta storage.

    The write operation tells the consistent view manager about the change. The consistent view manager stores the transaction-related information that is needed to create the consistent view for a specific read operation. This includes which rows in delta storage were inserted by a given transaction and which other rows were invalidated. In case of a deletion, the consistent view manager just stores the information that the previously valid row has become invalid.

    Unless it is a temporary table, the write operation writes an entry into the delta log.
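The write path above can be sketched with a toy table. This is a heavily simplified model under stated assumptions: there are no transactions or locks, the `invalid` set stands in for the consistent view manager, and a Python list stands in for the persistent delta log. The class and method names are hypothetical.

```python
class ColumnTable:
    """Toy table with a read-only main store and an append-only delta store."""

    def __init__(self, main_rows):
        self.main = list(main_rows)   # (key, value) pairs, never modified by writes
        self.delta = []               # new row versions are appended here
        self.invalid = set()          # (store, position) of invalidated rows
        self.delta_log = []           # stand-in for the delta log

    def _invalidate(self, key):
        for store, rows in (("main", self.main), ("delta", self.delta)):
            for pos, (k, _) in enumerate(rows):
                if k == key:
                    self.invalid.add((store, pos))

    def insert(self, key, value):
        self.delta.append((key, value))
        self.delta_log.append(("insert", key, value))

    def update(self, key, value):
        self._invalidate(key)         # the old version becomes invisible
        self.insert(key, value)       # the new version goes to the delta store

    def delete(self, key):
        self._invalidate(key)
        self.delta_log.append(("delete", key))

    def read(self):
        """Return all currently valid rows from main and delta."""
        out = []
        for store, rows in (("main", self.main), ("delta", self.delta)):
            out += [r for pos, r in enumerate(rows) if (store, pos) not in self.invalid]
        return out

t = ColumnTable([(1, "a"), (2, "b")])
t.update(2, "B")
t.insert(3, "c")
t.delete(1)
```

After these three writes the main store still holds its original two rows; only the delta store, the invalidation set and the log have changed.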

  • 7/29/2019 HANA DB Column Store

    34/56

    Consistent View of Current Data

    With the delta concept, updates in the Column Store do not physically change existing rows.

    Updates are always done by inserting a new entry into the delta storage.

    Therefore a mechanism is required to ensure that each transaction reads the data it is supposed to read, be it from the Main Store or from the Delta Store.

    The Consistent View Manager takes care of exactly this.

    To understand Consistent View, we first need to understand isolation levels:

  • 7/29/2019 HANA DB Column Store

    35/56

    Consistent View of Current Data - Isolation Levels

    Read Committed

    Corresponds to Statement Level Read Consistency

    With statement level snapshot isolation, different statements in a transaction may see different snapshots of the system.

    Each statement in a transaction sees a consistent snapshot of the system.

    Each statement sees the changes that were committed when the execution of the statement started.

  • 7/29/2019 HANA DB Column Store

    36/56

    Consistent View of Current Data - Isolation Levels

    Repeatable Read / Serializable

    Corresponds to Transaction Level Snapshot Isolation

    All statements of a transaction see the same snapshot of the database.

    This snapshot contains all changes that were committed at the time the transaction started.

    In addition, this snapshot contains the changes made by the transaction itself.
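The difference between the two levels can be sketched with logical timestamps. This is an illustrative model only: the `Version` record and the integer commit times are assumptions, and the transaction's own uncommitted changes are not modelled.

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    commit_ts: int      # logical time at which the writer committed

def visible(versions, snapshot_ts):
    """Latest value committed at or before snapshot_ts (None if there is none)."""
    seen = [v for v in versions if v.commit_ts <= snapshot_ts]
    return max(seen, key=lambda v: v.commit_ts).value if seen else None

history = [Version("old", commit_ts=5), Version("new", commit_ts=12)]

# A transaction starts at time 10; its two statements start at times 10 and 15.
txn_start, stmt_starts = 10, [10, 15]

# Statement level: each statement takes its own snapshot.
read_committed = [visible(history, ts) for ts in stmt_starts]

# Transaction level: every statement uses the snapshot taken at transaction start.
transaction_snapshot = [visible(history, txn_start) for _ in stmt_starts]
```

Under statement level consistency the second statement sees the value committed at time 12; under transaction level snapshot isolation both statements see the older value.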

    Now, back to Consistent View, let's follow an example:

  • 7/29/2019 HANA DB Column Store

    37/56

    Consistent View of Current Data

  • 7/29/2019 HANA DB Column Store

    38/56

    Delta Merge

    Executed at table level when:

    The number of rows in delta storage for this table exceeds a specified number

    The memory consumption of delta storage exceeds a specified limit

    A merge is triggered explicitly by a client using SQL

    The delta log for a columnar table exceeds the defined limit. As the delta log is truncated only during a merge operation, a merge operation needs to be performed in this case.
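The four triggers can be expressed as a single decision function. The sketch below is illustrative: the `DeltaStats` record and the three limits are hypothetical names and values, not SAP HANA configuration parameters.

```python
from dataclasses import dataclass

@dataclass
class DeltaStats:
    rows: int             # rows currently in delta storage
    memory_bytes: int     # memory used by delta storage
    log_bytes: int        # size of the delta log

# Hypothetical limits, chosen only for this example.
MAX_ROWS = 100_000
MAX_MEMORY_BYTES = 64 * 1024 * 1024
MAX_LOG_BYTES = 256 * 1024 * 1024

def should_merge(stats: DeltaStats, explicit_request: bool = False) -> bool:
    """True if any of the four conditions listed above holds."""
    return (
        explicit_request
        or stats.rows > MAX_ROWS
        or stats.memory_bytes > MAX_MEMORY_BYTES
        or stats.log_bytes > MAX_LOG_BYTES
    )
```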

  • 7/29/2019 HANA DB Column Store

    39/56

    Delta Merge

  • 7/29/2019 HANA DB Column Store

    40/56

    Data Compression

  • 7/29/2019 HANA DB Column Store

    41/56

    Data Compression Additional Compression

    Prefix Coding

    If the column starts with a long sequence of the same value V, the sequence is replaced by storing the value once, together with the number of occurrences.

    This makes sense if there is one predominant value in the column and the remaining values are mostly unique or have low redundancy.
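A minimal sketch of prefix coding on a plain Python list; the function names are illustrative.

```python
def prefix_encode(values):
    """Return (value, run_length, rest) for the leading run of equal values."""
    if not values:
        return None, 0, []
    first = values[0]
    n = 1
    while n < len(values) and values[n] == first:
        n += 1
    return first, n, values[n:]

def prefix_decode(first, n, rest):
    """Rebuild the original sequence."""
    return [first] * n + list(rest)

encoded = prefix_encode([0, 0, 0, 0, 7, 3, 9])
```

Only the leading run is collapsed; the remaining values are stored unchanged.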

  • 7/29/2019 HANA DB Column Store

    42/56

    Data Compression Additional Compression

    Run Length Encoding

    Run length encoding replaces sequences of the same value with a single instance of the value and its start position.

    This variant of run length encoding was chosen because it speeds up access compared to storing the number of occurrences with each value.
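A minimal sketch of this variant follows. Because the start positions are sorted, the run containing a given position can be found by binary search, whereas stored run lengths would first have to be summed. The function names are illustrative.

```python
import bisect

def rle_encode(values):
    """Return (run_values, run_starts): one entry per run of equal values."""
    run_values, run_starts = [], []
    for pos, v in enumerate(values):
        if not run_values or run_values[-1] != v:
            run_values.append(v)
            run_starts.append(pos)
    return run_values, run_starts

def rle_get(run_values, run_starts, pos):
    """Value at position pos, found by binary search over the start positions."""
    return run_values[bisect.bisect_right(run_starts, pos) - 1]

run_values, run_starts = rle_encode([4, 4, 4, 1, 1, 3, 3, 3, 3])
```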

  • 7/29/2019 HANA DB Column Store

    43/56

    Data Compression Additional Compression

    Cluster Encoding

    Cluster encoding partitions the sequence into N blocks of fixed size (1024 elements). If a cluster contains only occurrences of a single value, the cluster is replaced by a single occurrence of that value. A bit vector of length N indicates which clusters were replaced by a single value.
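A minimal sketch of cluster encoding. The block size is a parameter so that the example can use 4 instead of 1024, a Python list of booleans stands in for the bit vector, and an incomplete last block is left unchanged (an assumption made for simplicity).

```python
def cluster_encode(values, block_size=1024):
    """Return (bits, data): bits[i] is True if block i was replaced by one value."""
    bits, data = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        single = len(block) == block_size and len(set(block)) == 1
        bits.append(single)
        data.extend(block[:1] if single else block)
    return bits, data

def cluster_decode(bits, data, block_size=1024):
    """Rebuild the original sequence from the bit vector and the stored values."""
    out, pos = [], 0
    for single in bits:
        if single:
            out.extend([data[pos]] * block_size)
            pos += 1
        else:
            out.extend(data[pos:pos + block_size])
            pos += block_size
    return out

original = [5, 5, 5, 5, 1, 2, 3, 4, 9, 9, 9, 9, 7, 7]
bits, data = cluster_encode(original, block_size=4)
```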

  • 7/29/2019 HANA DB Column Store

    44/56

    Data Compression Additional Compression

    Sparse Encoding

    Sparse encoding removes the value V that appears most often. A bit vector indicates at which positions V was removed from the original sequence.
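A minimal sketch of sparse encoding, assuming a non-empty input; a list of booleans stands in for the bit vector and the function names are illustrative.

```python
from collections import Counter

def sparse_encode(values):
    """Return (most_common, removed_bits, remaining) for a non-empty sequence."""
    most_common = Counter(values).most_common(1)[0][0]
    removed = [v == most_common for v in values]
    remaining = [v for v in values if v != most_common]
    return most_common, removed, remaining

def sparse_decode(most_common, removed, remaining):
    """Put the removed value back at every position flagged in the bit vector."""
    rest = iter(remaining)
    return [most_common if bit else next(rest) for bit in removed]

v, removed, remaining = sparse_encode([0, 0, 3, 0, 5, 0, 0])
```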

  • 7/29/2019 HANA DB Column Store

    45/56

    Data Compression Additional Compression

    Indirect Encoding

    Indirect encoding is also based on partitioning into blocks of 1024 elements. If a block contains only a few distinct values, an additional dictionary is used to encode the values in that block.

    Here is the concept with a block size of 8 elements. The first and the third block consist of not more than 4 distinct values, so a dictionary with 4 entries and an encoding of values with 2 bits is possible.

    For the second block this kind of compression makes no sense. With 8 distinct values the dictionary alone would need the same space as the uncompressed sequence.

    The implementation also needs to store the information which blocks are encoded with an additional dictionary and the links to the additional dictionaries.
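The concept with a block size of 8 can be sketched as follows. The threshold `max_distinct`, the tagged-tuple block representation and the sample value IDs are assumptions made for the example.

```python
def indirect_encode(value_ids, block_size=8, max_distinct=4):
    """Encode each block either with a local dictionary or unchanged.

    Returns a list of blocks: ("dict", local_dictionary, codes) or ("raw", values).
    """
    blocks = []
    for start in range(0, len(value_ids), block_size):
        block = value_ids[start:start + block_size]
        local = sorted(set(block))
        if len(local) <= max_distinct:
            # With at most 4 distinct values, every code fits in 2 bits.
            codes = [local.index(v) for v in block]
            blocks.append(("dict", local, codes))
        else:
            blocks.append(("raw", block))
    return blocks

def indirect_decode(blocks):
    """Rebuild the original value ID sequence."""
    out = []
    for block in blocks:
        if block[0] == "dict":
            _, local, codes = block
            out.extend(local[c] for c in codes)
        else:
            out.extend(block[1])
    return out

ids = [17, 17, 42, 17, 99, 42, 17, 5,
       1, 2, 3, 4, 5, 6, 7, 8,
       300, 300, 301, 300, 300, 302, 301, 300]
blocks = indirect_encode(ids)
```

The tag on each block plays the role of the information about which blocks use an additional dictionary.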

  • 7/29/2019 HANA DB Column Store

    46/56

    Data Compression Additional Compression

    Indirect Encoding

  • 7/29/2019 HANA DB Column Store

    47/56

    Data Compression Additional Compression

    String Delta Compression

    The dictionary is stored as a sequence of blocks, each containing 16 string values that are compressed using delta compression.

    For each string value the following information is stored:

    The length of the prefix which this value has in common with its predecessor

    The number of remaining characters after the common prefix

    The remaining characters after the common prefix
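A minimal sketch of delta compression for one block of sorted strings. It assumes the first string of a block has no predecessor and is therefore stored in full; the sample city names are illustrative.

```python
import os

def delta_encode_block(strings):
    """Encode a sorted block as (prefix_len, remaining_len, remaining) triples."""
    encoded, prev = [], ""
    for s in strings:
        # os.path.commonprefix compares character by character.
        prefix_len = len(os.path.commonprefix([prev, s]))
        rest = s[prefix_len:]
        encoded.append((prefix_len, len(rest), rest))
        prev = s
    return encoded

def delta_decode_block(encoded):
    """Rebuild the strings by reusing the prefix of the previous string."""
    out, prev = [], ""
    for prefix_len, _, rest in encoded:
        prev = prev[:prefix_len] + rest
        out.append(prev)
    return out

block = ["Hamburg", "Hannover", "Hanoi"]
encoded_block = delta_encode_block(block)
```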

  • 7/29/2019 HANA DB Column Store

    48/56

    Data Compression Additional Compression

    String Delta Compression

  • 7/29/2019 HANA DB Column Store

    49/56

    Accessing Data in Column Store

    Search by Attribute Value

    To find all rows with a given attribute value (select * from Table where attribute = value), a reverse lookup is needed:

    A binary search is performed on the Dictionary.

    If the value exists in the Dictionary, the result of the reverse lookup is the value ID of the specified value.

    The value ID sequence is searched for all occurrences of the found value ID.
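These steps can be sketched on a small dictionary-encoded column. The city names and the value ID sequence are made-up sample data, and value IDs are taken to be positions in the sorted dictionary.

```python
import bisect

dictionary = ["Berlin", "London", "New York", "Paris"]   # sorted distinct values
value_ids = [2, 0, 2, 3, 1, 2]                            # one value ID per row

def search(value):
    """Return the Row IDs (1-based) of all rows holding `value`."""
    vid = bisect.bisect_left(dictionary, value)           # binary search (reverse lookup)
    if vid == len(dictionary) or dictionary[vid] != value:
        return []                                         # value is not in the dictionary
    # Scan the value ID sequence, comparing integers only.
    return [pos + 1 for pos, v in enumerate(value_ids) if v == vid]
```

If the value is missing from the dictionary, the column scan is skipped entirely.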

    Let's look at an example:

  • 7/29/2019 HANA DB Column Store

    50/56

    Accessing Data in Column Store

    Search by Attribute Value (no index)

  • 7/29/2019 HANA DB Column Store

    51/56

    Accessing Data in Column Store

    Search by Attribute Value With Index

    Normally, even full column scans can be executed with high performance. However, in cases where the performance of column scans is not sufficient, an index can be defined on the column. For each value, the index contains references to the rows that contain it.
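Such an index can be pictured as a map from value ID to Row IDs. The sketch below uses a Python dictionary of lists as a stand-in; the actual index layout is not described here, so this structure is an assumption.

```python
from collections import defaultdict

value_ids = [2, 0, 2, 3, 1, 2]          # value ID sequence of the column

# Inverted index: value ID -> Row IDs (1-based) that hold this value ID.
index = defaultdict(list)
for pos, vid in enumerate(value_ids):
    index[vid].append(pos + 1)

def rows_with(vid):
    """Look up the Row IDs directly instead of scanning the column."""
    return index.get(vid, [])
```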

  • 7/29/2019 HANA DB Column Store

    52/56

    Accessing Data in Column Store

    Access by Row ID

    After a Row ID was determined:

    The value ID is read from the value ID sequence by simply accessing the position that corresponds to the Row ID.

    Then the value ID is used to look up the corresponding value in the Dictionary.
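Both steps are plain positional lookups, as the following sketch shows. The dictionary and value ID sequence are made-up sample data, and Row IDs are assumed to start at 1.

```python
dictionary = ["Berlin", "London", "New York", "Paris"]   # sorted distinct values
value_ids = [2, 0, 2, 3, 1, 2]                            # one value ID per row

def value_at(row_id):
    """Row ID (1-based) -> value ID -> dictionary value."""
    vid = value_ids[row_id - 1]     # direct positional access, no search
    return dictionary[vid]          # dictionary lookup by value ID
```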

    Let's look at an example:

  • 7/29/2019 HANA DB Column Store

    53/56

    Accessing Data in Column Store

    Access by Row ID

  • 7/29/2019 HANA DB Column Store

    54/56

    Column Store Join Operation

    Can calculate Inner joins, Right Outer joins, Left Outer joins, and Full Outer joins.

    Limited to Equi-Joins only.
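One way an equi-join can work on value IDs is sketched below. This is an assumption-laden illustration, not a description of the engine: the two join columns have separate sorted dictionaries, a translation table maps the value IDs of one dictionary to the other, and the join itself then compares integers only. All names and data are hypothetical.

```python
import bisect

# Two dictionary-encoded join columns (made-up data).
dict_a, ids_a = ["DE", "JP", "US"], [2, 2, 1, 0]
dict_b, ids_b = ["JP", "UK", "US"], [0, 2, 2]

def translate(src, dst):
    """Map each value ID of src to the value ID of the same value in dst, or None."""
    out = []
    for value in src:
        pos = bisect.bisect_left(dst, value)
        out.append(pos if pos < len(dst) and dst[pos] == value else None)
    return out

def inner_join(ids_a, ids_b, a_to_b):
    """Return (row_a, row_b) pairs with 1-based Row IDs."""
    rows_b = {}
    for pos, vid in enumerate(ids_b):
        rows_b.setdefault(vid, []).append(pos + 1)
    return [(ra + 1, rb)
            for ra, vid in enumerate(ids_a)
            for rb in rows_b.get(a_to_b[vid], [])]

a_to_b = translate(dict_a, dict_b)
pairs = inner_join(ids_a, ids_b, a_to_b)
```

Equality is the only comparison that survives the translation between dictionaries, which fits the restriction to equi-joins.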

    Following is a Join example (using Value ID):

  • 7/29/2019 HANA DB Column Store

    55/56

    Column Store Join Operation

  • 7/29/2019 HANA DB Column Store

    56/56

    Thank You!