Peter Boncz (CWI); Sjoerd Mullender: update actions; Jens Teubner: XQUF parsing; Niels Nes: logging

Page 1

Updates in MonetDB/XQuery
Database Techniek: XML Lecture (Part 2)

Peter Boncz (CWI)

Sjoerd Mullender: update actions
Jens Teubner: XQUF parsing
Niels Nes: logging
Stefan Manegold: the rest

Everything you always wanted to know about updates in MonetDB/XQuery but were afraid to ask

Page 2

Overview

• XQuery Update Facility (XQUF)
  • semantics & the update tape
• Updatable XML storage in BATs
  • maintaining order in an array without O(N) cost
• Snapshot Isolation
  • why we want it, how we got it
• Concurrency Control
  • optimistic, with “abort convoys”
• Durability
  • physical logging
• Conclusion & Future Challenges

Page 3

XQuery Update Facility (XQUF)

January 2006, first proposal

Internal primitives:
• upd:insertBefore
• upd:insertAfter
• upd:insertInto
• upd:insertIntoAsLast
• upd:insertAttributes
• upd:delete
• upd:replaceValue
• upd:rename

Pending update list concept

upd:applyUpdates

Page 4

Example

insert
  <item id="{id}">
    <location>Brazil</location>
    <quantity>200</quantity>
    <name>XML in a nutshell</name>
    <payment>Credit Card, Personal check</payment>
    <shipping>Will ship internationally</shipping>
    <incategory category="category1"/>
  </item>
as last into
  fn:doc("xmark.xml")/site/regions/samerica


Page 6

Semantics

let $root := doc("foo.xml")
for $i in (1,2,3)
return
  (do insert <x>{$i}</x> as first into $root,
   do insert <y>{$i}</y> as first into $root)

We need to
• define an execution order, and
• enforce it
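Why the order has to be pinned down can be seen in a small Python toy model. It is only a sketch: the list of strings stands in for the children of $root, and `insert_first` is a hypothetical helper, not the XQUF semantics or MonetDB code. The six inserts from the loop are applied in every possible order, and each order leaves a different child sequence.

```python
from itertools import permutations

def insert_first(children, name):
    """Model 'insert ... as first into': the new child goes to the front."""
    return [name] + children

# The six pending inserts produced by the loop, in query order:
# x1, y1, x2, y2, x3, y3
pending = [f"{tag}{i}" for i in (1, 2, 3) for tag in ("x", "y")]

def apply_in_order(updates):
    """Apply the inserts one after another to an initially empty child list."""
    children = []
    for u in updates:
        children = insert_first(children, u)
    return children

# Applying in query order: every insert lands in front of the earlier ones.
query_order_result = apply_in_order(pending)

# Each of the 6! application orders produces its own child sequence.
distinct_results = {tuple(apply_in_order(p)) for p in permutations(pending)}
```

In this model the query order ends with y3 as the first child, and no two application orders agree, which is why an engine has to both choose one order and make sure it is the one actually executed.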

Page 7

The Update Tape

update = sequence (int, node, node/str, node/str)

fn:delete()        (DELETE, node, nil, nil)
fn:insert_*()      (INSERT, tgt-node, tgt-level, expr-node)
fn:set-attr()      (ATTR, node, qn, val)
fn:unset-attr()    (ATTR, node, qn, nil)
fn:set-text()      (TEXT, node, val, nil)
fn:set-pi()        (PI, node, ins-val, arg-val)
fn:set-comment()   (COMMENT, node, val, nil)

( element construction ), that combines updates, will enforce the correct order of the update tape.

The Pathfinder compiler automatically inserts a call to

fn:update(item*)

on the result of all update queries.
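The tape idea can be sketched in Python as a list of 4-tuples that is replayed strictly in sequence order. This is an illustration under assumptions: the `Node` class, the string constants and this `update` function are invented for the sketch and only cover the DELETE, ATTR and TEXT entries; they are not the MonetDB/XQuery implementation.

```python
from dataclasses import dataclass, field

DELETE, ATTR, TEXT = "DELETE", "ATTR", "TEXT"  # hypothetical kind codes

@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)
    parent: "Node | None" = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def update(tape):
    """Replay the tape in order; each entry is (kind, node, arg1, arg2)."""
    for kind, node, arg1, arg2 in tape:
        if kind == DELETE:
            node.parent.children.remove(node)
            node.parent = None
        elif kind == ATTR:
            if arg2 is None:          # (ATTR, node, qn, nil) unsets the attribute
                node.attrs.pop(arg1, None)
            else:                     # (ATTR, node, qn, val) sets it
                node.attrs[arg1] = arg2
        elif kind == TEXT:
            node.text = arg1
        else:
            raise ValueError(f"unsupported tape entry: {kind}")

root = Node("item")
name = root.add(Node("name", text="XML in a nutshell"))
qty = root.add(Node("quantity", text="200"))

tape = [
    (TEXT, qty, "150", None),
    (ATTR, root, "id", "item42"),
    (ATTR, root, "id", None),      # later entry wins: the attribute is gone again
    (DELETE, name, None, None),
]
update(tape)
```

Because the set and unset of `id` are replayed in tape order, the attribute ends up absent; swapping the two entries would leave it set, which is the kind of difference the ordered tape is meant to rule out.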

Page 8

XPath Accelerator [SIGMOD02]

Node-based relational encoding of XQuery's data model

<a>
  <b> <c> <d/> <e/> </c> </b>
  <f> <g/> <h> <i/> <j/> </h> </f>
</a>

node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6

[Figure: pre/post plane, split relative to a context node into four regions: descendant, ancestor, following, preceding]
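The pre/post table can be reproduced with a short Python sketch: one depth-first walk assigns pre on entry and post on exit, and the four regions of the plane follow from comparing the two ranks. The function names `number` and `axis` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><c><d/><e/></c></b><f><g/><h><i/><j/></h></f></a>"

def number(root):
    """Return {tag: (pre, post)} from a single depth-first traversal."""
    table = {}
    pre = post = 0

    def visit(elem):
        nonlocal pre, post
        my_pre = pre            # pre rank: order of the opening tag
        pre += 1
        for child in elem:
            visit(child)
        table[elem.tag] = (my_pre, post)   # post rank: order of the closing tag
        post += 1

    visit(root)
    return table

table = number(ET.fromstring(DOC))

def axis(ctx, node):
    """Classify `node` relative to context node `ctx` by its pre/post region."""
    (cpre, cpost), (npre, npost) = table[ctx], table[node]
    if npre > cpre and npost < cpost:
        return "descendant"
    if npre < cpre and npost > cpost:
        return "ancestor"
    if npre > cpre and npost > cpost:
        return "following"
    if npre < cpre and npost < cpost:
        return "preceding"
    return "self"
```

With f as context node, h falls in the descendant region, a in the ancestor region and c in the preceding region, matching the document structure.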

Page 9

XML Storage Revisited

node  pre  post  size  level
a     0    9     9     0
b     1    3     3     1
c     2    2     2     2
d     3    0     0     3
e     4    1     0     3
f     5    8     4     1
g     6    4     0     2
h     7    7     2     2
i     8    5     0     3
j     9    6     0     3

post = pre + size - level
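The identity can be checked row by row against the table, and the size column also gives a direct range test for descendants. The following Python sketch uses the values from the slide; the helper name `descendants` is an assumption made for the example.

```python
# (pre, size, level) per node, copied from the table
rows = {
    "a": (0, 9, 0), "b": (1, 3, 1), "c": (2, 2, 2), "d": (3, 0, 3), "e": (4, 0, 3),
    "f": (5, 4, 1), "g": (6, 0, 2), "h": (7, 2, 2), "i": (8, 0, 3), "j": (9, 0, 3),
}
# post ranks copied from the pre/post table, used only for checking
post = {"a": 9, "b": 3, "c": 2, "d": 0, "e": 1, "f": 8, "g": 4, "h": 7, "i": 5, "j": 6}

# post = pre + size - level
derived_post = {n: pre + size - level for n, (pre, size, level) in rows.items()}

def descendants(ctx):
    """Descendants of ctx are exactly the nodes with pre in (pre(ctx), pre(ctx) + size(ctx)]."""
    pre, size, _ = rows[ctx]
    return [n for n, (p, _, _) in rows.items() if pre < p <= pre + size]
```

For example, f has pre 5 and size 4, so its descendants are the nodes with pre 6 to 9, a contiguous range that can be scanned positionally.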

Page 10

Updates: Mission Impossible?

INSERT SUBTREE I:
• ancestor region: SIZE + |I|
• following region: PRE + |I|

size(following) = O(N) killer (?)

[Figure: the pre/post plane of the example document (descendant, ancestor, following, preceding regions), with the inserted subtree and the two affected regions marked]

Page 11

XML Storage Revisited

Allow holes. Define logical pages.

Original table:

pre  size  level
0    9     0
1    3     1
2    2     2
3    0     3
4    0     3
5    4     1
6    0     2
7    2     2
8    0     3
9    0     3

With holes (logical view):

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

Stored table:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9

post = pre + size - level

Page 12

XML Storage Revisited

Allow holes. Define logical pages.

Stored table, with the second and third page stored in swapped order:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

page map:

0    0
1    2
2    1

rid = pre.swizzle( )
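A minimal Python sketch of the swizzle step follows. It assumes a page size of 4 tuples (inferred from the 12-row example) and reads the page map as logical page to physical page; in this example the map is its own inverse, so the opposite reading gives the same numbers. The function names are hypothetical.

```python
PAGE_SIZE = 4  # tuples per page; assumed from the 12-row example

# page_map[logical page] = physical page, as in the slide: 0->0, 1->2, 2->1
page_map = [0, 2, 1]

def swizzle(pre, page_map=page_map):
    """Translate a logical position (pre) into a physical position (rid)."""
    return page_map[pre // PAGE_SIZE] * PAGE_SIZE + pre % PAGE_SIZE

# nid column of the stored table (None marks an unused tuple)
rid_nid = ["N0", "N1", None, None, "N6", "N7", "N8", "N9", "N2", "N3", "N4", "N5"]

# Reading the table in pre order restores document order.
document_order = [rid_nid[swizzle(pre)] for pre in range(12)]

def insert_page_after(logical_page, page_map, rid_nid):
    """Make room in the middle of the document: append a page of holes
    physically, then splice it into the logical order via the page map."""
    new_physical = len(rid_nid) // PAGE_SIZE
    rid_nid.extend([None] * PAGE_SIZE)
    page_map.insert(logical_page + 1, new_physical)
```

Inserting a page touches only the page map and the end of the stored table; no existing tuple moves, so the cost depends on the number of pages rather than on the number of nodes that follow the insertion point.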

Page 13

XML Storage Revisited

Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column

MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join

Not stored, so no need to update it either

Page 14

XML Storage Revisited

Updatable document collection:
- pf:add-doc(URI, docname, perc>0)
- pf:add-doc(URI, docname, collname, perc>0)

pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid)

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

Read-only document collection:
- pf:add-doc(URI, docname, 0)
- pf:add-doc(URI, docname, collname, 0)

NID = RID = PRE, so pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid) is FREE!!

pre  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9

Page 15

Snapshot Isolation

Versus 2-phase locking (2PL) == full serializability

Why not 2PL for XML:
• lock semantics much more complex than in relational case (order matters!!)
• node-level locking in staircase join?? (now 10 cycles/node…)

Page 16

Snapshot Isolation

Page 17

Snapshot Isolation

Versus 2-phase locking (2PL) == full serializability

Why not 2PL for XML:
• lock semantics much more complex than in relational case (order matters!!)
• node-level locking in staircase join?? (now 10 cycles/node…)

Why Snapshot Isolation:
• great for read-queries, great for ll_scj (runs unmodified)
• quite strong. Better than repeatable read. Oracle/Postgres do it.

Problem with Snapshot Isolation:
• in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))

Page 18

Snapshot Isolation

Three identical views of the same table, one per query (Read Query 1, Read Query 2, Update Query):

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9

Isolation By Shadow Paging (copy-on-write mmap)
• rid/pre delete/insert + attr-replace
• Touch one byte per physical page: *addr = *addr;
• MMU traps, OS replaces page by a copy
• we would like to replace the master copy once, not all client copies
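The copy-on-write behaviour can be tried out with Python's standard `mmap` module, whose `ACCESS_COPY` mode gives a private mapping: writes go to private copies of the touched pages and never reach the file. This is only a sketch of the OS mechanism with a throwaway temporary file; it is not how MonetDB maps its BATs.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# A throwaway "master" file of two pages filled with the byte 'A'.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * PAGE * 2)
    path = f.name

master = open(path, "r+b")
# Private copy-on-write view for the updating query.
snapshot = mmap.mmap(master.fileno(), 0, access=mmap.ACCESS_COPY)
# Plain read-only view, standing in for a concurrent reader of the master.
reader = mmap.mmap(master.fileno(), 0, access=mmap.ACCESS_READ)

# "Touch one byte per physical page": writing a byte back to itself is
# enough to make the OS give this mapping its own copy of page 0.
snapshot[0] = snapshot[0]

# Later writes only change the private copy.
snapshot[1:4] = b"upd"

private_view = bytes(snapshot[:4])
shared_view = bytes(reader[:4])

snapshot.close()
reader.close()
master.close()
os.remove(path)
```

After the write, the updating view reads back its own change while the reader still sees the original bytes, which is the isolation property the slide relies on.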

Page 19

Snapshot Isolation: Isolate-page


Page 21

Snapshot Isolation: Master-update

Page 22

Durability

Masters become dirty
• no time to flush them during query
• log all changes to a WAL
  = log all tuples that changed = entire pages

Recovery:
• after a crash, we do not know whether dirty pages got saved
• solution: overwrite tables with values from the WAL

Checkpointing Thread:
• every 5 minutes, if ‘many’ changes occurred, checkpoint
• memory mapped BATs are sync()-ed; only dirty pages get written
• checkpoint locks collection, halts query processing
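Physical logging of whole pages makes recovery a plain overwrite, which the following Python sketch illustrates. The in-memory list used as a WAL, the JSON record layout, the page size of 4 and the function names are all assumptions for the example, not the MonetDB log format.

```python
import json

PAGE_SIZE = 4  # tuples per page; an assumption for this sketch

def log_page(wal, table, page_no):
    """Append the full after-image of one page to the WAL."""
    start = page_no * PAGE_SIZE
    record = {"page": page_no, "tuples": table[start:start + PAGE_SIZE]}
    wal.append(json.dumps(record))

def recover(table_on_disk, wal):
    """Replay the WAL: overwrite each logged page, whatever the disk holds."""
    for line in wal:
        record = json.loads(line)
        start = record["page"] * PAGE_SIZE
        table_on_disk[start:start + PAGE_SIZE] = record["tuples"]
    return table_on_disk

# size column of the example table, as of the last checkpoint
disk = [11, 5, -1, 0, 0, 2, 0, 0]
master = list(disk)
wal = []

# An update dirties page 0 of the in-memory master; the page image is logged.
master[0], master[1] = 12, 6
log_page(wal, master, 0)

# After a crash we do not know whether the dirty page reached the disk,
# so both possible disk states are recovered the same way.
recovered_from_old = recover(list(disk), wal)
recovered_from_new = recover(list(master), wal)
```

Because a record holds the complete page image, replaying it is idempotent: it gives the same result whether or not the dirty page had already been written before the crash.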


Page 24

The Update Sequence

Execute Query
• build update tape
• queries get isolated copies of a document (VM copy-on-write mmap)

Prepare Intensional Updates
• execute update tape
• does not modify masters (except append-only tables)

Commit Phase (locked phase, per doc-collection)
• precommit
  • detect conflicts (not the size-ancestors)
• write WAL (globally locked)
  • read master-size-ancestors, use delta, log result
• update master tables
  • isolate first! Only then update masters.
• update index structures
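The treatment of ancestor sizes ("read master-size-ancestors, use delta, log result") can be sketched in Python as follows. The idea shown is that a transaction remembers only how much each ancestor's size changes and the commit step applies that delta to the current master value. The `Txn` class, the dictionaries and the tuple-based log are hypothetical simplifications.

```python
# size column of the master table for two ancestors of the example document
master_size = {"N0": 11, "N1": 5}
wal = []

class Txn:
    """Collects size deltas instead of absolute new sizes."""
    def __init__(self):
        self.size_delta = {}

    def insert_under(self, ancestors, n_new):
        """Record that n_new nodes were inserted below all given ancestors."""
        for nid in ancestors:
            self.size_delta[nid] = self.size_delta.get(nid, 0) + n_new

def commit(txn, master, wal):
    """Commit step for the size column, assumed to run under the commit lock."""
    for nid, delta in txn.size_delta.items():
        new_size = master[nid] + delta   # read the current master value, use the delta
        wal.append((nid, new_size))      # log the resulting value first
        master[nid] = new_size           # then update the master

# Two transactions started from the same snapshot (N0 = 11, N1 = 5).
t1 = Txn()
t1.insert_under(["N1", "N0"], 2)   # 2 new nodes somewhere below N1
t2 = Txn()
t2.insert_under(["N0"], 3)         # 3 new nodes below N0, outside N1

commit(t1, master_size, wal)
commit(t2, master_size, wal)
```

Both transactions change the size of N0, yet neither overwrites the other's change, since each adds its own delta to whatever the master holds at commit time. This is one way to read why the size-ancestors are left out of conflict detection.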

Page 25

Many more Issues Solved

Conflicting Updates
• detect conflicting queries:
  • look at RID page numbers and attr-IDs
• reacting to conflicts:
  • abort query + automatic restart
  • run CONVOY of 5 next update queries serially
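Optimistic conflict detection on page numbers and attribute IDs can be sketched in Python as a set-intersection test at commit time. The version counter, the `history` list and the function names are assumptions for the illustration; the convoy mechanism is not modelled.

```python
def conflicts(write_set, later_write_sets):
    """True if this write set overlaps any write set committed after our snapshot."""
    return any(write_set & other for other in later_write_sets)

def try_commit(start_version, write_set, history):
    """history is a list of (version, write_set) in commit order.
    Returns True on commit, False if the query must abort and restart."""
    later = [ws for version, ws in history if version > start_version]
    if conflicts(write_set, later):
        return False
    history.append((len(history) + 1, write_set))
    return True

history = []

# Three update queries that all started on the snapshot at version 0.
q1 = frozenset({("page", 3)})
q2 = frozenset({("page", 3), ("attr", 17)})
q3 = frozenset({("page", 5)})

ok1 = try_commit(0, q1, history)   # nothing committed since version 0
ok2 = try_commit(0, q2, history)   # overlaps q1 on RID page 3
ok3 = try_commit(0, q3, history)   # touches a different page

# Automatic restart: q2 runs again on a fresh snapshot (version 2).
ok2_retry = try_commit(2, q2, history)
```

Working at page granularity keeps the check cheap, at the price of false conflicts when two queries touch different nodes on the same page.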

Indexing and Updates
• Runtime QN → NID mapping, with hash table
  • read-only: not a hash, but keep sorted & persistent
  • keep INS + DEL deltas to commit without changing the hash table
• Runtime NID → ATTR hash table
  • isolation loses you MonetDB dynamic hash table reuse
  • share an old copy, exploit append-mostly

ACID properties on the Meta Level
• Shredding a new doc into a collection ↔ Query
• Shredding a new doc into a collection ↔ Update
• Using a collection ↔ Deleting/adding documents
• Meta Querying ↔ Deleting/adding documents

Concurrency
• Updates ↔ Checkpoint
• Shredding ↔ Query
• Shredding ↔ Updates

Allocating New Pages and NIDs
• Offload shredding interference with freelist
• Unlocked access to private pages

Page 26

Snapshot Isolation

Versus 2-phase locking (2PL) == full serializability

Why not 2PL for XML:
• lock semantics much more complex than in relational case (order matters!!)
• node-level locking in staircase join?? (now 10 cycles/node…)

Why Snapshot Isolation:
• great for read-queries, great for ll_scj (runs unmodified)
• quite strong. Better than repeatable read. Oracle/Postgres do it.

Problem with Snapshot Isolation:
• in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))

2PL (++): 375 transactions / 5 minutes ≈ 1.25 transactions/sec

Page 27

Conclusions

It works! Reasonable/good performance!
• transaction mgmt as a module extension outside a kernel works
• identified VM primitives that databases really need

Future work:
• Test on XML update benchmark TPOX (DB2: 700 trans/second)
• Packed Memory Arrays: alternative for page remapping?
  • page remapping is technically O(N)
• Engineering:
  • support for value-indexing (does PF support it already?)
  • asynchronous WAL writing to boost throughput
  • port MIL to C primitives; port C primitives to Monet5