Databases (Gegevensbanken), last lecture: XML...

  • Posted: 17-Oct-2014
  • Category: Education

Description

Last lecture of the 2010 course on databases; it focuses on XML.

Page 1

Databases (Gegevensbanken): “Last Lecture”

Prof. Erik Duval, 2009-2010

Sunday 30 May 2010

Page 2

http://www.slideshare.net/erik.duval

Page 3

• NoSQL (thanks to Steven Noels)

• XML (thanks to Prof. Olivié!)

• About the exam...

Page 4

http://en.wikipedia.org/wiki/Extensible_Markup_Language

Page 5

http://www.itjobboard.be/ICT-banen/xml/Belgie/alle/0/relevantie/nl/

Page 6

http://www.khbo.be/12385

Page 7

http://www.w3.org/XML

Page 8

http://www.w3c.it/talks/2005/openCulture/slide7-0.html

Page 9

http://en.wikipedia.org/wiki/List_of_XML_markup_languages

Page 10

XML is not ...

• Extension of HTML

  • XHTML is XML-compliant, and extensible

• Just for Web pages

  • Useful whenever data are stored or exchanged

• Concerned with semantics

  • XML does not define semantics, just syntax

• Innovative new technology

  • Standard, building on existing technology

• Only a hype

  • Though also partly hype

Page 11

XML is ...

• Endorsed by W3C and major companies

• Extensible

  • No tag name limitations

  • No language limitations

• Readable by humans (software developers)

  • Can be processed with basic text tools

• Open standard

  • No vendor lock-in (in theory...)

• Easy to implement

  • Powerful, cheap (free), off-the-shelf XML tools

Page 12

• 1969: GML at IBM, the precursor of SGML (Standard Generalized Markup Language)

  • Meta-language: describes other languages

  • Powerful, but rather complicated

  • 1986: SGML becomes an ISO standard

• 1992: HTML (HyperText Markup Language)

  • Based on SGML

  • Simple, but limited

• 1996: Start of the design of XML

  • By the World Wide Web Consortium (W3C)

• 1998: Publication of XML 1.0

Page 13

Design Goals

• Easy to use over the Internet

• Power of SGML

• Simplicity of HTML

• Human-legible

• Easy to create

• Compactness is not an issue

• “The ASCII of the Web”

Page 14

XML Basics

<Person>
  <Name>
    <First>Thomas</First>
    <Last>Atkinson</Last>
  </Name>
  <Age>30</Age>
</Person>

• Self-defined, meaningful tags

• Separate data and its representation
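A minimal sketch of what self-defined tags and the separation of data and representation buy a program, assuming Java and the JDK's built-in DOM parser (JAXP); the class and method names are hypothetical, not part of the course material. The program looks elements up by their meaningful names and decides on its own how to present them:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PersonReader {
    // The <Person> document from the slide, on one line.
    public static final String XML =
        "<Person><Name><First>Thomas</First><Last>Atkinson</Last></Name><Age>30</Age></Person>";

    /** Returns the text content of the first element with the given tag name. */
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** Parses the data; the presentation is decided here, not in the XML. */
    public static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // All XML content is text; treating Age as a number is the program's choice.
            int age = Integer.parseInt(text(doc, "Age"));
            return text(doc, "First") + " " + text(doc, "Last") + " (" + age + ")";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(XML)); // Thomas Atkinson (30)
    }
}
```

Another program could render the same document as a table row or an HTML page without the data changing.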

Page 15

• Language for defining syntax

• Records and fields have explicit boundaries

• Parseable without knowing the structure (self-descriptive)

• Unicode support (UTF-8, UTF-16, ...)

• Web-aware

  • DTD, ENTITY and Schema can be loaded through a URL

• Strictly parsed: no ambiguity (case sensitive!)

• Extensible: namespaces

Page 16

<?xml version="1.0" encoding="UTF-8"?> <!-- XML declaration: XML follows -->
<!DOCTYPE addressbook SYSTEM "http://www/~koenh/ddml/addressbook.dtd"> <!-- document type declaration, pointing to an external DTD -->
<addressbook> <!-- root element -->
  <person first-name="John" family-name="Doe" employee-number="1234">
    <contact-info>
      <email address="[email protected]"/>
    </contact-info>
    <address street="Celestijnenlaan" number="200A"/>
  </person>
</addressbook>

Page 17

• Cf. HTML markup tags

• Major differences:

  • Case sensitive

  • Proper nesting: no <A> … <B> … </A> … </B>

  • Unicode instead of ASCII

<H1 align="center"> a Heading </H1>

(Figure labels: <H1 align="center"> is the opening tag, align="center" an attribute, “a Heading” the content and </H1> the closing tag; together they form one element.)

Page 25

Vocabularies

• Agreed-upon XML tag sets for a specific domain

• Examples

  • Chemical Markup Language (CML)

  • Business: ebXML, RosettaNet, BizTalk

  • Mathematics: MathML

  • Multimedia: Synchronized Multimedia Integration Language (SMIL)

  • Etc.

Page 26

• Well-formed: follows the XML syntax

  • Proper tag and attribute names

  • Tags properly closed

  • Attributes and text between tags do not contain ‘<’ (escape it as &lt;)

• Valid: well-formed and conforms to a vocabulary

  • All elements and their attributes are declared in the DTD

  • Attribute values follow the DTD type declarations

    • CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated

  • Nesting and sequencing of elements follow the DTD
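To make the well-formed side of this distinction concrete, here is a sketch assuming Java and the JDK's JAXP DOM parser; `WellFormed` and `isWellFormed` are hypothetical names. A non-validating parser consults no DTD at all: it accepts any document that follows the XML syntax and rejects improper nesting, a case mismatch between tags, or an unescaped `<`:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormed {
    /** Returns true if xml is well-formed; no DTD or schema is consulted. */
    public static boolean isWellFormed(String xml) {
        try {
            // JAXP builders are non-validating by default.
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            b.setErrorHandler(new DefaultHandler());
            b.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // syntax error: not well-formed
        } catch (Exception e) {
            throw new RuntimeException(e); // configuration or I/O problem, not a verdict
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<a><b>text</b></a>",   // well-formed
            "<A><B></A></B>",       // improper nesting
            "<H1>a Heading</h1>",   // case mismatch: H1 vs h1
            "<p>1 < 2</p>",         // unescaped '<' in text
            "<p>1 &lt; 2</p>"       // escaped: fine
        };
        for (String d : docs) System.out.println(isWellFormed(d) + "  " + d);
    }
}
```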

Page 27

Elements

• XML’s container for

  • Attributes

  • Character data

  • Other elements (“child” elements)

• Delimited by opening and closing tags

  • Non-empty element: <name>..</name>

  • Empty element: <name/>

• Form a simple hierarchical tree

  • Root = “document element”

Page 28

Attributes and Strings

• Attributes

  • Name-value pairs: name="value"

  • Only strings as values!

• Strings

  • Enclosed in '...' or "..."; the delimiting quote inside the value must be written as &apos; or &quot;

• Character data

  • Any text that is not markup

  • ‘&’ and ‘<’ start markup → write them as &amp; and &lt;; ‘>’ is conventionally written as &gt; (strictly required only in the sequence “]]>”)
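A small sketch of these escaping rules, assuming Java; `XmlEscape.escape` is a hypothetical helper, not a library call. It escapes all five predefined entities so the result is safe both as character data and inside a single- or double-quoted attribute value. Working in a single pass over the characters means the `&` of an entity it has just produced is never escaped a second time, a classic bug when chained `replace` calls run in the wrong order:

```java
public class XmlEscape {
    /** Escapes text for XML character data or a quoted attribute value. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break; // optional except in "]]>", but harmless
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("AT&T <\"R&D\">")); // AT&amp;T &lt;&quot;R&amp;D&quot;&gt;
    }
}
```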

Page 29

Document structure

• Prolog (optional)

  • XML declaration: <?xml version="1.0" encoding="UTF-8"?>

    • version="number" (compulsory)

    • encoding="character encoding" (optional)

• Document type declaration: <!DOCTYPE document_element ... >

• Body: the document element

Page 30

Another example

<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01" use="personal">
    <Owners>
      <Person ID="1258-a8d72-98"> <Name>John Smith</Name> </Person>
      <Person ID="5842-df5ef-e9"> <Name>Claudia Scott</Name> </Person>
    </Owners>
    <CreditCards> <CreditCard number="12345"/> </CreditCards>
    <Balance Currency="EUR">50000</Balance>
  </Account>
  ...
</BankAccounts>

Page 31

Document Type Definition

<!ELEMENT address EMPTY> <!-- no content, used for attributes only -->
<!ATTLIST address
    city   CDATA   #REQUIRED
    state  NMTOKEN #REQUIRED
    number CDATA   #REQUIRED
    street CDATA   #REQUIRED>
<!-- CDATA: character data, any string; #REQUIRED: a value for that attribute must be present;
     NMTOKEN: name token, only letters, digits, '.', '-', '_' and ':' -->
<!ELEMENT addressbook (person+)> <!-- + : 1 or more -->
<!ELEMENT contact-info (home-phone|mobile-phone|email)*> <!-- | : choice; * : 0 or more -->

Page 32

Document Type Definition

<!ELEMENT email EMPTY>
<!ATTLIST email address CDATA #REQUIRED>
<!ELEMENT home-phone EMPTY>
<!ATTLIST home-phone number CDATA #REQUIRED>
<!ELEMENT job-info EMPTY>
<!ATTLIST job-info
    is-manager      (yes|no)            'no'
    emp-type        (FullTime|PartTime) 'FullTime'
    job-description CDATA               #REQUIRED>
<!-- 'no' and 'FullTime' are default values -->
<!ELEMENT misc-info (#PCDATA)> <!-- Parsed Character Data: cannot contain subelements -->
<!ELEMENT mobile-phone EMPTY>
<!ATTLIST mobile-phone number CDATA #REQUIRED>

Page 33

Document Type Definition

<!ELEMENT manager EMPTY>
<!ATTLIST manager empnumber IDREF #REQUIRED> <!-- reference to the employee-number of a person -->
<!ELEMENT person (contact-info, address, job-info?, manager?, misc-info?)> <!-- , : sequence; ? : zero or one -->
<!ATTLIST person
    first-name      CDATA #REQUIRED
    middle-initial  CDATA #IMPLIED
    employee-number ID    #REQUIRED
    family-name     CDATA #REQUIRED>
<!-- #IMPLIED: can, but need not be provided;
     ID: can be referred to by the empnumber of a manager (ID values must be XML names, so they cannot start with a digit) -->
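A sketch of DTD validation, assuming Java's JAXP parser in validating mode and a trimmed-down version of the address-book DTD above, declared as an internal subset (the `DtdCheck` class, its helpers and the sample people are hypothetical). It exercises the ID/IDREF rule: a `manager` whose `empnumber` matches no `employee-number` makes the document invalid, even though it is still well-formed. Validity errors are only reported to the error handler, so the handler rethrows them:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {
    // Trimmed-down address-book DTD, declared as an internal subset of the DOCTYPE.
    private static final String DTD =
        "<!DOCTYPE addressbook [\n"
      + "<!ELEMENT addressbook (person+)>\n"
      + "<!ELEMENT person (contact-info, address, manager?)>\n"
      + "<!ATTLIST person first-name CDATA #REQUIRED family-name CDATA #REQUIRED\n"
      + "                 employee-number ID #REQUIRED>\n"
      + "<!ELEMENT contact-info (email)*>\n"
      + "<!ELEMENT email EMPTY>\n"
      + "<!ATTLIST email address CDATA #REQUIRED>\n"
      + "<!ELEMENT address EMPTY>\n"
      + "<!ATTLIST address street CDATA #REQUIRED number CDATA #REQUIRED>\n"
      + "<!ELEMENT manager EMPTY>\n"
      + "<!ATTLIST manager empnumber IDREF #REQUIRED>\n"
      + "]>\n";

    /** Hypothetical two-person address book; the second person's manager is managerRef. */
    public static String addressbook(String managerRef) {
        return DTD + "<addressbook>"
            + "<person first-name='John' family-name='Doe' employee-number='e1234'>"
            + "<contact-info/><address street='Celestijnenlaan' number='200A'/></person>"
            + "<person first-name='Claudia' family-name='Scott' employee-number='e5842'>"
            + "<contact-info/><address street='Celestijnenlaan' number='200A'/>"
            + "<manager empnumber='" + managerRef + "'/></person>"
            + "</addressbook>";
    }

    /** True if xml is well-formed AND valid against the DTD in its DOCTYPE. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // check the document against its DTD
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new DefaultHandler() {
                // Validity errors arrive here; rethrow so that parsing stops.
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            b.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed (fatal error) or not valid (rethrown error)
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(addressbook("e1234"))); // manager refers to John's ID
        System.out.println(isValid(addressbook("e9999"))); // dangling IDREF
    }
}
```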

Page 34

namespaces: problem

<widget type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <head><title>Gadget</title></head>
    <body><h1>Gadget</h1>
      A gadget contains a big gizmo
    </body>
  </info>
</widget>

Name collision! The widget's <head> and the HTML <head> share a name but mean different things.

Page 35

namespaces: approach

• A collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names

• xmlns:prefix="URI"

  • URI used only as an identifier

    • does not need to point to anything

  • applies to the element carrying the declaration and to all nested elements and attributes (a default namespace, xmlns="URI", does not apply to unprefixed attributes)

Page 36

namespaces: example

<widget xmlns="http://www.widget.org"
        xmlns:xhtml="http://www.w3.org/TR/xhtml1"
        type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>
    <xhtml:body><xhtml:h1>Gadget</xhtml:h1>A gadget contains...</xhtml:body>
  </info>
</widget>
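A sketch of how a namespace-aware parser tells the two `head` elements apart, assuming Java and JAXP (class and method names hypothetical). Each element is identified by the pair (namespace URI, local name), whatever prefix the document happens to use. Namespace awareness is off by default in JAXP's `DocumentBuilderFactory`, so it has to be switched on:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NamespaceDemo {
    public static final String WIDGET_NS = "http://www.widget.org";
    public static final String XHTML_NS = "http://www.w3.org/TR/xhtml1";

    // The widget document from the slide, with single-quoted attributes.
    public static final String XML =
        "<widget xmlns='" + WIDGET_NS + "' xmlns:xhtml='" + XHTML_NS + "' type='gadget'>"
      + "<head size='medium'/>"
      + "<big><subwidget ref='gizmo'/></big>"
      + "<info><xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>"
      + "<xhtml:body><xhtml:h1>Gadget</xhtml:h1>A gadget contains...</xhtml:body></info>"
      + "</widget>";

    /** Number of elements with the given namespace URI ("*" = any) and local name. */
    public static int count(String nsUri, String localName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // otherwise prefixes are not resolved to URIs
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(nsUri, localName).getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(WIDGET_NS, "head")); // the widget's <head size="medium"/>
        System.out.println(count(XHTML_NS, "head"));  // <xhtml:head>
        System.out.println(count("*", "head"));       // both
    }
}
```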

Page 37

Another example

<Address> <Street>Celestijnenlaan</Street> <Nr>200A</Nr> <City>Heverlee-Leuven</City> <Country>Belgium</Country> </Address>

<Server> <Name>www</Name> <Address> 134.58.43.1 </Address> </Server>

?

Page 38

Another example (2)

<Address xmlns="www.all.edu/departments">
  <Street>Celestijnenlaan</Street> <Nr>200A</Nr> <City>Heverlee-Leuven</City> <Country>Belgium</Country>
</Address>

<Server xmlns="www.dns.net/servers">
  <Name>www</Name> <Address> 134.58.43.1 </Address>
</Server>

<Department xmlns:edu="www.all.edu/departments" xmlns:dns="www.dns.net/servers">
  <edu:Address> <Street>Celestijnenlaan</Street> ... </edu:Address>
  <dns:Name>www</dns:Name>
  <dns:Address>134.58.43.1</dns:Address>
</Department>

Page 39

Accessing XML documents

• Manual text file manipulation

  • Cumbersome & error-prone

• Parser

  • Simplifies document manipulation

  • Ensures proper grammar, well-formedness

  • Abstracts content from grammar

  • Accessed through a standard API

    • Document Object Model (DOM)

    • Simple API for XML (SAX)

Page 40

• DOM parser

  • creates a DOM object tree

• SAX parser

  • generates events when elements are encountered

  • one-pass translation

  • no need to keep the whole document tree in memory

• Both can be validating or non-validating

• Many available (most freeware, open source)

  • IBM XML4J, Apache Xerces, Sun parser, Microsoft, DataChannel, Oracle, ...

Page 41

DOM approach

http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP

Page 42

DOM Node Tree

<?xml version="1.0"?> <!-- An example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01">
    <Owner ID="1258-a8d72-98"> John Smith </Owner>
    <Balance Currency="EUR"> 50000 </Balance>
  </Account>
  <Account ...> ...
</BankAccounts>

(Figure: the corresponding DOM node tree)

Doc
  Com: An example XML document
  El: BankAccounts
    El: Account
      Att: accountNr = "123-456789-01"
      El: Owner = "John Smith"
        Att: ID = "1258-a8d72-98"
      El: Balance = "50000"
        Att: Currency = "EUR"
    El: Account ...

Page 43

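parsing: DOM

public void print(Node node) {
    // ... (handle this node, e.g. print its name and value)
    NodeList nlist = node.getChildNodes();
    if (nlist != null) {
        int l = nlist.getLength();
        for (int i = 0; i < l; i++) {
            print(nlist.item(i)); // recursion: depth-first walk over all descendants
            // ...
        }
        // ...
    }
    // ...
}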
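The slide's fragment leaves out what happens at each node. A complete, runnable variant, assuming Java's JAXP DOM parser (class and method names hypothetical), walks the tree depth-first and prints it in the Doc/Com/El/Att notation of the DOM Node Tree slide, showing text nodes as `Txt`. Unlike the slide's simplified picture, DOM keeps an element's text in a separate child Text node, and attributes are not child nodes at all:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomTree {
    // Trimmed version of the BankAccounts document from the DOM Node Tree slide.
    public static final String BANK =
        "<?xml version='1.0'?><!-- An example XML document -->"
      + "<BankAccounts><Account accountNr='123-456789-01'>"
      + "<Owner ID='1258-a8d72-98'> John Smith </Owner>"
      + "<Balance Currency='EUR'> 50000 </Balance>"
      + "</Account></BankAccounts>";

    /** Parses xml and returns an indented outline of its DOM node tree. */
    public static String outline(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            print(doc, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Handle the node itself, then recurse into its children (depth-first).
    static void print(Node node, String indent, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append(indent).append("Doc\n");
                break;
            case Node.COMMENT_NODE:
                out.append(indent).append("Com ").append(node.getNodeValue().trim()).append('\n');
                break;
            case Node.ELEMENT_NODE: {
                out.append(indent).append("El ").append(node.getNodeName()).append('\n');
                NamedNodeMap atts = node.getAttributes(); // attributes are not child nodes
                for (int i = 0; i < atts.getLength(); i++) {
                    Node a = atts.item(i);
                    out.append(indent).append("  Att ").append(a.getNodeName())
                       .append(" = ").append(a.getNodeValue()).append('\n');
                }
                break;
            }
            case Node.TEXT_NODE: {
                String t = node.getNodeValue().trim();
                if (!t.isEmpty()) out.append(indent).append("Txt ").append(t).append('\n'); // skip whitespace-only text
                break;
            }
            default:
                break;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            print(children.item(i), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(BANK));
    }
}
```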

Page 44

DOM Benefits & Drawbacks

• Benefits

  • W3C Recommendation

  • Language- and platform-independent

  • Random access

  • Intuitive

• Drawback

  • Entire object tree in memory

Page 45

Simple API for XML (SAX)

• Not an official standard

  • Ad-hoc product by XML developers

  • Primarily a Java API

• Event-based mechanism

  • Don’t call the parser, the parser calls you

• No object model in memory

  • Programmer must keep state information

Page 46

SAX approach

http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP

Page 47

SAX parsing model

(Sequence diagram between Application, Parser and ContentHandler)

• Application: new ContentHandler(), new Parser(), setContentHandler(), parse()

• Parser calls the ContentHandler: startDocument(), startElement(), characters(), endElement(), endDocument()

Page 48

parsing: SAX (PHP)

$xml_parser = xml_parser_create();
// element and attribute names are upper-cased by default (XML_OPTION_CASE_FOLDING),
// hence "QUESTION" and "QTEXT" below
xml_set_element_handler($xml_parser, "startQuestion", "endQuestion");
...
xml_parse($xml_parser, $data, feof($fp))
...
function startQuestion($parser, $name, $attrs) {
    ...
    if ($name == "QUESTION")
        ... new Question($attrs["QTEXT"]);
    ...

Page 49

• Start and end of document

  – startDocument()

  – endDocument()

• Start and end of element

  – startElement(namespace, name, qname, attlist)

  – endElement(namespace, name, qname)

• Character data

  – characters(char[] ch, int start, int length)

• Processing instruction

  – processingInstruction(target, data)

• No ContentHandler event for comments! (they are only reported through the optional LexicalHandler extension)

Page 50

Another SAX example

<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01" use="personal">
    <Owners>
      <Person ID="1258-a8d72-98"> <Name>John Smith</Name> </Person>
      <Person ID="5842-df5ef-e9"> <Name>Claudia Scott</Name> </Person>
    </Owners>
    <CreditCards> <CreditCard number="12345"/> </CreditCards>
    <Balance Currency="EUR">50000</Balance>
  </Account>
  ...
</BankAccounts>

Page 51

public class AvgBalanceCalculator extends DefaultHandler {
    private double total = 0.0;
    private int count = 0;
    private boolean isBalance = false;
    private final StringBuilder text = new StringBuilder(); // characters() may be called several times per text node

    public void startElement(String uri, String name, String qname, Attributes atts) {
        // compare qname: name (the local name) may be empty if the parser is not namespace-aware
        if (qname.equals("Balance")) { isBalance = true; text.setLength(0); }
    }

    public void characters(char[] ch, int start, int len) throws SAXException {
        if (isBalance) text.append(ch, start, len);
    }

    public void endElement(String uri, String name, String qname) {
        if (isBalance) {
            double balance = Double.parseDouble(text.toString().trim());
            total = total + balance;
            count++;
            isBalance = false;
        }
    }

    public void endDocument() {
        if (count != 0)
            System.out.println("Average balance is " + (total / count));
    }
}
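A self-contained, runnable variant of the average-balance handler, assuming Java and the JDK's `SAXParserFactory`; the class names and the second account are hypothetical. It shows the "parser calls you" structure end to end: the application only creates the handler and starts the parse, and the handler keeps just a running total and count, never the document. Because `characters()` may deliver one text node in several chunks, the text is collected and converted only in `endElement()`:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AvgBalanceDemo {
    // Event handler: state is just a total, a count and the text of the current <Balance>.
    static class Handler extends DefaultHandler {
        double total = 0.0;
        int count = 0;
        private StringBuilder text; // non-null only while inside a <Balance> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // qName: without namespace awareness, localName may be empty
            if (qName.equals("Balance")) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("Balance") && text != null) {
                total += Double.parseDouble(text.toString().trim());
                count++;
                text = null;
            }
        }
    }

    /** Streams through xml once and returns the average Balance, or NaN if there is none. */
    public static double averageBalance(String xml) {
        Handler h = new Handler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return h.count == 0 ? Double.NaN : h.total / h.count;
    }

    public static void main(String[] args) {
        String xml = "<BankAccounts>"
            + "<Account accountNr='123-456789-01'><Balance Currency='EUR'>50000</Balance></Account>"
            + "<Account accountNr='hypothetical-2'><Balance Currency='EUR'>30000</Balance></Account>"
            + "</BankAccounts>";
        System.out.println("Average balance is " + averageBalance(xml)); // 40000.0
    }
}
```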

Page 52

SAX Benefits & Drawbacks

• Benefits

  • Suitable when

    • parsing large documents

    • constructing proprietary object structures

    • only a small subset of the information is needed

  • Simple and fast

• Drawbacks

  • Read-only

  • No random access

  • Complex searches are messy to program

Page 53

Limitations of DTDs

• No typing of text elements and attributes

  • all values are strings; no integers, reals, etc.

• An unordered set of subelements is hard to define

  • order is usually irrelevant in databases

• IDs and IDREFs are untyped

  • the DNO attribute of an EMPLOYEE can contain a reference to another EMPLOYEE, which is meaningless, e.g. <EMPLOYEE SSN="_888665555" SEX="M" DNO="_888665555">

  • the DNO attribute should be constrained so that it can only refer to a DEPARTMENT

Page 54

XML Schema

• Typing of values

  • e.g. integer, string, etc.

  • also constraints on min/max values

  • user-defined types

• Is itself specified in XML syntax

  • more standardized representation

• Is integrated with namespaces

• And other features

  • list types, uniqueness constraints on keys, foreign-key constraints, inheritance, …

Page 55

XSDL

• XML Schema Definition Language

• documents with the suffix .xsd

Page 56

XML Schema: example

XML schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ....
  <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="HOURS" type="xsd:float"/>
      </xsd:sequence>
      <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
    </xsd:complexType>
  </xsd:element>
  ....
</xsd:schema>

XML instance

<PWORKER SSN="_123456789"> <HOURS>7.5</HOURS> </PWORKER>

Page 57

XML: simple types

– built-in simple types

  • string, integer, decimal, float, boolean, date, time, …

  • <xsd:element name="gebdat" type="xsd:date"/> (gebdat: date of birth)

– user-defined simple types

  • defined with the simpleType element

  • the restriction element specifies the base type it builds on

  • <xsd:simpleType name="salaryRange">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="25000"/>
        <xsd:maxInclusive value="100000"/>
      </xsd:restriction>
    </xsd:simpleType>

Page 58

XML: simple types

<xsd:simpleType name="studentClassificatie">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="bachelorstudent"/>
    <xsd:enumeration value="masterstudent"/>
    <xsd:enumeration value="doctorstudent"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="deptType">
  <xsd:restriction base="xsd:string">
    <xsd:length value="3"/>
  </xsd:restriction>
</xsd:simpleType>
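A sketch of schema validation with two of the user-defined simple types from these slides (salaryRange and deptType), assuming Java's `javax.xml.validation` API; the `employee`, `salary` and `dept` elements and the `SchemaCheck` class are hypothetical. Where a DTD would accept any string, the schema rejects a salary outside 25000..100000, a non-numeric salary, or a department code that is not exactly 3 characters long:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // salaryRange and deptType as on the slides; the employee element is hypothetical.
    private static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='salaryRange'><xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='25000'/><xsd:maxInclusive value='100000'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:simpleType name='deptType'><xsd:restriction base='xsd:string'>"
      + "<xsd:length value='3'/></xsd:restriction></xsd:simpleType>"
      + "<xsd:element name='employee'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='salary' type='salaryRange'/>"
      + "<xsd:element name='dept' type='deptType'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    public static String employee(String salary, String dept) {
        return "<employee><salary>" + salary + "</salary><dept>" + dept + "</dept></employee>";
    }

    /** True if xml is valid; without an ErrorHandler the Validator throws on the first error. */
    public static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(employee("50000", "ABC")));  // in range, 3 characters
        System.out.println(isValid(employee("20000", "ABC")));  // below minInclusive
        System.out.println(isValid(employee("50000", "ABCD"))); // wrong length
    }
}
```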

Page 63

XPath (example): /COMPANY/EMPLOYEE

(Figure: document tree ROOT → COMPANY → three EMPLOYEE elements, with SSN values _333445555, _123456789 and _999887777.)

Page 68

XPath: /COMPANY/EMPLOYEE

(Figure: the same ROOT → COMPANY → EMPLOYEE tree, shown next to part of the underlying XML.)

<EMPLOYEE SSN="_123456789" SEX="M" SUPERSSN="_333445555" DNO="_5">
  <FNAME>John</FNAME> <MINIT>B</MINIT>
  ....
</EMPLOYEE>
<EMPLOYEE SSN="_333445555" SEX="M" SUPERSSN="_888665555" DNO="_5">
  <FNAME>Franklin</FNAME> <MINIT>T</MINIT>
  <LNAME>Wong</LNAME> <BDATE>08-DEC-45</BDATE>
</EMPLOYEE>
<EMPLOYEE SSN="_999887777" SEX="F" SUPERSSN="_987654321" DNO="_4">
  <FNAME>Alicia</FNAME>
  .....
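A sketch of evaluating `/COMPANY/EMPLOYEE` with the JDK's XPath 1.0 engine (`javax.xml.xpath`), on a trimmed copy of the three EMPLOYEE elements above (their child elements other than FNAME are omitted; class and method names are hypothetical). The result is a node set in document order; a predicate such as `[@DNO='_5']` narrows it down:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XPathDemo {
    // Trimmed COMPANY document with the three EMPLOYEE elements from the slide.
    public static final String XML =
        "<COMPANY>"
      + "<EMPLOYEE SSN='_123456789' SEX='M' SUPERSSN='_333445555' DNO='_5'><FNAME>John</FNAME></EMPLOYEE>"
      + "<EMPLOYEE SSN='_333445555' SEX='M' SUPERSSN='_888665555' DNO='_5'><FNAME>Franklin</FNAME></EMPLOYEE>"
      + "<EMPLOYEE SSN='_999887777' SEX='F' SUPERSSN='_987654321' DNO='_4'><FNAME>Alicia</FNAME></EMPLOYEE>"
      + "</COMPANY>";

    /** Evaluates a path selecting EMPLOYEE elements; returns their SSNs, comma-separated. */
    public static String ssns(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xp.evaluate(path, doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < hits.getLength(); i++) { // nodes come back in document order
                if (i > 0) out.append(',');
                out.append(((Element) hits.item(i)).getAttribute("SSN"));
            }
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ssns("/COMPANY/EMPLOYEE"));              // all three employees
        System.out.println(ssns("/COMPANY/EMPLOYEE[@DNO='_5']"));   // only department _5
    }
}
```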

Page 69

XML family of technologies

• XLink: hypertext

• XSL: Extensible Stylesheet Language

  • XSLT: Transformations

  • XSL-FO: Formatting Objects

• XML Schema: additional (typed) constraints on elements and attributes

• and more...

Page 70

XML applications

• RDF: Resource Description Framework

  • see below

• XHTML: eXtensible HTML, and HTML5

  • XML-compliant HTML

• MathML

• SMIL: synchronized multimedia presentations

• Many others

  • Chemical Markup Language, Vector Graphics Markup Language, Open Software Description Format, weather observations, astronomical data, financial data, electronic components, workflow, business cards, real estate, newspapers, classifieds, javadoc, human resources, advertising, architecture, …

Page 71

XML Working Groups

• XML Coordination

• XML Core

• XSL (XSLT, XSL-FO) -> W3C architecture

• Efficient XML Interchange

• XML Processing Model

• XML Query (XQuery, XPath)

• XML Schema

• Service Modeling Language (SML)

Page 72

More XPath Features

• Operator “|” implements union

  • E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]

  • gives the employees with either 0 or 1 dependents

• “//” can be used to skip multiple levels of nodes

  • E.g. /COMPANY//FNAME

  • finds any FNAME element anywhere under the /COMPANY element, regardless of the element in which it is contained

• A step in the path can go to the parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children

  • “//”, described above, is a short form for “all descendants” (the descendant-or-self axis)

  • “..” specifies the parent

  • E.g.: /COMPANY//FNAME/../BDATE
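The union, `//` and `..` features can be tried with the JDK's XPath 1.0 engine (`javax.xml.xpath`). This sketch uses a small hypothetical COMPANY document: the names and the BDATE follow the slides, but the DEPENDENT elements are invented so that the union query has something to count. The class and method names are also hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathFeatures {
    // Hypothetical data: FNAME/BDATE as on the slides, DEPENDENT children invented.
    public static final String XML =
        "<COMPANY>"
      + "<EMPLOYEE SSN='_123456789'><FNAME>John</FNAME>"
      + "<DEPENDENT>Alice</DEPENDENT><DEPENDENT>Michael</DEPENDENT></EMPLOYEE>"
      + "<EMPLOYEE SSN='_333445555'><FNAME>Franklin</FNAME><BDATE>08-DEC-45</BDATE>"
      + "<DEPENDENT>Theodore</DEPENDENT></EMPLOYEE>"
      + "<EMPLOYEE SSN='_999887777'><FNAME>Alicia</FNAME></EMPLOYEE>"
      + "</COMPANY>";

    private static final XPath XP = XPathFactory.newInstance().newXPath();

    private static Document doc() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    /** Evaluates an expression that yields a number, such as count(...). */
    public static double number(String expr) {
        try {
            return (Double) XP.evaluate(expr, doc(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Evaluates an expression as a string: the string value of the first selected node. */
    public static String stringValue(String expr) {
        try {
            return XP.evaluate(expr, doc());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // union: employees with exactly one dependent, or none at all
        System.out.println(number("count(//EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)])"));
        // '//' skips levels; '..' climbs back to the parent EMPLOYEE
        System.out.println(stringValue("/COMPANY//FNAME/../BDATE"));
        System.out.println(stringValue("//FNAME[. = 'Franklin']/../@SSN"));
    }
}
```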

Page 73

XQuery

• Allows more general queries to be formulated than XPath

• General form: the FLWOR expression (keywords are lowercase and case-sensitive)

for < for-variable > in < in-expression >
let < let-variable > := < let-expression >
[ where < filter-expression > ]
[ order by < order-specification > ]
return < expression >

• Note: for and let can occur alone or together

Page 74

• Q1: first and last name of all employees who earn more than 70000

• for $x in doc("www.company.com/info.xml")//employee[employeeSalary > 70000]/employeeName
  return <res>{ $x/firstName, $x/lastName }</res>

• alternative:
  for $x in doc("www.company.com/info.xml")/company/employee
  where $x/employeeSalary > 70000
  return <res>{ $x/employeeName/firstName, $x/employeeName/lastName }</res>

Page 75

• Q3: first and last name of all employees who work more than 20 hours on project number 5, together with that number of hours

• for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
      $y in doc("www.company.com/info.xml")/company/employee
  where $x/hours > 20.0 and $y/ssn = $x/ssn
  return <res>{ $y/employeeName/firstName, $y/employeeName/lastName, $x/hours }</res>
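The JDK ships no XQuery engine, so the following is not XQuery; it is a hand translation of Q3 into Java, assuming `javax.xml.xpath` for the path expressions and a hypothetical company document that uses the element names from the query. It makes the semantics of a FLWOR expression with two `for` variables explicit: a nested loop over both bindings, filtered by the `where` clause, producing one result per surviving pair. A real XQuery processor may evaluate such a join far more cleverly:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FlworJoin {
    // Hypothetical company document using the element names of Q3.
    public static final String XML =
        "<company>"
      + "<project><projectNumber>5</projectNumber>"
      + "<projectWorker><ssn>123456789</ssn><hours>32.5</hours></projectWorker>"
      + "<projectWorker><ssn>453453453</ssn><hours>20.0</hours></projectWorker>"
      + "</project>"
      + "<employee><ssn>123456789</ssn>"
      + "<employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName></employee>"
      + "<employee><ssn>453453453</ssn>"
      + "<employeeName><firstName>Joyce</firstName><lastName>English</lastName></employeeName></employee>"
      + "</company>";

    /** Q3 evaluated as a nested loop over two XPath results; one string per <res>. */
    public static List<String> q3() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList xs = (NodeList) xp.evaluate(
                "/company/project[projectNumber = 5]/projectWorker", doc, XPathConstants.NODESET);
            NodeList ys = (NodeList) xp.evaluate("/company/employee", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < xs.getLength(); i++) {                  // for $x in ...
                Node x = xs.item(i);
                double hours = (Double) xp.evaluate("hours", x, XPathConstants.NUMBER);
                String xSsn = xp.evaluate("ssn", x);
                for (int j = 0; j < ys.getLength(); j++) {              //     $y in ...
                    Node y = ys.item(j);
                    if (hours > 20.0 && xp.evaluate("ssn", y).equals(xSsn)) { // where ...
                        result.add(xp.evaluate("employeeName/firstName", y) + " " // return <res>{...}</res>
                            + xp.evaluate("employeeName/lastName", y) + " " + hours);
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(q3()); // [John Smith 32.5]
    }
}
```

Joyce English is filtered out by the `where` clause because 20.0 is not more than 20.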

Page 76

The End...

Thank you! Questions...?

Page 78

NoSQL

• non-relational

• distributed

• open source

• horizontally scalable

• “web scale”

• schema-free

• easy replication

• simple API

• BASE (Basically Available, Soft state, Eventually consistent), not ACID

Page 79

Systems

• Core: Hadoop, HBase, Cassandra, Hypertable, ...

• Docs: CouchDB, MongoDB, Riak, Terrastore, ...

• Key-Value, tuple: Amazon SimpleDB, Azure, ...

• Graph: Neo4J, Bigdata, InfoGrid, HyperGraph, ...

• Object: Versant, Perst, ZODB, ...

• Grid: GigaSpaces, Hazelcast, ...

• XML: Tamino, eXist, Mark Logic, Xindice, ...

• ...

http://nosql-databases.org/

Page 80

NoSQL

• Google BigTable

• Amazon Dynamo

• Open source: HBase

• Cassandra: last.fm, Facebook

Page 81

NoSQL: why?

• Big data sets:

  • Digg green badge: 3 TB

  • Facebook inbox: 50 TB

  • eBay overall data: 2 PB

Page 82

http://about.digg.com/blog/looking-future-cassandra


14 seconds

Page 85

http://about.digg.com/blog/looking-future-cassandra

Page 86

http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation

Page 87

No attempt at ACID

• Atomicity

• Consistency

• Isolation

• Durability

• Trade ACID off in favor of high availability

Page 88

Query

• associative array, key-value pair

• XQuery

• SPARQL

Page 89

Questions...?