s3 XML-XPath-Xquery-Intro-eng.ppt [Modo de compatibilidad]vjsosa/clases/tssd/s3_XML-XPat… · Dr....

  • 31/05/2016

    1

    Dr. Víctor J. Sosa Sosa [email protected]

    eXtensible Markup Language (XML)

    XPath & XQuery

    Introduction

    XML: Definition

    Simplified subset of SGML (Standard Generalized Markup Language).

    Metalanguage: a language created for defining other languages, which usually describe the structure and content of documents.


    W3C Objectives:

    • XML should be directly usable over the Internet.

    • XML should support a wide variety of applications.

    • XML should be compatible with SGML.

    • It should be easy to write programs that process XML documents.

    • The number of optional features in XML should be kept to a minimum, ideally zero.

    • XML documents should be human-readable and reasonably clear.

    • The XML design should be prepared quickly, and it should be formal and concise.

    • XML documents should be easy to create.

    Open Standards

    • XML. Extensible Markup Language

    • DOM. Document Object Model

    • XSL. Extensible Stylesheet Language

    • XLL. Extensible Linking Language

    Well-formed and Valid XML Documents

    • In a well-formed XML document you can invent your own tags, but you must make sure that:

        • all tags are properly nested.

    • A valid XML document is a well-formed XML document that also conforms to the rules of a DTD (Document Type Definition: it defines the structure of an XML document with a list of its legal elements, i.e., the correct tags and their “grammar”).

    Well-formed XML Document

    • Begins with a declaration delimited by <?xml ... ?>, e.g. <?xml version="1.0" standalone="yes"?>.

    • STANDALONE = “yes” means that no DTD is provided.

    • The content of the document begins with a root tag that encompasses a set of nested tags.

    Tags

    • Tags, as in HTML, normally come in pairs: <FOO> ... </FOO>.

    • Tags can be nested arbitrarily.

    • Unlike HTML, a tag without an end tag is only allowed as an empty-element tag, which must end with />, e.g. <FOO/>.

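    Example: Well-formed XML document

        <?xml version="1.0" standalone="yes"?>
        <BARS>
          <BAR><NAME>eclipse</NAME>
            <BEER><NAME>Sol</NAME> <PRICE>20</PRICE></BEER>
            <BEER><NAME>XX</NAME> <PRICE>25</PRICE></BEER>
          </BAR>
        </BARS>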
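    You can see the difference between well-formed and badly nested markup by giving both to a parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError when a document is not well-formed. It checks well-formedness only and does no DTD validation. The second document is a hypothetical broken variant, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The bars document from the example above (well-formed).
WELL_FORMED = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>eclipse</NAME>
    <BEER><NAME>Sol</NAME> <PRICE>20</PRICE></BEER>
    <BEER><NAME>XX</NAME> <PRICE>25</PRICE></BEER>
  </BAR>
</BARS>"""

# Hypothetical broken variant: PRICE is closed after BEER, so the tags
# overlap instead of being properly nested.
BADLY_NESTED = "<BARS><BAR><BEER><PRICE>20</BEER></PRICE></BAR></BARS>"


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_FORMED))   # True
print(is_well_formed(BADLY_NESTED))  # False
```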

    Semi-structured Data with XML

    • A well-formed XML document, with its nested tags, corresponds to a tree: the same idea as a semi-structured data tree.

    • Notice that XML also allows structures that are not trees (e.g. through ID/IDREF references), similar to the semi-structured data model.

    Example

    [Figure: XML Document for BARS drawn as a tree. The root bars has several bar children; one bar has the name callejon and beer children, each beer with a name and a price (e.g. Indio, 30 and Sol, 25).]

    Document Type Definition (DTD)

    • A DTD defines the structure and the legal elements and attributes of an XML document.

    • It is essentially a context-free grammar that describes the XML tags and their nesting structure.

    • For every domain of interest (e.g., electronic components, bars-beers-drinkers, etc.), a DTD describes all the documents that the group working in that domain will share.

    DTD Structure

        <!DOCTYPE root-tag [
          <!ELEMENT element-name ( components )>
          ...
        ]>

    DTD Elements

    • An XML element is everything from the element's start tag to the element's end tag, e.g. <NAME>Sol</NAME>.

    • An element can contain text, attributes, other elements, or a mix of these.

    • An element can be empty: <element></element> or <element/>.

    • Text elements (leaves) contain #PCDATA instead of nested tags.

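    DTD Example

        <!DOCTYPE BARS [
          <!ELEMENT BARS (BAR*)>
          <!ELEMENT BAR (NAME, BEER+)>
          <!ELEMENT BEER (NAME, PRICE)>
          <!ELEMENT NAME (#PCDATA)>
          <!ELEMENT PRICE (#PCDATA)>
        ]>

    • A bars element has zero or more nested bar elements.

    • A bar has a name and one or more beer elements.

    • A beer has one name and one price.

    • Name and price are texts.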

    Element Description

    • Subelements must appear in the order in which they are listed.

    • Each tag can be followed by a multiplicity symbol:

        * = zero or more
        + = one or more
        ? = zero or one

    • The | symbol separates alternatives: exactly one of them appears.

    Element Description

    Example: a name consists of an optional title (e.g. “Prof.”, “Dr.”), the first name and the last name, in this order, or it could be just an IP address:

        <!ELEMENT name (
          (title?, first_name, last_name) | IP_addr
        )>
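    The multiplicity symbols and | act like regular-expression operators over the sequence of child tags. The following Python sketch is an illustration, not a DTD validator. It writes each child sequence as a space-separated string (an assumption made only for this example). It then checks sequences against the name model above and against the BARS/BAR models from the DTD example.

```python
import re

# Each child tag is written followed by a space, so "," (sequence) becomes
# plain concatenation and *, +, ? and | keep their regex meaning.

# <!ELEMENT name ((title?, first_name, last_name) | IP_addr)>
NAME_MODEL = re.compile(r"(title )?first_name last_name |IP_addr ")

# <!ELEMENT BAR (NAME, BEER+)>  and  <!ELEMENT BARS (BAR*)>
BAR_MODEL = re.compile(r"NAME (BEER )+")
BARS_MODEL = re.compile(r"(BAR )*")


def matches(model: re.Pattern, children: list[str]) -> bool:
    """True if the sequence of child tags satisfies the content model."""
    return model.fullmatch("".join(tag + " " for tag in children)) is not None


print(matches(NAME_MODEL, ["title", "first_name", "last_name"]))  # True
print(matches(NAME_MODEL, ["first_name", "last_name"]))           # True: title? is optional
print(matches(NAME_MODEL, ["IP_addr"]))                            # True
print(matches(NAME_MODEL, ["last_name", "first_name"]))           # False: wrong order
print(matches(BAR_MODEL, ["NAME"]))                                # False: BEER+ needs at least one
print(matches(BARS_MODEL, []))                                     # True: BAR* allows zero
```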

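    Application of DTD

    1.- Set STANDALONE = “no”.

    2.- Then either:

    • include the DTD as a preamble of the XML document, or

    • include a DOCTYPE declaration with the root tag, followed by SYSTEM and a path to the file where the DTD can be found.

    Example

        <?xml version="1.0" standalone="no"?>
        <!-- DTD -->
        <!DOCTYPE BARS [
          <!ELEMENT BARS (BAR*)>
          <!ELEMENT BAR (NAME, BEER+)>
          <!ELEMENT BEER (NAME, PRICE)>
          <!ELEMENT NAME (#PCDATA)>
          <!ELEMENT PRICE (#PCDATA)>
        ]>
        <!-- XML Document -->
        <BARS>
          <BAR><NAME>eclipse</NAME>
            <BEER><NAME>Sol</NAME> <PRICE>20</PRICE></BEER>
            <BEER><NAME>XX</NAME> <PRICE>30</PRICE></BEER>
          </BAR>
          . . . . . .
        </BARS>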

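    Example

    Assume the DTD is in the file bar.dtd:

        <?xml version="1.0" standalone="no"?>
        <!DOCTYPE BARS SYSTEM "bar.dtd">    <!-- get the DTD from the file bar.dtd -->
        <BARS>
          <BAR><NAME>eclipse</NAME>
            <BEER><NAME>Sol</NAME> <PRICE>20</PRICE></BEER>
            <BEER><NAME>XX</NAME> <PRICE>30</PRICE></BEER>
          </BAR>
          . . . . . .
        </BARS>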

    Attributes

    • Opening tags in XML can have attributes, as in HTML.

    • <!ATTLIST element-name attribute-name type ... > provides a list of attributes, with their data types, for a specific element. Some data types for attributes are: CDATA (character data), ID, IDREF, IDREFS.

    • Each attribute declaration also specifies a default: a specific value, #REQUIRED (a value must be given for this attribute), or #IMPLIED (no default value is provided and the attribute can be omitted). If an attribute has a default value, documents that give no value for that attribute are filled in with the default.

    Attributes

    Example: bars can have an attribute type, a character string describing the bar (e.g. “nightclub”).

    Attributes

    Example of use:

        <BAR type = "nightclub">
          <NAME>El rincón de los milagros</NAME>
          <BEER><NAME>Corona</NAME> <PRICE>25</PRICE></BEER>
          . . . .
        </BAR>

    IDs & IDREFs

    • Attributes can act as pointers from one object to another, as in HTML: <A NAME = "something"> and <A HREF = "#something">.

    • This allows the structure of an XML document to be a general graph, rather than just a tree.

    • An attribute of type ID gives an object a unique identifier: no two objects in the document may have the same ID.

    • An attribute of type IDREF refers to an object through its ID. The value of an IDREF attribute must appear as the value of an ID attribute of some element in the document.

    • An attribute of type IDREFS can refer to multiple elements.

    IDs & IDREFs

    • Let's redesign our DTD for bars. In the new DTD, BAR and BEER have an attribute name of type ID.

    • BAR objects contain PRICE subobjects. Each PRICE holds a number (the price of the beer) and an IDREF attribute (TheBeer) that leads to a BEER. BEER objects have an attribute SoldBy of type IDREFS, which leads to all the bars that sell that beer.

    IDs & IDREFs

    • Example: DTD. Its declarations state that:

        - beer objects have an attribute of type ID called name, and an attribute SoldBy (IDREFS) that is a set of bar names;
        - bar objects have an attribute of type ID called name and contain one or more price subobjects;
        - price objects contain a number (the price) and a reference to a beer.

    XML Document

        <BARS>
          <BAR name = "...">
            <PRICE TheBeer = "Sol">20</PRICE>
            <PRICE TheBeer = "Indio">30</PRICE>
          </BAR>
          . . . .
          . . . .
        </BARS>
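    ElementTree does not read DTDs, so it cannot tell which attributes are IDs. This Python sketch therefore assumes, as the redesigned DTD states, that the attribute name is the ID. It builds an index from that attribute and then follows IDREF and IDREFS links by hand. The bar names and SoldBy lists are hypothetical; only Sol, Indio and the prices come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document following the redesigned DTD.
DOC = """<BARS>
  <BAR name="eclipse">
    <PRICE TheBeer="Sol">20</PRICE>
    <PRICE TheBeer="Indio">30</PRICE>
  </BAR>
  <BAR name="callejon">
    <PRICE TheBeer="Sol">25</PRICE>
  </BAR>
  <BEER name="Sol" SoldBy="eclipse callejon"/>
  <BEER name="Indio" SoldBy="eclipse"/>
</BARS>"""

root = ET.fromstring(DOC)

# Assumption: "name" is the ID attribute (ElementTree ignores DTDs).
by_id = {}
for elem in root.iter():
    ident = elem.get("name")
    if ident is not None:
        if ident in by_id:  # IDs must be unique across the whole document
            raise ValueError(f"duplicate ID {ident!r}")
        by_id[ident] = elem

# IDREF: each PRICE points to exactly one BEER.
for price in root.iter("PRICE"):
    beer = by_id[price.get("TheBeer")]
    print(price.text, "->", beer.tag, beer.get("name"))

# IDREFS: a whitespace-separated list of IDs, here the bars that sell Sol.
sol_bars = [by_id[ref] for ref in by_id["Sol"].get("SoldBy").split()]
print([bar.get("name") for bar in sol_bars])  # ['eclipse', 'callejon']
```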

    XML Schema

    • XML Schema arose as a consequence of the limitations of DTDs:

        - DTDs are written in a syntax different from XML
        - No namespace support
        - Limited data types
        - Difficulty in defining sets of unordered elements

    XML Schema: Document Validation

    • The structure of documents is described in terms of restrictions.

    • Two types of restrictions:

        - Content: order and sequence of elements (a beer has a name, a price, etc.).
        - Data types: valid data values (the price of a beer has the format XXXX.XX).

    XML Schema: Document Validation

    • XML Schema solves the DTD limitations:

        - a better way to express restrictions
        - less validation work
        - robust exchange of data

    XML Schemas: Advantages

    • Richer data types (double, date, etc.).

    • User-defined data types (archetypes).

    • Specialized restrictions on data types, e.g. max and min values.

    • Attribute grouping (attributeGroup element).

    • Refinable archetypes, or “inheritance”.

    • Namespace support.

    Complex Types

    An instance of beerType:

        <xxx>
          <name>Bud</name>
          <price>2.50</price>
        </xxx>

    Here the name of the element (written xxx) is unknown: the type beerType does not determine it.

    Complex Types

    xs:attribute

    • xs:attribute elements can be used within a complex type to declare the attributes of elements of that type.

    • Attributes of xs:attribute:

        - name and type, as for xs:element;
        - use = "required" or "optional".

    Example: xs:attribute

    In this example the element is empty, since no subelements are declared (only attributes).

    Instance of beerType:

    Restricted Simple Types

    • xs:simpleType can describe enumerations and range-restricted base types.

    • name is an attribute of xs:simpleType.

    • xs:restriction is a subelement of xs:simpleType.

    Restrictions

    • The attribute base of xs:restriction gives the simple type to be restricted, e.g., xs:integer.

    • xs:{min, max}{Inclusive, Exclusive} are four subelements of xs:restriction (e.g. xs:minInclusive), each with a value attribute, that give a lower or upper bound on a numerical range.

    • xs:enumeration is a subelement with a value attribute; a list of them defines an enumerated type.

    Example: license attribute for BAR

    Example: Prices in Range [1,5)
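    The facets can be made concrete with a short check. The following Python sketch is an assumption-laden illustration, not how any particular schema validator is implemented. It tests a decimal restricted with xs:minInclusive value="1" and xs:maxExclusive value="5", which gives the range [1,5) of the example. It also tests a value against a list of xs:enumeration facets. The license values listed are hypothetical, since the original example's values are not recoverable.

```python
from decimal import Decimal, InvalidOperation

# Mimics <xs:restriction base="xs:decimal"> with
# <xs:minInclusive value="1"/> and <xs:maxExclusive value="5"/>: range [1, 5).
MIN_INCLUSIVE = Decimal("1")
MAX_EXCLUSIVE = Decimal("5")

# Mimics a list of <xs:enumeration value="..."/> facets (hypothetical values).
LICENSE_VALUES = {"Full", "Beer only", "Sake"}


def valid_price(text: str) -> bool:
    """True if text is a finite decimal in [1, 5)."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    if not value.is_finite():  # reject NaN and Infinity
        return False
    return MIN_INCLUSIVE <= value < MAX_EXCLUSIVE


def valid_license(text: str) -> bool:
    """True if text is exactly one of the enumerated values."""
    return text in LICENSE_VALUES


print(valid_price("1"))      # True: the lower bound is inclusive
print(valid_price("4.99"))   # True
print(valid_price("5"))      # False: the upper bound is exclusive
print(valid_license("Sake")) # True
```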

    Examples of types in XML Schema

    • String

    • Integer

    • Decimal

    • Boolean

    • User-defined types (derived from simple or complex types)

    • ComplexType

    • Sequence

    More Examples… [Silberschatz et al.]

    (DTD => XML Schema)

    XML Schema version of the previous DTD.

    XML Schema: Keys and Foreign Keys

    • Attributes: attributes can be added to elements. For example, we can add the numero_cuenta attribute to the cuenta element.

    • Keys: it is possible to define a key called numero_cuenta on the cuenta element. This is done inside the root element banco, by giving:

        - the scope of the restriction, and
        - the attribute that forms the key.

    • Foreign keys: a foreign key can be defined as a restriction from impositor to cuenta.

    XML Schema: Query & Transformation

    • XPath: query language based on a tree representation of the XML document. It provides the ability to navigate around the tree, selecting nodes by a variety of criteria, and it is the foundation of the other XML query languages.

    • XQuery: query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML or text, with vendor-specific extensions for other data formats (JSON, binary, etc.). It was modeled on SQL, but oriented to the XML structure.

    • XSLT: language for transforming XML documents into other XML documents or into other formats such as HTML for web pages, plain text, or XSL Formatting Objects. It can also express queries.

    XPath

    • XPath is used to navigate through elements and attributes in an XML document.

    • Example: /banco/cliente/nombre_cliente

    The expression selects the nombre_cliente elements of every cliente element:

        <nombre_cliente>Pedro</nombre_cliente>
        <nombre_cliente>Juanito</nombre_cliente>
        <nombre_cliente>María</nombre_cliente>

    If the tags are not needed:

        /banco/cliente/nombre_cliente/text()

    • Accessing attributes: /banco/cuenta/@numero_cuenta

    Selects the values of the numero_cuenta attributes of all cuenta elements.
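    Python's xml.etree.ElementTree implements only a subset of XPath, and it evaluates paths relative to the element they are called on. It is enough to mirror the three expressions above. The banco document here is hypothetical: the client names come from the slide and the account numbers from the Silberschatz example used later.

```python
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cuenta numero_cuenta="C-201"><saldo>900</saldo></cuenta>
  <cliente><nombre_cliente>Pedro</nombre_cliente></cliente>
  <cliente><nombre_cliente>Juanito</nombre_cliente></cliente>
  <cliente><nombre_cliente>María</nombre_cliente></cliente>
</banco>"""

banco = ET.fromstring(BANCO)  # the <banco> element plays the role of /banco

# /banco/cliente/nombre_cliente: the element nodes themselves
elementos = banco.findall("cliente/nombre_cliente")
print([e.tag for e in elementos])  # ['nombre_cliente', 'nombre_cliente', 'nombre_cliente']

# /banco/cliente/nombre_cliente/text(): only the text content
nombres = [e.text for e in elementos]
print(nombres)  # ['Pedro', 'Juanito', 'María']

# /banco/cuenta/@numero_cuenta: attribute values
numeros = [c.get("numero_cuenta") for c in banco.findall("cuenta")]
print(numeros)  # ['C-101', 'C-102', 'C-201']
```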

    XPath: Predicates

    • XPath supports predicates, written in square brackets.

    • Select cuenta elements with saldo > 400:

        /banco/cuenta[saldo>400]

    • Select only the numero_cuenta attribute of cuenta elements with saldo > 400:

        /banco/cuenta[saldo>400]/@numero_cuenta

    • Select the numero_cuenta attributes of cuenta elements that have a saldo subelement, regardless of its value:

        /banco/cuenta[saldo]/@numero_cuenta
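    ElementTree supports the existence predicate [saldo] but has no comparison predicates such as [saldo>400]. The Python sketch below therefore performs the comparison in ordinary code. Account C-300, which has no saldo, is invented for the example to show what each predicate filters out.

```python
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cuenta numero_cuenta="C-201"><saldo>900</saldo></cuenta>
  <cuenta numero_cuenta="C-300"/>
</banco>"""
banco = ET.fromstring(BANCO)


def saldo_mayor(cuenta, limite):
    """Emulates [saldo > limite]: as in XPath, a missing saldo makes the
    comparison false instead of raising an error."""
    texto = cuenta.findtext("saldo")
    return texto is not None and float(texto) > limite


# /banco/cuenta[saldo>400]/@numero_cuenta
mayores = [c.get("numero_cuenta") for c in banco.findall("cuenta") if saldo_mayor(c, 400)]
print(mayores)  # ['C-101', 'C-201']

# /banco/cuenta[saldo]/@numero_cuenta  (existence predicate, native in ElementTree)
con_saldo = [c.get("numero_cuenta") for c in banco.findall("cuenta[saldo]")]
print(con_saldo)  # ['C-101', 'C-102', 'C-201']
```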

    XPath: Predicates

    • XPath provides several functions that can be used in a predicate, e.g. cuenta elements with more than two cliente subelements:

        /banco/cuenta[count(./cliente) > 2]

    • Select all clientes referenced (through IDREFS) by the titular attribute of cuenta elements, using the id() function:

        /banco/cuenta/id(@titular)

    • To obtain the root of a specific XML document: doc("banco.xml")

    This function can be part of a path expression:

        doc("banco.xml")/banco/cuenta

    XQUERY

    • Language for querying XML data.

    • XQuery is to XML what SQL is to databases.

    • XQuery is built on XPath expressions.

    • XQuery is supported by many major database systems.

    • XQuery is a W3C Recommendation.

    • XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators.

    • It derives from a query language called Quilt.

    XQUERY: FLWOR Expressions

    XQuery has 5 clauses: For, Let, Where, Order by, Return (FLWOR)

    • For: selects a sequence of nodes

    • Let: binds a sequence to a variable

    • Where: filters the nodes

    • Order by: sorts the nodes

    • Return: what to return (evaluated once for every node)

    XQUERY: Example

    • To obtain the account numbers of the accounts with saldo > 400, each one enclosed in a numero_cuenta element:

        for $x in /banco/cuenta
        let $numcuenta := $x/@numero_cuenta
        where $x/saldo > 400
        return <numero_cuenta>{ data($numcuenta) }</numero_cuenta>

    • The for clause works like the from clause in SQL.

    • Variables bound in for range over the items returned by XPath expressions.

    • If the for clause contains more than one variable, the result is a Cartesian product.

    • The let clause binds a sequence of values (the result of an XPath expression) to a variable, to simplify the statement.

    • The where clause expresses predicates similar to those in SQL.

    • The order by clause sorts the output.

    • The return clause builds the resulting XML.
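    To relate each FLWOR clause to something familiar, here is the query above restated as a Python loop over ElementTree elements. It is a sketch, not how an XQuery engine works. It assumes the account numbers are attributes, as in /banco/cuenta/@numero_cuenta. The <resultado> wrapper is hypothetical: XQuery returns a bare sequence of elements, and the wrapper only gathers them into a single document for printing.

```python
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cuenta numero_cuenta="C-201"><saldo>900</saldo></cuenta>
</banco>"""
banco = ET.fromstring(BANCO)

resultado = ET.Element("resultado")  # hypothetical wrapper element
for x in banco.findall("cuenta"):                 # for $x in /banco/cuenta
    numcuenta = x.get("numero_cuenta")            # let $numcuenta := $x/@numero_cuenta
    if float(x.findtext("saldo")) > 400:          # where $x/saldo > 400
        # return <numero_cuenta>{ data($numcuenta) }</numero_cuenta>
        ET.SubElement(resultado, "numero_cuenta").text = numcuenta

salida = ET.tostring(resultado, encoding="unicode")
print(salida)
# <resultado><numero_cuenta>C-101</numero_cuenta><numero_cuenta>C-201</numero_cuenta></resultado>
```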

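    XQUERY: FLWOR Expressions

    • Some clauses are not necessary:

        for $x in /banco/cuenta[saldo > 400]
        return <numero_cuenta>{ data($x/@numero_cuenta) }</numero_cuenta>

    • Braces { } enclose expressions that are evaluated; their output is included in the XML text, while text outside the braces is copied literally. Braces can also be used inside quoted attribute values:

        return <cuenta numero_cuenta="{$x/@numero_cuenta}"/>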

    XQUERY: FLWOR Expressions

    • XQuery provides another way to create elements, using the element and attribute constructors.

    • Example: to generate cuenta elements with the attributes numero_cuenta and nombre_sucursal and the subelement saldo:

        return element cuenta {
          attribute numero_cuenta {$x/@numero_cuenta},
          attribute nombre_sucursal {$x/@nombre_sucursal},
          element saldo {data($x/saldo)}
        }
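    The computed constructors correspond closely to building a tree programmatically. The following Python sketch with ElementTree mirrors element cuenta { attribute ..., attribute ..., element saldo {...} }. The input cuenta is hypothetical and stores numero_cuenta and nombre_sucursal as attributes, as the query above assumes.

```python
import xml.etree.ElementTree as ET

# Hypothetical input matching $x/@numero_cuenta and $x/@nombre_sucursal.
x = ET.fromstring('<cuenta numero_cuenta="C-101" nombre_sucursal="Centro"><saldo>500</saldo></cuenta>')


def construir_cuenta(x):
    """Mimics the computed constructors of the XQuery example."""
    cuenta = ET.Element("cuenta")                               # element cuenta { ... }
    cuenta.set("numero_cuenta", x.get("numero_cuenta"))         # attribute numero_cuenta
    cuenta.set("nombre_sucursal", x.get("nombre_sucursal"))     # attribute nombre_sucursal
    ET.SubElement(cuenta, "saldo").text = x.findtext("saldo")   # element saldo {data(...)}
    return cuenta


print(ET.tostring(construir_cuenta(x), encoding="unicode"))
```

    Copying only the text of saldo (the equivalent of data()) avoids nesting a saldo element inside another saldo element.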

    Example:

        <BARES>
          <BAR tipo = "...">
            <NOMBRE>eclipse</NOMBRE>
            <CERVEZA><NOMBRE>Sol</NOMBRE> <PRECIO>20</PRECIO></CERVEZA>
            <CERVEZA><NOMBRE>Indio</NOMBRE> <PRECIO>30</PRECIO></CERVEZA>
          </BAR>
          <BAR tipo = "...">
            <NOMBRE>El rincón de los milagros</NOMBRE>
            <CERVEZA><NOMBRE>Victoria</NOMBRE> <PRECIO>20</PRECIO></CERVEZA>
          </BAR>
          . . . .
        </BARES>

    Query in XQuery

        for $ba in doc("http://cinvestav.mx/bares.xml")//BAR[@tipo = "deportes"],
            $be in $ba/CERVEZA[NOMBRE = "Sol"]
        where $ba/CERVEZA[NOMBRE = "Indio"]
        return $be/PRECIO

    • Find the price of the “Sol” beer in bars of type “deportes” that also serve the “Indio” beer.
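    ElementTree's XPath subset covers the attribute and child-text predicates this query needs. Only the where clause and the outer iteration have to be written as Python control flow. In this sketch the tipo values are hypothetical (the slide does not show them), and everything else comes from the sample data. A non-empty node sequence in where counts as true, so the check is "at least one Indio".

```python
import xml.etree.ElementTree as ET

BARES = """<BARES>
  <BAR tipo="deportes">
    <NOMBRE>eclipse</NOMBRE>
    <CERVEZA><NOMBRE>Sol</NOMBRE><PRECIO>20</PRECIO></CERVEZA>
    <CERVEZA><NOMBRE>Indio</NOMBRE><PRECIO>30</PRECIO></CERVEZA>
  </BAR>
  <BAR tipo="deportes">
    <NOMBRE>El rincón de los milagros</NOMBRE>
    <CERVEZA><NOMBRE>Victoria</NOMBRE><PRECIO>20</PRECIO></CERVEZA>
  </BAR>
</BARES>"""
root = ET.fromstring(BARES)

precios = []
for ba in root.findall(".//BAR[@tipo='deportes']"):   # $ba in //BAR[@tipo = "deportes"]
    if not ba.findall("CERVEZA[NOMBRE='Indio']"):      # where $ba/CERVEZA[NOMBRE = "Indio"]
        continue
    for be in ba.findall("CERVEZA[NOMBRE='Sol']"):     # $be in $ba/CERVEZA[NOMBRE = "Sol"]
        precios.append(be.findtext("PRECIO"))          # return $be/PRECIO

print(precios)  # ['20']
```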

    XQUERY: Joins

    • Joins are specified much as in SQL. For example, the join of the elements impositor, cuenta and cliente, wrapping each result in an element (named cli_cuenta here for illustration):

        for $a in /banco/cuenta, $c in /banco/cliente, $i in /banco/impositor
        where $a/numero_cuenta = $i/numero_cuenta
          and $c/nombre_cliente = $i/nombre_cliente
        return <cli_cuenta>{ $c, $a }</cli_cuenta>

    • The same query with the join conditions written as XPath predicates:

        for $a in /banco/cuenta, $c in /banco/cliente,
            $i in /banco/impositor[numero_cuenta = $a/numero_cuenta and
                                   nombre_cliente = $c/nombre_cliente]
        return <cli_cuenta>{ $c, $a }</cli_cuenta>

    • Notes:

        - The operators eq, ne, lt, gt, le, ge are value comparisons: they compare single values.
        - The general comparisons (=, !=, <, >, <=, >=) work on sequences. A predicate such as $x/saldo = $y/saldo is true if any value returned by the first expression is equal to any value returned by the second expression.
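    A multi-variable for clause is a Cartesian product filtered by where, which is simply nested loops. The existential meaning of general comparisons is easy to state with any(). The following Python sketch uses the account data of the Silberschatz example, stored as elements to match $a/numero_cuenta above. The helper names general_eq and general_ne are hypothetical.

```python
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta><numero_cuenta>C-101</numero_cuenta><saldo>500</saldo></cuenta>
  <cuenta><numero_cuenta>C-102</numero_cuenta><saldo>400</saldo></cuenta>
  <cuenta><numero_cuenta>C-201</numero_cuenta><saldo>900</saldo></cuenta>
  <cliente><nombre_cliente>Gonzalez</nombre_cliente></cliente>
  <cliente><nombre_cliente>Lopez</nombre_cliente></cliente>
  <impositor><numero_cuenta>C-101</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
  <impositor><numero_cuenta>C-201</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
  <impositor><numero_cuenta>C-102</numero_cuenta><nombre_cliente>Lopez</nombre_cliente></impositor>
</banco>"""
banco = ET.fromstring(BANCO)

# for $a ..., $c ..., $i ...: nested loops; where keeps matching combinations.
pares = []
for a in banco.findall("cuenta"):
    for c in banco.findall("cliente"):
        for i in banco.findall("impositor"):
            if (a.findtext("numero_cuenta") == i.findtext("numero_cuenta")
                    and c.findtext("nombre_cliente") == i.findtext("nombre_cliente")):
                pares.append((c.findtext("nombre_cliente"), a.findtext("numero_cuenta")))
print(pares)  # [('Gonzalez', 'C-101'), ('Lopez', 'C-102'), ('Gonzalez', 'C-201')]


def general_eq(xs, ys):
    """General comparison '=': true if SOME item of xs equals SOME item of ys."""
    return any(x == y for x in xs for y in ys)


def general_ne(xs, ys):
    """General comparison '!=': true if SOME pair of items differs."""
    return any(x != y for x in xs for y in ys)


saldos = ["500", "900"]
print(general_eq(saldos, ["500"]))  # True
print(general_ne(saldos, ["500"]))  # also True: 900 differs from 500
```

    Because of this existential meaning, = and != can both be true for the same two sequences, which is one reason to prefer eq and friends when single values are expected.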

    XQUERY: Nested queries

    • XQuery FLWOR expressions can be nested in the return clause, generating nested structures that do not appear in the source document.

    • Example (the root name banco_1 is illustrative):

        <banco_1> {
          for $c in /banco/cliente
          return
            <cliente>
              { $c/* }
              { for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
                    $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
                return $a }
            </cliente>
        } </banco_1>

    • NOTE: this query generates the document shown in [fig 10.4 Silberschatz] from the document in [fig 10.1 Silberschatz].

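    Nesting representation

    Document from [fig 10.4 Silberschatz] (nested: each cliente contains its own cuenta elements):

        cliente  Gonzalez, Arenal, La Granja
          cuenta  C-101, Centro, 500
          cuenta  C-201, Galapagar, 900
        cliente  Lopez, Mayor, Peguerinos
          cuenta  C-102, Navacerrada, 400

    Document from [fig 10.1 Silberschatz] (flat: cuenta, cliente and impositor elements at the same level):

        cuenta     C-101, Centro, 500
        cuenta     C-102, Navacerrada, 400
        cuenta     C-201, Galapagar, 900
        cliente    Gonzalez, Arenal, La Granja
        cliente    Lopez, Mayor, Peguerinos
        impositor  C-101, Gonzalez
        impositor  C-201, Gonzalez
        impositor  C-102, Lopez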
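    The nested query can be followed step by step in Python. The sketch below builds the nested layout from the flat document, using the data values of the two figures. The tag names calle_cliente and ciudad_cliente for the street and city fields are hypothetical, as is the root name banco_1. Elements are deep-copied so that the source tree is left unchanged.

```python
import copy
import xml.etree.ElementTree as ET

FLAT = """<banco>
<cuenta><numero_cuenta>C-101</numero_cuenta><nombre_sucursal>Centro</nombre_sucursal><saldo>500</saldo></cuenta>
<cuenta><numero_cuenta>C-102</numero_cuenta><nombre_sucursal>Navacerrada</nombre_sucursal><saldo>400</saldo></cuenta>
<cuenta><numero_cuenta>C-201</numero_cuenta><nombre_sucursal>Galapagar</nombre_sucursal><saldo>900</saldo></cuenta>
<cliente><nombre_cliente>Gonzalez</nombre_cliente><calle_cliente>Arenal</calle_cliente><ciudad_cliente>La Granja</ciudad_cliente></cliente>
<cliente><nombre_cliente>Lopez</nombre_cliente><calle_cliente>Mayor</calle_cliente><ciudad_cliente>Peguerinos</ciudad_cliente></cliente>
<impositor><numero_cuenta>C-101</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
<impositor><numero_cuenta>C-201</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
<impositor><numero_cuenta>C-102</numero_cuenta><nombre_cliente>Lopez</nombre_cliente></impositor>
</banco>"""
banco = ET.fromstring(FLAT)

anidado = ET.Element("banco_1")  # hypothetical root name
for c in banco.findall("cliente"):                          # for $c in /banco/cliente
    nuevo = ET.SubElement(anidado, "cliente")
    nuevo.extend([copy.deepcopy(hijo) for hijo in c])       # { $c/* }
    for i in banco.findall("impositor"):                    # $i in /banco/impositor[...]
        if i.findtext("nombre_cliente") != c.findtext("nombre_cliente"):
            continue
        for a in banco.findall("cuenta"):                   # $a in /banco/cuenta[...]
            if a.findtext("numero_cuenta") == i.findtext("numero_cuenta"):
                nuevo.append(copy.deepcopy(a))              # return $a

for cli in anidado:
    print(cli.findtext("nombre_cliente"),
          [a.findtext("numero_cuenta") for a in cli.findall("cuenta")])
# Gonzalez ['C-101', 'C-201']
# Lopez ['C-102']
```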

    Functions: SUM, COUNT

    • XQuery supports several functions that are shared with XPath 2.0 and can be used in any XPath expression.

    • To avoid conflicts, the functions are associated with a namespace: http://www.w3.org/2004/10/xpath-functions (a working-draft URI; the XQuery 1.0 / XPath 2.0 Recommendation uses http://www.w3.org/2005/xpath-functions).

    • They have a predefined prefix, fn, bound to that namespace, which avoids ambiguity: fn:sum, fn:count. Example: total balance of each cliente (the wrapper name saldo_cliente is illustrative):

        for $c in /banco/cliente
        return
          <saldo_cliente>
            { $c/nombre_cliente }
            { fn:sum(for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
                         $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
                     return $a/saldo) }
          </saldo_cliente>
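    The aggregate is just a sum over the inner FLWOR's result sequence. The following Python sketch keeps the same nested iteration so that duplicates would be counted exactly as XQuery counts them. It uses Decimal to avoid binary-float rounding of balances. The client "Perez" is hypothetical and has no accounts; he is included to show that fn:sum of an empty sequence is 0, and fn:count is computed alongside.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta><numero_cuenta>C-101</numero_cuenta><saldo>500</saldo></cuenta>
  <cuenta><numero_cuenta>C-102</numero_cuenta><saldo>400</saldo></cuenta>
  <cuenta><numero_cuenta>C-201</numero_cuenta><saldo>900</saldo></cuenta>
  <cliente><nombre_cliente>Gonzalez</nombre_cliente></cliente>
  <cliente><nombre_cliente>Lopez</nombre_cliente></cliente>
  <cliente><nombre_cliente>Perez</nombre_cliente></cliente>
  <impositor><numero_cuenta>C-101</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
  <impositor><numero_cuenta>C-201</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
  <impositor><numero_cuenta>C-102</numero_cuenta><nombre_cliente>Lopez</nombre_cliente></impositor>
</banco>"""
banco = ET.fromstring(BANCO)

totales, conteos = {}, {}
for c in banco.findall("cliente"):                      # for $c in /banco/cliente
    nombre = c.findtext("nombre_cliente")
    # Inner FLWOR: every saldo reachable through impositor -> cuenta.
    saldos = [Decimal(a.findtext("saldo"))
              for i in banco.findall("impositor")
              if i.findtext("nombre_cliente") == nombre
              for a in banco.findall("cuenta")
              if a.findtext("numero_cuenta") == i.findtext("numero_cuenta")]
    totales[nombre] = sum(saldos, Decimal(0))           # fn:sum(...)
    conteos[nombre] = len(saldos)                       # fn:count(...)

print(totales)  # {'Gonzalez': Decimal('1400'), 'Lopez': Decimal('400'), 'Perez': Decimal('0')}
print(conteos)  # {'Gonzalez': 2, 'Lopez': 1, 'Perez': 0}
```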

    Sorting results

    • In XQuery, results can be sorted by including an order by clause. Example:

        for $c in /banco/cliente
        order by $c/nombre_cliente
        return <cliente>{ $c/* }</cliente>

    In descending order:

        order by $c/nombre_cliente descending

    • Sorting can be applied at several nesting levels. Example:

        <banco_1> {
          for $c in /banco/cliente
          order by $c/nombre_cliente
          return
            <cliente>
              { $c/* }
              { for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
                    $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
                order by $a/numero_cuenta
                return <cuenta>{ $a/* }</cuenta> }
            </cliente>
        } </banco_1>
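    In XQuery, order by casts untyped values (such as attribute or element text read without a schema) to strings. The result is string order unless you cast explicitly, which is why the dbxml sorting example in this document writes xs:decimal($part/@number). The Python sketch below shows the same effect. It assumes code-point string comparison, which is how Python compares strings and is the default collation in many XQuery processors (check yours).

```python
from decimal import Decimal

clientes = ["Pedro", "Juanito", "María"]

asc = sorted(clientes)                 # order by $c/nombre_cliente
desc = sorted(clientes, reverse=True)  # order by $c/nombre_cliente descending
print(asc)   # ['Juanito', 'María', 'Pedro']
print(desc)  # ['Pedro', 'María', 'Juanito']

numeros = ["101", "99", "100"]
como_texto = sorted(numeros)               # untyped values compared as strings
como_numero = sorted(numeros, key=Decimal)  # like order by xs:decimal(...)
print(como_texto)   # ['100', '101', '99']
print(como_numero)  # ['99', '100', '101']
```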

    User-Defined Functions

    • Even though XQuery has many predefined functions (numeric, comparison and string manipulation functions, among others), it also supports user-defined functions:

        declare function local:saldos($c as xs:string) as xs:decimal* {
          for $i in doc("banco.xml")/banco/impositor[nombre_cliente = $c],
              $a in doc("banco.xml")/banco/cuenta[numero_cuenta = $i/numero_cuenta]
          return $a/saldo
        };

    The return type xs:decimal* denotes a sequence of values. The function reads the document through doc("banco.xml") because inside a function body there is no context item, so a path starting with / cannot be evaluated.

    • Types can be partially specified; for instance, the type element() allows elements with any tag, whereas element(cuenta) allows only elements with the cuenta tag.

    • XQuery performs type conversion automatically when needed, but it also provides conversion functions, e.g. number(x).

    • When an element is passed to a function that expects a string, it is converted by concatenating all the text values it contains (including nested ones). An example of a string manipulation function: contains(a, b).
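    For comparison, here is a Python counterpart of local:saldos. It is a sketch: the document is passed in as a parameter rather than read with doc(), and the data is the Silberschatz example. The sketch also shows the string value of an element, obtained by concatenating all of its nested text, which is what a function such as contains() sees.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

BANCO = """<banco>
  <cuenta><numero_cuenta>C-101</numero_cuenta><nombre_sucursal>Centro</nombre_sucursal><saldo>500</saldo></cuenta>
  <cuenta><numero_cuenta>C-201</numero_cuenta><nombre_sucursal>Galapagar</nombre_sucursal><saldo>900</saldo></cuenta>
  <impositor><numero_cuenta>C-101</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
  <impositor><numero_cuenta>C-201</numero_cuenta><nombre_cliente>Gonzalez</nombre_cliente></impositor>
</banco>"""
banco = ET.fromstring(BANCO)


def saldos(banco, c: str) -> list[Decimal]:
    """Counterpart of local:saldos($c as xs:string) as xs:decimal*:
    returns a possibly empty sequence of decimals."""
    return [Decimal(a.findtext("saldo"))
            for i in banco.findall("impositor")
            if i.findtext("nombre_cliente") == c
            for a in banco.findall("cuenta")
            if a.findtext("numero_cuenta") == i.findtext("numero_cuenta")]


def string_value(elem) -> str:
    """String value of an element: all nested text, concatenated."""
    return "".join(elem.itertext())


print(saldos(banco, "Gonzalez"))       # [Decimal('500'), Decimal('900')]
cuenta = banco.find("cuenta")
print(string_value(cuenta))            # C-101Centro500
print("Centro" in string_value(cuenta))  # like contains(): True
```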

    More features

    • XQuery provides additional features, such as if-then-else expressions that can be used in return clauses.

    • A predicate (in a where clause) can include universal and existential quantifiers:

        some $e in path satisfies P
        every $e in path satisfies P

    • Here path is a path expression, P is a predicate on $e, and the quantifier is some or every.

    • The XQJ standard (XQuery API for Java) provides an API to execute XQuery queries on an XML database system and obtain XML results. Its functionality is similar to that of the JDBC API.
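    The two quantifiers behave like Python's any() and all(), including on empty sequences: some is false and every is true. An if-then-else in a return clause is like a conditional expression. The balances below come from the banco example. The labels "alto"/"bajo" and the 450 threshold are hypothetical.

```python
from decimal import Decimal

saldos = [Decimal("500"), Decimal("400"), Decimal("900")]  # saldo of each cuenta

alguna = any(s > 800 for s in saldos)   # some $a in /banco/cuenta satisfies $a/saldo > 800
todas = all(s >= 400 for s in saldos)   # every $a in /banco/cuenta satisfies $a/saldo >= 400
vacio_some = any(s > 800 for s in [])   # some over an empty sequence: false
vacio_every = all(s > 800 for s in [])  # every over an empty sequence: true

# return if ($a/saldo > 450) then "alto" else "bajo"
etiquetas = ["alto" if s > 450 else "bajo" for s in saldos]

print(alguna, todas, vacio_some, vacio_every)  # True True False True
print(etiquetas)                               # ['alto', 'bajo', 'alto']
```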

    Examples with: BDB XML

    • Run the dbxml shell.

    • Create a container:

        createContainer Bancos

    • Add content:

        putDocument banco1 '<banco> ... </banco>' s

    • Query:

        query 'collection("Bancos")/banco/cliente'

    • Print results:

        print

    • Add many records (parts numbered 0 to 999):

        dbxml> createContainer parts.dbxml
        dbxml> putDocument "" '
        for $i in (0 to 999)
        return
          <part number="{$i}">
            <description>Description of {$i}</description>
            <category>{$i mod 10}</category>
            {
              if (($i mod 10) = 0)
              then <parent-part>{$i mod 3}</parent-part>
              else ""
            }
          </part>
        ' q

    Verify response time

        time query '
        collection("parts.dbxml")/part[@number > 100 and @number < 105]'

    Format the output as HTML

        dbxml> query '
        <ul>{
          for $part in
            (collection("parts.dbxml")/part[@number > 100 and @number < 105])
          return
            <li>{$part/description/string()}</li>
        }</ul>
        '

    Sorting

        dbxml> query '
        <ul>{
          for $part in
            (collection("parts.dbxml")/part[@number > 100 and @number < 105])
          order by xs:decimal($part/@number) descending
          return
            <li>{$part/description/string()}</li>
        }</ul>
        '

        query 'for $x in (collection("banco.dbxml")/banco/cuenta[saldo > 400]) return $x'

    Examples of XML engines

    • Oracle Berkeley DB XML (DB XML).

    http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html

    • eXist-db Project

    http://exist-db.org/

    http://exist.sourceforge.net

    • XBird

    http://code.google.com/p/xbird

    • Qizx

    http://www.xmlmind.com/qizx/

    • BaseX

    More information…

    • XML/SQL:

    http://www.stylusstudio.com/sqlxml_tutorial.html

    • XML with MySQL:

    http://dev.mysql.com/tech-resources/articles/mysql-5.1-xml.html

    • XPath:

    http://www.w3.org/TR/xpath

    • XQuery:

    http://www.w3.org/TR/xquery/

    • XQuery implementations:

    http://www.w3.org/XML/Query/#implementations

    • XQJ (XQuery API for Java) tutorial: http://www.xquery.com/tutorials/xqj_tutorial/

    • Complementary readings:

    Chapters 11 and 12 of the book Database Systems: The Complete Book, Hector García-Molina et al., 2nd edition, 2009.