31/05/2016
Dr. Víctor J. Sosa
eXtensible Markup Language (XML)
XPath & XQuery
Introduction
XML: Definition
• A simplified subset of SGML (Standard Generalized Markup Language).
• A metalanguage: a language for defining other languages, which usually describe the structure and content of documents.
W3C Objectives:
• XML should be directly usable over the Internet.
• XML should support a wide variety of applications.
• XML should be compatible with SGML.
• It should be easy to write programs that process XML documents.
• The number of optional features in XML should be kept to a minimum, ideally zero.
• XML documents should be human-readable and reasonably clear.
• The design of XML should be prepared quickly, and it should be formal and concise.
• XML documents should be easy to create.
Open Standards
• XML. Extensible Markup Language
• DOM. Document Object Model
• XSL. Extensible Stylesheet Language
• XLL. Extensible Linking Language
Well-formed and Valid XML Documents
• In a well-formed XML document you can invent your own tags, provided that:
  - all tags are properly nested.
• A valid XML document is a well-formed XML document that also conforms to the rules of a DTD (Document Type Definition), which defines the structure of an XML document: its legal elements, the correct tags and their "grammar".
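The well-formedness condition above can be sketched with Python's standard library. This is only an illustration: the sample strings are made up for the example, and xml.etree.ElementTree checks well-formedness only, it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. its tags are properly nested."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents (not taken from the slides).
good = "<BARS><BAR><NAME>eclipse</NAME></BAR></BARS>"
bad = "<BARS><BAR><NAME>eclipse</BAR></NAME></BARS>"  # NAME and BAR overlap

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity would additionally require a validating parser, which is outside the Python standard library.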
Well-formed XML Document
• Begins with a declaration delimited by <?xml ... ?>.
• standalone = "yes" means that no DTD is provided.
• The content of the document begins with a root tag that encloses a set of nested tags.
Tags
• Tags, as in HTML, normally come in matched pairs: <tag> ... </tag>
• Tags can be nested arbitrarily.
• Some elements need no separate end tag; unlike in HTML, they must then be written as empty-element tags: <tag/>
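Example: Well-formed XML document
<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>eclipse</NAME>
    <BEER><NAME>Sol</NAME><PRICE>20</PRICE></BEER>
    <BEER><NAME>XX</NAME><PRICE>25</PRICE></BEER>
  </BAR>
</BARS>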
Semi-structured Data with XML
• A well-formed XML document, with its nested tags, conveys the same idea as a semi-structured data tree.
• Notice that XML also allows non-tree structures, similar to the semi-structured data model.
Example
[Figure: XML document for BARS drawn as a tree. The root bars has several bar children; each bar has a name (e.g. callejon) and beer children, and each beer has a name and a price (Indio, 30; Sol, 25).]
Document Type Definition (DTD)
• A DTD defines the structure and the legal elements and attributes of an XML document.
• It is essentially a context-free grammar that describes the XML tags and their nesting structure.
• For every domain of interest (e.g. electronic components, bars-beers-drinkers), a DTD describes all the documents that the group will share.
DTD Structure
<!DOCTYPE root-tag [
  <!ELEMENT element-name (components)>
  ... more elements ...
]>
DTD Elements
• An XML element is everything from the element's start tag to the element's end tag, for example:
  <NAME>Sol</NAME>
• An element can contain text, attributes, other elements, or a mix of these.
• An element can be empty: <tag></tag> or <tag/>
• Text elements (leaves) contain #PCDATA instead of nested tags.
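DTD Example
<!DOCTYPE BARS [
  <!-- A BARS element has zero or more nested BAR elements -->
  <!ELEMENT BARS (BAR*)>
  <!-- A BAR has a NAME and one or more BEER elements -->
  <!ELEMENT BAR (NAME, BEER+)>
  <!-- A BEER has one NAME and one PRICE -->
  <!ELEMENT BEER (NAME, PRICE)>
  <!-- NAME and PRICE are text -->
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>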
Element Description
• Subtags must appear in the order given.
• A tag may be followed by a multiplicity symbol:
  * = zero or more
  + = one or more
  ? = zero or one
• The | symbol connects alternative sequences of tags; exactly one of the alternatives appears.
Element Description: Example
A name consists of an optional title (e.g. "Prof.", "Dr."), the first name and the last name, in this order, or it can be just an IP address:
<!ELEMENT name (
  (title?, first_name, last_name) | IP_addr
) >
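Since a DTD content model is essentially a regular expression over child tags, the model (title?, first_name, last_name) | IP_addr can be mimicked with Python's re module. This is a rough sketch under that analogy, not how a validating parser works internally; the helper name matches_model and the sample elements are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The content model (title?, first_name, last_name) | IP_addr, written as a
# regular expression over the space-terminated sequence of child tag names.
NAME_MODEL = re.compile(r"(?:(?:title )?first_name last_name |IP_addr )")

def matches_model(element: ET.Element) -> bool:
    """Check the direct children of `element` against NAME_MODEL."""
    child_tags = "".join(child.tag + " " for child in element)
    return NAME_MODEL.fullmatch(child_tags) is not None

full = ET.fromstring(
    "<name><title>Dr.</title><first_name>A</first_name><last_name>B</last_name></name>")
no_title = ET.fromstring(
    "<name><first_name>A</first_name><last_name>B</last_name></name>")
ip_only = ET.fromstring("<name><IP_addr>10.0.0.1</IP_addr></name>")
wrong_order = ET.fromstring(
    "<name><last_name>B</last_name><first_name>A</first_name></name>")

for candidate in (full, no_title, ip_only, wrong_order):
    print(matches_model(candidate))
```

The ? in the model maps to an optional group and | to regex alternation; * and + would map to the regex operators of the same name.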
Application of DTD
1. Set standalone = "no" in the XML declaration.
2. Then either:
  a) include the DTD as a preamble of the XML document, or
  b) follow DOCTYPE and the root tag with SYSTEM and a path to the file where the DTD can be found.
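Example (a)
<?xml version="1.0" standalone="no"?>
<!-- DTD -->
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<!-- XML document -->
<BARS>
  <BAR>
    <NAME>eclipse</NAME>
    <BEER><NAME>Sol</NAME><PRICE>20</PRICE></BEER>
    <BEER><NAME>XX</NAME><PRICE>30</PRICE></BEER>
  </BAR>
  . . .
</BARS>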
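Example (b)
Assume the DTD is in the file bar.dtd.
<?xml version="1.0" standalone="no"?>
<!-- Get the DTD from the file bar.dtd -->
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR>
    <NAME>eclipse</NAME>
    <BEER><NAME>Sol</NAME><PRICE>20</PRICE></BEER>
    <BEER><NAME>XX</NAME><PRICE>30</PRICE></BEER>
  </BAR>
  . . .
</BARS>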
Attributes
• Opening tags in XML can have attributes, as in HTML.
• In a DTD, <!ATTLIST element-name attribute-name type default-declaration> lists the attributes of a specific element together with their data types.
• Some data types for attributes are CDATA, ID, IDREF and IDREFS.
• Every attribute needs a default declaration: a specific default value, #REQUIRED (a value must be given for this attribute), or #IMPLIED (no default value is provided and the attribute can be omitted). If an attribute has a default value, documents that omit the attribute are treated as if it carried that value.
Attributes: Example
A bar can have an attribute type, a character string describing the bar (e.g. "nightclub"):
<BAR type="nightclub">
  <NAME>El rincón de los milagros</NAME>
  <BEER><NAME>Corona</NAME><PRICE>25</PRICE></BEER>
  . . . .
</BAR>
IDs & IDREFs
• Attributes can be pointers from one object to another, as in HTML: NAME = "something" and HREF = "#something".
• This allows the structure of an XML document to be a general graph, rather than just a tree.
• An attribute of type ID gives an object a unique value; no other object in the document may have the same ID.
• An attribute of type IDREF refers to an object through its ID: its value must appear as the ID attribute of some element in the document.
• An attribute of type IDREFS refers to multiple elements.
IDs & IDREFs
• Let's redesign our DTD for bars. In the new DTD, Bar and Beer have an attribute of type ID (name).
• Bar objects contain Price subobjects, each consisting of a number (the price of the beer) and an IDREF (TheBeer) leading to a Beer. Beer objects have an attribute called SoldBy, of type IDREFS, leading to all the bars that sell the beer.
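The ID/IDREF/IDREFS rules can be checked by hand with a small Python sketch. It assumes a hypothetical instance that follows the redesigned structure (the tag names BARS, BAR, PRICE, BEER and the sample values are illustrative), and it treats the name attribute as the ID by convention, because xml.etree.ElementTree does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the redesigned bars document.
doc = ET.fromstring("""
<BARS>
  <BAR name="eclipse">
    <PRICE TheBeer="Sol">20</PRICE>
    <PRICE TheBeer="Indio">30</PRICE>
  </BAR>
  <BEER name="Sol" SoldBy="eclipse"/>
  <BEER name="Indio" SoldBy="eclipse"/>
</BARS>
""")

# ID rule: every value of the ID attribute (name) must be unique.
id_values = [el.get("name") for el in doc.iter() if el.get("name") is not None]
ids_are_unique = len(id_values) == len(set(id_values))

# Index elements by ID so that references can be followed.
by_id = {el.get("name"): el for el in doc.iter() if el.get("name") is not None}

# IDREF rule: every TheBeer value must be the ID of some element.
dangling = [p.get("TheBeer") for p in doc.iter("PRICE")
            if p.get("TheBeer") not in by_id]

# IDREFS: a whitespace-separated list of IDs, here the bars selling each beer.
sold_by = {beer.get("name"): beer.get("SoldBy", "").split()
           for beer in doc.iter("BEER")}

print(ids_are_unique, dangling, sold_by)
```

Following a reference is then a dictionary lookup, e.g. by_id["Sol"], which is what turns the tree into a general graph.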
IDs & IDREFs
• Example: DTD. The DTD declares that:
  - beer objects have an attribute of type ID called name, and an attribute SoldBy that is a set of bar names;
  - bar objects have an ID attribute called name and contain one or more price subobjects;
  - price objects hold a number (the price) and a reference to a beer.
XML Document
<price TheBeer="Sol">20</price>
<price TheBeer="Indio">30</price>
. . . .
XML Schema
• XML Schema arose as a consequence of the limitations of DTDs:
  - DTDs are written in a syntax different from XML
  - no namespace support
  - limited data types
  - difficulty in defining sets of unordered elements
XML Schema: Document Validation
• The structure of documents is described in terms of restrictions.
• Two types of restrictions:
  - Content: order and sequence of elements (a beer has a name, a price, etc.)
  - Data types: valid data units (the price of a beer has the format XXXX.XX)
XML Schema: Document Validation
• XML Schema addresses the DTD limitations:
  - a better way to express restrictions
  - less validation work
  - robust exchange of data
XML Schemas: Advantages
• Richer data types (double, date, etc.).
• User-defined data types (archetypes).
• Specialized restrictions on data types, e.g. max and min values.
• Attribute grouping (attributeGroup element).
• Refinable archetypes, or "inheritance".
• Namespace support.
Complex Types
An instance of beerType contains a name (Bud) and a price (2.50). The type does not fix the name of the enclosing element, so here the name of the element is unknown.
xs:attribute
• xs:attribute elements can be used within a complex type to declare the attributes of elements of that type.
• Attributes of xs:attribute:
  - name and type, as for xs:element
  - use = "required" or "optional"
Example: xs:attribute
When beerType declares only attributes, an instance of beerType is an empty element, since there are no declared subelements.
Restricted Simple Types
• xs:simpleType can describe enumerations and range-restricted base types.
• name is an attribute.
• xs:restriction is a subelement.
Restrictions
• The attribute base gives the simple type to be restricted, e.g. xs:integer.
• xs:{min, max}{Inclusive, Exclusive} are four subelements whose value attribute gives a lower or upper bound on a numerical range.
• xs:enumeration is a subelement whose value attribute gives one allowed value; several of them together define an enumerated type.
Example: license Attribute for BAR
Example: Prices in Range [1,5)
Examples of types in XML Schema
• String
• Integer
• Decimal
• Boolean
• User-defined types (derived from simple or complex types)
• ComplexType
• Sequence
More Examples… [Silberschatz et al.]
(DTD => XML Schema)
XML Schema version of the previous DTD.
XML Schema: Keys and Foreign Keys
• Attributes: attributes can be added to elements. For example, we can add the numero_cuenta attribute to the cuenta element.
• Keys: it is possible to define a key called numero_cuenta on the cuenta element. This can be done in the root element banco, by giving:
  - the scope of the restriction
  - the attribute that forms the key
• Foreign keys: a foreign key can be defined as a restriction from impositor to cuenta.
XML Schema: Query & Transformation
• XPath: a query language based on a tree representation of the XML document, which provides the ability to navigate around the tree, selecting nodes by a variety of criteria. It is the foundation of the XML query languages.
• XQuery: a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML or text, with vendor-specific extensions for other data formats (JSON, binary, etc.). It was modeled on SQL, but is oriented to the XML structure.
• XSLT: a language for transforming XML documents into other XML documents or into other formats such as HTML for web pages, plain text or XSL Formatting Objects. It can also express queries.
XPath
• XPath is used to navigate through elements and attributes in an XML document.
• Example: /banco/cliente/nombre_cliente
  The expression selects the nombre_cliente element of every cliente:
  <nombre_cliente>Pedro</nombre_cliente>
  <nombre_cliente>Juanito</nombre_cliente>
  <nombre_cliente>María</nombre_cliente>
  If the tags are not needed:
  /banco/cliente/nombre_cliente/text()
• Accessing attributes: /banco/cuenta/@numero_cuenta
  Selects the values of all numero_cuenta attributes of the cuenta elements.
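As a minimal sketch, the three path expressions above can be tried with Python's xml.etree.ElementTree, which implements a limited subset of XPath. The document below is a made-up instance of banco (the saldo values and account numbers are assumptions for the example); attribute steps such as @numero_cuenta are not supported by this library, so attributes are read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical banco document used only for this illustration.
banco = ET.fromstring("""
<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cliente><nombre_cliente>Pedro</nombre_cliente></cliente>
  <cliente><nombre_cliente>Juanito</nombre_cliente></cliente>
  <cliente><nombre_cliente>María</nombre_cliente></cliente>
</banco>
""")

# /banco/cliente/nombre_cliente
# `banco` is already the root element, so the path is relative to it.
name_elements = banco.findall("cliente/nombre_cliente")

# /banco/cliente/nombre_cliente/text()
names = [element.text for element in name_elements]

# /banco/cuenta/@numero_cuenta
account_numbers = [c.get("numero_cuenta") for c in banco.findall("cuenta")]

print(names)
print(account_numbers)
```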
XPath: Predicates
• XPath supports predicates, written in square brackets.
• Select cuenta elements with saldo > 400:
  /banco/cuenta[saldo>400]
• Select only numero_cuenta from cuenta elements with saldo > 400:
  /banco/cuenta[saldo>400]/@numero_cuenta
• Select the numero_cuenta attributes of all cuenta elements that have a saldo subelement, regardless of its value:
  /banco/cuenta[saldo]/@numero_cuenta
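A hedged Python sketch of the two kinds of predicate follows. xml.etree.ElementTree supports the existence test [saldo] directly, but not numeric comparisons such as [saldo>400], so that filter is applied in Python. The sample document, including the account C-300 without a saldo, is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; C-300 deliberately has no saldo subelement.
banco = ET.fromstring("""
<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cuenta numero_cuenta="C-201"><saldo>900</saldo></cuenta>
  <cuenta numero_cuenta="C-300"/>
</banco>
""")

# /banco/cuenta[saldo]/@numero_cuenta : existence predicate.
with_saldo = [c.get("numero_cuenta") for c in banco.findall("cuenta[saldo]")]

# /banco/cuenta[saldo>400]/@numero_cuenta : numeric predicate, done in Python.
over_400 = [c.get("numero_cuenta")
            for c in banco.findall("cuenta[saldo]")
            if float(c.findtext("saldo")) > 400]

print(with_saldo)
print(over_400)
```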
XPath: Predicates
• XPath provides several functions that can be used as part of a predicate:
  /banco/cuenta[count(./cliente) > 2]
• Select all cliente elements referenced (through IDREFS) by the titular attribute of the cuenta elements:
  /banco/cuenta/id(@titular)
• To obtain the root of a specific XML document: doc("banco.xml")
  This function can be part of a path expression:
  doc("banco.xml")/banco/cuenta
XQuery
• A language for querying XML data.
• XQuery is to XML what SQL is to relational databases.
• XQuery is built on XPath expressions.
• XQuery is supported by many major database systems.
• XQuery is a W3C Recommendation.
• XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators.
• It derives from a query language called Quilt.
XQuery: FLWOR Expressions
A FLWOR expression has five clauses: For, Let, Where, Order by, Return.
• For - selects a sequence of nodes
• Let - binds a sequence to a variable
• Where - filters the nodes
• Order by - sorts the nodes
• Return - what to return (evaluated once for every node)
XQuery: Example
• To obtain the account numbers of the accounts whose saldo exceeds 400 (using the representation with ID and IDREFS):
  for $x in /banco/cuenta
  let $numcuenta := $x/@numero_cuenta
  where $x/saldo > 400
  return <numero_cuenta>{$numcuenta}</numero_cuenta>
• The for clause works like the from clause in SQL.
• Variables bound in for take their values from the results of XPath expressions.
• If the for clause has more than one variable, a Cartesian product is computed.
• The let clause binds a sequence of values (the result of an XPath expression) to a variable, to simplify the statement.
• The where clause expresses predicates similar to those in SQL.
• The order by clause sorts the output.
• The return clause builds the resulting XML document.
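To make the roles of the five clauses concrete, here is a rough Python analogue of the FLWOR example, with an added descending order by on the account number. It is only an analogy under assumed sample data (the three cuenta elements are hypothetical), not an XQuery implementation, and it returns plain values rather than constructing XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical banco document.
banco = ET.fromstring("""
<banco>
  <cuenta numero_cuenta="C-101"><saldo>500</saldo></cuenta>
  <cuenta numero_cuenta="C-102"><saldo>400</saldo></cuenta>
  <cuenta numero_cuenta="C-201"><saldo>900</saldo></cuenta>
</banco>
""")

# for $x in /banco/cuenta  +  let $numcuenta := $x/@numero_cuenta
bindings = [(x, x.get("numero_cuenta")) for x in banco.findall("cuenta")]

# where $x/saldo > 400
kept = [(x, num) for x, num in bindings if float(x.findtext("saldo")) > 400]

# order by $numcuenta descending
kept.sort(key=lambda pair: pair[1], reverse=True)

# return $numcuenta  (evaluated once per surviving binding)
result = [num for _, num in kept]

print(result)
```

With two variables in the for clause, the first step would become a nested comprehension over both sequences, which is the Cartesian product mentioned above.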
XQuery: FLWOR Expressions
• Some clauses are optional:
  for $x in /banco/cuenta[saldo > 400]
  return <numero_cuenta>{$x/@numero_cuenta}</numero_cuenta>
• Braces { } enclose expressions that are evaluated and whose result is inserted into the XML text. This also applies to braces inside quoted attribute values:
  return <cuenta numero_cuenta="{$x/@numero_cuenta}"/>
XQuery: FLWOR Expressions
• XQuery provides another way of creating elements: the element and attribute constructors.
• Example: to generate cuenta elements with the attributes numero_cuenta and nombre_sucursal and the subelement saldo:
  return element cuenta {
    attribute numero_cuenta {$x/@numero_cuenta},
    attribute nombre_sucursal {$x/@nombre_sucursal},
    element saldo {$x/saldo}
  }
Example:
<bares>
  <bar tipo="...">
    <nombre>eclipse</nombre>
    <cerveza><nombre>Sol</nombre><precio>20</precio></cerveza>
    <cerveza><nombre>Indio</nombre><precio>30</precio></cerveza>
  </bar>
  <bar tipo="...">
    <nombre>El rincón de los milagros</nombre>
    <cerveza><nombre>Victoria</nombre><precio>20</precio></cerveza>
  </bar>
  . . . .
</bares>
Query in XQuery
• Find the price of the "Sol" beer in bars of type "deportes" that also serve the "Indio" beer:
  for $ba in doc("http://cinvestav.mx/bares.xml")//bar[@tipo = "deportes"],
      $be in $ba/cerveza[nombre = "Sol"]
  where $ba/cerveza[nombre = "Indio"]
  return $be/precio
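The same question (the price of "Sol" in "deportes" bars that also serve "Indio") can be sketched in Python over a small in-memory document. The lowercase tag names, the tipo values and the second bar's price are assumptions made for this example; xml.etree.ElementTree supports the [@attr='value'] and [child='text'] predicates used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical bares document; the tipo values are invented for the example.
bares = ET.fromstring("""
<bares>
  <bar tipo="deportes">
    <nombre>eclipse</nombre>
    <cerveza><nombre>Sol</nombre><precio>20</precio></cerveza>
    <cerveza><nombre>Indio</nombre><precio>30</precio></cerveza>
  </bar>
  <bar tipo="deportes">
    <nombre>El rincón de los milagros</nombre>
    <cerveza><nombre>Sol</nombre><precio>22</precio></cerveza>
  </bar>
</bares>
""")

prices = [
    be.findtext("precio")                                 # return: price of Sol
    for ba in bares.findall("bar[@tipo='deportes']")      # for $ba
    for be in ba.findall("cerveza[nombre='Sol']")         # for $be
    if ba.find("cerveza[nombre='Indio']") is not None     # where: also serves Indio
]

print(prices)
```

Only the first bar qualifies, because the second one serves "Sol" but not "Indio".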
XQuery: Joins
• Joins are specified much as in SQL. For example, the join of the elements impositor, cuenta and cliente:
  for $a in /banco/cuenta, $c in /banco/cliente, $i in /banco/impositor
  where $a/numero_cuenta = $i/numero_cuenta
    and $c/nombre_cliente = $i/nombre_cliente
  return {$c $a}
• The same query using XPath predicates:
  for $a in /banco/cuenta, $c in /banco/cliente,
      $i in /banco/impositor[numero_cuenta = $a/numero_cuenta and
                             nombre_cliente = $c/nombre_cliente]
  return {$c $a}
• Notes:
  - Value-comparison operators: eq, ne, lt, gt, le, ge.
  - When sequences are compared with the general operators, as in $x/saldo = $y/saldo, the predicate is true if any value returned by the first expression is equal to any value returned by the second expression.
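The existential meaning of = on sequences, described in the note above, can be illustrated with a tiny Python function. The name general_eq is hypothetical and the lists stand in for XQuery sequences.

```python
def general_eq(first, second):
    """Mimic XQuery's general comparison `=` on two sequences:
    true if ANY item of `first` equals ANY item of `second`."""
    return any(x == y for x in first for y in second)

print(general_eq([500, 400], [400, 900]))  # the sequences share the value 400
print(general_eq([500], [400, 900]))       # no value in common
print(general_eq([], [400]))               # an empty sequence never matches
```

The value operators (eq, ne, ...) differ in that they compare single values rather than whole sequences.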
XQuery: Nested queries
• FLWOR expressions can be nested in the return clause, generating nested structures that do not appear in the source document.
• Example:
  {
  for $c in /banco/cliente
  return
    {$c/*}
    {for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
         $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
     return $a}
  }
• NOTE: this query generates the document shown in [fig 10.4 Silberschatz] from the document in [fig 10.1 Silberschatz].
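The nesting logic (an inner loop inside the return of an outer loop) can be sketched in Python with a nested comprehension. The data below is a simplified, assumed rendering of the flat cuenta, cliente and impositor records as Python values; it produces a dictionary rather than an XML document.

```python
# Flat records, simplified to plain Python values for illustration.
cuentas = {"C-101": 500, "C-102": 400, "C-201": 900}       # numero_cuenta -> saldo
clientes = ["Gonzalez", "Lopez"]                            # nombre_cliente
impositores = [("C-101", "Gonzalez"), ("C-201", "Gonzalez"), ("C-102", "Lopez")]

# Outer loop over clientes; the inner loop plays the role of the nested FLWOR
# that joins impositor with cuenta for the current cliente.
nested = {
    cliente: [
        numero
        for numero, nombre in impositores
        if nombre == cliente and numero in cuentas
    ]
    for cliente in clientes
}

print(nested)
```

Each cliente now carries its own list of accounts, a structure that is not present in the flat input.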
Nesting representation
Document from [fig 10.1 Silberschatz] (flat):
  cuenta:    C-101, Centro, 500
  cuenta:    C-102, Navacerrada, 400
  cuenta:    C-201, Galapagar, 900
  cliente:   Gonzalez, Arenal, La Granja
  cliente:   Lopez, Mayor, Peguerinos
  impositor: C-101, Gonzalez
  impositor: C-201, Gonzalez
  impositor: C-102, Lopez
Document from [fig 10.4 Silberschatz] (nested):
  cliente: Gonzalez, Arenal, La Granja
    cuenta: C-101, Centro, 500
    cuenta: C-201, Galapagar, 900
  cliente: Lopez, Mayor, Peguerinos
    cuenta: C-102, Navacerrada, 400
Functions: SUM, COUNT
• XQuery supports several functions that are common to XPath 2.0 and can be used in any XPath expression.
• To avoid conflicts, the functions are associated with a namespace: http://www.w3.org/2005/xpath-functions
• The predefined prefix fn refers to this namespace, which avoids ambiguity: fn:sum, fn:count
  for $c in /banco/cliente
  return
    {$c/nombre_cliente}
    {fn:sum(for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
                $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
            return $a/saldo
    ) }
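As a rough analogue of the fn:sum query, the following Python sketch totals the saldo of each customer's accounts and also counts them, in the spirit of fn:count. The records are an assumed, simplified version of the cuenta and impositor data.

```python
# Simplified records, assumed for illustration.
saldos = {"C-101": 500, "C-102": 400, "C-201": 900}        # numero_cuenta -> saldo
clientes = ["Gonzalez", "Lopez"]
impositores = [("C-101", "Gonzalez"), ("C-201", "Gonzalez"), ("C-102", "Lopez")]

def accounts_of(cliente):
    """Inner FLWOR: the saldo of every account owned by `cliente`."""
    return [saldos[numero] for numero, nombre in impositores if nombre == cliente]

totals = {c: sum(accounts_of(c)) for c in clientes}   # like fn:sum(...)
counts = {c: len(accounts_of(c)) for c in clientes}   # like fn:count(...)

print(totals)
print(counts)
```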
Sorting results
• In XQuery, results can be sorted by including an order by clause. Example:
  for $c in /banco/cliente
  order by $c/nombre_cliente
  return {$c/* }
  In descending order:
  order by $c/nombre_cliente descending
• Sorting can be applied at several nesting levels. Example:
  {
  for $c in /banco/cliente
  order by $c/nombre_cliente
  return
    {$c/* }
    { for $i in /banco/impositor[nombre_cliente = $c/nombre_cliente],
          $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
      order by $a/numero_cuenta
      return {$a/* } }
  }
User-Defined Functions
• Although XQuery has many predefined functions, such as numeric, comparison and string-manipulation functions, it also supports user-defined functions:
  declare function local:saldos($c as xs:string) as xs:decimal* {
    for $i in /banco/impositor[nombre_cliente = $c],
        $a in /banco/cuenta[numero_cuenta = $i/numero_cuenta]
    return $a/saldo
  };
  The return type xs:decimal* denotes a sequence of values.
• Types can be partially specified; for instance, the type element() allows elements with any tag, whereas element(cuenta) allows only elements with the cuenta tag.
• XQuery carries out type conversion automatically. However, it also provides conversion functions, for example number(x).
• When an element is passed to a function that expects a string, it is converted to a string by concatenating all the text values contained in the element (including nested values). An example of a string-manipulation function is contains(a, b).
More features
• XQuery provides additional features such as if-then-else expressions, which can be used in return clauses.
• A predicate (in a where clause) can include universal and existential quantifiers:
  some $e in path satisfies P
  every $e in path satisfies P
  where path is a path expression and P is a predicate over $e.
• The XQJ standard (XQuery API for Java) provides an API to execute XQuery queries on an XML database system and obtain XML results. Its functionality is similar to that of the JDBC API.
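The two quantifiers correspond closely to Python's any() and all(). The sketch below uses a hypothetical list of saldo values in place of a path expression.

```python
saldos = [500, 400, 900]  # stands in for the values selected by a path expression

# some $e in path satisfies $e > 800
some_over_800 = any(e > 800 for e in saldos)

# every $e in path satisfies $e > 400
every_over_400 = all(e > 400 for e in saldos)

# every $e in path satisfies $e >= 400
every_at_least_400 = all(e >= 400 for e in saldos)

print(some_over_800, every_over_400, every_at_least_400)
```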
Examples with BDB XML
• Execute dbxml
• Create a container:
  createContainer Bancos
• Add content:
  putDocument banco1 '...' s
• Query:
  query 'collection("Bancos")/banco/cliente'
• Print results:
  print
• Add many records:
  dbxml> createContainer parts.dbxml
  dbxml> putDocument "" '
  for $i in (0 to 99)
  return
    <part number="{$i}">
      <description>Description of {$i}</description>
      <category>{$i mod 10}</category>
      {
        if (($i mod 10) = 0)
        then <parent-part>{$i mod 3}</parent-part>
        else ""
      }
    </part>
  ' q
Verify response time
dbxml> time query '
collection("parts.dbxml")/part[@number > 100
and @number < 105]'
Format the output as HTML
dbxml> query '
<html><body><ul>
{
  for $part in
    (collection("parts.dbxml")/part[@number > 100 and
        @number < 105])
  return
    <li>{$part/description/string()}</li>
}
</ul></body></html>
'
Sorting
dbxml> query '
<html><body><ul>
{
  for $part in
    (collection("parts.dbxml")/part[@number > 100 and @number < 105])
  order by xs:decimal($part/@number) descending
  return
    <li>{$part/description/string()}</li>
}
</ul></body></html>
'
dbxml> query 'for $x in (collection("banco.dbxml")/banco/cuenta[saldo > 400]) return $x'
Examples of XML engines
• Oracle Berkeley DB XML (DB XML)
  http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
• eXist-db Project
  http://exist-db.org/
  http://exist.sourceforge.net
• Xbird
  http://code.google.com/p/xbird
• Qizx
  http://www.xmlmind.com/qizx/
• BaseX
More information…
• XML/SQL:
  http://www.stylusstudio.com/sqlxml_tutorial.html
• XML with MySQL:
  http://dev.mysql.com/tech-resources/articles/mysql-5.1-xml.html
• XPath:
  http://www.w3.org/TR/xpath
• XQuery:
  http://www.w3.org/TR/xquery/
• XQuery implementations:
  http://www.w3.org/XML/Query/#implementations
• XQJ (XQuery API for Java) tutorial:
  http://www.xquery.com/tutorials/xqj_tutorial/
• Complementary readings:
  Chapters 11 and 12 of Database Systems: The Complete Book, Hector García-Molina et al., 2nd edition, 2009.