The Changing XML Landscape




A number of XML technologies have emerged that are redefining how XML is generated, stored, and queried. All of these have a direct impact on application development both now and in the future.

Growing XML Content Generation

Do you remember four years ago, when you received only a hundred e-mails per day? Now you probably receive closer to four hundred. A similar parallel can be drawn for XML. Although it originated as a metadata markup language for documents, XML has now proliferated as the primary form of data interchange, as configuration and log files, in the design of loosely coupled service-based applications, and as the primary format for representing hierarchical data.

The release of Microsoft Office 2003 will significantly fuel the generation of more XML content because it is positioned as an application development platform to unleash the content of documents as XML, thereby allowing the information to be aggregated, searched, managed, and reused. This requires developers to be aware of XML when building applications that integrate with Microsoft Office.

XML Storage

In Chapter 8 we'll see how the new version of SQL Server (code-named "Yukon") includes an XML data type that can be used to store XML documents. All those Microsoft Office 2003 XML documents now have a place to be centrally stored, queried, and managed.

The Need for Another XML Query Language

XML has query languages available today in the form of XPath 1.0 and XSLT 1.0, both of which are hugely popular, to the point that XPath has become the de facto middle-tier query language. Although positioned as an XML-to-XML transformation language, XSLT is capable of performing queries across XML data sources. However, despite the availability of these technologies, the W3C decided that it was necessary to introduce a new XML query language. Enter XQuery. The W3C XQuery language specification includes this justification for XQuery:

A query language that uses the structure of XML intelligently can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources. . . .

XQuery is designed to meet the requirements identified by the W3C XML Query Working Group XML Query 1.0 Requirements and the use cases in XML Query Use Cases. It is designed to be a language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.[1]

[1] From the W3C's "XQuery 1.0: An XML Query Language." Accessed in May 2003 at http://www.w3.org/TR/xquery/.

Essentially XQuery can be paraphrased in the following statement:

XQuery is to XML as the SQL language is to relational databases.

Chapter 8 covers the XQuery language in more detail, but primarily it was designed to provide the following benefits.

  • XQuery has greater expressiveness than XPath 1.0 in its ability to perform complex query operations such as joins and ordering.

  • It has a human-friendly syntax, as opposed to XSLT (which has XML syntax). Note that XQuery does include an XML syntax called XQueryX, but this is unlikely to be supported in Microsoft products because of the availability of XSLT.

  • It is strongly typed at both runtime and compile time; that is, through the use of W3C XML Schema types you can query for particular types of data in the data source.

  • It has a rich set of functions and operators based on the W3C XML Schema types.
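To make the first point in the list above concrete, here is a minimal sketch of the kind of result a join with ordering produces. It is written in Python with the standard library's ElementTree rather than in XQuery, so it only emulates what a single XQuery expression would compute. The sample documents, the element and attribute names, and the join_and_sort function are all hypothetical, and the XQuery shown in the comment is illustrative and is not executed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two XML sources related by a customer id.
CUSTOMERS_XML = """
<customers>
  <customer id="c1"><name>Avery</name></customer>
  <customer id="c2"><name>Blake</name></customer>
</customers>
"""

ORDERS_XML = """
<orders>
  <order customer="c2"><total>40.00</total></order>
  <order customer="c1"><total>125.50</total></order>
  <order customer="c1"><total>15.25</total></order>
</orders>
"""

# Roughly the XQuery this sketch emulates (illustrative only, not run here;
# $customers and $orders are assumed to be bound to the two root elements):
#   for $c in $customers/customer, $o in $orders/order
#   where $o/@customer = $c/@id
#   order by xs:decimal($o/total) descending
#   return <sale name="{$c/name}" total="{$o/total}"/>


def join_and_sort(customers_xml: str, orders_xml: str) -> list:
    """Join orders to customers by id, then sort by order total, largest first."""
    customers = ET.fromstring(customers_xml)
    orders = ET.fromstring(orders_xml)

    # Index customer names by id so the join becomes a dictionary lookup.
    names = {c.get("id"): c.findtext("name") for c in customers.findall("customer")}

    # The join: keep only orders whose customer id matches a known customer.
    rows = [
        (names[o.get("customer")], float(o.findtext("total")))
        for o in orders.findall("order")
        if o.get("customer") in names
    ]

    # The ordering step, equivalent to "order by ... descending".
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, total in join_and_sort(CUSTOMERS_XML, ORDERS_XML):
        print(f"{name}: {total:.2f}")
```

In the Python version the path-style selection, the join, and the sort are three separate steps written in a host language. The appeal of XQuery is that all three can be stated in one declarative expression over the XML itself.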

From an application development perspective, XQuery provides distributed queries across numerous data sources that are exposed as XML. By taking advantage of the XML provider model mapped over data stores, XQuery is set to become a universal query language for data integration across disparate sources.

The landscape for XML is changing as rapidly now as when XML was first unleashed five years ago. More XML content is being produced at an increasingly rapid rate, relational DBMSs have evolved to store this content as a native type, and the emergence of XQuery provides a language to easily query, aggregate, and manipulate this data.