What's New in System.Xml Version 2.0



In this section of the chapter we give an overview of what is new in System.Xml version 2.0 and of the newly introduced System.Xml.Query and System.Data.SqlXml namespaces. Greater detail and usage scenarios are provided for each of these in the next three chapters.

Middle-Tier XML Data Access

Middle-tier XML programming is the ability to query data sources exposed through an XML provider; manipulate the XML with the DOM, XSLT, and XQuery; possibly share the data with another application through a Web Service; and then push the changes back to the original data source. The primary aim is to maintain the fidelity and shape of the XML, taking advantage of the hierarchical data model, its interoperable cross-platform nature, and the layered XML technology services, such as querying, validation, and APIs.

When considering SQL Server as the DBMS, there is rich XML support available—both built directly into SQL Server and available through the middle-tier native SQLXML 3.0 technology. Although this book does not cover the existing XML support in SQL Server, which has substantial coverage in The Guru's Guide to SQL Server Stored Procedures, XML, and HTML by Ken Henderson (Boston, MA: Addison-Wesley, 2002, ISBN 0-201-70046-8), there is a deep synergy between the new support for XML in the middle tier introduced in System.Xml version 2.0 and SQLXML. The combination of XML Views (which we discuss next) and XQuery is a migration of the SQLXML technology to the .NET Framework, providing richer querying support and a more flexible mapping format to cope with the increasing demands of building XML providers over relational and other sources.

XML Views: Extending the XML Provider Model

Since there are and always will be large amounts of data not stored as XML, being able to represent this data as XML without physically converting it to XML 1.0 is essential. This concept was briefly mentioned earlier. In order to virtualize the data stored in a relational DBMS, the concept of XML Views is introduced. XML Views are a declarative format to map data in one data model to the XML data model, and can be defined as follows.

XML Views are the translation, using a declarative mapping syntax, of data represented in one data model into an XML data model. The XML View maps structure and relationships between these differing data models in order to provide a virtualized XML document over the data source.

XML views have been introduced to System.Xml version 2.0 to provide the ability to map data stored in SQL Server tables to XML, thereby enabling queries and updates. Although primarily aimed at mapping relational database tables to XML, this virtualized XML is not constrained to just this use. For example, the same view technology is used by ObjectSpaces to map relational data to object graphs.

XML views are present in SQLXML through annotated XML schemas, so this concept is not new; however, the implementation in the .NET Framework takes XML views to new levels of functionality and flexibility by introducing a format called three-part mapping, which we will discuss later in this chapter. Figure 5.4 shows how XML views are another type of XML provider that can be built to virtualize the data in a data store so that it can be accessed via middle-tier XML APIs.

Figure 5.4. XML views as a type of XML provider

graphics/05fig04.gif

XML views offer the following benefits.

  • They act as a replacement for custom procedural mapping code. When you build an XML provider in code, you have to decide how to map between the different structures. For example, with a relational database you have to decide whether to map the table name to an element name and the columns to attributes or elements. XML views replace the need to write this code by using a declarative, design-time representation.

  • They are easily updated without requiring code recompilation because they are a declarative file format.

  • They are bidirectional, in that they can be used for both queries and updates against the data store. Many XML providers are read-only.

  • They are ideal for use with design-time tools.

XML views are a syntax for transforming between data models. This means they are ideal for compilation in the same way that XML query languages are compiled. The System.Xml.Query.XmlViewSchema class represents a compiled XML view, and the System.Xml.Query.XmlViewSchemaDictionary class is a dictionary of these compiled views that can be referenced in queries. We will use these classes when we query and update SQL Server in the following chapters.

An XML View Example

To help visualize an XML view, the following series of figures provides an example of mapping relational tables to an XML schema to create an XML view. Here we have chosen to map the Customer and Order tables from the SQL Server Northwind sample database. This example will continue into the following chapters, using code to query and update SQL Server.

Figure 5.5 shows the mappings for the Customer table. Each of its columns (for example, CustomerID) is mapped to an attribute on an element named Customer. The attributes represent each of the columns, so there is a CustomerID attribute. This is a simple XML view to construct and clearly shows that XML views are a data model structural transformation process.

Figure 5.5. Mapping the Customer table to a Customer element with attributes

graphics/05fig05.gif

The Customer element maps to a table, and its attributes map to columns. The structures in the different data models match. The number of Customer elements created depends on the number of rows in the Customer table. If you delete an element, you delete a row from the table. If you update an attribute, this updates a column in the table.
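
To make the row-to-element correspondence concrete, here is a minimal conceptual sketch in Python. It is not the System.Xml API: it uses an in-memory SQLite table as a stand-in for SQL Server, and the wrapper element name Customers, the column subset, and the sample rows are assumptions made only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def customers_as_xml(conn):
    """Build one <Customer> element per row, with one attribute per column."""
    root = ET.Element("Customers")  # wrapper element name is an assumption
    cursor = conn.execute(
        "SELECT CustomerID, CompanyName, City FROM Customer ORDER BY CustomerID")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        # Column names become attribute names; column values become attribute values.
        ET.SubElement(root, "Customer", dict(zip(columns, row)))
    return root

# Stand-in data: a tiny Customer table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, CompanyName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [("ALFKI", "Alfreds Futterkiste", "Berlin"), ("BONAP", "Bon app'", "Marseille")])

view = customers_as_xml(conn)
print(ET.tostring(view, encoding="unicode"))
```

The number of Customer elements follows the number of rows, which is the behavior described above. An XML view expresses the same mapping declaratively instead of in procedural code like this.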

We can extend this concept further by mapping to a different XML structure. The mapping in Figure 5.6 shows that the Customer table is mapped this time across two elements, named Customer and Address, where the Address element contains the address-related attributes. Here there is a transformation that creates an Address element for every Customer element. For every row in the Customer table, we are creating a Customer element, so the Address element also depends on the Customer table.

Figure 5.6. Mapping the Customer table to the Customer and Address elements

graphics/05fig06.gif

If you delete an Address element, that does not mean that you will delete the row from the Customer table, although deleting the Customer element does mean deleting a row from the Customer table. The Address element is simply owned by the Customer element and is just another XML shape for the table. This again shows that XML views are a structural transformation.
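
The reshaping of a single row across two elements can be sketched as follows. This is an illustrative Python stand-in rather than an XML view: the set of address-related columns and the sample row are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed set of address-related columns; the real mapping is defined by the XML view.
ADDRESS_COLUMNS = ("Address", "City", "PostalCode")

def customer_element(row):
    """Split one Customer row across a <Customer> element and a child <Address>."""
    customer = ET.Element("Customer")
    address = ET.SubElement(customer, "Address")
    for column, value in row.items():
        target = address if column in ADDRESS_COLUMNS else customer
        target.set(column, value)
    return customer

# Sample row (illustrative data only).
row = {
    "CustomerID": "ALFKI",
    "CompanyName": "Alfreds Futterkiste",
    "Address": "Obere Str. 57",
    "City": "Berlin",
    "PostalCode": "12209",
}
element = customer_element(row)
print(ET.tostring(element, encoding="unicode"))
```

Both elements are produced from the same row, so the Address element is only a different shape for data that still belongs to the Customer table.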

In the final example, in Figure 5.7, we introduce a child element for the Customer element named Order, which is mapped to the Order table. In the Northwind database, the Customer and Order tables are joined via a relationship based on the CustomerID column. Hence all the orders for a given customer are listed as child elements of the Customer element.

Figure 5.7. Mapping the Customer and Order tables

graphics/05fig07.gif

The behavior of this mapping is different from that of the previous examples. Deleting a Customer element has no effect on the Order table, although the Address element is deleted. In other words, the Order element and Customer element behave independently of each other for update operations. XML views can get complicated because they add new behavior to the data model independent of its shape. In this case, if desired, the mapping can specify that the child Order elements are cascade-deleted when a Customer element is deleted, but this needs to be done explicitly. These examples show that XML views provide a very powerful approach to accessing and updating relational data as XML.
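
The nesting and the independent delete behavior can be sketched in Python as below. This is a conceptual stand-in, not the XML view implementation: SQLite replaces SQL Server, the table is named Orders here because ORDER is a reserved word in SQL, and the delete_customer helper with its cascade flag is a hypothetical name used only to illustrate an explicit cascade.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customer VALUES ('ALFKI', 'Alfreds Futterkiste');
    INSERT INTO Orders VALUES (10643, 'ALFKI'), (10692, 'ALFKI');
""")

def nested_view(conn):
    """Nest each customer's orders as <Order> children, joined on CustomerID."""
    root = ET.Element("Customers")
    customers = conn.execute(
        "SELECT CustomerID, CompanyName FROM Customer ORDER BY CustomerID")
    for cust_id, name in customers:
        customer = ET.SubElement(root, "Customer", CustomerID=cust_id, CompanyName=name)
        orders = conn.execute(
            "SELECT OrderID FROM Orders WHERE CustomerID = ? ORDER BY OrderID", (cust_id,))
        for (order_id,) in orders:
            ET.SubElement(customer, "Order", OrderID=str(order_id))
    return root

def delete_customer(conn, cust_id, cascade=False):
    """Delete a customer row; order rows are removed only when cascade is requested."""
    if cascade:
        conn.execute("DELETE FROM Orders WHERE CustomerID = ?", (cust_id,))
    conn.execute("DELETE FROM Customer WHERE CustomerID = ?", (cust_id,))

view = nested_view(conn)
delete_customer(conn, "ALFKI")  # no cascade: the order rows are left untouched
remaining_orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
remaining_customers = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

Without the explicit cascade, the customer row disappears while both order rows remain, which mirrors the default behavior described above.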

The Three-Part Mapping Format for XML Views

Let's now look at how XML views are defined as a file format. To achieve the cross-domain mapping, three different formats are used—otherwise known as "three-part mapping." These three parts are a W3C XML Schema for the XML domain (XSD), a Relational Schema Definition (RSD) for the relational domain, and a Mapping Schema Definition (MSD) for the mapping transformation.

Taking the first mapping example we discussed previously, we can divide the domains shown in Figure 5.5 among the mapping schema formats, as shown in Figure 5.8.

Figure 5.8. Representing each domain by its own schema format

graphics/05fig08.gif

Chapters 7 and 8 provide examples of each of these formats for part of the Northwind database. The XSD is simply an XML schema with complex and simple types. The RSD is the persisted format of a relational database, with sufficient information to perform the necessary domain mapping functionality. RSD is a subset of the database's metadata that describes one or more database structures (such as tables, views, or stored procedures) and the relationships between these structures. The MSD format is a syntax for expressing the mapping between the domains, to transform relational tables to elements and columns to attributes or elements. All three formats are persisted as XML.
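
The separation of concerns behind three-part mapping can be sketched with three small declarative structures. To be clear, the dictionaries below are hypothetical Python stand-ins and do not reproduce the actual XSD, RSD, or MSD file formats; the attribute names id and name and the apply_mapping helper are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for the XSD: the shape of the XML domain.
xml_shape = {"element": "Customer", "attributes": ["id", "name"]}

# Stand-in for the RSD: the shape of the relational domain.
relational_shape = {"table": "Customer", "columns": ["CustomerID", "CompanyName"]}

# Stand-in for the MSD: which attribute is fed by which column.
mapping = {"fields": {"id": "CustomerID", "name": "CompanyName"}}

def apply_mapping(rows, xml_shape, relational_shape, mapping):
    """Check the mapping against both domain descriptions, then build the elements."""
    for attribute, column in mapping["fields"].items():
        if attribute not in xml_shape["attributes"]:
            raise ValueError(f"unknown attribute: {attribute}")
        if column not in relational_shape["columns"]:
            raise ValueError(f"unknown column: {column}")
    elements = []
    for row in rows:
        element = ET.Element(xml_shape["element"])
        for attribute, column in mapping["fields"].items():
            element.set(attribute, row[column])
        elements.append(element)
    return elements

rows = [{"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste"}]
elements = apply_mapping(rows, xml_shape, relational_shape, mapping)
print(ET.tostring(elements[0], encoding="unicode"))
```

Because each domain is described on its own, the XML shape or the mapping can change without touching the description of the relational side, which is the flexibility the three-part format aims for.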

The XPathDocument2 Class

System.Xml version 1.x has an XML store called the XPathDocument, which is built on the XPath data model, which in turn is based on the XML Infoset specification. Hence it does not have any of the XML 1.0 serialization constraints. This makes it ideal for representing XML as just data. The XPathDocument2 is a duplicate of the XPathDocument class but has been extended to allow updates and change tracking at a node level.

The XPathDocument2 class needs an API that reflects the XPath data model in order to manipulate and update the XML. In System.Xml version 2.0 the XPathEditor class provides this ability with an updatable cursor-style API, deriving from the XPathNavigator2 class. The typical editing model is to select parts of your document to update with XPath queries and then edit these through the XPathEditor. Furthermore, the XPathEditor exposes an XmlWriter class used to build a particular node tree at the current location. This involves the more intuitive and usable top-down push model approach, rather than the bottom-up DOM approach of creating, constructing, and attaching node trees.

For developers this has the added benefit of needing to learn only a single API, the XmlWriter, to generate XML, whether for a stream or a node tree. (For those of you who have followed the history of System.Xml development, this is the same as the fabled XmlNodeWriter.) Since the XPathDocument2 can be accessed only via XPathNavigators and XPathEditors, which can be considered as pointers to the underlying data, the internal node structure is not exposed. Hence the underlying XML store structure can be optimized for queries now and in future releases, as XML query optimization technologies evolve.
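
The select-then-edit model and the top-down push style of construction can be illustrated with Python's standard library. This is only an analogy, not the XPathEditor API: ElementTree's limited XPath support plays the role of the selection step, and TreeBuilder plays the role of a writer that pushes nodes in document order. The sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Customers>"
    "<Customer CustomerID='ALFKI'/>"
    "<Customer CustomerID='BONAP'/>"
    "</Customers>")

# Step 1: select the part of the document to edit with an XPath-style query.
target = doc.find("./Customer[@CustomerID='ALFKI']")

# Step 2: build the new subtree top-down, in document order (push model),
# instead of creating nodes separately and attaching them bottom-up.
builder = ET.TreeBuilder()
builder.start("Address", {"City": "Berlin"})
builder.start("Street", {})
builder.data("Obere Str. 57")
builder.end("Street")
builder.end("Address")

# Step 3: place the finished subtree at the selected location.
target.append(builder.close())
print(ET.tostring(doc, encoding="unicode"))
```

The builder calls read in the same order as the resulting markup, which is what makes the push model easier to follow than assembling and attaching individual nodes.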

Another feature of the XPathDocument2 class is its ability to track changes to individual nodes, keeping current and previous values for inserted, updated, and deleted nodes. This enables you to determine changes that have occurred to the XML document and to push these updates back into a data source. A class called XPathChangeNavigator has been introduced to allow navigation of the changed nodes, which also derives from the XPathNavigator2 class. This navigator can be considered as a view of the document changes, along with its current state. In effect, every node has both a current value and an original value, which are maintained and viewed through the XPathChangeNavigator.
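
The idea of keeping both an original and a current value per node can be sketched as follows. The ChangeTracker class is a hypothetical Python illustration, not the XPathChangeNavigator; for brevity it tracks attribute updates only, not inserted or deleted nodes.

```python
import xml.etree.ElementTree as ET

class ChangeTracker:
    """Remember the original value of every attribute that gets edited."""

    def __init__(self, root):
        self.root = root
        self._original = {}  # (element, attribute name) -> value before the first edit

    def set_attribute(self, element, name, value):
        key = (element, name)
        if key not in self._original:
            # Record the original value only once, on the first change.
            self._original[key] = element.get(name)
        element.set(name, value)

    def changes(self):
        """Return (tag, attribute, original value, current value) for real changes."""
        return [
            (element.tag, name, original, element.get(name))
            for (element, name), original in self._original.items()
            if original != element.get(name)
        ]

doc = ET.fromstring("<Customers><Customer CustomerID='ALFKI' City='Berlin'/></Customers>")
tracker = ChangeTracker(doc)
customer = doc.find("Customer")
tracker.set_attribute(customer, "City", "Hamburg")
tracker.set_attribute(customer, "City", "Munich")  # original value stays 'Berlin'
print(tracker.changes())
```

A list of changes like this is what makes it possible to push only the modified values back to a data source.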

Like the XPathDocument class, the XPathDocument2 (being based on the XPath data model) is the preferred store for all queries, both XSLT and XQuery, and is significantly faster than the XmlDocument class. Now that it is editable as well, there is much less need to use the XmlDocument for the editing and query scenarios—the XPathDocument2 does it all.

So, you might be asking, where does this leave the XmlDocument class? Well, having an object-oriented model based on XmlNodes is useful for extensibility to define your own type of XmlDocument store, but apart from that it has come to an innovation dead end. In fact, as discussed earlier, the very fact that it does expose a node-based API means that it cannot easily represent virtualized XML like the XPathNavigator can.

IMPORTANT

In System.Xml version 2.0, the XPathDocument2 and XPathNavigator2 are duplicates of the existing XPathDocument and XPathNavigator classes. In the Technology Preview release, this is an intermediate stage, and both these classes will be merged into the XPathDocument and XPathNavigator classes, respectively, in future beta releases.

Querying and Updating Data with the XmlAdapter

The System.Data.DataSet class is one of the most used data access components in ADO.NET, and rightly so. It represents a relational store on the client that can easily be synchronized to relational DBMSs and was built to support disconnected application scenarios. It also has the ability to read a W3C XML schema to create its table structure, read XML directly into the tables, persist the structure as a W3C XML schema, and write out the data as XML. This tight integration between relational data and XML was a primary goal of ADO.NET.

Despite its great XML support, the DataSet is fundamentally still a relational store. Once the XML has been loaded, you must use relational APIs to manipulate the data. This is fine when the XML that was loaded is regular and maps perfectly into DataSet tables, which in turn map to tables in the DBMS that are going to be updated. However, the world of XML, by its very nature, is typically not that structured. Even if you have defined and agreed on a schema with your business partners, there can be a variety of differently shaped XML documents that correctly conform to that schema.

And, of course, the database administrator defined the database table structure three years ago, and he or she is not going to change that just because some application is now pushing out XML to consume. This forcing of a square XML peg into a round database hole is described as an impedance mismatch. So what is the solution? Typically, developers today do one of the following.

  • Transform the XML, using XSLT, into another XML document that represents the correct relational table structure, load this into a DataSet class, and update the database.

  • Parse the XML using an XmlReader or XPathNavigator and generate insert, update, and delete SQL statements using ADO.NET for the specific database table. This is more efficient than the DataSet because it is not cached, but it requires more code and tends to be easier to break. This is a custom mapping solution (mentioned earlier), which the XML View technology is designed to replace.

  • Take advantage of the built-in database support, such as the SQLXML bulk XML import or the OPENXML features in SQL Server 2000. These are reasonable options, but they tend to be geared toward loading large amounts of data in a single shot.
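
The second approach, parsing the XML and issuing SQL statements by hand, can be sketched as below. This Python example is a stand-in for the ADO.NET code the bullet describes: SQLite replaces SQL Server, and the document shape, table, and column names are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Incoming XML whose shape does not match the table: the name is a child
# element here, but a column called CompanyName in the database.
incoming = """<Customers>
  <Customer CustomerID="ALFKI"><Name>Alfreds Futterkiste</Name></Customer>
  <Customer CustomerID="BONAP"><Name>Bon app'</Name></Customer>
</Customers>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, CompanyName TEXT)")

# Hand-written mapping code: every change to the XML shape or to the table
# means editing and redeploying this loop.
for customer in ET.fromstring(incoming).iter("Customer"):
    conn.execute(
        "INSERT INTO Customer (CustomerID, CompanyName) VALUES (?, ?)",
        (customer.get("CustomerID"), customer.findtext("Name")))

rows = conn.execute(
    "SELECT CustomerID, CompanyName FROM Customer ORDER BY CustomerID").fetchall()
```

The mapping knowledge lives entirely in the loop body, which is why this style requires more code and is more fragile than a declarative mapping.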

To overcome these limitations, a new class called the XmlAdapter has been added to System.Xml version 2.0, which, combined with the XPathDocument2, provides a powerful way to retrieve, manipulate, and update relational data as XML from SQL Server. The XmlAdapter is really just another data access class in the ADO.NET model, and it further accomplishes the goal of tight XML integration between relational data and XML in ADO.NET.

The XmlAdapter, like the ADO.NET DataAdapter, provides the ability to issue queries to SQL Server and return results. It uses XQuery as the query language and XML views to map the SQL Server tables to XML, filling an XPathDocument2 as the disconnected store. Figure 5.9 illustrates this.

Figure 5.9. Updating changes made to the XPathDocument2 to SQL Server via the XmlAdapter

graphics/05fig09.gif

Once the XPathDocument2 is filled, you can take advantage of other XML technologies such as XQuery and XSLT, as well as use the XPathEditor API to update the XML store. Once the store has been updated, the changes to the XPathDocument2 are tracked and used to update SQL Server via the XML view, which automatically generates the necessary SQL statements to update the respective tables. In effect, once you have defined an XML view, you are able to seamlessly treat your data in SQL Server as XML using the XPathDocument2 as a disconnected XML store.
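
The fill, edit, and update cycle can be sketched end to end in Python. This is a conceptual stand-in, not the XmlAdapter: SQLite replaces SQL Server, the fill and update functions are hypothetical names, and change detection is done by comparing against a snapshot rather than by built-in change tracking.

```python
import copy
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, City TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)", [("ALFKI", "Berlin"), ("BONAP", "Marseille")])

def fill(conn):
    """Fill a disconnected XML store from the relational table."""
    root = ET.Element("Customers")
    rows = conn.execute("SELECT CustomerID, City FROM Customer ORDER BY CustomerID")
    for cust_id, city in rows:
        ET.SubElement(root, "Customer", CustomerID=cust_id, City=city)
    return root

def update(conn, original, current):
    """Generate an UPDATE for each customer whose City differs from the snapshot."""
    before = {c.get("CustomerID"): c.get("City") for c in original}
    changed = 0
    for customer in current:
        key, city = customer.get("CustomerID"), customer.get("City")
        if before.get(key) != city:
            conn.execute("UPDATE Customer SET City = ? WHERE CustomerID = ?", (city, key))
            changed += 1
    return changed

document = fill(conn)                  # 1. fill the disconnected store
snapshot = copy.deepcopy(document)     # keep the original values for comparison
document.find("./Customer[@CustomerID='ALFKI']").set("City", "Munich")  # 2. edit as XML
updated_rows = update(conn, snapshot, document)                          # 3. push changes back
city_in_db = conn.execute(
    "SELECT City FROM Customer WHERE CustomerID = 'ALFKI'").fetchone()[0]
```

Only the edited customer results in an SQL statement; the untouched row is never written.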

The XmlAdapter overcomes the previously mentioned limitations of storing XML in SQL Server. Also, since it streams relational data out as XML through the XML view, there is no need to cache the data in an intermediate DataSet in order to generate XML. The XML view provides complete control over the structure of the XML generated.

This design parallels the ADO.NET relational classes for XML, as shown in Figure 5.10. In version 2.0 of the .NET Framework, there is now an impressive middle-tier XML data access API for working with SQL Server.

Figure 5.10. Comparing the ADO.NET relational and XML APIs to access data

graphics/05fig10.gif

XQuery Programming

Previously we discussed the advent of XQuery as the emerging query language for XML. In order to accommodate this in the .NET Framework, a new class called the XQueryProcessor has been introduced into the System.Xml.Query namespace.

This class allows you to perform queries over either XML in-memory documents or XML views. Whereas the XmlAdapter is used to fill and update the XPathDocument2 store through an XML view, the XQueryProcessor class can also query the XPathDocument2 once loaded.

Another ability of the XQueryProcessor class is to stream the output of the query through a TextWriter class, without requiring this to be loaded into a client-side XML store. (Note that there will be support for XmlReader and XmlWriter as output types in future releases.) We will see more details of the XQueryProcessor class in Chapter 8.

XQuery and XML Views

By combining the XQueryProcessor class with XML views, XQuery can be used as a data aggregation language over different data sources. An XML view can target multiple data sources via different connections and hence is able to query and aggregate this on the middle tier.

For example, the XQuery in Listing 5.1 creates a <customer> element that contains a list of customer <name> elements. The names are generated by loading all the customers and all the orders in the Northwind database through an XML view called Northwind. A filter, or predicate, is applied through the where clause so that each CustomerID that matches an order's CustomerID returns the name of that customer. If a customer has multiple orders, multiple names are returned.

Listing 5.1. An XQuery to Extract Customer Names
<customer>
{
  for $cust in map:view('Northwind')/customers,
    $order in map:view('Northwind')/orders
  where ($cust/Customer/@CustomerID = $order/CustomerID)
  return <name>{ $cust/Customer/@name } </name>
}
</customer>

Listing 5.2 shows an example of output from this query.

Listing 5.2. The Result of an XQuery to Extract Customer Names
<customer>
<name>Mark Fussell</name>
<name>Alex Homer</name>
<name>Dave Sussman</name>
</customer>
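
The join semantics of this query can be mimicked in Python as a nested loop over two in-memory documents. This is only an illustration of how the for and where clauses pair customers with orders; it does not use the XQueryProcessor, and the customer IDs and order data are made up.

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for the customers and orders exposed by the XML view.
customers = ET.fromstring(
    "<customers>"
    "<Customer CustomerID='C1' name='Mark Fussell'/>"
    "<Customer CustomerID='C2' name='Alex Homer'/>"
    "<Customer CustomerID='C3' name='Dave Sussman'/>"
    "</customers>")
orders = ET.fromstring(
    "<orders>"
    "<Order CustomerID='C1'/>"
    "<Order CustomerID='C2'/>"
    "<Order CustomerID='C2'/>"
    "</orders>")

result = ET.Element("customer")
for cust in customers.iter("Customer"):        # for $cust in .../customers
    for order in orders.iter("Order"):         #     $order in .../orders
        if cust.get("CustomerID") == order.get("CustomerID"):   # where clause
            ET.SubElement(result, "name").text = cust.get("name")  # return <name>

names = [name.text for name in result]
print(ET.tostring(result, encoding="unicode"))
```

In this sample data the customer with two orders appears twice and the customer with no orders does not appear at all, which is the behavior the where clause implies.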

We will discuss XQuery and XML views in more depth and with examples in Chapters 7 and 8.

XSLT Programming

With the advent of the System.Xml.Query namespace, there has also been a significant innovation in XSLT processing. System.Xml version 1.x introduced the XslTransform class to execute XSLT stylesheets. The System.Xml.Query namespace was created in order to combine XML languages into a common query processing engine for XSLT, XQuery, and XPath, including XML Views, which are simply another XML transformation or query language. This treats XML languages like other CLR languages, in the sense that they are compiled and optimized into MSIL instructions that can then be executed, thereby significantly increasing the performance of the queries at some cost in compilation time.

The System.Xml.Query.XsltProcessor class takes advantage of this improved query engine to perform XSL transformations over the XPathDocument2 class. The XsltProcessor class should be regarded as still under development, and we will discuss it more with examples in Chapter 8.