Using the XPathDocument2 Class

Having seen how the XPathDocument2 class can be updated with the XPathEditor class, we'll now look at some of the methods on the XPathDocument2 class that can be used in conjunction with the XPathEditor.

Validating an XPathDocument2 with an XML Schema

The content of an XPathDocument2 can be validated against an XML schema with the CheckValidity method, which checks the structure of the XML document. This is useful because the document can be validated when it is loaded and revalidated after it has been updated with the XPathEditor. Before we look at the code, we need to discuss the new XML schema library class, XmlSchemaSet.

The XmlSchemaSet Class

In System.Xml 1.x, W3C XML schemas can be loaded into an XmlSchemaCollection class, which acts as a collection (or library) of schemas. Because of early interpretations of the then-young W3C XML Schema specification, the XmlSchemaCollection handles multiple schemas with the same namespace incorrectly. Schemas and their imported schemas were treated as independent entities: if a schema with namespace A imported a schema with namespace B, and a schema with namespace C also imported schema B, the XmlSchemaCollection would contain two separate instances of schema B.

Of course, this can continue n levels deep and produce all sorts of dependency graphs between the schemas. In most cases the separate copies of schema B are identical, but this need not always be the case. If a second instance of a schema with the same namespace is imported, all schemas referencing the imported schema should be recompiled. The correct implementation compiles all the schemas into a single "set," so that there are no "islands" of schemas: there is one logical schema in which all references between schemas are resolved, and no schema is held as an isolated instance.

The XmlSchemaSet class was introduced in System.Xml 2.0 to fix this deficiency in XmlSchemaCollection. It brings other changes as well, including dropping support for Microsoft's XDR schemas. The differences between the XmlSchemaCollection and the XmlSchemaSet are summarized in the following table.

Differences between the XmlSchemaCollection and the XmlSchemaSet

| XmlSchemaCollection | XmlSchemaSet |
|---|---|
| Supports XDR and XML schemas. | Supports only XML schemas. |
| Schemas are compiled when the Add method is called. | Schemas are not compiled when the Add method is called, which significantly improves performance while the schema library is being built. |
| Each schema generates an individual compiled version, which can result in "schema islands." | Compiled schemas generate a single logical schema, a "set" of schemas. |
| Only one schema for a particular target namespace can exist in the collection. | Multiple schemas for the same target namespace can be added as long as there are no conflicts, because schemas in the XmlSchemaSet are identified by the combination of their target namespace and schema location. |

From an API perspective, the XmlSchemaSet is similar to the XmlSchemaCollection; the main difference is the XmlSchemaSet's new Compile method. This lets you add multiple schemas to the XmlSchemaSet and then compile, or recompile, the entire logical schema set only when required.
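The Add/Compile split can be sketched with the released System.Xml 2.0 XmlSchemaSet class. In this minimal example, assuming that released API, two small hypothetical schema documents (written inline so the sketch is self-contained) share the http://adoxml/ch6/samples target namespace but declare different global elements. Both are added without compiling, and a single Compile call then resolves them into one logical set.

```vbnet
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module SchemaSetSketch
    Public Const BooksNs As String = "http://adoxml/ch6/samples"

    ' Two hypothetical schema documents that share one target namespace
    ' but declare different global elements.
    Private Const TitleSchema As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " & _
        "targetNamespace='http://adoxml/ch6/samples'>" & _
        "<xs:element name='title' type='xs:string'/></xs:schema>"

    Private Const PriceSchema As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " & _
        "targetNamespace='http://adoxml/ch6/samples'>" & _
        "<xs:element name='price' type='xs:decimal'/></xs:schema>"

    Public Function BuildSchemaSet() As XmlSchemaSet
        Dim schemaSet As New XmlSchemaSet()

        ' Add parses each schema and places it in the set; nothing is
        ' compiled yet, so adding several schemas stays cheap.
        schemaSet.Add(BooksNs, XmlReader.Create(New StringReader(TitleSchema)))
        schemaSet.Add(BooksNs, XmlReader.Create(New StringReader(PriceSchema)))

        ' Compile resolves both documents into one logical schema.
        schemaSet.Compile()
        Return schemaSet
    End Function
End Module
```

Because both documents are compiled into one set, its GlobalElements table ends up holding both title and price rather than two separate "islands."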

Checking the Validity of an XML Document

Now we are ready to validate the XPathDocument2 with the CheckValidity method. First we must associate the books.xml data with an XML schema. To do this we created a new XML document called bookswithschema.xml and added a default namespace declaration, xmlns="http://adoxml/ch6/samples", to its root element. This namespace applies to the root element and all its descendant elements, though not to unprefixed attributes (see Listing 6.10).

Listing 6.10. The bookswithschema.xml Document
<?xml version="1.0" encoding="utf-8"?>
<!-- This file represents a fragment of a bookstore database -->
<bookstore xmlns="http://adoxml/ch6/samples">
  <book genre="technology" publicationdate="10-27-2003"
        ISBN="1-861003-11-1">
    <title>ADO.NET and System.Xml V2</title>
    <author>
      <first-name>Alex</first-name>
      <last-name>Homer</last-name>
    </author>
    <author>
      <first-name>Mark</first-name>
      <last-name>Fussell</last-name>
    </author>
    <price>29.99</price>
  </book>
  <book genre="autobiography" publicationdate="1981"
        ISBN="1-861-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991"
        ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
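The effect of the default namespace declaration is easy to check in code. The sketch below assumes the released System.Xml 2.0 XmlDocument class (not the XPathDocument2 preview API) and a trimmed, inline copy of the listing: the book element inherits http://adoxml/ch6/samples, while the unprefixed genre attribute stays in no namespace, which matches the attributeFormDefault="unqualified" setting in booksschema.xsd.

```vbnet
Imports System.Xml

Module DefaultNamespaceSketch
    ' Loads a trimmed copy of bookswithschema.xml with the same default namespace.
    Private Function LoadSample() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore xmlns='http://adoxml/ch6/samples'>" & _
                    "<book genre='novel'><title>The Confidence Man</title>" & _
                    "</book></bookstore>")
        Return doc
    End Function

    ' Namespace of the first book element (inherited from bookstore).
    Public Function BookNamespace() As String
        Return LoadSample().DocumentElement.FirstChild.NamespaceURI
    End Function

    ' Namespace of the unprefixed genre attribute (the default xmlns
    ' declaration does not apply to attributes).
    Public Function GenreNamespace() As String
        Dim book As XmlNode = LoadSample().DocumentElement.FirstChild
        Return book.Attributes.GetNamedItem("genre").NamespaceURI
    End Function
End Module
```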

An XML schema, shown in Listing 6.11, is required to validate this document. This schema is called booksschema.xsd, and its targetNamespace is set to http://adoxml/ch6/samples.

This schema was generated using the XML Schema Inference tool on the gotdotnet Web site. See http://apps.gotdotnet.com/xmltools/xsdinference/. The tool can generate schemas for any well-formed XML document and is not bound by any of the limitations of the xsd.exe tool that ships with the version 1.x Framework SDK.
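The released System.Xml 2.0 also includes a System.Xml.Schema.XmlSchemaInference class that performs this kind of inference in code. The sketch below assumes that class (it is not the gotdotnet tool referred to above) and infers a schema set from a one-book excerpt of bookswithschema.xml; the inferred schema takes its target namespace from the root element.

```vbnet
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module InferenceSketch
    Public Function InferBookSchemas() As XmlSchemaSet
        ' A one-book excerpt of bookswithschema.xml.
        Dim xml As String = _
            "<bookstore xmlns='http://adoxml/ch6/samples'>" & _
            "<book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>" & _
            "<title>The Confidence Man</title><price>11.99</price>" & _
            "</book></bookstore>"

        Dim inference As New XmlSchemaInference()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            ' InferSchema returns an XmlSchemaSet describing the document.
            Return inference.InferSchema(reader)
        End Using
    End Function
End Module
```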

Listing 6.11. The booksschema.xsd Schema
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
           targetNamespace="http://adoxml/ch6/samples"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="bookstore">
  <xs:complexType>
   <xs:sequence>
    <xs:element maxOccurs="unbounded" name="book">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="title" type="xs:string" />
       <xs:element maxOccurs="unbounded" name="author">
        <xs:complexType>
         <xs:sequence>
          <xs:element minOccurs="0" name="name" type="xs:string"/>
          <xs:element minOccurs="0" name="first-name"
                      type="xs:string" />
          <xs:element minOccurs="0" name="last-name"
                      type="xs:string"/>
         </xs:sequence>
        </xs:complexType>
       </xs:element>
       <xs:element name="price" type="xs:decimal" />
      </xs:sequence>
      <xs:attribute name="genre" type="xs:string" use="required" />
      <xs:attribute name="publicationdate"
                    type="xs:string" use="required" />
      <xs:attribute name="ISBN" type="xs:string" use="required" />
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

Listing 6.12 shows how to validate the bookswithschema.xml document after it has been loaded. First the booksschema.xsd schema is loaded into an XmlSchemaSet instance under its target namespace, and the XmlSchemaSet is compiled to produce the schema set used for validation. The bookswithschema.xml document is then loaded, and the CheckValidity method is called with the XmlSchemaSet and a callback handler that reports any validation errors. The handler is called only if errors or warnings occur when the document is checked against the schema.

Listing 6.12. Validating the XML Document against Its Schema
Dim schemaSet As New XmlSchemaSet()
schemaSet.Add("http://adoxml/ch6/samples", "booksschema.xsd")
schemaSet.Compile()

Dim doc As New XPathDocument2()
doc.Load("bookswithschema.xml")

doc.CheckValidity(schemaSet, _
    New ValidationEventHandler(AddressOf ValidationCallback))
...

' The validation callback handler for errors and warnings
Sub ValidationCallback(sender As Object, e As ValidationEventArgs)
  label1.Text = "Validation failed: " & e.Message
End Sub 'ValidationCallback
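For comparison, here is a minimal sketch of the same check with the released System.Xml 2.0 API, where CheckValidity is a method of XPathNavigator rather than of the document. The trimmed inline schema is a hypothetical stand-in for booksschema.xsd (only title, price, and a required genre attribute), and the callback collects messages into a list instead of writing to label1.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema
Imports System.Xml.XPath

Module CheckValiditySketch
    Private Const Ns As String = "http://adoxml/ch6/samples"

    ' Hypothetical, trimmed-down stand-in for booksschema.xsd.
    Private Const SchemaText As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " & _
        "targetNamespace='http://adoxml/ch6/samples' elementFormDefault='qualified'>" & _
        "<xs:element name='bookstore'><xs:complexType><xs:sequence>" & _
        "<xs:element name='book' maxOccurs='unbounded'><xs:complexType><xs:sequence>" & _
        "<xs:element name='title' type='xs:string'/>" & _
        "<xs:element name='price' type='xs:decimal'/>" & _
        "</xs:sequence>" & _
        "<xs:attribute name='genre' type='xs:string' use='required'/>" & _
        "</xs:complexType></xs:element>" & _
        "</xs:sequence></xs:complexType></xs:element></xs:schema>"

    Public ReadOnly Messages As New List(Of String)()

    ' Called only for validation errors and warnings.
    Private Sub ValidationCallback(sender As Object, e As ValidationEventArgs)
        Messages.Add(e.Message)
    End Sub

    Public Function IsValid(xml As String) As Boolean
        Messages.Clear()
        Dim schemaSet As New XmlSchemaSet()
        schemaSet.Add(Ns, XmlReader.Create(New StringReader(SchemaText)))
        schemaSet.Compile()

        Dim doc As New XPathDocument(New StringReader(xml))
        ' CreateNavigator positions the navigator on the root node.
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Return nav.CheckValidity(schemaSet, AddressOf ValidationCallback)
    End Function
End Module
```

CheckValidity returns False when the callback has received at least one error, so the caller can test the result directly instead of relying only on the callback.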

Revalidating the Document after Editing

If the document is edited, it can be revalidated against the schema before it is saved. Listing 6.13 adds a new book node to the bookswithschema.xml document, together with a publisher attribute that the schema does not declare, so the second call to CheckValidity fails. Notice that the new elements are all written in the http://adoxml/ch6/samples namespace so that the schema can validate them.

Listing 6.13. Revalidating a Document after Editing
Dim schemaSet As New XmlSchemaSet()
schemaSet.Add("http://adoxml/ch6/samples", "booksschema.xsd")
schemaSet.Compile()

Dim doc As New XPathDocument2()
doc.Load("bookswithschema.xml")

Dim xmledit As XPathEditor = doc.CreateXPathEditor()
xmledit.MoveToFirstChild()
xmledit.MoveToNextSibling()

Dim writer As XmlWriter = xmledit.CreateFirstChild()

writer.WriteStartElement("book", "http://adoxml/ch6/samples")

' "publisher" is not declared in the schema, so this attribute is invalid
writer.WriteAttributeString("publisher", "Addison-Wesley")

writer.WriteAttributeString("genre", "technology")
writer.WriteAttributeString("publicationdate", "10-27-2003")
writer.WriteAttributeString("ISBN", "1-861003-11-1")
writer.WriteElementString("title", "http://adoxml/ch6/samples", _
                          "ADO.NET and System.Xml V2")
writer.WriteStartElement("author", "http://adoxml/ch6/samples")
writer.WriteElementString("first-name", "http://adoxml/ch6/samples", _
                          "Alex")
writer.WriteElementString("last-name", "http://adoxml/ch6/samples", _
                          "Homer")
writer.WriteEndElement()
writer.WriteStartElement("author", "http://adoxml/ch6/samples")
writer.WriteElementString("first-name", "http://adoxml/ch6/samples", _
                          "Mark")
writer.WriteElementString("last-name", "http://adoxml/ch6/samples", _
                          "Fussell")
writer.WriteEndElement()
writer.WriteElementString("price", "http://adoxml/ch6/samples", _
                          "29.99")
writer.WriteEndElement()
writer.Close()

doc.CheckValidity(schemaSet, _
    New ValidationEventHandler(AddressOf ValidationCallback))

Figure 6.10 shows the result of executing this code. The page displays a validation error message, generated by the ValidationCallback event handler shown at the end of Listing 6.12.

Figure 6.10. Validation failure on the document with an XML schema
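The edit-then-revalidate pattern of Listing 6.13 can be sketched with the released System.Xml 2.0 XmlDocument, whose Validate method revalidates the whole tree against the schemas in its Schemas property. This is an illustration under stated assumptions, not the XPathEditor code: the schema is a hypothetical trimmed stand-in for booksschema.xsd, and the new book is built with DOM calls in the schema's namespace, optionally carrying the undeclared publisher attribute.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module RevalidateSketch
    Private Const Ns As String = "http://adoxml/ch6/samples"

    ' Hypothetical, trimmed-down stand-in for booksschema.xsd.
    Private Const SchemaText As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " & _
        "targetNamespace='http://adoxml/ch6/samples' elementFormDefault='qualified'>" & _
        "<xs:element name='bookstore'><xs:complexType><xs:sequence>" & _
        "<xs:element name='book' maxOccurs='unbounded'><xs:complexType><xs:sequence>" & _
        "<xs:element name='title' type='xs:string'/>" & _
        "<xs:element name='price' type='xs:decimal'/>" & _
        "</xs:sequence>" & _
        "<xs:attribute name='genre' type='xs:string' use='required'/>" & _
        "</xs:complexType></xs:element>" & _
        "</xs:sequence></xs:complexType></xs:element></xs:schema>"

    Private ReadOnly errors As New List(Of String)()

    Private Sub ValidationCallback(sender As Object, e As ValidationEventArgs)
        errors.Add(e.Message)
    End Sub

    ' Adds a new book (optionally with a "publisher" attribute), then
    ' revalidates the document and returns the number of messages.
    Public Function ErrorsAfterEdit(addPublisher As Boolean) As Integer
        errors.Clear()
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore xmlns='http://adoxml/ch6/samples'>" & _
                    "<book genre='novel'><title>The Confidence Man</title>" & _
                    "<price>11.99</price></book></bookstore>")
        doc.Schemas.Add(Ns, XmlReader.Create(New StringReader(SchemaText)))

        ' Build the new book in the schema's namespace.
        Dim book As XmlElement = doc.CreateElement("book", Ns)
        book.SetAttribute("genre", "technology")
        If addPublisher Then
            book.SetAttribute("publisher", "Addison-Wesley") ' not declared
        End If
        Dim title As XmlElement = doc.CreateElement("title", Ns)
        title.InnerText = "ADO.NET and System.Xml V2"
        book.AppendChild(title)
        Dim price As XmlElement = doc.CreateElement("price", Ns)
        price.InnerText = "29.99"
        book.AppendChild(price)
        doc.DocumentElement.PrependChild(book)

        ' Revalidate the edited tree against doc.Schemas.
        doc.Validate(AddressOf ValidationCallback)
        Return errors.Count
    End Function
End Module
```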

Events and Event Handling on the XPathDocument2

The XPathDocument2 exposes events that are raised when nodes within the document change. The available events, such as ItemInserting and ItemUpdating, were listed earlier in the table of XPathDocument2 methods and properties.

Events fall into two categories: those raised before the node changes, whose names end in "ing" (such as ItemInserting), and those raised after the node has changed, whose names end in "ed" (such as ItemInserted). Every event handler receives an XPathDocument2ChangedEventArgs object, which contains information about the changed node in the tree.
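The before/after pairing of events also exists on the released System.Xml 2.0 XmlDocument, which raises NodeInserting and NodeInserted (and the matching Changing/Changed and Removing/Removed pairs) with an XmlNodeChangedEventArgs argument. The sketch below assumes that class as a stand-in for XPathDocument2 and simply records the order in which the two insertion events fire.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

Module EventOrderSketch
    Private ReadOnly messages As New List(Of String)()

    ' Raised before the node is added to the tree.
    Private Sub OnInserting(sender As Object, e As XmlNodeChangedEventArgs)
        messages.Add("Inserting:" & e.Node.Name)
    End Sub

    ' Raised after the node has been added to the tree.
    Private Sub OnInserted(sender As Object, e As XmlNodeChangedEventArgs)
        messages.Add("Inserted:" & e.Node.Name)
    End Sub

    Public Function InsertAndLog() As List(Of String)
        messages.Clear()
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore/>")
        AddHandler doc.NodeInserting, AddressOf OnInserting
        AddHandler doc.NodeInserted, AddressOf OnInserted

        ' CreateElement raises no insertion events; AppendChild raises
        ' one "before" and one "after" event for the new element.
        doc.DocumentElement.AppendChild(doc.CreateElement("book"))
        Return New List(Of String)(messages)
    End Function
End Module
```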

Events are useful for determining what changed in the document and hence for applying business rules to those changes (e.g., "Before you change the person's state, ensure that the country is set to USA"). Note that some business rules can also be enforced by an XML schema, for example through key/keyref identity constraints, minimum and maximum values, and pattern matching.
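As an illustration of a rule enforced by the schema rather than by an event handler, the following sketch uses a small hypothetical schema whose price element must be a decimal greater than 0 (an xs:minExclusive facet) and checks instance documents with a validating XmlReader from the released System.Xml 2.0 API.

```vbnet
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module SchemaRuleSketch
    ' Hypothetical schema: price must be a decimal strictly greater than 0.
    Private Const SchemaText As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" & _
        "<xs:element name='price'><xs:simpleType>" & _
        "<xs:restriction base='xs:decimal'>" & _
        "<xs:minExclusive value='0'/>" & _
        "</xs:restriction></xs:simpleType></xs:element>" & _
        "</xs:schema>"

    Private errorCount As Integer

    Private Sub OnValidation(sender As Object, e As ValidationEventArgs)
        If e.Severity = XmlSeverityType.Error Then errorCount += 1
    End Sub

    ' Reads the whole document through a validating reader and returns
    ' the number of validation errors reported.
    Public Function CountErrors(xml As String) As Integer
        errorCount = 0
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        settings.Schemas.Add(Nothing, XmlReader.Create(New StringReader(SchemaText)))
        AddHandler settings.ValidationEventHandler, AddressOf OnValidation

        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
            End While
        End Using
        Return errorCount
    End Function
End Module
```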

The ItemInserting Event

Listing 6.14 inserts a new book into the books.xml document but omits its publicationdate attribute. Before the new nodes are written, an event handler named ItemInsertingEventHandler is added to the ItemInserting event of the XPathDocument2. When the Close method is called on the XmlWriter, the new tree is written to the document.

ItemInserting events are raised for the new nodes before they are actually added to the document, calling the ItemInsertingEventHandler method shown at the end of the listing. Within this handler, the item being inserted is identified as a book element via the Item property of the XPathDocument2ChangedEventArgs. An attempt is then made to move an XPathNavigator2 to the publicationdate attribute; if this fails (as it does in this case), an exception is thrown and an error message is displayed.

Listing 6.14. Handling the ItemInserting Event
Try

  Dim doc As New XPathDocument2()
  doc.Load("books.xml")

  AddHandler doc.ItemInserting, AddressOf ItemInsertingEventHandler

  Dim xmledit As XPathEditor = doc.CreateXPathEditor()
  xmledit.MoveToFirstChild()
  xmledit.MoveToNextSibling()

  Dim writer As XmlWriter = xmledit.CreateFirstChild()

  writer.WriteStartElement("book")
  writer.WriteAttributeString("genre", "technology")
  ' publicationdate is deliberately omitted to trigger the business rule:
  ' writer.WriteAttributeString("publicationdate", "10-27-2003")
  writer.WriteAttributeString("ISBN", "1-861003-11-1")
  writer.WriteElementString("title", "Essential XML")
  writer.WriteEndElement()
  writer.Close()

Catch ex As Exception

  label1.Text = "Failed creating a book. " & ex.Message

End Try
...
...


' Check that a publicationdate attribute is supplied for an
' inserted book element. If it is missing, an exception is raised.
' The caller can catch this exception, and the XPathDocument2 is
' guaranteed to be restored to its state prior to the insert.
Sub ItemInsertingEventHandler(sender As Object, _
                              e As XPathDocument2ChangedEventArgs)
  If e.Item.Name = "book" Then
    ' Work on a copy so that e.Item stays on the inserted element
    Dim nav As XPathNavigator2 = CType(e.Item.Clone(), XPathNavigator2)

    ' MoveToAttribute returns False when the attribute does not exist
    If Not nav.MoveToAttribute("publicationdate", "") Then
      Throw New Exception("The 'publicationdate' attribute is " _
                        & "missing for the inserted book")
    End If
  End If

End Sub

Figure 6.11 shows the failed business rule for the ItemInserting event. The error message indicates that the node was not inserted into the tree because the publicationdate attribute is missing. Remember that when an exception is raised, the content of an XPathDocument2 is left in the state it was in before the insert began. In other words, the node is never actually inserted into the document.

Figure 6.11. Failure to insert a new book without a publicationdate attribute
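The same veto-by-exception idea can be sketched with the NodeInserting event of the released System.Xml 2.0 XmlDocument, used here as an assumed stand-in for XPathDocument2. Throwing from a NodeInserting handler propagates out of AppendChild, and because the event fires before the insertion, the book is never added. The XML is a hypothetical inline fragment.

```vbnet
Imports System.Xml

Module InsertRuleSketch
    ' Rule: an inserted book element must carry a publicationdate attribute.
    ' NodeInserting fires before the insertion, so throwing cancels it.
    Private Sub OnNodeInserting(sender As Object, e As XmlNodeChangedEventArgs)
        If e.Node.NodeType <> XmlNodeType.Element OrElse e.Node.Name <> "book" Then Return
        Dim book As XmlElement = DirectCast(e.Node, XmlElement)
        If Not book.HasAttribute("publicationdate") Then
            Throw New InvalidOperationException( _
                "The 'publicationdate' attribute is missing for the inserted book")
        End If
    End Sub

    ' Tries to append a new book; returns a status message and reports
    ' the number of books in the document through bookCount.
    Public Function TryInsertBook(withDate As Boolean, ByRef bookCount As Integer) As String
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore><book publicationdate='1967'/></bookstore>")
        AddHandler doc.NodeInserting, AddressOf OnNodeInserting

        Dim book As XmlElement = doc.CreateElement("book")
        book.SetAttribute("genre", "technology")
        If withDate Then book.SetAttribute("publicationdate", "10-27-2003")

        Dim status As String = "Book inserted."
        Try
            doc.DocumentElement.AppendChild(book)
        Catch ex As InvalidOperationException
            status = "Failed creating a book. " & ex.Message
        End Try

        bookCount = doc.SelectNodes("/bookstore/book").Count
        Return status
    End Function
End Module
```

Calling TryInsertBook(False, count) leaves count at 1 and returns the failure message; TryInsertBook(True, count) inserts the book and leaves count at 2.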

The ItemChanged Event

Often it is more useful to apply a business rule after a node has changed rather than before. In these cases the XPathChangeNavigator can be used to decide whether to keep or revoke the change. Listing 6.15, taken from an ASP.NET example, sets every price in the document to 20, except for the book currently priced at 8.99, whose new price is set to 0.

The document being referenced is held in ASP.NET Session state, and an event handler named ItemChangedEventHandler is attached to the ItemChanged event of the XPathDocument2. Each call to SetValue then raises the ItemChanged event, so ItemChangedEventHandler is invoked once for every price that changes.

Listing 6.15. Handling the ItemChanged Event
Sub Page_Load()

  Dim doc As New XPathDocument2()
  doc.Load("books.xml")
  Session("doc") = doc

End Sub


Sub UpdatingBtn_click(Src As Object, E As EventArgs)

  Try
    Dim doc As XPathDocument2 = CType(Session("doc"), XPathDocument2)
    AddHandler doc.ItemChanged, AddressOf ItemChangedEventHandler

    Dim xmledit As XPathEditor = doc.CreateXPathEditor()
    Dim iter As IEnumerable = xmledit.Select("//price/text()")

    For Each editor As XPathEditor In iter
      If editor.ReadStringValue() = "8.99" Then
        editor.SetValue("0")
      Else
        editor.SetValue("20")
      End If
    Next

  Catch ex As Exception

    label1.Text = "Failed to update a book. " & ex.Message

  End Try

End Sub

Listing 6.16 shows the ItemChangedEventHandler event handler. To get hold of an XPathChangeNavigator, the handler retrieves the original document from the ASP.NET session, creates an XPathChangeNavigator from it, and positions the navigator on the changed item by calling the MoveTo method.

Because the event is raised on the text node, the handler moves to the parent and checks that it is a price element. If so, it reads the NewValue property of the event arguments. If the new price is 0 or less, the handler treats it as invalid and calls RejectChange on the XPathChangeNavigator to restore the original price.

Listing 6.16. The Event Handler for the ItemChanged Event
Sub ItemChangedEventHandler(sender As Object, _
                            e As XPathDocument2ChangedEventArgs)
  Dim doc As XPathDocument2 = CType(Session("doc"), XPathDocument2)
  Dim nav As XPathChangeNavigator = doc.CreateXPathChangeNavigator()

  nav.MoveTo(e.Item)
  e.Item.MoveToParent()
  If e.Item.Name = "price" Then
    Dim price As Double = XmlConvert.ToDouble(e.NewValue)
    If price <= 0 Then
      nav.RejectChange()
    End If
  End If
End Sub

Unlike Listing 6.14, this example does not throw an exception, which the calling code would otherwise have to catch to keep the application from terminating. Instead, all the logic is handled within the change-tracking capabilities of the XPathDocument2 and XPathChangeNavigator classes, which is the preferred approach. Figure 6.12 shows the price still at 8.99 for the book that we tried to set to 0, whereas the other prices were updated successfully.

Figure 6.12. Failure to set the price of a book to 0
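The released XmlDocument has no XPathChangeNavigator or RejectChange, so a sketch of the "apply, then revoke" rule from Listings 6.15 and 6.16 has to restore the old value itself. Assuming XmlDocument as the stand-in, the NodeChanged handler below reads e.OldValue and writes it back when the new price is not positive; a Boolean guard stops that restoring write from re-triggering the rule. The two-book XML is a hypothetical inline fragment.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

Module PriceRuleSketch
    ' True while the handler writes the old value back, so that the
    ' NodeChanged event raised by that write is ignored.
    Private restoring As Boolean = False

    Private Sub OnNodeChanged(sender As Object, e As XmlNodeChangedEventArgs)
        If restoring Then Return

        ' The change is raised on the text node; check its parent element.
        Dim parent As XmlNode = e.Node.ParentNode
        If parent Is Nothing OrElse parent.Name <> "price" Then Return

        Dim price As Decimal = XmlConvert.ToDecimal(e.NewValue)
        If price <= 0D Then
            restoring = True
            Try
                e.Node.Value = e.OldValue   ' revoke the change
            Finally
                restoring = False
            End Try
        End If
    End Sub

    ' Sets every price to 20, except 8.99 which is set to 0, and returns
    ' the resulting prices as a comma-separated string.
    Public Function UpdatePrices() As String
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore>" & _
                    "<book><price>8.99</price></book>" & _
                    "<book><price>11.99</price></book>" & _
                    "</bookstore>")
        AddHandler doc.NodeChanged, AddressOf OnNodeChanged

        For Each priceText As XmlNode In doc.SelectNodes("//price/text()")
            If priceText.Value = "8.99" Then
                priceText.Value = "0"
            Else
                priceText.Value = "20"
            End If
        Next

        Dim prices As New List(Of String)()
        For Each priceElement As XmlNode In doc.SelectNodes("//price")
            prices.Add(priceElement.InnerText)
        Next
        Return String.Join(",", prices.ToArray())
    End Function
End Module
```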

The ItemRemoved Event

Finally, we look at the ItemRemoved event, which follows the same principles as the previous example. Listing 6.17 deletes all the title elements in a document with the DeleteCurrent method, after attaching an event handler named ItemRemovedEventHandler to the ItemRemoved event of the XPathDocument2.

Listing 6.17. Handling the ItemRemoved Event
Sub DeletingBtn_click(Src As Object, E As EventArgs)

  Try
    Dim doc As XPathDocument2 = CType(Session("doc"), XPathDocument2)
    AddHandler doc.ItemRemoved, AddressOf ItemRemovedEventHandler

    Dim xmledit As XPathEditor = doc.CreateXPathEditor()
    Dim iter As IEnumerable = xmledit.Select("//title")

    For Each editor As XPathEditor In iter
      editor.DeleteCurrent()
    Next editor

  Catch ex As Exception

    label1.Text = "Failed to delete a title. " & ex.Message

  End Try

End Sub


Sub ItemRemovedEventHandler(sender As Object, _
                            e As XPathDocument2ChangedEventArgs)
  Dim doc As XPathDocument2 = CType(Session("doc"), XPathDocument2)
  Dim nav As XPathChangeNavigator = doc.CreateXPathChangeNavigator()

  nav.MoveTo(e.Item)
  Dim nav1 As XPathChangeNavigator = CType(nav.Clone(), _
                                     XPathChangeNavigator)
  nav1.MoveToParent()
  nav1.MoveToAttribute("publicationdate", "")

  If nav1.ReadStringValue() = "1981" Then
    nav.RejectChange()
  End If

End Sub

After the DeleteCurrent method is called for each title element, ItemRemovedEventHandler is invoked for that node. An XPathChangeNavigator is created from the document and positioned on the deleted item by calling the MoveTo method. The important point to remember is that deleted nodes can be seen only through the XPathChangeNavigator, not through the XPathNavigator2.

To decide whether to reject the deletion, the code copies the XPathChangeNavigator with the Clone method and uses the copy to walk up to the parent book element and read its publicationdate attribute. The original navigator stays on the deleted title, so if the publication date is 1981, the code calls RejectChange on it to put the deleted title element back into the document. Figure 6.13 shows that the title element is still present for the first book, which was published in 1981, while the title elements have been removed from the other two.

Figure 6.13. Removing the title from each book whose publication date is not 1981
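Without RejectChange, the closest sketch of Listing 6.17 on the released XmlDocument vetoes the deletion up front: a NodeRemoving handler inspects e.OldParent (the book that still owns the title) and throws when its publicationdate is 1981, and the calling loop catches that exception for each title. This is a swapped-in technique that assumes XmlDocument in place of XPathDocument2, and the three-book XML is an inline excerpt.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

Module RemoveRuleSketch
    ' Rule: the title of a book published in 1981 must not be removed.
    ' NodeRemoving fires before the removal, so throwing cancels it.
    Private Sub OnNodeRemoving(sender As Object, e As XmlNodeChangedEventArgs)
        If e.Node.Name <> "title" Then Return
        Dim book As XmlElement = TryCast(e.OldParent, XmlElement)
        If book IsNot Nothing AndAlso book.GetAttribute("publicationdate") = "1981" Then
            Throw New InvalidOperationException("The title of a 1981 book cannot be removed")
        End If
    End Sub

    ' Tries to delete every title and returns how many titles remain.
    Public Function RemoveTitles() As Integer
        Dim doc As New XmlDocument()
        doc.LoadXml("<bookstore>" & _
            "<book publicationdate='1981'><title>The Autobiography of Benjamin Franklin</title></book>" & _
            "<book publicationdate='1967'><title>The Confidence Man</title></book>" & _
            "<book publicationdate='1991'><title>The Gorgias</title></book>" & _
            "</bookstore>")
        AddHandler doc.NodeRemoving, AddressOf OnNodeRemoving

        ' Copy the matches first so the tree is not changed mid-enumeration.
        Dim titles As New List(Of XmlNode)()
        For Each title As XmlNode In doc.SelectNodes("//title")
            titles.Add(title)
        Next

        For Each title As XmlNode In titles
            Try
                title.ParentNode.RemoveChild(title)
            Catch ex As InvalidOperationException
                ' Vetoed by the rule: this title stays in the document.
            End Try
        Next

        Return doc.SelectNodes("//title").Count
    End Function
End Module
```

Only the 1981 book keeps its title, so RemoveTitles returns 1.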