Programming with XML and Namespaces




Within the .NET Framework, there are three ways to engage in XML programming:

  • Stream-based programming with XmlTextReader and XmlTextWriter

  • DOM-based programming with XmlDocument

  • A hybrid approach called XML Serialization

(XML Serialization relies on the stream APIs to provide a strongly typed object model for XML documents. Refer to Chapter 5, XML Serialization with .NET, for discussion.)

Streaming XML Processing

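The two generally accepted ways to handle streams of XML are push-based parsers and pull-based parsers. A push-based parser drives the parse itself and calls your code (in SAX, methods such as startElement) for each item it encounters. With a pull-based parser, your code drives the loop and asks for the next node only when it is ready. Push-based parsers are generally based on SAX (the Simple API for XML), a de facto standard. The .NET Framework doesn't come with a push-based parser. Instead it offers an intriguing alternative in the form of a pull-based parser, the XmlTextReader class.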
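Here is a minimal sketch of the pull model in action. The class PullDemo and method FindFirst are hypothetical names, not part of the framework. Because the calling code drives the loop with Read(), it can stop as soon as it finds what it needs, and nothing after that point is parsed. A document that is malformed further on therefore still yields the first match. With a SAX-style push parser, stopping early typically means throwing an exception out of a callback.

```csharp
using System.IO;
using System.Xml;

public static class PullDemo
{
    // Returns the text of the first element named 'elementName',
    // or null if the document contains no such element.
    public static string FindFirst( string xml, string elementName )
    {
        XmlTextReader reader = new XmlTextReader( new StringReader( xml ) );
        try
        {
            // The caller, not the parser, decides when to read the next node
            while( reader.Read() )
            {
                if( reader.NodeType == XmlNodeType.Element &&
                    reader.Name == elementName )
                {
                    // ReadString returns the element's text content; we stop here
                    return reader.ReadString();
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For example, PullDemo.FindFirst( xml, "Username" ) on the sample file in Listing 8.7 would return Keith without reading the second User element.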

Reading Streams

The basic process for reading goes something like this:

  1. Create an instance of the parser based on a stream of XML.

  2. Call the Read() method.

  3. Check the NodeType, Name, Value, and other properties of the parser for information on the current node.

  4. If the current node has what you are interested in, do something.

  5. Repeat steps 2 through 4 until Read() returns false, or until you are done. Read() returns false at EOF (end of file: the point at which a stream, file, or memory buffer has no more data).

As an example, Listing 8.6 shows a small function that takes a filename and outputs the NodeType, Name, and Value properties from the parser, for each node it encounters in the XML file.

Listing 8.6 An XML Parsing Routine
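// Requires: using System; using System.IO; using System.Xml;
//           using System.Windows.Forms;
private void ParseFile( String filename )
{
    try
    {
        // using guarantees the file is closed even if parsing throws
        using( FileStream stream = new FileStream( filename, FileMode.Open ) )
        {
            XmlTextReader reader = new XmlTextReader( stream );

            // Read() advances to the next node and returns false at EOF
            while( reader.Read() )
            {
                // WriteLine is a helper (not shown) that adds one row
                // for the current node to the form's list view
                WriteLine(
                    reader.NodeType.ToString(),
                    reader.Name,
                    reader.Value );
            }
        }
    }
    catch( Exception ex )
    {
        MessageBox.Show( ex.ToString() );
    }
}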
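Listing 8.6 depends on a file and on a WriteLine helper that fills the list view. The following sketch (NodeDump is a hypothetical name) runs the same Read() loop over an in-memory string instead. This makes it easy to see exactly which nodes the reader reports.

Consider a <User> element that contains an indented <Username>Keith</Username> on its own line. For that input, the sketch yields seven rows:

  • Element and EndElement rows, each with a name but an empty value
  • a Text row whose value is Keith and whose name is empty
  • two Whitespace rows holding the line breaks and indentation

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeDump
{
    // Runs the same Read() loop as Listing 8.6, but over a string, and
    // records "NodeType|Name|Value" for every node instead of displaying it.
    public static List<string> Dump( string xml )
    {
        List<string> rows = new List<string>();
        XmlTextReader reader = new XmlTextReader( new StringReader( xml ) );
        try
        {
            while( reader.Read() )   // false once the end of the input is reached
            {
                rows.Add( reader.NodeType + "|" + reader.Name + "|" + reader.Value );
            }
        }
        finally
        {
            reader.Close();
        }
        return rows;
    }
}
```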

Now, let's send the file in Listing 8.7 through the function in Listing 8.6. The result is shown in Figure 8.1, nicely formatted into a ListView control in a Windows Forms application.

Listing 8.7 Sample XML
<?xml version="1.0" encoding="utf-8"?>
<RadioConfig
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://radiosharp/">
  <Users>
    <User>
      <Username>Keith</Username>
      <Password>5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8</Password>
      <Playlists />
    </User>
    <User>
      <Username>Stranger</Username>
      <Password>5D6033183093FA36265860CDDDC9FD88B63BB138</Password>
      <Playlists />
    </User>
  </Users>
</RadioConfig>

Figure 8.1 A GUI (Graphical User Interface) for an XML File

Notice that we come across several interesting NodeTypes: Element, EndElement, Whitespace, and Text. Few of these nodes have both a name and a value. Element and EndElement nodes have a name but no value, while Text and Whitespace nodes have a value but no name. Whitespace nodes hold just that, the spaces and line breaks between tags, and seldom indicate content. Examining the function in more detail, you can see that there isn't a lot to using the XmlTextReader:

  1. Use a FileStream for the file you want to read; this could easily be substituted with a NetworkStream.

  2. Call the Read() method while in a loop. The while loop will exit once Read() returns false, which will be when the file has been parsed completely.

  3. To obtain information about the current node after each read, you can check a host of properties and methods on the XmlTextReader itself, such as the following:

    • NodeType: one of the XmlNodeType enumeration values; for example, XmlNodeType.Element or XmlNodeType.Whitespace

    • Name: the qualified name of the node, including any prefix; for example, Address for <Address> or c:Contacts for <c:Contacts>. (LocalName returns the name without the prefix.)

    • NamespaceURI: the namespace URI of the node; for example, "http://keithba.com/Contacts"

    • IsEmptyElement: tells you whether the current element is an empty element; for example, <Address/>

    • GetAttribute(String name): returns the value of the named attribute, or null if the current element has no such attribute
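The sketch below prints each of these properties for every start tag. ElementInfo.Describe is a hypothetical helper, and the type attribute it looks for is an assumption made for illustration. Given <c:Contacts xmlns:c="http://keithba.com/Contacts"><Address type="home"/></c:Contacts>, it should return these two rows:

  • c:Contacts|Contacts|http://keithba.com/Contacts|False|(none)
  • Address|Address||True|home

The rows show three things. Name includes the prefix while LocalName omits it. The unprefixed child has no namespace. GetAttribute returns null for a missing attribute.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ElementInfo
{
    // For each start tag, returns "Name|LocalName|NamespaceURI|IsEmptyElement|type".
    // The "type" attribute is a hypothetical example attribute.
    public static List<string> Describe( string xml )
    {
        List<string> result = new List<string>();
        XmlTextReader reader = new XmlTextReader( new StringReader( xml ) );
        while( reader.Read() )
        {
            // Only start tags carry these properties meaningfully
            if( reader.NodeType != XmlNodeType.Element )
                continue;

            // GetAttribute returns null when the attribute is absent
            string type = reader.GetAttribute( "type" ) ?? "(none)";
            result.Add( reader.Name + "|" + reader.LocalName + "|" +
                        reader.NamespaceURI + "|" + reader.IsEmptyElement + "|" + type );
        }
        reader.Close();
        return result;
    }
}
```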

Writing Streams

Writing XML to a stream is also very easy; in fact, it's simpler than reading from one. With the .NET Framework, you use the XmlTextWriter class. In general, the process is as follows:

  1. Create an XmlTextWriter object, based on a stream or filename.

  2. Call the WriteStartDocument method to write the XML declaration (<?xml version="1.0" ...?>) to the stream.

  3. For each element, start by using the WriteStartElement method.

  4. Now write the data for that element, usually via the WriteString method. Attributes are written with the WriteAttributeString method, or with WriteStartAttribute and WriteEndAttribute. They must be written before the element's content.

  5. Close the element with the WriteEndElement method.

  6. Close the document with the WriteEndDocument method, and then call Close() to flush the output and close the underlying stream.

Listing 8.8 shows an example that uses these methods.

Listing 8.8 Using the XmlTextWriter Class
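// Requires: using System.Text; using System.Xml;
XmlTextWriter writer = new XmlTextWriter(
         "c:\\test.xml",
         Encoding.UTF8 );
writer.Formatting = Formatting.Indented;   // one element per line
writer.WriteStartDocument();               // writes the <?xml ...?> declaration
     writer.WriteStartElement(
          "c", "Contacts", "http://keithba.com/Contacts");
          writer.WriteStartElement("Address");
               writer.WriteString("100 Main Street.");
          writer.WriteEndElement();        // </Address>
     writer.WriteEndElement();             // </c:Contacts>
writer.WriteEndDocument();
writer.Close();                            // flushes and closes the file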

The code in Listing 8.8 will write the following XML to the file C:\test.xml:

<?xml version="1.0" encoding="utf-8"?>
<c:Contacts xmlns:c="http://keithba.com/Contacts">
  <Address>100 Main Street.</Address>
</c:Contacts>

There are many useful methods and properties on the XmlTextWriter class, including some of the ones used in this example:

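  • The Formatting property sets whether or not the XML is indented as it is written. This property can be either Formatting.Indented or Formatting.None (the default). The related Indentation property sets how many indent characters are written per level.

  • The WriteStartElement method writes a start tag: <Address>.

  • The WriteEndElement method writes the matching end tag: </Address>. If nothing was written inside the element, it writes a self-closing tag (<Address />) instead.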

  • The WriteAttributeString method writes out an attribute and its value.
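The sketch below writes to a StringWriter rather than a file, so the result can be inspected directly. It combines the Formatting property, WriteAttributeString, and an element with no content. ContactWriter.Build is a hypothetical name, and the type attribute and Phone element are made up for illustration.

One detail worth knowing: WriteStartDocument takes the encoding from the underlying writer. A StringWriter is UTF-16, so here the declaration says encoding="utf-16" rather than utf-8.

```csharp
using System.IO;
using System.Xml;

public static class ContactWriter
{
    // Writes a small contacts document into memory and returns it as a string.
    public static string Build()
    {
        StringWriter text = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter( text );
        writer.Formatting = Formatting.Indented;  // one element per line
        writer.Indentation = 2;                   // two indent characters per level

        writer.WriteStartDocument();              // declaration reports utf-16 here
        writer.WriteStartElement( "c", "Contacts", "http://keithba.com/Contacts" );
        writer.WriteStartElement( "Address" );
        writer.WriteAttributeString( "type", "home" );  // attributes come before content
        writer.WriteString( "100 Main Street." );
        writer.WriteEndElement();                       // </Address>
        writer.WriteStartElement( "Phone" );
        writer.WriteEndElement();                       // no content: self-closing tag
        writer.WriteEndElement();                       // </c:Contacts>
        writer.WriteEndDocument();
        writer.Flush();
        return text.ToString();
    }
}
```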

A Sample Stream-Based Application

Suppose you want to read certain values from a web.config file. Such files control the settings of ASP.NET applications, yet there is no easy-to-use API for manipulating the values inside them. Listing 8.9 shows an example web.config.

Listing 8.9 Using web.config
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.web>
    <compilation
         defaultLanguage="c#"
         debug="true"
    />
    <customErrors
    mode="Off"
    />
    <authentication mode="Forms" >
          <forms loginUrl="login.aspx" timeout="60" />
    </authentication>
    <trace
        enabled="false"
        requestLimit="10"
        pageOutput="false"
        traceMode="SortByTime"
        localOnly="true"
    />
    <sessionState
            mode="InProc"
            stateConnectionString="tcpip=127.0.0.1:42424"
            sqlConnectionString="data source=127.0.0.1;"
            cookieless="false"
            timeout="20"
    />
    <globalization
            requestEncoding="utf-8"
            responseEncoding="utf-8"
    />
  </system.web>
</configuration>

Now suppose you want to adjust the default language, the debug setting, and the custom errors mode through a simple Windows application, as shown in Figure 8.2. It would be fairly simple to read these particular values with the XmlTextReader, as shown in Listing 8.10.

Listing 8.10 Using XmlTextReader
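// Requires: using System.IO; using System.Xml;
// txtLang, txtDebug, and txtCustom are the text boxes on the form
using( FileStream stream = new FileStream( filename, FileMode.Open ) )
{
    XmlTextReader reader = new XmlTextReader( stream );

    while( reader.Read() )
    {
        // Only start tags carry attributes, so skip every other node type
        if( reader.NodeType != XmlNodeType.Element )
            continue;

        if( reader.Name == "compilation" )
        {
            txtLang.Text = reader.GetAttribute( "defaultLanguage" );
            txtDebug.Text = reader.GetAttribute( "debug" );
        }
        else if( reader.Name == "customErrors" )
        {
            txtCustom.Text = reader.GetAttribute( "mode" );
        }
    }
}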

Figure 8.2 A GUI for Changing web.config

And, to write out the changes, you could use the XmlTextWriter class, as in Listing 8.11.

Listing 8.11 Writing Changes with XmlTextWriter
// Requires: using System.Text; using System.Xml;
// Note: this writes only the two elements edited on the form, so every
// other setting in the original web.config (authentication, trace,
// sessionState, and so on) is lost. A complete version must also read
// oldFile with an XmlTextReader and copy the remaining nodes through.
XmlTextWriter writer = new XmlTextWriter( newFile, Encoding.UTF8 );
writer.Formatting = Formatting.Indented;
writer.WriteStartDocument();
     writer.WriteStartElement("configuration");
          writer.WriteStartElement("system.web");
               writer.WriteStartElement("compilation");
                    writer.WriteAttributeString("defaultLanguage", txtLang.Text);
                    writer.WriteAttributeString("debug", txtDebug.Text);
               writer.WriteEndElement();
               writer.WriteStartElement("customErrors");
                    writer.WriteAttributeString("mode", txtCustom.Text);
               writer.WriteEndElement();
          writer.WriteEndElement();
     writer.WriteEndElement();
writer.WriteEndDocument();
writer.Close();

There are two important things to note from this example. First, unless you only need to read XML or only need to write it, you will need both the XmlTextReader and the XmlTextWriter, because each class handles only one of those operations. Second, manipulating streams of XML efficiently takes a fair amount of code. That's why we'll look next at the XML DOM programming model.
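To make Listing 8.11 keep the rest of the file, the reader and writer can be combined into a copy loop. The sketch below copies each node from an XmlTextReader to an XmlTextWriter and swaps in new attribute values for the elements you name.

Some assumptions apply. ConfigRewriter.Rewrite and its overrides dictionary are hypothetical, and the sketch works on strings rather than files for brevity. It handles only the node types that appear in Listing 8.9, plus comments. CDATA sections, processing instructions other than the XML declaration, and DTDs would need additional cases.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ConfigRewriter
{
    // Copies every node of 'input' unchanged, except that for each element
    // named in 'overrides' (element name -> attribute name -> new value)
    // the listed attribute values are replaced.
    public static string Rewrite( string input,
        Dictionary<string, Dictionary<string, string>> overrides )
    {
        XmlTextReader reader = new XmlTextReader( new StringReader( input ) );
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter( output );

        while( reader.Read() )
        {
            switch( reader.NodeType )
            {
                case XmlNodeType.XmlDeclaration:
                    writer.WriteProcessingInstruction( reader.Name, reader.Value );
                    break;
                case XmlNodeType.Element:
                    // Capture these before MoveToNextAttribute moves the reader
                    bool isEmpty = reader.IsEmptyElement;
                    Dictionary<string, string> changes;
                    overrides.TryGetValue( reader.Name, out changes );
                    writer.WriteStartElement( reader.Prefix, reader.LocalName, reader.NamespaceURI );
                    while( reader.MoveToNextAttribute() )
                    {
                        string value = reader.Value;
                        if( changes != null && changes.ContainsKey( reader.Name ) )
                            value = changes[reader.Name];   // substitute the new value
                        writer.WriteAttributeString( reader.Prefix, reader.LocalName,
                                                     reader.NamespaceURI, value );
                    }
                    if( isEmpty )
                        writer.WriteEndElement();   // written as <name ... />
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString( reader.Value );
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace( reader.Value );   // keeps the original layout
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment( reader.Value );
                    break;
            }
        }
        reader.Close();
        writer.Flush();
        return output.ToString();
    }
}
```

XmlWriter also provides WriteNode and WriteAttributes, which copy nodes wholesale. The loop is spelled out here so that individual attribute values can be replaced on the way through. Its length also illustrates the second point above.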

DOM-Based Programming

Another way to manipulate XML with the .NET Framework (or most other platforms) is to use a DOM (Document Object Model), an in-memory tree that represents the entire XML document. Because the whole document is loaded at once, you can navigate and change it in any order. You can also work with it generically, and with less effort than the stream code required.
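As a sketch of that difference, here is how the debug setting from the web.config example could be changed with XmlDocument. It takes a load, a lookup, and one SetAttribute call. Every other node in the document survives, because the whole tree is written back out. DomEdit.SetDebug is a hypothetical helper, and it returns the input unchanged if there is no compilation element.

```csharp
using System.Xml;

public static class DomEdit
{
    // Loads the whole document, sets the debug attribute of the first
    // <compilation> element, and returns the re-serialized document.
    public static string SetDebug( string xml, string newValue )
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml( xml );

        // GetElementsByTagName searches the entire tree in document order;
        // the indexer returns null when the list is empty
        XmlElement compilation = dom.GetElementsByTagName( "compilation" )[0] as XmlElement;
        if( compilation == null )
            return xml;

        compilation.SetAttribute( "debug", newValue );   // replaces the old value
        return dom.OuterXml;                             // all other nodes are kept
    }
}
```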

For example, to display the same XML file that we ran through the streaming sample, we can use the XmlDocument class (one of .NET's many DOM classes), as in Listing 8.12.

Listing 8.12 Writing Code with the XmlDocument Class
using System;
using System.Xml;

class Class1
{
     [STAThread]
     static void Main(string[] args)
     {
          XmlDocument dom = new XmlDocument();
          dom.Load( @"c:\data.xml" );
          PrintChildNodes( dom, 0 );
          Console.ReadLine();
     }

     static void PrintChildNodes( XmlNode theNode, int tabNumber )
     {
          foreach( XmlNode node in theNode.ChildNodes )
          {
                // one space of indentation per level of nesting
                String tabs = new String( ' ', tabNumber );
               Console.WriteLine( tabs + node.NamespaceURI + ":" +
                          node.Name + " = " + node.Value );
               if( node.HasChildNodes )
               {
                    PrintChildNodes( node, tabNumber + 1 );
               }
          }
     }
}

The code in Listing 8.12 prints the output in Listing 8.13 to the console.

Listing 8.13 Output from the XmlDocument Code in Listing 8.12
:xml = version="1.0" encoding="utf-8"
http://radiosharp/:RadioConfig =
 http://radiosharp/:Users =
  http://radiosharp/:User =
   http://radiosharp/:Username =
    :#text = DizzyLizzy
   http://radiosharp/:Password =
    :#text = 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
   http://radiosharp/:Playlists =
  http://radiosharp/:User =
   http://radiosharp/:Username =
    :#text = Stranger
   http://radiosharp/:Password =
    :#text = 5D6033183093FA36265860CDDDC9FD88B63BB138
   http://radiosharp/:Playlists =

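Notice that element nodes have no values, because their text is held in child #text nodes. The XML declaration and the #text nodes have no namespace. A big difference from the stream-based output is the complete absence of Whitespace nodes. By default, XmlDocument discards insignificant white space, such as the spaces and carriage returns between elements. Set its PreserveWhitespace property to true before loading if you need to keep them.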
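A quick way to see the PreserveWhitespace behavior is sketched below (WhitespaceDemo is a hypothetical name). Take a root element containing two child elements on separate indented lines. The default load reports 2 child nodes for the root. A load with PreserveWhitespace set to true reports 5, because the three runs of white space between the tags become Whitespace nodes.

```csharp
using System.Xml;

public static class WhitespaceDemo
{
    // Loads 'xml' and returns how many child nodes the root element has.
    // PreserveWhitespace must be set before loading to have any effect.
    public static int CountRootChildren( string xml, bool preserveWhitespace )
    {
        XmlDocument dom = new XmlDocument();
        dom.PreserveWhitespace = preserveWhitespace;
        dom.LoadXml( xml );
        return dom.DocumentElement.ChildNodes.Count;
    }
}
```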

Now, using the web.config sample from Listing 8.9, we can write similar code (Listing 8.14) to access the same values we read earlier. I wouldn't do this in practice, though, because the position of nodes in this file is not guaranteed.

Listing 8.14 Searching XML for a Specific Attribute
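XmlDocument dom = new XmlDocument();
dom.Load( filename );   // white space between elements is discarded by default

// dom.ChildNodes[0] is the XML declaration and [1] is <configuration>;
// its first child is <system.web>, whose first two children are
// <compilation> and <customErrors>
XmlNode systemWeb = dom.ChildNodes[1].ChildNodes[0];
txtLang.Text = systemWeb.ChildNodes[0].Attributes["defaultLanguage"].Value;
txtDebug.Text = systemWeb.ChildNodes[0].Attributes["debug"].Value;
txtCustom.Text = systemWeb.ChildNodes[1].Attributes["mode"].Value;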
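A sturdier alternative, sketched below, selects nodes by name with an XPath expression through SelectSingleNode instead of relying on their position. ConfigLookup.GetAttribute is a hypothetical helper. XmlElement.GetAttribute returns an empty string for a missing attribute, so the helper checks HasAttribute first and returns null instead.

```csharp
using System.Xml;

public static class ConfigLookup
{
    // Finds the first element matching 'xpath' and returns the value of
    // 'attribute' on it, or null when the element or attribute is missing.
    public static string GetAttribute( XmlDocument dom, string xpath, string attribute )
    {
        XmlElement element = dom.SelectSingleNode( xpath ) as XmlElement;
        // Check HasAttribute to tell "absent" apart from "present but empty"
        if( element == null || !element.HasAttribute( attribute ) )
            return null;
        return element.GetAttribute( attribute );
    }
}
```

With Listing 8.9 loaded, ConfigLookup.GetAttribute( dom, "/configuration/system.web/compilation", "defaultLanguage" ) returns c#, regardless of where the compilation element sits among its siblings.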

In general, I don't recommend DOM programming with .NET. It is slower and uses more memory than the stream-based parsers, because the entire document must be held in memory. And both benefits that the DOM offers, less code to write and an easy-to-understand model, can also be obtained with XML Serialization. However, there are exceptions, so you should be aware of this technology.