Reading and Accessing XML Data in Document Order





Problem

You need to read in all the elements of an XML document and obtain information about each element, such as its name and attributes.

Solution

Create an XmlReader and use its Read method to process the document, as shown in the following example.

Reading an XML document

using System;
using System.IO;    // MemoryStream
using System.Text;  // Encoding
using System.Xml;

// …

public static void Indent(int level) 
{ 
    for (int i = 0; i < level; i++) 
      Console.Write(" "); 
}

public static void AccessXML( ) 
{
    string xmlFragment = "<?xml version='1.0'?>" +
        "<!-- My sample XML -->" +
        "<?pi myProcessingInstruction?>" +
        "<Root>" +
        "<Node1 nodeId='1'>First Node</Node1>" +
        "<Node2 nodeId='2'>Second Node</Node2>" +
        "<Node3 nodeId='3'>Third Node</Node3>" +
        "</Root>";

    byte[] bytes = Encoding.UTF8.GetBytes(xmlFragment);
    using (MemoryStream memStream = new MemoryStream(bytes))
    {

        XmlReaderSettings settings = new XmlReaderSettings();
        // Check for any illegal characters in the XML.
        settings.CheckCharacters = true;

        using (XmlReader reader = XmlReader.Create(memStream, settings))
        {
            int level = 0;
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.CDATA: 
                        Indent(level); 
                        Console.WriteLine("CDATA: {0}", reader.Value); 
                        break;
                    case XmlNodeType.Comment: 
                        Indent(level); 
                        Console.WriteLine("COMMENT: {0}", reader.Value); 
                        break;
                    case XmlNodeType.DocumentType: 
                        Indent(level);
                        Console.WriteLine("DOCTYPE: {0}={1}", 
                            reader.Name, reader.Value); 
                        break;
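                    case XmlNodeType.Element: 
                        Indent(level); 
                        Console.WriteLine("ELEMENT: {0}", reader.Name);
                        // An empty element such as <Node/> produces no
                        // EndElement node, so remember that before the
                        // reader moves onto the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        level++;
                        while (reader.MoveToNextAttribute()) 
                        {
                            Indent(level); 
                            Console.WriteLine("ATTRIBUTE: {0}='{1}'",
                                reader.Name, reader.Value); 
                        } 
                        if (isEmpty)
                            level--;
                        break;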
                    case XmlNodeType.EndElement: 
                        level--; 
                        break;
                    case XmlNodeType.EntityReference: 
                        Indent(level); 
                        Console.WriteLine("ENTITY: {0}", reader.Name); 
                        break;
                    case XmlNodeType.ProcessingInstruction: 
                        Indent(level); 
                        Console.WriteLine("INSTRUCTION: {0}={1}",
                            reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Text: 
                        Indent(level); 
                        Console.WriteLine("TEXT: {0}", reader.Value);
                        break;
                    case XmlNodeType.XmlDeclaration: 
                        Indent(level); 
                        Console.WriteLine("DECLARATION: {0}={1}",
                            reader.Name, reader.Value);
                        break; 
                } 
            } 
        }
    } 
}

This code dumps the XML document in a hierarchical format:

	DECLARATION: xml=version='1.0'
	COMMENT: My sample XML
	INSTRUCTION: pi=myProcessingInstruction
	ELEMENT: Root
	 ELEMENT: Node1
	  ATTRIBUTE: nodeId='1'
	  TEXT: First Node
	 ELEMENT: Node2
	  ATTRIBUTE: nodeId='2'
	  TEXT: Second Node
	 ELEMENT: Node3
	  ATTRIBUTE: nodeId='3'
	  TEXT: Third Node

Discussion

Reading existing XML and identifying the different node types is one of the fundamental tasks you will perform when working with XML. The code in the Solution builds a MemoryStream from a string and creates an XmlReader over it (a file stream or any other Stream would work the same way), then iterates over the nodes and writes an indented outline of the document to the console window.
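When the XML is already held in a string, the round trip through a byte array and a MemoryStream is optional. The following is a minimal sketch, not part of the Solution, that passes a StringReader to the XmlReader.Create overload that accepts a TextReader. The class name StringReaderExample and the method FirstElementName are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StringReaderExample
{
    // Hypothetical helper: returns the name of the first element
    // in the XML text, or null if the text contains no element.
    public static string FirstElementName(string xml)
    {
        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.Name;
            }
        }
        return null;
    }

    public static void Main()
    {
        string xml = "<?xml version='1.0'?><Root><Node1/></Root>";
        Console.WriteLine(FirstElementName(xml));
    }
}
```

A stream remains the better fit when the XML arrives as bytes (a file or a network response), because the reader can then detect the encoding itself.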

The Solution builds the MemoryStream from XML text held in a string, like this:

	    string xmlFragment = "<?xml version='1.0'?>" +
	        "<!-- My sample XML -->" +
	        "<?pi myProcessingInstruction?>" +
	        "<Root>" +
	        "<Node1 nodeId='1'>First Node</Node1>" +
	        "<Node2 nodeId='2'>Second Node</Node2>" +
	        "<Node3 nodeId='3'>Third Node</Node3>" +
	        "</Root>";

	    byte[] bytes = Encoding.UTF8.GetBytes(xmlFragment);
	    MemoryStream memStream = new MemoryStream(bytes);

Once the MemoryStream has been established, the settings for the XmlReader are configured on an XmlReaderSettings instance. The CheckCharacters setting tells the XmlReader to check for illegal characters in the XML. It is true by default, so the assignment here simply makes the intent explicit:

	    XmlReaderSettings settings = new XmlReaderSettings();
	    // Check for any illegal characters in the XML.
	    settings.CheckCharacters = true;
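To see what the setting does, the sketch below tries to read a document containing the character reference &#x1;, which is not a legal XML 1.0 character. With CheckCharacters set to true the reader is expected to throw an XmlException; the same call with the setting turned off is shown for comparison. The class name CheckCharactersExample and the method CanRead are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CheckCharactersExample
{
    // Hypothetical helper: returns true if the whole document can be
    // read without the reader raising an XmlException.
    public static bool CanRead(string xml, bool checkCharacters)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = checkCharacters;

        try
        {
            using (XmlReader reader =
                XmlReader.Create(new StringReader(xml), settings))
            {
                // Read every node; the body is empty on purpose.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // &#x1; refers to a control character that XML 1.0 disallows.
        string xml = "<Root>&#x1;</Root>";
        Console.WriteLine("Checked:   {0}", CanRead(xml, true));
        Console.WriteLine("Unchecked: {0}", CanRead(xml, false));
    }
}
```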

The while loop reads the XML one node at a time and examines the reader's NodeType property to determine what type of XML node the reader is currently positioned on:

	    while (reader.Read( ))
	    {
	        switch (reader.NodeType)
	        {

The NodeType property returns an XmlNodeType enumeration value that identifies the type of the current node. The XmlNodeType enumeration values are listed in the following table.

The XmlNodeType enumeration values

Name                    Description
----                    -----------
Attribute               An attribute node of an element.
CDATA                   A CDATA section, which marks text that would otherwise be treated as markup so that it is read as character data.
Comment                 A comment in the XML: <!-- my comment -->.
Document                The root of the XML document tree.
DocumentFragment        A document fragment node.
DocumentType            The document type declaration.
Element                 An element start tag: <myelement>.
EndElement              An element end tag: </myelement>.
EndEntity               Returned at the end of an entity after calling ResolveEntity.
Entity                  An entity declaration.
EntityReference         A reference to an entity.
None                    The node type reported if Read has not yet been called on the XmlReader.
Notation                A notation in the DTD (document type definition).
ProcessingInstruction   A processing instruction: <?pi myProcessingInstruction?>.
SignificantWhitespace   Whitespace in a mixed content model, or whitespace that is being preserved.
Text                    The text content of a node.
Whitespace              The whitespace between markup entries.
XmlDeclaration          The XML declaration: <?xml version='1.0'?>. It must be the first node in the document and cannot have children.
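The Solution's sample document has no whitespace between its tags, so the switch statement never encounters a Whitespace node. Documents formatted with line breaks and indentation do produce them. The sketch below counts Whitespace nodes in a small indented document, first with the default settings and then with the IgnoreWhitespace property of XmlReaderSettings set to true. The class name WhitespaceExample and the method CountWhitespaceNodes are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceExample
{
    // Hypothetical helper: counts the nodes the reader reports
    // as XmlNodeType.Whitespace.
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;

        int count = 0;
        using (XmlReader reader =
            XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Line breaks and indentation sit between the tags.
        string xml = "<Root>\n  <Node1>First Node</Node1>\n</Root>";
        Console.WriteLine("Default:          {0}",
            CountWhitespaceNodes(xml, false));
        Console.WriteLine("IgnoreWhitespace: {0}",
            CountWhitespaceNodes(xml, true));
    }
}
```

For this input the default reader should report two Whitespace nodes (one before Node1 and one after it), and none once IgnoreWhitespace is enabled. A loop like the one in the Solution can either add a case for XmlNodeType.Whitespace or enable IgnoreWhitespace so those nodes are never surfaced.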


See Also

See the "XmlReader Class," "XmlNodeType Enumeration," and "MemoryStream Class" topics in the MSDN documentation.