XML Documents and Namespaces




XML documents are simple to produce. In this section, I'll cover how to create XML documents and the rules to follow. Two major specifications impact nearly every XML document: XML 1.0 and Namespaces in XML 1.0. Instead of reiterating these, I will focus on showing through example and informal text how modern systems are creating and using XML.

XML Specifications

You can find most XML specifications at the Web site of the W3C (World Wide Web Consortium). The best place to start is http://www.w3.org/XML/, which also links to related specifications such as XML Inclusions (XInclude) and the XML Information Set.

An XML document is made up of elements, and elements can carry attributes. Here is an element:

<Address>100 Main Street, North Bend, WA 98045</Address> 

Notice that the <Address> element contains some textual data and is closed by an end tag, </Address>. We could add an attribute to the <Address> element to describe the data further:

<Address addressType="Business"> 
     100 Main Street, North Bend, WA 98045
</Address>

Here the addressType attribute states what type of address this is. Besides "Business", its value could also be "Residential". The point is that we can choose to encode data either as elements or as attributes.
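To make the distinction concrete, here is a minimal sketch that reads both pieces of data back out of the <Address> element. It uses Python's standard xml.etree.ElementTree module purely for illustration (this chapter's own tooling is .NET), and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <Address> element shown above, carrying an addressType attribute.
doc = """<Address addressType="Business">
     100 Main Street, North Bend, WA 98045
</Address>"""

address = ET.fromstring(doc)

# Attribute values are looked up by name on the element...
address_type = address.get("addressType")
# ...while the element's content is its text, surrounding whitespace included.
street = address.text.strip()

print(address_type)  # Business
print(street)        # 100 Main Street, North Bend, WA 98045
```

Either encoding is easy to read back; the choice between them is about how the data is modeled, not about what a parser can handle.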

Because we want to use easy-to-read English words (or nearly so) for most element and attribute names, and because different vocabularies will inevitably reuse the same words, we need to qualify these names with a namespace. A namespace in XML is a set of element and attribute names identified by a unique URI (uniform resource identifier). That URI is usually, but not necessarily, a URL. Listing 8.2 is an example.

Listing 8.2 Using a Namespace in an XML Document
<MyDocument>
<Address
     xmlns="http://Keithba.com/Contacts"
     addressType="Business">
     100 Main Street, North Bend, WA 98045
</Address>
<Address xmlns="http://Keithba.com/Bookmarks" >
     http://www.msnbc.com
</Address>
</MyDocument>

Listing 8.2 uses the same word, Address, with two different meanings: the first is a physical address from my list of contacts, and the second is the URL of a bookmark from my Web browser. Because the two elements mean different things, we qualify each with its own namespace.

DEFINITION: URI

Uniform resource identifiers are closely related to URLs (uniform resource locators), which most readers already know from the Web. URIs are the broader category: every URL is a URI, but not every URI is a URL. Whereas a URL must point to a location (e.g., a Web page), a URI is merely a universally unique name, which may or may not point to a location.

Namespaces are declared in an XML document with the special attribute xmlns. Its value is a string: a URI that is (hopefully) unique. Most namespaces are written as http URLs, such as the ones in this example. Keep in mind, however, that nothing guarantees that these URLs return anything; an XML parser treats a namespace name purely as an identifier and does not retrieve it. That said, some namespace owners do publish the schema or other documentation for the namespace at that URL.
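Because a namespace name is only an identifier, parsers compare namespace names as plain strings, character by character. The sketch below, using Python's xml.etree.ElementTree for illustration only, shows that two URLs differing only in capitalization produce two distinct namespaces, and therefore two distinct Address elements.

```python
import xml.etree.ElementTree as ET

# Two namespace URLs that differ only in capitalization.
upper = ET.fromstring('<Address xmlns="http://Keithba.com/Contacts"/>')
lower = ET.fromstring('<Address xmlns="http://keithba.com/contacts"/>')

print(upper.tag)  # {http://Keithba.com/Contacts}Address
print(lower.tag)  # {http://keithba.com/contacts}Address

# Namespace names are compared character by character, so these are two
# different namespaces, and therefore two different Address elements.
print(upper.tag == lower.tag)  # False
```

In practice this means a namespace URI should be copied exactly wherever it is used, rather than retyped.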

In Listing 8.2, the element that carries the default namespace declaration (the xmlns attribute), together with every unprefixed element beneath it, is in that namespace. Unprefixed attributes, however, do not pick up the default namespace; they are in no namespace at all, which is sometimes written as the empty namespace, "". Namespace declarations also let us define prefixes, which serve as shorthand for the entire URI. Prefixes let us namespace-qualify attributes, as well as mix together elements from different namespaces, as shown in Listing 8.3.
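A quick way to see these defaulting rules is to parse Listing 8.2 and look at the names the parser reports. This is a minimal sketch using Python's xml.etree.ElementTree for illustration; it assumes the contacts namespace is written as the absolute URI http://Keithba.com/Contacts. ElementTree spells a namespaced name as {namespace}local-name.

```python
import xml.etree.ElementTree as ET

# Listing 8.2, with the contacts namespace written as an absolute http URI.
LISTING_8_2 = """<MyDocument>
<Address
     xmlns="http://Keithba.com/Contacts"
     addressType="Business">
     100 Main Street, North Bend, WA 98045
</Address>
<Address xmlns="http://Keithba.com/Bookmarks" >
     http://www.msnbc.com
</Address>
</MyDocument>"""

root = ET.fromstring(LISTING_8_2)
contact, bookmark = list(root)

# The root has no default namespace in scope, so its name is unqualified.
print(root.tag)        # MyDocument
# Each Address picks up the default namespace declared on it.
print(contact.tag)     # {http://Keithba.com/Contacts}Address
print(bookmark.tag)    # {http://Keithba.com/Bookmarks}Address

# The unprefixed attribute stays in no namespace, so its key has no {...} part.
print(contact.attrib)  # {'addressType': 'Business'}
```

Note that the xmlns declarations themselves do not show up in attrib; the parser consumes them while resolving names.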

Listing 8.3 Qualifying Attributes by Namespace
<c:Contact
     xmlns:c="http://keithba.com/contacts"
     xmlns:b="http://keithba.com/bookmarks">
     <c:Name>Keith Ballinger, Inc</c:Name>
     <c:Address c:addressType="Business">
          100 Main Street, North Bend, WA 98045
     </c:Address>
     <b:Address>
          http://www.msnbc.com
     </b:Address>
</c:Contact>

In this example, the <Address> elements, along with everything else, are namespace-qualified. An XML parser resolves each prefix to the namespace URI it is bound to and uses that URI, not the prefix itself, to determine the meaning of the element. The declarations on the root element, which bind the prefix "c" to "http://keithba.com/contacts" and the prefix "b" to "http://keithba.com/bookmarks", are what make this work.
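Because only the bound URI matters, renaming the prefixes should not change what the document means. The following sketch checks that with Python's xml.etree.ElementTree (used here for illustration only): it parses Listing 8.3 twice, once with the prefixes c and b and once with the hypothetical prefixes contacts and bm, and compares the resolved element names.

```python
import xml.etree.ElementTree as ET

# Listing 8.3 as a template, so the prefixes can be swapped out.
TEMPLATE = """<{c}:Contact
     xmlns:{c}="http://keithba.com/contacts"
     xmlns:{b}="http://keithba.com/bookmarks">
  <{c}:Name>Keith Ballinger, Inc</{c}:Name>
  <{c}:Address {c}:addressType="Business">
    100 Main Street, North Bend, WA 98045
  </{c}:Address>
  <{b}:Address>
    http://www.msnbc.com
  </{b}:Address>
</{c}:Contact>"""

def element_names(c_prefix, b_prefix):
    """Parse the document with the given prefixes and list every element's name."""
    root = ET.fromstring(TEMPLATE.format(c=c_prefix, b=b_prefix))
    return [element.tag for element in root.iter()]

original = element_names("c", "b")
renamed = element_names("contacts", "bm")

# The prefixes differ, but the resolved names are identical.
print(original == renamed)  # True
print(original[2])          # {http://keithba.com/contacts}Address
print(original[3])          # {http://keithba.com/bookmarks}Address

# The prefixed attribute is namespace-qualified too. The "c" below is a prefix
# local to this Python lookup; it need not match the prefix used in the document.
root = ET.fromstring(TEMPLATE.format(c="contacts", b="bm"))
address = root.find("c:Address", {"c": "http://keithba.com/contacts"})
print(address.attrib)  # {'{http://keithba.com/contacts}addressType': 'Business'}
```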

In general, the combination of an element's (or attribute's) namespace and its local name is called the fully qualified name (FQN) of that element or attribute; the Namespaces in XML specification calls this pair the expanded name. The equation is simple: namespace + local name = FQN.

For example, the fully qualified name of the root element in Listing 8.3 is "http://keithba.com/contacts" + "Contact" = "http://keithba.com/contacts:Contact". This colon-joined notation isn't a standard one, but the XML tools in .NET use it, and I find it compact and easy to read.
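Python's xml.etree.ElementTree writes the same pair as {namespace}local (often called Clark notation), so converting to the colon-joined form is a small string operation. The helpers split_name and fqn below are hypothetical, written only for this illustration; returning just the local name when there is no namespace is intended to mirror what .NET's XmlQualifiedName.ToString() is documented to do, but treat that correspondence as an assumption.

```python
import xml.etree.ElementTree as ET

def split_name(name):
    """Split ElementTree's '{namespace}local' form into (namespace, local name)."""
    if name.startswith("{"):
        namespace, local = name[1:].split("}", 1)
        return namespace, local
    return "", name  # not in any namespace

def fqn(name):
    """Join namespace and local name as 'namespace:local'; no-namespace names stay bare."""
    namespace, local = split_name(name)
    return f"{namespace}:{local}" if namespace else local

root = ET.fromstring('<c:Contact xmlns:c="http://keithba.com/contacts"/>')
print(split_name(root.tag))  # ('http://keithba.com/contacts', 'Contact')
print(fqn(root.tag))         # http://keithba.com/contacts:Contact
print(fqn("MyDocument"))     # MyDocument
```

The colon-joined form looks ambiguous because URIs contain colons themselves, but a local name never contains a colon, so splitting on the last colon recovers both halves.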

One final XML concept to point out is mixed content. Usually an element contains either text (such as the <Address> element in the preceding example) or other elements (such as the <Contact> element in the preceding example). However, an element can also contain both. In Listing 8.4, notice that the first <Address> element contains both text and a child element called <Country>. This is mixed content.

Listing 8.4 Mixed Content in an XML Document
<MyDocument>
<Address
     xmlns="http://Keithba.com/Contacts"
     addressType="Business">
     100 Main Street, North Bend, WA 98045
     <Country>USA</Country>
</Address>
<Address xmlns="http://Keithba.com/Bookmarks" >
     http://www.msnbc.com
</Address>
</MyDocument>

HTML (more precisely, XHTML, its XML formulation) is an example of mixed content: inline elements mark up specific pieces of the text, as shown in the following example:

<html>
<body>
<p>This is some <b>text</b> that is bold.</p>
</body>
</html>
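To see why mixed content complicates code, consider how a typical XML API stores the paragraph above. In Python's xml.etree.ElementTree (used here only for illustration), text before the first child lives on the parent's .text, text inside the child lives on the child's .text, and text after the child is stored as the child's .tail.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is some <b>text</b> that is bold.</p>")
b = p.find("b")

# Text before the first child belongs to the parent...
print(repr(p.text))   # 'This is some '
# ...text inside the child belongs to the child...
print(repr(b.text))   # 'text'
# ...and text after the child is stored as the child's "tail".
print(repr(b.tail))   # ' that is bold.'

# Reassembling the sentence means walking all of those pieces in order.
print("".join(p.itertext()))  # This is some text that is bold.
```

One sentence of prose ends up scattered across three properties of two objects, which suits a renderer walking the tree but is awkward for code that just wants a value.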

In general, it's a bad idea to use mixed content in XML for Web services. Mixed content works best for HTML-like documents: cases in which people will read or interact with the document directly, or in which the document is used to display content, such as in a browser. Because most XML exchanged by Web services is business data, mixed content adds little value and makes the data harder to process in code. So, we could change Listing 8.4 to look like Listing 8.5.

Listing 8.5 An Alternative to Mixed Content
<MyDocument>
<Address
     xmlns="http://Keithba.com/Contacts"
     addressType="Business">
     <Street>
          100 Main Street, North Bend, WA 98045
     </Street>
     <Country>USA</Country>
</Address>
<Address xmlns="http://Keithba.com/Bookmarks" >
     http://www.msnbc.com
</Address>
</MyDocument>
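The payoff shows up when reading the data back. The sketch below, again using Python's xml.etree.ElementTree for illustration and assuming the contacts namespace is the absolute URI http://Keithba.com/Contacts, extracts the street and country from the contact address in each listing. The function names read_mixed and read_structured are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://Keithba.com/Contacts"}

# The contact address from Listing 8.4 (mixed content).
MIXED = """<Address xmlns="http://Keithba.com/Contacts" addressType="Business">
     100 Main Street, North Bend, WA 98045
     <Country>USA</Country>
</Address>"""

# The contact address from Listing 8.5 (elements only).
STRUCTURED = """<Address xmlns="http://Keithba.com/Contacts" addressType="Business">
     <Street>
          100 Main Street, North Bend, WA 98045
     </Street>
     <Country>USA</Country>
</Address>"""

def read_mixed(xml_text):
    address = ET.fromstring(xml_text)
    # The street is loose text, so it is found by position: it sits in .text
    # only because it happens to come before <Country>.
    street = address.text.strip()
    return street, address.findtext("c:Country", namespaces=NS)

def read_structured(xml_text):
    address = ET.fromstring(xml_text)
    # Each value has its own named element, so every field is read the same way.
    street = address.findtext("c:Street", namespaces=NS).strip()
    return street, address.findtext("c:Country", namespaces=NS)

print(read_mixed(MIXED))            # ('100 Main Street, North Bend, WA 98045', 'USA')
print(read_structured(STRUCTURED))  # ('100 Main Street, North Bend, WA 98045', 'USA')
```

Both readers return the same values here, but only read_structured keeps working if the fields are reordered; with mixed content, street text placed after <Country> would land in that element's tail instead.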

Now, let's examine how we can use the .NET Framework to manipulate XML documents.

PIs and DTDs

PIs (processing instructions) and DTDs (document type definitions) are not discussed in this chapter. Generally speaking, these aspects of XML are rarely used with Web services, if at all. To tell the truth, both are outdated. DTDs in particular have been largely superseded by XML Schemas, which are discussed later in this chapter.

Some people would disagree with me about PIs being outdated. But because SOAP prohibits processing instructions in its messages, they offer the Web service developer little value.