June 7, 2009, 3:19 p.m.
posted by angryuser
Describing XML with Schemas
Schemas are the solution to a problem that may not be entirely obvious at first. The problem is this: How do I describe what any potential XML document will look like, using XML? If you are like me, that seems like a nice idea, but where's the problem? Basically, developers need to know what the XML can look like, so that they can write their code correctly. You can imagine that this might be in the form of an e-mail from one developer to another:
To: Hervey
From: Keith
Subject: What the Contact's XML doc can look like
Hey Hervey
I've added a new subsystem to the GigaWatt program.
Basically, I'm now saving out the contact information as XML – like I mentioned in our last meeting. Since you have to read that stuff in for the subsystem you are working on, I thought I'd better describe what it can look like.
The root element will always be called <Contacts>. It can then have any number of <Contact> elements underneath; each of these can have a <Name> element, along with an <Address> element. Also, you can have an <Address> element for a URL as well. Everything is in the usual namespaces as well. You'll figure it out.
Thanks, KeithBa
This prose is certainly readable by one human to another, but how in the world do you figure out exactly what that XML is supposed to look like? Examples may help, but examples aren't an exhaustive set of documents. This way lie madness and bugs.
Now, the XML 1.0 specification does come with a simple way to describe XML: DTDs (document type definitions). The problems with DTDs are legion, but here are the two largest: They aren't XML documents themselves, and they don't tell you everything about the XML documents' possible shapes. That they aren't in XML themselves is significant: If we had an XML language for describing XML documents, then building processors that could understand this XML language would be a lot easier, and we would get all of the other benefits of XML.
Thus, the W3C gave us XML Schemas. XML Schemas are XML descriptions of a set of XML documents. The best metaphor is that of classes and objects. An XML Schema is a class; and any particular XML document that matches that schema is an instance of that class, an object. I won't try to describe XML Schemas to you exhaustively, but the rest of this chapter covers the major points. Listing 8.15 shows an example schema.
Listing 8.15 An XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="BookOrder" type="BookOrderType"/>
  <xsd:complexType name="BookOrderType">
    <xsd:sequence>
      <xsd:element name="shippingAddress" type="Address"/>
      <xsd:element name="books" type="Books"/>
    </xsd:sequence>
    <xsd:attribute name="DateOfOrder" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Books">
    <xsd:sequence>
      <xsd:element name="book" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:minExclusive value="100"/>
                  <xsd:maxExclusive value="1000"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
          </xsd:sequence>
          <xsd:attribute name="ISBN" type="isbnType" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{10}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
Listing 8.15 describes XML that may look like Listing 8.16.
Listing 8.16 An Instance of the Example XML Schema
<BookOrder DateOfOrder="2001-01-01" xmlns="">
  <shippingAddress>
    <name>Lara Ballinger</name>
    <street>100 Main Street</street>
    <city>North Bend</city>
    <state>WA</state>
    <zip>98045</zip>
  </shippingAddress>
  <books>
    <book ISBN="0000000000">
      <Title>Everyone Loves Keith</Title>
      <quantity>999</quantity>
    </book>
    <book ISBN="1111111111">
      <Title>No Really, I Love Keith</Title>
      <quantity>101</quantity>
    </book>
  </books>
</BookOrder>
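To see what a schema buys us, here is one way Keith's e-mail from the start of this section could be pinned down as a schema. This is only a sketch: the e-mail never says whether <Name> and <Address> are required or how many addresses a contact may have, so the minOccurs and maxOccurs values below are guesses, and the unspecified "usual namespaces" are left out entirely.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema for the Contacts document described in Keith's e-mail. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Contacts">
    <xsd:complexType>
      <xsd:sequence>
        <!-- "any number of <Contact> elements": zero or more -->
        <xsd:element name="Contact" type="ContactType"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="ContactType">
    <xsd:sequence>
      <!-- Assumption: a contact "can have" a Name, so it is optional here -->
      <xsd:element name="Name" type="xsd:string" minOccurs="0"/>
      <!-- Assumption: zero or more Address elements, postal or URL alike -->
      <xsd:element name="Address" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Every question the e-mail left open now has an explicit answer that a program can check, even if Hervey turns out to disagree with the answer.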
Data Types with XML Schema
Several built-in data types come with XML Schema. These include the normal set of things such as string, int, and decimal. There are also several XML-specific data types, such as anyURI and QName (qualified name). For example, consider the following XML document fragment:
<Person>
  <Name>Alex DeJarnett</Name>
  <Age Calendar="c:gregorian">25</Age>
</Person>
In this XML fragment, the data type of the <Name> element could be a string; the data type for the <Age> element could be an int; and the data type for the Calendar attribute is probably a QName. I say "could be," because without looking at the schema that describes this fragment, we don't really know, although we can guess in some cases. Many of these data types are defined in the actual schema document for XML Schemas. When any simple data type is defined, whether in that document or within a custom schema (such as the book-ordering schema in Listing 8.15), the element <simpleType> is used. Listing 8.17 shows an example.
Listing 8.17 Using <simpleType> to Define Data Types
<xsd:simpleType name="isbnType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{10}"/>
  </xsd:restriction>
</xsd:simpleType>
In this <simpleType> we define its name to be "isbnType". We also say that it is derived from the base type "xsd:string", which is the QName of the base string type as defined in the schema for XML Schemas. Next, we state that it is a subtype of string that only contains values that match the pattern "\d{10}". This pattern is a regular expression which says that the value is a string of exactly ten digits, which is how this schema models a book's ISBN. Notice that this defines a named type, not a specific element called <isbnType>. Like string or int, it can be applied to any element or attribute. Simple types also can be defined implicitly, as an anonymous type inside an element declaration, such as with the quantity of books in Listing 8.18.
Listing 8.18 Defining Simple Data Types Implicitly
<xsd:element name="quantity">
  <xsd:simpleType>
    <xsd:restriction base="xsd:positiveInteger">
      <xsd:minExclusive value="100"/>
      <xsd:maxExclusive value="1000"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
In this case, instead of defining a named data type, we define the element <quantity> with an anonymous type derived from positiveInteger (which is itself derived, by way of nonNegativeInteger, from integer, restricted to positive numbers only). The <minExclusive> and <maxExclusive> elements restrict it further to values strictly between 100 and 1,000, that is, 101 through 999.
Simple types also can be derived to be a list of a particular type:
<xsd:simpleType name="listOfIntegers">
  <xsd:list itemType="xsd:integer"/>
</xsd:simpleType>
This describes a whitespace-delimited list of integers, for example:
<SomeElement>4 10 11</SomeElement>
Describing the XML Shape
In addition to giving us a way to specify the data type of elements and attributes, XML Schema also allows us to specify the shape of XML, such as what elements are children of other elements, what attributes they may have, and how often an element may occur. Typically, this is done with the <complexType> element, as in Listing 8.19. A complex type declaration typically describes the tree of elements that may occur, including which ones are required as well as which attributes are optional.
Listing 8.19 Using <complexType> to Specify the Shape of XML
<xsd:complexType name="BookType">
  <xsd:sequence>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="quantity" type="xsd:integer"/>
  </xsd:sequence>
  <xsd:attribute name="ISBN" type="isbnType" use="required"/>
</xsd:complexType>
This complex type states that the element to which it is applied will contain two child elements: <Title> and <quantity>, as well as an attribute called ISBN:
<Book ISBN="0000000000">
  <Title>Living without Web services: A horror story</Title>
  <quantity>1</quantity>
</Book>
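The <Book> fragment above presumes that some element declaration ties the name Book to the complex type. A minimal sketch of a complete schema that does so follows. The global <Book> declaration is an assumption, since the listings do not show one, and isbnType is repeated from Listing 8.17 so that the schema stands on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed global declaration: applies BookType to an element named Book -->
  <xsd:element name="Book" type="BookType"/>

  <xsd:complexType name="BookType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:integer"/>
    </xsd:sequence>
    <xsd:attribute name="ISBN" type="isbnType" use="required"/>
  </xsd:complexType>

  <!-- Repeated from Listing 8.17 -->
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{10}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Because the children are wrapped in <sequence>, a <Book> whose <quantity> comes before its <Title> would be rejected, as would one missing the ISBN attribute.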
In addition to the <sequence> element, which specifies that the elements must appear in the order declared in the schema, there is the <all> element, which allows the elements to appear in any order. Listing 8.20 illustrates another interesting element: <choice>, which states that exactly one of the listed elements may occur.
Listing 8.20 Using the <choice> Element
<xsd:complexType name="BookType">
  <xsd:choice>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="ISBN" type="isbnType"/>
  </xsd:choice>
</xsd:complexType>
Now, the following is legal XML, based on the schema in Listing 8.20:
<Book>
  <Title>Living without Web services: A horror story</Title>
</Book>
The following also is legal:
<Book>
  <ISBN>0000000000</ISBN>
</Book>
But this is not legal:
<Book>
  <Title>Living without Web services: A horror story</Title>
  <ISBN>0000000000</ISBN>
</Book>
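For comparison, here is a sketch of the <all> group mentioned earlier, which has no listing of its own. It reuses the Title and quantity children from Listing 8.19; the type name UnorderedBookType and the global <Book> declaration are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- UnorderedBookType is a hypothetical name used only for this sketch -->
  <xsd:element name="Book" type="UnorderedBookType"/>

  <xsd:complexType name="UnorderedBookType">
    <!-- all: each child appears once here, in either order -->
    <xsd:all>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:integer"/>
    </xsd:all>
  </xsd:complexType>
</xsd:schema>
```

Under this type, a <Book> with <quantity> first and <Title> second is accepted, as is the reverse; under <sequence> only the declared order would be. Keep in mind that XML Schema 1.0 does not let a child of <all> occur more than once.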
We can also constrain how often an element occurs with the minOccurs and maxOccurs attributes, as shown in Listing 8.21.
Listing 8.21 Using minOccurs and maxOccurs
<xsd:complexType name="BookType">
  <xsd:sequence>
    <xsd:element name="Title"
                 minOccurs="1" maxOccurs="1" type="xsd:string"/>
    <xsd:element name="quantity"
                 minOccurs="1" maxOccurs="1" type="xsd:integer"/>
  </xsd:sequence>
  <xsd:attribute name="ISBN" type="isbnType" use="required"/>
</xsd:complexType>
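Since 1 is the default for both minOccurs and maxOccurs, the attributes in Listing 8.21 only spell out what Listing 8.19 already implied. They become more interesting with other values. The sketch below adds two children that are not part of the original listings, Subtitle and Author, purely to illustrate an optional element and a repeating one; the ISBN attribute is left out to keep the sketch self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book" type="BookType"/>

  <xsd:complexType name="BookType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <!-- Hypothetical: optional, at most once -->
      <xsd:element name="Subtitle" type="xsd:string" minOccurs="0"/>
      <!-- Hypothetical: at least one author, no upper limit -->
      <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="quantity" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

With this type, a <Book> may omit <Subtitle> altogether and may list several <Author> elements in a row, but it must still contain exactly one <Title> and one <quantity>.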