The Basics




Before trying to understand SOAP, it's important to understand three other specifications: HTTP, XML, and XML Schema. Chances are you already know something about each of these; if so, feel free to skip ahead. Otherwise, read on for a quick walkthrough.

HTTP

HTTP is one of the most popular protocols used on the Internet. Every time a browser accesses a Web page, HTTP is the transport protocol being used.

DEFINITION: PROTOCOL

A standard mechanism for communication between two machines.

HTTP is well designed for request–response communication. Request–response means that a request is made, and a response to that particular request is received right away. Here is an example of a request:

GET /MyStockQuote HTTP/1.1 

And the response:

HTTP/1.1 200 OK 
Content-Type: text/html
Content-Length: nnnn

<HTML>
     <BODY>
          MSFT is up 4 today!
     </BODY>
</HTML>

DEFINITION: REQUEST–RESPONSE

This term is usually applied to times when a message is sent and a reply is received immediately on the same connection. Many transports support other modes of communication, but request–response is very popular.

This communication takes place over TCP/IP (Transmission Control Protocol/Internet Protocol). Specifically, the request is made over TCP to port 80, the port typically used for HTTP, although it isn't the only one. Most interesting to us, however, is how HTTP uses headers to send information. In the request just shown, no headers were sent, but the request could have sent any number of them. With SOAP over HTTP, every request sends a SOAPAction header. Notice also that the request started with the GET verb. There are several other verbs, including POST, which is the one SOAP uses.
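To make the role of the verb and the headers concrete, here is a minimal Python sketch that assembles the raw text of a SOAP-style POST request. The host, path, SOAPAction value, and empty envelope are hypothetical values chosen for illustration, and the helper build_soap_request is not part of any library. The sketch also sends a Host header, which HTTP/1.1 servers expect.

```python
def build_soap_request(host, path, soap_action, envelope):
    """Return the raw bytes of an HTTP POST carrying a SOAP envelope."""
    body = envelope.encode("utf-8")
    headers = [
        ("Host", host),
        ("Content-Type", "text/xml; charset=utf-8"),
        # Content-Length counts the bytes of the body, not the characters.
        ("Content-Length", str(len(body))),
        # SOAP 1.1 expects the SOAPAction value to be a quoted string.
        ("SOAPAction", '"%s"' % soap_action),
    ]
    request_line = "POST %s HTTP/1.1" % path
    header_lines = ["%s: %s" % (name, value) for name, value in headers]
    # A blank line separates the headers from the body.
    head = "\r\n".join([request_line] + header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical values, for illustration only.
ENVELOPE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/>"
    "</soap:Envelope>"
)
request = build_soap_request(
    "example.com", "/MyStockQuote", "http://example.com/GetQuote", ENVELOPE
)
print(request.decode("utf-8"))
```

Nothing is sent over the network here; the point is only to show where the verb, the headers, and the body sit in the request.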

For more information about HTTP, refer to http://www.w3.org/Protocols/. Chapter 7, Transport Protocols for Web Services, also covers HTTP in more detail, along with other popular transport protocols for Web services, such as TCP and SMTP (Simple Mail Transfer Protocol).

DEFINITION: HEADERS

The word headers is heard a lot in the networking and Web services world. It can mean several different things, including headers

  • On a TCP packet

  • On an HTTP request

  • Within a SOAP message

XML and XML Schema

XML is an increasingly popular data markup format. Like HTML, XML is tag based. XML doesn't replace HTML; the two are complementary. HTML describes content meant for people to view, whereas XML describes data meant for a program to process. For example:

<StockReport> 
     <Symbol name="MSFT">120</Symbol>
</StockReport>

The rules of XML are simple:

  • All tags must be closed. For example, if you have a <StockReport> opening tag, then you must at some point afterward have a </StockReport> ending tag. Beware: XML is case sensitive.

  • Tags cannot overlap. For example, <A><B></A></B> is not allowed.

  • XML tags can have attributes, as in <Symbol name="MSFT">, but you must wrap the attribute values in quotation marks, and you must not repeat an attribute within a tag. For example, <Symbol name="MSFT" name="AAPL"> is not allowed.
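One way to see these rules in action is to hand a few small documents to an XML parser and watch which ones it rejects. The following is a sketch using Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample labels are my own names, not part of the library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "closed and nested": '<StockReport><Symbol name="MSFT">120</Symbol></StockReport>',
    "unclosed tag": "<StockReport><Symbol>120</StockReport>",
    "case mismatch": "<StockReport></stockreport>",
    "overlapping tags": "<A><B></A></B>",
    "unquoted attribute": "<Symbol name=MSFT>120</Symbol>",
    "repeated attribute": '<Symbol name="MSFT" name="AAPL">120</Symbol>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print("%-20s %s" % (label, "well-formed" if ok else "rejected"))
```

Only the first sample follows all of the rules; each of the others breaks exactly one of them.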

Most XML documents also use namespaces. Namespaces allow for tags to have the same name but different meanings. For example:

<Car xmlns="http://contoso.com/cars/"> 
     <Make>Nissan</Make>
     <Model>Altima</Model>
     <Nissan:Make xmlns:Nissan="http://contoso.com/cars/nissan/">
          GXE
     </Nissan:Make>
</Car>

In this example, most of the tags are in the namespace "http://contoso.com/cars/"; however, there is also the second <Make> tag, which is in a different namespace: "http://contoso.com/cars/nissan/". As a matter of fact, rather than being programmatically referenced by their tag name alone, most tags are referenced by their entire qualified name, which includes the element name (usually called the local name) and the namespace. Qualified name is often shortened to QName.
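To see qualified names from a program's point of view, here is a small Python sketch that parses a trimmed copy of the car document (with Nissan as the make and Altima as the model) using the standard xml.etree.ElementTree module. ElementTree reports each tag in the form {namespace}localname; the helper split_qname and the prefixes cars and nissan are names I chose for this illustration.

```python
import xml.etree.ElementTree as ET

CAR_XML = """
<Car xmlns="http://contoso.com/cars/">
     <Make>Nissan</Make>
     <Model>Altima</Model>
     <Nissan:Make xmlns:Nissan="http://contoso.com/cars/nissan/">GXE</Nissan:Make>
</Car>
"""


def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(CAR_XML.strip())
qnames = [split_qname(element.tag) for element in root.iter()]
for namespace, local in qnames:
    print("%-6s %s" % (local, namespace))

# Looking up by qualified name tells the two <Make> elements apart.
# The prefixes used here are local to this lookup; only the URIs matter.
ns = {
    "cars": "http://contoso.com/cars/",
    "nissan": "http://contoso.com/cars/nissan/",
}
general_make = root.find("cars:Make", ns).text
nissan_make = root.find("nissan:Make", ns).text.strip()
print(general_make, nissan_make)
```

Notice that the prefix written in the document (Nissan) never appears in the parsed tag names. What identifies an element is the pair of namespace and local name.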

DEFINITION: NAMESPACE

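A collection of element and attribute names that belong together. Each namespace is identified by a URI, which is a unique identifier of a resource on the Web. A resource can be a Web page, a SOAP endpoint, or even a person.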

By itself XML is very interesting, and it already has found many uses all over the world. However, one thing missing from XML is a way to describe the shape of the XML document. It would be nice to be able to state: My stock report XML document will start with a <StockReport> tag, and will contain any number of child tags called <Symbol> that must have an attribute called name, and both are in the http://myExample.com namespace.

Several ways of describing this kind of information have been proposed. One of the most recent and popular is XML Schema. What's exciting about XML schemas is that they are XML documents themselves. Schemas are XML documents that describe the shape of any particular XML document. There are schema tags to describe, among other things,

  • Which elements and attributes can or must be present.

  • Their order.

  • How many are allowed.

  • The data type of each.

  • The namespaces of each.

Here is a short example of a schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
     <xsd:element name="Simple" type="xsd:string"/>
</xsd:schema>

This schema says that the document will contain a single element, <Simple>, whose content is a string. Schemas can also describe much more complex XML structures. Listing 2.1 shows a more complex example that uses more XML Schema features, such as specifying a sequence of elements and specifying the minimum and maximum number of times a particular element can appear.

Listing 2.1 An XML Schema Document
<schema
   xmlns="http://www.w3.org/2001/XMLSchema"
   xmlns:s0="http://tempuri.org/"
   targetNamespace="http://tempuri.org/"
   elementFormDefault="qualified">
      <element name="test">
        <complexType>
          <sequence>
            <element
                 minOccurs="1"
                 maxOccurs="1"
                 name="a"
                 nillable="true"
                 type="s0:A" />
          </sequence>
        </complexType>
      </element>
      <complexType name="A">
        <sequence>
          <element
               minOccurs="1"
               maxOccurs="1"
               name="myString"
               nillable="true"
               type="string" />
          <element
               minOccurs="1"
               maxOccurs="1"
               name="myInt"
               nillable="true"
               type="string" />
        </sequence>
      </complexType>
</schema>

The schema in Listing 2.1 describes an XML document that will look something like this:

<test xmlns="http://tempuri.org/"> 
      <a>
        <myString>string</myString>
        <myInt>string</myInt>
      </a>
</test>

Notice that this is merely one of a nearly infinite number of XML documents that this schema describes.
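Because a schema is itself XML, an ordinary XML parser can read it. The Python sketch below reads the sequence declared for type A and checks that the children of <a> in the instance document appear in that order and within the declared counts. This is a hand-rolled illustration, not a real XML Schema validator: it ignores data types, nillable, and everything else in the specification. The helper names are my own, and the copy of the schema embedded here is compacted, declares the s0 prefix, and sets elementFormDefault="qualified" so that it parses and matches the namespace-qualified instance.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
TNS = "http://tempuri.org/"

SCHEMA = """
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:s0="http://tempuri.org/"
        targetNamespace="http://tempuri.org/"
        elementFormDefault="qualified">
  <element name="test">
    <complexType><sequence>
      <element minOccurs="1" maxOccurs="1" name="a" nillable="true" type="s0:A"/>
    </sequence></complexType>
  </element>
  <complexType name="A"><sequence>
    <element minOccurs="1" maxOccurs="1" name="myString" nillable="true" type="string"/>
    <element minOccurs="1" maxOccurs="1" name="myInt" nillable="true" type="string"/>
  </sequence></complexType>
</schema>
"""

INSTANCE = """
<test xmlns="http://tempuri.org/">
  <a><myString>string</myString><myInt>string</myInt></a>
</test>
"""


def occurs(value):
    """Convert a minOccurs/maxOccurs attribute value to a number."""
    return float("inf") if value == "unbounded" else int(value)


def sequence_of(complex_type):
    """Return [(name, minOccurs, maxOccurs)] for a complexType's sequence."""
    return [
        (el.get("name"), occurs(el.get("minOccurs", "1")), occurs(el.get("maxOccurs", "1")))
        for el in complex_type.findall(XSD + "sequence/" + XSD + "element")
    ]


def matches_sequence(parent, declared):
    """Check that parent's children follow the declared order and counts."""
    children = [child.tag for child in parent]
    position = 0
    for name, low, high in declared:
        qname = "{%s}%s" % (TNS, name)
        count = 0
        while position < len(children) and children[position] == qname and count < high:
            position += 1
            count += 1
        if count < low:
            return False
    return position == len(children)  # no unexpected trailing elements


schema = ET.fromstring(SCHEMA.strip())
type_a = schema.find(XSD + "complexType[@name='A']")
declared = sequence_of(type_a)

a = ET.fromstring(INSTANCE.strip()).find("{%s}a" % TNS)
print(declared)
print(matches_sequence(a, declared))
```

Swapping <myString> and <myInt> in the instance makes the check fail, because a sequence fixes the order of its elements.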

A Word About Data Types

Data types are common to almost all programming languages. In fact, the primitive data types that a language supports make up a large piece of that language's flavor and uniqueness. XML Schema (XSD), although not a procedural language like C++ or Java, also contains a set of primitive data types. It's important to realize that when you are trying to send data in a heterogeneous environment, you shouldn't use data types another system doesn't understand. XSD helps here, by defining a standard set of data types that all Web service vendors can understand.
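As a rough illustration of what a shared set of data types buys you, here is a Python sketch that converts the text of an XML element into a native value for four of the XSD built-in types. The table XSD_PARSERS and the function from_xsd are hypothetical names, and the conversions are approximations: Python's int and float accept a few spellings that XSD does not.

```python
def parse_boolean(text):
    # xsd:boolean allows exactly these four lexical forms.
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean: %r" % text)


def parse_int(text):
    # xsd:int is a 32-bit signed integer.
    value = int(text)
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("out of range for xsd:int: %r" % text)
    return value


# Parsers for a handful of XSD built-in types, keyed by type name.
XSD_PARSERS = {
    "string": str,
    "boolean": parse_boolean,
    "int": parse_int,
    "double": float,
}


def from_xsd(type_name, text):
    """Convert element text to a Python value using the named XSD type."""
    return XSD_PARSERS[type_name](text)


print(from_xsd("boolean", "1"), from_xsd("int", "120"), from_xsd("double", "1.5E2"))
```

A system written in another language would keep its own table of this kind. The XSD type names are the common vocabulary that lets both sides agree on what the text means.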

It's interesting that a schema for a document often can be much longer than the actual instances of the schema. Of course, this makes sense once you realize that the schema should describe all possible instances.

Notice the major pieces in the schema. The <element> tag describes the various possible elements. It has attributes for the name of the element, as well as for the minimum and maximum number of times it may appear. It also describes the data type of the element. The <complexType> tag describes a type whose content is made up of other elements. Many other constructs are possible with a schema, such as the structural tags that describe whether elements are in a particular sequence or group. XML Schema has two interesting pieces: a data types specification and a structural specification.

For more information about XML, refer to http://www.w3.org/XML/. For more information about XML Namespaces, refer to http://www.w3.org/TR/1999/REC-xml-names-19990114/. For more information about XML Schema, refer to http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/. Note that XSD has three documents behind it: a primer, a data types document, and a structures document. I strongly recommend that you read the primer first. XML, XML Namespaces, and XML Schema are also discussed in depth in Chapter 8, Data and Format: XML and XML Schemas.