SQL Server as an XML Database
The beta 1 release of SQL Server, currently code-named "Yukon," has the facility to store XML as a native format, for example, as a dedicated column type, passed as a parameter, or declared as a variable. Until now, most users have stored XML in "character" or "text" columns, as a simple String value. However, now you can actually store it as XML and then access and query it as XML rather than having to extract it as a String and manage the process yourself.
With the new XML data type you can do the following things.
Store XML in "typed" form, where a schema is registered with SQL Server and the XML must conform to this schema. Store XML in "untyped" form, where no schema is required and the XML just has to be well formed. Select the value from an XML column in SQL statements and stored procedures, just like any other column type. Query the XML in the column directly, using XPath and XQuery techniques, to find and return specific values. Modify the XML in a column by adding or removing nodes or by changing the values of existing nodes. Bind XML to other SQL Server data types, allowing queries to access values in stored procedure variables and other non-XML columns. Access and return sets of node values, which resemble a single-column rowset. Create indexes on the XML columns to vastly improve performance when querying and updating the data.
All of these topics are covered in depth in the SQL Server Books Online reference documentation that is provided with the "Yukon" beta release. In the remainder of this chapter we'll briefly examine the important features, giving you a feel for what the new data type can do and how you can use it.
The XML Data Type in SQL Server "Yukon"
The XML data type in SQL Server "Yukon" is called xml, and you use it when creating tables, stored procedures, indexes, and so on in exactly the same way as any other native SQL Server data type. For example, the following declaration creates a table containing an int column named MyRowKey and an xml data type column named MyXmlDoc:
CREATE TABLE MyTable(MyRowKey int, MyXmlDoc xml)
However, you cannot use an xml column as the primary key of a table or as a foreign key column, and you cannot specify the UNIQUE or COLLATE properties for it or use it in a GROUP BY clause. There is also a maximum limit of 32 columns in a table that can be of the xml type.
Typed and Untyped XML Data Columns
The CREATE TABLE statement shown above creates a table with an untyped xml data type column. Effectively the XML you insert into this column is stored in character format complete with the element tags and attribute names, using UCS-2 encoding. An untyped column is ideal where you will generally extract the complete XML document as a String when you want to access the values.
However, although you can still perform all the other operations on the column (as described later), this process will be less efficient than using a typed column. And when you insert XML into the column, it is checked for well-formedness and rejected if it is not well formed. This is the same kind of behavior as you would expect when, for example, trying to insert a char (String) value into a column that is defined as being of type int.
The alternative is to create a typed xml data column. This is a little more complicated because you first have to register the schema with SQL Server using a CREATE XMLSCHEMA statement. We look at how you register an XML schema with SQL Server in the following subsection of this chapter.
However, once the schema is registered, you just have to specify the schema for the XML that will be placed in this column. The simple way to do this is to specify the schema (using its registered namespace) with the column data type when you create the table:
CREATE TABLE MyTable(MyRowKey int,
MyXmlDoc xml ('http://mysite.com/schemas/myschema'))
The advantages with a typed data column are that SQL Server can then perform validity checking on values that are inserted or when the individual nodes of the XML document are updated or deleted. It also changes the way that SQL Server stores the data in the XML document.
Because the schema defines the data types of the values within the XML document, SQL Server can store them as its native data types (e.g., varchar, int, money, datetime, and so on) and then map these values to the query and result when it is executed. This provides better performance when querying the XML and reduces storage overhead for the document.
The XML Schema Repository in SQL Server "Yukon"
When you register schemas with SQL Server, they are stored within the system tables of the Master database to create a schema repository. The schemas define the structure, format, and content of XML documents that will be considered valid, and a schema can therefore be applied to any column that will store instances of this document or to any variable that will hold an instance of the document. You can't remove (drop) the schema while any table still contains a column based on it or any stored procedure still contains a variable that references it.
Registering an XML Schema
To use a schema to define an XML data column, you must first register the schema with SQL Server using the CREATE XMLSCHEMA statement. This statement takes a single string-type parameter that contains or references the schema to be registered. In the simplest case:
CREATE XMLSCHEMA '<xsd:schema xmlns="...">... etc. ...</xsd:schema>'
After registration, the schema is identified by the namespace(s) you specify in the targetNamespace or xmlns attributes of that schema. An alternative way to register a schema is to assign it to a variable and then use the variable in the CREATE XMLSCHEMA statement:
DECLARE @myschema varchar(1000)
SET @myschema = '<xsd:schema xmlns="...">... etc. ...</xsd:schema>'
CREATE XMLSCHEMA @myschema
You can register more than one schema at a time by simply placing them all in the string value you pass to the CREATE XMLSCHEMA statement:
CREATE XMLSCHEMA '<xsd:schema xmlns="...">... etc. ...</xsd:schema>
<xsd:schema xmlns="...">... etc. ...</xsd:schema>
<xsd:schema xmlns="...">... etc. ...</xsd:schema>'
To manage security when registering schemas, SQL Server now exposes a new permission attribute named CREATE XML SCHEMA. The account being used to register the schema must have this permission. The permission can be assigned and managed in exactly the same way as the other permissions included in SQL Server, for example, by using the GRANT and REVOKE statements.
Dropping an XML Schema
To remove a schema from the SQL Server repository, you use the DROP XMLSCHEMA statement, specifying the namespace of the schema to be removed. For example:
DROP XMLSCHEMA NAMESPACE 'http://mysite.com/schemas/myschema'
This will fail if any table or variable within the database is currently using the specified schema.
Viewing an XML Schema
To view a schema that is registered in SQL Server, you use the xml_schema_namespace function and supply the schema namespace that was defined when it was registered:
SELECT xml_schema_namespace(N'http://mysite.com/schemas/myschema')
One point to note is that the returned schema is built up from the data stored in the system tables and is not the original "character" version you registered. For example, it will not contain any comments that were in the original schema and may differ in the ordering of the schema elements.
Limitations on Schemas Stored in SQL Server
There are some limitations applied to the use of schemas that define xml column types in SQL Server "Yukon."
You cannot register a schema that contains a redefine element or unique, key, or keyref expressions. Annotations and comments are not stored in the database. Notations are not stored in the database but are validated when the schema is registered. The XSD id attribute is not supported but is validated when the schema is registered. Default values cannot exceed 4,000 Unicode characters. An xsd:import element causes the target schema to be imported and incorporated into the current schema, and the registered schema will no longer reference the imported schema. You should avoid using schemas that define patterns for values in the data because these may not be suitable for validation if the internal representation of the values differs from the representation in the XML document or fragment. Date and time values are normalized to the current time zone and do not contain the time zone offsets when stored in the database.
Inserting and Selecting against an XML Data Column
You have several options for loading an XML document or fragment into an XML data column. Because the xml data type behaves just like any other data type, it can be used in SQL statements, stored procedures, and T-SQL statements that create and modify tables, indexes, and so on.
XML is, of course, notionally text or character data, and the normal data storage type used to handle it outside SQL Server is a String variable. A String containing XML is automatically and implicitly cast to an xml type when included in a SQL statement or stored procedure, so the following could be used to insert a row into a simple two-column table that contains one int column and one xml column:
INSERT INTO MyTable(MyRowKey, MyXmlDoc)
VALUES (42, '<cars><onecar id="1">... etc. ...</onecar></cars>')
Alternatively, you can explicitly CAST a string value into an xml type or use the CONVERT function:
INSERT INTO MyTable(MyRowKey, MyXmlDoc)
VALUES (42, CAST('<cars>... etc. ...</cars>') AS xml))
or:
INSERT INTO MyTable(MyRowKey, MyXmlDoc)
VALUES (42, CONVERT(xml, '<cars>... etc. ...</cars>'))
When extracting XML from a column of type xml, you can perform the reverse process to convert it to a character-type value:
SELECT CONVERT(nvarchar(500), MyXmlDoc, 'ucs-2')
FROM MyTable WHERE ... etc. ...
However, you can CAST or CONVERT only to char, varchar, nchar, and nvarchar types. You cannot CAST or CONVERT the value in an xml column to text or ntext types.
Querying and Updating the Contents of an XML Data Column
SQL Server provides several techniques for querying and updating the values stored in a column of type xml:
The query method with an XQuery statement to return a fragment of untyped XML The value method with an XQuery statement to return a single value from the XML and expose it as a standard (non-xml) SQL data type The exist method to test whether a specific node/value exists in the XML data The modify method to execute an XML Data Modification Language (XML DML) statement The xmlnoderefs function to return a single-column rowset of values
We'll look at each of these techniques in turn next. The syntax and usage of XQuery is covered in detail earlier in this chapter, so the next sections concentrate only on the syntax of the SQL Server XML functions.
The query Method
You use the query method of the xml data type when you want to query an xml column in a table and get back a fragment of untyped XML as the result. The syntax for the query method is:
query('xquery-statement'[, node-ref])
where xquery-statement is the XQuery as a literal string that selects the XML fragment (an element, attribute, and so on), and the optional node-ref parameter specifies a node or a set of nodes against which the XQuery expression is evaluated.
For example, if the XML document contains a <cars> element that itself contains multiple <car id="xxx"> elements, you can select a specific car on its id value using:
SELECT MyXmlDoc::query('/cars/car[@id="42"]') FROM MyTable
The result is an XML fragment such as:
<car id="42"><make>Rover</make><model>Rampage</model></car>
If you use a node-ref parameter, it must specify a set of nodes in the XML document that contain the value you are querying for. See the section on the xmlnoderefs function later in this part of the chapter for more details. If the XML instance itself (the column of type xml that is specified in the SELECT statement) contains null, the query method returns NULL.
The value Method
The value method also uses an XQuery statement to access a node in the XML document, but this time returns it as a SQL Server native data type such as int, char, or datetime (but not an xml type, user-defined type, image, text, ntext, or timestamp). The syntax for the value method is:
value('xquery-statement', sql-type[, node-ref])
where xquery-statement is the XQuery as a literal string that selects a single node from the XML, sql-type is the native data type to return the value as, and the optional node-ref parameter specifies a node or a set of nodes against which the XQuery expression is evaluated.
For example, the following uses an XQuery that specifies the single model node (element) for the car with id="42" from the same document used in the previous example:
SELECT MyXmlDoc::value('/cars/car[@id="42"]/model', nchar(50))
FROM MyTable
If you use a node-ref parameter, it must specify a set of nodes in the XML document that contain the value you are querying for. See the subsection on the xmlnoderefs function later in this chapter for more details. If the XML instance itself (the column of type xml specified in the SELECT statement) contains null, the value method returns NULL.
The exist Method
The exist method is used to test for the presence of a node or value in an XML document. The syntax of the exist method is:
exist('xquery-statement'[, node-ref])
where xquery-statement is the XQuery as a literal string that selects a single node, and the optional node-ref parameter specifies a node or a set of nodes against which the XQuery expression is evaluated. The method returns True (bit value 1) if the XQuery gives a nonempty result, False (bit value 0) if the result is empty. For typed XML documents where the target node contains null, the exist method returns NULL.
For example, you can specify that a SELECT statement should select only the row in a table named MyTable that contains the value 42 for a <car> element in the XML document stored in an xml column named MyXmlDoc like this:
SELECT SomeOtherColumn FROM MyTable
WHERE MyXmlDoc::exist('/cars/car[@id="42"]')
This example returns the value of the column named SomeOtherColumn, which is a native SQL Server data type such as nvarchar or int. However, if the SomeOtherColumn were an xml data type, you could combine the query or the value method with the exist methods in the SELECT statement to get back either untyped XML or a single value:
SELECT SomeOtherColumn::value('/root/enginetype', nchar(50))
FROM MyTable
WHERE MyXmlDoc::exist('/cars/car[@id="42"]')
If you use a node-ref parameter, it must specify a set of nodes in the XML document that contain the value you are querying for. See the subsection on the xmlnoderefs function later in this chapter for more details. If the XML instance itself (the column of type xml specified in the SELECT statement) contains null, the exist method returns NULL.
The modify Method
To modify the contents of an XML document in an xml column, you can use the modify method. It takes as a parameter an XML DML statement, which itself contains an insert, update, or delete statement that manipulates nodes within the xml column data. The syntax of the modify method is:
modify(xml-dml-statement)
The modify method of the xml data type can be used only in the SET clause of a SQL UPDATE statement. Let's look briefly at XML DML.
The XML Data Modification Language
XML Data Modification Language (XML DML) extends XQuery beyond the current W3C recommendations to provide a full-featured query and data modification language that can be used against the xml data type. It adds three keywords to XQuery for this purpose.
The insert keyword is used to insert one or more nodes into an XML document that resides in an xml data column. The delete keyword is used to remove one or more nodes from an XML document that resides in an xml data column. The update keyword is used to modify the value of one or more nodes in an XML document that resides in an xml data column.
XML DML statements containing these three keywords are used as parameters to the modify method of the xml data type.
Using the XML DML insert Keyword
The insert keyword specifies that one or more nodes will be inserted into the XML document as either children or siblings of a specified node. The syntax for the insert keyword is:
insert expression1 position expression2
where expression1 is either an XML fragment or an XQuery expression that selects an XML fragment, and expression2 identifies a single target node for the XML fragment to be inserted into. The position value indicates the location of the inserted content (which can be an element or an attribute) relative to the node specified by expression2 and follows this form:
{as first | as last} into | before | after
In other words, you specify either as first or as last, followed by into, before, or after. For example, to insert the XML fragment <seats>4</seats> into the <car> element whose id value is 42, you could use:
UPDATE MyTable SET MyXmlDoc::modify('insert <seats>4</seats>
as last into /cars/car[@id="42"]')
This locates the new element after all the existing elements in the <car id="42"> element. If you use the before or after keyword instead of into, the new element becomes a sibling of the <car> element (i.e., it is located at the same level in the hierarchy of the document) rather than a child of it.
When you use into, and the target node is an element that already contains other child elements, you must specify either as first or as last. However, these keywords are ignored if you are inserting an attribute into an existing element.
If you insert a sequence of values (e.g., a single-column rowset such as is returned from the xmlnoderefs function), they are inserted into the target node as a single text node containing the values delimited by single spaces. If you want to achieve the same result when specifying the XML explicitly as a character string for expression1, the values must be wrapped in parentheses and delimited by commas.
Using the XML DML delete Keyword
The delete keyword specifies that one or more nodes will be removed from the target XML document. The syntax for the delete keyword is simply:
delete expression
where expression is an XQuery expression that selects one or more nodes in the target document. For example, to delete the <car> node with id value 42 you could use:
UPDATE MyTable SET MyXmlDoc::modify('delete /cars/car[@id=42]')
To delete all the <make> elements from the document you could use:
UPDATE MyTable SET MyXmlDoc::modify('delete //make')
Using the XML DML update Keyword
The update keyword specifies that one or more nodes will be modified within the target XML document. The syntax for the update keyword is:
update [value of] expression1 to expression2
where expression1 identifies the target node whose value is to be updated, and expression2 is the new value for that node. The target node specified in expression1 must evaluate to a single node; otherwise, an exception is generated. The new value specified in expression2 can be an untyped XML fragment or a simple value. If it is an XML fragment that specifies multiple values, these values are inserted as a space-delimited list for the new value of the target node. The optional value of keyword indicates that expression2 is a value, not an untyped XML fragment.
As an example, the following will update the value of the <model> element within the <car> element that has id value 42 to a new string value:
UPDATE MyTable SET MyXmlDoc::modify('update value of
/cars/car[@id=42] to "Wrampage Mark 2"')
The xmlnoderefs Function
The xmlnoderefs function takes an XML document stored in an xml data column and returns a single-column rowset, called a noderef column table. This contains references to a sequence of nodes in the XML document, as specified by an XQuery you supply. The syntax of the xmlnoderefs function is:
xmlnoderefs(xml-column, xquery-statement [, node-ref])
where xml-column is the xml data column to extract the results from, and xquery-statement is an XQuery statement that selects the nodes in that column that will be returned. The optional node-ref parameter specifies a node or a set of nodes against which the XQuery expression is evaluated and allows nested noderef selections to be made. In other words, you can query against an existing set of nodes that are themselves extracted by the xmlnoderefs function.
For example, to create a noderef that references all the <car> elements in the document we've been using so far, you could simply use:
xmlnoderefs(MyXmlDoc, '/cars/car')
This noderef can then be used as the optional third parameter of the value method to select the model name of the car with the id value 42. In this case, the XQuery expression in the first parameter will be evaluated against the noderef list of <car> elements:
SELECT MyXmlDoc::value('car[@id="42"]/model',
nchar(50),
xmlnoderefs(MyXmlDoc, '/cars/car'))
FROM MyTable
A noderef column table can be used in the optional parameter for the query, value, and exist methods described earlier. This tends to improve performance of queries by preselecting and providing a path to the set of nodes on which the query will operate. However, bear in mind that a noderef cannot be used in situations other than those described above, nor stored for later reuse, because it contains references that are valid only during the current query.
Binding Relational Data inside XML
The new xml data type allows you to use values from other non-xml type columns and from non-xml variables within an XQuery statement. This process is referred to as binding XML to other SQL Server data types. For example, you may want to use an XQuery to select a value in an xml column but include in that XQuery a reference to another column that is not of type xml. To achieve this there are two new "pseudo-functions":
The sql:column function, which allows a non-xml column to be referenced within an XQuery expression The sql:variable function, which allows a non-xml variable to be referenced within an XQuery expression.
As an example of their use, the following SELECT statement extracts the value of a node from an XML document stored in the MyXmlDoc column, referencing both the xml column named MyXmlDoc and the int column named Seats in the WHERE clause:
SELECT MyXmlDoc::query('car/model', nchar(50))
FROM MyTable
WHERE MyXmlDoc::exist('/cars/car/[make="Rover"]
and sql:column("MyTable.Seats")=4)')
Or, if the number of seats were stored in a variable named @seats within a stored procedure, you could use:
SELECT MyXmlDoc::query('car/model', nchar(50))
FROM MyTable
WHERE MyXmlDoc::exist('/cars/car/[make="Rover"]
and sql:variable("@seats")=4)')
While you could easily adapt the two SQL statements above to avoid using the new functions, they come in particularly handy when using XQuery statements that return templated fragments of XML—for example:
DECLARE @price money
SET @price=2500.00
DECLARE @x xml
SELECT @x = CatalogDescription::query('
<Product ProductID="{sql:column("Product.ProductID")}"
ProductName="{ProductDescription[1]/@ProductModelName}"
ListPrice="{sql:column("Product.ListPrice")}"
DiscountPrice="{sql:variable("@price")}"
/>
')
FROM Product, ProductModel
WHERE Product.ProductModelID = ProductModel.ProductModelID
AND ProductID=168
The ProductID and ListPrice values that are inserted into the XQuery template come from non-xml type columns, while the ProductName value is extracted from the xml column. The DiscountPrice value comes from the @price variable (of type money) that is declared and set at the start of the procedure. This produces a result, depending on the data in the database, something like the following:
<Product ProductID="1062"
ProductName="ML Road Frame"
ListPrice="594.83"
DiscountPrice="125.5"
/>
However, note that the sql:column and sql:variable functions cannot be used for columns or variables that are of User-Defined Types, are unique identifiers, or are of type xml, timestamp, text, ntext, or image.
Indexing an XML Data Column
A useful feature of the new xml data type is that you can create an index on a column of this type (but only for a table, not for a view). This can increase performance considerably, but you must bear in mind a few limitations.
The table containing the xml column must also contain a clustered primary key, and once an xml column index exists, you cannot modify the clustered primary key unless you first drop all the xml indexes. You can specify only a single column in an xml index—multiple column indexes are not supported for xml columns. You must remove the index on a column before you can change it from "typed" to "untyped" (i.e., register a schema for it).
Note that, while this section covered the basics of creating and using the new xml data type, it is not a comprehensive reference to all the opportunities, techniques, and limitations. The SQL Server Books Online help files provided with "Yukon" contain plenty of useful information on the xml data type and how XML data columns behave.
|