Which Programming Data Model: Relational, Object, or XML?



It is all about the choice of programming data model. Much as Neo in The Matrix had to choose between the red pill and the blue pill, designing an application now often involves choosing among the relational, XML, and object data models. The .NET CLR and Microsoft Intermediate Language (MSIL) may have put an end to many of the language wars between developers (e.g., arguments about whether C++ is better or worse than Visual Basic), but data model wars have broken out in their place (e.g., arguments about the best way to work with and represent data). The .NET platform gives you this choice and, when necessary, lets you move data as easily as possible between these often quite different data representations.

If your data is structured, stored in a relational database, and displayed on a form as a grid, and you have spent many years learning SQL, the relational data model is for you.

The relational data model has been in existence for well over 30 years. It is a tried and trusted model for storing very regular, structured data. With the huge investment in relational DBMS technology, the advances in query optimization, and a mathematically well-understood data model, relational DBMSs are the kings of data storage. The relational data model is flat in nature, using two-dimensional tables with explicit relationships that describe how to navigate between those tables. This model works very well because it is very straightforward. However, the data stored in databases probably represents less than 20% of the world's generated data; most data resides outside databases, as documents in file storage. The Microsoft Office 2003 release typifies this by providing a development platform centered on the generation and reuse of XML documents, thereby reducing the duplication of data and making it easier to aggregate, search, and manage your data. Despite its dominance in structured data, the relational data model has never been universally accepted for data interchange, precisely because it cannot represent all forms of data, particularly documents, which are typically semi-structured in nature.
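The flat tables and explicit relationships described above can be sketched in a few lines. The following is a minimal illustration only, written in Python with the standard sqlite3 module rather than any .NET API; the customers and orders tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two flat, two-dimensional tables.

    The table and column names are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            -- The explicit relationship: each order points at one customer.
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            item        TEXT NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders (customer_id, item)
            VALUES (1, 'keyboard'), (1, 'monitor'), (2, 'mouse');
        """
    )
    return conn


def orders_by_customer(conn):
    """Navigate between the tables by joining on the declared relationship."""
    return conn.execute(
        "SELECT c.name, o.item "
        "FROM customers AS c "
        "JOIN orders AS o ON o.customer_id = c.id "
        "ORDER BY c.name, o.item"
    ).fetchall()


if __name__ == "__main__":
    for name, item in orders_by_customer(build_demo_db()):
        print(name, item)
```

Nothing in either table nests inside the other; the only link between a customer and an order is the customer_id column, and the join has to state that link explicitly.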

If your enterprise has built architectures around object-oriented methodologies and practices in order to provide a layer of business object abstraction over an underlying data store, then programming with object-oriented languages and object technologies, with object persistence to relational databases, makes perfect sense. These technologies provide tremendous value and, because they are based on the type system and data model of commonly used object-oriented languages such as VB, C++, and C#, they offer a natural ease of programming: you work with the objects as data and then persist them to a data store. The ObjectSpaces technology briefly discussed in Chapter 3 is an example of this; its primary charter is to provide data storage access for graph or object-oriented data models. However, as with relational data, there has never been agreement on a persistable object representation for all platforms, so you will never see objects used for data interchange across platforms in the way that XML is.

Although the dominant data model for storage is relational, it has been a holy grail of the IT industry for decades to provide a richer mainstream data model. There is demand for data stores that work more naturally with hierarchical, semi-structured, and graph (object) data models. However, attempts to provide hierarchical or object data stores have failed to attract broad customer bases and tend to be fragmented into many incompatible niche players. XML represents a semi-structured data model that technologies like the DOM, XSLT, XPath, and XQuery can target, in the same manner that SQL targets the relational data model. Already XPath has seen far greater industry adoption than any of the previous semi-structured niche programming models. With XQuery enjoying the support of large vendors, including Microsoft, it is highly probable that XML as the semi-structured data and programming model will succeed in the mainstream.

In the long run, semi-structured storage is very unlikely to replace relational storage; rather, the relational model will evolve. In other words, XML has challenged the industry to provide more flexible data models in the mainstream storage engines, and the industry has risen to the challenge by providing XML type support. As relational DBMSs evolve to support the semi-structured data model, the way developers access this data needs to improve as well. This is the primary goal of the System.Xml version 2.0 release as part of ADO.NET.

The XML data model is very expressive, in that it can represent structured, semi-structured, and totally irregular data. Being hierarchical, it implicitly defines the relationships between entities within its structure. This means that it can represent all forms of complex data, whether flat, tree, recursive, or graph-like (with references between entities). As a metadata language it is self-describing, has a versatile, expressive type system (XML Schema), enforces document order, and is extensible through your own schema definitions and names. As a growing number of applications unify under XML-based interfaces in response to decentralized data requirements, the XML model also promotes a loosely coupled architecture, which in turn makes designs more flexible. We are really still at the base of the adoption curve for the XML data model, but with the advent of native XML storage in relational DBMSs and better XML query languages such as XQuery, the trend is set to continue across the IT industry.
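To make the contrast with flat tables concrete, here is a minimal sketch, again in Python for illustration only and not a System.Xml example. It uses the standard xml.etree.ElementTree module, which supports a limited subset of XPath; the document content, element names, and the items_for helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical document. Orders are nested inside their customer, so the
# relationship is implied by the hierarchy rather than by a key column.
# Only one customer carries a <note>, and that note has mixed content,
# which is the kind of irregularity a semi-structured model tolerates.
DOC = """
<customers>
  <customer name="Ada">
    <order item="keyboard"/>
    <order item="monitor"/>
    <note>Prefers email; <b>no phone calls</b>.</note>
  </customer>
  <customer name="Grace">
    <order item="mouse"/>
  </customer>
</customers>
"""


def items_for(root, customer_name):
    """Return one customer's order items, in document order.

    The path uses the XPath subset that ElementTree understands. The name
    is interpolated naively, which is acceptable only for a sketch.
    """
    path = f"./customer[@name='{customer_name}']/order"
    return [order.get("item") for order in root.findall(path)]


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    print(items_for(root, "Ada"))
    print(len(root.findall("./customer/note")), "customer(s) with a note")
```

The path expression walks the hierarchy directly, and the results come back in the order in which the elements appear in the document, with no join and no separate ordering clause.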