Item 40: Use objects-first persistence to preserve your domain model




You've spent weeks, if not months, designing an elegant object model representing your application's business domain objects, searching for just the right combination of inheritance, composition, aggregation, and other object-modeling techniques to build it. It has taken quite a while, but you're finally there—you have an object model the team can be proud of. Now it's time to preserve that domain data into the database, and you want to keep your object model in place; after all, what good is an object model if you're just going to have to turn around and write a whole bunch of ugly SQL to push the data back out to the database?

In an objects-first approach to persistence, we seek to keep working in terms of objects even while persisting them; this means either the objects know how to silently persist themselves without any prompting from us or they provide some kind of object-centric API for persistence and retrieval. Thus, in the ideal world, writing code like the following would automatically create an entry in the database containing the person data for Stu Halloway, age 25:






Person p = new Person("Stu", "Halloway", 25);
System.out.println(p);
    // Prints "Stu Halloway, age 25"


Writing these next lines of code would automatically update the existing row created in the first snippet to bump Stu's age from 25 to 30:






Person p = Person.find("Stu", "Halloway");
System.out.println(p);
    // Prints "Stu Halloway, age 25"
p.setAge(30);
System.out.println(p);
    // Prints "Stu Halloway, age 30"


Notice one of the great benefits of the objects-first persistence approach: no ugly SQL, no worries about whether we need to do an INSERT or an UPDATE, and so on. All we see are objects, and we like it that way.
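One way the "silent" persistence described above might be arranged is sketched below, with an in-memory map standing in for the database. Everything in it is an assumption made for illustration: the Store class, the choice to key rows on first and last name, and the decision to write through from the constructor and from setAge. It is not the API of any particular persistence product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a database table, keyed on "first last".
class Store {
    static final Map<String, Person> rows = new HashMap<>();
    static int inserts = 0;
    static int updates = 0;

    // Decides between INSERT and UPDATE so that calling code never has to.
    static void save(Person p) {
        String key = p.getFirstName() + " " + p.getLastName();
        if (rows.containsKey(key)) {
            updates++;
        } else {
            inserts++;
        }
        rows.put(key, p);
    }
}

class Person {
    private final String firstName;
    private final String lastName;
    private int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
        Store.save(this);   // creating the object creates the "row"
    }

    // Object-centric lookup; returns null when no such person was stored.
    static Person find(String firstName, String lastName) {
        return Store.rows.get(firstName + " " + lastName);
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    int getAge() { return age; }

    void setAge(int age) {
        this.age = age;
        Store.save(this);   // every mutation silently writes through
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + ", age " + age;
    }
}
```

With this sketch, constructing a Person counts as one insert and a later setAge on the object returned by Person.find counts as one update, without the caller ever choosing between the two.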

An objects-first approach tends to break down fairly quickly when trying to retrieve objects, however. Generally speaking, an objects-first approach takes one of two possible methods: either we issue queries for objects by creating objects that contain our criteria in pure object-oriented fashion or we use some kind of "query language" specific to object queries.

In a strictly purist objects-first environment, we never want to see anything but objects, so we end up building Query Objects [Fowler, 316[1]] that contain the criteria we want to restrict the query by. Unfortunately, building a complex query that uses criteria other than the object's primary key (sometimes called an OID, short for object identifier, in the parlance of OODBMSs) is often complicated and/or awkward:

[1] By the way, if you use Fowler's implementation of the Query Object found in his book, note that as written, the code could be vulnerable to a SQL injection attack (see Item 61).






QueryObject q = new QueryObject(Person.class);
q.add(Criteria.and(
          Criteria.greaterThan("dependents", 2),
          Criteria.lessThan("income", 80000)));
q.add(Criteria.and(
          Criteria.greaterThan("dependents", 0),
          Criteria.lessThan("income", 60000)));


Here, we're trying to build the equivalent of the following lines:






SELECT * FROM person p
  WHERE ( (p.dependents > 2 AND p.income < 80000)
          OR (p.dependents > 0 AND p.income < 60000) )


Which is easier to read? Things get exponentially worse if we start doing deeply nested Boolean logic in the query, such as looking for "people making less than $80,000 with more than 2 dependents who in turn claim them as parents, or people making less than $60,000 with any dependents who in turn claim them as parents." In fact, it's not uncommon to find that an objects-first purist query approach has much stricter restrictions on what can be queried than a general-purpose query language, like SQL.
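To make the comparison concrete, here is a minimal sketch of how a criteria tree of this kind might render itself into a WHERE clause. The Criteria class, its toSql method, and the explicit or combinator are hypothetical names loosely modeled on the snippet above; this is not Fowler's implementation. Values are bound through "?" placeholders rather than concatenated, which is one way to sidestep the injection risk the footnote mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical criteria tree: each node renders a SQL fragment with "?"
// placeholders and appends its values to a shared parameter list.
abstract class Criteria {
    abstract String toSql(List<Object> params);

    static Criteria greaterThan(String field, Object value) {
        return comparison(field, ">", value);
    }

    static Criteria lessThan(String field, Object value) {
        return comparison(field, "<", value);
    }

    static Criteria and(Criteria left, Criteria right) {
        return combine(left, "AND", right);
    }

    static Criteria or(Criteria left, Criteria right) {
        return combine(left, "OR", right);
    }

    // Field names are assumed to be trusted constants supplied by the programmer.
    private static Criteria comparison(final String field, final String op,
                                       final Object value) {
        return new Criteria() {
            String toSql(List<Object> params) {
                params.add(value);   // the value is bound, never concatenated
                return field + " " + op + " ?";
            }
        };
    }

    private static Criteria combine(final Criteria left, final String op,
                                    final Criteria right) {
        return new Criteria() {
            String toSql(List<Object> params) {
                return "(" + left.toSql(params) + " " + op + " "
                           + right.toSql(params) + ")";
            }
        };
    }
}

class QueryDemo {
    // Builds the same predicate as the SQL shown above.
    static String whereClause(List<Object> params) {
        Criteria c = Criteria.or(
            Criteria.and(Criteria.greaterThan("dependents", 2),
                         Criteria.lessThan("income", 80000)),
            Criteria.and(Criteria.greaterThan("dependents", 0),
                         Criteria.lessThan("income", 60000)));
        return c.toSql(params);
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        System.out.println(whereClause(params) + " " + params);
    }
}
```

Even in this small form, the Java needed to express the predicate is several times longer than the one-line WHERE clause it produces.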

Which leads us to the second approach, that of creating some kind of "query language" for more concisely expressing queries without having to resort to overly complex code. All of the objects-first technology approaches in Java have ultimately fallen back to this: EJB 2.0 introduced EJBQL, a query language for writing finders for entity beans; JDO introduced JDOQL, which does the same for JDO-enhanced persistent classes; and going way back, OODBMSs used OQL, the Object Query Language. These languages are subtly different from each other, yet all share one defining similarity: they all look a lot like SQL, which is what we were trying to get away from in the first place. (JDOQL is technically a language solely for describing a filter, which is essentially just the predicate part of the query—the WHERE clause—while still using a Query Object–style API.) Worse, EJBQL as defined in EJB 2.0 lacks many of the key features of SQL that make it so powerful to use for executing queries. The 2.1 release will address some of this lack, but several features of SQL are still missing from EJBQL.

Another unfortunate side effect of using an objects-first approach is that of invisible round-trips; for example, when using the entity bean below, how many trips to the database are made?






PersonHome ph =
  (PersonHome)ctx.lookup("java:comp/env/PersonHome");

// Start counting round-trips from here
Collection personCollection = ph.findByLastName("Halloway");
for (Iterator i = personCollection.iterator(); i.hasNext(); )
{
  Person p = (Person)i.next();
  System.out.println("Found " + p.getFirstName() +
                          " " + p.getLastName());
}


Although it might seem like just one round-trip to the database (to retrieve each person whose last name is Halloway and to populate the entity beans from the PersonBean pool as necessary), in fact, this is what's called the N+1 query problem in EJB literature. The finder call looks only for the primary keys of the rows matching the query criteria, populates the Collection with entity bean stubs that know only the primary key, and lazy-loads the data into each entity bean as necessary. Because we immediately turn around and access data on the entity bean, this in turn forces the entity bean to load its state from the database, and since we iterate over each of the items in the Collection, we make one trip for the primary keys plus N more trips, where N equals the number of items in the collection.
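The round-trip arithmetic can be simulated without an EJB container. The sketch below assumes a hypothetical FakeDatabase that counts every call as one round-trip and a LazyPerson stub that knows only its key until a getter forces a load; neither name comes from any EJB API, and the three rows are sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts every call as one round-trip.
class FakeDatabase {
    private final Map<Integer, String[]> rows = new LinkedHashMap<>();
    int roundTrips = 0;

    FakeDatabase() {
        rows.put(1, new String[] {"Stu", "Halloway"});
        rows.put(2, new String[] {"Joanna", "Halloway"});
        rows.put(3, new String[] {"Hattie", "Halloway"});
    }

    // Finder query: returns only the primary keys of the matching rows.
    List<Integer> findKeysByLastName(String lastName) {
        roundTrips++;
        List<Integer> keys = new ArrayList<>();
        for (Map.Entry<Integer, String[]> e : rows.entrySet()) {
            if (e.getValue()[1].equals(lastName)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Row fetch by primary key: one more round-trip per call.
    String[] loadRow(int key) {
        roundTrips++;
        return rows.get(key);
    }
}

// Stub that knows only its primary key until a getter forces a load.
class LazyPerson {
    private final FakeDatabase db;
    private final int key;
    private String[] row;   // null until first access

    LazyPerson(FakeDatabase db, int key) {
        this.db = db;
        this.key = key;
    }

    private String[] row() {
        if (row == null) {
            row = db.loadRow(key);   // the lazy load
        }
        return row;
    }

    String getFirstName() { return row()[0]; }
    String getLastName()  { return row()[1]; }
}

class NPlusOneDemo {
    static int countRoundTrips() {
        FakeDatabase db = new FakeDatabase();
        for (int key : db.findKeysByLastName("Halloway")) {
            LazyPerson p = new LazyPerson(db, key);
            System.out.println("Found " + p.getFirstName() +
                                    " " + p.getLastName());
        }
        return db.roundTrips;   // 1 finder query + 1 load per matching row
    }
}
```

With three matching rows, countRoundTrips() returns 4: one trip for the keys and one per person. Note that the second getter call on each stub is free, since the row is already cached in the stub.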

Astute developers will quickly point out that a particular EJB entity bean implementation isn't necessarily required to do something like this; for example, it would be possible (if perhaps nonportable; see Item 11) to build an entity bean implementation that, instead of simply pulling back the OIDs/primary keys of the entity as the result of a query, pulls back the entire data set stored within that entity, essentially choosing an eager-loading implementation rather than the more commonly used lazy-loading approach (see Items 47 and 46, respectively). Unfortunately, this would create the reverse problem: now we will complain about too much data being pulled across, rather than too little.

The crux of the problem here is that in an objects-first persistence scenario, the atom of data retrieval is the object itself—to pull back something smaller than an object doesn't make sense from an object-oriented perspective, just as it doesn't make sense to pull back something smaller than a row in an SQL query. So when all we want is the Person's first name and last name, we're forced to retrieve the entire Person object to get it. Readers familiar with OQL will stand up and protest here, stating (correctly) that OQL, among others, allows for retrieval of "parts" of an object—but this leads to further problems. What exactly is the returned type of such a query? I can write something like the following lines:






SELECT p.FirstName, p.LastName
  FROM Person p
  WHERE p.LastName = 'Halloway';


But what, exactly, is returned? Normally, the return from an object query is an object of defined type (in the above case, a Person instance); what are we getting back here? There is no commonly accepted way to return just "part" of an object, so typically the result is something more akin to a ResultSet or Java Map (or rather, a List of Map instances).
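A minimal sketch of that "List of Map instances" result, assuming a hypothetical Projection helper that filters an in-memory list of Person objects (the sample data and all names here are illustrative only):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Person {
    final String firstName;
    final String lastName;
    final int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
}

class Projection {
    // Sample data standing in for the stored Person objects.
    static List<Person> sample() {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Stu", "Halloway", 25));
        people.add(new Person("Joanna", "Halloway", 25));
        people.add(new Person("Pat", "Example", 40));
        return people;
    }

    // Equivalent in spirit to:
    //   SELECT p.FirstName, p.LastName FROM Person p WHERE p.LastName = ?
    // The result is not a List<Person>; each row is an untyped Map.
    static List<Map<String, Object>> selectNames(List<Person> people,
                                                 String lastName) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Person p : people) {
            if (p.lastName.equals(lastName)) {
                Map<String, Object> row = new LinkedHashMap<>();
                row.put("FirstName", p.firstName);
                row.put("LastName", p.lastName);
                result.add(row);
            }
        }
        return result;
    }
}
```

The caller gets back string-keyed maps and must cast each value, so the compile-time type safety that motivated the object model in the first place is gone for this query.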

Even if we sort out these issues, objects-first queries have another problem buried within them: object-to-object references. In this case, the difficulty occurs not so much because we can't come up with good modeling techniques for managing one-to-many, many-to-many, or many-to-one relationships within a relational database (which isn't trivial, by the way), but because when an object is retrieved, the question arises whether it should pull back all of its associated objects, as well. And how do we resolve the situation where two independent queries pull back two identical objects through this indirect reference?

For example, consider the scenario where we have four Person objects in the system: Stu Halloway is married to Joanna Halloway, and they have two children, Hattie Halloway and Harper Halloway. From any good object perspective, this means that a good Person model will have a field for spouse, of type Person (or, more accurately, references to Person), as well as a field that is a collection of some type, called children, containing references to Person again.

So now, when we execute the earlier query and retrieve the first object (let's use Stu), should pulling the Stu object across the wire mean pulling Joanna, Hattie, and Harper across, as well? Again, should we eager-load the data (remember, these objects are referenced from fields in the Stu object instance) or lazy-load it? And when we move to the next result in the query, Joanna, which in turn references Stu, will we have one Stu object in the client process space or two? What happens if we do two separate queries, one to pull back Stu alone, and the second to pull back Joanna? This notion of object identity is important because in Java, object identity is established by the this reference (the object's location in memory), and in the database it's conveyed via the primary key. Getting the two of them to match up is a difficult prospect, particularly when we throw transactions into the mix. It's not an impossible problem (an Identity Map [Fowler, 195] is the typical solution), but as an object programmer, you need to be aware of this problem in case your objects-first persistence mechanism doesn't take care of it.
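A minimal sketch of the Identity Map idea follows, assuming a hypothetical PersonSession that owns the map and two hard-coded rows in place of a real database. The class names, the integer keys, and the eager loading of the spouse are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class Person {
    final int id;
    final String name;
    Person spouse;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical session holding an identity map: at most one in-memory
// Person per primary key for the lifetime of the session.
class PersonSession {
    // Fake rows indexed by primary key: Stu (1) and Joanna (2) reference each other.
    private static final String[] NAMES = {null, "Stu", "Joanna"};
    private static final int[] SPOUSE_KEYS = {0, 2, 1};

    private final Map<Integer, Person> loaded = new HashMap<>();

    Person findById(int id) {
        Person p = loaded.get(id);
        if (p == null) {
            p = new Person(id, NAMES[id]);
            // Register before resolving references, so that the cycle
            // Stu -> Joanna -> Stu comes back to this same instance.
            loaded.put(id, p);
            p.spouse = findById(SPOUSE_KEYS[id]);
        }
        return p;
    }
}
```

Within one session, findById(1) and findById(2) return objects whose spouse fields point at each other, and a repeated findById(1) returns the same instance rather than a second Stu. Two separate sessions would still produce two Stu objects, which is why the scope of the map (per transaction, per session, or per process) matters.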

The ultimate upshot here is that if you're choosing to take an objects-first persistence approach, there are consequences beyond "it's easier to work with." In many cases, the power and attractiveness of working solely with objects is enough to offset the pain, and that's saying a lot.