Item 45: Never assume you own the data or the database



Item 45: Never assume you own the data or the database

Remember that one of our fundamental assumptions about enterprise software (from Chapter 1) is that an enterprise software system is one in which some or all of its resources are shared. In fact, the core of the enterprise system—the data—is most frequently the shared resource in question. Most of the time, we assume that the extent of the sharing occurs among the various users of the enterprise system. However, it doesn't stop there—many enterprise databases count other enterprise systems as part of their userbase, and unfortunately those "users" have much more stringent requirements than usual.

Some of these other systems making use of your database aren't strangers; it's not uncommon, for example, to design or purchase a reporting engine to allow end users to create their own reports from the database. This is a good thing; without that, the need to create the logic and nicely formatted output will fall into developers' laps. While constantly changing reports to keep up with user whims is one way to stay employed for the rest of your life, it also means you never get to do anything else on the project, and that's not my idea of an interesting job. Those reporting engines typically work directly against the database schema, and here's where we need to be careful.

If multiple systems or agents share a resource, the format or schema of that resource cannot change at the whim of one of those systems or agents without causing a ripple effect through the rest. Alterations to the schema, excessive database locks, and relocation of the database instance are changes that may seem localized to just your codebase but will in turn create problems and issues with these other systems. In fact, this is what keeps legacy systems alive for so long—it's extremely difficult to track down every client for a given resource or repository within the company (particularly large companies), much less figure out all the data dependencies of a given client against a single system. As a result, it's far easier to just leave the legacy system in place and create adapters, translators, and proxies than to replace the legacy system with something new.

To the J2EE developer, this has some sobering, if not chilling, implications: you do not own your database, nor do you own the data within it. Even if you are building the database entirely from scratch for this project, even if the project seems to be tucked away in a little corner of the business where nobody else will have any interest in it, you don't own that database. It's only a matter of time before somebody else within the company hears about your little system and wants to gain access to that data, so you start setting up back-end data feeds, direct connections, and so on. Or worse, somebody else on your team sets them up without telling you. Before long, you make a schema change, and people you've never even heard of, much less met, are howling into your phone demanding to know why you broke their application.

The conclusion here is clear: your schema, once designed, is in far greater danger of being locked in than your code. This is where having an encapsulation layer between the users and the database (as discussed in Item 42) becomes so critical—by forcing clients (any clients, including your own code) to go through some kind of encapsulatory barrier, such as a stored procedure layer or Web Service endpoint layer, to gain access to the data, you essentially plan for the future and the fact that others will need access to the data even as you modify the underlying schema and data description definitions.

This also means that the database schema shouldn't be optimized for the J2EE case: be careful when building your schema not to tie it too closely to your object model, be extremely careful about storing serialized Java objects in Binary Large Object (BLOB) fields since doing so means that data cannot be used as part of a SQL query (unless your database supports it, in which case the query will perform extremely slowly), and don't store "magic values" in the database. For example, although it might be convenient to store Java Date objects as long integers (the underlying value held in a java.util.Date, accessible via getTime, and the only nondeprecated way to construct a java.util.Date), other languages will have little to no idea what to do with a value like that, and converting it into a date value acceptable to them is difficult. Some databases may try to implicitly convert an integer value into a date, for example, but how they convert the value will be entirely different for each database. Despite the greater work in Java, it's far better to store the date in the database in either a string-based format or as an ANSI-standard DATETIME.

In fact, because your database is likely to become a shared database fairly quickly, and because J2EE isn't slated to take over the entire world any time soon (the .NET and PHP programmers may have a few words for those who think it will), you'll generally prefer to put whatever constraints you can into the database, rather than relying on J2EE code to enforce restrictions. For example, given the choice between checking to see whether an incoming string is fewer than 40 characters long before storing it to the database, go ahead and simply rely on database integrity constraints on the table to check the length of the string, rather than doing it in J2EE and setting the column's size to be something larger than that. Since the database is going to go through the size-constraint check anyway, why do it twice? Similarly, model any sort of relationship between tables/entities in the database directly within the database, using foreign-key constraints to ensure that both sides of the constraint are in fact present within the database.

It may seem easier, at first blush, to go ahead and encode this sort of logic directly within your object model and/or domain logic—after all, Java is probably your first language, and it's a natural human tendency to want to solve problems using the tools we're most comfortable with. Resist. The database provides a lot of functionality that can't be easily—or as universally—applied in your J2EE code. For example, most database products support the concept of triggers, blocks of database code that can be executed on any sort of row- or table-based access, giving you the ability to apply domain logic after the row has been inserted, updated, deleted, or whatever. Trying to build this into a J2EE application, which would have to be universally applied across all parts of the system, requires either building a complex notification mechanism into your code (keeping in mind the drawbacks of an RPC-based callback mechanism, as described in Item 20) or hand-coding directly into your domain object representations of the database entities themselves, leading you down a scary road of maintenance nightmares over time.

While it may seem convenient and entirely plausible to take shortcuts in the database layer to make it easier for you as a J2EE developer, resist the urge. Ultimately, it's just going to create more problems if you don't—when the inevitable question, "Hey, we noticed you were capturing some data we were interested in, and we want to get at it, so how can we?" comes up, it's going to be up to you to make that happen, and it'll be a lot easier to take care of at the front of the project than at the back.