June 10, 2007, 6:36 a.m.
posted by oxy
Item 32: Prefer local transactions to distributed onesCreating a transaction against a given resource is typically a technology-specific action—for example, in a relational database, a transaction is created by using the SQL statement BEGIN TRANSACTION. After this point, all work is done in some kind of temporary space where the effects of the executed instructions (SQL) aren't felt until the transaction itself is either completed (using the SQL COMMIT syntax) or abandoned (using SQL ROLLBACK). Nevertheless, problems still arise—most notably, what happens when we have two resources we need to operate against as part of a single transaction? The canonical scenario is that of operating against two different databases,[3] but under J2EE this can also be a database and a JMS provider, or a Connector provider. Ordinarily, when just one resource is in use, we can rely on the provider to deal with the problem of guaranteeing transaction inviolability. If we span vendors, however, things get more complicated.
To deal with this problem directly (which predates J2EE by quite a long time), a number of database players came together to define a distributed transaction protocol, called the two-phase commit (TPC) protocol. In the TPC protocol, we formally define three parties that are part of every transaction: the client; the Resource Manager (RM), which provides the shared resource we're trying to share access to; and the Transaction Manager (TM), which takes care of creating distributed transactions and handling the interaction between the clients and the RMs. The TPC protocol, pared down to its essentials, looks something like the following sequence.
As you can see, the TPC protocol is a relatively complex piece of machinery. Fortunately, it's also a very reliable one, having successfully powered database access for several decades now with exceedingly high consistency rates. In fact, most relational databases use a localized version of TPC for local transactions to offer the same kind of consistency and reliability—in these situations, the TM and the RM are the same process, so it becomes a much simpler process. Nothing comes for free, however, and TPC carries its own share of costs. In particular, because TPC requires distributed communication, the amount of time spent executing a distributed transaction is orders of magnitude higher than a local one. Even in the scenario where a single RM is enlisted against the transaction, the interaction between the TM and RM requires interprocess (if not intermachine) communication, and as Item 17 describes, this is nontrivial latency. This means that locks held inside the RM to provide the necessary ACID properties for this distributed transaction are held longer, which is an undesirable quality (as described in Item 29). Where does this leave you? Well, for obvious reasons, you want to avoid distributed transactions unless you absolutely must have them, which means you must have ACID properties against multiple resources (databases, JMS providers, JCA Connector providers, and so on). Take careful note of how that's phrased: you want distributed transactions only if you have multiple data sources and you must have transactional semantics when working with those multiple resources—when a message to a JMS Queue must go out if and only if the database INSERT succeeds, for example. While it may seem that this requirement comes up frequently, it turns out to be less prevalent than you might think. For many systems, for example, even though other databases may be the ultimate recipients for data gathered by this system, rather than access those databases directly, developers work against their own databases, and back-end processes pull the desired data and ship it around to those other databases. Or, your system may require real-time read access against another database but won't make any updates, so no transaction is necessary against that other database (see Item 35 for reasons). The worst part of this story, however, comes next: by default, all transactions taken out by your favorite EJB container will be distributed TPC transactions, even if only one resource manager is ever in use. The EJB Specification (Section 17.1.1) contains an informative paragraph (set off by italic font) that states:
Or, in other words, you can't guarantee that your container will choose to use local transactions, nor can you somehow indicate to the container that your enterprise beans should use local transactions. This is one area where the EJB Specification clearly takes the attitude that "the less you as a programmer know, the better." And if the container decides that your access to the database and the JMS Queue must take place under a distributed transaction, you're now running under a distributed transaction, whether you wanted to or not. And, thanks to the wonders of auto-enlistment—that magic within the EJB container that automatically enlists a resource, like a database, as part of the transaction as soon as it is retrieved from the JNDI Context—simply referencing a JMS Queue and a DataSource in the same method puts those two in the same transaction, even if you don't want them to be. (This is true once the transaction is open, regardless of whether container-managed transactions or bean-managed transactions are used. Thus your only hope of avoiding this situation—if you want or need to avoid it—is to write bean-managed transactions and not open the transaction until after you've acquired the resources that shouldn't be in this EJB transaction. Ugly.) Net result? If you want to keep your transaction windows as short as possible (see Item 29), due to the increased communication requirements of the TPC protocol, you really want to use local transactions wherever possible. Doing so may require other approaches to doing your enterprise logic because EJB doesn't allow you to enforce the use of local transactions and prefers instead to run distributed ones regardless of your opinion on the matter. Combined with EJB's concurrency model (described in Item 31), this in turn reinforces the notions that (a) EJB is really for transactional processing only, and (b) in particular, EJB is really for distributed transactional processing, as described in Item 9. |
- Comment