Item 32: Prefer local transactions to distributed ones



Item 32: Prefer local transactions to distributed ones

Creating a transaction against a given resource is typically a technology-specific action—for example, in a relational database, a transaction is created by using the SQL statement BEGIN TRANSACTION. After this point, all work is done in some kind of temporary space where the effects of the executed instructions (SQL) aren't felt until the transaction itself is either completed (using the SQL COMMIT syntax) or abandoned (using SQL ROLLBACK).

Nevertheless, problems still arise—most notably, what happens when we have two resources we need to operate against as part of a single transaction?

The canonical scenario is that of operating against two different databases,[3] but under J2EE this can also be a database and a JMS provider, or a Connector provider. Ordinarily, when just one resource is in use, we can rely on the provider to deal with the problem of guaranteeing transaction inviolability. If we span vendors, however, things get more complicated.

[3] Usually from two different vendors—for two different database instances of the same vendor type (Oracle and Oracle, for example), the vendor provides hooks to keep everything straight. It's when we're going against Oracle and DB/2, or DB/2 and Sybase, for example, that problems creep up.

To deal with this problem directly (which predates J2EE by quite a long time), a number of database players came together to define a distributed transaction protocol, called the two-phase commit (TPC) protocol. In the TPC protocol, we formally define three parties that are part of every transaction: the client; the Resource Manager (RM), which provides the shared resource we're trying to share access to; and the Transaction Manager (TM), which takes care of creating distributed transactions and handling the interaction between the clients and the RMs.

The TPC protocol, pared down to its essentials, looks something like the following sequence.

  1. The client acquires a distributed transaction from the TM.

  2. The client enlists the desired RMs as part of the transaction. It does so by presenting the transaction to the RMs, so that the RMs are aware of the distributed transaction and know to expect the remainder of the protocol.

  3. The client does work against the enlisted RMs as usual. The RMs, aware that this is part of a distributed transaction, ensure that the work isn't committed or rolled back until the TM is heard from. (It is the RMs' responsibility to keep this data around, for reasons that will become clear later.)

  4. When the client is ready to finish, the client signals the TM to commit. The TM takes over at this point.

  5. Phase 1: The TM first signals each RM on the transaction, asking it, in essence, "Are you prepared to commit?" This is the RM's only chance to back out of the transaction, for any reason whatsoever—low disk space, relational integrity constraints, and so on. As a result, the RM usually writes the data to "almost committed" state; it knows the data but keeps it hidden from other transactions on the system. If any RM on the transaction indicates some kind of failure, the entire transaction must be rolled back, and the TM immediately orders all RMs on the transaction to abort, even if they previously indicated they were willing to commit.

  6. Phase 2: If all parties (RMs) on the transaction signaled a willingness to commit, the TM then sends around a signal again, essentially telling them, "Go ahead and commit." There is no vote here—the RMs must commit the data, and there can be no backing out, no excuses for failure. If all RMs signal successful commits, the TM then signals a successful commit back to the client, and the transaction is finished.

As you can see, the TPC protocol is a relatively complex piece of machinery. Fortunately, it's also a very reliable one, having successfully powered database access for several decades now with exceedingly high consistency rates. In fact, most relational databases use a localized version of TPC for local transactions to offer the same kind of consistency and reliability—in these situations, the TM and the RM are the same process, so it becomes a much simpler process.

Nothing comes for free, however, and TPC carries its own share of costs. In particular, because TPC requires distributed communication, the amount of time spent executing a distributed transaction is orders of magnitude higher than a local one. Even in the scenario where a single RM is enlisted against the transaction, the interaction between the TM and RM requires interprocess (if not intermachine) communication, and as Item 17 describes, this is nontrivial latency. This means that locks held inside the RM to provide the necessary ACID properties for this distributed transaction are held longer, which is an undesirable quality (as described in Item 29).

Where does this leave you? Well, for obvious reasons, you want to avoid distributed transactions unless you absolutely must have them, which means you must have ACID properties against multiple resources (databases, JMS providers, JCA Connector providers, and so on). Take careful note of how that's phrased: you want distributed transactions only if you have multiple data sources and you must have transactional semantics when working with those multiple resources—when a message to a JMS Queue must go out if and only if the database INSERT succeeds, for example.

While it may seem that this requirement comes up frequently, it turns out to be less prevalent than you might think. For many systems, for example, even though other databases may be the ultimate recipients for data gathered by this system, rather than access those databases directly, developers work against their own databases, and back-end processes pull the desired data and ship it around to those other databases. Or, your system may require real-time read access against another database but won't make any updates, so no transaction is necessary against that other database (see Item 35 for reasons).

The worst part of this story, however, comes next: by default, all transactions taken out by your favorite EJB container will be distributed TPC transactions, even if only one resource manager is ever in use. The EJB Specification (Section 17.1.1) contains an informative paragraph (set off by italic font) that states:

Many applications will consist of one or several enterprise beans that all use a single resource manager (typically a relational database management system). The EJB Container can make use of resource manager local transactions as an optimization technique for enterprise beans for which distributed transactions are not needed....The container's use of local transactions as an optimization technique for enterprise beans with either container-managed transaction demarcation or bean-managed transaction demarcation is not visible to enterprise beans.

Or, in other words, you can't guarantee that your container will choose to use local transactions, nor can you somehow indicate to the container that your enterprise beans should use local transactions. This is one area where the EJB Specification clearly takes the attitude that "the less you as a programmer know, the better." And if the container decides that your access to the database and the JMS Queue must take place under a distributed transaction, you're now running under a distributed transaction, whether you wanted to or not. And, thanks to the wonders of auto-enlistment—that magic within the EJB container that automatically enlists a resource, like a database, as part of the transaction as soon as it is retrieved from the JNDI Context—simply referencing a JMS Queue and a DataSource in the same method puts those two in the same transaction, even if you don't want them to be. (This is true once the transaction is open, regardless of whether container-managed transactions or bean-managed transactions are used. Thus your only hope of avoiding this situation—if you want or need to avoid it—is to write bean-managed transactions and not open the transaction until after you've acquired the resources that shouldn't be in this EJB transaction. Ugly.)

Net result? If you want to keep your transaction windows as short as possible (see Item 29), due to the increased communication requirements of the TPC protocol, you really want to use local transactions wherever possible. Doing so may require other approaches to doing your enterprise logic because EJB doesn't allow you to enforce the use of local transactions and prefers instead to run distributed ones regardless of your opinion on the matter. Combined with EJB's concurrency model (described in Item 31), this in turn reinforces the notions that (a) EJB is really for transactional processing only, and (b) in particular, EJB is really for distributed transactional processing, as described in Item 9.