Isolation and Database Locking




Isolation and Database Locking

Transaction isolation (the "I" in ACID) is a critical part of any transactional system. This section explains isolation conditions, database locking, and transaction isolation levels. These concepts are important when deploying any transactional system.

Dirty, Repeatable, and Phantom Reads

Transaction isolation is defined in terms of isolation conditions called dirty reads, repeatable reads , and phantom reads . These conditions describe what can happen when two or more transactions operate on the same data.[*]To illustrate these conditions, let's think about two separate client applications using their own instances of the TravelAgent EJB to access the same dataspecifically, a cabin record with a primary key of 99. These examples revolve around the RESERVATION table, which is accessed by both the bookPassage( ) method (through the Reservation entity) discussed at the beginning of this chapter and in Chapter 11, and a new listAvailableCabins( ) method that uses EJB QL to query cabin lists:

[*] Isolation conditions are covered in detail by the ANSI SQL-92 Specification, Document Number: ANSI X3. 135-1992 (R1998).

public List listAvailableCabins(int bedCount)
    throws IncompleteConversationalState {
    if (cruise == null)
        throw new IncompleteConversationalState( );

    Query query = entityManager.createQuery("SELECT name FROM Cabin c
             WHERE c.ship = :ship AND c.bedCount = :beds AND
             NOT ANY (SELECT cabin from Reservation res
             WHERE res.cruise = :cruise");
    query.setParameter("ship", cruise.getShip( ));
    query.setParameter("beds", bedCount);
    query.setParameter("cruise", cruise);

    return query.getResultList( );
}

When two users execute these methods concurrently, different problems can surface or be avoided entirely, depending on the isolation level used by the database. For this example, assume that both methods have a transaction attribute of Required .

Dirty reads

A dirty read occurs when a transaction reads uncommitted changes made by a previous transaction. If the first transaction is rolled back, the data read by the second transaction becomes invalid because the rollback undoes the changes. The second transaction will not be aware that the data it has read has become invalid. Here's a scenario showing how a dirty read can occur (illustrated in Figure):

  1. Time 10:00:00: Client 1 executes the travelAgent.bookPassage( ) method. Along with the Customer and Cruise entities, Client 1 had previously chosen Cabin 99 to be included in the reservation.

  2. Time 10:00:01: Client 1's TravelAgent EJB creates a Reservation entity within the bookPassage( ) method. The EntityManager inserts a record into the RESERVATION table, which reserves Cabin 99.

  3. Time 10:00:02: Client 2 executes travelAgent.listAvailableCabins( ). Client 1 has reserved Cabin 99, so it is not in the list of available cabins that is returned from this method.

  4. Time 10:00:03: Client 1's TravelAgent EJB executes the ProcessPayment.byCredit( ) method within the bookPassage( ) method. The byCredit( ) method throws an exception because the expiration date on the credit card has passed.

  5. Time 10:00:04: the exception thrown by the ProcessPayment EJB causes the entire bookPassage( ) transaction to be rolled back. As a result, the record inserted into the RESERVATION table when the Reservation EJB was created is not made durable (i.e., it is removed). Cabin 99 is now available.

A dirty read


Client 2 is now using an invalid list of available cabins because Cabin 99 is available but not included in the list. This omission would be serious if Cabin 99 was the last available cabin, because Client 2 would inaccurately report that the cruise was booked. The customer would presumably try to book a cruise on a competing cruise line.

Repeatable reads

A repeatable read occurs when the data read is guaranteed to look the same if read again during the same transaction. Repeatable reads are guaranteed in one of two ways: either the data read is locked against changes, or it is a snapshot that doesn't reflect changes. If the data is locked, it cannot be changed by any other transaction until the current transaction ends. If the data is a snapshot, other transactions can change the data, but these changes will not be seen by this transaction if the read is repeated. Here's an example of a repeatable read (illustrated in Figure):

  1. Time 10:00:00: Client 1 begins an explicit javax.transaction.UserTransaction.

  2. Time 10:00:01: Client 1 executes travelAgent.listAvailableCabins(2), asking for a list of available cabins that have two beds. Cabin 99 is in the list of available cabins.

  3. Time 10:00:02: Client 2 is working with an interface that manages cabins. Client 2 attempts to change the bed count on Cabin 99 from 2 to 3.

  4. Time 10:00:03: Client 1 re-executes TRavelAgent.listAvailableCabins(2). Cabin 99 is still in the list of available cabins.

A repeatable read


This example is somewhat unusual because it uses javax.transaction.UserTransaction, which is covered in more detail later in this chapter. Essentially, what it does is allow a client application to control the scope of a transaction explicitly. In this case, Client 1 places transaction boundaries around both calls to listAvailableCabins( ) so that they are a part of the same transaction. If Client 1 didn't do this, the two listAvailableCabins( ) methods would have executed as separate transactions and our repeatable read condition would not have occurred.

Although Client 2 attempted to change the bed count for Cabin 99 to 3, Cabin 99 still shows up in the Client 1 call to listAvailableCabins( ) when a bed count of 2 is requested. Either Client 2 was prevented from making the change (because of a lock) or Client 2 was able to make the change, but Client 1 is working with a snapshot of the data that doesn't reflect that change.

A nonrepeatable read occurs when the data retrieved in a subsequent read within the same transaction can return different results. In other words, the subsequent read can see the changes made by other transactions.

Phantom reads

A phantom read occurs when new records added to the database are detectable by transactions that started prior to the insert. Queries will include records added by other transactions after their transaction has started. Here's a scenario that includes a phantom read (illustrated in Figure):

  1. Time 10:00:00: Client 1 begins an explicit javax.transaction.UserTransaction.

  2. Time 10:00:01: Client 1 executes travelAgent.listAvailableCabins(2), asking for a list of available cabins that have two beds. Cabin 99 is in the list of available cabins.

  3. Time 10:00:02: Client 2 executes bookPassage( ) and creates a reservation. The reservation inserts a new record into the RESERVATION table, reserving Cabin 99.

  4. Time 10:00:03: Client 1 reexecutes travelAgent.listAvailableCabins(2). Cabin 99 is no longer in the list of available cabins.

A phantom read


Client 1 places transaction boundaries around both calls to listAvailableCabins( ) so that they are part of the same transaction. In this case, the reservation was made between the listAvailableCabins( ) queries in the same transaction. Therefore, the record inserted in the RESERVATION table did not exist when the first listAvailableCabins( ) method was invoked, but it did exist and was visible when the second listAvailableCabins( ) method was invoked. The inserted record is called a phantom record.

Database Locks

Databases, especially relational databases, normally use several different locking techniques. The most common are read locks, write locks, and exclusive write locks. (I've taken the liberty of adding "snapshots" to this list of techniques, although this isn't a formal term.) These locking mechanisms control how transactions access data concurrently. Locking mechanisms impact the read conditions described in the previous section. These types of locks are simple concepts that are addressed to a degree by the Java Persistence specification, but we'll discuss this later. Database vendors implement these locks differently, so you should understand how your database addresses these locking mechanisms to best predict how the isolation levels described in this section will work.

The four types of locks are:


Read locks

Read locks prevent other transactions from changing data read during a transaction until the transaction ends, thus preventing nonrepeatable reads. Other transactions can read the data but not write to it. The current transaction is also prohibited from making changes. Whether a read lock locks only the records read, a block of records, or a whole table depends on the database being used.


Write locks

Write locks are used for updates. A write lock prevents other transactions from changing the data until the current transaction is complete but allows dirty reads by other transactions and by the current transaction itself. In other words, the transaction can read its own uncommitted changes.


Exclusive write locks

Exclusive write locks are used for updates. An exclusive write lock prevents other transactions from reading or changing the data until the current transaction is complete. It also prevents dirty reads by other transactions. Some databases do not allow transactions to read their own data while it is exclusively locked.


Snapshots

A snapshot is a frozen view of the data that is taken when a transaction begins. Some databases get around locking by providing every transaction with its own snapshot. Snapshots can prevent dirty reads, nonrepeatable reads, and phantom reads. They can be problematic because the data is not real-time data; it is old the instant the snapshot is taken.

Transaction Isolation Levels

Transaction isolation is defined in terms of the isolation conditions (dirty reads, repeatable reads, and phantom reads). Isolation levels are commonly used in database systems to describe how locking is applied to data within a transaction.[*]The following terms are used to discuss isolation levels:

[*] Isolation conditions are covered in detail by ANSI SQL-92 Specification, Document Number: ANSI X3.135- 1992 (R1998).


Read Uncommitted

The transaction can read uncommitted data (i.e., data changed by a different transaction that is still in progress). Dirty reads, nonrepeatable reads, and phantom reads can occur. Bean methods with this isolation level can read uncommitted changes.


Read Committed

The transaction cannot read uncommitted data; data that is being changed by a different transaction cannot be read. Dirty reads are prevented; nonrepeatable reads and phantom reads can occur. Bean methods with this isolation level cannot read uncommitted data.


Repeatable Read

The transaction cannot change data that is being read by a different transaction. Dirty reads and nonrepeatable reads are prevented; phantom reads can occur. Bean methods with this isolation level have the same restrictions as those in the Read Committed level and can execute only repeatable reads.


Serializable

The transaction has exclusive read and update privileges; different transactions can neither read nor write to the same data. Dirty reads, nonrepeatable reads, and phantom reads are prevented. This isolation level is the most restrictive.

These isolation levels are the same as those defined for JDBC. Specifically, they map to the static final variables in the java.sql.Connection class. The behavior modeled by the isolation levels in the connection class is the same as the behavior described here.

The exact behavior of these isolation levels depends largely on the locking mechanism used by the underlying database or resource. How the isolation levels work depends in large part on how your database supports them.

In EJB, the deployer sets transaction isolation levels in a vendor-specific way if the container manages the transaction. The EJB developer sets the transaction isolation level if the enterprise bean manages its own transactions. Up to this point, we have discussed only container-managed transactions; we will discuss bean-managed transactions later in this chapter.

Balancing Performance Against Consistency

Generally speaking, as the isolation levels become more restrictive, the performance of the system decreases because transactions are prevented from accessing the same data. If isolation levels are very restrictivein other words, if they are at the Serializable levelthen all transactions, even simple reads, must wait in line to execute. This can result in a system that is very slow. EJB systems that process a large number of concurrent transactions and need to be very fast will therefore avoid the Serializable isolation level where it is not necessary.

Isolation levels, however, also enforce consistency of data. More restrictive isolation levels help to ensure that invalid data is not used for performing updates. The old adage, "garbage in, garbage out," applies. The Serializable isolation level ensures that data is never accessed concurrently by transactions, thus ensuring that the data is always consistent.

Choosing the correct isolation level requires some research about the database you are using and how it handles locking. You must also carefully analyze how each piece of data in your application is being used. For instance, almost every entity in the Titan Cruises reservation system has data that will seldom, if ever, change. A ship's name never changes. The number of cabins in a ship rarely changes. Cruise names and dates never change because customers have to make vacation plans based on this information. Even if these types of data do change, they would rarely affect the integrity of the system. Therefore, a low isolation level can be specified when a piece of business logic is only viewing this type of data. Reservations, on the other hand, can greatly affect the integrity of our Titan reservation system. If you look at the bookPassage( ) method carefully, you may see that it is quite possible for the system to double-book a cabin of a particular cruise. If two customers are making a reservation concurrently for the same cabin and cruise, then a double booking would happen because there are no checks in bookPassage( ) to prevent it. Let's fix this problem:

public TicketDO bookPassage(CreditCardDO card, double price)
    throws IncompleteConversationalState {

    if (customer == null || cruise == null || cabin == null) {
        throw new IncompleteConversationalState( );
    }
    try {
     Query isReserved 
 = entityManager.createQuery(
           "select count(res) from Reservation res" +
            "where res.cabin = :cabin AND res.cruise = :cruise");
        isReserved.setParameter("cabin", cabin);
        isReserved.setParameter("cruise", cruise);
        int count = (Integer) isReserved.getSingleResult( );
        if (count > 0) throw new EJBException("Cabin already reserved");

        Reservation reservation = new Reservation(
            customer, cruise, cabin, price);
        entityManager.persist(reservation);

        this.process.byCredit(customer, card, price);

        TicketDO ticket = new TicketDO(customer,cruise,cabin,price);

        return ticket;
    } catch(Exception e) {
        throw new EJBException(e);
    }
}

What we've done is added an isReserved query to bookPassage( ) that will check to see if a cabin has already been reserved for a particular cruise. If that query returns any number of cabins, then the reservation cannot be made. If the bookPassage( ) method interacts with the EntityManager and data source using a Serializable isolation level database connection, then executing the isReserved query will obtain the necessary locks in the database to ensure that the value returned by this query will be valid for the duration of the transaction. No isolation level can be used other than Serializable because the other levels do not adequately isolate the query from changes made by other transactions.

Controlling isolation levels

Different EJB servers allow different levels of granularity for isolation levels; some servers defer this responsibility to the database. Most EJB servers and EntityManager implementations control the isolation level through the resource access API (e.g., JDBC and JMS) and may allow different resources to have different isolation levels. However, they will generally require a consistent isolation level for access to the same resource within a single transaction. Consult your vendor's documentation to find out the level of control your server offers.

Bean-managed transactions in session beans and message-driven beans, however, allow you to specify the transaction isolation level using the database's API. The JDBC API, for instance, provides a mechanism for specifying the isolation level of the database connection. For example:

DataSource source = (javax.sql.DataSource)
    jndiCntxt.lookup("java:comp/env/jdbc/titanDB");
Connection con = source.getConnection( );
con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

You can have different isolation levels for different resources within the same transaction, but all enterprise beans using the same resource in a transaction should use the same isolation level.

Optimistic Locking

The isReserved query we used earlier in combination with the Serializable isolation level allowed the bookPassage( ) method to prevent double bookings. With this query in place, there will be far fewer angry customers. This solution for double bookings has a serious performance drawback, though. In order for the isReserved query to work, no other reservation referencing the cabin the customer is trying to book, or the query may be invalid. In most databases, an exclusive lock would need to be obtained on the entire RESERVATION table to make this work. This means that any bookPassage( ) or listAvailableCabin( ) invocations could happen only one at a time. This is a huge scalability problem. As Titan's business grows and adds more ships, cruises, and customers, our system will not be able to handle the new processing load, no matter how many machines we buy or CPUs we upgrade. This is because there will be a high contention on a shared resource (i.e., on the RESERVATION table).

So, how can we solve this concurrency problem? One solution is to use the optimistic locking design pattern. Optimistic locking isn't locking in the traditional sense. The way it works is that in our bookPassage( ) method, we assume that no other customer is trying to book the same cabin at the same time. Then, at transaction commit time, we let the database resolve whether the cabin has been reserved. If it has been reserved, we throw an exception and roll back the transaction. In other words, we are being optimistic that no other customer is reserving the same cabin. How does this work? How does this avoid table-level locks? Well, to use optimistic locking we have to redesign our Titan reservation system a little and use a special feature of Java Persistence.

The first thing we need to do is create a new entity class that holds information about a particular cabin for a particular cruise. Let's call this new entity class CruiseCabin. A CruiseCabin entity class will be created for each cabin for each cruise:

package com.titan.domain;

import javax.persistence.*;

@Entity
public class CruiseCabin {
   private int id;
   private Cabin cabin;
   private Cruise cruise;
   private boolean isReserved;
   private long version;

   @Id @GeneratedValue
   public int getId( ) {
      return id;
   }
   public void setId(int id) {
      this.id = id;
   }

   @OneToOne
   public Cabin getCabin( ) {
      return cabin;
   }
   public void setCabin(Cabin cabin) {
      this.cabin = cabin;
   }

   @OneToOne
   public Cruise getCruise( ) {
      return cruise;
   }

   public void setCruise(Cruise cruise) {
      this.cruise = cruise;
   }

   public boolean getIsReserved( ) {
      return isReserved;
   }
   public void setIsReserved(boolean is) {
     isReserved = is;
   }

@Version 

   protected long getVersion( ) {
      return version; 

   }
   protected void setVersion(long version) {
      this.version = version;
   }
}

The CruiseCabin entity class references the cabin and cruise to which it belongs. The isReserved property lets us know whether somebody has reserved the cabin for that cruise. The new and interesting property is the version property, which is annotated with @javax.persistence.Version . An @Version property is a column in the CruiseCabin table that will hold a version ID of a particular CruiseCabin row. Whenever the CruiseCabin entity class is updated, the version column is incremented. When a transaction beginning the commit process and business logic has updated the CruiseCabin, the entity manager first checks to see if the version property of the in-memory CruiseCabin instance matches the version column currently stored in the database. If the versions match, then the version property is incremented. If they don't match, then the entity manager throws an exception and the whole transaction rolls back. Let's change bookPassage( ) to use this new feature:

public TicketDO bookPassage(CreditCardDO card, double price)
    throws IncompleteConversationalState {

    if (customer == null || cruise == null || cabin == null) {
        throw new IncompleteConversationalState( );
    }
    try {
     Query getCruiseCabin = entityManager.createQuery(
           "SELECT cc FROM CruiseCabin cc WHERE" +
           "cc.cabin = :cabin AND cc.cruise = :cruise");
        getCruiseCabin.setParameter("cabin", cabin);
        getCruiseCabin.setParameter("cruise", cruise);
        CruiseCabin cc = (CruiseCabin)getCruiseCabin.getSingleResult( );

        if (cc.getIsReserved( )) 
        throw new EJBException("Cabin is already reserved for cruise");
        cc.setIsReserved(true);
 
        Reservation reservation = new Reservation(
            customer, cruise, cabin, price);
        entityManager.persist(reservation);

        this.process.byCredit(customer, card, price);

        TicketDO ticket = new TicketDO(customer,cruise,cabin,price);

        return ticket;
    } catch(Exception e) {
        throw new EJBException(e);
    }
}

The bookPassage( ) method conducts a query for the relevant CruiseCabin entity class. If it is reserved, it aborts the transaction. If not, then it sets the isReserved property and continues with the rest of the method. At transaction commit time, the entity manager invokes an SQL query that both verifies and increments the version column of the entity. Let's assume that the queried CruiseCabin has an ID of 1 and the current version is 101:

update CRUISE_CABIN set isReserved=true, version=version + 1
where id = 1 AND version = 101;

If this update returns zero modified rows, then the entity manager knows that the CruiseCabin has been updated by another transaction and a concurrency error has occurred. In this error condition, it throws the javax.persistence.OptimisticLock-Exception and rolls back the transaction. Otherwise, the transaction completes successfully, and the queried CruiseCabin is updated as reserved and its version property is incremented. This optimistic locking solution creates a quick write-lock on one row in our database instead of the vastly unscalable table lock in the Serializable solution that was presented earlier in this chapter.

It should be noted that the optimistic locking design pattern does not work all the time. If you have a row in your database that has a high concurrent write contention, then it is probably less efficient to use the optimistic locking pattern because it will create a lot of rollbacks, which create a lot of overhead in your system. In that scenario, the Serializable solution is possibly more scalable. A redesign of your data model is probably more appropriate in this situation, however. If you have high concurrent access to one particular row in your database, then your system probably won't scale much anyway.

Programmatic Locking

The EntityManager interface has a specific lock( ) method for performing entity locks. To use it, you pass in the entity object you want to lock and indicate whether you want a read or write lock:

package javax.persistence;

public enum LockModeType{
   READ, 

   WRITE
}

public interface EntityManager {
   void lock(Object entity, LockModeType type);
}

LockModeType.READ ensures that no dirty and nonrepeatable reads can occur on the locked entity. LockModeType.WRITE has the same semantics as READ, but it also forces an increment of the entity's @Version property. To implement these semantics, a database row lock is usually performed (i.e., SELECT ... FOR UPDATE ).

Vendor implementations are not required to support locking on entities that do not have an @Version property.


Programmatic locking becomes important when you want to ensure nonrepeatable reads on entity beans that may be read within the transaction but not updated.