Nov. 21, 2007, 2:13 a.m.
posted by oxy
Item 74: Use reference objects to augment garbage collection behavior
So, having read Item 72 and facing a situation where just waiting for the garbage collector isn't going to work (you need to know when the object is no longer referenced in order to aggressively release resources; see Item 67), what's a good Java programmer to do? Your first reaction might be to go back to the Java Language Specification to see what sort of facilities are available within the language. After all, if you needed to initialize something (which is what constructors do), then certainly you must need to clean it back up, right? And sure enough, Java provides a concept of a finalizer, a method invoked by the garbage collector when the object is cleaned up. But before you run off to start writing finalizers everywhere, take a deep breath and keep reading.
As Joshua Bloch discusses in his excellent Effective Java, finalizers are typically a bad idea all around. For starters, you have no guarantees when, if ever, a finalizer will be executed. If the JVM is shutting down, for example, it's entirely possible that objects waiting to be finalized will simply be ignored and their finalizers never called; running them would be redundant and unnecessary because the JVM is going down anyway.
Second, finalizers create more work for the garbage collector; now, instead of simply placing the allocated object back within the pool of available memory (depending on how the garbage collection algorithms are implemented, of course; see Item 72), the object must be placed within a queue of other objects that must have their finalizers invoked. This slows down not only deallocation but allocation as well, because the need for finalization is flagged at the time the object is created. It also means that objects requiring finalization are kept around longer, which in turn yields a larger heap footprint for the JVM process than might otherwise be required.
One story tells of a Java programmer who consistently ran into OutOfMemoryError instances from a given program run. As it turned out, too many objects were being finalized, and the finalizer thread simply couldn't keep up—eventually the JVM ran out of free heap to honor new allocation requests. More importantly, from a server-side perspective, the JVM offers no guarantees about the order or timing a finalizer is invoked around. For example, given three objects, A, B, and C, where A references B, and B references C, if each object has a finalizer, it might seem logical that the JVM would invoke the finalizers in the order A, B, and C; not so. In fact, it's entirely possible that C's finalizer will be invoked first, then B's, then A's, meaning that if B calls into C within B's finalizer, we're invoking methods on an object that by all measures should be dead. As if all that weren't bad enough, what happens in the following situation?
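import java.util.ArrayList;

public class Resurrector
{
    private static ArrayList deadObjects = new ArrayList();

    // Must declare Throwable, because super.finalize() is declared to throw it
    protected void finalize() throws Throwable
    {
        try
        {
            deadObjects.add(this); // Arise, Lazarus!
        }
        finally
        {
            super.finalize();
        }
    }
}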
Because within the finalizer we're making the object reachable again from a root set of references, we're effectively resurrecting a dead object. But like all good horror flicks, resurrection isn't everything it's cracked up to be; in this case, this object has already been finalized once, and resurrecting it doesn't change that—so now you have an object that already has one foot in the grave, just waiting for that last reference to be dropped. Once that happens, the object will immediately be released, without the finalizer being invoked again, giving it no chance to do any sort of cleanup. Ouch. See Effective Java [Bloch] for tips on how to implement finalizers correctly; but for the most part, enterprise Java developers will want to avoid finalizers entirely. So now what? In some cases, the various J2EE specifications offer event methods to tell you when objects are being destroyed. For example, servlets have an init method that is fired just before the servlet is handed its first request, and a destroy method that is fired just before the servlet container is about to hand the servlet instance off to the garbage collector for recycling. Starting with the Servlet 2.3 Specification, we can also know the lifetime of the entire Web application itself by creating a ServletContextListener that the container guarantees it will call at startup and shutdown. EJB beans have similar support via the ejbCreate and ejbRemove methods, although only on a per-bean basis. Unfortunately, not all objects whose lifecycles we're interested in are tied to those particular J2EE objects—for example, we may want to create a cache of data in order to speed up processing (see Item 4), and this cache is shared across servlet and/or EJB instances. 
We could conceivably create some kind of counting scheme within the cached objects, but we'd have to do this for every single object we want to hold in the cache, and this scheme could very quickly break down if we don't get everything exactly correct. Starting with the JDK 1.2 (Java 2) release, Sun realized that Java programmers needed better interaction with the garbage collector in the JVM, and the company provided it via the Reference object types declared in the java.lang.ref package. Although their use may seem a bit esoteric, in many cases it offers up the very sort of functionality we're looking for. Java offers up three kinds of Reference objects: SoftReference, WeakReference, and PhantomReference. All are subclasses of the base class Reference, and each has the basic property that they "wrap" another object, called the referent. You can access this referent from the Reference object via the get method on the Reference object (with one exception, the PhantomReference type, which we'll talk about in just a bit). Reference objects reduce the "strength" of a reference to an object. Normally, when we write something like the following code, the reference declared on the stack, in this case, strongRef, means the object on the other side of the reference could still be used:
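Person strongRef = new Person();
while (stillWorking()) // stillWorking() is a hypothetical loop condition; with while (true) the return below would be unreachable and would not compile
{
    // Do some work with the Person object here
}
return;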
Therefore, by Java Language Specification law, the object cannot be garbage collected. Only when that object is no longer strongly reachable, in this case, when we return from the method (since the local variable reference will no longer exist), can it be collected.
Reference objects give us more flexibility in interacting with the garbage collector. An object held on the other side of a Reference object, the referent, assuming it is not strongly referenced elsewhere, is now either weakly, softly, or phantomly reachable. The upshot is that the garbage collector is now free to collect the referent, depending slightly on the semantics of the reference object itself.
At first, this doesn't seem like much of an advantage; however, the story doesn't end here. In addition to marking the referent as eligible for collection, a Reference object also provides a notification mechanism, called a ReferenceQueue. When a ReferenceQueue is passed into the Reference object's constructor, the garbage collector guarantees to put the Reference object into that queue (it will enqueue the reference), and interested parties can pull the Reference object out of the ReferenceQueue to learn that its referent is no longer with us.
Having gone through the basics of Reference objects, we'll take a look at each kind in turn.
SoftReference objects
At some point in your life as a Java programmer, somebody (usually the Big Boss) comes by your cubicle and starts talking about what great work you're doing, how glad he or she is that you work for the company, and so on. Right about the time you start wondering where this is going, he or she drops the other shoe: the application is too slow, and you need to speed it up, fast. Particularly for servlet-based applications, the immediate response comes in a single word: cache.
It's not uncommon for a developer facing performance problems to quickly decide (based on intuition, which is bad—see Item 10) that a cache is needed to speed up processing. In particular, the programmer has intuited that the system is spending too much time accessing data from some other location, such as disk or database, and that the data doesn't change very often. Therefore, caching the data in memory, to avoid the slow I/O of disk or database access, will help speed things up. Be very careful here when coming to this conclusion—the performance of your application may not have anything to do with the speed of accessing data from disk or database at all but instead may be suffering from a bottleneck due to contention over a shared resource. Caching in this situation won't help one iota. Only if you've profiled your system, eliminated bottlenecks, and found that the latency of the application is still unacceptable—only then, perhaps, caching is an answer. This is a deliberate tradeoff, however: you're trading scalability on the server-side system in exchange for less latency in obtaining results. So you start caching everything you can: output results, generated images, generated objects, anything you can keep around from one request to another one so that you don't have to recreate the duplicate object the second time. In some cases, programmers have been known to even cache String objects, despite the fact that String objects are often cached under the hood by the JVM. This all works well, for a while. You test the code and find that the system can now handle N clients much faster than it could before. And this code works great, until you get to that N+1 client accessing the system; that is, at some point, as the system adds more and more clients, sooner or later you're going to run out of memory, and the request will fail. 
The rejected client will be forced to try again until an existing client drops its connection, thereby releasing the resources, including the cached data, used by the server-side code on behalf of that client. "Alas," you tell your boss, "it's time to buy some more hardware to deal with those rare situations when we get that N+1 client. Yeah, it's a shame that it was the big multimillion-dollar client demo, but remember, we wanted those calls to be faster than they were before, so we cached the data. That's the tradeoff between latency and scalability, boss—can't do anything about it." This isn't a great state of affairs. The cache was intended to reduce latency, not reduce scalability. What's happening here is fairly obvious: each client on the system is suddenly soaking up more in the way of resources, and that in turn reduces the number of clients you can support on a given hardware node. This will be the case even if the cache is somehow global in nature, shared across all clients; it's rare that a server-based application can cache data across users, and even where only shared objects are cached, they still take up a certain amount of memory that is no longer available for client processing. Unfortunately, as a result, this means that the cache, data that, by definition, you could recreate if you had to, is acting as a roadblock to further scalability. In many ways, what programmers really want in situations like this is for the cache to empty itself out under low-memory conditions, since we can always go back and recreate objects and recalculate data if necessary. This is precisely what the SoftReference does: when a SoftReference is created around an object, that object is now softly reachable, meaning that if no other strong (normal) references are held to the object, under low-memory conditions the JVM will release the softly referenced object, thus hopefully making more room for object allocations. 
When this happens, the SoftReference will no longer hold a valid reference to its object and will return null when asked for its referent. Using a SoftReference is actually quite simple: for any object that we wish to be softly referenced, we can create a SoftReference around that object and hold the SoftReference. So, for example, a generic cache implementation based loosely on the java.util.Map interface would look something like this:
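import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class Cache
{
    private Map cachedItems = new HashMap();

    public Cache()
    { }

    public void put(Object key, Object data)
    {
        cachedItems.put(key, new SoftReference(data));
    }

    public Object get(Object key)
    {
        SoftReference sr = (SoftReference)cachedItems.get(key);
        // A key that was never cached gives a null SoftReference;
        // a cleared SoftReference gives null from get()
        return (sr == null) ? null : sr.get();
    }
}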
Notice that the cache hands out strong references to the softly referenced objects; if code has asked the cache for an object, we don't want the object to suddenly disappear once the object has been returned. The returned strong reference will keep the softly referenced object alive until that strong reference has been dropped, and once that has happened the object will be eligible for collection under low-memory conditions once again. In the case of the Cache class, in the code above, notice that if a softly referenced object does get collected, we may want to remove the key for that object as well. This means that we need to somehow register with the JVM to be notified when a SoftReference gets cleared out; fortunately, this exact behavior is possible using a ReferenceQueue. When we create the SoftReference (or any Reference object, for that matter), we can pass in an instance of a ReferenceQueue that the Cache knows about. When the SoftReference is cleared, the JVM will enqueue the SoftReference instance into the ReferenceQueue, where we can fetch it from the ReferenceQueue by calling remove (which blocks until an enqueued reference is available or a timeout expires) or poll (which returns immediately with either null or an enqueued Reference). So we can amend the Cache implementation shown earlier by making it more aware of when the softly reachable value objects are collected; in this case, we'll poll for enqueued references on each call to get or put, so that way we don't have to worry about setting up a separate Thread to do the blocking remove calls. So now we can cache off objects as much as we like, knowing that if the JVM starts to run tight on memory, it will start reclaiming objects from the cache as necessary until either the cache is empty or the need for memory has been met. 
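The amended cache described above, the one that polls a ReferenceQueue on each call to get or put, might look roughly like the following. This is only a sketch under stated assumptions: the names QueueAwareCache, KeyedSoftReference, purge, and size are invented for illustration, and generics are used for brevity even though the listing above uses raw types.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical amended cache; all names here are illustrative assumptions.
class QueueAwareCache<K, V> {
    // A SoftReference that remembers its key, so a cleared entry can be found in the map.
    private static final class KeyedSoftReference<K, V> extends SoftReference<V> {
        final K key;

        KeyedSoftReference(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, KeyedSoftReference<K, V>> cachedItems = new HashMap<>();
    private final ReferenceQueue<V> clearedQueue = new ReferenceQueue<>();

    public synchronized void put(K key, V data) {
        purge();
        cachedItems.put(key, new KeyedSoftReference<>(key, data, clearedQueue));
    }

    // Returns a strong reference to the cached value, or null on a miss or a cleared entry.
    public synchronized V get(K key) {
        purge();
        KeyedSoftReference<K, V> ref = cachedItems.get(key);
        return (ref == null) ? null : ref.get();
    }

    public synchronized int size() {
        purge();
        return cachedItems.size();
    }

    // Non-blocking: poll() returns null immediately when nothing has been enqueued.
    private void purge() {
        Reference<? extends V> cleared;
        while ((cleared = clearedQueue.poll()) != null) {
            @SuppressWarnings("unchecked")
            KeyedSoftReference<K, V> ref = (KeyedSoftReference<K, V>) cleared;
            // Remove the entry only if the map still holds this exact reference;
            // the key may have been re-put with a fresh value in the meantime.
            cachedItems.remove(ref.key, ref);
        }
    }
}
```

Because purging happens on the caller's thread, no background Thread is needed; the cost is that stale keys linger until the next call into the cache. Note that get can still return null for an entry whose SoftReference has been cleared but not yet enqueued, so callers have to treat null as a cache miss either way.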
If the cache does get emptied this way, we can always go back and repopulate it when the memory situation is friendlier, but be careful: we don't want to repopulate the cache until there's more room in the heap, or we'll just start thrashing the garbage collector in a big way. In a production implementation, the Cache class should take a threshold parameter in the constructor; if the available heap is less than that threshold, it won't bother trying to hold the cached items but will just throw the references away (since, we presume, the garbage collector is just going to clear the SoftReference in a few milliseconds anyway).
WeakReference objects
WeakReference objects, as the name implies, make their referents weakly reachable, which essentially means the referents are eligible for collection at any time, assuming they're not strongly reachable through some other reference. As with PhantomReference objects (discussed next), the power of the WeakReference object lies not so much in the fact that we're allowing the referent to be collected as in the fact that we can be notified when the garbage collector wants to collect the referent.
To understand why this is useful, we have to take a step back for a moment and talk about object pools again. Recall from Item 72 that when a resource is both finite and expensive to create, we want to create an object pool around it to mitigate the cost of allocation and cleanup as well as to keep track of how many of these objects have been created. Implicit in this, however, is not only knowing when to create an instance of the resource object in question but also knowing when the borrowed resource object is no longer in use. In traditional object pool implementations, this responsibility is left to the programmer, via some kind of return or cleanup method on the pool, passing the pooled instance back into the pool for reuse.
Unfortunately, both you and I know that this sort of policy requires programmers to be vigilant and disciplined, and the brutal fact is that under tight deadlines and impossible requirements specifications, vigilance and discipline tend to be the first casualties of the development team. This means that we run the very real risk of objects not being returned to the pool, and now we're back to relying on the pooled objects' finalizers being triggered as the only means by which we can know the client is done with the pooled object. Fortunately, there's a better way, and as you might have guessed by now, it uses the WeakReference. Because a WeakReference doesn't keep an object alive, we can build an object pool by handing out strong references to pooled resources. When the client is finished, it simply drops the strong reference, which in turn leaves the object weakly reachable, and on the next garbage collection pass, the object will be collected and (by catching the ReferenceQueue notification) the resource can be returned to the object pool. The key to making this work, then, is handing out objects that can be freely recycled and thereby notifying the object pool; the classic way to make this work is to have the object pool itself hold the finite collection of objects in the pool, and instead of handing out references to these objects, hand out references to proxies. When the client drops the reference to the proxy (weakly referenced by our pool), the WeakReference to the proxy gets enqueued. That in turn means we can detect the drop thanks to the ReferenceQueue held inside the pool, and thus return the resource object on the other side of the proxy back to the pool. Because the client always interacts with the proxy (and not our WeakReference itself), normally this means that we want an interface for both the resource objects managed by the pool as well as the proxies to implement (see Item 1). 
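The proxy-based pool described above can be sketched along the following lines. Everything named here (Resource, RealResource, ResourceProxy, WeakPool, Lease) is a hypothetical illustration rather than code from the book, and the pool reclaims dropped proxies by polling its ReferenceQueue on each call instead of running a dedicated thread.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Interface implemented by both the pooled resource and the proxy (hypothetical).
interface Resource {
    String use();
}

class RealResource implements Resource {
    private final int id;
    RealResource(int id) { this.id = id; }
    public String use() { return "resource-" + id; }
}

// What clients actually receive; the pool refers to it only weakly.
class ResourceProxy implements Resource {
    private final Resource target;
    ResourceProxy(Resource target) { this.target = target; }
    public String use() { return target.use(); }
}

class WeakPool {
    // Weak reference to a proxy, plus a strong reference to the real resource to recycle.
    private static final class Lease extends WeakReference<ResourceProxy> {
        final Resource resource;

        Lease(ResourceProxy proxy, Resource resource, ReferenceQueue<ResourceProxy> queue) {
            super(proxy, queue);
            this.resource = resource;
        }
    }

    private final Deque<Resource> idle = new ArrayDeque<>();
    private final Set<Lease> leases = new HashSet<>(); // keeps the Lease objects themselves alive
    private final ReferenceQueue<ResourceProxy> dropped = new ReferenceQueue<>();

    WeakPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new RealResource(i));
        }
    }

    // Returns a proxy, or null if every resource is currently lent out.
    synchronized Resource borrow() {
        reclaim();
        Resource real = idle.poll();
        if (real == null) {
            return null;
        }
        ResourceProxy proxy = new ResourceProxy(real);
        leases.add(new Lease(proxy, real, dropped));
        return proxy;
    }

    synchronized int idleCount() {
        reclaim();
        return idle.size();
    }

    // Returns resources whose proxies the collector has found to be unreachable.
    private void reclaim() {
        Reference<? extends ResourceProxy> ref;
        while ((ref = dropped.poll()) != null) {
            Lease lease = (Lease) ref;
            if (leases.remove(lease)) {
                idle.push(lease.resource);
            }
        }
    }
}
```

A client simply calls borrow(), uses the returned Resource, and lets the reference go. The resource only comes back after a garbage collection has actually noticed the dropped proxy, so a pool like this can look exhausted for a while even though nobody is using anything; an explicit return method is still worth offering alongside it.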
Note that an object pool implementation could also be done with PhantomReference objects, and code that does so (generously donated by Vlad Roubtsov, JavaWorld columnist) appears on the book's accompanying Web site. Practically speaking, from a client perspective, there's no functional difference; the only real change is that PhantomReference objects are signaled at a different time in the object lifecycle than WeakReference objects are.
PhantomReference objects
According to the documentation for PhantomReference objects, "PhantomReferences are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism." Quite honestly, after reading the earlier paragraph on finalizers and realizing just what a bad idea they are, you may think that anything that makes object cleanup easier seems like a good idea. (I really should be fair here and point out that it's not finalization that's so bad but the fact that it's entirely nondeterministic, unordered, and unreliable. This isn't really something that Java itself can correct, and any automatic reclamation system must deal with it. For those of you keeping score at home, .NET has the exact same problem.)
The problem is, PhantomReference objects don't seem that useful at first; they are unique among the three reference types in that they don't actually hold a reference to their referent, meaning that when get is called on a PhantomReference, it returns null. Or, to be more specific, calling get before the PhantomReference is enqueued returns null; all three Reference types return null after the Reference has been enqueued/signaled. The documentation points out that we can create subclasses of PhantomReference, but if we actually put a reference to the object on which we want to invoke a close method in the PhantomReference subclass, that reference will be a strong one, and the PhantomReference itself will never be enqueued.
Using PhantomReference objects requires a bit more subtlety. Here's an example of how we can use the PhantomReference to do cleanup when an object is no longer strongly reached:
// Example showing use of PhantomReference
//
import java.lang.ref.*;
import java.util.*;

class CleanThisUp
{
    // The resource we need to finalize; for simplicity's sake,
    // I'm not actually going to show the connection, but it's
    // pretty easy to see how this would work in practice
    // (getConnectionFromSomeplace() is deliberately left undefined)
    //
    private java.sql.Connection conn = getConnectionFromSomeplace();

    public CleanThisUp()
    {
        System.out.println("CleanThisUp created: " + hashCode());

        // Create our PhantomReference to do the cleanup and
        // register it so the PhantomReference itself doesn't get
        // lost
        refList.add(new CTUPhantomRef(this, conn, cleanupQueue));
    }

    private static class CTUPhantomRef extends PhantomReference
    {
        private java.sql.Connection connToClose;

        public CTUPhantomRef(CleanThisUp referent,
                             java.sql.Connection conn,
                             ReferenceQueue q)
        {
            super(referent, q);
            this.connToClose = conn;
        }

        public void clear()
        {
            try
            {
                super.clear();
            }
            finally
            {
                // Now do our own cleanup
                //
                try
                {
                    if (connToClose != null)
                        connToClose.close();
                    System.out.println("I cleaned up a connection!");
                }
                catch (java.sql.SQLException sqlEx)
                {
                    // Log this, ignore it, whatever; it's never exactly
                    // clear what should be done in the event of an
                    // exception on a close() call. Regardless, don't
                    // just ignore it.
                    //
                    sqlEx.printStackTrace();
                }
            }
        }
    }

    private static ReferenceQueue cleanupQueue = new ReferenceQueue();
    private static List refList =
        Collections.synchronizedList(new ArrayList());
    private static Thread cleanupThread;

    static
    {
        cleanupThread = new Thread(new Runnable() {
            public void run()
            {
                try
                {
                    Reference ref = null;
                    while (true)
                    {
                        // Blocks until the collector enqueues a reference
                        ref = cleanupQueue.remove();
                        refList.remove(ref);
                        ref.clear();
                    }
                }
                catch (InterruptedException intEx)
                {
                    return;
                }
            }
        });
        cleanupThread.setDaemon(true);
        cleanupThread.start();
    }
}

public class PhRefTest
{
    public static void main (String args[])
    {
        for (int i=0; i<100; i++)
        {
            CleanThisUp[] ctuArray = new CleanThisUp[10];
            for (int j=0; j<10; j++)
                ctuArray[j] = new CleanThisUp();
            ctuArray = null;
            // Encourage a full collection, as the discussion of this driver describes
            System.gc();
        }
    }
}
There's a lot going on here. To start, we have a class that requires some kind of cleanup; in this case, it's holding a database connection that we want to ensure gets closed in a timely fashion, ideally as soon as the object itself gets released. While the garbage collector can't guarantee that it will react as soon as the last strong reference to the CleanThisUp instance is dropped, we can get the garbage collector to tell us right before it's going to blow this object away using a PhantomReference. So, in the constructor of CleanThisUp, we create a PhantomReference instance (a private derived type of PhantomReference, in fact) with a ReferenceQueue held within the CleanThisUp class's static data area.
Remember, however, that the PhantomReference itself can't hold a reference to the CleanThisUp instance, which is why the CTUPhantomRef class is declared as a static nested class within CleanThisUp: unless it's declared static, a nested class instance holds a reference to its enclosing class instance (the "outer this," in Java parlance). This would be enough to keep the CleanThisUp instance strongly reachable, which means it would never be enqueued by the garbage collector, and our better-than-a-finalizer cleanup scheme would fail miserably.
Notice that we also keep track of the CTUPhantomRef instances in an ArrayList held within a CleanThisUp static field (which must be synchronized, by the way, because this ArrayList is going to get pummeled by multiple threads). We need to keep the PhantomReference itself alive, but this strong reference to the PhantomReference has no bearing on the reachability of the referent (the CleanThisUp instance). Down in the PhRefTest class, the driver for this example, we loop 100 times, creating an array of 10 CleanThisUp instances and releasing the reference to the array (thus releasing our strongly referenced link to the instances inside the array). This in turn means those instances are only phantomly reachable; remember, we're still holding the CTUPhantomRef instances in that static ArrayList inside CleanThisUp. The garbage collector is encouraged to do a full collection (despite the fact that it's generally a bad idea to coerce the garbage collector this way—see Item 72), and we execute the next iteration of the loop. When the garbage collector decides to collect our orphaned and phantomly reachable CleanThisUp instances, it first enqueues the CTUPhantomRef instances on the ReferenceQueue we passed in when we constructed it. As far as the garbage collector is concerned, that's all it needs to do, but we have a daemon thread looping infinitely, blocking until a reference is available on that ReferenceQueue. We pull the Reference out of the queue, remove the Reference from the ArrayList, and then call clear on it. (We have to do this because otherwise, PhantomReference instances will never clear and thus never actually collect their referent. It's another of those "quirks" of the PhantomReference class.) By overriding clear on CTUPhantomRef, then, we have an opportunity to do our cleanup before the CleanThisUp instance gets released back to the pool of available memory. 
We still have to be a bit tricky here, however—we can't hold a reference back to the CleanThisUp instance the CTUPhantomRef holds as a referent because that would keep the referent strongly reachable and then we'd be back to "will never be enqueued" status again. Instead, we pass the CleanThisUp resources themselves into the CTUPhantomRef for cleanup; in this case, we give the CTUPhantomRef a copy of the JDBC Connection instance CleanThisUp is holding so that in the overridden call of clear on CTUPhantomRef we can close it. Sure enough, if you run this, you'll see a whole slew of CleanThisUp instances created, and after a few moments, calls to the CTUPhantomRef's clear method start to intermingle in the display. Take careful note, though—if you're paying close attention, you'll quickly notice that not every object created gets cleaned up. This is because our Thread is a daemon thread, and even though the objects are, in fact, enqueued in the ReferenceQueue, the main thread has exited and our daemon thread isn't enough to keep the JVM alive, so we shut down with some instances still awaiting cleanup. If you absolutely, positively require those objects to be cleaned up, mark the Thread as a nondaemon thread, and figure out some way to kill it when it's time to shut down the JVM. The key drawback to this sample is the Thread spun off from within the CleanThisUp class; in a J2EE environment, it's not always an easy matter to just arbitrarily spin off a Thread (see Item 73 for why the container wants to shoulder thread management). Other possibilities come to mind: create a static method on CleanThisUp that takes a Thread reference for the ReferenceQueue-listening Runnable to use, or return that Runnable from a static method to hand into the container to execute on a Thread. 
Alternatively, you could periodically hijack a client's thread and use the poll method on ReferenceQueue to see if there are any Reference instances waiting to be processed, but this requires more complexity within the CleanThisUp class. Finally, however, we have a way to do cleanup in a better fashion than using a finalizer; it's more work, certainly, but nothing worth doing is easy.
Take careful note of what we're doing with these three examples: in each case, we're offering hints to the garbage collector about how it should behave in certain situations and scenarios. In the case of PhantomReference objects, we're asking for some kind of postmortem cleanup to take place after the object's ready for reclamation. Because the actual cleanup is happening on our own thread, rather than the finalizer thread, we can make conscious decisions about cleanup and thread deadlock that the finalizer thread, since it's encoded deep inside the JVM, can't do. With WeakReference objects, we're asking the garbage collector to send a notification when the last reference to an object has been dropped, which in turn gives us the ability to perform some kind of reclamation of the object on the other side of the WeakReference. And with SoftReference objects, it's a direct signal to the garbage collector that the object on the other side of the reference isn't worth keeping around in low-memory situations. In each case, we're offering more information to the garbage collector than we could in earlier releases of Java; use them as appropriate.
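The polling alternative mentioned above, where a client's thread is borrowed to check the ReferenceQueue instead of a dedicated cleanup Thread, might be sketched as follows. PollingCleaner, CleanupRef, register, and drain are hypothetical names made up for this illustration, and the cleanup work is passed in as a Runnable purely as an assumption to keep the sketch free of JDBC details.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: cleanup actions run on whichever thread calls drain().
class PollingCleaner {
    private static final class CleanupRef extends PhantomReference<Object> {
        final Runnable action;

        CleanupRef(Object owner, Runnable action, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.action = action;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the PhantomReference objects alive; has no effect on the owners' reachability.
    private final Set<CleanupRef> pending =
        Collections.synchronizedSet(new HashSet<CleanupRef>());

    // The action must not refer to owner, or owner would stay strongly reachable forever.
    Reference<Object> register(Object owner, Runnable action) {
        CleanupRef ref = new CleanupRef(owner, action, queue);
        pending.add(ref);
        return ref;
    }

    // Non-blocking; returns how many cleanup actions were run on the calling thread.
    int drain() {
        int cleaned = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            CleanupRef cleanup = (CleanupRef) ref;
            if (pending.remove(cleanup)) {
                cleanup.clear();
                cleanup.action.run();
                cleaned++;
            }
        }
        return cleaned;
    }
}
```

A class like CleanThisUp could call register(this, ...) in its constructor, handing over only the resource to close, and call drain() at the top of its commonly used methods. The tradeoff is that cleanup only happens when some client thread happens to call in, so an idle application may sit on released resources longer than the dedicated-thread version would.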