Item 73: Prefer container-managed resource management

Java developers coming from the servlet/JSP environment into the EJB environment are frequently surprised to learn how restrictive, by comparison, the EJB environment is. Enterprise beans are not permitted to start threads. Enterprise beans are not permitted to do file-based I/O. Enterprise beans are not permitted to manually implement any sort of synchronization behavior. Enterprise beans are not allowed to establish server sockets, or use Reflection, or load a native library, or. . . .

It leaves you wondering precisely what you can do.

There is, of course, a reason for all these restrictions. In order to maintain a system that's as scalable as possible, the container seeks to take on resource management as part of its duties. It does this because, as a unifying force across all sorts of different programs and enterprise systems, the container often has a better picture of the overall resource needs, as opposed to your code's rather localized view of the world.

For example, consider threading: it's not uncommon in many servlet books to see a servlet creating a daemon thread to do some kind of processing in the background when the servlet starts up. Think about what this implies for the servlet container. The servlet container itself needs to manage threads in order to take incoming requests on port 80 or 443 (or any other port, for that matter) and fire them down the appropriate filter chain and servlet code. Intuitively, we know that we want the servlet container to do this—or, more specifically, we want the servlet container to keep a tight lid on the maximum number of executing servlets, since to do otherwise would mean opening the servlet container to a denial-of-service attack.

(If that connection isn't obvious to you, consider the following scenario. Suppose I have a servlet framework that creates a new thread on each incoming request. An attacker creates a small program that loops infinitely, creating HTTP requests against the Web application. The servlet framework continues to spin off new threads until the machine crumples due to too many threads executing simultaneously.)
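To make the difference concrete, here is a minimal sketch of the capped alternative, assuming a Java 8 or later runtime. The class name BoundedWorkerDemo and its method are hypothetical, and the fixed-size pool from java.util.concurrent merely stands in for whatever thread management a real servlet container uses internally. However many simulated requests arrive, the number executing at once never exceeds the cap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a stand-in for a container's request dispatcher.
class BoundedWorkerDemo {

    /**
     * Runs `requests` simulated requests on a pool capped at `maxThreads`
     * and returns the highest number that were ever executing at once.
     */
    static int peakConcurrency(int maxThreads, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                try {
                    Thread.sleep(5); // stand-in for request processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown(); // accept no new work, let queued work drain
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated requests did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 50 requests arrive, but the pool never runs more than 4 at a time.
        System.out.println("peak concurrency = " + peakConcurrency(4, 50));
    }
}
```

A thread-per-request design has no such ceiling: its peak is bounded only by how fast the attacker can send requests.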

But now you start firing up threads from your servlets. Because the Servlet specification provides no way for a servlet developer to integrate with the servlet container's thread-management scheme, these threads are, by definition, outside the servlet container's control. So if the system administrator sets a thread pool limit of N threads in the servlet container because he has figured out, after much tuning and testing, that N is the optimal number for this particular container and platform, the underlying virtual machine is actually running N+1 threads: the N threads the servlet container knows about and the one it doesn't, your servlet's thread. Suddenly, you're past the point of optimal usage and starting down the road of diminishing returns. If that one becomes two or more, you could very quickly get into a situation where the time spent switching between threads exceeds the time spent doing actual work.

This story gets even worse if this rogue thread is created when the servlet context is started up but the programmer (due to a bug, ignorance, or apathy) doesn't shut down the thread on servlet context shutdown. This means that on every servlet context restart (which can happen for a variety of reasons as the system administrator requires), a new thread is being spun off without reaping the old one. Garbage collection won't help here—threads are one of the very few resources that will continue to live even if all formal references to the object are dropped. (The Thread object will not be collected until the thread itself dies and all strong references are gone.) In addition, any objects referenced from that thread will continue to be strongly referenced, meaning they cannot be garbage collected, meaning the JVM now has a larger memory footprint than it should; also, these Thread objects represent thread resources within the operating system itself that are heavier than simple memory-based objects, and . . . well, you get the idea.
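The restart scenario can be reproduced outside a container. The following is a rough sketch, not servlet code: LeakedThreadDemo, contextStarted, and the thread-name prefix are all hypothetical names, and each call to contextStarted plays the part of one servlet context startup that forgets to stop its thread on shutdown. Nothing holds a reference to the Thread objects, yet a requested garbage collection leaves every one of them running.

```java
// Hypothetical demo class simulating repeated servlet context restarts.
class LeakedThreadDemo {
    static final String PREFIX = "leaked-poller-";

    /** Simulates a context startup that spins off a background thread and forgets it. */
    static void contextStarted(int generation) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(1_000); // stand-in for background polling
                }
            } catch (InterruptedException e) {
                // asked to stop: fall out of run()
            }
        }, PREFIX + generation);
        t.setDaemon(true); // keeps the demo from blocking JVM exit
        t.start();
        // t goes out of scope here; no reference to the Thread is retained
    }

    /** Requests a GC, then counts the forgotten threads that are still alive. */
    static int liveLeakedThreads() {
        System.gc();
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().startsWith(PREFIX)) {
                count++;
            }
        }
        return count;
    }

    /** What the missing shutdown hook should have done: stop each thread. */
    static void stopLeakedThreads() {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(PREFIX)) {
                t.interrupt();
                try {
                    t.join(5_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int restart = 0; restart < 3; restart++) {
            contextStarted(restart);
        }
        System.out.println("threads still alive: " + liveLeakedThreads());
        stopLeakedThreads();
    }
}
```

The only reason the demo can clean up after itself is that it goes hunting for the threads by name, which is exactly the kind of bookkeeping a forgetful servlet never does.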

What's more, we haven't even begun to discuss whether the servlet spinning off threads should do so from a thread pool or not; that subject is covered in more detail in Item 68. The long and the short of it is, however, how can you know whether it's better to use a thread pool or not on a given system until you've tested that particular VM? Certain JVMs use a threading system that works better with thread pools; others pool threads under the hood and are in fact less efficient when Java code itself pools the threads.

J2EE specifies resource management for a number of reasons, chiefly because when working within a container-based environment, it's more efficient and scalable to let the container keep track of those resources and to have components borrow them as necessary. In some cases, as seen within the EJB Specification, giving this ability to the developer would actually create more harm than good; managing threads directly is not the simplest of tasks and introduces all sorts of synchronization concerns that the EJB Specification is trying to get out of the developer's way. (While the Servlet Specification takes a more permissive perspective about this, developers are still discouraged from creating or manipulating threads directly.) Similarly, trying to manage component lifecycle directly often interferes with whatever lifecycle policies the container wishes to introduce.

It's understandable why so many books and articles suggest spinning a thread off from a servlet; one facet that was missing from the J2EE space for a long time was the ability to give J2EE components any sort of "active" status. All of the "traditional" J2EE components (servlets, EJBs, JMS) require a logical thread of control from the client—they require that a client call them in order to borrow a thread from the container to carry out some action. Prior to the EJB 2.1 Specification, there was no way to create an active thread that would tie into a J2EE component without making the thread some kind of client to the component. For example, you couldn't "wake up" a bean within the EJB container every ten minutes or so and check a database table for new entries.

The closest approximation was to create a client process, running on the same machine as the EJB container, that would call into the bean every ten minutes or so. Not exactly the most elegant of solutions, but you do what you have to do. The only other choices were either to find an EJB container that was more lax about these resource-management rules and would let a bean spin off a thread despite the Specification's prohibition against doing so (thus eliminating any portability), or to fall back on the daemon or service process just described, calling into the EJB container when desired.

Only now, as part of EJB 2.1, do programmers have any sort of relief, in the form of the Timer Service, which allows you to register a request for the EJB container to call back into a bean at a later time. You create a Timer, keyed either to a recurring interval (e.g., every five minutes) or to a particular time (e.g., five minutes from now), for the bean you wish the container to call in to; then the container, when the appropriate time elapses, makes the call. Thanks to this, there is no compelling reason for J2EE components to spin off their own threads for polling or timeout purposes.
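The shape of that arrangement can be sketched without a container. Be warned that the types below are local stand-ins, not the real thing: the nested Timer, TimedObject, and TimerService interfaces only imitate the corresponding javax.ejb types so that the example compiles on its own, and ToyContainer and TablePoller are hypothetical names. In a real EJB 2.1 bean you would import the javax.ejb interfaces and obtain the TimerService from the bean's context rather than constructing anything yourself.

```java
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TimerServiceSketch {

    // Local stand-ins shaped like javax.ejb.Timer, TimedObject, and TimerService.
    interface Timer { Serializable getInfo(); }
    interface TimedObject { void ejbTimeout(Timer timer); }
    interface TimerService {
        Timer createTimer(long initialDuration, long intervalDuration, Serializable info);
    }

    /** Toy "container": it, not the bean, owns the scheduling thread. */
    static final class ToyContainer implements TimerService {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final TimedObject bean;

        ToyContainer(TimedObject bean) { this.bean = bean; }

        @Override
        public Timer createTimer(long initialDuration, long intervalDuration,
                                 Serializable info) {
            Timer timer = () -> info;
            // Durations are in milliseconds, as in the EJB Timer Service.
            scheduler.scheduleAtFixedRate(() -> bean.ejbTimeout(timer),
                    initialDuration, intervalDuration, TimeUnit.MILLISECONDS);
            return timer;
        }

        void shutdown() { scheduler.shutdownNow(); }
    }

    /** The "bean": it starts no threads; it only supplies a callback. */
    static final class TablePoller implements TimedObject {
        final CountDownLatch polls;

        TablePoller(int expectedPolls) { polls = new CountDownLatch(expectedPolls); }

        @Override
        public void ejbTimeout(Timer timer) {
            // A real bean would check the table named by timer.getInfo() here.
            polls.countDown();
        }
    }

    /** Returns true if the toy container called the bean back n times in time. */
    static boolean pollsObserved(int n) {
        TablePoller bean = new TablePoller(n);
        ToyContainer container = new ToyContainer(bean);
        try {
            container.createTimer(0, 10, "NEW_ORDERS"); // every 10 ms for the demo
            return bean.polls.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            container.shutdown();
        }
    }
}
```

The point of the design is the division of labor: the bean describes when it wants to be called, and every thread involved belongs to the container.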

This laissez-faire[4] attitude reaches beyond just threads. Network connections are another good resource to leave alone; unless you're somehow tying into the container's connection-management mechanism, don't open and close sockets yourself. On top of the fact that there's an outside possibility you'll open a socket port that the container itself will want to listen on later, most post–JDK 1.4 containers use the java.nio libraries to efficiently handle incoming connections, and your meddling here will create more problems than solutions. If you want to listen to outside communications, either use one of the established communications layers already present in J2EE (RMI, JMS, or an established Internet protocol like servlets/HTTP or JavaMail/SMTP) or do your communications outside of a container and bridge to RMI, JMS, servlets, or JavaMail.

[4] French for "keep your hands off," more or less.

The classic "other" resource the container will manage for you is database connections (which includes other connection-style resources, such as Connector connections and JMS connections). Frequently, this isn't an issue within EJB containers, but when working from a servlet container or an application that deals directly with JMS Queue and Topic instances, the EJB container isn't there to do all that wonderful database connection pooling for you. The same is true of working from client Swing applications, or at least it would seem that way.

Because of the widespread belief, not unfounded but not always correct (see Item 72), that pooling is good, servlet developers who worry about extraneous database connections start writing their own database connection pool system or download an existing one (the implementation found in the Jakarta Commons project is a popular one, for example). They deploy the connection pool implementation into their Web application and breathe a sigh of relief knowing that all database connections are now being recycled and that their system is now scalable where before it wasn't.

Unfortunately, things aren't necessarily as idyllic as they might seem. First, it's highly likely that the connection pool, if deployed into the Web application itself, is pooling connections only for that particular Web application—because of ClassLoader boundaries (see Item 70 for details), in most cases each connection pool has its own static collection of connections. This means that there are many more Connection instances out there than desired. Second, depending on how the connection pool is coded, it's entirely possible that connections are being lost—if the pool doesn't make use of Reference objects (see Item 74 for details), any code that doesn't explicitly return a Connection to the pool is "leaking" a Connection that cannot be recycled or garbage collected. Third, when using the latest JDBC drivers, depending on the details of the driver, you don't need to use an external connection pooling implementation. The JDBC 3.0 Specification suggests that drivers can and should pool connections directly, even without having to go through a JNDI-hosted DataSource to do so.
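The second problem, the leaked Connection, is at least easy to guard against in your own code, whoever owns the pool. What follows is a minimal sketch assuming Java 8 or later and a javax.sql.DataSource obtained from somewhere (in a container, typically a JNDI lookup). The names BorrowedConnectionSketch, Work, and withConnection are hypothetical, and countingDataSource is purely a test double built with java.lang.reflect.Proxy so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

class BorrowedConnectionSketch {

    /** Hypothetical callback type for the work done with a borrowed connection. */
    interface Work<T> { T apply(Connection c) throws SQLException; }

    /** Borrows a connection, runs the work, and always hands the connection back. */
    static <T> T withConnection(DataSource ds, Work<T> work) throws SQLException {
        try (Connection c = ds.getConnection()) {
            return work.apply(c);
        } // close() runs here even if work.apply threw
    }

    /** Test double: a DataSource that only counts connections not yet closed. */
    static DataSource countingDataSource(AtomicInteger outstanding) {
        ClassLoader loader = BorrowedConnectionSketch.class.getClassLoader();
        Connection conn = (Connection) Proxy.newProxyInstance(
                loader, new Class<?>[] { Connection.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        outstanding.decrementAndGet();
                    }
                    return null; // only close() is exercised in this sketch
                });
        return (DataSource) Proxy.newProxyInstance(
                loader, new Class<?>[] { DataSource.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("getConnection")) {
                        outstanding.incrementAndGet();
                        return conn;
                    }
                    return null;
                });
    }

    /** Returns how many connections are still checked out after one unit of work. */
    static int outstandingAfter(boolean failMidway) {
        AtomicInteger outstanding = new AtomicInteger();
        DataSource ds = countingDataSource(outstanding);
        try {
            withConnection(ds, c -> {
                if (failMidway) {
                    throw new SQLException("simulated query failure");
                }
                return "ok";
            });
        } catch (SQLException expected) {
            // the failure must not strand the connection
        }
        return outstanding.get();
    }
}
```

Code that calls getConnection() and then relies on reaching a later close() by hand is the code that leaks when a query throws; funneling every borrow through one helper removes that path.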

With few exceptions, there is little reason for developers to manage resources in the J2EE containers "by hand"—the underlying implementation of the specification or the containers themselves will typically do a much better job, owing to more knowledge about how the resource operates internally. By stripping out the pooling, you'll also potentially reduce the workload on the garbage collector because there are fewer long-lived objects to keep track of within your code.