Aug. 1, 2009, 11:19 p.m.
posted by oxy
Item 39: Use HttpSession sparingly
In order to maintain transient state on behalf of clients in an HTML/HTTP-based application, servlet containers provide a facility called session space, represented by the HttpSession interface. The idea itself is simple and straightforward: a servlet programmer can put any Serializable object (see Item 71) into session space, and the next time that same user issues a request to any part of the same Web application, the servlet container will ensure that the same objects will be in the HttpSession object when requested. This allows the servlet developer to maintain per-client state information on behalf of a Web application on the server across HTTP requests.
Unfortunately, this mechanism doesn't come entirely for free. In the first place, storing data on the server on behalf of every client reduces the resources available on that server, meaning the maximum load capability of the server goes down proportionally. It's a pretty simple equation: the more data stored into session space, the fewer sessions that machine can handle. So, it follows that in order to support as many clients as possible on a given machine, keep session storage to a minimum. In fact, for truly scalable systems, whenever possible, avoid using sessions altogether. By not incurring any per-client cost on the server, the machine load capacity (theoretically) goes to infinity, able to support however many clients can connect to it.
This suggestion to avoid sessions if possible goes beyond simple scalability concerns. For servlet containers running within a Web farm, it's a necessity. Sessions are an in-memory construct; because memory is scoped to a particular machine, unless the Web farm has some mechanism by which a given client will always be sent back to the same server for every request, subsequent processing of a request will not find the session-stored objects placed there by an earlier request.
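To make the Web farm problem concrete, here is a minimal sketch in plain Java with no servlet API involved. The Node and RoundRobinGateway classes are hypothetical stand-ins for a server and a load balancer without session affinity; they are not part of any real container.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a Web farm in which each node keeps sessions only in its own memory. */
final class WebFarmSketch {

    /** One server in the farm; its session map is invisible to the other nodes. */
    static final class Node {
        private final Map<String, Map<String, Object>> sessions = new HashMap<>();

        void put(String sessionId, String key, Object value) {
            sessions.computeIfAbsent(sessionId, id -> new HashMap<>()).put(key, value);
        }

        Object get(String sessionId, String key) {
            Map<String, Object> session = sessions.get(sessionId);
            return session == null ? null : session.get(key);
        }
    }

    /** Hypothetical gateway that hands requests to nodes in turn, with no affinity. */
    static final class RoundRobinGateway {
        private final List<Node> nodes;
        private int next = 0;

        RoundRobinGateway(List<Node> nodes) {
            this.nodes = nodes;
        }

        Node route() {
            Node node = nodes.get(next);
            next = (next + 1) % nodes.size();
            return node;
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node());
        nodes.add(new Node());
        RoundRobinGateway gateway = new RoundRobinGateway(nodes);

        // The first request lands on node 0 and stores something in the session.
        gateway.route().put("abc123", "cart", "3 items");
        // The second request lands on node 1, which has never seen this session.
        Object seen = gateway.route().get("abc123", "cart");
        System.out.println("cart seen by second request: " + seen); // prints null
    }
}
```

The second request finds nothing because the state lives only in the first node's memory. Either every request from that client goes back to the same node, or the state has to move between nodes.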
As it turns out, by the way, pinning HTTP requests to the same machine is frightfully hard to do. If the gateway tries to use the remote address of the client as the indicator of the client request, it will run into trouble. Internet service providers that supply IP addresses to dialup consumers, proxy servers, and NATs will offer the same IP address for multiple clients, thus accidentally putting all those clients against the same server. For a small number of clients behind the same proxy, this isn't a big deal, but if the proxy server in question is the one for AOL, this could be an issue.
The Servlet 2.2 Specification provides a potential solution to this session-within-clusters problem, in that if a servlet container supports it, a Web application can be marked with the <distributable /> element in the deployment descriptor, and the servlet container will automatically ensure that session information is seamlessly moved between the nodes in the cluster as necessary, typically by serializing the session and shipping it over the network. (This is why the Servlet Specification requires objects placed in the session of a distributable application to be Serializable.) On the surface, this seems to offer a solution to the problem, but it turns out to be harder than it first appears.
A possible mechanism to provide this support is to designate a single node in the cluster as a session state server. On each request, whichever node is processing the request asks the session state server for the session state for the client, and this is shipped across the network to the processing node. This mechanism suffers from two side effects, however: (1) every request will incur an additional round-trip to the session state server, which increases the overall latency of the client request, but more importantly, (2) all session state is now being stored in a centralized server, which creates a single point of failure within the cluster.
Avoiding single points of failure is frequently the reason we wanted to cluster in the first place, so this obviously isn't ideal.
A second possible mechanism is to take a more peer-to-peer approach. As a request comes into the node, the node issues a cluster-wide broadcast signal asking other nodes in the cluster whether they have the latest session state for this client. The node with the latest state responds, and the state is shipped to the processing node for use. This avoids the problem of a single point of failure, but we're still facing the additional round-trips to shift the session state around the network. Worse yet, as a client slowly makes the rounds through the cluster, a copy of that client's session state is stored on each node in the cluster, meaning now the cluster can support only the maximum number of clients storable on any single node in the cluster. This is obviously less than ideal. If each node in the cluster throws away a client's session state after sending it to another node, however, it means that any possibility of caching the session state in order to avoid the network round-trip shipping the session state back and forth is lost.
The upshot of all this is that trying to build this functionality is not an easy task; few servlet containers have undertaken it. (Several of the EJB containers that are also servlet containers, such as WebLogic and WebSphere, support distributable Web applications, but this is typically done by building on top of whatever cluster support they have for stateful session beans. Needless to say, clustered stateful session bean state has the same issues.) Before trusting a servlet container to handle this, make sure to ask the vendors exactly how they do it, in order to understand the costs involved.
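Whichever clustering scheme is used, the session has to be serialized and shipped between nodes. The following is a rough sketch of that step using only standard Java serialization. The attribute names and the idea of treating a session as a HashMap are assumptions made for illustration; a real container decides its own wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;

/** Shows the serialize/deserialize round trip that moving session state between nodes implies. */
final class SessionShippingSketch {

    /** Serializes the attribute map into the bytes that would cross the network. */
    static byte[] toBytes(HashMap<String, Object> attributes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(attributes);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the attribute map on the receiving node. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Object> fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A lean session: just enough to identify the user (hypothetical attribute).
        HashMap<String, Object> lean = new HashMap<>();
        lean.put("userId", "oxy");

        // A heavy session: the same plus a cached result list (hypothetical attribute).
        HashMap<String, Object> heavy = new HashMap<>(lean);
        ArrayList<String> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            results.add("result-" + i);
        }
        heavy.put("searchResults", results);

        System.out.println("lean session bytes:  " + toBytes(lean).length);
        System.out.println("heavy session bytes: " + toBytes(heavy).length);
        System.out.println("round trip intact:   " + heavy.equals(fromBytes(toBytes(heavy))));
    }
}
```

Every byte reported here would be paid on each hop between nodes, which is one more reason to keep what goes into the session small.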
In the event that some kind of distributed session state mechanism is needed but the servlet container either doesn't provide it or provides a mechanism that is less than desirable for your particular needs, all is not lost. Thanks to the filters introduced in the Servlet 2.3 Specification, you can create your own distributable session mechanism without too much difficulty. The key lies in the fact that filters can nominate replacements for the HttpServletRequest and HttpServletResponse objects used within the servlet-processing pipeline. The idea here is simple: create a filter that replaces the default HttpServletRequest with one that overrides the standard getSession method to return a customized HttpSession object instead of the standardized one. Logistically, it would look something like this:
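import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

public class DistributableSessionFilter
  implements Filter
{
  // Filter lifecycle methods; nothing to set up or tear down here
  public void init(FilterConfig config) throws ServletException { }
  public void destroy() { }

  public void doFilter(ServletRequest request,
                       ServletResponse response,
                       FilterChain chain)
    throws IOException, ServletException
  {
    // Declared final so the anonymous wrapper below can capture them
    final HttpServletRequest oldReq =
      (HttpServletRequest)request;
    final HttpServletResponse oldResp =
      (HttpServletResponse)response;

    HttpServletRequestWrapper newRequest =
      new HttpServletRequestWrapper(oldReq)
      {
        // The no-argument form must be overridden too; otherwise it
        // delegates to the container's own session
        public HttpSession getSession()
        {
          return getSession(true);
        }

        public HttpSession getSession(boolean create)
        {
          if (create)
            return new DistributedHttpSession(oldReq, oldResp);
          else
          {
            // If user has created a distributed session already,
            // return it; otherwise return null
            return null; // placeholder: lookup of an existing session not shown
          }
        }
      };
    chain.doFilter(newRequest, response);
  }
}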
In this code, DistributedHttpSession is a class that implements the HttpSession interface and whose getAttribute/setAttribute (and other) methods take the passed objects and store them someplace "safe," such as the RDBMS or the shared session state server. Note that because the standard HttpSession object is no longer being used, it will be up to you, either in this filter or in the DistributedHttpSession, to set the session cookie in the HTTP response headers. Normally the servlet container itself handles this, but since we're not using its default session mechanism anymore, we have to pick up the slack. Use large random numbers for session identifiers, to prevent attackers from being able to guess session ID values, and make sure not to use the standard JSESSIONID cookie name, in order to avoid accidental conflicts with the servlet container itself.
The actual implementation of this fictitious DistributedHttpSession class can vary widely. One implementation is to simply store the session data into an RDBMS with hot replication turned on (to avoid a single-point-of-failure scenario); another is to use an external shared session state server; a third is to try the peer-to-peer approach. Whatever the implementation, however, the important thing is that by replacing the standard session system with your own version, you take control over the exact behavior of the system, thus making this tunable across Web applications if necessary. This is a practical application of Item 6 at work.
Another session-related concern to be careful of is the accidental use of sessions. This isn't a problem within servlets written by hand by developers: a session isn't created until explicitly asked for via the call to getSession on the HttpServletRequest. However, this isn't the case with JSP pages.
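Returning briefly to the session identifier advice above, one way to generate hard-to-guess identifiers is to draw them from java.security.SecureRandom. This is only a sketch: the class name, the 128-bit length, and the DSESSIONID cookie name are all assumptions chosen for illustration, not anything the servlet specification prescribes.

```java
import java.security.SecureRandom;

/** Generates unpredictable session identifiers for a hand-rolled session mechanism. */
final class SessionIdSketch {

    // SecureRandom is designed for security-sensitive values; java.util.Random is not.
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Hypothetical cookie name, deliberately different from the container's JSESSIONID. */
    static final String COOKIE_NAME = "DSESSIONID";

    /** Returns 128 random bits encoded as 32 lowercase hex characters. */
    static String newSessionId() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder id = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            id.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(COOKIE_NAME + "=" + newSessionId());
    }
}
```

The filter or the DistributedHttpSession would then send this value back to the client as a cookie and use it as the key under which the session state is stored.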
The JSP documentation states, quite clearly, that the directive to "turn on" session for a given JSP page (the page directive with the session attribute) is set to true by default, meaning that the following JSP page has a session established for it, even though session is never used on the page:
  <%@ page import="java.util.*" %>
  <html>
  <body>
  Hello, world, I'm a stateless JSP page.
  It is now <%= new Date() %>.
  </body>
  </html>
Worse yet, it takes only one JSP page anywhere in the Web application to do this, and the session (and the commensurate overhead associated with it, even if no objects are ever stored in it) will last for the entire time the client uses the Web application. As a result, make sure your JSPs all have session explicitly turned off (<%@ page session="false" %>), unless it's your desire to use sessions. I wish there were some way to establish session = false as the default for a given Web application, but thus far it's not the case.
Lest you believe otherwise, I'm not advocating the complete removal of all session use from your application; in fact, such a suggestion would be ludicrous, given that HTTP offers no reasonable mechanism for providing per-user state. When used carefully, HttpSession provides a necessary and powerful mechanism for providing per-user state within a Web application, and this is often a critical and necessary thing. (How else could I ensure that the user has successfully authenticated or track his or her progress through the application?) The danger is in overusing and abusing the mechanism, thereby creating additional strain on the servlet container when it's not necessary. Don't use sessions unless you need to, and when you do use them, make them lean and mean in order to keep the resources consumed on the servlet container machine as light as possible.