Item 20: Avoid waiting for remote service requests to respond



One of the drawbacks of the request-response communications model is embedded in its very intent: we have to wait for the response. It seems pretty silly to have to say this, but waiting for the response turns out to be a major hurdle in communications.

This notion of response time has a nasty tendency to stack up across the system. Look at it from the end user's perspective. The user clicks a button on the Web form in the browser. The browser sends the HTTP request to the HTTP server (going through whatever other infrastructure lies between the two: proxy servers, gateways, whatever). The HTTP server hands the request to the internal processing agent, probably a servlet of some form, which then does some processing on the POSTed data (hopefully validating the input on the server in addition to the script on the page that does the same thing; see Items 61 and 56, respectively), then makes a call to the database to store the data. This, of course, means we effectively have to put the servlet on pause until the database request is parsed, planned, and executed and the data returned. Or perhaps we make a request out to an EJB layer, which then carries out some processing before again heading out to the database to retrieve some data. Based on the data retrieved, we execute a few more processing steps, then execute another database call to update the data, and return to the servlet, which then forwards the request off to a JSP page....

Meanwhile, our poor user has died of old age.

In many cases, the long latency of an HTTP-based system isn't necessarily due to any particular component of the system. It's not the fault of RMI, HTTP, or the native wire protocol of whatever database you happen to be using. The system isn't held up by any particular hardware element or the speed of the network. Certainly, any one of these can create a bottleneck, but their absence doesn't guarantee low latency. Instead, the long pause times are attributable to the summation of a little bit of latency involved in each step of the process, combined with the fact that we have to wait for each one to finish before we can move on.
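To make the stacking effect concrete, here is a minimal sketch that adds up per-hop latencies for the request path described earlier. The hop names and millisecond figures are invented for illustration, not measurements; the point is only that no single hop is slow, yet the user waits for the sum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatencyBudget {
    // Hypothetical per-hop latencies in milliseconds (illustrative numbers only).
    static Map<String, Long> hops() {
        Map<String, Long> hops = new LinkedHashMap<>();
        hops.put("browser -> HTTP server", 40L);
        hops.put("servlet validation", 15L);
        hops.put("servlet -> EJB call", 30L);
        hops.put("EJB -> database read", 60L);
        hops.put("EJB processing", 25L);
        hops.put("EJB -> database update", 70L);
        hops.put("forward to JSP and render", 35L);
        return hops;
    }

    // With synchronous calls, each hop must finish before the next starts,
    // so the user-visible wait is the sum of every hop.
    static long totalMillis(Map<String, Long> hops) {
        long total = 0;
        for (long ms : hops.values()) {
            total += ms;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> hops = hops();
        hops.forEach((name, ms) -> System.out.println(name + ": " + ms + " ms"));
        System.out.println("user-visible wait: " + totalMillis(hops) + " ms");
    }
}
```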

Consider, for example, the canonical e-commerce Web site again. We know already that the front-end servlet/JSP layer is going to need to communicate with several back-end agents, such as the database, in order to display certain parts of the site. (Hopefully the book catalog itself is pretty static, thereby leading us to make the pregenerated content optimization, as described in Item 55.) We know that a good part of this will need to be done before we can continue with carrying out user requests and directives, but exactly how much of this needs to be done in direct response to user actions?

Think about the final checkout stage, for example: we know that we'll want to process the user's credit card number, but does this need to happen as part of the processing before the "Thank you for your order" response shows up? Credit card services can take a long time to execute—remember standing there at the register last time you were at the mall?—and are sensitive to load on the rest of the system. Is this really something you want to subject your users to, just to display "Thank you for your order," particularly since users have this nasty tendency to click the button a second time if they think nothing's happening?

Whenever possible, look for ways to break this synchronous model into asynchronous communication steps, in order to avoid having to force the client to wait. Note that I take particular care here to say "the client" rather than "the user" since your client could very well be another program.

In the case of the checkout stage of the e-commerce site, for example, we don't need to process the credit card as part of the final processing stage. Oh, don't get me wrong: I'm not suggesting that we don't need to process the credit card payment. I'm just suggesting it doesn't need to be done while the user is waiting.

This raises an obvious question: what do we do if the user's credit card fails? We can't just "eat" the order and not do any further processing on it, leaving the user to think that the transaction completed successfully. So what happens when it fails?

First, consider the likelihood of this actually happening. Depending on your clientele, it's far more likely that the credit card transaction will go through successfully than it is that the transaction will fail. Most people know when their credit card is full, after all, and don't bother using it if they know it will fail. (Some credit counseling representatives may disagree with this assessment, but anecdotal evidence suggests otherwise.) So, picking a number out of the air, even if 25% of all credit card transactions fail, that means that 75% of the time your user is being forced to wait for something that will never happen—the "Your credit card failed" message. Three out of four users are being forced to endure unnecessary delays.

Second, we're making a critical assumption here: that all the services on which the front end depends are actually up and running. Remember, "stuff happens," and we can't assume that it's a perfect world. It's entirely possible that the credit card processing service is down, for reasons ranging from something benign (perhaps we haven't paid our monthly dues to the agent that handles credit card processing for us, if we don't do it directly) to something catastrophic (God forbid it should happen, but earthquakes, floods, and tornadoes aren't uncommon, and 9/11 put terrorist actions on the list of risk assessments we have to consider, too). If the processing is done in an asynchronous fashion, in many cases the user remains entirely unaware of the outage. If the processing is done synchronously, then for as long as the outage lasts, our system is effectively down as well.

This has larger implications than you might think. If we make use of five synchronous remote services, and each one is down only one day a year (which is a pretty good failure record, when you look at it, around 0.27% downtime per year), our entire system can be down as many as five days a year, since the outages won't necessarily overlap, plus whatever time we're down due to our own software upgrades, maintenance, bugs, and whatever else takes the system down. In essence, our downtime becomes a function not only of our bad habits but also of the bad habits of each and every service we depend on.
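The arithmetic can be checked with a short sketch. It assumes the five outages are independent of one another, which is my simplifying assumption rather than anything guaranteed about real services; under that assumption the expected combined downtime comes out just under five days a year, and five days is the worst case when no outages overlap.

```java
public class CompoundDowntime {
    // Downtime of a single service as a percentage of a 365-day year.
    static double downtimePercent(double daysDownPerYear) {
        return 100.0 * daysDownPerYear / 365.0;
    }

    // Expected days per year on which at least one of the services is down,
    // assuming each service fails independently of the others.
    static double expectedDaysDown(int services, double daysDownPerYear) {
        double availability = 1.0 - daysDownPerYear / 365.0;
        return 365.0 * (1.0 - Math.pow(availability, services));
    }

    public static void main(String[] args) {
        System.out.printf("one service: %.2f%% downtime%n", downtimePercent(1));
        System.out.printf("five synchronous services: about %.2f days down per year%n",
                expectedDaysDown(5, 1));
    }
}
```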

Third, it's not like we don't have mechanisms to asynchronously notify the user in the event something goes wrong. This is where the most popular Internet technology, electronic mail, comes into play. (E-mail has been cited in numerous places as being more widely used than HTTP and the Web itself.) If HTTP is the king of request-response protocols across the Internet, then SMTP and POP3 or IMAP4 are the kings of messaging protocols across the Internet. Use them. If the credit card fails to go through successfully, send the user an e-mail and let him or her respond as he or she sees fit. Some users may not even bother to respond, thus reducing the load on the system further.

Finally, going asynchronous can help you cope with the "Slashdot effect." I use this term to describe the effect on servers when the highly trafficked geek Web site, Slashdot, posts a message regarding something interesting on the Internet. Thousands of requests per second start to flood in, and a site that was quite happily handling its usual traffic load suddenly groans under the additional weight. If you have any scalability weaknesses, you'll find them the first time this happens to you. Of course, it will also typically result in your site collapsing under the strain, creating a rather ugly public-relations situation.

If our e-commerce front end chooses to do as much processing as possible in an asynchronous manner, and our back-end services can't keep up with the sudden load, we're still safe: the back-end processes will process messages as quickly as they can, even as the messages pile up. Presumably, at some point the spike will start to wear off, and the game of catch-up can begin. Over time, the backlog of messages will be processed, the queues will clear out, and the system will have weathered a scalability nightmare scenario. Or, in the worst case, the processors can't keep up with the messages coming in, and system administrators (who are, of course, keeping a close eye on the situation via the monitoring support you built into the system, as described in Item 12) can fire up another instance or two of the back-end processing element on another machine until the queues start to clear out.
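The catch-up behavior is easy to see in a small deterministic simulation. The sketch below tracks queue depth tick by tick for a made-up traffic spike; the arrival counts and the per-tick capacity are invented numbers, and doubling the capacity stands in for an administrator starting a second back-end instance.

```java
import java.util.Arrays;

public class SpikeBacklog {
    // Returns the queue depth at the end of each tick, given how many messages
    // arrive per tick and how many the back end can process per tick.
    static int[] backlog(int[] arrivalsPerTick, int capacityPerTick) {
        int[] depth = new int[arrivalsPerTick.length];
        int queued = 0;
        for (int t = 0; t < arrivalsPerTick.length; t++) {
            queued += arrivalsPerTick[t];                 // new messages pile up
            queued -= Math.min(queued, capacityPerTick);  // back end drains what it can
            depth[t] = queued;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Hypothetical load: normal traffic of 50 per tick with a two-tick spike of 400.
        int[] arrivals = {50, 50, 400, 400, 50, 50, 50, 50, 50, 50};
        System.out.println("one instance:  " + Arrays.toString(backlog(arrivals, 100)));
        System.out.println("two instances: " + Arrays.toString(backlog(arrivals, 200)));
    }
}
```

In both runs the front end keeps accepting work throughout the spike; the only difference is how long the queue takes to empty afterward.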

Asynchronous processing can take place in a variety of ways. One way is to go with a completely asynchronous model, as suggested above, by using messaging and message-oriented technologies like JMS or e-mail (via JavaMail, most likely). EJB Message-Driven Beans (MDBs) make this trivial to accomplish. In the e-commerce scenario, at the conclusion of the user's shopping experience, rather than making a collection of calls out to EJBs and other remote services, we simply drop a JMS message into a Queue or Topic and return immediately. MDBs listening on that Queue or Topic will see the message, wake up, and carry out whatever processing is required. If there's a problem, such as the credit card not going through, we send an e-mail to the end user asking him or her to contact a customer service representative to resolve the failure.
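The shape of this flow can be sketched without a JMS provider. The example below is an in-process stand-in, not JMS or EJB code: a LinkedBlockingQueue plays the role of the Queue, a worker thread plays the role of the MDB, and a list of strings plays the role of outgoing e-mail. The Order class, the chargeCard rule (cards ending in 0 are declined), and all names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncCheckout {
    // Hypothetical order message; a real system would send this as a JMS message.
    static final class Order {
        final String email;
        final String cardNumber;
        Order(String email, String cardNumber) {
            this.email = email;
            this.cardNumber = cardNumber;
        }
    }

    private static final Order SHUTDOWN = new Order("", ""); // sentinel that stops the worker

    private final BlockingQueue<Order> queue = new LinkedBlockingQueue<>();
    final List<String> sentEmails = new CopyOnWriteArrayList<>(); // stand-in for JavaMail
    private final Thread worker = new Thread(this::drain, "order-worker");

    AsyncCheckout start() {
        worker.start();
        return this;
    }

    // Runs on the request thread: enqueue the order and return without
    // waiting for the credit card processor.
    String checkout(Order order) {
        queue.add(order);
        return "Thank you for your order";
    }

    // Stand-in for the message-driven bean: wakes up for each message.
    private void drain() {
        try {
            for (Order o = queue.take(); o != SHUTDOWN; o = queue.take()) {
                if (!chargeCard(o)) {
                    sentEmails.add("To " + o.email
                            + ": your payment failed, please contact customer service");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical slow remote call; here, cards ending in '0' are declined.
    private boolean chargeCard(Order o) throws InterruptedException {
        Thread.sleep(50);
        return !o.cardNumber.endsWith("0");
    }

    // Lets the worker finish everything already queued, then stops it.
    void shutdown() {
        queue.add(SHUTDOWN);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncCheckout shop = new AsyncCheckout().start();
        System.out.println(shop.checkout(new Order("a@example.com", "4111111111111111")));
        System.out.println(shop.checkout(new Order("b@example.com", "4000000000000000")));
        shop.shutdown();
        shop.sentEmails.forEach(System.out::println);
    }
}
```

Unlike a real message queue, this in-memory queue loses its contents if the process dies, which is one reason a durable messaging layer is the better fit for orders.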

Another way is to execute remote requests on a separate thread, thereby allowing the "main" thread to continue processing while we wait for the remote service to respond. This is less preferable than MDBs (or some other messaging solution), for two reasons: (1) we still expect some kind of response and therefore can't complete our own response until all our remote service requests return, and (2) it leaves our service open to a denial-of-service attack: an attacker could flood the HTTP layer with a collection of HTTP requests, and each one of those could spin off five threads to various parts of the back end behind the HTTP layer....
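A minimal sketch of the threaded approach follows, using java.util.concurrent. The two remote calls (checkInventory and quoteShipping) are hypothetical placeholders that just sleep, and the pool size and one-second timeout are arbitrary choices. Using a fixed-size pool rather than a new thread per call is one way to soften the denial-of-service concern, since a flood of requests queues up instead of creating unbounded threads, but the request thread still has to wait for both answers before it can respond.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParallelRemoteCalls {
    // Bounded pool shared by all requests (size chosen arbitrarily for the sketch).
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Hypothetical remote calls, simulated with a short sleep.
    static String checkInventory() { pause(100); return "in stock"; }
    static String quoteShipping() { pause(100); return "3 days"; }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts both remote calls at once, then waits (with a timeout) for each result.
    String buildPage() {
        Callable<String> inventoryCall = ParallelRemoteCalls::checkInventory;
        Callable<String> shippingCall = ParallelRemoteCalls::quoteShipping;
        Future<String> inventory = pool.submit(inventoryCall);
        Future<String> shipping = pool.submit(shippingCall);
        // The request thread is free to do local work here while the calls run.
        try {
            return inventory.get(1, TimeUnit.SECONDS)
                    + ", ships in " + shipping.get(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "temporarily unavailable";
        } catch (ExecutionException | TimeoutException e) {
            return "temporarily unavailable";
        } finally {
            // No effect on completed tasks; stops stragglers after a timeout.
            inventory.cancel(true);
            shipping.cancel(true);
        }
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ParallelRemoteCalls calls = new ParallelRemoteCalls();
        System.out.println(calls.buildPage());
        calls.shutdown();
    }
}
```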

At a certain level, we can't eliminate request-response communication entirely—it's simply too useful and too necessary to give up completely. Authentication requests, for example, usually can't be optimized away with an asynchronous communication approach because we can't know what to show the user until we've ascertained the user's identity. In some cases, however, we might be able to play certain kinds of tricks, by putting the authentication-required user interface elements into a separate HTML frame, thereby allowing the rest of the page to render and display itself, giving the user something to look at while we finish the authentication action itself. Remember, in many respects, it's not the actual latency but the perceived latency that differentiates a "fast, usable" site from a "slow-as-a-dog" one.