Feb. 23, 2008, 11:01 a.m.
posted by oxy
Item 8: Define your performance and scalability goals
An ancient proverb holds that the journey of a thousand miles begins with a single step. That's not precisely true. The journey of a thousand miles begins with a single step and a destination; otherwise, it becomes a journey of two or three thousand miles, assuming it ever actually ends.
Developers are fastidious in dealing with customers and analysts when it comes to wanting to nail down exact requirements for the features and functionality of a given application. Field validation, use cases, class structure diagrams: all of these and more will be carefully and painstakingly ironed out into a document that stretches hundreds of pages long, yet nobody ever stops to ask, "Exactly how fast should this thing be?" And yet, this becomes one of the classic user complaints regarding a painstakingly architected application: "It's too slow." This of course brings back the immediate response, "Well, buy faster hardware." The conversation deteriorates pretty quickly at that point.
The problem is that we developers need to know precisely what target we're aiming at if we're expected to hit it with any degree of success. Just as we need to know what the features must look like, how the pages must flow, and how the shopping cart must behave in the event of a VIP customer placing a $150 order over a holiday weekend, we need to know how fast the application has to be if we're to meet user expectations.
Unfortunately, this is easier said than done. Users speak in plain, simple terms: "It needs to be faster. It's too slow. It takes forever." As sympathetic as we may be to these sorts of reactions (c'mon, we've all felt the same way, sitting there impatiently at the Web browser, waiting for Amazon.com or some other Web site to finally respond), they don't give us much to go on.
What we need is something quantifiable, rather than just gut-level intuition that, more often than not, doesn't exactly match up with what our users say they want. What makes it worse is that the very things we need to identify and quantify are notoriously difficult to nail down. Terms like performance, scalability, and capacity get tossed around together in the marketing bowl to produce a stew that's wonderfully attractive but hideously difficult to identify. Popular myth holds that the terms performance and scalability, if not precisely synonymous, are close enough that improving one will quite naturally improve the other. What's good for performance must be good for scalability as well, right? Not only do performance and scalability mean two very different things, but improvements to one often hurt the other. Other books may use different terminology, but for the purposes of this book, we'll define these and related terms as follows.
Frequently, optimizing a system for one of these will in turn help optimize for the other; for example, minimizing contention for resources will not only improve the application's scalability but also lower its latency (since now the system spends less time waiting for locks to be released). However, it's also possible to add enhancements to the system that benefit performance at the expense of scalability or vice versa.
For example, a developer may look at the current performance of a system and decide that it's running too slowly. He decides that the system is making too many trips to the database and that the solution is to cache data in the application server so as to eliminate network I/O if he can. (His first mistake is making this decision without consulting a profiler first; see Item 10.) So he writes a generalized caching mechanism, spends a month fine-tuning the cache algorithm, and starts caching data in the application server. Cue credits, we all go home happy, right?
Unfortunately, no; the story's not that simple. Because many people access the system all at once, the cache needs to be somewhat "global" in order to hold data for all of these users. And if the data has changed, we need to hold all the users at bay while we update the cache with the new data. The developer has perhaps improved performance by adding the cache but has also introduced a new contention point (access to the cache for data) and therefore hurt scalability. In some extreme cases, depending on usage and the synchronization policy of the cache lock, the time gained due to the removed I/O trip is more than lost due to the synchronization lock on the cache.
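To make that contention point concrete, here is a minimal sketch of the kind of "global" cache described above. The class name `GlobalCache` and the `loader` function (standing in for the database round trip) are hypothetical, invented for illustration; the single `synchronized` lock is an assumed, deliberately naive synchronization policy, not a recommendation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical application-server cache; all names are illustrative only.
final class GlobalCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader; // stands in for the database round trip

    GlobalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // A hit avoids the I/O trip, but every caller serializes on the same lock.
    synchronized V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // While one entry is being reloaded, every other reader is held at bay.
    synchronized void refresh(K key) {
        entries.put(key, loader.apply(key));
    }
}
```

With one user the lock costs almost nothing; with many concurrent users, each `get` waits behind every other `get` and `refresh`, which is the scalability cost the cache trades for its performance gain.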
Even worse, this cache-based implementation doesn't scale well; if the system later moves to run on clustered application servers, the cache must now be replicated across all of the nodes in the cluster, meaning that the system now takes both the latency hit of an additional I/O across the network (to check against the global cache) as well as the contention hit of the global cache lock. Even if the cache has quite an efficient locking mechanism, such that the cost of checking the cache is zero (which will never happen, by the way), scalability is still affected: the cache now occupies memory on the machine, which in turn reduces the total number of concurrent clients that machine can process. The larger the cache, the fewer clients we can support; the smaller the cache, the less effective it becomes. (Sounds like the cache size should probably be hot-configurable; see Item 13.)
To be quite honest, many of the performance problems in enterprise systems aren't, in fact, performance problems; they're scalability problems. The system performs poorly because it's blocked waiting for access to some shared resource that everybody else needs, too. If the system does application-level audit logging to track users' actions in the system, and the developer simply uses the default isolation level (see Item 35 for details) when adding rows to the audit-log table, the database is taking out reader/writer locks on the table, meaning other writers are held at the gate, waiting for the one writer to finish before any others can get started. If we remove the unnecessary locking, the database can add rows to the table much faster, thereby reducing the overall latency of the application.
The tradeoff in the caching example was one of performance for scalability. If the goal of the system is to minimize the response time for each particular user, even if that in turn reduces the capacity of the system, this was a successful step. If, however, the system needs to support as many concurrent users as possible, regardless of the response time for a particular user,[4] this was a terrible step to take and hurt the application's overall ability to meet its goals. So, in the final accounting, was this a successful optimization? That all depends on what the goals of the system are, and if those goals are never stated outright, as developers we'll never know which decisions to make.
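The earlier remark that the cache size should probably be adjustable at runtime can be sketched as follows. This is only one possible approach, assuming a least-recently-used eviction policy built on the standard `java.util.LinkedHashMap`; the class `BoundedCache` and its method names are hypothetical and do not come from the text.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache whose limit can be changed while running.
final class BoundedCache<K, V> {
    private int maxEntries;
    private final LinkedHashMap<K, V> entries;

    BoundedCache(int initialMaxEntries) {
        this.maxEntries = initialMaxEntries;
        // accessOrder = true keeps iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once over the current limit
            }
        };
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized int size() {
        return entries.size();
    }

    // Shrinking the limit trims least-recently-used entries immediately,
    // trading cache effectiveness for memory available to other clients.
    synchronized void setMaxEntries(int newMax) {
        maxEntries = newMax;
        Iterator<K> it = entries.keySet().iterator();
        while (entries.size() > maxEntries && it.hasNext()) {
            it.next();
            it.remove();
        }
    }
}
```

Which direction to turn the dial depends on the stated goal: a larger limit favors per-user response time, a smaller one favors the number of clients the machine can hold.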
At a practical level, this means a couple of things. First, make sure to optimize optimally (see Item 10) by ensuring you know which 20% of the operations and/or use cases your users are executing 80% of the time. Look for ways to ensure that the users' perceptions of the computer's activity are as short as possible. Note the peculiar phrasing of that last sentence; "the users' perceptions of the computer's activity" is the key here: use techniques and tricks to reduce the amount of time the user is physically unable to move on to the next task. Use multiple threads, use direct database access approaches, use whatever seems appropriate to improve the user's sense of the system's performance. Find the hook points (see Item 6) that can provide the optimizations when they are needed. Or, in some cases, simply build a layer that moves the user out of the physical blocking call: spin off a thread, or post a message to a JMS Queue or Topic for further, asynchronous processing. Frequently, the solutions used here may not actually have an impact on real performance, but as long as the user feels it is fast enough, that's often good enough.
Performance and scalability are two obvious elements of importance in any enterprise system. Treat them as you would any other client feature request: document precisely what's meant by goals like "fast" and "acceptable performance" either through hard numbers or (more likely) some kind of reference point mutually acceptable to both developers and users: "It should have a user interface at least as fast as the system we're replacing." While that takes care of performance, nail down in concrete terms the expected loads and concurrent loads on the system; how many users are expected to use it during a 24-hour period, during a 1-hour period, and so on? Knowing these details at the start of the project will enable you to make better decisions about when to make the inevitable performance-against-scalability tradeoffs.
Most importantly, though, having these goals explicitly stated tells you when you can stop performance-tuning, and that's just as important as knowing when to start.
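As a rough illustration of what a goal stated in "hard numbers" might look like, the sketch below checks measured response times against a target such as "the 95th percentile must be under 2 seconds". The class `ResponseTimeGoal`, the nearest-rank percentile method, and any thresholds used with it are assumptions made for illustration; real targets have to come from the users and the project, not from this example.

```java
import java.util.Arrays;

// Hypothetical, explicitly stated response-time goal.
final class ResponseTimeGoal {
    private final double percentile; // e.g. 95.0, agreed with the users
    private final long maxMillis;    // e.g. 2000, agreed with the users

    ResponseTimeGoal(double percentile, long maxMillis) {
        this.percentile = percentile;
        this.maxMillis = maxMillis;
    }

    // Nearest-rank percentile of the measured samples (must be non-empty).
    static long percentileOf(long[] samplesMillis, double percentile) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples measured");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // True when the measurements meet the goal, i.e. tuning can stop.
    boolean isMetBy(long[] samplesMillis) {
        return percentileOf(samplesMillis, percentile) <= maxMillis;
    }
}
```

A similar object could record the expected number of users per hour or per day; the point is only that a written-down, checkable target gives an unambiguous signal for when tuning is finished.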