July 5, 2009, 12:29 p.m.
posted by oxy
Item 3: Differentiate layers from tiers
Anybody remember client/server systems? This was the dominant architectural style for enterprise applications in the previous decade, and for years it occupied a place in developer consciousness second only to object-orientation. Vendors offered toolchains that eased the construction of forms-based (and later GUI-based) programs that hid many of the grungy details of relational database access behind convenient programming constructs, and all seemed right with the world.
As time went on, however, we found a couple of problems with client/server systems. The first problem was that of resource management. Client/server systems had an intrinsic architectural flaw: the scalability of the system was limited to the highest number of connections the server could maintain at one time. Considering that most clients used their individual connection for no more than (on average) 5% of the time they held it open, it made sense to come up with some way for one client to "borrow" a connection from another client that wasn't using it.
The second problem came from developers. We found fairly quickly that enterprise systems are more than just dialogs and data, and that the rules surrounding the data (validation rules, for example), what we now call the domain logic, were often coded in the client half of the client/server system. The danger in doing so, as any good J2EE book will tell you, is that updating those rules requires a new deployment of the client program, which is difficult to do smoothly as the number of clients grows, particularly if database changes need to be deployed at the same time. We found that if we could keep domain logic out of the code that goes onto the client system, we could minimize the number of client-side deployments needed.
At the same time, however, the only other place to put the domain logic in a client/server system is in the server half, and doing so meant writing code that was vendor-specific in some fashion. If the "server" was a transaction processing system like Tuxedo, it meant writing code that was compiled and linked against Tuxedo libraries; if we were running clients against a relational database directly, it meant encoding the domain logic in database-specific triggers and stored procedures. The concern at the time, which persists to this day, was vendor lock-in (despite the fact that few companies ever actually had to deal with it): porting your code to another RDBMS or middleware vendor would require actual work, rather than just a redeployment. (This drive to avoid vendor lock-in represents one of the great Holy Grails of our industry, and as with most Holy Grails, it is entirely impossible to achieve in a lossless manner; see Item 11.)
At first, both problems seemed solvable by placing an intermediary machine between the client and the server. We could host domain logic, written in some kind of standard form, in this intermediary, and clients could call into the domain logic there rather than storing it in the client itself. This results in a lower deployment cost, what we often refer to as a Zero Deployment scenario, since new domain logic needs to be deployed only to the intermediary rather than to every user's client machine. It also results in cleaner modularization of the codebase: the domain logic stays cleanly partitioned away from the code needed to generate the display, and it is centralized within the codebase, making maintenance easier and ensuring that the logic is consistently applied.
At the same time, if this domain logic is hosted in the intermediary, resource management becomes much easier: the intermediary itself can manage database interaction, for example, sharing connections to the database across a much larger number of client requests. This amortizes the cost of the connection across more than one client, reducing the overall impact n clients have on the system and making it feasible to raise the upper bound on the number of concurrent clients the system can support. (In fact, it becomes a question of how many clients the intermediary can support, and the total number grows substantially if we can have more than one intermediary talking to the server; this is the ubiquitous clustering scenario so frequently discussed.)
You know all this; why am I mentioning it now? Because there were two reasons for that intermediary, what J2EE refers to as the application server tier, and you need to recognize that at least one of those reasons does not actually require an intermediate machine.
One of the facets of Java that first generated excitement in the development community is its inherent portability (to anywhere there's a JVM). Remember, our first experiments with Java occurred in the area of applets and the idea of mobile code; back then, we were thinking about shipping applets via Web browsers. The point, however, is the same: by changing the code stored on the server, we can silently distribute it to the client on any subsequent request. Without a doubt, applets had their drawbacks, but this intrinsic portability wasn't one of them. In short, Java code can migrate across the network in a pretty opaque fashion, providing a way to push changes out to the client without human intervention through mechanisms such as applets, Java Network Launch Protocol (JNLP), URLClassLoader, and others. (See Item 51 for more details.)
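To make the connection-sharing idea from this item concrete, here is a minimal sketch of a pool that an intermediary might keep, so that many client requests borrow from a small, fixed set of connections. It is an illustration under stated assumptions, not a production pool: `SimplePool`, `withResource`, `FakeConnection`, and `PoolDemo` are names invented for this example, and `FakeConnection` stands in for a real database connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Fixed-size pool: callers borrow a resource briefly instead of holding one open. */
final class SimplePool<R> {
    private final BlockingQueue<R> idle;

    SimplePool(int size, Supplier<R> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // open all resources up front
        }
    }

    /** Borrows a resource, runs the work, and always hands the resource back. */
    <T> T withResource(long timeoutMillis, Function<R, T> work) {
        R resource;
        try {
            resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (resource == null) {
            throw new IllegalStateException("no resource became free in time");
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource);        // return to the pool even if work failed
        }
    }

    int idleCount() {
        return idle.size();
    }
}

/** Hypothetical stand-in for a database connection. */
final class FakeConnection {
    final int id;

    FakeConnection(int id) {
        this.id = id;
    }

    String query(String sql) {
        return "conn-" + id + " ran: " + sql;
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        SimplePool<FakeConnection> pool =
            new SimplePool<>(2, () -> new FakeConnection(opened.incrementAndGet()));
        // Twenty client requests share the two connections opened above.
        for (int client = 1; client <= 20; client++) {
            System.out.println(pool.withResource(100, c -> c.query("SELECT 1")));
        }
        System.out.println("connections opened: " + opened.get());
    }
}
```

In the demo, twenty requests are served while only two connections are ever opened. In practice you would more likely rely on a pooling `javax.sql.DataSource` supplied by the JDBC driver or the container than write a pool by hand.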
Certainly, for the resource management and database connection pooling aspects, we can't really help but have some kind of intermediate machine in place: we need a "gather point" where we can throttle and coalesce client requests into a single funnel. But think about the traditional browser-servlet container-database flow, and count the number of machines in place. Although we don't normally think of it this way, the servlet container is, in many respects, that intermediate machine, coalescing client requests (HTTP, in this case) against the database. When combined with a JDBC driver that natively supports database connection pooling, we've already neatly achieved the goal of resource management without the need for an EJB server, which helps us keep EJB out of the picture for anything but transactional processing (see Item 9).
It's not like we can put all of our domain logic into the middle tier anyway; many domain-driven services, such as validation, need to be done at the client and/or at the server. Consider for a moment a Web application that chose to centralize all (and I do mean all) of its domain logic in its middle tier. Every form submission would have to make the round-trip back to the Web server (and possibly from there to the EJB container, if that's where the domain logic is implemented) just to verify that the user entered a phone number correctly, incurring one (or two) round-trips across the network and the commensurate cost in performance (see Item 17). Most of us would scoff at missing out on such an obvious case where JavaScript could do the same kind of validation without taking the network hit. Consider this, though: another form of validation occurs when storing that data in the relational database, since the data needs to obey the database's integrity constraints in order to be stored without complaint.
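As a rough sketch of validation living in more than one place, the example below keeps a phone-number rule in a plain class with no tier-specific dependencies, so the same layer can run next to the form (no network hop) and again where the data is stored. It assumes a Java client rather than the JavaScript case mentioned above, and everything in it (`PhoneNumberRule`, `OrderForm`, `OrderService`, and the NNN-NNN-NNNN format) is hypothetical.

```java
import java.util.regex.Pattern;

/** A domain rule that depends on no servlet, EJB, or UI API. */
final class PhoneNumberRule {
    // Assumed format for illustration only: NNN-NNN-NNNN
    private static final Pattern FORMAT = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    static boolean isValid(String input) {
        return input != null && FORMAT.matcher(input.trim()).matches();
    }
}

/** Presentation layer: checks the rule locally before any round-trip. */
final class OrderForm {
    static String submit(String phone) {
        if (!PhoneNumberRule.isValid(phone)) {
            return "rejected locally";          // no network traffic spent
        }
        return OrderService.store(phone);       // stands in for the remote call
    }
}

/** Storage side: applies the same rule again before accepting the data. */
final class OrderService {
    static String store(String phone) {
        if (!PhoneNumberRule.isValid(phone)) {
            throw new IllegalArgumentException("invalid phone: " + phone);
        }
        return "stored";
    }
}
```

The rule is written once but executed in two places; which machine each class ends up on is a separate deployment decision.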
The database schema, even when it is also enforced by validation rules in the presentation layer, is as much a part of your domain logic as the code you write by hand.
The point is that a real difference exists between layers, different parts of the software codebase that each cover a key responsibility, and tiers, physical machines in the network topology, and that we shouldn't confuse the two. Yes, frequently a layer will map to a given tier, such as the data layer and the database tier, but simply assuming that one-to-one pairing as a given eliminates a wide variety of architectural possibilities that can have powerful performance and scalability ramifications, most notably in avoiding excessive time on the network stack between any two (or more) machines (see Item 17).
Consider the benefits when this distinction becomes clear. Take the ubiquitous order-tracking system, for example. The company's salespeople use the system to place orders from various customers, and the orders are passed to the shipping department to send out to the customers when the order is ready. Now consider Joe, a salesman who runs the order system off his laptop. If this is a traditional three-tier HTML-based application, Joe needs a network connection in order to place an order. If Joe's out on the golf course with a CEO who's ready to buy, Joe's not going to be happy about waiting until he gets back to the office to place the order. If this is a million-dollar order, neither Joe nor upper management will be happy; Rule Number One of sales is to never give the customer a chance to change his or her mind. (This is why some are talking about "smart client" front ends; see Item 51.)
If, on the other hand, we design the system such that the presentation layer, domain logic layer, and parts of the data management layer are hosted on the client tier, and the rest of the data management layer stretches across the client tier, intermediaries (through some kind of Type 3 JDBC Network driver), and server tier, Joe can run the application entirely disconnected from the network, caching the data locally until a network connection becomes available for update against the centralized database (see Item 44). This idea becomes particularly powerful when we put messaging brokers on the client tier, in order to store messages locally until the network becomes available for delivery.
When thinking about your application architecture, make sure to delineate the various layers of your system (presentation, domain logic, and data management) from the traditional tiers in the network topology (client, intermediaries, and server) in order to find the best "sweet spot" between centralization and the cost of communication. Most importantly, never make the mistake of assuming that presentation layers always go on the client, domain logic always goes on the intermediaries, and data management always goes on the server. It may turn out that doing so is the best architecture, but make that decision consciously, not as a matter of habit.
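The store-locally-until-the-network-returns idea from Joe's scenario can be sketched as a small client-side outbox. This is a simplified illustration, not a messaging broker: `OrderTransport` and `OrderOutbox` are hypothetical names, orders are plain strings, and the queue lives only in memory, whereas a real client would persist pending orders so they survive a restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical link to the central system; send() returns false when offline. */
interface OrderTransport {
    boolean send(String order);
}

/** Client-side outbox: orders are accepted locally and delivered when possible. */
final class OrderOutbox {
    private final Deque<String> pending = new ArrayDeque<>();
    private final OrderTransport transport;

    OrderOutbox(OrderTransport transport) {
        this.transport = transport;
    }

    /** Accepts the order locally whether or not the network is up. */
    void place(String order) {
        pending.addLast(order);
        flush();                       // opportunistic delivery attempt
    }

    /** Delivers pending orders oldest-first, stopping at the first failure. */
    int flush() {
        int delivered = 0;
        while (!pending.isEmpty() && transport.send(pending.peekFirst())) {
            pending.removeFirst();     // remove only after a successful send
            delivered++;
        }
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With this shape, placing an order never depends on connectivity; calling `flush()` again once the laptop is back on the network delivers whatever accumulated in the meantime.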