Item 52: Keep HTML minimal

A Web application differs from a standard GUI application in a couple of ways, most notably in that the server sends everything back to the client: not only the data to be displayed but also the formatting codes necessary to display it. As in the old mainframe terminal-based applications, the response returned by the HTTP server includes both the data that answers the user's logical request and the HTML display elements framing it, along with whatever client-side scripting is used to provide behavior within the browser.

It's an unfortunate fact of the Web that Web pages seem to take forever to download, even in an era of high-speed broadband and cable modem hookups. We've all been there—click on a hyperlink or submit a form, and it just...takes...forever to get any kind of response. In the meantime, the user seriously contemplates whether he or she wants to have anything to do with this Web site or Web application ever again. For an intranet application deployed to users inside the corporation, holding users hostage to a slow-responding Web application is a quick way to earn a nasty reputation within the company. For an Internet application used across the globe, this is an even quicker way to make sure that those users never come back, which usually translates into lower earnings for the company, which in turn usually leads to severed paychecks for the developers. Remember, UI studies have shown that users will wait, on average, about five seconds before giving up and moving on to something else.

What is it, exactly, that takes an otherwise well-built, well-behaved application and turns it into a snail? A large part of it is the HTML being returned. When an HTML page contains dozens of references to images, large and small, scattered all over the page, the page as a whole slows to a crawl as the browser is forced to go back to the server over and over again to download those images. Does the Web site really need mouse-flyover image-switching graphics buttons for a main menu? Or a footer of ivy leaves twined around the copyright statement? Or the company logo in the upper-left corner of every page? I'll be the first to admit that these things make the page look pretty, but after they've been seen once, they just fade into the background in the user's mind. Worse, though, they still need to be displayed, which means they still need to be downloaded every time. (A good browser will cache what it can, but there are limits to what can be cached, particularly if the images themselves are somehow dynamically generated.) Even beyond that, consider the sizes of the images themselves: if they have any decent size and color depth at all, they can measure well into the hundreds of kilobytes, all of which have to move across the network from server to client.

It's worse than just end-user latency, though. All that stuff has to move from server to client, which means the pipeline between the two is being used to send gratuitous fluff. This in turn reduces the bandwidth available to other clients, which reduces the application's overall scalability. Unfortunately, this is one of the few problems that can't be solved by throwing more server hardware at it: the pipeline coming into your building or data center is the bottleneck, and widening that can often be a very expensive proposition.
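To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. The class name, the method, and the figures (a 10-megabit link, 500-kilobyte and 50-kilobyte pages) are all hypothetical, and the arithmetic ignores HTTP headers, protocol overhead, and caching, so treat it as an upper bound rather than a measurement.

```java
public class BandwidthEstimate {

    /**
     * Upper bound on full page downloads per second that a link can carry.
     * Assumes decimal units (1 megabit = 1,000,000 bits, 1 kilobyte = 1,000 bytes)
     * and ignores headers, protocol overhead, and caching.
     */
    public static double pagesPerSecond(double linkMegabitsPerSecond, double pageKilobytes) {
        double linkBytesPerSecond = linkMegabitsPerSecond * 1_000_000 / 8;
        double pageBytes = pageKilobytes * 1_000;
        return linkBytesPerSecond / pageBytes;
    }

    public static void main(String[] args) {
        // Hypothetical image-heavy page: 500 KB over a 10 Mbit/s link.
        System.out.println(pagesPerSecond(10, 500)); // prints 2.5
        // Same link, page trimmed to 50 KB.
        System.out.println(pagesPerSecond(10, 50));  // prints 25.0
    }
}
```

Under these assumptions, trimming the page by a factor of ten lets the same pipe serve ten times as many page views, with no new hardware.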

In order to keep poor HTML from crushing your application, keep a couple of good-neighbor HTML tips in mind.

  • Minimize use of "heavy" tags. APPLET, OBJECT, and IMG are all "heavy" tags in the sense that they don't provide the browser with everything it needs to do its job—another HTTP round-trip back to the server is necessary, meaning the user interface is going to go on "pause" in the meantime. Many browsers continue to parse and process the rest of the page but leave a big ugly empty space where the image/applet/whatever is supposed to go. That's part of what makes the page "feel" slow to the user.

  • Use frames to help separate portions of the page. If the Web application has a main menu bar running across the top, put that into its own (borderless) frame so that the menu only needs to be pulled across once. Ditto for the copyright banner across the bottom of the page.

  • Where possible, reuse the same images across pages. Because browsers (and intermediate processing nodes, like proxy servers and firewalls) tend to cache data downloaded from a given site, repeating an image from one page to the next lets the browser use its cached copy rather than force an extra download from the source Web site. The image's references have to match precisely, however, so make sure all your images are coming out of a common directory on the server in order to unify the URLs to the images across pages.

  • Use HTML features rather than images. HTML tags offer a wide range of functionality and require no additional download to display. For example, instead of creating a graphical "button" by putting an image inside a hyperlink, use standard text with various background colors, foreground colors, and fonts to achieve the same kind of interface. This saves a round-trip (see Item 17) to the server to pull the image.

  • Subject to the guidelines of tasteful and useful user interfaces, avoid excessive page navigation (or, in other words, avoid round-trips—see Item 17). Wizard-style interfaces done in a Web-based fashion tend to break down fairly quickly when each step in the wizard is separated by a 10-second pause waiting for the server to respond with the next step. Try to combine steps into a single form, or if possible, discard the wizard-style interface altogether.
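One way to keep an eye on the first tip is to count the "heavy" tags a page emits before it ever leaves the server. What follows is a minimal sketch, not a real HTML parser: the class and method names are hypothetical, and the regular expression is a crude scan that will also count tags appearing inside comments or script blocks.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeavyTagCounter {

    // Start tags that usually trigger at least one extra HTTP request.
    // End tags such as </object> are not matched because of the slash.
    private static final Pattern HEAVY_TAG =
        Pattern.compile("<\\s*(img|applet|object)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns how many times each heavy tag opens in the given markup. */
    public static Map<String, Integer> countHeavyTags(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = HEAVY_TAG.matcher(html);
        while (m.find()) {
            // Normalize case so IMG and img are counted together.
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment with two images and one text link.
        String page = "<html><body>"
            + "<img src=\"/images/logo.gif\" alt=\"logo\"/>"
            + "<IMG SRC=\"/images/ivy.gif\">"
            + "<a href=\"/home\">Home</a>"
            + "</body></html>";
        System.out.println(countHeavyTags(page)); // prints {img=2}
    }
}
```

A count like this won't tell you whether a given image is worth its round-trip, but it makes a page that has quietly grown dozens of them easy to spot.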

More suggestions for designing an effective HTML-based application can be found in The Design of Sites [Van Duyne/Landay/Hong].

While we're at it, let's get another thing out of the way. Raise your right hand, put your left hand on the cover of this book, and repeat after me:

"I, <your name here>, do solemnly swear on pain of permanent caffeine abstinence that I will ensure that all HTML output returned from my Web applications is XHTML compliant, well formed, and standardized. My tags will be lowercase, my attributes will be complete name-value pairs, values will be quoted, and all tags will either be balanced with both start and end tags or else written in the short form. I will use no element that is not standardized by the XHTML Specification, and I will never, ever consider the use of the BLINK tag to be good form, no matter how appropriate it may seem at 4 A.M. after a long debugging session."

There, that wasn't so hard, was it? XHTML 1.0 is a simple XML codification of the HTML 4.01 standard, essentially assuring that an XHTML-compliant page can be parsed by an XML parser successfully. Why is this so important? Because if the output of your servlet/JSP can be consumed by an XML parser, it can also serve as the input to an XSLT process, which could be used if necessary to render the XHTML back into something less compliant for those users who have older browsers. If your output isn't XHTML (which is to say, if it isn't well-formed XML), this transformation can't be done using XSLT, and you'll be facing the rather scary prospect of having to write this adaptation layer by hand using an HTML parser. I can't think of a worse waste of time.
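As a rough illustration of that XSLT step, the sketch below feeds a well-formed page through the standard javax.xml.transform API. The stylesheet is a hypothetical one of my own: it copies everything through unchanged except that it rewrites strong elements as b. The class and method names are likewise assumptions. To stay short, the sample input omits the XHTML namespace and DOCTYPE; a real XHTML page declares the http://www.w3.org/1999/xhtml namespace, so the match patterns would need a namespace prefix.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlDowngrade {

    // Hypothetical stylesheet: identity transform plus one rule
    // that rewrites <strong> as <b>.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"strong\">"
      + "<b><xsl:apply-templates select=\"@*|node()\"/></b>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet; rejects input that is not well-formed XML. */
    public static String transform(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Tag soup lands here: the XML parser cannot read it.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hello, <strong>world</strong></p></body></html>";
        System.out.println(transform(page));
    }
}
```

Hand the same method something like an unclosed br tag and the parser refuses it outright, which is exactly the situation that would push you toward a hand-written HTML adaptation layer.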

HTML is wonderful in the sense that it offers a machine-independent way to describe a presentation layer in a relatively concise way, but remember that every page has to be downloaded across the wire to the user. By definition, this is a network round-trip, so make sure each reach back across the network to the server is justifiable and necessary. This means having to take a different approach to designing the user interface, away from traditional GUI approaches in places and favoring a more terminal-based approach instead. Make sure your HTML isn't trying to be a GUI application—that's usually when things start to fall apart for the well-meaning HTML application, a pretty clear sign that you should be thinking about other presentation alternatives (see Item 51).