May 11, 2008, 9:24 a.m.
posted by oxy
Item 56: Validate early, validate everything
It's a well-understood fact that users won't always give you everything you need; in fact, it's pretty much guaranteed that your average users will forget to provide some crucial piece of information at exactly the worst time. On an e-commerce application, users will forget to fill in credit card numbers. On a portal, users will forget to provide usernames and/or authentication credentials (password or hard-token values). Or worse, they'll provide the data, but it'll be wrong: typos, misinterpretations about what the data should be, or just the classic "Whoops, I selected the wrong thing." It's a fact of programmers' lives that we have to validate what users tell us, but in an HTML-based application, we don't need to punish users when validation failures trip.
Consider, for a moment, the classic user-registration form typically required before reading white papers, downloading product trials, or accessing specialized content reserved for "premium" members of the company's target audience. When the form is submitted, what needs to happen?
For starters, we'll need to verify that there's actually some kind of data by which we can identify the user for future requests; certain fields within the form are required fields, which cannot be left blank without breaking a fundamental assumption of our system. For example, we're probably going to require a user's full name (first name and last name), as well as a username and password to use for subsequent visits to the site. If the user fails to provide that data, we need to ask him or her to provide it again.
Second, certain fields have some data-validation requirements associated with them; for example, assuming this form is for North American customers only, phone numbers must be in the form NNN-NNN-NNNN, and zip codes for U.S. customers must be in U.S. Postal 5+4 format: NNNNN-NNNN.
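The two formats just named can be checked with regular expressions. What follows is only a minimal sketch: the class name SyntacticChecks and its two methods are hypothetical helpers introduced here for illustration, and the patterns assume exactly the NNN-NNN-NNNN and NNNNN-NNNN shapes described above, with surrounding whitespace tolerated.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original item): syntactic checks
// for the phone and zip code formats described in the text.
final class SyntacticChecks {
    // North American phone number: NNN-NNN-NNNN
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
    // U.S. Postal 5+4 zip code: NNNNN-NNNN
    private static final Pattern ZIP_PLUS_4 = Pattern.compile("\\d{5}-\\d{4}");

    private SyntacticChecks() {}

    // True only if the whole (trimmed) value matches NNN-NNN-NNNN
    static boolean isValidPhone(String value) {
        return value != null && PHONE.matcher(value.trim()).matches();
    }

    // True only if the whole (trimmed) value matches NNNNN-NNNN
    static boolean isValidZipPlus4(String value) {
        return value != null && ZIP_PLUS_4.matcher(value.trim()).matches();
    }
}
```

A caller would add an error message to its list whenever SyntacticChecks.isValidPhone(...) or SyntacticChecks.isValidZipPlus4(...) returns false, rather than rejecting the request on the spot.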
This kind of syntactic data validation ensures that the data provided by the user fits a certain expression pattern and disallows anything other than digits and the separating hyphens for the two fields above (zip codes and phone numbers).
We could also require that the zip codes provided must match the city cited in the address, and we could even go so far as to verify that the street address exists within the city's boundaries (although this is certainly further than most sites go). Of course we will want to verify that the credit card number submitted matches the name submitted as part of an e-commerce order, and we'll probably want to take advantage of some of the credit card companies' latest attempts to prevent credit card fraud by asking for the new "validation numbers" that appear printed on the back of the card itself. Or we'll want to verify that the product number the user selected actually exists within the catalog, or that the options the user has selected for this particular product make sense for that product (according to some online PC manufacturer sites, some PCs can't have both a DVD and a motion-capture card, for example). This is semantic data validation, and it is just as important as syntactic data validation.
Here's the biggest question: When should we validate all this stuff? Answer: As early as possible and as comprehensively as possible. The simple fact is that the earlier we can catch some of these simple data-entry errors (the syntactic data validation), the earlier we can get the user to correct them. You've seen (and probably sworn to avoid in the future, just as I have) sites that don't actually validate the data until the form has been submitted back to the server. You click the Submit button, wait the necessary 5–10 seconds for the server to receive the data, process the request, and send a response, and are greeted with text, usually in red boldface type, telling you that you forgot to fill in some particular field of the form.
If the site is particularly bad about it, the original form is now gone, and hitting the Back button sends you to an empty form, forcing you to type everything all over again. Worse, the original form doesn't tell you which fields are required, leaving you to either (a) fill out every single field on the form, making up answers if necessary, or (b) continue to play "Data Entry Roulette" until you happen across the mystical combination of fields that lets you past the form-validating troglodyte at the gate. Even worse are those applications that validate field by field on the server, returning as soon as the first data validation error is hit, forcing the user to fix that one error and go through another trip back to the server only to find out that the very next field was also bogus. These are all acceptable reasons for murder, in the minds of most users.
The main problem is that it's tedious and awkward to do validation correctly, not to mention that it flies in the face of programmers' basic instincts. For example, in a library API, where a method expects to receive several String arguments that are neither null nor empty, it's perfectly acceptable to write code this way:
public class NiftyLibrary
{
    public static String transmogrifyStrings(String[] source)
    {
        // Verify that none of the Strings are null
        for (int i = 0; i < source.length; i++)
            if (source[i] == null)
                throw new IllegalArgumentException("No nulls!");
        // Verify that none of the Strings are empty
        for (int i = 0; i < source.length; i++)
            if (source[i].equals(""))
                throw new IllegalArgumentException("No empties!");
        // Do the rest of the work (elided here; the throw below is only
        // a placeholder so that the method compiles)
        throw new UnsupportedOperationException("rest of the work elided");
    }
}
Here, we're supporting the required precondition that none of the parameters can be null or empty by throwing an exception as soon as we've detected the error condition. So why shouldn't we support the same kind of error-handling/failed-validation logic in our user interface? This sort of thinking fails miserably for user interfaces because at the UI level, particularly for HTML applications, each user action requires a network round-trip to carry out the request, whereas in a library function, this is entirely internal to the JVM.[3] Therefore, to follow the advice of Item 17, we need to maximize the benefit from each network round-trip in order to keep the number of round-trips to an absolute minimum.
Assuming we're processing form data in a controller servlet (or filter) or some other MVC-like processing agent, as prescribed by Item 53, then all the form validation is happening in a standard Java class; we'll assume a servlet for simplicity's sake. While it would be tempting to write the servlet's validation code like the library code just shown, we need to validate extensively before returning an error condition to the user:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class RegistrationServlet extends HttpServlet
{
    public void doPost(HttpServletRequest request,
                       HttpServletResponse response)
        throws ServletException, IOException
    {
        // Capture all validation errors in one place
        //
        List<String> validationErrors = new ArrayList<String>();
        // Verify that required fields are non-null and non-empty
        //
        if ( (request.getParameter("first_name") == null) ||
             (request.getParameter("first_name").equals("")) )
            validationErrors.add("First Name must not be empty");
        if ( (request.getParameter("last_name") == null) ||
             (request.getParameter("last_name").equals("")) )
            validationErrors.add("Last Name must not be empty");
        // . . . and so on . . .
        // If there were errors, send the user back to the original
        // form to reenter the data
        //
        ServletContext ctx = getServletContext();
        if (validationErrors.size() > 0)
        {
            request.setAttribute("validationErrorsList",
                                 validationErrors);
            RequestDispatcher rd =
                ctx.getRequestDispatcher("/registration.jsp");
            rd.forward(request, response);
        }
        // Otherwise, pass them on to the goodies behind
        // the registration page
        //
        else
        {
            RequestDispatcher rd =
                ctx.getRequestDispatcher("/premium/index.jsp");
            rd.forward(request, response);
        }
    }
}
Notice that we send the user back to the original JSP page containing the form; in that page, we make sure to put the validation errors at the top of the form and make sure to copy any data submitted as part of the request back into the fields the user filled out. (Frequently, the browsers cache that data themselves, but why take chances?) Writing this by hand in the JSP page is not overly difficult but somewhat tedious, particularly if the JSP page designer is not a programmer. Fortunately, this is why JSP now supports tag libraries:
<%@ page language="Java" %>
<%@ taglib uri="http://www.host.com/HTMLSupportLib"
           prefix="html" %>
<html> <!-- the usual head, title elements -->
  <body>
    <html:validationErrors>
      <h2>There were validation errors; please correct the following and submit the page again:</h2>
      <ul>
        <html:validationErrorList>
          <li><html:validationErrorText /></li>
        </html:validationErrorList>
      </ul>
    </html:validationErrors>
    <form method="POST" action="/servlet/RegistrationServlet">
      First Name: <html:textInput name="first_name" /> <br />
      Last Name: <html:textInput name="last_name" /> <br />
      <!-- and so on -->
      <input name="submit" type="submit" />
    </form>
  </body>
</html>
The validationErrors, validationErrorList, and validationErrorText tags are all custom tag handlers that rely on the preceding servlet putting a list of validation errors into the HttpServletRequest under the name validationErrorsList, just as before, but they simplify the page logic to make the JSP page author's life easier. In addition, the textInput tag handler examines the incoming HttpServletRequest, and if it finds an input parameter that matches its name attribute, prepopulates the value of the HTML field with that data. Suddenly, it's not so tedious anymore.
But we still suffer from the same problem as before: in order to do any validation of the form fields, we have to go back to the server to do it. As Item 17 explains, we need to keep our time on the wire to a minimum if we want to keep latency low and scalability high. Fortunately, most HTML browsers offer a client-side alternative to doing validation on the server: scripting languages like JavaScript (Netscape's flavor of the standard ECMAScript language) for Netscape browsers and either VBScript or JScript (Microsoft's flavor of ECMAScript) for Internet Explorer. While most semantic data validation is impossible (or in some cases simply difficult) from within the client-side scripting environment, most syntactic data validation is well within the reach of those languages.
Again, however, putting that code onto every form-containing page is tedious, and again, this is where the tag libraries can offer help. For example, the textInput tag can add some script support to the <input> element it emits to ensure that on submission of the form (or on leaving the field, or on clicking a Validate button, or any of a dozen other possible events), the first_name and last_name fields aren't empty; if they are, the script pops up a dialog box, stops the submission, and lets the user fix the problem before we hit the network wire.
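To make the textInput idea concrete, here is a rough sketch of the markup such a tag might emit. It is written as a plain Java class so that it stands alone; a real tag handler would extend the JSP tag classes and read the value from the HttpServletRequest. The class TextInputRenderer, its methods, and the page-level script function requireValue are all hypothetical names assumed for this illustration, not part of any existing tag library.

```java
// Hypothetical sketch of what a textInput tag could render: an <input>
// element prepopulated with the previously submitted value and wired to
// a client-side check.
final class TextInputRenderer {
    private TextInputRenderer() {}

    // Escapes the characters that are unsafe inside a double-quoted
    // HTML attribute value.
    static String escapeAttribute(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // submittedValue is null when the form is displayed for the first time.
    // requireValue(...) is an assumed JavaScript function defined elsewhere
    // on the page; it is expected to warn the user about an empty field.
    static String render(String name, String submittedValue) {
        String value = (submittedValue == null) ? "" : submittedValue;
        return "<input type=\"text\""
             + " name=\"" + escapeAttribute(name) + "\""
             + " value=\"" + escapeAttribute(value) + "\""
             + " onblur=\"requireValue(this)\" />";
    }
}
```

Escaping the echoed value matters here: the data comes straight from the user, so writing it back into the page unescaped would let a stray quote or angle bracket break the form.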
We'll have to code the tag libraries somewhat intelligently to take browser versions into account, but fortunately we'll only have to write this code once. More importantly, as explained in Item 51, we cannot assume that the scripting support will always be available, so we'll still need to do the server-side validation. The client-side script makes users' lives easier; the server-side validation makes programmers' lives easier. Use both liberally wherever possible, particularly if they can be codified within JSP tag libraries. (The Jakarta Taglibs project, at http://jakarta.apache.org/taglibs, has several tag handlers already written that provide this kind of support, and the Struts library, at http://jakarta.apache.org/struts, also provides similar sorts of tag handlers that are more deeply tied into the Struts architecture.)
By the way, don't rely on coloring the text red to signify validation errors; remember that many people in the world are red/green color-blind and won't be able to recognize the fact that the text is red. Without some kind of supportive text, they'll be clueless as to why their form submission failed. And while we're talking about error messages, make sure your error messages are actually comprehensible to the nondeveloper crowd: get somebody other than a developer to review the messages and decide whether they make sense.
Ultimately, remember that client-side validation serves one purpose: to make data entry as gentle as possible for the user of your system. You're not going to give up on server-side validation (not after reading Item 61, anyway), but you want users to be able to catch their mistakes as early as possible so as to avoid the round-trip (see Item 17) between client machine and servlet container on every data-entry error. As awkward as it is to do, build validation systems that validate all the user's input at once, and offer a list of all the items that need correction.
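The "validate everything at once" advice can be sketched independently of the servlet API. The following is one possible shape, not a definitive implementation: RegistrationValidator and requireNonEmpty are hypothetical names, the form is modeled as a simple Map of parameter names to values, and the username and password parameter names are assumptions (the text requires those fields but never names their parameters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: checks every required field and reports all
// problems together instead of stopping at the first one.
final class RegistrationValidator {
    private RegistrationValidator() {}

    // Returns every problem found; an empty list means the form passed.
    static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();
        requireNonEmpty(form, "first_name", "First Name", errors);
        requireNonEmpty(form, "last_name", "Last Name", errors);
        // Parameter names below are assumed for illustration
        requireNonEmpty(form, "username", "Username", errors);
        requireNonEmpty(form, "password", "Password", errors);
        return errors;
    }

    // Records an error (rather than throwing) so later checks still run.
    private static void requireNonEmpty(Map<String, String> form,
                                        String field,
                                        String label,
                                        List<String> errors) {
        String value = form.get(field);
        if (value == null || value.trim().isEmpty()) {
            errors.add(label + " must not be empty");
        }
    }
}
```

The key design choice is that each check appends to a shared list and returns normally; the caller decides what to do only after every check has run, which is exactly the opposite of the fail-fast style that suits library code.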
Remember, at the end of the day, the system is for their use, not yours, so you'd best make sure it's easy to use. User-friendly validation is a key component of that goal.