Item 61: Always validate user input




The Open Web Application Security Project (http://www.owasp.org), as mentioned in Item 60, is an open-source collaborative effort designed to help organizations (particularly, as of this writing, developers within those organizations) recognize and understand the details of securing Web applications and Web Services. As part of this effort, OWASP has produced a Top Ten Web Application Security Vulnerabilities document, in the same style as the SANS/FBI Top Twenty list.

The number one vulnerability on the OWASP list is Unvalidated Parameters. In fact, three of the ten directly deal with user input in some way: Unvalidated Parameters (#1), Buffer Overflows (#5), and Command Injection Flaws (#6). If we extend the idea of handling user input a bit, we can also include Cross-Site Scripting Flaws (#4) and Error Handling Problems (#7). Clearly, how we deal with user input is critical to building a secure system. And when you stop to think about it for a moment, it's pretty clear why this is the case: anywhere the system accepts user input is an open door for an attacker to try to come through.

When building an HTML-based application, validating user input is an absolutely crucial step in securing the application. Unfortunately, too many developers trust the technology to take care of this for them and leave themselves exposed to sneaky, underhanded tactics by attackers. For example, numerous Web applications, in keeping with Item 56, make heavy use of client-side script to perform most, if not all, validation in the browser. It's fast and it prevents a round-trip; we can manipulate the UI elements directly if we need to change the user interface, rather than having to go back to the server and regenerate a new screen.

As a matter of fact, the client-side validation story seems like such a good idea that it's tempting to ask: why bother validating the input again on the server? After all, the browser will execute the script as the user and/or would-be attacker moves around on the page, and since we (presumably) made sure the user's browser had scripting support turned on from the entry/login page, why duplicate effort to do all that validation again on the server? It's just going to burn CPU cycles in a redundant effort, and wasting CPU cycles hurts not only performance but scalability as well.

The problem, once you stop to think about it, is that attackers don't play by the rules—they don't limit themselves to attacking your application through the browser. Many attackers have a strong enough knowledge of HTTP that they can mock up the form submission themselves using nothing more sophisticated than Telnet, and since most Telnet clients haven't yet been extended to support JavaScript, all your wonderful client-side validation logic goes right out the window.

The brutal truth is that as a developer, you have to assume that the client-side validation logic failed to execute somehow, and on every user input submission, go through a rigorous set of validation checks to ensure the user input has been thoroughly scrubbed for all sorts of nasty kinds of input-based attacks.

Parameter validation errors come in a variety of sizes and shapes. For example, as part of testing your Web application, every time users are asked for input, and even in cases where they're not but incoming data is expected to be there—HTTP headers, cookie values, and so on—try passing in a variety of "bad" data and see how well your Web application reacts. What happens when you pass in:

  • Null?

  • Characters from a different character set? Unicode characters? Unprintable ASCII characters?

  • Zero-length parameters? 1K parameters? 10K parameters?

  • Numbers where characters are expected, and vice versa?

  • Duplicate data (the same parameter twice in the same form, for example)?

You might consider this to be more in the realm of "QA weirdness" because, of course, no user would ever pass a name where a phone number is expected, but remember, we're not talking about users anymore. We're talking about attackers, and they don't play by the rules.
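A server-side check for the kinds of bad data listed above might look something like the following sketch. The class name ParameterChecks, the 20-character limit, and the allowed character set are all assumptions made for illustration; the map has the same shape that ServletRequest.getParameterMap() hands back, so duplicate parameters show up as arrays with more than one element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: server-side checks for one expected request parameter. */
final class ParameterChecks {
    // Assumed policy: 1 to 20 ASCII letters, digits, '.', '_' or '-'.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9._-]{1,20}");

    private ParameterChecks() {}

    /**
     * Returns the problems found for the named parameter; an empty list
     * means the value passed every check.
     */
    static List<String> check(Map<String, String[]> params, String name) {
        List<String> problems = new ArrayList<>();
        String[] values = params.get(name);
        if (values == null || values.length == 0) {
            problems.add(name + ": missing");
            return problems;
        }
        if (values.length > 1) {
            // The same parameter was submitted twice in one request.
            problems.add(name + ": supplied more than once");
        }
        String value = values[0];
        if (value == null || value.isEmpty()) {
            problems.add(name + ": empty");
        } else if (!ALLOWED.matcher(value).matches()) {
            // Covers oversized values, unprintable characters, quotes,
            // and anything outside the assumed character set.
            problems.add(name + ": too long or contains disallowed characters");
        }
        return problems;
    }
}
```

A servlet would call ParameterChecks.check(request.getParameterMap(), "username") and refuse to go any further if the returned list is not empty. Note that this is a whitelist: it names what is allowed rather than trying to enumerate everything an attacker might send.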

As an example, let's start with the canonical user login page. In many, if not most, systems, user authentication details are stored in an RDBMS table, typically in a rather simple two-column table consisting of the user's login name (uid, a VARCHAR 20 characters long) and the user's authentication credentials, in this case a password (password, another VARCHAR 20 characters long). Any other user-specific details stored in this table, such as configuration settings, personal data, authorization settings, and so on, are for the moment irrelevant.

In the usual fashion, we'll build a login JSP page, the heart of which will look something like this:






<form action="/LoginProcessor" method="POST">

Your Username: <input type="text"
                      name="username"><br />
Your Password: <input type="password"
                      name="password"><br />
<input type="submit" value="Log in">

</form>


So far, so good. The LoginProcessor URL is mapped to a servlet, the key parts of which need to issue a SQL statement against the database to see if this uid/password pair exists within the table. Nothing could be simpler, right?






Connection conn = getConnectionFromSomewhere();
Statement stmt = conn.createStatement();
String SQL = "SELECT * FROM users " +
  "WHERE uid = '" + request.getParameter("username") +
  "' AND password = '" +
  request.getParameter("password") + "'";
ResultSet rs = stmt.executeQuery(SQL);
if (rs.next())
{
  // User/password pair was there; go ahead
  // and forward to the next JSP in the page flow
}
else
{
  // Whoops! User failed to authenticate safely;
  // keep track of the number of times this user
  // fails to authenticate (see Item 60). After
  // a few more tries, lock out the account in case
  // it's an attacker trying a brute-force attack.
}


We're done, right? We check that one off the task list and move on to the next story/task/whatever.

Unfortunately, no, we're not done. As it stands, this code is vulnerable in a big way to a command injection attack (OWASP Vulnerability #6). A command injection attack occurs when user input can carry executable code that will be executed by some layer of the system when processed; in this way, it's conceptually very similar to a buffer overrun attack, which seeks to smash the stack instead.

To see this sort of attack in action, let's switch hats for a moment and take on the perspective of the attacker. One of the first tricks to try is to blindly launch an injection attack and see if it's successful in some way ("success" here is any indication that an attack might work somehow). For example, the attacker may submit a username of 'SELECT' (that is, a single quote, the word SELECT, and another single quote) and see what he gets back. In the earlier code, the leading single quote closes the string literal that our SQL statement opened, so the SQL parser treats the uid value as an empty string. The parser then runs into the bare SELECT text, which is of course a SQL keyword in the wrong place, fails to parse the statement, and the driver throws a SQLException.

So...exactly what did we do with the SQLException again?

There are two different scenarios. In the first, ideal, scenario, we catch the SQLException, regardless of the error, and display a nicely formatted page that tells the user that something went wrong and asks them to try again. The problem here, unfortunately, is that displaying this message on a failed login is going to confuse the user, so we'll also have a different page that tells them their login failed and asks them to try again. Thanks to our differentiation between ordinary bad user input and a failed SQL call, however, we've given the attacker an important hint. In fact, this differentiation is music to the attacker's ears: he now knows that his single-quoted input somehow broke the SQL statement, which means that a SQL command injection attack just might work. So he continues the attack.

In the second scenario, which is the unfortunate default, we just catch the SQLException and rethrow it as either a ServletException or an IOException (since that's what the servlet doPost method is declared to throw), figuring that if a SQLException ever occurs, it must be a programmer error and we, at least, want to see the stack trace. Which is absolutely awful, because now when the attacker conducts his command injection experiment, he'll see a stack trace that probably reads something like, "Malformed SQL statement: unexpected SELECT" or something similar in the message portion of the exception. To an attacker, this roughly translates into, "Green light! Keep going! You're almost there!"
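One way out of both scenarios is to give the client the same answer no matter why the login failed, and to keep the details in the server log. The following is only a sketch of that idea: LoginResponder, CredentialCheck, and the message strings are hypothetical names, and the CredentialCheck interface stands in for whatever code actually queries the database.

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: one outward-facing message for every kind of failure. */
final class LoginResponder {
    private static final Logger LOG = Logger.getLogger(LoginResponder.class.getName());
    static final String FAILURE_MESSAGE = "Login failed. Please try again.";
    static final String SUCCESS_MESSAGE = "Welcome.";

    /** Stand-in for the real database lookup. */
    interface CredentialCheck {
        boolean matches(String username, String password) throws SQLException;
    }

    private LoginResponder() {}

    static String respond(CredentialCheck check, String username, String password) {
        try {
            return check.matches(username, password) ? SUCCESS_MESSAGE : FAILURE_MESSAGE;
        } catch (SQLException ex) {
            // The exception text and stack trace go to the server log only;
            // the client sees exactly what it would see for a bad password.
            LOG.log(Level.WARNING, "Login query failed", ex);
            return FAILURE_MESSAGE;
        }
    }
}
```

With this arrangement the attacker's 'SELECT' probe and a mistyped password are indistinguishable from the outside, while we still get the stack trace we wanted for debugging.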

So our wily attacker chooses his next command injection attack with a bit more care. If our attacker wants to gain access as a particular user, he could submit the login form as:






username = boss'; SELECT password FROM users WHERE password = 'foo
password = foo


Plugging this into the SQL statement that's going to be dynamically constructed inside the servlet, we start to get a Really Bad Feeling:






SELECT * FROM users
  WHERE uid = 'boss'; SELECT password FROM users WHERE password = 'foo' AND password = 'foo'


Reformatted to be more human-readable, we get:






SELECT * FROM users WHERE uid = 'boss';
SELECT password FROM users WHERE password = 'foo' AND password = 'foo'


Now, because the semicolon acts as a statement separator in many SQL dialects, where we thought we were executing one SQL statement, thanks to our attacker's rather unorthodox input, we're now executing two of them. How the JDBC driver will react depends, of course, on the actual JDBC driver in use, but most of them will simply support the two ResultSet instances as "extra results," meaning to get to the second ResultSet we need to call getMoreResults. In our servlet code, however, we're just checking to see whether there are any results in the first ResultSet, and since the query succeeded (assuming, of course, our attacker got the username correct, which usually isn't a hard thing to do), our attacker is now logged into the system as boss, which is probably a Bad Thing.

Assume our attacker isn't interested in being so subtle, however:






username = boss'; DELETE FROM users WHERE password != '' --
password = <empty>


Again, making this into readable SQL statements (the trailing double-dash starts a SQL comment that swallows the rest of the original statement), we get a really nasty turn of events:



SELECT * FROM users WHERE uid = 'boss';
DELETE FROM users WHERE password != '' -- ' AND password = ''


Suddenly, our system administrators' vigilance in not allowing empty passwords works against us as all the users in the system have their login credentials gleefully erased by the ever-helpful SQL executor in the RDBMS. Suddenly nobody can log in, and you get a phone call at 3 A.M. from the CTO and/or VP demanding to know why your system is suddenly not letting anybody in. This is not the kind of attention you want from upper management.

It gets even more interesting; assuming your system has some kind of role-based authorization scheme, and that (as is typical) your "root" account is the first one installed in the users table, an attacker can do this:






username = boss' OR 1 = 1 --
password = <irrelevant>


Which, again, turning this into human-friendly SQL, yields:






SELECT * FROM users WHERE uid = 'boss' OR 1 = 1 -- ' AND password = '<irrelevant>'


Note that the double-dash is an end-of-line comment character in most SQL flavors, thus rendering the check for the password entirely irrelevant. Since 1=1 is an always-true predicate, the entire users table will be returned from this query, and since the ultra-empowered administrator account will be the first row returned, that's what our intrepid attacker will be logged in as at this point. Oopsie.
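If you'd like to convince yourself that nothing more exotic than string concatenation is at work, the following sketch rebuilds the query text exactly the way the vulnerable servlet does, minus the database. InjectionDemo and buildLoginSql are names made up for this illustration.

```java
/** Demonstration only: reproduces the vulnerable string concatenation. */
final class InjectionDemo {
    private InjectionDemo() {}

    /** Builds the login query text the same way the vulnerable servlet does. */
    static String buildLoginSql(String username, String password) {
        return "SELECT * FROM users "
             + "WHERE uid = '" + username
             + "' AND password = '" + password + "'";
    }
}
```

Calling InjectionDemo.buildLoginSql("boss' OR 1 = 1 --", "x") returns the text SELECT * FROM users WHERE uid = 'boss' OR 1 = 1 --' AND password = 'x'. Everything after the double-dash, including our carefully written password check, is now a comment.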

It's not just the login page that potentially has this vulnerability, either, folks. It's everywhere that servlet code directly takes input from the user and creates a SQL statement out of it. Actually, this vulnerability exists anywhere servlet code takes input from the incoming HTTP request and creates a SQL statement out of it, regardless of whether the user actually was supposed to type something into a form or not. The classic culprit here is to put data into a hidden form field and use that as part of the created SQL statement:






String SQL = "INSERT INTO order_totals VALUES (" +
  // . . . Other data, like order ID and items,
  // go here . . .
  (totalPrice *
    Float.parseFloat(request.getParameter("discount"))) +
  ")";


where "discount" is a hidden field set by an earlier page via JavaScript. So our wily attacker does a View Source on the HTML page (or just hits the URL directly with Telnet), sees the hidden "discount" field, and suddenly he's buying all sorts of stuff with a 99% discount. (If he did it with a 100% discount, the total would be 0, and that might trip other alarms further down the line. This way he's getting almost the same benefit with lower risk.)

We could go through more examples like this, but they're all variations on the same theme. What's important at this point is to figure out what to do about it.

In the specific case of the login and the order-processing servlets, the first step is to stop constructing the SQL statements directly from user input. Our first reaction might be to write validation code that would somehow trap and exclude the SQL code in the input fields (in the case of the login servlet), but this won't help the order-processing servlet, particularly if there's an actual business case that allows certain people to purchase items with a 99% or 100% discount. (CEOs do this all the time, for example.)
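For the discount in particular, one possible approach is to stop reading the value from the request at all and derive it on the server from something we already trust, such as the authenticated user's role. The sketch below assumes exactly that; DiscountPolicy, the role names, and the multipliers are invented for illustration, and the in-memory map stands in for a database lookup.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the price multiplier comes from server-side data only. */
final class DiscountPolicy {
    // Stand-in for a database table: role name -> price multiplier.
    private final Map<String, BigDecimal> multiplierByRole = new HashMap<>();

    DiscountPolicy() {
        multiplierByRole.put("customer", new BigDecimal("1.00"));
        multiplierByRole.put("employee", new BigDecimal("0.80"));
        multiplierByRole.put("ceo", new BigDecimal("0.01")); // the 99% case
    }

    /** Unknown roles pay full price. */
    BigDecimal multiplierFor(String role) {
        BigDecimal multiplier = multiplierByRole.get(role);
        return multiplier != null ? multiplier : BigDecimal.ONE;
    }

    /** Total owed; nothing taken from the request enters the calculation. */
    BigDecimal totalFor(String role, BigDecimal listPrice) {
        return listPrice.multiply(multiplierFor(role));
    }
}
```

The role would come from the server-side session established at login, not from a form field. The CEO still gets the 99% discount, but an attacker editing a hidden field has nothing to edit.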

Fortunately, another, simpler answer for SQL statements exists: never use a Statement to do the SQL based on user input. Doing so forces you to worry about taking care of character escaping, which is why the attacker's single-quote trick works in the first place. Instead, if you make use of a PreparedStatement, not only will you get a possible performance boost (see Item 49), but the PreparedStatement mechanism has to take care of properly escaping the input passed in as input parameters to the PreparedStatement. So now our login servlet reads like this:






Connection conn = getConnectionFromSomewhere();
String SQL = "SELECT * FROM users " +
             "WHERE uid = ? and password = ?";
PreparedStatement prep = conn.prepareStatement(SQL);
prep.setString(1, request.getParameter("username"));
prep.setString(2, request.getParameter("password"));
ResultSet rs = prep.executeQuery();
// Rest as before


Now, assuming the driver isn't written to allow an injection attack (a distinct but rapidly shrinking possibility; make sure you've tested your driver against it, among others; see Item 49), a SQL command injection attempt will fail: the attacker's text is bound as a plain string value rather than parsed as SQL, so the query simply matches no rows and the login is rejected.

Not only SQL commands have this vulnerability, however—any time data is being passed to an external system for execution, such as creating a command string to be executed by an external shell, you're back to worrying about validating user input. Fortunately, JDK 1.4 introduces support for regular expressions on strings, which makes this kind of validation simpler. It turns out to be much easier to use a regular expression to verify that a given string doesn't contain the kind of escape characters that make an injection attack possible.
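As a rough illustration of the regular-expression approach, the sketch below checks a value that is about to become an argument in a shell command. ShellArgumentCheck and its policy (a plain file name of at most 64 characters) are assumptions; your own rules will differ, but the shape stays the same: describe what is allowed and reject everything else.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: whitelist check for a value bound for a shell command. */
final class ShellArgumentCheck {
    // Assumed policy: a simple file name, 1 to 64 characters, starting with a
    // letter or digit; no path separators, spaces, quotes, or shell
    // metacharacters such as ; | & $ ` can appear.
    private static final Pattern SAFE_NAME =
        Pattern.compile("[A-Za-z0-9][A-Za-z0-9._-]{0,63}");

    private ShellArgumentCheck() {}

    static boolean isSafeFileName(String candidate) {
        return candidate != null && SAFE_NAME.matcher(candidate).matches();
    }
}
```

Requiring the first character to be a letter or digit is deliberate: it keeps a value like -rf from being read as a command-line option by the program on the other end.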

By the way, if your site allows users to post comments on the page using HTML, you're vulnerable to a cross-site scripting attack (OWASP Vulnerability #4), meaning an attacker can put malicious HTML input onto your site that engages in an attack against the user's browser when the user views your site. This is not the way to build good customer relations. (Writing Secure Code gives an example of using a regexp to validate against a cross-site scripting attack [Howard/LeBlanc, 430].)
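If your users don't actually need to post markup, a different and simpler technique than regexp validation is to escape the HTML-significant characters when writing their text back out, so the browser displays it instead of interpreting it. The following is a minimal sketch of that idea, not the approach from Writing Secure Code; HtmlEscaper is a hypothetical name, and the escaping shown is only suitable for element content and quoted attribute values.

```java
/** Hypothetical sketch: escape user text before writing it into an HTML page. */
final class HtmlEscaper {
    private HtmlEscaper() {}

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Passing the comment text <script>alert('x')</script> through HtmlEscaper.escape yields &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;, which the browser renders as harmless text.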

In some ways, it would be nice if Java supported the idea of a "tainted" string (i.e., one that comes in from outside the JVM), as Perl does, but neither Sun nor the Java Community Process is showing any signs of adding this to the language, so we'll have to live without it for now. If your developers are a disciplined lot, you could create a TaintedString class like the one shown here that does all sorts of validation on the data, and use that to wrap any end-user input:






public class TaintedString
{
  private final String endUserData;
  private boolean validated = false;

  public TaintedString(String input)
  {
    endUserData = input;
  }

  public void validateForXSS()
  {
    validate(new XSSTaintValidator());
  }
  public void validateForSQL()
  {
    validate(new SQLTaintValidator());
  }
  public void validate(TaintValidator v)
  {
    try
    {
      v.validate(endUserData);
      validated = true;
    }
    catch (SecurityException secEx)
    {
      validated = false;
    }
  }

  public String getString()
  {
    if (validated)
      return endUserData;
    else
      return null; // Or throw an Exception,
                   // whichever you prefer
  }

  public interface TaintValidator
  {
    public void validate(String data);
  }

  public static class XSSTaintValidator
    implements TaintValidator
  {
    public void validate(String data)
    {
      // Run through validation scenarios to
      // make sure data doesn't contain an XSS
      // attack; if it does,
      // throw a SecurityException
    }
  }

  public static class SQLTaintValidator
    implements TaintValidator
  {
    public void validate(String data)
    {
      // Run through validation scenarios to make
      // sure data doesn't contain an SQL injection
      // attack; if it does,
      // throw a SecurityException
    }
  }
}


The idea here is that once the String is handed into the TaintedString, it must be validated either by using one of the provided validating methods against a particular kind of attack (cross-site scripting and SQL injection are two examples) or by passing in a customized Strategy object implementing the TaintValidator interface to validate against some other malicious input. Assuming the validation succeeds (i.e., the validate method of the TaintValidator doesn't throw a SecurityException), the validated flag is set to true, and we can obtain the input via getString. If the validation failed or hasn't been called yet, getString returns null (or, if you prefer, can throw an Exception).
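A customized Strategy object might look something like the sketch below, which accepts only one assumed phone-number format. PhoneNumberTaintValidator is a hypothetical name, and the TaintValidator interface is repeated at the top level purely so the sketch compiles on its own; in real code you would implement the nested TaintedString.TaintValidator interface instead.

```java
import java.util.regex.Pattern;

// Same single-method contract as the TaintValidator interface nested inside
// TaintedString, repeated here so that this sketch compiles by itself.
interface TaintValidator {
    void validate(String data);
}

/** Hypothetical Strategy: accepts only numbers shaped like 555-867-5309. */
final class PhoneNumberTaintValidator implements TaintValidator {
    // Assumed format: three digits, three digits, four digits, dash-separated.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    public void validate(String data) {
        if (data == null || !PHONE.matcher(data).matches()) {
            // TaintedString.validate catches this and leaves the
            // validated flag set to false.
            throw new SecurityException("input is not a phone number");
        }
    }
}
```

Usage would then be a matter of wrapping the raw parameter in a TaintedString, calling validate(new PhoneNumberTaintValidator()) on it, and only afterward asking for the value via getString.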

The hard part about using this, however, is that developers must have the discipline to take any user input and pass it into the TaintedString before using it further. Since neither the Java language nor the JVM itself supports TaintedString, there's no way to enforce this beyond code reviews and source-code scanners. But if the developers can force themselves into using TaintedString for input, they can save themselves from that 3 A.M. phone call, and that's pretty good motivation in my book.

Oh, and for those of you who have taken Item 51 to heart and are using a rich client for your front end and some kind of alternative communication layer back to the server for data exchange, don't think you're out of the woods here. Web Services are fast becoming a favorite way to allow rich-client applications to talk to back-end middleware systems, and to the wily attacker, there is no difference between a Web application serving HTML and a Web Service serving XML. In fact, you must be more careful with the Web Service since scrubbing incoming Web Service data for security intrusion attempts is still below the radar of most Web Service developers. Input is still input, whether it's in HTML format, XML format, or RMI/JRMP object-serialization format. If it's coming in from outside this process, validate it every way you can think of, or it's a fair bet you'll be sorry you didn't someday. (Ring! "Uh, hi, Mr. CEO, sir.... Meeting? With the legal team? Five minutes?" Gulp. "Absolutely, sir.")

Regardless of what your front end is, remember that any time you take input directly from the user and feed it into a database query (or any other sort of interaction with a back-end system; I'm firmly convinced that SQL injection attacks are only the most popular of the resource-script-injection attack genre, not the only kind possible), you're taking it upon yourself to be more clever than the attacker, and that's a pretty arrogant assumption. Instead, rely on the mechanisms within the various layers of software you work with, such as the escaping capabilities of the JDBC driver, to protect yourself from malicious user input. And, as a result of reading Item 60, you of course assume that all user input is trying to be malicious in some way.