April 11, 2009, 10:27 a.m.
posted by oxy
Item 49: Know your JDBC providerDespite all the revisions and years behind it, the JDBC Specification still doesn't mandate a large number of particulars—deliberately. For example, when a ResultSet is created, typically (although again, this is purely by convention, not a requirement) the ResultSet holds only the first N rows, where N is some number that sounded good to your JDBC vendor. For some of the major vendors, this N value is—I'm not kidding you—one. Fortunately, this value is not only discoverable but also configurable via getFetchSize and setFetchSize on the ResultSet API, but the point still remains: Do you know what the default is, and is it acceptable? Be sure to check the Boolean return value from setFetchSize to ensure you aren't asking for a fetch size that's larger than what the driver and/or the database supports. Many features of JDBC aren't available, depending on the capabilities of the driver, and your decision regarding how to build the JDBC code using the driver could be deeply affected based on this information. For example, when retrieving data from the ResultSet, there appear to be two entirely equivalent ways to obtain the data: either by ordinal position within the ResultSet or by column name. Is there any real difference between them, besides perhaps programmer convenience? It turns out that there's a very real difference between them, depending entirely on your JDBC driver implementation. The first release of the JDBC Specification (the version that shipped with JDK 1.1) only mandated that JDBC drivers provide firehose cursor support—that is, once a row or column has been pulled from the cursor, it can never be obtained again. This has huge implications when retrieving data out of the ResultSet, in that if you pull a column's data in anything other than the order in which it was declared in the SQL statement, you'll be skipping past columns that can then never be retrieved:
ResultSet rs =
stmt.executeQuery("SELECT id, first_name, last_name " +
"FROM person");
while (rs.next())
{
String firstName = rs.getString("first_name");
String lastName = rs.getString("last_name");
int id = rs.getInt("id");
// Error! This will throw a SQLException in a JDBC 1.0
// driver
}
Say you've been working with a JDBC 2.0 or better driver up until this point. If somebody deploys your code to use the JDBC-ODBC driver (which is a bad idea from the beginning, since the JDBC-ODBC driver is an unsupported, bug-ridden 1.0 driver that is incredibly slow and is rumored to leak memory in some ODBC driver configurations), suddenly your code will start tossing SQLException instances for no apparent reason. Now, if you are truly concerned about writing portable J2EE code, you need to account for the possibility of a 1.0 driver by ensuring that this particular scenario never occurs; therefore, you need to always retrieve data in its exact order, and the easiest way to do this is to use the ordinal form of the methods:
ResultSet rs =
Stmt.executeQuery("SELECT id, first_name, last_name " +
"FROM person");
while (rs.next())
{
int age = rs.getInt(1);
String firstName = rs.getString(2);
String lastName = rs.getString(3);
}
Although not overly painful to write, this code does have several drawbacks. First, if a programmer (or, potentially worse, a database administrator unfamiliar with this little quirk of the JDBC 1.0 API) ever modifies the SQL statement, the corresponding data-retrieval code in the loop will need to be updated to reflect the change in order of columns, or SQLException instances will start to get thrown left and right. (By the way, it's true that using the ordinal form of the functions is slightly faster than using the column-based form, but changing your code to use the ordinal form purely for performance reasons is a micro-optimization and should be avoided, particularly since you lose the inherent documentation hint that using the column names give. If you find yourself tempted to do so, a large number of other optimizations will likely yield a better return, so bypass this one.) It's not just a simple matter of keeping track of the order of columns (which, by the way, argues against ever issuing a SQL SELECT statement that uses * instead of the individual column names). You need to know whether your driver supports statement batching (see Item 48) and isolation levels (see Item 35), which scalar functions it might support, and so on. Much of this information is available via the DatabaseMetaData class, an instance of which is obtained from a Connection, or to a lesser degree from the ResultSetMetaData class, obtained from a given ResultSet. SQL Performance Tuning [Gulutzan/Pelzer] provides several charts describing the results returned from eight vendor databases (IBM, Informix, Ingres, InterBase, Microsoft SQL Server, MySQL, Oracle, and Sybase), but you'll want to run some tests regardless—you can't be certain the chart will be true for your particular database, even if it's one of the vendors listed, because capabilities can change from one release to the next. Another area of interest to the JDBC programmer is that of thread safety: Is it safe to invoke the driver from multiple threads? Do you need to synchronize on Connection, Statement, or ResultSet objects when accessing them from multiple threads? For example, the JDBC-ODBC driver isn't thread safe, so it's up to you to ensure that the driver is never accessed by more than one thread at a time; otherwise you run the risk of some Seriously Bad Things happening in the ODBC driver underneath your Java code. (Which once again underscores that there's never a good reason to use the JDBC-ODBC driver for production code.) One of the classic performance- and scalability-driven suggestions made in numerous JDBC and J2EE books is the PreparedStatement-over-Statement idea: that you should always prefer to use a PreparedStatement instead of a regular Statement. The argument is simple: because each call out to the database requires a certain amount of work (parsing the SQL, creating a query plan, and running it all through the optimizer, and so forth), it's better to amortize that overhead by keeping those preparations around from one call to the next, assuming the same call will be made again. A PreparedStatement will "prepare" the SQL call once (minus the parameters you want to pass in, since those aren't known yet, so it can't prepare those) and thus you'll get better performance on subsequent calls. While using PreparedStatement is a necessity from a security standpoint (see Item 61), from a performance and/or scalability standpoint, it's not quite so clear. For example, the JDBC Specification states that when a Connection is closed, the corresponding Statement objects associated with that Connection also implicitly close. So when you return a Connection obtained from a connection pool back to the connection pool, does that in turn close the PreparedStatement obtained from that Connection, or will the underlying physical connection remain open, thus leaving the PreparedStatement alive and ready to receive additional requests? What about a CallableStatement, since it inherits from PreparedStatement and therefore should obey the same contract as given for PreparedStatement? Unfortunately, while I could wave my hands here and state that "obviously, REAL drivers would keep the pooled Statement around, only a simpleton or moron wouldn't do this," the fact remains that sometimes you have to use software written by simpletons and morons, and the only way to know for certain whether your database and/or database driver is in that category is to test and find out. This discussion of PreparedStatement takes a sharp left turn, by the way, when discussing some database vendors (most notably PostgreSQL) that cause a PreparedStatement to become unprepared when a transaction end (COMMIT) occurs; that is, the following code [Gulutzan/Pelzer, 336] won't work as expected, despite the JDBC Specification's insistence that it should:
PreparedStatement pstmt =
connection.prepareStatement(". . .");
boolean autocommit =
connection.getAutoCommit(); // Let's assume it's true
pstmt.executeUpdate();
// COMMIT happens automatically, since autocommit == true
pstmt.executeUpdate();
// FAILS! PreparedStatement pstmt is no longer prepared
If that doesn't convince you that you need to know what your JDBC driver does under the hood, then I wish you the best of luck. The easiest way, in many respects, to know what your driver does is to fire up your database's query-execution profiler tool and have a look at what actually happens when the JDBC query executes. In some circles, this is known as exploration testing and is more extensively documented by Stu Halloway (http://www.relevancellc.com); put simply, write a series of JUnit-based tests that exercise your assumptions regarding what happens when you execute a sequence of calls (such as the PreparedStatement code shown earlier), and rerun those tests against a variety of scenarios, such as different JDBC drivers, different vendor databases, or even different versions of the same vendor's database. Far better to be surprised during an exploration test run than to be surprised during deployment into production, at the client site, or in the middle of the night after the database administrators upgrade the database without your knowledge (or consent). |
- Comment