May 4, 2009, 8:15 p.m.
posted by oxy
Item 71: Understand Java Object Serialization

Java Object Serialization is a wonderful thing. It allows a Java programmer to take an object and reduce it to a stream of bytes just by implementing the java.io.Serializable interface and passing the object to the writeObject method of ObjectOutputStream. Rebuilding the object from the byte stream is similarly simple: just call readObject on an ObjectInputStream wrapped around the byte stream. As an example of how wonderful Serialization really is, consider the following:
import java.io.*;
import java.util.*;

class Person implements Serializable
{
    public String name;
    public Person spouse;
    public ArrayList<Person> children = new ArrayList<Person>();
}

public class Serial
{
    public Serial()
    { }

    public static void main(String[] args)
        throws Exception
    {
        // Run with "write" to create people.ser; any other invocation reads it back.
        if (args.length > 0 && args[0].equals("write"))
        {
            Person youssef = new Person();
            youssef.name = "Youssef";
            Person sheryl = new Person();
            sheryl.name = "Sheryl";
            youssef.spouse = sheryl;
            sheryl.spouse = youssef;

            Person child1 = new Person();
            child1.name = "Johnny";
            Person child2 = new Person();
            child2.name = "Mike";
            youssef.children.add(child1);
            youssef.children.add(child2);
            sheryl.children = youssef.children;

            FileOutputStream fos = new FileOutputStream("people.ser");
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            oos.writeObject(youssef);
            // Close the object stream, not just the file, so buffered data is flushed.
            oos.close();
        }
        else
        {
            FileInputStream fis = new FileInputStream("people.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            Person youssef = (Person)ois.readObject();
            ois.close();

            System.out.println(youssef.name);
            System.out.println(youssef.spouse.name);
            System.out.println(youssef.children.get(0).name);
            System.out.println(youssef.children.get(1).name);
            System.out.println(youssef.spouse.children.get(0).name);
            System.out.println(youssef.spouse.children.get(1).name);
        }
    }
}
The thing to take careful note of here is that the single call to writeObject follows the entire object graph: when youssef is serialized, so are sheryl and both children, Johnny and Mike. Serialization also ensures that each object is deserialized only once, even though the same object may be referenced multiple times throughout the graph. Thus, even though youssef refers to sheryl, who in turn refers back to youssef, each object appears only once, just as it does in the actual object graph. You get all this without having to write a line of code to support it.

Still, when compared to the automatic Serialization capabilities of entity beans or JDO, you may be wondering why, as a J2EE programmer, you need to worry about Serialization at all. After all, you're not storing data in files, you're storing data in a relational database. So why bother with Serialization? Because, like it or not, you're using it a great deal. Serialization is a cornerstone on which other parts of J2EE are built. For example, RMI-over-JRMP uses Serialization as its marshaling framework for making remote calls (see Chapter 3). Servlet containers often use Serialization to store session state to disk when the containers shut down, so they can restore those active sessions when restarting, thus preserving the illusion that the container was never down. Many EJB containers use Serialization to passivate beans to disk between remote (or local) calls. JMS uses Serialization to support the ObjectMessage.

As a result, developers need to be aware of the ins and outs of the Serialization Specification in order to avoid being "surprised" by some subtle behavior of serialized object data. For example, because Serialization is not concerned with confidentiality, one side effect of using it as RMI's marshaling framework is that all parameters passed through RMI travel in virtually clear-text form. If confidentiality is a concern for an EJB-based system, this is a problem.
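Two of the claims above are easy to check in isolation: that shared and circular references survive a round trip, and that string data sits in the stream as readable bytes. The following is a minimal sketch rather than part of the original example; the `GraphDemo` and `Node` names are invented for illustration, and it serializes to an in-memory byte array instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class GraphDemo {
    // Hypothetical stand-in for Person: a name plus a reference to another node.
    static class Node implements Serializable {
        String name;
        Node partner;
        Node(String name) { this.name = name; }
    }

    // Serializes an object graph into a byte array.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an object graph from a byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // True if the raw bytes contain the given ASCII text verbatim.
    static boolean containsText(byte[] data, String text) {
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("Youssef");
        Node b = new Node("Sheryl");
        a.partner = b;
        b.partner = a; // circular reference

        byte[] data = serialize(a);
        Node copy = (Node) deserialize(data);

        // The cycle is rebuilt with one object per original, not endless copies.
        System.out.println(copy.partner.partner == copy);
        // The name is sitting in the stream in the clear.
        System.out.println(containsText(data, "Sheryl"));
    }
}
```

Both lines print `true`, which is the point of the "virtually clear-text" warning: anyone who can read the bytes can read the strings.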
Unfortunately, the EJB Specification offers no outlines or rules for providing confidentiality between the RMI stubs and the EJB container in a portable fashion; there's no way to customize the channel (socket) used by the RMI plumbing to carry the communication between client and server. By taking control over how objects are serialized, however, individual parameter types can be massaged to encrypt (or at least obfuscate) their sensitive data when passed across during an RMI call.

Please note that although many EJB containers may use Serialization for passivation, the EJB Specification does not mandate it, and therefore some containers may use a nonstandardized mechanism instead. Ideally, such a container would document the mechanism it uses and the reasoning behind it and would provide similar kinds of hook points (see Item 6) for modification.

Serialization has a number of hook points that you can (and should, in some cases) take advantage of.

The serialVersionUID field

The first element of Serialization that every Java programmer should understand is the serialVersionUID field. When an object is serialized, Java Object Serialization calculates a hash of the entire class based on an exhaustive variety of metadata about the class: the fields in the class, their access scope, their types, the methods of the class, their parameters, the base class, any implemented interfaces, and so forth. This yields an almost-unique value that will be compared at deserialization to ensure that only the same kind of object is deserialized as serialized. Java classes can precalculate this hash and store the precalculated value in a static final long field called serialVersionUID (conventionally declared private). During Serialization, if such a field exists, this value will be used rather than calculating the hash. (It's presumed that the value of this field was precalculated using the JDK serialver utility, rather than selected at random.)
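As a minimal sketch of what such a declaration looks like: the field the runtime looks for is spelled `serialVersionUID` and must be `static final long`. The `SuidDemo`, `Account`, and `Unversioned` names and the value `1L` are placeholders invented for this illustration; a real class would more likely carry the number printed by `serialver`. The standard `java.io.ObjectStreamClass` API reports which value the runtime will actually use for a class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidDemo {
    // Hypothetical class with an explicitly declared version identifier.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L; // placeholder value
        String owner;
    }

    // Same shape, but no declared field: the runtime computes a hash instead.
    static class Unversioned implements Serializable {
        String owner;
    }

    // Returns the identifier Serialization will write for the given class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Account.class));     // the declared value
        System.out.println(suidOf(Unversioned.class)); // a computed hash
    }
}
```

The computed value for `Unversioned` changes when the class's shape changes, which is why an evolving class benefits from pinning the value explicitly.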
While optional for basic Serialization, declaring this value becomes crucial once a serialized class evolves, because it is what allows "old" objects to be deserialized into "new" types. Even for classes that don't require evolutionary support, however, you can obtain a small performance enhancement by precalculating this hash anyway, thereby saving the runtime CPU cycles needed to generate it.

Customization (writeObject and readObject)

The simplest way to provide Serialization support and/or customization is to write private writeObject and readObject methods on the class, taking an ObjectOutputStream and an ObjectInputStream argument, respectively. These methods, if present, will be invoked when serialization (writeObject) or deserialization (readObject) takes place. Within the body of these methods, developers can either take complete control over the serialization of the contents of the class or call the stream's defaultWriteObject or defaultReadObject method to perform default serialization, then tweak from there. So, returning to the earlier Person example, let's add a totalWorth field to track the individual's total net worth. We'd like to keep that value a secret, so we can obfuscate it before serialization and recalculate it on deserialization:
import java.io.*;
import java.util.*;

class Person implements Serializable
{
    public String name;
    public Person spouse;
    public ArrayList<Person> children = new ArrayList<Person>();
    public double totalWorth;

    private void writeObject(ObjectOutputStream oos)
        throws IOException
    {
        // Obfuscate only for the duration of the write, then restore the
        // in-memory value so the live object is not left altered.
        double original = totalWorth;
        totalWorth = obfuscateValue(original);
        try
        {
            oos.defaultWriteObject();
        }
        finally
        {
            totalWorth = original;
        }
    }

    private void readObject(ObjectInputStream ois)
        throws IOException, ClassNotFoundException
    {
        ois.defaultReadObject();
        totalWorth = deobfuscateValue(totalWorth);
    }

    private static double obfuscateValue(double originalValue)
    { return originalValue * 2 - 1; }   // Imagine your algorithm here

    // Must be the exact inverse of obfuscateValue.
    private static double deobfuscateValue(double hiddenValue)
    { return (hiddenValue + 1) / 2; }   // Imagine your algorithm here
}
Obviously in production code we would want a stronger algorithm than the one shown, but it serves our purposes here. Now, when serialized, a Person instance's totalWorth will be twisted to hide its original value and then untwisted to its original value when deserialized.

Replacement (writeReplace and readResolve)

At times, simply massaging the data serialized for a given class isn't enough. Class evolution or security reasons, for example, sometimes mandate a more drastic measure: nominating a different class type for serialization and/or deserialization. This is handled by providing writeReplace and readResolve methods on the class being serialized.

For example, consider the CreditCard class used as part of an online system for e-commerce. Naturally, we want to ensure that the credit card number is sent in encrypted form from the user's Web browser to our receiving servlet, but if we're claiming that the credit card number is truly secure, we also need to ensure it is encrypted across the wire between the servlet and the EJB container (never trust the network, even inside the firewall; see Item 60). Toward that end, we can write the CreditCard class to nominate an encrypted replacement, EncryptedCreditCard, to be sent across the wire, and EncryptedCreditCard can nominate an original CreditCard instance when deserialized on the receiving side:
import java.util.Date;

class CreditCard implements java.io.Serializable
{
    public CreditCard(Date expiration, String number)
    {
        System.out.println("CreditCard.<init>");
        this.expiration = expiration; this.number = number;
    }

    public Date expiration;
    public String number;

    // Invoked by Serialization: the returned object is written to the
    // stream in place of this one.
    private Object writeReplace()
        throws java.io.ObjectStreamException
    {
        System.out.println("CreditCard.writeReplace()");
        return new EncryptedCreditCard(expiration, number);
    }

    public String toString()
    {
        return "CreditCard: " + number + " (" + expiration + ")";
    }
}

class EncryptedCreditCard implements java.io.Serializable
{
    private Date expiration;
    private String encryptedNumber;

    public EncryptedCreditCard(Date exp, String number)
    {
        expiration = exp;
        encryptedNumber = encryptCreditCardNumber(number);
    }

    // Invoked by Serialization after this object is read: the returned
    // object is handed to the caller in place of this one.
    public Object readResolve()
        throws java.io.ObjectStreamException
    {
        return new CreditCard(expiration,
            decryptCreditCardNumber(encryptedNumber));
    }

    private String encryptCreditCardNumber(String num) { ... }
    private String decryptCreditCardNumber(String num) { ... }
}
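The elided encryption methods keep the listing above from running as written. As a rough, runnable sketch of the same replacement mechanism, the version below substitutes a Base64 encoding for real encryption. That substitution is purely a placeholder to make the round trip visible; Base64 offers no confidentiality. The names `ReplaceDemo`, `Card`, and `SealedCard` are invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReplaceDemo {
    // Hypothetical "open" type that the rest of the program works with.
    static class Card implements Serializable {
        final String number;
        Card(String number) { this.number = number; }

        // Serialization writes the returned object instead of this one.
        private Object writeReplace() throws ObjectStreamException {
            return new SealedCard(number);
        }
    }

    // Hypothetical wire-only type; never seen by application code.
    static class SealedCard implements Serializable {
        private final String encoded;

        SealedCard(String number) {
            // Placeholder only: Base64 is an encoding, NOT encryption.
            encoded = Base64.getEncoder()
                    .encodeToString(number.getBytes(StandardCharsets.UTF_8));
        }

        // Deserialization hands back the returned object instead of this one.
        private Object readResolve() throws ObjectStreamException {
            byte[] raw = Base64.getDecoder().decode(encoded);
            return new Card(new String(raw, StandardCharsets.UTF_8));
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // True if the raw bytes contain the given ASCII text verbatim.
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Card("4111111111111111"));
        Object back = deserialize(data);

        System.out.println(back instanceof Card);                   // true
        System.out.println(((Card) back).number);                   // 4111111111111111
        System.out.println(containsText(data, "4111111111111111")); // false
    }
}
```

The caller serializes a `Card` and gets a `Card` back; only the bytes in between are of the replacement type.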
The nice thing about this approach is that when working with CreditCard, programmers can ignore the needs of security, at least when transmitting CreditCard instances across the open network wire. In fact, you can design and implement the entire system up front to work entirely with "open" CreditCard instances, then later add the writeReplace/readResolve logic when confidentiality of these objects during serialization becomes an issue. (How to encrypt the data to prevent easy observation is another matter entirely, one beyond the scope of this discussion; see Item 65 for details, or take a look at Java Security [Oaks] for more complete discussions on this subject.)

Further details

You can find more information on Java Object Serialization (including discussion of the Serializable fields API and other Serialization functionality) in the standard Java 2 SDK documentation bundle or in the books Component Development for the Java Platform [Halloway] or Server-Based Java Programming [Neward]. You should also have a look at Effective Java [Bloch]; in particular, look at Item 54, in which the author points out that marking a class Serializable means you are silently introducing a new constructor on the class, one that takes a byte array as its sole argument. If your class needs to maintain invariants as part of its behavior, make sure the invariants are also checked when an object is constructed through deserialization.

Understanding Serialization has benefits that go beyond just understanding how to use it to work with J2EE; frequently, developers look for ways to store objects or other sorts of data, and Serialization is tailor-made for that sort of thing. Objects can be serialized and stored in BLOB columns in tables, objects can be serialized and sent as the content body of an HTTP request or response, and so on.
In fact, one convenient way to store user preferences (if you don't want to or can't make use of the Preferences API) is to put the preferences into a HashMap and serialize the HashMap and its contents. The point is, without understanding how Serialization itself works, you'll end up making decisions (like running all of your RMI traffic over SSL just to protect a few fields) that will hurt you in the long run.
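The HashMap-based preferences idea mentioned above might be sketched as follows. This is an illustration under assumptions: the class name `PrefsStore`, the temporary file, and the keys are all invented here, and the map holds only `String` values so that everything in it is serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

public class PrefsStore {
    // Writes the whole map, keys and values included, in one call.
    static void save(HashMap<String, String> prefs, Path file) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(file))) {
            oos.writeObject(prefs);
        }
    }

    // Reads the map back; the cast is unchecked because the stream is untyped.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, String>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("prefs", ".ser"); // hypothetical location
        HashMap<String, String> prefs = new HashMap<>();
        prefs.put("theme", "dark");
        prefs.put("fontSize", "12");

        save(prefs, file);
        HashMap<String, String> restored = load(file);

        // The restored map is a distinct object with equal contents.
        System.out.println(restored.equals(prefs));
        Files.delete(file);
    }
}
```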