April 11, 2010, 9:36 p.m.
posted by nogood
Item 11. The Compiler Puts Stuff in ClassesC programmers are used to knowing everything there is to know about the internal structure and layout of their structs, and they have a habit of writing code that depends on a particular layout. Java programmers are used to programming in ignorance of the structural layout of their objects and sometimes assume their days of ignorance are over when they start programming in C++. In fact, safe and portable coding practices in C++ do require a certain level of healthful agnosticism about the structure and layout of class objects. With a class, it's not always "what you see is what you get." For example, most C++ programmers know that if a class declares one or more virtual functions, then the compiler will insert in each object of that class a pointer to a virtual function table. (Actually, we are guaranteed no such thing by the standard, but that's the way all existing C++ compilers implement virtual functions.) However, C++ programmers on that dangerous verge between competence and expertise often assume that the location of the virtual function table pointer is the same from platform to platform and write code based on that assumptiona deadly error. Some compilers will place the pointer at the start of the object, some will place it at the end of the object, and, if multiple inheritance is involved, several virtual function table pointers may be scattered throughout an object. Never ass|u|me. That's not the end of it, either. If you use virtual inheritance, your objects may keep track of the locations of their virtual base subobjects with embedded pointers, embedded offsets, or nonembedded information. You might end up with a virtual function table pointer even if your class has no virtual functions! Did I mention that the compiler is also allowed to rearrange your data members in limited ways, no matter the order in which they were declared? Is there no end to this madness? There is. If it's important to have a class type that is guaranteed to be like a C struct, you can define a POD (standardese for "plain old data"). Certainly, built-in types such as int, double, and the like are PODs, but a C struct or union-like declaration is also a POD.
struct S { // a POD-struct
int a;
double b;
};
Such PODs can be as safely manipulated as the corresponding C constructs (how safe that is is as questionable in C++ as it is in C). However, if you plan to get low level with your POD, it's important that it remain a POD under maintenance, or all bets are off:
struct S { // no longer a POD-struct!
int a;
double b;
private:
std::string c; // some maintenance
};
If you're not willing to deal only with PODs, what does all this meddling by the compiler imply about how you should manipulate class objects? It implies that you should manipulate class objects at a high level rather than as collections of bits. A higher-level operation will do "the same thing" from platform to platform, but it may accomplish it differently. For example, if you want to copy a class object, never use a block copy such as the standard memcpy function or the hand-coded equivalent, because that's for copying storage, not objects (Placement New [35, 119] discusses this difference). Rather, you should use the object's own initialization or assignment operations. An object's constructor is where the compiler sets up the hidden mechanism that implements the object's virtual functions, and the like. Simply blasting a bunch of bits into uninitialized storage may not do the right thing. Similarly, a copy of one object to another must take care not to overwrite these internal class mechanisms. For example, assignment never changes the value of an object's virtual function table pointers; they're set by the constructor and never changed for the life of the object. A bit blast may well destroy that delicate internal structure. (See also Copy Operations [13, 45].) Another common problem is the assumption that a particular member of a class is resident at a given offset within an object. In particular, it's not uncommon for overly clever code to assume either that the virtual function table pointer is at offset zero (that is, it's the first thing in the class) or that the first declared data member is at offset zero. Neither of these assumptions is correct more than about half the time, and both are (of course) never correct simultaneously.
struct T { // not a POD
int a_; // offset of a_ unknown
virtual void f(); // offset of vptr unknown
};
I'm not going to continue in this vein, because such a list of no-no's would be long, tedious, and possibly tasteless. But the next time you find yourself making low-level assumptions about the internal structure of your classes, pause, reflect, and get your mind out of the gutter! |
- Comment