Feb. 2, 2010, 5:32 a.m.
posted by vendetta
Mail Attachments
You may have noticed that the Body property allows you only to enter text, and a separate property is used to include file attachments. This arrangement is the result of how SMTP works and how mail servers must handle binary data.
SMTP was designed to transfer only 7-bit ASCII text messages between systems on the Internet. To compensate for this, mail clients must convert any binary data (such as file attachments) into ASCII text before passing the message to the mail server. Then, of course, the recipient's mail client must be able to convert the text back into the original binary data.
There are two popular techniques that can convert binary data into text: the uuencode format and the Multipurpose Internet Mail Extensions (MIME) format. Most mail systems allow you to use either technique, although there are still some that require a particular format.
uuencode
Many years before the Internet became popular, Unix administrators were sending binary data across modem lines by converting it to ASCII text and embedding it in mail messages. The program they used to convert binary data into ASCII text was called uuencode. The uu stands for Unix-to-Unix, as in the Unix-to-Unix Copy Protocol (UUCP) used to send messages between Unix hosts via modem.
The uuencode program uses a 3-to-4 encoding scheme, in which every 3 bytes of binary data are converted to 4 ASCII characters. This scheme increases the size of the data by about one third (plus a small amount of per-line overhead), but it ensures that the encoded file passes safely through text-only mail systems and can be decoded back into the original binary data.
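The 3-to-4 scheme is easy to see with Python's standard `binascii` module, which implements the uuencode line format. This is only an illustrative sketch: `uuencode` is a hypothetical helper name, and the file name and permission mode written on the `begin` line are placeholders.

```python
import binascii


def uuencode(data: bytes, name: str = "file.bin", mode: int = 0o644) -> str:
    """Wrap data in the classic begin/end uuencode layout (hypothetical helper)."""
    lines = [f"begin {mode:o} {name}"]
    # Each encoded line holds at most 45 input bytes (15 groups of 3 bytes),
    # which become 60 printable characters (15 groups of 4).
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        # b2a_uu prepends a length character, chr(32 + len(chunk)), and appends "\n";
        # backtick=True writes zero values as "`" instead of a space.
        encoded = binascii.b2a_uu(chunk, backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")  # a zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uuencode(b"Cat", "cat.txt"))
```

For the 3 bytes of `Cat`, the encoded data line is `#0V%T`: the `#` is the length character (32 + 3), and `0V%T` are the 4 characters that carry the 3 original bytes.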
Because of its popularity, many mail systems still support the uuencode method of encoding binary data. However, a newer Internet standard for encoding binary data has been developed: MIME.
MIME
The MIME format is defined in RFCs 2045 and 2046 (RFC 2045 covers the format of message bodies and the new header fields, and RFC 2046 covers the media types). MIME is more versatile than uuencode in that it includes additional information about the binary file within the converted file. The decoder can thus automatically detect and decode various types of binary files.
The MIME standard also provides a way for MIME-encoded binary data to be incorporated directly into a standard RFC 2822 message. Five new header fields (listed in the following table) were defined to identify the types of binary data embedded in the mail message. E-mail clients that can handle MIME messages must be able to process these five header fields. The fields are added immediately after the RFC 2822 header fields and before the message body. Any MIME attachments are added after the message body, as illustrated in Figure.
| Field | Description |
|---|---|
| MIME-Version | Specifies the version of MIME used in the encoding |
| Content-Transfer-Encoding | Specifies the encoding scheme used to encode the binary data into ASCII |
| Content-ID | Specifies a unique identifier for the message section |
| Content-Description | A short description identifying the message section |
| Content-Type | Specifies the type of content contained in the encoded data |
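To see these header fields in a real message, here is a minimal sketch using Python's standard `email` package (assuming Python 3.6 or later). The addresses, subject, file name, and attachment bytes are placeholders. The package adds `MIME-Version`, `Content-Type`, and `Content-Transfer-Encoding` itself; only `Content-Description` is set by hand.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"        # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Quarterly figures"
msg.set_content("The figures are attached.")  # becomes the text/plain body

# Placeholder binary data standing in for a real file
data = bytes(range(256))
msg.add_attachment(data, maintype="application", subtype="octet-stream",
                   filename="figures.bin")

# add_attachment turned the message into multipart/mixed; label the attachment
attachment = next(msg.iter_attachments())
attachment["Content-Description"] = "Raw quarterly figures"

print(msg.as_string())
```

In the printed output, `MIME-Version` sits with the ordinary RFC 2822 headers, the text body comes first, and the base64-encoded attachment follows it in its own section.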
The MIME-Version Field
The MIME-Version field identifies the MIME encoding version that the sender used to encode the message:
MIME-Version: 1.0
Some software packages also add a comment in parentheses after the version number to identify additional vendor version information:
MIME-Version: 1.0 (software test 2.3a)
The receiving software's MIME decoder treats the parenthesized text as a comment and ignores it.
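Python's `email` package, used with the modern `email.policy.default` policy, parses this field and exposes the version number separately from any trailing comment. A minimal sketch, reusing the example header above in a made-up message:

```python
from email import message_from_string
from email.policy import default

raw = (
    "MIME-Version: 1.0 (software test 2.3a)\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello\n"
)
msg = message_from_string(raw, policy=default)

header = msg["MIME-Version"]
print(header)          # the field value as it appears in the message
print(header.version)  # just the version number, with the comment dropped
```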
The Content-Transfer-Encoding Field
The Content-Transfer-Encoding field identifies how the binary data in the message is encoded. MIME defines seven methods, listed in the following table. Note that the first three methods (7bit, 8bit, and binary) apply no encoding to the data at all. The 7bit method indicates that the data is already 7-bit ASCII text rather than binary data; it is the default used if no Content-Transfer-Encoding field is present.
| Method | Description |
|---|---|
| 7bit | Standard 7-bit ASCII text |
| 8bit | Text in short lines that may contain 8-bit (non-ASCII) characters |
| binary | Raw binary data |
| quoted-printable | Encodes data as printable U.S.-ASCII characters, escaping other octets as =XX hexadecimal codes |
| base64 | Encodes each 6 bits of binary data as one printable character stored in an 8-bit byte |
| ietf-token | An extension encoding defined by a standards-track RFC and registered with IANA |
| x-token | A private, nonstandard encoding named X- or x- followed (with no intervening space) by any token |
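The two methods in the table that actually transform the data, quoted-printable and base64, behave very differently on mostly-ASCII text. This sketch uses Python's standard `quopri` and `base64` modules on a short UTF-8 string chosen for illustration:

```python
import base64
import quopri

text = "Caf\u00e9 au lait".encode("utf-8")  # the 'é' is the two octets C3 A9 in UTF-8

qp = quopri.encodestring(text)   # printable ASCII kept; other octets become =XX
b64 = base64.b64encode(text)     # every 3 octets become 4 characters

print(qp)   # b'Caf=C3=A9 au lait'
print(b64)

# Both encodings are lossless
assert quopri.decodestring(qp) == text
assert base64.b64decode(b64) == text
```

Quoted-printable leaves readable text readable and escapes only the two octets of the accented letter, while base64 turns the whole string into opaque characters. That is why quoted-printable suits mostly-text content and base64 suits binary attachments.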
Base64 is the most common method used for encoding binary data. It maps each 6-bit block of binary data to one printable ASCII character (stored in an 8-bit byte), so, like uuencode, it turns every 3 bytes of input into 4 bytes of output. Base64 does not, however, add uuencode's per-line length character or its begin and end lines, so it often produces a slightly smaller encoded file.
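A rough size comparison, assuming Python's `binascii` module for the uuencode line format and `base64.encodebytes` for MIME-style 76-character lines. `uu_lines` is a hypothetical helper that emits only the encoded data lines (no `begin`/`end` lines), and the sample data is arbitrary:

```python
import base64
import binascii


def uu_lines(data: bytes) -> bytes:
    """uuencode data lines only (hypothetical helper): 45 input bytes per line."""
    return b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))


data = bytes(i % 256 for i in range(4500))  # 4,500 bytes of sample data

uu = uu_lines(data)               # each line: length char + 60 chars + newline
b64 = base64.encodebytes(data)    # 76-character lines, newline after each

print(len(data), len(uu), len(b64))  # 4500 6200 6079
```

Both outputs are roughly 4/3 the size of the input; base64 comes out a little smaller only because it has no per-line length character and packs more characters onto each line. Real MIME messages use CRLF line endings, which this sketch ignores.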
The Content-ID Field
The Content-ID field identifies a MIME section with a unique identification code. One MIME section can refer to another MIME section by using this unique field value.
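A common use of Content-ID is an HTML body that displays an embedded image: the image section carries a Content-ID, and the HTML refers to it with a `cid:` URL. The sketch below assumes Python's `email` package; the subject, text, and image bytes are placeholders, and the unique value comes from `email.utils.make_msgid`. Note that `add_related` wraps the HTML part and the image together in a multipart/related section.

```python
from email.message import EmailMessage
from email.utils import make_msgid

msg = EmailMessage()
msg["Subject"] = "Logo test"
msg.set_content("Our logo is attached.")  # plain-text version

logo_cid = make_msgid(domain="example.com")  # a unique "<...@example.com>" value
# The HTML version refers to the image through a cid: URL (angle brackets removed)
msg.add_alternative(f'<p>Our logo:</p><img src="cid:{logo_cid[1:-1]}">',
                    subtype="html")

# Attach placeholder image bytes to the HTML part, tagged with the same Content-ID
html_part = msg.get_payload()[1]
html_part.add_related(b"GIF89a" + bytes(16), "image", "gif", cid=logo_cid)

for part in msg.walk():
    print(part.get_content_type(), part["Content-ID"])
```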
The Content-Description Field
The Content-Description field is an ASCII text description of the data to help identify it in the e-mail message. The text can be any ASCII text of any length.
The Content-Type Field
The Content-Type field is where all the action is. This field identifies the data enclosed in the MIME message. Two separate values, a type and a subtype, identify the data. Here’s the field format:
Content-Type: type/subtype
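As a quick illustration, Python's `email` package splits a Content-Type value into its type, subtype, and parameters. This is a minimal sketch with made-up messages; the `charset` parameter it reads is described under the text type below.

```python
from email import message_from_string
from email.policy import default

raw = "Content-Type: text/plain; charset=us-ascii\n\nHello\n"
msg = message_from_string(raw, policy=default)

print(msg.get_content_type())      # text/plain
print(msg.get_content_maintype())  # text
print(msg.get_content_subtype())   # plain
print(msg.get_param("charset"))    # us-ascii

# With no Content-Type field at all, MIME's default of text/plain applies
bare = message_from_string("Subject: hi\n\nHello\n", policy=default)
print(bare.get_content_type())     # text/plain
```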
The following are descriptions of the seven basic Content-Types defined in MIME:
text The text Content-Type identifies data that is in ASCII text format. Three common subtypes are:
plain For unformatted ASCII text
html For text formatted with HTML tags
enriched For text formatted with the simple enriched-text markup (tags such as <bold> and <italic>) defined in RFC 1896; this is not the same as Rich Text Format (RTF)
The text Content-Type also specifies the character set used to encode the data with the charset parameter:
Content-Type: text/plain; charset=us-ascii
This line identifies the MIME section as being plain text, using the U.S. ASCII encoding system.
message The message Content-Type identifies an RFC 2822-formatted message encapsulated within another message. It has three subtypes:
rfc822 Specifies a normal embedded RFC 822-formatted message
partial Specifies one section of a long message that was broken up into separate sections
external-body Specifies a pointer to an object that is not within the e-mail message
image The image Content-Type defines embedded binary data that represents a graphic image. Currently two subtypes are defined: the JPEG format and the GIF format.
video The video Content-Type defines embedded binary data that represents video data. The only subtype defined is the MPEG format.
audio The audio Content-Type defines embedded binary data that represents audio data. The only subtype for this is the basic format, which defines a single-channel Integrated Services Digital Network (ISDN) mu-law encoding at an 8 kHz sample rate.
application The application Content-Type identifies embedded binary data that represents application data, such as spreadsheets, word processing documents, and other applications. There are two formal subtypes defined: the postscript format for PostScript-formatted print documents, and the octet-stream format, which defines messages containing arbitrary binary data. The octet-stream subtype represents most application-specific data, such as Microsoft Word documents and Microsoft Excel spreadsheets.
multipart The multipart Content-Type is a special type. It identifies messages that contain multiple data content types combined into one message. This format is common in e-mail packages that can present a message in a variety of ways, such as plain ASCII text and HTML, or in messages that contain multiple attachments. There are four subtypes used:
mixed Specifies that the separate parts are independent of one another and should all be presented to the end user in the order in which they are received.
parallel Specifies that the separate parts are independent of one another but can be presented to the end user in any order.
alternative Specifies that the separate parts represent different ways of presenting the same information, ordered from the simplest to the richest. Only one part should be presented to the end user, normally the last one the client is able to display.
digest Works like the mixed subtype, except that each part is assumed to be an RFC 822-formatted message (message/rfc822) unless it specifies another Content-Type.
Note
There are many more Content-Types in addition to the basic seven shown here, and many e-mail packages even define their own types. As long as both the sending and the receiving mail software understand a Content-Type, it can be used. Be careful, though, when using nonstandard Content-Types: other mail client packages may not recognize them.
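The multipart/alternative case is common enough to sketch. Assuming Python's `email` package, `add_alternative` turns a plain-text message into multipart/alternative with the HTML version last, and `get_body` picks the part a client would display given its preference order. The message text is a placeholder.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Meeting notes"
msg.set_content("Meeting moved to 3 p.m.")                            # text/plain
msg.add_alternative("<p>Meeting moved to <b>3 p.m.</b></p>", subtype="html")

print(msg.get_content_type())                         # multipart/alternative
print([p.get_content_type() for p in msg.iter_parts()])  # ['text/plain', 'text/html']

# A client that prefers HTML displays the richer (last) alternative
body = msg.get_body(preferencelist=("html", "plain"))
print(body.get_content_type())                        # text/html
```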
