Monday, July 09, 2012

Understanding Email Headers, Part II - The Basics

(I'm assuming that you've read the first installment in this series - if not, go do so - and have examined the raw text of at least one email message.)

Now that you've seen that mess of semi-human-readable spew with which a typical email message is opened, let's go into a bit of detail.  We're going to start with the bare minimum - the headers required in every standards-compliant Internet email message:

From: Wes Morgan <wesmorgan1@nowayjose.com>
Date: Mon, 9 Jul 2012 19:12:28 -0400

Yep, that's it.  If you look at RFC 5322 (specifically, Section 3.6, the table on Page 20), you'll see that only these two headers MUST appear in the typical Internet message.  (As you read RFCs, you'll gain a new appreciation for the differences among MUST, SHOULD, SHOULD NOT, MUST NOT, et al.  There's a reason those terms appear in capital letters.)  So, it's quite possible that you may receive a message with naught but the From and Date headers; relax, it's "legal." 
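
If you'd like to check a message for the two required headers yourself, here's a minimal sketch using the email package from Python's standard library. The sample message and the helper name missing_required are assumptions made up for illustration; they aren't part of any standard.

```python
from email import message_from_string

# Hypothetical raw message containing only the two required headers.
raw = (
    "From: Wes Morgan <wesmorgan1@nowayjose.com>\n"
    "Date: Mon, 9 Jul 2012 19:12:28 -0400\n"
    "\n"
    "Body text goes here.\n"
)

REQUIRED = ("From", "Date")


def missing_required(raw_text):
    """Return the required header names that are absent from raw_text."""
    msg = message_from_string(raw_text)
    # Header lookup is case-insensitive, so "from:" and "FROM:" both count.
    return [name for name in REQUIRED if msg.get(name) is None]


print(missing_required(raw))  # prints []
```

An empty list means the message carries both required headers, however bare it looks otherwise.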

Obviously, the From header includes the address of the sender of the message (and, optionally, their name); there isn't much more to say about that without getting into a lengthy discussion of acceptable address formats. That's definitely beyond the scope of this article; suffice it to say that I once had an email address of <ukecc!flamtap!wes%ukma@UKCC.uky.edu>...

The Date header has changed in recent years, most notably where time zone information is concerned.  Older standards allowed for abbreviations, such as EST for Eastern Standard Time; today, the standard calls for a 4-digit numeric offset from Coordinated Universal Time (UTC), with a + or - prefix as required.  So, the example above specifies an offset of -4 hours from UTC; that's Eastern Daylight Time in the US.  It should also be noted that the day of the week and seconds are optional; between that and the fact that most mail agents graciously accept the old-style headers as well, you may see some variety in Date: headers.  (IMPORTANT NOTE: This is NOT the date/time the message was delivered, but rather the date/time when the sender put the message in its final form - in other words, when they hit "Send.")
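
To see the offset arithmetic in action, here's a small sketch that parses the Date value from the example with Python's standard email.utils module. The helper name offset_hours is an assumption for illustration only.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def offset_hours(date_value):
    """Return the UTC offset of a Date header value, in hours."""
    stamp = parsedate_to_datetime(date_value)
    return stamp.utcoffset().total_seconds() / 3600


value = "Mon, 9 Jul 2012 19:12:28 -0400"
print(offset_hours(value))  # prints -4.0

# The same instant expressed in UTC: 19:12:28 local plus four hours.
utc_stamp = parsedate_to_datetime(value).astimezone(timezone.utc)
print(utc_stamp.isoformat())  # prints 2012-07-09T23:12:28+00:00
```

The same parser is also forgiving about the variations mentioned above, such as a missing day of the week, missing seconds, or an old-style zone abbreviation like EDT.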

Now that we've covered the two required headers, let's talk about those which, if they appear, should only show up once in a given email message.  We'll start with the obvious:

To: Wes Morgan <wesmorgan1@nowayjose.com>
CC: <wesmorgan@hewentthataway.org>, <mybuddyfred@foobarbaz.com>
Subject: RE: testing funky name text in headers
Message-ID: <SNT124-W841AE084F66FD2255A6D287D20@phx.gbl>

These are all fairly self-explanatory, with the possible exception of Message-ID.  All "well-behaved" mail agents insert a Message-ID header, which is supposed to look like "messageidentifier@sitename"; however, you'll notice that the example above uses "phx.gbl", which isn't a meaningful sitename at all.  That's because this is from a Hotmail message; for some internal reason, Microsoft uses "phx.gbl" in its Message-IDs.  Moral of the story?  Once again, sometimes strange-looking stuff can be OK.
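
For the curious, here's a rough sketch of pulling a Message-ID apart into its two halves. The function name split_message_id is hypothetical, and splitting on the last "@" is a simplification on my part rather than a full implementation of the RFC grammar.

```python
from email.utils import make_msgid


def split_message_id(value):
    """Split '<left@right>' into (left, right); raise ValueError if malformed."""
    inner = value.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError("Message-ID should be wrapped in angle brackets")
    left, sep, right = inner[1:-1].rpartition("@")
    if not sep or not left or not right:
        raise ValueError("Message-ID should look like <identifier@sitename>")
    return left, right


print(split_message_id("<SNT124-W841AE084F66FD2255A6D287D20@phx.gbl>"))
# prints ('SNT124-W841AE084F66FD2255A6D287D20', 'phx.gbl')

# The standard library can also generate a fresh, unique Message-ID.
# "example.org" is a placeholder domain, not a real mail host.
fresh = make_msgid(domain="example.org")
print(split_message_id(fresh)[1])  # prints example.org
```

Note that the code happily accepts "phx.gbl" as the right-hand side; nothing here checks that it is a resolvable hostname.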

The sharp-eyed among you are probably thinking, "Wait a second - where's the Bcc header?"  Well, the answer is simple.  Bcc stands for "blind carbon copy", so while that header IS passed between mail transfer agents as needed, it is removed (if present) before the message is placed in your mailbox.  (If you're really interested, run a network analyzer such as Wireshark against an unencrypted SMTP service; you'll probably catch a few Bcc headers in the data flow.)

So, what's left?  Well, if you send (or receive) a reply to an earlier email message, a few more headers make an appearance in the reply:

In-Reply-To: <SNT124-W2664003139C1E520CF4F6787D30@phx.gbl>
References: <SNT124-W2664003139C1E520CF4F6787D30@phx.gbl>

Did you notice?  The values of the In-Reply-To and References headers are taken from the Message-ID header of the original!  Ah, but what happens if I "reply to the reply"?   Well, my message gets its own unique Message-ID, of course...and the Message-ID of the message to which I'm replying goes in my In-Reply-To header (which usually has only one Message-ID)...but that In-Reply-To Message-ID is also APPENDED to the References header.  So, in a lengthy back-and-forth, you might see headers that look like this:

In-Reply-To: <4BE8776D.4080504@kheb.fr>
References: <AANLkTik0c9hCMm2Efyj7rB7Us7hL3ZdESYEhE2GBQCfM@mail.gmail.com>
    <20100509165117.GD20976@ovh.net>
    <AANLkTins_dUSqRbR371SNnOIPYlatKdTCIVM8oDbtVtX@mail.gmail.com>
    <AANLkTimKu7l1AtEG-0CI7Q3Ely9PUL2yuyvYuhcMIuSn@mail.gmail.com>
    <AANLkTikJaHXYM_DF8zqdaH0vVnJ-fCpqvJ3OCQweoeAb@mail.gmail.com>
    <AANLkTinq7riyV4w3VnCoHyj9GsNf3H7jDzu1awU2PNRb@mail.gmail.com>
    <AANLkTilrPTCZPj_Tb7bXO5SLym_QY3KUp4J1jkZd5-ZE@mail.gmail.com>
    <4BE72FF3.3030501@kheb.fr> <4BE7B451.8060700@linuxant.fr>
    <4BE8776D.4080504@kheb.fr>
From: XXX <xxx@xxx.xx>
Date: Mon, 10 May 2010 23:20:59 +0200
Message-ID: <AANLkTikC5oN2rO5VTj8HN7U03b2H3HUqt89KYdemGlcJ@mail.gmail.com>

(Notice that this Message-ID isn't in the References header - because no one has yet referenced this message with a reply!)
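
Here's a minimal sketch of how a mail client might fill in these headers when you hit "Reply." The function name build_reply, the addresses, and the example.org domain are all assumptions for illustration; real clients do considerably more (quoting, encoding, and so on).

```python
from email.message import Message
from email.utils import formatdate, make_msgid


def build_reply(original, sender, body, domain="example.org"):
    """Build a reply whose threading headers point back at `original`."""
    reply = Message()
    reply["From"] = sender
    reply["To"] = original["From"]
    reply["Date"] = formatdate(localtime=True)
    subject = original.get("Subject", "")
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject
    # The reply gets its own brand-new Message-ID...
    reply["Message-ID"] = make_msgid(domain=domain)
    # ...while the parent's Message-ID goes into In-Reply-To...
    parent_id = original["Message-ID"]
    reply["In-Reply-To"] = parent_id
    # ...and is also appended to whatever References the parent carried.
    earlier = original.get("References", "")
    reply["References"] = (earlier + " " + parent_id).strip()
    reply.set_payload(body)
    return reply


# Hypothetical parent message, itself already a reply to <first@example.org>.
parent = Message()
parent["From"] = "Fred <mybuddyfred@foobarbaz.com>"
parent["Subject"] = "testing funky name text in headers"
parent["Message-ID"] = "<second@example.org>"
parent["References"] = "<first@example.org>"

answer = build_reply(parent, "Wes Morgan <wesmorgan1@nowayjose.com>", "Got it.")
print(answer["In-Reply-To"])  # prints <second@example.org>
print(answer["References"])   # prints <first@example.org> <second@example.org>
```

The reply's own Message-ID appears nowhere in its References header, which matches the note above.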

Surprise - you've just learned how "threaded discussion" email clients work.  Basically, they can look at any message in the thread, grab its References header, and go find the other messages in your mailbox.
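
As a rough illustration of the idea, the sketch below groups messages into threads by treating the first entry in References as the root of the discussion. This is a deliberate simplification and an assumption on my part; real mail clients use more robust algorithms that cope with missing or mangled headers. The mailbox contents and function names are made up.

```python
from collections import defaultdict


def thread_root(headers):
    """Return the Message-ID that started the thread this message belongs to."""
    refs = headers.get("References", "").split()
    # No References at all means this message started its own thread.
    return refs[0] if refs else headers["Message-ID"]


def group_by_thread(mailbox):
    """Map each thread's root Message-ID to the Message-IDs in that thread."""
    threads = defaultdict(list)
    for headers in mailbox:
        threads[thread_root(headers)].append(headers["Message-ID"])
    return dict(threads)


# Hypothetical mailbox: each message is reduced to a dict of its headers.
mailbox = [
    {"Message-ID": "<a1@example.org>"},
    {"Message-ID": "<a2@example.org>", "References": "<a1@example.org>"},
    {"Message-ID": "<b1@example.org>"},
    {"Message-ID": "<a3@example.org>",
     "References": "<a1@example.org> <a2@example.org>"},
]

for root, members in group_by_thread(mailbox).items():
    print(root, "->", members)
```

With this sample mailbox, the three "a" messages end up in one thread and the lone "b" message in another.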

On occasion, someone wants to direct replies to a different address than that specified in the From header.  While this can be used by individuals (if their mail client allows it), we most often see it in conjunction with mailing lists.  Thus, we have Reply-To headers like this one:

Reply-To: bighuge-list@listhostsite.org
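
In code, the rule a mail client follows might look something like this sketch. The helper name reply_address is hypothetical, and for simplicity it assumes the Reply-To header holds a single address.

```python
from email.utils import parseaddr


def reply_address(headers):
    """Pick the address a reply should go to: Reply-To if present, else From."""
    target = headers.get("Reply-To") or headers.get("From", "")
    # parseaddr returns (display name, address); we only want the address.
    return parseaddr(target)[1]


list_post = {
    "From": "Wes Morgan <wesmorgan1@nowayjose.com>",
    "Reply-To": "bighuge-list@listhostsite.org",
}
personal = {"From": "Wes Morgan <wesmorgan1@nowayjose.com>"}

print(reply_address(list_post))  # prints bighuge-list@listhostsite.org
print(reply_address(personal))   # prints wesmorgan1@nowayjose.com
```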

Finally, there's a "once and only once" header that occasionally makes an appearance in your mail messages:

Sender: Wes Morgan <wesmorgan1@nowayjose.com>

This one should only show up if/when the party actually sending the message is NOT the one named in the From header; in other words, someone/something is sending the message on behalf of the original author.  For instance, you'll often see this in mailing list messages, like so:

From: Wes Morgan <wesmorgan1@nowayjose.com> 
Sender: Big Huge Mailing list <bighuge-list@listhostsite.org>

You may also see the Sender header when a mail client allows delegation, as in "Wes can send email in Steve's name," so keep an eye out for that...
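
If you want to spot "on behalf of" messages programmatically, a simple comparison of the two addresses will do as a first pass. This is only a sketch; the function name sent_on_behalf is an assumption, and it compares bare addresses case-insensitively, which is looser than the standards strictly require.

```python
from email.utils import parseaddr


def sent_on_behalf(headers):
    """Return (sender, author) if Sender differs from From, else None."""
    sender = parseaddr(headers.get("Sender", ""))[1].lower()
    author = parseaddr(headers.get("From", ""))[1].lower()
    if sender and sender != author:
        return sender, author
    return None


list_post = {
    "From": "Wes Morgan <wesmorgan1@nowayjose.com>",
    "Sender": "Big Huge Mailing list <bighuge-list@listhostsite.org>",
}
print(sent_on_behalf(list_post))
# prints ('bighuge-list@listhostsite.org', 'wesmorgan1@nowayjose.com')

direct = {"From": "Wes Morgan <wesmorgan1@nowayjose.com>"}
print(sent_on_behalf(direct))  # prints None
```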

So, to recap:

  • Every message must include the Date and From headers.
  • To, Cc, Subject, and Message-ID are NOT required by the RFCs.
  • Bcc headers do exist, but only "in transit" - they're removed before the message lands in your mailbox.
  • The presence of In-Reply-To and References headers indicates that the message is a reply to a previous message.
    • The References header makes "threaded mail reading" and per-discussion archival possible.
    • The References header will grow in size with each reply in a series of messages.
  • The Sender header usually indicates that the message is being delivered by one person/party on behalf of another, as seen with a mailing list or delegated authority.
  • Reply-To directs replies to an address other than that specified in the From header.
  • If you see more than one instance of any of these headers in a single email message, something goofy is going on.
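
That last point is easy to check mechanically. Here's a minimal sketch that counts how often each of these headers appears in a raw message; the helper name duplicated_headers and the sample message are made up for illustration.

```python
from email import message_from_string

ONCE_ONLY = (
    "From", "Date", "To", "Cc", "Bcc", "Subject", "Message-ID",
    "In-Reply-To", "References", "Reply-To", "Sender",
)


def duplicated_headers(raw_text):
    """Return the once-only header names that appear more than once."""
    msg = message_from_string(raw_text)
    # get_all returns every value for a header name (case-insensitively).
    return [name for name in ONCE_ONLY if len(msg.get_all(name, [])) > 1]


# Hypothetical message with a suspicious second Subject header.
raw = (
    "From: Wes Morgan <wesmorgan1@nowayjose.com>\n"
    "Date: Mon, 9 Jul 2012 19:12:28 -0400\n"
    "Subject: first subject\n"
    "Subject: second subject\n"
    "\n"
    "Body text goes here.\n"
)

print(duplicated_headers(raw))  # prints ['Subject']
```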

Those are the 11 basic headers of Internet email.  Next, we'll start talking about the common headers that can show up multiple times...and what we can learn from them.
