Introduction to IMAP

Down the IMAP hole we go!

Introduction

This will be a detailed, though not exhaustive, quickstart into using IMAP. Initially this was also going to highlight the Python library, imaplib, but the post became too long! Maybe next time.

The hope is that this’ll contain enough information about querying email servers that additional questions would most likely be redirected to the spec or subsequent specs linked to in the post. I’m also interested in plenty of examples so my 2am self doesn’t need to overwork itself coming up with queries.

Saying that, when the need arises to programmatically read email, one should always reach for IMAP, which is so easy to get started with; you only need a terminal with telnet and/or openssl. Since the telnet version will mean insecure communications, I’ll only show examples with openssl.

The basic command to remember is

# For gmail: HOST=imap.gmail.com
# For outlook/hotmail: HOST=imap-mail.outlook.com
openssl s_client -connect ${HOST}:${PORT:-993} -quiet -crlf

# Alternatively (not as powerful because curl can't support
# multiple commands per invocation):
# https://curl.haxx.se/mail/archive-2013-12/0022.html
curl --url "imaps://<host>" --user <user> --request <command>

Gmail Caveat

By default, Gmail is cautious about any unknown client trying to access your email, which, in general, is a good philosophy. Gmail will refuse us connection, stating that we need to log in through the web interface. Officially, we have to use OAuth as detailed in Gmail’s IMAP and SMTP documentation, but covering this usage would consume another post and would get in the way of learning! To temporarily disable this protection, follow Google’s article explaining the background, which leads to the “Less secure apps” option in your account. If you do decide to disable the protection, then it’d probably be wise to re-enable it after class is out of session.

As for me, I have a hotmail account that I can play with and that has no such restrictions.

Time for some learning

For any serious IMAP commandline usage, I recommend using rlwrap, which will allow for command completion and history. With IMAP, I invoke rlwrap like the following, which will ensure that IMAP login commands (which contain your password) aren’t leaked in the history:

rlwrap -g LOGIN openssl s_client -connect <host>:<port> -quiet -crlf

The baptism by fire example will:

  • Login with your email and password
  • List all the subscribed folders in the account
  • Select the “Inbox” folder
  • Search for all the emails from [email protected]
  • Pick out just the date when the email was received

The example will be broken down further.

* OK Outlook.com IMAP4rev1 server version 17.4.0.0 ready (BAY451-IMAP411)
1 LOGIN <email> <password>
* CAPABILITY IMAP4rev1 CHILDREN ID NAMESPACE UIDPLUS UNSELECT
1 OK <email> authenticated successfully
2 LSUB "" "*"
* LSUB (\HasNoChildren) "/" "Inbox"
* LSUB (\HasNoChildren \Trash) "/" "Deleted"
* LSUB (\HasNoChildren \Sent) "/" "Sent"
* LSUB (\HasNoChildren \Drafts) "/" "Drafts"
* LSUB (\HasNoChildren \Junk) "/" "Junk"
* LSUB (\HasNoChildren) "/" "nbsoftsolutions"
2 OK LSUB completed
3 SELECT "Inbox"
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded)
* 6951 EXISTS
* 0 RECENT
* OK [UIDVALIDITY 105499179] UIDs valid
* OK [UIDNEXT 106952] Predicted next UID
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft)] Limited
3 OK [READ-WRITE] SELECT completed.
4 SEARCH FROM [email protected]
* SEARCH 1789 1791
4 OK SEARCH Completed
5 FETCH 1789,1791 (BODY[HEADER.FIELDS (DATE)])
* 1789 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS ("DATE")] {41}
Date: Thu, 27 Jun 2013 08:29:16 -0700

)
* 1791 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS ("DATE")] {41}
Date: Thu, 27 Jun 2013 13:06:36 -0700

)
5 OK FETCH Completed

If we restrict the example to just the commands I sent, it boils down to:

1 LOGIN <email> <password>
2 LSUB "" "*"
3 SELECT "Inbox"
4 SEARCH FROM [email protected]
5 FETCH 1789,1791 (BODY[HEADER.FIELDS (DATE)])
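
For comparison, here is a rough sketch of the same five commands using Python’s imaplib. The function name fetch_dates_from and its parameters are my own invention, and the host, credentials and sender in the usage comments are placeholders, so treat it as an illustration rather than a finished client.

```python
import imaplib


def fetch_dates_from(conn, sender, mailbox="Inbox"):
    """Return the raw Date header of every message in `mailbox` from `sender`.

    `conn` is assumed to be an already logged-in imaplib.IMAP4_SSL object.
    """
    conn.select(mailbox)                                   # 3 SELECT "Inbox"
    typ, data = conn.search(None, "FROM", f'"{sender}"')   # 4 SEARCH FROM ...
    numbers = (data[0] or b"").split()                     # e.g. [b"1789", b"1791"]
    if not numbers:
        return []
    message_set = b",".join(numbers).decode("ascii")
    typ, data = conn.fetch(message_set, "(BODY[HEADER.FIELDS (DATE)])")  # 5 FETCH
    # imaplib hands back an (envelope, literal) tuple for each {n} literal and a
    # bare b")" for the closing parenthesis; only the tuples carry the header.
    return [part[1].decode("ascii", "replace").strip()
            for part in data if isinstance(part, tuple)]


# Usage sketch (needs a real account, so it is left commented out):
# conn = imaplib.IMAP4_SSL("imap-mail.outlook.com", 993)
# conn.login("<email>", "<password>")        # 1 LOGIN
# print(conn.lsub())                         # 2 LSUB
# print(fetch_dates_from(conn, "<sender>"))
# conn.logout()
```

Note that imaplib generates the command tags itself, so the numbers typed by hand above disappear.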

Notice the incrementing numbers? Yes, I manually typed those in, and no, it doesn’t matter what you type. Many people use ? or gibberish, but incrementing numbers is good practice. The point is, they’re used for identifying commands, so it’s probably a good idea to make them unique!

From the spec:

The client command begins an operation. Each client command is prefixed with an identifier (typically a short alphanumeric string, e.g., A0001, A0002, etc.) called a “tag”. A different tag is generated by the client for each command.
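
If you script this rather than type it, generating the tags is a one-liner. A minimal sketch in Python, where tag_generator is a hypothetical helper of mine that follows the A0001 style from the quote:

```python
import itertools


def tag_generator(prefix="A", width=4):
    """Yield A0001, A0002, ... so that every command gets a unique tag."""
    for n in itertools.count(1):
        yield f"{prefix}{n:0{width}d}"


tags = tag_generator()
command = f"{next(tags)} LOGIN <email> <password>"
# command is now "A0001 LOGIN <email> <password>"
```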

And the reason why we use SSL is because of the LOGIN command:

LOGIN command uses a traditional user name and plaintext password pair and has no means of establishing privacy protection or integrity checking.

You may have noticed that some lines terminate with {n}. This announces a literal: n is the number of bytes of data that follow the line break. This is important because IMAP is a line-based protocol, and {n} is the way to transmit data that spans multiple lines. So the {41} means that the next 41 bytes (here the Date header plus its trailing blank line) belong to the response, which then continues with the closing parenthesis.
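
To make the {n} rule concrete, here is a small sketch of how a client might read one logical response from a byte stream. read_response is a name I made up, and it assumes the stream offers readline() and read() like a file or io.BytesIO does; a real client would also need to handle timeouts and short reads.

```python
import io
import re

# A line that ends in {n} announces that n bytes of literal data follow.
LITERAL_RE = re.compile(rb"\{(\d+)\}\r\n$")


def read_response(stream):
    """Read one logical IMAP response, following any {n} literals."""
    chunks = []
    while True:
        line = stream.readline()
        chunks.append(line)
        match = LITERAL_RE.search(line)
        if match is None:
            return b"".join(chunks)
        # Consume exactly n bytes, then keep reading the rest of the response.
        chunks.append(stream.read(int(match.group(1))))


raw = (b'* 1789 FETCH (BODY[HEADER.FIELDS ("DATE")] {41}\r\n'
       b"Date: Thu, 27 Jun 2013 08:29:16 -0700\r\n\r\n"
       b")\r\n"
       b"5 OK FETCH Completed\r\n")
stream = io.BytesIO(raw)
first = read_response(stream)   # the whole FETCH response, literal included
second = read_response(stream)  # b"5 OK FETCH Completed\r\n"
```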

By selecting the “Inbox” folder, I’m saying that I want to read and write (manipulate) the folder.

Examine vs Select

Prefer EXAMINE over SELECT for read-only behavior

The EXAMINE command is identical to SELECT and returns the same output; however, the selected mailbox is identified as read-only. No changes to the permanent state of the mailbox, including per-user state, are permitted; in particular, EXAMINE MUST NOT cause messages to lose the \Recent flag.

Why do we care about the \Recent flag? What is it even for?

Message is “recently” arrived in this mailbox. This session is the first session to have been notified about this message; if the session is read-write, subsequent sessions will not see \Recent set for this message. This flag can not be altered by the client.

If there are multiple clients connecting to the same inbox, and if any of these clients rely on the \Recent flag, these clients may not be notified of new messages if one uses SELECT over EXAMINE.

In reality, a client that relies on the \Recent flag being present is making too big of an assumption and should use other polling mechanisms such as searching for unread messages. I would still recommend using EXAMINE because it’s better to treat an object as immutable when the opportunity arises. A good example of this is the \Seen flag (a message without it will appear bold in your inbox, i.e. unread): when issuing a FETCH command, there are some parts of a message that, if retrieved, will implicitly mark the message as read. This can have disastrous effects if someone is expecting unread messages to truly be unread, and an EXAMINE command may avoid this problem. The spec does not require this exact immutability behavior, so check with the IMAP server before committing.
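
In Python’s imaplib, passing readonly=True to select() is what makes the library send EXAMINE instead of SELECT. A minimal sketch, where open_read_only is my own wrapper and conn is assumed to be a logged-in imaplib connection:

```python
def open_read_only(conn, mailbox="Inbox"):
    """Open `mailbox` with EXAMINE rather than SELECT.

    `conn` is assumed to be a logged-in imaplib.IMAP4_SSL object; imaplib
    issues EXAMINE when select() is called with readonly=True.
    """
    typ, data = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"could not examine {mailbox!r}: {data!r}")
    # On success imaplib returns the EXISTS count as the first data item.
    return int(data[0]) if data and data[0] else 0
```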

Message Sequence Numbers vs UID

Each message in IMAP has two numbers: its message sequence number and its unique identifier. The unique identifier is pretty self-explanatory, with a few caveats following, and the message sequence number is the relative position from the oldest message in the folder. If messages are deleted, sequence numbers are reordered to fill any gaps. As can be imagined, this is a source of a lot of mistakes, because if you’re looping through a list of message sequence numbers in ascending order, deleting messages as you go, you’ll end up deleting the wrong messages. The imaplib documentation highlights this problem:

After an EXPUNGE command performs deletions the remaining messages are renumbered, so it is highly advisable to use UIDs instead, with the UID command.
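
The renumbering trap is easy to reproduce without a server. The toy model below treats a mailbox as a list of UIDs and assumes an expunge happens after every single deletion; both function names are made up for the illustration.

```python
def delete_by_sequence(mailbox, sequence_numbers):
    """Delete 1-based sequence numbers one by one, renumbering after each."""
    remaining = list(mailbox)
    for seq in sequence_numbers:
        if 1 <= seq <= len(remaining):
            del remaining[seq - 1]  # everything behind it shifts down by one
    return remaining


def delete_by_uid(mailbox, uids):
    """Delete by UID; UIDs do not shift when other messages disappear."""
    doomed = set(uids)
    return [uid for uid in mailbox if uid not in doomed]


mailbox = [101, 102, 103, 104, 105]  # UIDs, oldest first
# Intent: remove the three oldest messages.
wrong = delete_by_sequence(mailbox, [1, 2, 3])   # [102, 104]
right = delete_by_uid(mailbox, [101, 102, 103])  # [104, 105]
```

Looping in descending order would also avoid the problem in this toy model, but UIDs stay correct even when another client deletes something in between.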

So how does one use the UID command? Surprisingly easy. Take whatever command you were going to execute and prefix it with UID. We’ll modify the example earlier to use UIDs instead.

1 LOGIN <email> <password>
2 LSUB "" "*"
3 SELECT "Inbox"
4 UID SEARCH FROM [email protected]
5 UID FETCH 101789,101791 (BODY[HEADER.FIELDS (DATE)])

The one caveat with UIDs is that, while they’re not supposed to change, the spec allows for some wiggle room:

The unique identifier of a message MUST NOT change during the session, and SHOULD NOT change between sessions.

One can tell if the UIDs have changed by looking at the UIDVALIDITY response when examining an inbox. If the number has changed from the last time, then UIDs gathered previously may be worthless. However, I believe in practice this does not happen because too many applications would break. The spec strongly suggests that:

The combination of mailbox name, UIDVALIDITY, and UID must refer to a single immutable message on that server forever.

As a result I would keep this in the back of your mind when sharing UIDs across connections (either concurrent connections or sequential). RFC4549, Synchronization Operations for Disconnected IMAP4 Clients contains several good quotes about this situation.

if UIDVALIDITY value returned by the server differs, the client MUST

  • remove any pending “actions” that refer to UIDs in that mailbox and consider them failed

And Dovecot, probably the open source IMAP server, states:

[UIDVALIDITY] shouldn’t normally change, because if it does it means that client has to download all the messages for the mailbox again.
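
In code, honoring UIDVALIDITY can be as small as the sketch below. The cache layout (mailbox name mapped to a UIDVALIDITY value plus a set of UIDs) and the name reconcile_uid_cache are assumptions of mine, not something taken from a library or the RFCs.

```python
def reconcile_uid_cache(cache, mailbox, uidvalidity):
    """Return the cached UID set for `mailbox`, discarding it if stale.

    `cache` maps mailbox name -> (uidvalidity, set_of_uids). When the server
    reports a different UIDVALIDITY, every cached UID is thrown away.
    """
    known = cache.get(mailbox)
    if known is None or known[0] != uidvalidity:
        cache[mailbox] = (uidvalidity, set())
    return cache[mailbox][1]


cache = {"Inbox": (105499179, {101789, 101791})}
same = reconcile_uid_cache(cache, "Inbox", 105499179)     # cached UIDs survive
changed = reconcile_uid_cache(cache, "Inbox", 105499180)  # cache is reset
```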

Even single-threaded implementations may get into a sticky situation when moving a message, as only part of the operation may complete: moving is composed of a COPY + STORE (unless your IMAP server supports the MOVE command), and the connection may be dropped after the COPY completes but before the STORE finishes. The RFC writes that the UIDPLUS extension alleviates accidentally downloading the message twice.

The one advantage of message sequence numbers over UIDs is that math can be done with the sequence numbers (e.g. messages 1:10 means there are a total of 10 messages in the set). Seems like a small advantage, but some people like it.

Search Examples

Find all messages in an inbox

? SEARCH ALL

Find messages with a flag set

? SEARCH ANSWERED
? SEARCH DELETED
? SEARCH DRAFT
? SEARCH FLAGGED
? SEARCH SEEN
? SEARCH RECENT

Date searching. The first three examples use the RFC-2822 Date header while the last three use the internal date. A message’s internal date “reflects when the message was received” whereas the Date header is for “specifying the date and time at which the creator of the message indicated that the message was complete and ready to enter the mail delivery system”. Testing has shown that querying on the internal date (the last three examples) is two orders of magnitude faster, and the message date and the internal date should be close if not equivalent.

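SINCE is inclusive, so SINCE 12-Mar-2016 includes the messages received on March 12th. BEFORE is exclusive: BEFORE 12-Mar-2016 only matches messages received on March 11th or earlier. The SENT variants follow the same rules.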

? SEARCH SENTBEFORE 12-Mar-2016
? SEARCH SENTON 12-Mar-2016
? SEARCH SENTSINCE 12-Mar-2016

? SEARCH SINCE 12-Mar-2016
? SEARCH ON 12-Mar-2016
? SEARCH BEFORE 12-Mar-2016
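
When building these queries from code, the date has to be rendered as day-Mon-year with English month abbreviations. The sketch below spells the months out by hand because strftime("%b") follows the process locale; imap_date and received_between are hypothetical helper names, and the second one leans on the spec’s definition of BEFORE as strictly earlier than the given date.

```python
import datetime

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a datetime.date the way SEARCH expects, e.g. 12-Mar-2016."""
    return f"{day.day}-{_MONTHS[day.month - 1]}-{day.year:04d}"


def received_between(first_day, last_day):
    """Search keys matching internal dates from first_day to last_day inclusive."""
    day_after = last_day + datetime.timedelta(days=1)
    return f"SINCE {imap_date(first_day)} BEFORE {imap_date(day_after)}"


march_12 = datetime.date(2016, 3, 12)
query = received_between(march_12, march_12)
# query is "SINCE 12-Mar-2016 BEFORE 13-Mar-2016"
```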

It is also possible to use the WITHIN Search Extension, which defines two search keys, OLDER and YOUNGER. Each takes an interval in seconds, measured back from the server’s current time. The examples query messages that are younger or older than an hour.

? SEARCH YOUNGER 3600
? SEARCH OLDER 3600

Query on message properties. “A message matches the key if the string is a substring of the field. The matching is case-insensitive.”

? SEARCH TO [email protected]
? SEARCH FROM [email protected]
? SEARCH CC [email protected]
? SEARCH BCC [email protected]
? SEARCH BODY github
? SEARCH HEADER RECEIVED foo

Composing multiple search criteria. The only thing special is that the operators are written in Polish notation:

? SEARCH FROM [email protected] SINCE 12-Mar-2016
? SEARCH OR FROM [email protected] FROM [email protected]
? SEARCH OR (FROM [email protected]) (FROM [email protected])
? SEARCH OR OR FROM [email protected] FROM [email protected] FROM [email protected]
? SEARCH OR (FROM [email protected] SINCE 12-Mar-2016) FROM [email protected]
? SEARCH NOT (OR (FROM [email protected]) (BEFORE 12-Mar-2016))
? SEARCH NOT SEEN
? SEARCH UNSEEN
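
Because OR takes exactly two operands, matching any one of several criteria means nesting ORs. A small sketch of a helper that does the nesting, always adding parentheses so that compound criteria stay grouped; any_of is a name I picked, not part of any library.

```python
from functools import reduce


def any_of(*keys):
    """Combine search keys so that a message matching any one of them matches."""
    if not keys:
        raise ValueError("need at least one search key")
    # OR is a prefix operator with two operands, so fold the keys pairwise.
    return reduce(lambda acc, key: f"OR ({acc}) ({key})", keys)


two = any_of("FROM alice", "FROM bob")
# two is "OR (FROM alice) (FROM bob)"
three = any_of("FROM alice SINCE 12-Mar-2016", "FROM bob", "UNSEEN")
# three is "OR (OR (FROM alice SINCE 12-Mar-2016) (FROM bob)) (UNSEEN)"
```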

And to retrieve message UIDs you can prefix the search command with UID

? UID SEARCH SINCE 12-Mar-2016
? UID SEARCH OR FROM [email protected] FROM [email protected]
? UID SEARCH TO [email protected]

Searching can also be done on UIDs. Keep in mind the last example (with the 1 replaced by the UID following the last one processed) may be a good strategy for a mailbox listener to process all the UIDs after the last seen and any unseen messages.

? UID SEARCH UID 1:*
? SEARCH UID 1:*
? UID SEARCH OR (UID 1:*) (UNSEEN)
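
One wrinkle when asking for “everything after the last UID I saw”: the spec notes that a UID range ending in * always includes the last message in the mailbox, even if its UID is lower than the start of the range. A sketch of guarding against that for a plain UID range search, using two made-up helpers:

```python
def uids_after(last_seen):
    """Search key for UIDs greater than `last_seen`."""
    return f"UID {last_seen + 1}:*"


def drop_already_seen(uids, last_seen):
    """Filter out the stray last message that n:* returns when nothing is new."""
    return [uid for uid in uids if uid > last_seen]


key = uids_after(106951)                          # "UID 106952:*"
nothing_new = drop_already_seen([106951], 106951)  # []
```

This filter only makes sense for the UID part; combined with OR (UNSEEN) the older unseen messages are wanted and should not be dropped.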

ESEARCH Examples

If the server supports the ESEARCH extension, a few more possibilities open up:

Count the number of UNSEEN messages and return the first message/UID.

? SEARCH RETURN (MIN COUNT) UNSEEN
? UID SEARCH RETURN (MIN COUNT) UNSEEN

The ESEARCH extension can also condense message sets to cut down on transmission costs. It’s better to receive the 8 bytes of 1:300000 than the ~1.5MB it would take if the message ids were written individually.

? SEARCH RETURN () UNSEEN
? SEARCH RETURN (ALL) UNSEEN
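
The condensed form is cheap on the wire but needs expanding on the client. Below is a sketch of that expansion for sets made of plain numbers and ranges; expand_sequence_set is my own helper, and it deliberately does not handle the * wildcard.

```python
def expand_sequence_set(text):
    """Expand a sequence set such as "1:3,5" into a list of message ids."""
    ids = []
    for part in text.split(","):
        first, _, last = part.partition(":")
        low, high = int(first), int(last or first)
        if low > high:  # the spec treats 4:2 the same as 2:4
            low, high = high, low
        ids.extend(range(low, high + 1))
    return ids


ids = expand_sequence_set("1:3,5,9:8")
# ids is [1, 2, 3, 5, 8, 9]
```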

SEARCHRES Examples

Remembering what messages were returned when doing a SEARCH can require unnecessary work: getting the message ids and then parsing them. rfc5182, Referencing the Last SEARCH Result, allows the result set of a SEARCH to be saved and referenced in a subsequent command as $. The documentation for the extension already contains numerous examples, so I’ll copy and reduce them.

Find all the messages from github and then retrieve some metadata about those messages.

? SEARCH RETURN (SAVE) FROM [email protected]
? FETCH $ (UID INTERNALDATE FLAGS)

More cool ways to use $

? SEARCH (OR $ 1,3000:3021)
? MOVE $ "Other Messages"

To see how SEARCHRES interacts with ESEARCH, check out the rfc.

Fetch Examples

For the fetch examples, I’ll be using .PEEK where I can so that these examples won’t implicitly mark the message as being seen. In my opinion the only way a message should be marked as seen is if an explicit command sets the flag (but I didn’t write the spec, oh well!)

Fetch the contents of the email message:

? FETCH 1 BODY.PEEK[TEXT]

Fetch the header of the message:

? FETCH 1 BODY.PEEK[HEADER]

Fetch header and contents of email message

? FETCH 1 BODY.PEEK[]

Fetch specific parts of the header (the examples are complementary)

? FETCH 1 BODY.PEEK[HEADER.FIELDS (Date From)]
? FETCH 1 BODY.PEEK[HEADER.FIELDS.NOT (Date From)]

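Fetch metadata about the message. None of these items mark the message as seen, and the spec defines no .PEEK form for BODYSTRUCTURE, so it is written without one.

? FETCH 1 FLAGS
? FETCH 1 ENVELOPE
? FETCH 1 INTERNALDATE
? FETCH 1 RFC822.SIZE
? FETCH 1 BODYSTRUCTURE
? FETCH 1 UID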

Fetches can be composed

? FETCH 1 (BODYSTRUCTURE UID)
? FETCH 1 (BODYSTRUCTURE UID RFC822.SIZE)

If you only want some of a field this is also possible through <start-index.length>

? FETCH 1 (BODYSTRUCTURE BODY.PEEK[]<0.200>)
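
Partial fetches make it possible to download a large message in pieces. A sketch that generates the fetch items for consecutive chunks, assuming the total size came from RFC822.SIZE; partial_fetch_items is a hypothetical helper. The last chunk can overshoot the end because the spec has the server truncate reads that run past the end of the text.

```python
def partial_fetch_items(total_size, chunk_size):
    """Yield BODY.PEEK[]<start.length> items that cover `total_size` bytes."""
    for start in range(0, total_size, chunk_size):
        yield f"BODY.PEEK[]<{start}.{chunk_size}>"


items = list(partial_fetch_items(450, 200))
# items is ["BODY.PEEK[]<0.200>", "BODY.PEEK[]<200.200>", "BODY.PEEK[]<400.200>"]
```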

Store Examples

Mark message as seen and deleted in addition to whatever flags may be present

? STORE 1 +FLAGS (\Deleted \Seen)

Unmark a message as seen and deleted, so it’ll show up in the inbox as unread.

? STORE 1 -FLAGS (\Deleted \Seen)

Completely replace the flags of a message with those provided

? STORE 1 FLAGS (\Deleted \Seen)

Alternatively, if a server response for STORE is not wanted then one can specify FLAGS.SILENT in any of the previous examples.
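
The three STORE forms map neatly onto set operations. The toy model below mirrors what the server does to a message’s flags; apply_store is an illustration of the semantics, not a function from any IMAP library.

```python
def apply_store(current, item, flags):
    """Return the flag set after STORE <item> (<flags>) on a message."""
    current, flags = set(current), set(flags)
    name = item.upper()
    if name.endswith(".SILENT"):  # .SILENT only suppresses the response
        name = name[:-len(".SILENT")]
    if name == "+FLAGS":
        return current | flags    # add to whatever is there
    if name == "-FLAGS":
        return current - flags    # remove, keep the rest
    if name == "FLAGS":
        return flags              # replace outright
    raise ValueError(f"unknown STORE item: {item}")


before = {"\\Flagged"}
added = apply_store(before, "+FLAGS", {"\\Deleted", "\\Seen"})
replaced = apply_store(before, "FLAGS", {"\\Deleted", "\\Seen"})
```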

Concurrent Commands

The client MAY send another command without waiting for the completion result response of a command. […] Similarly, a server MAY begin processing another command before processing the current command to completion

However, if we try using our trusty s_client and pasting the following in, the commands will be executed sequentially after a brief delay (at least for outlook).

3 SEARCH FROM [email protected]
4 FETCH 1500 (BODY[HEADER.FIELDS (DATE)])
5 SEARCH FLAGGED SINCE 1-Feb-1994 NOT FROM "Smith"
6 SEARCH HEADER X-FOO ""
7 SEARCH FROM [email protected]
8 SEARCH TEXT "string not in mailbox"

So it looks like the MAY is taken to heart and I would not recommend relying on other behavior. Instead, if two independent commands need to be sent, open another connection.

The IDLE and NOTIFY Commands

The IDLE command, as described by rfc2177, is a simple way to have the server let the client know what’s going on without the client having to periodically poll the server.

Typically responses are pretty simple. The following example shows the responses one may receive as mail is received.

* 0 RECENT
* 4 EXISTS
* 1 RECENT
* 5 EXISTS
* 2 RECENT

Additionally, actions performed on the inbox are returned. Here we can see the \Deleted flag being set on a message, followed by an EXPUNGE.

* 4 FETCH (FLAGS (\Seen \Recent))
* 4 FETCH (FLAGS (\Deleted \Seen \Recent))
* 5 FETCH (FLAGS (\Seen \Recent))
* 5 EXPUNGE
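
A listener sitting in IDLE mostly needs to tell these untagged lines apart. Here is a sketch of a parser for the four kinds shown above; parse_update is a made-up name, and it assumes each response fits on a single line with no {n} literal.

```python
import re

_UPDATE_RE = re.compile(r"^\* (\d+) (EXISTS|RECENT|EXPUNGE|FETCH)(?: (.*))?$")


def parse_update(line):
    """Split an untagged mailbox update into (number, kind, rest).

    Returns None for anything else, such as the "+" continuation line the
    server sends when IDLE starts.
    """
    match = _UPDATE_RE.match(line.strip())
    if match is None:
        return None
    number, kind, rest = match.groups()
    return int(number), kind, rest


exists = parse_update("* 5 EXISTS")  # (5, "EXISTS", None)
flags = parse_update("* 4 FETCH (FLAGS (\\Deleted \\Seen \\Recent))")
# flags is (4, "FETCH", "(FLAGS (\\Deleted \\Seen \\Recent))")
```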

Since the server can disconnect the client after 30 minutes, the spec recommends re-issuing the IDLE command every 29 minutes.

IDLE’s newer and more powerful brother is the NOTIFY command (rfc5465). The NOTIFY command is relatively new (published in 2009, compared to IDLE, which was published in 1997), so most mail servers will not support this command. If your mail server does support NOTIFY, then make sure to use it! Some of the benefits that NOTIFY provides:

  • Watch more than one mailbox with a single connection
  • Able to pick and choose what mailbox operations the connection receives
  • Able to customize what is returned on said mailbox operations

So if your email represents a message queue, NOTIFY could be the command for you! No more polling and no more secondary FETCH commands to retrieve data you need.

NOTIFY is much more complicated than IDLE, probably needlessly so. There are hardly any examples, so I’ll do my best to add to them. I’ve found that support for NOTIFY varies between servers, with some NOTIFY servers rejecting examples from the spec with less than helpful error messages (Invalid Arguments).

To start simple, we’ll watch the mailbox we have selected for new messages and for expunged messages. When a new message arrives we’ll also fetch its UID and some header fields.

? NOTIFY SET (SELECTED (MessageNew (UID BODY.PEEK[HEADER.FIELDS (FROM DATE)]) MessageExpunge))

Quick tip: if you want to subscribe to new messages, you’ll also have to subscribe to expunged messages.

If one of MessageNew or MessageExpunge is specified, then both events MUST be specified. Otherwise, the server MUST respond with the tagged BAD response.
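
Since the server answers a lone MessageNew or MessageExpunge with BAD, it can be worth catching the mistake before sending anything. A sketch of a command builder for the SELECTED case that enforces the rule quoted above; notify_selected and its parameters are assumptions of mine.

```python
def notify_selected(events, fetch_on_new=None):
    """Build a NOTIFY SET command (without tag) for the selected mailbox.

    `fetch_on_new` is an optional string of fetch items attached to MessageNew.
    """
    names = set(events)
    # MessageNew and MessageExpunge must be requested together or not at all.
    if ("MessageNew" in names) != ("MessageExpunge" in names):
        raise ValueError("MessageNew and MessageExpunge must be given together")
    parts = []
    for event in events:
        if event == "MessageNew" and fetch_on_new:
            parts.append(f"MessageNew ({fetch_on_new})")
        else:
            parts.append(event)
    return f"NOTIFY SET (SELECTED ({' '.join(parts)}))"


command = notify_selected(["MessageNew", "MessageExpunge"],
                          "UID BODY.PEEK[HEADER.FIELDS (FROM DATE)]")
```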

To turn off notification:

? NOTIFY NONE

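Optionally one can add the STATUS indicator after SET, as shown below. As I read rfc5465, it asks the server to send an initial STATUS response for each watched mailbox other than the selected one, so with a SELECTED-only filter like this one it may not change anything.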

? NOTIFY SET STATUS (SELECTED (MessageNew MessageExpunge))

The server may be finicky with notifications and may give you the NOTIFICATIONOVERFLOW response code when:

the server is unable or unwilling to deliver as many notifications as it is being asked to.

I notice this most frequently when specifying notifications for more than one mailbox. If you’re not working with this restriction, other usages are highlighted below:

? NOTIFY SET (SELECTED (MessageNew MessageExpunge)) (mailboxes postmaster (MessageNew MessageExpunge))
? NOTIFY SET (personal (SubscriptionChange)) (mailboxes postmaster (MessageNew MessageExpunge))
? NOTIFY SET (INBOXES (SubscriptionChange))

Section 6 of the RFC gives a better overview of the different mailbox specifiers that can be selected.

On an interesting note, one can send commands while a NOTIFY is in progress and also switch to another mailbox. This has the effect of modifying which SELECTED mailbox the NOTIFY command is referring to.

Conclusion

This was a quickstart to IMAP and some of the more important extensions (I’m biased). I didn’t cover many things. There are still more commands in the IMAP spec to gloss over and many more extensions. And what I covered was the IMAP happy path. Numerous servers don’t support the features I showed or will respond with NO or BAD, which a good IMAP client should deal with.

Happy IMAPing
