Fixing Mail import crashes on Leopard

After restoring from a Time Machine backup, Apple Mail would crash every time I tried to re-import my email archive. The problem seems to be affecting a lot of people who have mail that pre-dates OS X. The same crash also happens after migrating to a new machine.

Quick fix

Copy the following command and paste it into your Terminal. Press return and wait a few minutes. Once the command finishes, Mail should be able to import your messages.

grep -lZr 'Content-disposition: attachment' ~/Library/Mail/Mailboxes/ | xargs -0 ruby -i -pe 'gsub(/(Content-type:[^;]*;\s*name=)"(.*)"/){$1+(if !$2.nil? then $2.dump.gsub(/\\\\/, "\\") end)}'

The problem

The problem is specific to older, MacRoman-encoded email messages with attachments whose filenames include non-ASCII characters. For whatever reason, that freakshow edge-case combination will crash Apple Mail every time it tries to import those files. If by some stroke of luck you already have these messages in your email archive (I did), they can still crash Mail when it tries to rebuild the containing mailbox.

The solution is sort of simple: just rename the attachment in the .emlx file. The harder part is finding which file to edit.

We’ll use grep to recursively search for a common token in these older files, then pass the filenames to a Ruby snippet which will filter the line containing the bad characters. Here is a section of a suspect message’s header:

Content-type: application/octet-stream; name="SPWH 4.01ü.sit";
Content-disposition: attachment
Content-transfer-encoding: base64

The first line (Content-type) contains the attachment name which is causing the problem; who knows what that umlaut was originally. The second line (Content-disposition) is what we’re telling grep to locate, since I couldn’t get anything to dependably match the oddball characters. “Content-disposition” appears to be an older attachment syntax and didn’t appear in any of my messages from after 2001.

While the Ruby script could be run from find’s exec command, it wouldn’t be particularly efficient. Calling the script from find would pass every .emlx file in the Mailboxes directory through Ruby’s gsub, which is almost wholly unnecessary (and much slower). Only 10 of my 57,007 messages needed fixing; 99.98% of them were fine.
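
Before letting Ruby edit anything in place, it can be reassuring to see which files the grep half of the quick fix will hand over. A minimal sketch, assuming a hypothetical helper named list_suspect_messages (not part of the original command) that runs the same search without the -Z flag or the xargs step, so nothing is modified:

```shell
# Hypothetical helper, not part of the original one-liner:
# lists the files the quick fix would edit, but changes nothing.
list_suspect_messages() {
    # -l prints only the names of matching files, -r recurses into subfolders.
    # The default path is the one used by the quick fix command.
    grep -lr 'Content-disposition: attachment' "${1:-$HOME/Library/Mail/Mailboxes/}"
}
```

Piping the result through wc -l gives a count of candidate files. Not every listed file necessarily has a bad attachment name; the Ruby substitution only rewrites lines where a Content-type header carries a quoted name= value.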

Is this safe?

Since the most common way to encounter this bug involves Time Machine restores or migration to a new machine, most users should already be backed up. If something should go wrong, just restore the backup’s Library/Mail folder over the messed-up one. You can also copy or zip the Library/Mail folder to another drive to be even safer.

That said, I ran dozens of iterations of this solution against copies of my personal Mail archive without issue. And besides, if you’re reading this, you might not have any of your old mail, so how much worse could it really get?

If you’re still worried, copy your ~/Library/Mail/Mailboxes folder to another drive, run the script, then compare directories with something like Apple’s FileMerge. That will show you exactly which files have changed and what was changed inside them.
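
If FileMerge isn’t handy, the same comparison can be roughed out from Terminal with diff. This is a sketch, assuming a hypothetical helper named changed_messages and a backup copy whose location you supply; neither is from the original script:

```shell
# Hypothetical helper: report which files differ between a backup copy
# of the Mailboxes folder (first argument) and the live one (second argument).
changed_messages() {
    # -r walks both folder trees, -q names differing files without printing
    # their contents. diff exits with status 1 when it finds differences.
    diff -rq "$1" "$2"
}
```

Dropping the -q flag shows the changed lines themselves rather than just the file names.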

Renaming the attachment should be perfectly safe since the full encoded file contents are stored in the message. All the attachment name does is specify the filename; in a sense, it’s totally arbitrary.

I first submitted this bug to Apple back in May. If you have a developer account with Apple, please file a dupe for radar: 5912997

iPhone vs. Apple Mail

I’ve been seeing an issue with Apple Mail affecting several iPhone users on several different hosts:

  • With a POP account, Apple Mail will ask for the password repeatedly, refuse the correct password and fail to collect any mail.
  • With IMAP, the account seems to stall and does not necessarily update state or download new messages. Desktop IMAP behavior is particularly erratic.

In both cases, the iPhone continues to work just fine. The problem mostly affects users who’ve set their iPhone’s Auto-Check for mail to something other than Manual. The following lines appear in the Desktop’s console.log almost immediately after setting the iPhone to auto-check for mail:

2007-07-05 15:33:17.190 Mail[21242] Unhandled response to command SELECT: * NO  Trying to get mailbox lock from process 28292
2007-07-05 15:34:24.098 Mail[21242] Unhandled response to command SELECT: * NO  Trying to get mailbox lock from process 29790
2007-07-05 15:36:14.917 Mail[21242] Unhandled response to command SELECT: * NO  Trying to get mailbox lock from process 31080

Those entries seem to indicate that the IMAP server is sending a response that Apple Mail doesn’t know what to do with.
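
To get a rough sense of how often this happens, the matching lines can be counted. A small sketch, assuming a hypothetical helper named count_lock_errors that takes the path to a saved copy of the log as its argument:

```shell
# Hypothetical helper: count the mailbox lock complaints in a log file.
count_lock_errors() {
    # -c prints the number of matching lines instead of the lines themselves.
    grep -c 'Trying to get mailbox lock' "$1"
}
```

Run before and after switching the iPhone back to Manual, the two counts give a crude check on whether Auto-Check is really the trigger.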

A thread on the MacRumors forums claimed this was a multiple connections issue with the mail server, but I think I’ve conclusively debunked that, at least for IMAP.

To test the multiple connection theory, I set up Thunderbird on two other physical machines, one Mac and a Dell running Ubuntu, then set up my account using the default IMAP settings. I also opened my account in Horde webmail and hit reload a lot. Despite those simultaneous connections, Apple Mail seemed to be fine and messages were getting delivered. The little progress indicator was, however, still sitting there next to the account name, not spinning.

So now I can break Apple Mail just by turning on Auto-Check in iPhone’s Settings->Mail. Manual checking from the iPhone doesn’t cause any problems. So far I’m only seeing this on shared hosts running the Courier mail server.

IMAP is inconsistent about when it breaks, maybe related to server load. POP will break every time: if I check my email on a POP account with the iPhone, Apple Mail will immediately ask for and then refuse the password for that account.

An IMAP workaround

Installing IMAP-IDLE pretty much fixes the problems with IMAP. I’ve had this running for several hours with the iPhone checking every 15 minutes, and things seem to be working smoothly. IMAP errors still appear in console.log but mail is getting through. I’m going to install this on a few other machines tomorrow and see what happens.

Not sure what to do about POP, but then we’re migrating everyone over to IMAP anyway.

Deleting Unused mbox files

Or, How I reclaimed 1.25 gigabytes of my hard drive.

When 10.4 imported mail from the old 10.3 mbox files, it broke each message into an individual file so Spotlight could index them. The old mbox files, rightly, were left on the drive. For most people this wouldn’t take up a noticeable amount of space; however, those of us with a ton of mail saw a significant hit to our disk space.

The following commands will remove the unused mbox files from the drive, recovering a potentially large amount of disk space:

    cd ~/Library/Mail
    find . -name "mbox" -ls

Make sure the only things listed are mbox files in your mail directory (they should be). To delete all those files, change the last “-ls” of the above command to “-delete”. (I didn’t include the full command on purpose since it deletes files and I wanted to strongly encourage everyone doing this to check the file list before deleting.) Just to be doubly safe, back up before doing this.
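
To see how much space is at stake before deleting anything, the sizes of the listed files can be added up. A sketch, assuming a hypothetical helper named mbox_bytes; the -type f test is an addition here so that only regular files are counted:

```shell
# Hypothetical helper: total size in bytes of every file named mbox
# under the given folder (defaults to the current directory).
mbox_bytes() {
    # Concatenate each matching file and count the bytes; nothing is deleted.
    # tr strips the padding some versions of wc put before the number.
    find "${1:-.}" -name mbox -type f -exec cat {} \; | wc -c | tr -d ' '
}
```

Run from ~/Library/Mail with no argument, it prints a single byte count. It reads every mbox file in full, so expect it to take a while on a large archive.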

Total size of my mail folder went from 3.07 GB (3,206,511,328 bytes) to 1.84 GB (1,884,864,581). A savings of almost 1.25 GB. At $229.00 for a 93.2 GB formatted notebook drive, that’s an actual cost savings of $3.02.
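
The $3.02 figure is the drive’s price per gigabyte times the space freed, using the rounded difference 3.07 - 1.84 = 1.23 GB. Worked through with awk, where the helper name cost_of_space is hypothetical:

```shell
# Hypothetical helper: dollar value of reclaimed disk space.
cost_of_space() {
    # $1 = drive price in dollars, $2 = formatted capacity in GB, $3 = GB freed
    awk -v price="$1" -v capacity="$2" -v freed="$3" \
        'BEGIN { printf "%.2f\n", price / capacity * freed }'
}
```

Calling cost_of_space 229.00 93.2 1.23 prints 3.02.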

Note there was/is a bug with Mail importing under 10.4 where very large mbox files don’t read correctly. Make sure all your messages really did import correctly before deleting your mbox files.

Mbox files and Mail in 10.4

One of the big under-the-hood changes to Mail in 10.4 is that messages are no longer stored in mbox files; this allows Spotlight to index individual messages without having to first parse out the contents of the entire mailbox. Despite being unused, the old mbox files are often still on the drive, which means that most everyone’s mail is now taking up almost twice as much space as it did with 10.3 (my mail folder went from 1.4 to 2.8 gigs). If installing Tiger devoured a lot of hard drive space, that might account for a significant portion of where it went.

After an Archive & Install upgrade, my ~/Library/Mail directory still has folders labeled *.mbox, but each of those folders now contains a “Messages” directory which holds thousands of numbered *.emlx files. Those mostly appear to be plain text files each containing one message. There is a small glob of XML plist data attached to the end of each file, as well as an integer at the top of the file. That integer is the message’s character/byte count from the end of the integer to the beginning of the XML data.
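
Given that layout, pulling the bare message out of one of these files takes only the leading integer. A minimal sketch, assuming the count sits alone on the first line and is measured in bytes (both are inferences from the files I looked at, not a documented format); emlx_message is a hypothetical name:

```shell
# Hypothetical helper: print only the message portion of one .emlx file.
emlx_message() {
    # Read the integer on the first line, ignoring any padding around it.
    bytes=$(head -n 1 "$1" | tr -dc '0-9')
    # Skip that first line, then keep exactly that many bytes.
    # Whatever is left over is the trailing XML plist.
    tail -n +2 "$1" | head -c "$bytes"
}
```

If the count turns out to be characters rather than bytes, messages with multi-byte text would be cut short, so spot-check a few files before trusting the output.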

In theory, a fairly simple shell script could glom everything together into a standard mbox. Not sure how processor intensive that would be, but the steps to reassemble the data would be trivial. At very least Apple’s decision to move away from the mbox format can be easily reversed with no data loss.
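
As a rough illustration of that idea, the loop below strips the integer and plist from each file and joins the messages with mbox-style separator lines. It is a sketch under several assumptions: the helper name emlx_to_mbox is hypothetical, the byte count is taken to sit alone on the first line of each file, and the “From ” separator uses a placeholder sender and date rather than values parsed from each message.

```shell
# Hypothetical helper: write every .emlx message in a folder to stdout
# as one mbox-style stream.
emlx_to_mbox() {
    for f in "$1"/*.emlx; do
        [ -f "$f" ] || continue          # the folder had no .emlx files
        bytes=$(head -n 1 "$f" | tr -dc '0-9')
        # mbox messages start with a "From " line. This one is a placeholder.
        printf 'From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n'
        # Drop the count line, keep only the message bytes, and quote body
        # lines starting with "From " so they are not read as separators.
        tail -n +2 "$f" | head -c "$bytes" | sed 's/^From />From /'
        printf '\n'                      # blank line between messages
    done
}
```

Pointed at one of the Messages directories and redirected to a file, it writes the combined mailbox to that file. Messages come out in filename order, which is not necessarily the order they were received in.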

Not much has been written about this, but I found this MacOS X Hints mbox thread which confirms what I’m seeing:

I used to be able to use mutt or pine to view the mbox mailboxes in ~/Library/Mail/<account>/<box>/mbox . In 10.4 these are still present, but appear not to be updated any more. The up to date emails are in ~/Library/Mail/<account>/<box>/Messages/*.emlx which I believe is required for spotlight to be able to index messages – it only indexes file-based entities, not subportions of files.

Because Carbon Copy Cloner doesn’t work with 10.4 yet, I can’t comfortably back up my drive and experiment with deleting the old mboxes. It seems like it should be safe to remove all mbox files and associated files, since nothing outside the Messages directories has been modified since I upgraded to 10.4. If anyone has more information, please leave a comment.

(While reading a little background on the mbox format, I found the original RFC for email as a text file. The W3C also has an HTML version of RFC822, partially converted by (sir) Tim Berners-Lee. It’s fun to encounter raw history like that.)

Update: I posted a simple command to delete unused mbox files.