Joe Maller.com

Fixing Mail import crashes on Leopard

After restoring from a Time Machine backup, Apple Mail would crash every time I tried to re-import my email archive. The problem seems to be affecting a lot of people who have mail that pre-dates OS X. The same crash also happens after migrating to a new machine.

Quick fix

Copy the following command and paste it into your Terminal. Press return and wait a few minutes. Once the command finishes, Mail should be able to import your messages.

grep -lZr 'Content-disposition: attachment' ~/Library/Mail/Mailboxes/ | xargs -0 ruby -i -pe 'gsub(/(Content-type:[^;]*;\s*name=)"(.*)"/){$1+(if !$2.nil? then $2.dump.gsub(/\\\\/, "\\") end)}'

The problem

The problem is specific to older, MacRoman encoded email messages with attachments whose filename includes non-ascii characters. For whatever reason, that freakshow edge-case combination will crash Apple Mail every time it tries to import those files. If by some stroke of luck you already have these messages in your email archive (I did), they can still crash Mail when trying to rebuild the containing mailbox.

The solution is sort of simple, just rename the attachment in the .emlx file. The harder part is finding which file to edit.

We’ll use grep to recursively search for a common token in these older files, then pass the filenames to a Ruby snippet which will filter the line containing the bad characters. Here is a section of a suspect message’s header:

--B_3113094650_866203
Content-type: application/octet-stream; name="SPWH 4.01ü.sit";
x-mac-creator="53495421";
x-mac-type="53495435"
Content-disposition: attachment
Content-transfer-encoding: base64

The first bolded line contains the attachment name which is causing the problem, who knows what that umlaut was originally. The second bolded line is what we’re telling grep to locate, I couldn’t get anything to dependably match the oddball characters. “Content-disposition” appears to be an older attachment syntax and didn’t appear in any of my messages from after 2001.

While the Ruby script could be run from find’s exec command, it wouldn’t be particularly efficient. Calling the script from find would pass every .emlx file in the Mailboxes directory through Ruby’s gsub, which is almost wholly unnecessary (and much slower). Only 10 of my 57,007 messages needed fixing, 99.98% of them were fine.

Is this safe?

Since the most common way to encounter this bug involves Time Machine restores or migration to a new machine, most users should already be backed up. If something should go wrong, just restore the backup’s Library/Mail folder over the messed up one. You can also copy or zip the Library/Mail folder to another drive to be even safer.

That said, I ran dozens of iterations of this solution against copies of my personal Mail archive without issue. And besides, if you’re reading this, you might not have any of your old mail, so how much worse could it really get?

If you’re still worried, copy your ~/Library/Mail/Mailboxes folder to another drive, run the script, then compare directories with something like Apple’s FileMerge. That will show you exactly which files have changed and what was changed inside them.

Renaming the attachment should be perfectly safe since the full encoded file contents are stored in the message. All the attachment name does is specify the filename, in a sense, it’s totally arbitrary.

I first submitted this bug with Apple back in May, if you have a developer account with Apple, please file a dupe for radar: 5912997