Mbox files and Mail in 10.4

One of the big under-the-hood changes to Mail in 10.4 is that messages are no longer stored in mbox files. This allows Spotlight to index individual messages without first having to parse the contents of an entire mailbox. Although they are no longer used, the old mbox files are often still on the drive, which means that almost everyone’s mail now takes up nearly twice as much space as it did under 10.3 (my mail folder went from 1.4 to 2.8 GB). If installing Tiger devoured a lot of hard drive space, this might account for a significant portion of it.

After an Archive & Install upgrade, my ~/Library/Mail directory still has folders labeled *.mbox, but each of those folders now contains a “Messages” directory, which holds thousands of numbered *.emlx files. Those mostly appear to be plain text files, each containing one message. There is a small glob of XML plist data attached to the end of each file, as well as an integer at the top of the file. That integer is the message’s character/byte count, measured from the end of the integer to the beginning of the XML data.
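The layout described above can be sketched in a few lines of Python. This is only an illustration based on what I see in my own files, not a documented format: it assumes the leading integer sits alone on the first line and counts bytes (not characters), and the function name split_emlx is made up for this example.

```python
def split_emlx(data: bytes) -> tuple[bytes, bytes]:
    """Split the raw bytes of one .emlx file into (message, trailing_plist).

    Assumptions (from inspecting files, not from any spec):
    - line 1 holds a decimal integer and nothing else
    - that integer is the message length in bytes, counted from
      the byte right after the first newline
    - everything after the message is the XML plist
    """
    first_line, newline, rest = data.partition(b"\n")
    if not newline:
        raise ValueError("no newline after the length line")
    length = int(first_line.decode("ascii").strip())
    if length > len(rest):
        raise ValueError("declared length is longer than the file")
    return rest[:length], rest[length:]
```

Reading a file with open(path, "rb").read() and passing the result to split_emlx should give back the bare message and the plist tail, which can then be inspected separately.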

In theory, a fairly simple shell script could glom everything together into a standard mbox. I’m not sure how processor-intensive that would be, but the steps to reassemble the data would be trivial. At the very least, Apple’s decision to move away from the mbox format can be easily reversed with no data loss.
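Here is a rough sketch of that reassembly, written in Python rather than shell so that the standard mailbox module can handle the mbox details. It rests on the same assumptions as the description above (the first line is a byte count, the message follows, and the plist comes last). The function name emlx_to_mbox and both paths are placeholders, and I have not run it against a real Mail folder.

```python
import mailbox
from pathlib import Path


def emlx_to_mbox(messages_dir, mbox_path) -> int:
    """Append every *.emlx message in messages_dir to an mbox file.

    Returns the number of messages written. Assumes each file starts
    with a byte count on its own line, followed by the raw message.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    count = 0
    try:
        # Plain name sort; the files are only numbered, so this is
        # not guaranteed to match the order Mail shows them in.
        for emlx in sorted(Path(messages_dir).glob("*.emlx")):
            data = emlx.read_bytes()
            first_line, _, rest = data.partition(b"\n")
            length = int(first_line.decode("ascii").strip())
            # Keep only the message; the trailing plist is dropped.
            box.add(rest[:length])
            count += 1
    finally:
        box.close()  # flushes pending writes to disk
    return count
```

Calling emlx_to_mbox("Messages", "rebuilt.mbox") from inside one of the *.mbox folders would be the intended use. Two caveats: the mailbox module writes its own “From ” separator line for each message and escapes body lines that begin with “From ”, so the result is equivalent rather than byte-identical, and whatever Mail keeps in the plist at the end of each file is not carried over.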

Not much has been written about this, but I found this Mac OS X Hints mbox thread, which confirms what I’m seeing:

I used to be able to use mutt or pine to view the mbox mailboxes in ~/Library/Mail/<account>/<box>/mbox . In 10.4 these are still present, but appear not to be updated any more. The up to date emails are in ~/Library/Mail/<account>/<box>/Messages/*.emlx which I believe is required for spotlight to be able to index messages – it only indexes file-based entities, not subportions of files.

Because Carbon Copy Cloner doesn’t work with 10.4 yet, I can’t comfortably back up my drive and experiment with deleting the old mboxes. It seems like it should be safe to remove all the mbox files and their associated files, since nothing outside the Messages directories has been modified since I upgraded to 10.4. If anyone has more information, please leave a comment.

(While reading a little background on the mbox format, I found the original RFC for email as a text file. The W3C also has an HTML version of RFC 822, partially converted by (Sir) Tim Berners-Lee. It’s fun to encounter raw history like that.)

Update: I posted a simple command to delete unused mbox files.

5 Responses to “Mbox files and Mail in 10.4”

  • You can delete the old .mbox files, they are not used anymore… unless you plan to boot into panther (mail v1) again?

    I deleted the .mbox files, and mail 2 continued to work just fine, I guess they keep the files in case you want to go back to panther.

  • One of the issues that led us to this discovery was that an emlx file was corrupted, or at least Norton Anti-Virus 10.0 thought it had a virus and could not open it to repair it… etc.

  • I would like to be able to convert the icon and XXXX.emlx format for stored emails in library/mail/popaccount/messages so that I can read the date, title of the message and to/from without having to open each one. Then I would like to upload this stored email folder to .mac for backup purposes. Can anyone help with this? Many thanks, Al

  • Use this donationware to make the conversion from emlx to mbox.

    Tick the ‘other’ box when importing as they are not Mail compatible.
