Deleting Unused mbox files

Or, How I reclaimed 1.25 gigabytes of my hard drive.

When 10.4 imported mail from the old 10.3 mbox files, it broke each message into an individual file so Spotlight could index them. The old mbox files, rightly, were left on the drive. For most people this wouldn’t take up a noticeable amount of space, but those of us with a ton of mail saw a significant hit to our disk space.

The following commands list the unused mbox files. A small change, described below, then removes them from the drive, recovering a potentially large amount of disk space:

    cd ~/Library/Mail
    find . -name "mbox" -ls

Make sure the only things listed are mbox files in your mail directory (they should be). To delete all those files, change the last “-ls” of the above command to “-delete”. (I didn’t include the full command on purpose since it deletes files and I wanted to strongly encourage everyone doing this to check the file list before deleting.) Just to be doubly safe, back up before doing this.
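
Before switching “-ls” to “-delete”, it can be reassuring to know how much space is actually at stake. The sketch below is one way to total it up. The name “mbox_total_bytes” is made up for illustration, and the added “-type f” test (so only regular files are counted) is an assumption that is not part of the command above.

```shell
# Hypothetical helper: add up the bytes in every regular file named "mbox".
# The directory argument defaults to ~/Library/Mail.
mbox_total_bytes() {
    dir="${1:-$HOME/Library/Mail}"
    # Stream every matching file through wc to get one combined byte count.
    find "$dir" -type f -name "mbox" -exec cat {} \; | wc -c | tr -d ' '
}

# Example: mbox_total_bytes ~/Library/Mail
```

This reads every file end to end, so it may take a while on a few gigabytes of mail. It only reads and never deletes anything.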

Total size of my mail folder went from 3.07 GB (3,206,511,328 bytes) to 1.84 GB (1,884,864,581). A savings of almost 1.25 GB. At $229.00 for a 93.2 GB formatted notebook drive, that’s an actual cost savings of $3.02.

Note there was/is a bug with Mail importing under 10.4 where very large mbox files don’t read correctly. Make sure all your messages really did import correctly before deleting your mbox files.
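
One rough way to check the import is to compare, mailbox by mailbox, how many messages the old mbox file holds against how many .emlx files sit in the Messages folder next to it. The sketch below assumes each message in an mbox starts with a line beginning “From ” and that the Messages folder is a sibling of the mbox file. The name “compare_mbox_counts” is hypothetical.

```shell
# Hypothetical helper: for each old mbox file, print
#   <"From " lines in the mbox> TAB <.emlx files in Messages> TAB <mailbox folder>
compare_mbox_counts() {
    dir="${1:-$HOME/Library/Mail}"
    find "$dir" -type f -name "mbox" | while IFS= read -r mbox; do
        box=$(dirname "$mbox")
        # grep -c exits non-zero when the count is 0, hence the "|| true".
        in_mbox=$(grep -c '^From ' "$mbox" || true)
        in_msgs=$(find "$box/Messages" -name '*.emlx' 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\t%s\n' "$in_mbox" "$in_msgs" "$box"
    done
}

# Example: compare_mbox_counts ~/Library/Mail
```

Treat a mismatch as a reason to look closer rather than as proof of a failed import, since mail received after the upgrade only exists as .emlx files.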

Mbox files and Mail in 10.4

One of the big under-the-hood changes to Mail in 10.4 is that messages are no longer stored in mbox files. This allows Spotlight to index individual messages without having to first parse out the contents of the entire mailbox. Despite being unused, the old mbox files are often still on the drive, which means that most everyone’s mail is now taking up almost twice as much space as it did with 10.3 (my mail folder went from 1.4 to 2.8 gigs). If installing Tiger devoured a lot of hard drive space, that might account for a significant portion of where it went.

After an Archive & Install upgrade, my ~/Library/Mail directory still has folders labeled *.mbox, but each of those folders now contains a “Messages” directory which holds thousands of numbered *.emlx files. Those mostly appear to be plain text files, each containing one message. There is a small glob of XML plist data attached to the end of each file, as well as an integer at the top of the file. That integer is the message’s character/byte count from the end of the integer to the beginning of the XML data.
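
Going by that layout, pulling the bare message out of one of these files could look something like the sketch below. It assumes the integer sits alone on the first line and that the count starts right after that line’s newline. The function name “emlx_message” is invented for the example.

```shell
# Hypothetical helper: print only the message portion of one per-message file.
emlx_message() {
    # First line: the byte count of the message that follows.
    count=$(head -n 1 "$1" | tr -cd '0-9')
    # Skip the first line, then keep exactly $count bytes.
    # Whatever comes after that (the XML plist) is dropped.
    tail -n +2 "$1" | head -c "$count"
}

# Example: emlx_message 1234.emlx | less   (1234.emlx is a placeholder file name)
```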

In theory, a fairly simple shell script could glom everything together into a standard mbox. I’m not sure how processor intensive that would be, but the steps to reassemble the data would be trivial. At the very least, Apple’s decision to move away from the mbox format can be easily reversed with no data loss.
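
A minimal sketch of what such a script might look like follows. It is untested against real Mail data and rests on several assumptions: the byte count is alone on the first line of each file, a placeholder “From ” separator line is acceptable (a real mbox separator carries the sender and delivery date), and body lines starting with “From ” get a “>” in front, which is one common mbox quoting convention. The name “emlx_to_mbox” is hypothetical.

```shell
# Hypothetical helper: write every .emlx file in one Messages folder
# to standard output as a single mbox-style stream.
emlx_to_mbox() {
    for f in "$1"/*.emlx; do
        [ -f "$f" ] || continue
        count=$(head -n 1 "$f" | tr -cd '0-9')
        # Placeholder separator line; not the real sender or date.
        printf 'From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n'
        # Keep only the message bytes, quoting lines that start with "From ".
        tail -n +2 "$f" | head -c "$count" | sed 's/^From />From /'
        # Blank line between messages.
        printf '\n'
    done
}

# Example: emlx_to_mbox "$HOME/Library/Mail/<account>/<box>/Messages" > rebuilt.mbox
```

Files are visited in the shell’s glob order, which is alphabetical rather than numeric, so the original message order is not necessarily preserved.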

Not much has been written about this, but I found this Mac OS X Hints mbox thread which confirms what I’m seeing:

“I used to be able to use mutt or pine to view the mbox mailboxes in ~/Library/Mail/<account>/<box>/mbox . In 10.4 these are still present, but appear not to be updated any more. The up to date emails are in ~/Library/Mail/<account>/<box>/Messages/*.emlx which I believe is required for spotlight to be able to index messages – it only indexes file-based entities, not subportions of files.”

Because Carbon Copy Cloner doesn’t work with 10.4 yet, I can’t comfortably back up my drive and experiment with deleting the old mboxes. It seems like it should be safe to remove all mbox files and associated files, since nothing outside the Messages directories has been modified since I upgraded to 10.4. If anyone has more information, please leave a comment.

(While reading a little background on the mbox format, I found the original RFC for email as a text file. The W3C also has an HTML version of RFC 822, partially converted by (Sir) Tim Berners-Lee. It’s fun to encounter raw history like that.)

Update: I posted a simple command to delete unused mbox files.