Removing one file’s contents from another file

A friend called in the middle of a promotional email disaster. Due to a screwup by their mailing service, messages only went out to a random portion of their list. It was a disaster because all he had left to work with were two text files, the masterlist and a log of what had already been sent.

So we basically needed to delete the contents of a small file, “deleteme.txt” from the contents of larger file, “masterlist.txt”. The lines to remove were not contiguous and some lines in the smaller file might have already been removed from the larger file.

Here is the piped unix command I used to do this:

cat deleteme.txt deleteme.txt masterlist.txt | sort | uniq -u > newmasterlist.txt

The uniq command’s -u flag outputs only lines which appear once, omitting every duplicate line. I used cat to join the deleteme.txt file twice to guarantee the interim file would contain at least two copies of every line to remove. If the lines already appeared in masterlist.txt, then there would be three to remove, but forcing duplicates made sure I wouldn’t end up with the already deleted lines being added back in (an XOR).

As I wrote this out, it started to seem more and more simple, almost to the point of silly. Writing this post took far longer than fixing the files. But the solution didn’t occur to me right away and this post is now exactly what I was googling around for.

Share |
2 Comments so far
link: Jan 19, 2007 2:22 am
posted in: misc.