Joe Maller.com

Fixing a Palm duplicate disaster

I recently came across an absolute disaster of a Palm Desktop data file while helping someone set up a new iPhone. It had 13,572 contacts, mostly duplicates. Judging from the number of obvious duplicate entries, I guessed the actual number would be somewhere around 2,500 (it was).

Here is the process I used to automatically remove a lot of those duplicates and import the remainder into the Mac’s Address Book.

The first step is to get out of Palm Desktop as soon as possible. Select all contacts and export to a group VCard. This one was 3.4 MB.

Most of this will happen in Terminal, but a quick stop in BBEdit or TextWrangler will save a few steps later on. (TextMate tends to choke on big, non-UTF files.) The Palm export file is encoded in MacRoman. It’s 2008; pretty much any text that isn’t Unicode should be. I used TextWrangler to convert the encoding to UTF-8 with no BOM (byte order mark).

VCards require Windows-style CRLF line endings. While we could deal with those in sed, we might as well just switch the file to Unix-style LF endings in TextWrangler too, and put the CRLF endings back in the last step. The TextWrangler bottom bar should switch from this:

MacRoman CRLF

To this:

utf8 LF

Now comes the magic.

While this could be done as an impossible-to-read one-line sed command, it’s easier to digest and debug as separate command files.

Here are the steps:

  1. Use sed to join each individual VCard into a single line, using a token to replace line feeds, and write the result to an intermediate file.
  2. Run sort and uniq on the result to remove obvious duplicates.
  3. Replace the tokens with line breaks (CRLF, as VCards require).

Below are the two sed command files I used. I ran these individually but they could easily be piped together into a one-line command.

vcard_oneline.sed:

# define the range we'll be working with
/BEGIN:VCARD/,/END:VCARD/ {

# define the loopback
:loop

# add the next line to the pattern buffer
N

# if pattern is not found, loopback and add more lines
/\nEND:VCARD$/! b loop

# replace newlines in multi-line pattern with the token
# (the token is a literal tab, then %%%, then another literal tab)
s/\n/	%%%	/g
}

Run that like this:

sed -f vcard_oneline.sed palm_dump.vcf > vcards_oneline.txt

Then run that file through sort and uniq:

sort vcards_oneline.txt | uniq > vcards_clean.txt 

vcard_restore.sed:

# replace tokens with DOS style CRLF line endings
# (the token is a literal tab, then %%%, then another literal tab)
s/	%%%	/^M\
/g

# add the <CR> before the LF at the end of the line
s/$/^M/

Run that with something like this:

sed -f vcard_restore.sed vcards_clean.txt > vcards_clean.vcf

After that last step, you should be able to drag the vcards_clean.vcf file into Address Book to import your vcards.
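As noted above, the separate steps can be piped together. Here is a rough sketch of what that could look like, not the exact commands I ran. The function name `dedupe_vcards` is made up for illustration, the input is assumed to already be UTF-8 with LF line endings, and awk is swapped in for vcard_restore.sed so that no literal tab or carriage return has to be typed into the script.

```shell
#!/bin/sh
# Sketch only: join, dedupe and restore VCards in one pipeline.
# dedupe_vcards is a hypothetical name; $1 is an LF-terminated .vcf file.
dedupe_vcards() {
    # Build the tab-%%%-tab token without typing a literal tab.
    tab=$(printf '\t')
    token="${tab}%%%${tab}"

    # 1. Join each VCard onto a single line, replacing newlines with the token.
    sed -e '/BEGIN:VCARD/,/END:VCARD/{' \
        -e ':loop' \
        -e 'N' \
        -e '/\nEND:VCARD$/!b loop' \
        -e "s/\\n/${token}/g" \
        -e '}' "$1" |
    # 2. Sort so identical cards sit next to each other, then drop repeats.
    sort | uniq |
    # 3. Turn tokens back into line breaks and end every line with CRLF.
    awk -v tok="$token" '{ gsub(tok, "\r\n"); printf "%s\r\n", $0 }'
}
```

Something like `dedupe_vcards palm_dump.vcf > vcards_clean.vcf` would then stand in for the three separate commands. Keeping the intermediate files, as I did, makes it easier to see what each step changed.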

Suggestions for improvement are always welcomed.

Notes:

In VIM, type the tab character as control-v-i (hold control while pressing v, then i). Type the carriage return (shown as ^M in the sed file) with control-v-enter.

iconv could be used to convert from MacRoman to UTF-8. TextWrangler just seemed easier at the time.
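A minimal sketch of that iconv route, which I did not use at the time. The function name `macroman_to_utf8` is hypothetical, and `MACINTOSH` is assumed to be the name your iconv build uses for MacRoman (check `iconv -l` if it complains). The `tr` step also strips the carriage returns, which covers the CRLF to LF switch in the same pass.

```shell
#!/bin/sh
# Sketch only: convert a MacRoman, CRLF file to UTF-8 with LF line endings.
# macroman_to_utf8 is a hypothetical name; $1 is the Palm export file.
macroman_to_utf8() {
    # MACINTOSH is the usual iconv name for the MacRoman encoding.
    # tr -d '\r' removes every carriage return, leaving bare LF endings.
    iconv -f MACINTOSH -t UTF-8 "$1" | tr -d '\r'
}
```

Usage would be along the lines of `macroman_to_utf8 palm_export.vcf > palm_dump.vcf`, where palm_export.vcf is a placeholder for whatever you named the Palm export.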

Palm Desktop appears to dump group VCards in input order, so duplicate entries were not grouped together. Running the output through sort visually reveals a ton of duplicates and makes it possible to use uniq to remove consecutive duplicates.
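To get a feel for how bad the duplication is before deleting anything, the one-line file can be counted instead of cleaned. This is just a sketch; `list_duplicate_cards` is a made-up name and it assumes the one-card-per-line file produced by the first sed step.

```shell
#!/bin/sh
# Sketch only: show cards that occur more than once, most frequent first.
# list_duplicate_cards is a hypothetical name; $1 is the one-line-per-card file.
list_duplicate_cards() {
    # uniq -c prefixes each distinct line with its count,
    # sort -rn puts the largest counts first,
    # and awk keeps only lines whose count is greater than 1.
    sort "$1" | uniq -c | sort -rn | awk '$1 > 1'
}
```

Running `list_duplicate_cards vcards_oneline.txt | head` would show the worst offenders.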

I had to quit and re-open Address Book once or twice before it would import the files.


Tabbed clipboard to HTML Table

I was looking for a quick way to get a structured table from some data I had in Numbers. Unfortunately Numbers isn’t scriptable and doesn’t seem to offer plain HTML export. After a little poking around, I just ended up writing a script to do what I wanted.

This little AppleScript will convert any text on the clipboard into a simple, unstyled HTML table, ready to paste. View the script in Script Editor.

Just save it into your Scripts folder and call it after copying some data to the clipboard.

set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set TRs to every text item of (the clipboard as text)
set AppleScript's text item delimiters to tab
set theTable to "<table>" & return
repeat with TR in TRs
    copy theTable & "<tr>" & return to theTable
    repeat with TD in text items of TR
        copy theTable & "<td>" & TD & "</td>" & return to theTable
    end repeat
    copy theTable & "</tr>" & return to theTable
end repeat
copy theTable & "</table>" to theTable
set AppleScript's text item delimiters to oldDelims
set the clipboard to theTable
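For comparison, here is roughly the same transformation sketched as a shell function using awk. It is not part of the AppleScript and the name `tsv_to_html_table` is made up. It assumes rows are separated by line feeds and cells by single tabs, and like the script it does no HTML escaping.

```shell
#!/bin/sh
# Sketch only: read tab-separated text on stdin, write an unstyled HTML table.
# tsv_to_html_table is a hypothetical name.
tsv_to_html_table() {
    awk -F '\t' '
        BEGIN { print "<table>" }
        {
            # One <tr> per input line, one <td> per tab-separated field.
            print "<tr>"
            for (i = 1; i <= NF; i++) print "<td>" $i "</td>"
            print "</tr>"
        }
        END { print "</table>" }
    '
}
```

On a Mac this could be wired to the clipboard with `pbpaste | tsv_to_html_table | pbcopy`. If the copied text uses bare carriage returns between rows, putting `tr '\r' '\n'` ahead of the function should handle it.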


Arcade Ambiance

This is what a good portion of my childhood sounded like, especially 1983. Andy Hofle has faithfully recreated the ambient soundtrack of coin-op arcades.


Microsoft and Yahoo: LAMP, meet WAMP

After the obvious desire to take over Yahoo’s unmatched traffic, the thing that most struck me about Microsoft’s proposed Yahoo! acquisition was the question of what they’d do with Yahoo’s extensive foundation of Open Source Software.

Historically, Microsoft has had a deep institutional phobia about OSS. But Yahoo! uses PHP extensively, and Rasmus Lerdorf, the creator of PHP, has worked at Yahoo! since 2002.

This seems to make no sense. Unless the OSS and PHP backend is something Microsoft wants.

On January 31st, Mary Jo Foley published notes from an interview with Sam Ramji, Microsoft’s Director of Platform Technology Strategy. Foley rightly highlighted this quote from Ramji:

“Our focus is getting OSS on top of Windows, and I’m focused on (providing) interoperability between the LAMP (Linux, Apache, MySQL, PHP) and Windows stacks.”

She also posted this PowerPoint slide:

[Slide: LAMP, meet WAMP]

Boom, as they say. Microsoft wants to legitimize Windows as the foundation for a parallel WAMP stack. What better way to prove the viability of WAMP than running the biggest PHP web site in the world?

Microsoft may have finally realized that Open Source can be seen not only as a competitor, but also as free labor. Google and Apple, along with Yahoo!, realized that a long time ago. Why try to compete directly with Open Source when you can subvert it by becoming the dominant platform that software runs on? Instant credibility, and instant influence.

So far we’ve only seen the first chapter of this story, or perhaps the first act of this tragedy. The next phase looks like it may turn out to be Microsoft, Google and others fighting over Yahoo’s unfortunate carcass and tearing it to shreds.

Steve Ballmer has giant brass balls and Microsoft most likely anticipated that Google would do something to interfere with the acquisition. Microsoft is on the move. This should be an interesting week.

Disclosure: I’m currently holding Yahoo! stock and have previously owned stock in Microsoft.


