Joe Maller.com

Fixing a quarter million misnested HTML tags

These things just seem to find me, this time it was a very large database dump for a media site which was plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs.

Here’s the pattern I came up with to fix it:

Find:

<(([^ >]+)(?:[^>]*))>(.*)<(([^ >]+)(?:[^>]*))>(.*)</\2>(.*)</\5>

Replace with:
<$1>$3<$4>$6</$5>$7</$2>
or, depending on your regex engine, your replace string might look like this:
<\1>\3<\4>\6</\5>\7</\2>

That handles all of the following cases:

<b><i>text</b></i>
<b>text<i>text</b>text</i>
<b><a href="#" target="_new">link</b>text</a>
<a href="#"><h2>text</a></h2>

Running the final substitution was ridiculously fast, Regular Expressions are magic.


Tabbed clipboard to HTML Table

I was looking for a quick way to get a structured table from some data I had in Numbers. Unfortunately Numbers isn’t scriptable and doesn’t seem to offer plain HTML export. After a little poking around, I just ended up writing a script to do what I wanted.

This little AppleScript will convert anything text in the clipboard into a simple, unstyled HTML table. View the script in Script Editor

Just save it into your Scripts folder and call it after copying some data to the clipboard. Any text on your clipboard will be converted to a basic, un-styled HTML table, ready to paste.

set oldDelims to AppleScript‘s text item delimiters

set AppleScript‘s text item delimiters to return

set TRs to every text item of (the clipboard as text)

set AppleScript‘s text item delimiters to tab

set theTable to “<table>” & return

repeat with TR in TRs

copy theTable & “<tr>” & return to theTable

repeat with TD in text items of TR

copy theTable & “<td>” & TD & “</td>” & return to theTable

end repeat

copy theTable & “</tr>” & return to theTable

end repeat

copy theTable & “</table>” to theTable

set AppleScript‘s text item delimiters to oldDelims

set the clipboard to theTable