Fixing a quarter million misnested HTML tags

These things just seem to find me. This time it was a very large database dump for a media site, plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs.

Here’s the pattern I came up with to fix it:


<(([^ >]+)(?:[^>]*))>(.*)<(([^ >]+)(?:[^>]*))>(.*)</\2>(.*)</\5>

Replace with:
or, depending on your regex engine, your replace string might look like this:

That handles all of the following cases:

<b><a href="#" target="_new">link</b>text</a>
<a href="#"><h2>text</a></h2>
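As a rough sketch of how the substitution might be run in Python: the pattern is the one from the post, but the replacement string below is an assumption (it keeps both opening tags and all text in place and simply swaps the two closing tags), and `fix_misnested` is a made-up helper name.

```python
import re

# Pattern from the post, unchanged.
# Group 1/2: first tag (with attributes / name only)
# Group 4/5: second tag (with attributes / name only)
# Groups 3, 6, 7: the text between the tags
MISNESTED = re.compile(
    r'<(([^ >]+)(?:[^>]*))>(.*)<(([^ >]+)(?:[^>]*))>(.*)</\2>(.*)</\5>'
)

# Assumed replacement (not from the post): swap the two closing tags
# so the tag that was opened last is closed first.
REPLACEMENT = r'<\1>\3<\4>\6</\5>\7</\2>'


def fix_misnested(html):
    """Return html with misnested tag pairs re-closed in the right order."""
    return MISNESTED.sub(REPLACEMENT, html)
```

Because `.` does not match newlines by default, this works one line at a time. The greedy `(.*)` groups can span several tags on a long line, so it is worth checking the output on a sample before running it over a whole dump.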

Running the final substitution was ridiculously fast. Regular expressions are magic.

How to spell Hanukkah 2009

Here are the 21 spellings in order of usage this year: Hanukkah, Chanukah, Hannukah, Hanukah, Chanukkah, Hanuka, Chanuka, Channukah, Hanukka, Chanukka, Hannuka, Hannukkah, Channuka, Channukkah, Hannukka, Xanuka, Channukka, Janukah, Janukkah, Janukkah and Chanuqa.

Previous years: 2004, 2005, 2006, 2007, 2008.

Convert Git branches to remote tracking branches

Update: As of Git 1.7.0, converting existing branches to tracking branches got a whole lot easier. git push now has a -u flag which will set up tracking based on a successful push.

$ git push -u hub master
Branch master set up to track remote branch master from hub.

For reference, here’s the original post:

There are two ways to convert an existing branch to a remote tracking branch: using git config or directly editing the .git/config file.

In both of these examples, the local and remote branches are named “master”. The remote repository is “hub”.

git config commands

$ git config branch.master.remote hub
$ git config branch.master.merge refs/heads/master

editing .git/config

All the git config commands do is add the following to .git/config. Editing the file manually has the same result.

[branch "master"]
    remote = hub
    merge = refs/heads/master

What would be nice is an additional config command, branch.<name>.track, which would split a full refspec, sending the relevant parts to the remote and merge commands.
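To make the idea concrete, here is a small sketch of the split such a command would have to do. Everything in it is hypothetical: `split_tracking_ref` is an invented helper, and it assumes the "full refspec" is given as a remote-tracking ref of the form refs/remotes/<remote>/<branch>.

```python
def split_tracking_ref(local_branch, remote_ref):
    """Split 'refs/remotes/<remote>/<branch>' into branch.* config pairs.

    Hypothetical helper: returns the (key, value) pairs that the two
    git config commands above would set for local_branch.
    """
    prefix = "refs/remotes/"
    if not remote_ref.startswith(prefix):
        raise ValueError("expected a ref under refs/remotes/: %r" % remote_ref)

    # The first path component is the remote; the rest is the branch name,
    # which may itself contain slashes.
    remote, _, branch = remote_ref[len(prefix):].partition("/")
    if not remote or not branch:
        raise ValueError("expected refs/remotes/<remote>/<branch>: %r" % remote_ref)

    return [
        ("branch.%s.remote" % local_branch, remote),
        ("branch.%s.merge" % local_branch, "refs/heads/" + branch),
    ]
```

For the example in this post, `split_tracking_ref("master", "refs/remotes/hub/master")` yields the same two settings as the git config commands above.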

link: Dec 10, 2009 1:19 pm
posted in: misc.

Django via CGI on shared hosting

Django just isn’t designed to run under CGI.
It won’t run under OS/2, either.*

Well ok, but running Django under CGI is not impossible. It just kind of really sucks. But anyway, to prove it’s possible if not workable, here’s how I got it running on two standard cPanel shared hosts using plain old slow and clunky CGI.


First, install virtualenv. This makes locally managing modules fantastically easy by creating self-contained Python virtual environments. Installing couldn’t be simpler: Get the script, run the script, source your environment.

$ mkdir ~/src && cd ~/src
$ curl -LO
$ tar -xvzf tip.gz
$ python virtualenv/ --distribute ~/python_virtualenv
New python executable in /home/joe/python_virtualenv/bin/python
Installing distribute.............................................

$ source ~/python_virtualenv/bin/activate 

Now, install Django using pip, which was automatically installed by virtualenv. After sourcing the virtual environment, this works from anywhere.

$ pip install Django
Downloading/unpacking Django
  Downloading Django-1.1.1.tar.gz (5.6Mb): 5.6Mb downloaded
  Running egg_info for package Django
Installing collected packages: Django
  Running install for Django
    changing mode of build/scripts-2.4/ from 664 to 775
    changing mode of /home/joe/python_virtualenv/bin/ to 775
Successfully installed Django

If your host doesn’t block GCC, use pip to be sure your MySQL interface (MySQLdb) is up to date:

$ pip install -U MySQL-python
Successfully installed MySQL-python

Django requires MySQLdb version 1.2.1p2 or higher.

Yolk prints a nice, clean list of everything installed in your Python environment. Install and run:

$ pip install yolk
$ yolk -l

Django          - 1.1.1        - active 
MySQL-python    - 1.2.3c1      - active 
pip             - 0.6.1        - active 
setuptools      - 0.6c11       - active 
yolk            - 0.4.1        - active 

At this point, I started a new Django project, assigned a database and filled in the necessary values in the project’s settings. I put the Django project files into the virtual environment to keep everything in the same place. This might not be the best practice, but it makes sense to me.

$ cd ~/python_virtualenv/
$ startproject testproject

The sane part is finished. Now on to the kludgery.


All the CGI shim solutions I found pointed back to a script Paul Sargent uploaded to ticket 2407 back in summer of 2006. It still works: django.cgi

Three lines need editing:

Line 1: Point the CGI’s shebang to the virtualenv Python binary.


Line 95: Add the directory above the Django project directory to Python’s sys.path.


Line 97: Add the project’s settings to os.environ.

os.environ['DJANGO_SETTINGS_MODULE'] = 'testproject.settings'
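Taken together, the three edits amount to something like the following sketch. The paths are assumptions based on the example output earlier in this post (a virtualenv at /home/joe/python_virtualenv with testproject inside it); they are not the literal lines from django.cgi, and `VENV_DIR` is just a name used here.

```python
#!/home/joe/python_virtualenv/bin/python
# Line 1: the shebang points at the virtualenv's Python binary (assumed path).
import os
import sys

# Line 95: put the directory *above* the Django project on sys.path,
# so that "testproject" can be imported as a package (assumed path).
VENV_DIR = "/home/joe/python_virtualenv"
if VENV_DIR not in sys.path:
    sys.path.append(VENV_DIR)

# Line 97: tell Django which settings module to load.
os.environ["DJANGO_SETTINGS_MODULE"] = "testproject.settings"
```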


For Django to respond to URL requests, those URLs need to be fed into the django.cgi script. For testing, I routed everything from /django to the CGI script by adding the following lines to my top-level .htaccess file:

RewriteEngine on
RewriteRule ^cgi-bin/ - [L]
RewriteRule ^django/(.*)$ /cgi-bin/django.cgi/$1 [QSA,L]

The second line is only necessary when serving Django URLs from the web root; without it, the rewrites would loop.

At this point, the Django site should load from /django/… urls.

Finally, as a quick fix for admin media files, I symlinked Django’s admin media directory from my web root:

$ ln -s ~/python_virtualenv/lib/python2.4/site-packages/django/contrib/admin/media ~/www/media


I spent quite a few hours spread across a couple days researching and figuring out how to get the first install working. The second installation only took about 5 minutes from start until editing Django’s admin pages.

Running Django through CGI is possible, but it is dog slow. There appears to be some caching after the first request, but that first page load often takes an excruciatingly long time.

Further reading, possible improvements

The servers I was working with are both running the almost six-year-old Python 2.4.3. The wsgiref module was introduced with Python 2.5. My goal was to get Django running without compiling anything since some hosts deny access to GCC.
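For hosts that do have Python 2.5 or later, a much shorter shim might be possible with wsgiref. This is only a sketch I have not run on either host: `application` is a stand-in WSGI app so the example is self-contained, and `run_as_cgi` is an invented name. With Django, the app would presumably be `django.core.handlers.wsgi.WSGIHandler()` instead.

```python
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    """Stand-in WSGI app; a Django project would supply its own handler."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_as_cgi(app):
    """Serve a single CGI request: read the CGI environment from
    os.environ and write the response to stdout."""
    CGIHandler().run(app)
```

In an actual CGI script, the last line would be `run_as_cgi(application)`. It would still start a fresh Python process per request, so it would not fix the speed problem.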


These sites were helpful in figuring this out.

The two hosts I tested on were LiquidWeb and A2Hosting. Both have been excellent, dependable hosts. Neither has any Python support to speak of on their shared plans. A2 blocks access to GCC.

At some point, I need to stop writing drafts and actually publish something here.


On the night of September 10th I went for a run. Instead of my usual route, I ran downtown to Ground Zero. Amid the street closings, barricades and police, an overnight fire crew was walking slowly up Church Street with a large wreath. My eyes filled with tears and I could do nothing except keep going.

The fire station across 14th Street from our apartment, Engine 5, gathers on the sidewalk in front of the station for four moments of silence each year. I would imagine most stations do the same.

8:46am is always the hardest. That’s when everything floods back. Each of the following moments gets a little easier, but this is when the memories of images and smells and feelings are nearly overwhelming.

9:03am was the moment we knew Flight 11 was no accident, but that distinction and those 17 minutes of residual innocence have been lost to time.

At 9:59 the South Tower fell and one of the city’s mountains vanished. We knew things would never, ever be like they were.

By 10:28, many of the emotions have washed out; grief and awe give way to genuine feelings of thanks and respect.

Previous 9/11s: 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008.

Worst component ever

SunFone's hissing ACU057A-0512

Pretty much every one of these I’ve owned has failed.

Almost all of my failed power supplies were connected to one of our otherwise awesome Other World Computing Mercury Elite-AL enclosures. Generally, the power adapters last about a year, then go hissy and fail. All of them have been plugged into power conditioning UPSes.

The model number is ACU057A-0512. They’re manufactured by SunFone, which also supplies power adapters for LaCie. These fail so dependably, I’ve taken to keeping spares on hand to make sure I can keep our server’s backup drives online.

LaCie has a photo identifying their power adapters (original). I have a few very old versions of these which explicitly list SunFone as the manufacturer; amazingly, they’re still working.

If you’re lucky, the power supply will just fail and the drive will no longer mount. If you’re unlucky, the power supply will gradually fail and some data on the drive will be corrupted. Often the drives will be heard faintly clicking, and if they mount at all, they’ll report all sorts of errors. After at least 8-10 failures, I can only remember one instance where data was compromised. Thankfully that drive was part of a redundant backup strategy, so nothing was lost.

When these fail, they emit a hissing sound. Sometimes it can be heard from across a noisy room; other times I had to hold the brick up to my ear. Sounds like this:

Bob Friesenhahn’s report on MacFixit also mirrors my experience.

I have four D-series LaCie drives here. All of them have experienced power failure. In fact, in the past couple of years I have replaced six failed power supplies. The power supply model number is ACU057A-0512.

The failed supplies were all plugged into a high grade UPS and see an average temperature of 75 degrees. Average time to fail seems to be six months. No supply has lasted more than one year.

Now I purchase these supplies in bulk and keep three of them on hand at all times.

As of September 2009, it looks like OWC has finally switched to a completely different power adapter. Also, their replacement part number for the doomed SunFone adapters now shows Jentec model JTA0707-Y. OWC has been really good to me over the years, so I’m hoping this change will be the end of this story.

Update January 2011: All of the replacement Jentec power supplies have been working smoothly for over a year.
