<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe Maller &#187; Regular Expressions</title>
	<atom:link href="http://joemaller.com/tag/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://joemaller.com</link>
	<description>.com</description>
	<lastBuildDate>Fri, 27 Jan 2012 06:04:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Fixing a quarter million misnested HTML tags</title>
		<link>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/</link>
		<comments>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 04:01:42 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[misc.]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://joemaller.com/?p=1567</guid>
		<description><![CDATA[These things just seem to find me. This time it was a very large database dump for a media site that was plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs. Here&#8217;s the pattern I came up with to fix it: Find: &#60;(([^ &#62;]+)(?:[^&#62;]*))&#62;(.*)&#60;(([^ &#62;]+)(?:[^&#62;]*))&#62;(.*)&#60;/\2&#62;(.*)&#60;/\5&#62; Replace with: &#60;$1&#62;$3&#60;$4&#62;$6&#60;/$5&#62;$7&#60;/$2&#62; or, depending on your [...]]]></description>
			<content:encoded><![CDATA[<p>These things just seem to find me. This time it was a very large database dump for a media site that was plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs.</p>
<p>Here&#8217;s the pattern I came up with to fix it:</p>
<p>Find:</p>
<pre><code>&lt;(([^ &gt;]+)(?:[^&gt;]*))&gt;(.*)&lt;(([^ &gt;]+)(?:[^&gt;]*))&gt;(.*)&lt;/\2&gt;(.*)&lt;/\5&gt;</code></pre>
<p>Replace with:<br />
<code>&lt;$1&gt;$3&lt;$4&gt;$6&lt;/$5&gt;$7&lt;/$2&gt;</code><br />
or, depending on your regex engine, your replace string might look like this:<br />
<code>&lt;\1&gt;\3&lt;\4&gt;\6&lt;/\5&gt;\7&lt;/\2&gt;</code></p>
<p>That handles all of the following cases:</p>
<pre><code>&lt;b&gt;&lt;i&gt;text&lt;/b&gt;&lt;/i&gt;
&lt;b&gt;text&lt;i&gt;text&lt;/b&gt;text&lt;/i&gt;
&lt;b&gt;&lt;a href="#" target="_new"&gt;link&lt;/b&gt;text&lt;/a&gt;
&lt;a href="#"&gt;&lt;h2&gt;text&lt;/a&gt;&lt;/h2&gt;</code></pre>
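<p>The pattern can also be tried outside a text editor. The following is a minimal sketch, assuming Python's <code>re</code> module and the backslash-style replace string. The names <code>MISNESTED</code>, <code>REPLACEMENT</code> and <code>fix_misnested</code> are hypothetical, chosen for this illustration only, and are not part of the original fix.</p>

```python
import re

# The "Find" pattern from the post, unchanged.
# Groups: 1 = first tag with attributes, 2 = first tag name,
#         4 = second tag with attributes, 5 = second tag name,
#         3, 6, 7 = the text runs between the tags.
MISNESTED = re.compile(
    r"<(([^ >]+)(?:[^>]*))>(.*)<(([^ >]+)(?:[^>]*))>(.*)</\2>(.*)</\5>"
)

# The backslash-style "Replace with" string from the post: it swaps the
# two closing tags so the inner tag closes before the outer one.
REPLACEMENT = r"<\1>\3<\4>\6</\5>\7</\2>"


def fix_misnested(line: str) -> str:
    """Return the line with a misnested tag pair re-closed in nested order."""
    return MISNESTED.sub(REPLACEMENT, line)


if __name__ == "__main__":
    samples = (
        '<b><i>text</b></i>',
        '<b>text<i>text</b>text</i>',
        '<b><a href="#" target="_new">link</b>text</a>',
        '<a href="#"><h2>text</a></h2>',
        '<b><i>text</i></b>',  # already nested correctly
    )
    for sample in samples:
        print(sample, "->", fix_misnested(sample))
```

<p>Because <code>.</code> does not match newlines by default in Python, this sketch works on one line at a time. The last sample in the loop is already nested correctly and comes back unchanged.</p>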
<p>Running the final substitution was ridiculously fast. <a href="http://xkcd.com/208/">Regular expressions are magic</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

