<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe Maller &#187; Regular Expressions</title>
	<atom:link href="http://joemaller.com/tag/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://joemaller.com</link>
	<description>.com</description>
	<lastBuildDate>Tue, 15 May 2012 03:40:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Fixing a quarter million misnested HTML tags</title>
		<link>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/</link>
		<comments>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 04:01:42 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[misc.]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://joemaller.com/?p=1567</guid>
		<description><![CDATA[These things just seem to find me. This time it was a very large database dump for a media site that was plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs. Here&#8217;s the pattern I came up with to fix it: Find: &#60;(([^ &#62;]+)(?:[^&#62;]*))&#62;(.*)&#60;(([^ &#62;]+)(?:[^&#62;]*))&#62;(.*)&#60;/\2&#62;(.*)&#60;/\5&#62; Replace with: &#60;$1&#62;$3&#60;$4&#62;$6&#60;/$5&#62;$7&#60;/$2&#62; or, depending on your [...]]]></description>
			<content:encoded><![CDATA[<p>These things just seem to find me. This time it was a very large database dump for a media site that was plagued with misnested HTML tags. Seriously. Just shy of 250,000 misnested pairs.</p>
<p>Here&#8217;s the pattern I came up with to fix it:</p>
<p>Find:</p>
<pre><code>&lt;(([^ &gt;]+)(?:[^&gt;]*))&gt;(.*)&lt;(([^ &gt;]+)(?:[^&gt;]*))&gt;(.*)&lt;/\2&gt;(.*)&lt;/\5&gt;</code></pre>
<p>Replace with:<br />
<code>&lt;$1&gt;$3&lt;$4&gt;$6&lt;/$5&gt;$7&lt;/$2&gt;</code><br />
or, depending on your regex engine, your replace string might look like this:<br />
<code>&lt;\1&gt;\3&lt;\4&gt;\6&lt;/\5&gt;\7&lt;/\2&gt;</code></p>
<p>That handles all of the following cases:</p>
<pre><code>&lt;b&gt;&lt;i&gt;text&lt;/b&gt;&lt;/i&gt;
&lt;b&gt;text&lt;i&gt;text&lt;/b&gt;text&lt;/i&gt;
&lt;b&gt;&lt;a href="#" target="_new"&gt;link&lt;/b&gt;text&lt;/a&gt;
&lt;a href="#"&gt;&lt;h2&gt;text&lt;/a&gt;&lt;/h2&gt;</code></pre>
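<p>As a quick check, here is a minimal sketch that runs the same find and replace over the four cases above using Python's <code>re</code> module. Python is only one possible engine (it uses the backslash form of the replace string), and the names <code>MISNESTED</code>, <code>REPLACEMENT</code> and <code>fix_misnested</code> are made up for this illustration.</p>

```python
import re

# The Find pattern from the post, unchanged.
# Groups 1/2: first tag (full contents / name only).
# Groups 4/5: second tag (full contents / name only).
# Groups 3, 6, 7: the text between the tags.
# MISNESTED, REPLACEMENT and fix_misnested are hypothetical names.
MISNESTED = re.compile(
    r"<(([^ >]+)(?:[^>]*))>(.*)<(([^ >]+)(?:[^>]*))>(.*)</\2>(.*)</\5>"
)

# Backslash-style replace string, which is what Python's re expects.
REPLACEMENT = r"<\1>\3<\4>\6</\5>\7</\2>"


def fix_misnested(line: str) -> str:
    """Apply the post's find/replace to a single line of HTML."""
    return MISNESTED.sub(REPLACEMENT, line)


cases = [
    '<b><i>text</b></i>',
    '<b>text<i>text</b>text</i>',
    '<b><a href="#" target="_new">link</b>text</a>',
    '<a href="#"><h2>text</a></h2>',
]

for case in cases:
    # Print each misnested input next to its repaired form.
    print(case, "->", fix_misnested(case))
```

<p>In Python, <code>.</code> does not match a newline by default, so this sketch assumes each misnested pair sits on a single line and is applied one line at a time.</p>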
<p>Running the final substitution was ridiculously fast. <a href="http://xkcd.com/208/">Regular Expressions are magic</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://joemaller.com/1567/fixing-a-quarter-million-misnested-html-tags/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

