<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe Maller &#187; MySQL</title>
	<atom:link href="http://joemaller.com/tag/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://joemaller.com</link>
	<description>.com</description>
	<lastBuildDate>Fri, 27 Jan 2012 06:04:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Fixing mixed-encoding MySQL dumpfiles with WordPress</title>
		<link>http://joemaller.com/1328/fixing-mixed-encoding-mysql-dumpfiles-with-wordpress/</link>
		<comments>http://joemaller.com/1328/fixing-mixed-encoding-mysql-dumpfiles-with-wordpress/#comments</comments>
		<pubDate>Tue, 26 May 2009 13:38:26 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[misc.]]></category>
		<category><![CDATA[latin1]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf8]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://joemaller.com/?p=1328</guid>
		<description><![CDATA[Early versions of WordPress didn&#8217;t specify database encoding. Databases created with those earlier versions usually defaulted to Latin1 (ISO-8859-1) character encoding. Problem was, WordPress around version 2.2 started setting new databases to use UTF8 encoding. This is a good thing, except existing databases weren&#8217;t migrated. Unfortunately, WordPress from that point forward assumed all databases were [...]]]></description>
			<content:encoded><![CDATA[<p>Early versions of WordPress didn&#8217;t specify database encoding. Databases created with those earlier versions usually defaulted to Latin1 (ISO-8859-1) character encoding. Problem was, WordPress around version 2.2 started setting new databases to use UTF8 encoding. This is a good thing, except existing databases weren&#8217;t migrated. Unfortunately, WordPress from that point forward assumed all databases were UTF8 and inserted UTF8 data into Latin1 tables. </p>
<p>None of this is likely to surface until you try to export and restore a database. That is exactly the danger: because the encoding gets garbled somewhere in the export/import loop, a lot of WordPress sites cannot be backed up properly. There are no errors, no warnings, just sites littered with wrongly encoded characters (<a href='http://en.wikipedia.org/wiki/Mojibake'>Mojibake</a>) after restoring or moving to a new server. This also means that any existing database backups are probably useless. </p>
<p>None of the solutions I found worked for me. Arriving at a functional solution took forever. Troubleshooting multi-stage character encoding problems is a thankless, maddening task.</p>
<h3>Dumping the database and moving to UTF-8</h3>
<p>Dump the current database:</p>
<pre><code>mysqldump --opt --default-character-set=latin1 --skip-extended-insert myDB -r myDB-latin1.sql</code></pre>
<ul>
<li><code>-r</code> tells mysqldump to write directly to the output file. I&#8217;ve read that using Unix redirection carets could sometimes result in encoding corruption. Native output supposedly gets around that issue, although the <a href="http://bugs.mysql.com/bug.php?id=28969">notes on this MySQL bug</a> say otherwise.</li>
<li><code>--skip-extended-insert</code> puts each row of data on its own line. This makes it easier to diff the resulting files or open them in a text editor like TextWrangler without exceeding horizontal character limits.</li>
<li><code>--default-character-set=latin1</code> matches the character set the tables claim to use, so mysqldump believes the contents are already Latin1 and does not convert them. Since WordPress was already stuffing UTF-8 data into Latin1 tables, we need those bytes dumped exactly as stored.
</li>
</ul>
<p>Carefully review the dumpfile for encoding errors. I&#8217;m sick thinking about how many of my early attempts might have worked, except the initial file was corrupt.</p>
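<p>A minimal sketch of one way to start that review, assuming a grep that supports <code>-a</code> (GNU and BSD grep do). The helper name <code>non_ascii_lines</code> and the sample file are made up for illustration; the idea is only to list the lines that contain bytes outside plain ASCII, since those are the lines where Latin1 and UTF-8 can disagree.</p>

```shell
# Hypothetical helper: print, with line numbers, every line of a file that
# contains a byte outside printable ASCII and whitespace.
# LC_ALL=C makes grep compare raw bytes; -a stops it from treating the file as binary.
non_ascii_lines() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Throwaway sample: line 2 holds a UTF-8 e-acute (two bytes),
# line 3 holds a Latin1 e-acute (one byte).
sample=$(mktemp)
printf 'plain ascii\ncaf\303\251\ncaf\351\n' > "$sample"

# Show only the line numbers that deserve a closer look.
non_ascii_lines "$sample" | cut -d: -f1
```

<p>On a real dump, something like <code>non_ascii_lines myDB-latin1.sql | less</code> narrows the review to the rows that matter.</p>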
<h3>No really, you&#8217;re UTF-8</h3>
<p>The dumpfile will have no encoding information, so I used <a href="http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html">iconv</a> to convert it to UTF-8. Note that there may be a few characters which cannot be translated and will throw errors. Save yourself some grief and find an iconv binary which offers the -c flag to ignore those errors:</p>
<pre><code>-c    When this option is given, characters that cannot  be  converted are  silently  discarded, instead of leading to a conversion error.</code></pre>
<p>Most of the webservers I checked had the same 8-year-old version of iconv, which doesn&#8217;t have the <code>-c</code> flag, so I scp&#8217;d the file to my local machine. Mac OS X has a recent enough version of iconv to use for the conversion. </p>
<pre><code>iconv -f UTF-8 -t UTF-8 -c myDB-latin1.sql &gt; myDB-utf8.sql</code></pre>
<p>It&#8217;s worth trying a conversion without the -c flag, to see if it will work. If it doesn&#8217;t, the -c flag will drop the problem characters. I didn&#8217;t find an acceptable automated workaround for this so I just diffed the files and hand-inserted the missing characters. I only had four to replace and none of them were textual.</p>
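<p>The try-strict-first idea can be sketched as a small wrapper. This assumes an iconv that accepts <code>-c</code>; the function name <code>clean_utf8</code> and the sample file are hypothetical, and the message text is only there to remind you to diff afterwards.</p>

```shell
# Hypothetical helper: run a strict pass first and fall back to -c only if it fails.
clean_utf8() {
    src=$1
    dst=$2
    if iconv -f UTF-8 -t UTF-8 "$src" > "$dst" 2>/dev/null; then
        echo "strict pass succeeded, nothing dropped"
    else
        # Some iconv builds exit non-zero after dropping bytes, so the status is ignored.
        iconv -f UTF-8 -t UTF-8 -c "$src" > "$dst" 2>/dev/null || true
        echo "strict pass failed, reran with -c: diff the two files"
    fi
}

# Throwaway sample with one stray Latin1 byte (octal 351) next to valid UTF-8.
work=$(mktemp -d)
printf 'caf\303\251 ok\ncaf\351 ok\n' > "$work/sample.sql"
clean_utf8 "$work/sample.sql" "$work/sample-utf8.sql"
```

<p>Against the real files this would be <code>clean_utf8 myDB-latin1.sql myDB-utf8.sql</code>. Whatever the fallback dropped still has to be found with diff and restored by hand.</p>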
<p>After many failures and frustrations, I found myself checking file differences all the time. While seeing them is easy in TextWrangler, I checked plenty on the server too:</p>
<pre><code> diff myDB-latin1.sql myDB-utf8.sql</code></pre>
<p>A few &#xFFFD; characters slipped through here, though these might have been already converted errors from previous database migrations that were never noticed. I used TextWrangler to replace them with a small comment token <code>&lt;!-- ERROR --&gt;</code> which I will find and replace in context later on. I didn&#8217;t have any luck trying to make that replacement with sed. </p>
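<p>For anyone who wants to retry the replacement in sed, here is a sketch that may behave better than a plain call, under the assumption that the trouble is locale handling. It builds the three UTF-8 bytes of U+FFFD with <code>printf</code> and forces sed to match raw bytes. The helper name <code>mark_fffd</code> is made up, and I have not verified this against every sed build.</p>

```shell
# U+FFFD is the three bytes EF BF BD in UTF-8; printf builds them from octal escapes.
fffd=$(printf '\357\277\275')

# Hypothetical helper: swap each replacement character for the comment token.
# LC_ALL=C makes sed match raw bytes regardless of the shell locale.
mark_fffd() {
    LC_ALL=C sed "s/$fffd/"'<!-- ERROR -->/g' "$1"
}

# Throwaway sample containing a single replacement character.
sample=$(mktemp)
printf 'before \357\277\275 after\n' > "$sample"
mark_fffd "$sample"
```

<p>The token is kept in single quotes so an interactive bash session does not try history expansion on the exclamation mark.</p>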
<h3>Fixing the dumpfile </h3>
<p>Before running a global replace on all your data, grep for &#8216;latin1&#8217; first, to be sure the string doesn&#8217;t appear anywhere in your dump file other than structural commands. This is an example of a safe dataset: </p>
<pre><code><strong>$</strong> grep latin1 dumpfile
/*!40101 SET NAMES latin1 */;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;</code></pre>
<p>If your data has a &#8216;latin1&#8217; somewhere in it, either edit the dumpfile by hand or <a href="http://www.khelll.com/blog/mysql/changing-database-encoding-from-latin-to-utf8/">read this</a> and dump your schema separately from your data. My data was clean so I just used sed to replace every latin1 with utf8:</p>
<pre><code>sed -e's/latin1/utf8/g' myDB-utf8.sql &gt; myDB-utf8-fixed.sql</code></pre>
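<p>If the data is not clean, a narrower replace is one option. The sketch below only touches the two kinds of structural lines shown in the grep output above and leaves row data alone. The function name <code>fix_charset_lines</code> is hypothetical, and it assumes the dump was made with <code>--skip-extended-insert</code> so that table definitions close with a line starting with <code>) </code>.</p>

```shell
# Hypothetical helper: rewrite latin1 only on structural lines.
#   1. the versioned "SET NAMES latin1" comment
#   2. table-closing lines such as ") ENGINE=MyISAM DEFAULT CHARSET=latin1;"
fix_charset_lines() {
    sed -e '/^\/\*![0-9]* SET NAMES latin1 \*\/;$/s/latin1/utf8/' \
        -e '/^) .*DEFAULT CHARSET=latin1/s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/' \
        "$1"
}

# Throwaway sample: two structural lines plus one row that mentions latin1 in its text.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/*!40101 SET NAMES latin1 */;
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `wp_posts` VALUES (1,'notes on latin1 encoding');
EOF

fix_charset_lines "$sample"
```

<p>Column-level <code>CHARACTER SET latin1</code> definitions, if a dump has any, are not covered by this sketch and would need their own rule.</p>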
<h3>Prepping MySQL</h3>
<p>There are several places where MySQL might re-interpret text encoding, and all of them need to be dealt with. </p>
<p>The most important step is to <strong>create a completely new database</strong> for your cleaned data. Despite all the following settings, older databases may hang onto character encoding settings and cause problems in the future. Odds are if you&#8217;re dealing with this problem, your database was created prior to MySQL 4.1 adding Unicode support. </p>
<p>The database may need to be configured to use the correct character set and table collation methods.<br />
Database settings don&#8217;t propagate to existing tables, but that won&#8217;t be an issue since we&#8217;re using a newly created database.</p>
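<p>Creating the new database with its character set declared up front might look like the sketch below. The database name <code>myDB_utf8</code> and the file location are assumptions for illustration, and the collation simply mirrors the <code>utf8_unicode_ci</code> used elsewhere in this post. The script only writes the statement to a file so it can be read over first; the line that would actually run it is left commented out.</p>

```shell
# Assumed names, not from the original setup: pick your own.
new_db=myDB_utf8
sql_file="${TMPDIR:-/tmp}/create-$new_db.sql"

# Write the statement to a file so it can be reviewed before it touches the server.
cat > "$sql_file" <<SQL
CREATE DATABASE \`$new_db\` CHARACTER SET utf8 COLLATE utf8_unicode_ci;
SQL

cat "$sql_file"

# Once it looks right (this needs an account allowed to create databases):
# mysql --default-character-set=utf8 < "$sql_file"
```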
<p>The client and database encoding settings can be checked in phpMyAdmin or by calling &#8216;status&#8217; from the MySQL command line. The relevant lines are:</p>
<pre><code><strong>$</strong> mysql myDB -e'status'
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	latin1
Conn.  characterset:	latin1</code></pre>
<p>Invoking the MySQL command line client with a specified character set yields this:</p>
<pre><code><strong>$</strong> mysql myDB -e'status' --default-character-set=utf8
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8
Conn.  characterset:	utf8</code></pre>
<p>Change the database character set and collation settings with these commands:</p>
<pre><code>ALTER DATABASE myDB CHARACTER SET utf8;
ALTER DATABASE myDB COLLATE utf8_unicode_ci;</code></pre>
<p>Now MySQL status should show this:</p>
<pre><code><strong>$ </strong>mysql myDB -e'status' --default-character-set=utf8
Server characterset:	latin1
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8</code></pre>
<p>Unless you run the server, there&#8217;s likely nothing you can do about the server&#8217;s character set.</p>
<h3>Updating WordPress</h3>
<p>If you&#8217;re upgrading a WordPress installation that&#8217;s been around a while, be sure to update your wp-config.php file from <a href="http://svn.automattic.com/wordpress/tags/2.7.1/wp-config-sample.php" title="">the current config-sample</a>. The most important two settings in there are these:</p>
<pre><code>/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');

/** The Database Collate type. Don't change this if in doubt. */
define('DB_COLLATE', '');</code></pre>
<h3>Test and go</h3>
<p>Besides local testing I also checked the dumpfile on a second database on the live server. If everything worked correctly, you should be able to roundtrip the data through MySQL and produce identical dumpfiles. </p>
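<p>A rough sketch of that roundtrip check: compare the file you loaded with a fresh dump of the new database, skipping the <code>--</code> comment lines, since mysqldump writes the host, database name and (in newer versions) the dump date there. The function name <code>dumps_match</code> and the sample files are hypothetical.</p>

```shell
# Hypothetical helper: succeed when two dumpfiles are identical apart from
# SQL comment lines (those starting with "--").
dumps_match() {
    left=$(mktemp)
    right=$(mktemp)
    grep -a -v '^--' "$1" > "$left"
    grep -a -v '^--' "$2" > "$right"
    if cmp -s "$left" "$right"; then rc=0; else rc=1; fi
    rm -f "$left" "$right"
    return $rc
}

# Throwaway samples that differ only in their comment lines.
work=$(mktemp -d)
printf '%s\n' '-- Host: localhost    Database: myDB' 'INSERT INTO t VALUES (1);' > "$work/before.sql"
printf '%s\n' '-- Host: localhost    Database: other' 'INSERT INTO t VALUES (1);' > "$work/after.sql"

if dumps_match "$work/before.sql" "$work/after.sql"; then
    echo "roundtrip looks clean"
else
    echo "dumps differ: run diff on the two files"
fi
```

<p>Both dumps need the same mysqldump flags, or the structural lines will differ for reasons that have nothing to do with encoding.</p>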
<p>Remember to specify the default-character-set when you finally load the dumpfile back into the database:</p>
<pre><code>mysql --default-character-set=utf8 DB &lt; </code></pre>
<p>After this ordeal I doubt I&#8217;ll ever invoke a MySQL command without explicitly setting the default character set again, but just in case, I&#8217;ve added this ~/.my.cnf file on all the systems I work with:</p>
<pre><code>[client]
default-character-set=utf8</code></pre>
<p>Double-check that&#8217;s working by calling <code>mysql --print-defaults</code> and <code>mysqldump --print-defaults</code> to make sure the flags transferred. </p>
<p>This process was tested with the following MySQL distributions:</p>
<ul>
<li>mysql  Ver 14.7 Distrib 4.1.11, for pc-linux-gnu (i686)</li>
<li>mysql  Ver 14.14 Distrib 5.1.34, for apple-darwin9.5.0 (i386) using readline 5.1</li>
<li>mysql  Ver 14.12 Distrib 5.0.77, for unknown-linux-gnu (x86_64) using readline 5.1</li>
</ul>
<p>Note: If you will be going between different MySQL server versions, you may need to use the <code><a href="http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html#option_mysqldump_compatible">--compatible</a></code> flag with an appropriate value. In my case, this site&#8217;s production server (not under my control) is running 4.1.11 and my dev machine is running 5.1.34.</p>
<h3>Other people who&#8217;ve dealt with this too</h3>
<ul>
<li><a href='http://hexmen.com/blog/2008/07/mysql-latin1-utf8-wordpress-upgrade/'> MySQL latin1 → utf8  (WordPress upgrade)</a> &#8212;  Ash Searle</li>
<li><a href="http://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion">Fixing a MySQL Character Encoding Mismatch</a> &#8212; Alex King</li>
<li><a href='http://www.khelll.com/blog/mysql/changing-database-encoding-from-latin-to-utf8/'>   Changing database encoding from latin1 to UTF8</a> &#8212; Khaled alHabache</li>
<li><a href='http://www.orthogonalthought.com/blog/index.php/2007/05/mysql-database-migration-and-special-characters/'>Mysql database migration and special characters</a> &#8212; Orthogonal Thought</li>
<li><a href='http://www.mydigitallife.info/2007/06/23/how-to-convert-character-set-and-collation-of-wordpress-database/'>How to Convert Character Set and Collation of WordPress Database</a> &#8212; My Digital Life</li>
</ul>
<p>More on Unicode: <a href='http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html'>The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) &#8211; Joel on Software</a></p>
]]></content:encoded>
			<wfw:commentRss>http://joemaller.com/1328/fixing-mixed-encoding-mysql-dumpfiles-with-wordpress/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>WordPress MySQL Cheatsheets</title>
		<link>http://joemaller.com/830/wordpress-mysql-cheatsheets/</link>
		<comments>http://joemaller.com/830/wordpress-mysql-cheatsheets/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 19:54:33 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Web Development]]></category>
		<category><![CDATA[cheatsheet]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://joemaller.com/2007/10/04/wordpress-database-cheatsheets/</guid>
		<description><![CDATA[Here are two cheatsheets showing the table-structure of WordPress versions 2.3 and 2.2.2. Click the thumbnails to download the PDF. WordPress 2.3 WordPress 2.2.2 I created these to help transition some custom category queries from WordPress 2.2.2 over to the terms tables in WordPress 2.3. Table keys are in bold. The pages were generated by a [...]]]></description>
			<content:encoded><![CDATA[<p>Here are two cheatsheets showing the table-structure of WordPress versions 2.3 and 2.2.2. Click the thumbnails to download the PDF.</p>
<p style="border-right: 1px solid #eeeeee; float: left; padding-right: 0.5em; margin-right: 0.5em"><strong>WordPress 2.3<br />
</strong><a href="http://joemaller.com/wordpress/wp-content/uploads/2007/10/wordpress_23_mysql_tables.pdf" title="WordPress 2.3 MySQL Tables PDF"><img src="http://joemaller.com/wordpress/wp-content/uploads/2007/10/wordpress_23_mysql_tables.png" alt="WordPress 2.3 MySQL Tables" /></a></p>
<p><strong>WordPress 2.2.2</strong><a href="http://joemaller.com/wordpress/wp-content/uploads/2007/10/wordpress_222_mysql_tables.pdf" title="WordPress 2.2.2 MySQL Tables PDF"><br />
<img src="http://joemaller.com/wordpress/wp-content/uploads/2007/10/wordpress_222_mysql_tables.png" alt="WordPress 2.2.2 MySQL Tables" /></a></p>
<p>I created these to help transition some custom category queries from WordPress 2.2.2 over to the terms tables in WordPress 2.3. Table keys are in bold. The pages were generated by a small AppleScript Studio app I never quite cleaned up enough to release; it reads MySQL dumpfiles, then spits out nice-looking tables in <a href="http://www.omnigroup.com/applications/omnigraffle/">OmniGraffle</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://joemaller.com/830/wordpress-mysql-cheatsheets/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

