Converting latin1 to utf8

If you want to find the quick tutorial, scroll down to the bottom of the article. Here’s some background of the issue I faced.

Last night before going to bed around 10:30 pm, I made a tiny mistake that kept me up until three something in the morning. I logged into iLoveNgocLan.com’s WordPress Admin to post an article. Instead of doing what I intended to do, I changed the encoding for pages and feeds from iso-8859-1 to utf-8. I also set the template header to <meta charset="utf-8">. Vietnamese text still rendered fine until I made a new post. Every word contained question marks. When I edited an existing article, it also turned into question marks.

I freaked out a bit and went back to change back the encoding for pages and feeds. I freaked out a bit more when the option had disappeared in the Settings Reading Screen. It turned out that WordPress has removed that option in version 3.5.0. I suspect that the option was still there because it was set to iso-8859-1 instead of the default utf-8. I edited wp-config.php to change the encoding back to define('DB_CHARSET', 'latin1');. To my dismay, all the Vietnamese texts displayed black diamonds with question marks. It appeared that going back was not possible; therefore, I googled to find out how to convert latin1 to utf8, something I should have done a long time ago.

In my search, I came across Varun Shrivastava’s “How to Fix Weird Characters Seen on WordPress Blog?”, which seems to be straightforward. I follow his instructions using phpMyAdmin. I exported the existing database into UTF-8 encoding. I created a new database with UTF-8 Collation. Then I imported the data back into the new database. Unfortunately, that didn’t solve the issue. The question marks still showed up on new posts, edited articles, and new comments.

Then I followed WordPress’s documentation on “Converting Database Character Sets” using the new database I just created. I ran the following SQL command in phpMyAdmin to change the default charset of the database:

ALTER DATABASE MyDatabaseName CHARACTER SET utf8;

It worked, so I ran the following SQL command to change the default charset of individual tables:

ALTER TABLE wp_posts CHARACTER SET utf8;

Then I ran the following SQL command to run individual columns:

alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;

Unfortunately, I have to run TEXT, LONGTEXT, TINYTEXT, VARCHAR, ENUM for each individual column for each individual table. It would be a lot of work to do manually and when I got to TINYTEXT, it wiped out all the body text from the article. I gave up and went to bed around 3 something in the morning.

I woke up again around 7 am and tried to figured out what I else I could do. I opened up the SQL file I exported a few hours earlier and performed a find-and-replace to convert latin1 into utf8. 19 instances were found and replaced. I created a new database in phpMyAdmin and imported the new search-and-replaced file. To my surprise, it worked as expected. Take a look at iLoveNgocLan.com. I created new posts, edited old posts, added new comments, and the text came up normal, no more question marks and no more black diamonds.

Essentially, to convert latin1 to utf8, this is all you need to do in phpMyAdmin:

  1. Export your existing database to your local machine
  2. Find latin1 and replace with utf8 on your local machine
  3. Create a new database in phpMyAdmin with UTF-8 Collation
  4. Import the SQL file from your local machine to the new database in phpMyAdmin
  5. Run this SQL command in phpMyAdmin: ALTER DATABASE MyDatabaseName CHARACTER SET utf8;
  6. Edit your wp-config.php to point to the new database and edit define('DB_CHARSET', 'utf8');
  7. Update the meta tag in your theme’s header: <meta charset="utf-8">

That’s it. I don’t know why this method isn’t available already. I wonder if there’s any drawback of using this method, but it seems to work for me. I am not a MySQL expert. In fact, I don’t have much confidence in messing around with it. I lost several hours of sleep, but the learning experience is worth it.