Converting latin1 to utf8
If you want to find the quick tutorial, scroll down to the bottom of the article. Here’s some background of the issue I faced.
Last night before going to bed around 10:30 pm, I made a tiny mistake that kept me up until three something in the morning. I logged into iLoveNgocLan.com’s WordPress Admin to post an article. Instead of doing what I intended to do, I changed the encoding for pages and feeds from iso-8859-1
to utf-8
. I also set the template header to <meta charset="utf-8">
. Vietnamese text still rendered fine until I made a new post. Every word contained question marks. When I edited an existing article, it also turned into question marks.
I freaked out a bit and went back to change back the encoding for pages and feeds. I freaked out a bit more when the option had disappeared in the Settings Reading Screen. It turned out that WordPress has removed that option in version 3.5.0. I suspect that the option was still there because it was set to iso-8859-1
instead of the default utf-8
. I edited wp-config.php
to change the encoding back to define('DB_CHARSET', 'latin1');
. To my dismay, all the Vietnamese texts displayed black diamonds with question marks. It appeared that going back was not possible; therefore, I googled to find out how to convert latin1
to utf8
, something I should have done a long time ago.
In my search, I came across Varun Shrivastava’s “How to Fix Weird Characters Seen on WordPress Blog?”, which seems to be straightforward. I follow his instructions using phpMyAdmin. I exported the existing database into UTF-8 encoding. I created a new database with UTF-8 Collation. Then I imported the data back into the new database. Unfortunately, that didn’t solve the issue. The question marks still showed up on new posts, edited articles, and new comments.
Then I followed WordPress’s documentation on “Converting Database Character Sets” using the new database I just created. I ran the following SQL command in phpMyAdmin to change the default charset of the database:
ALTER DATABASE MyDatabaseName CHARACTER SET utf8;
It worked, so I ran the following SQL command to change the default charset of individual tables:
ALTER TABLE wp_posts CHARACTER SET utf8;
Then I ran the following SQL command to run individual columns:
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
Unfortunately, I have to run TEXT
, LONGTEXT
, TINYTEXT
, VARCHAR
, ENUM
for each individual column for each individual table. It would be a lot of work to do manually and when I got to TINYTEXT
, it wiped out all the body text from the article. I gave up and went to bed around 3 something in the morning.
I woke up again around 7 am and tried to figured out what I else I could do. I opened up the SQL file I exported a few hours earlier and performed a find-and-replace to convert latin1
into utf8
. 19 instances were found and replaced. I created a new database in phpMyAdmin and imported the new search-and-replaced file. To my surprise, it worked as expected. Take a look at iLoveNgocLan.com. I created new posts, edited old posts, added new comments, and the text came up normal, no more question marks and no more black diamonds.
Essentially, to convert latin1
to utf8
, this is all you need to do in phpMyAdmin:
- Export your existing database to your local machine
- Find
latin1
and replace withutf8
on your local machine - Create a new database in phpMyAdmin with UTF-8 Collation
- Import the SQL file from your local machine to the new database in phpMyAdmin
- Run this SQL command in phpMyAdmin:
ALTER DATABASE MyDatabaseName CHARACTER SET utf8;
- Edit your
wp-config.php
to point to the new database and editdefine('DB_CHARSET', 'utf8');
- Update the meta tag in your theme’s header:
<meta charset="utf-8">
That’s it. I don’t know why this method isn’t available already. I wonder if there’s any drawback of using this method, but it seems to work for me. I am not a MySQL expert. In fact, I don’t have much confidence in messing around with it. I lost several hours of sleep, but the learning experience is worth it.