UTF-8 checklist

Character encodings have been driving me insane lately:

– stuff pasted from Word won’t display properly

– funny square characters in the browser

– smart quotes from Word don’t display properly

So here’s a handy list of things I have been trying. Some might work for you, some might not. Let me know if you have anything to add!

Apache

AddDefaultCharset UTF-8
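If you can’t edit the Apache config (on shared hosting, say), PHP can label its own pages instead. This is a minimal sketch: PHP appends default_charset to the text/html Content-Type header it sends (and has defaulted it to UTF-8 since PHP 5.6), or you can send the whole header yourself before any output.

```php
<?php
// Option 1: have PHP add "charset=UTF-8" to its default Content-Type header.
ini_set('default_charset', 'UTF-8');

// Option 2: send the header explicitly. header() only works before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
```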

MySQL

On connect – SET NAMES 'utf8'
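With mysqli, the PHP manual recommends set_charset() over a raw SET NAMES query, because it also tells the client library which charset is in use, so real_escape_string() escapes correctly. A sketch, assuming the mysqli extension; connect_utf8 is a hypothetical helper name and the connection details are placeholders. With PDO, putting charset=utf8 in the DSN does the same job.

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $dbh = new mysqli($host, $user, $pass, $db);
    // Same effect on the server as SET NAMES 'utf8', and the client library
    // now knows the charset too, so $dbh->real_escape_string() stays safe.
    if (!$dbh->set_charset('utf8')) {
        throw new RuntimeException('set_charset failed: ' . $dbh->error);
    }
    return $dbh;
}
```

Usage: `$dbh = connect_utf8('localhost', 'username', 'password', 'mydb');`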

Convert: ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
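Or generate that statement for every table in a database with a PHP script (it only prints the SQL; it doesn’t run it):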

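<?php
// Prints one ALTER TABLE ... CONVERT statement per table in the given database.
// Usage: php convert.php dbname > convert.sql, check the file, then run it with the mysql client.
$dbname = $argv[1] ?? '';
if ($dbname === '') {
    fwrite(STDERR, "no database specified\n");
    exit(1);
}
// mysql_connect() and friends were removed in PHP 7; mysqli does the same job.
$dbh = new mysqli('localhost', 'username', 'password', $dbname);
if ($dbh->connect_errno) {
    fwrite(STDERR, "connect failed: {$dbh->connect_error}\n");
    exit(1);
}
$res = $dbh->query('SHOW TABLES');
// SHOW TABLES returns a single column (Tables_in_<dbname>), so read it by position.
while ($row = $res->fetch_row()) {
    echo "ALTER TABLE `{$row[0]}` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n";
}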

http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
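A common source of the funny characters is double encoding: UTF-8 bytes get read as Windows-1252 (which is what MySQL calls latin1) and then encoded to UTF-8 a second time. Here’s a minimal sketch of the effect and how to reverse it, assuming the mbstring extension is available; double_encode and undo_double_encode are just illustrative names.

```php
<?php
// Treat UTF-8 bytes as if they were Windows-1252 and re-encode them as UTF-8.
// This is what happens when UTF-8 data goes through a latin1 connection or column.
function double_encode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Reverse it: encode back to Windows-1252 to recover the original UTF-8 bytes.
function undo_double_encode(string $garbled): string
{
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

echo double_encode("caf\u{00E9}"), "\n";     // prints cafÃ©
echo double_encode("\u{201C}hi"), "\n";      // prints â€œhi
```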

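Also, once all of your databases are utf8, you can set these options in my.cnf (character-set-server and collation-server replace the old default-character-set and default-collation server options, which MySQL 5.5 removed):

[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci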

[client]
default-character-set=utf8
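After restarting mysqld, you can confirm the settings took effect by looking at the character_set_% and collation_% variables. A sketch, assuming the mysqli extension and an open connection; print_charset_vars is a hypothetical helper.

```php
<?php
// Hypothetical helper: print the server's character set and collation variables.
function print_charset_vars(mysqli $dbh): void
{
    $queries = ["SHOW VARIABLES LIKE 'character_set%'", "SHOW VARIABLES LIKE 'collation%'"];
    foreach ($queries as $sql) {
        $res = $dbh->query($sql);
        // Each row is [Variable_name, Value].
        while ($row = $res->fetch_row()) {
            printf("%-26s %s\n", $row[0], $row[1]);
        }
    }
}
```

Usage: `print_charset_vars(new mysqli('localhost', 'username', 'password'));`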

PHP/HTML

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Use htmlentities like this:
htmlentities($string, ENT_COMPAT, 'UTF-8')
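The charset argument matters because htmlentities() has to know how to split the input into characters. Here’s a small sketch with smart quotes as they arrive from Word: with 'UTF-8' they become named entities, while 'ISO-8859-1' (the default before PHP 5.4) treats each UTF-8 byte as its own character and turns the first byte into &acirc;.

```php
<?php
// “smart” quotes plus an ampersand, as UTF-8 text (e.g. pasted from Word).
$text = "\u{201C}smart\u{201D} quotes & more";

// Right charset: each curly quote becomes one named entity.
$ok = htmlentities($text, ENT_COMPAT, 'UTF-8');
echo $ok, "\n"; // prints &ldquo;smart&rdquo; quotes &amp; more

// Wrong charset: the 3-byte UTF-8 quote is mangled into &acirc; plus stray bytes.
$bad = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');
```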

Magpie RSS (the arguments are the source, output encoding, input encoding, and encoding detection turned off):
$rss = new MagpieRSS($rss_string, 'UTF-8', 'UTF-8', false);