While working on web development projects I often need to copy the contents of one database to another. Consequently I’ve seen the problem described on Alex Kings’s blog a few times, and been able to deal with it using the technique he suggests.
Today however I saw the same symptoms – apostrophes and some other characters translated to weird glyphs in a database that had been saved and restored apparently using UTF-8 coding throughout. How can that happen, I wondered?
What I didn’t realise was that in order to ensure the entire restore process uses UTF-8, you also need some additional magic instructions to ensure the actual dump file is read as being UTF encoded. The software I use, SQLyog, adds the required lines to the start of every dump file by default but the file I had to work with had been created using phpMyAdmin – an excellent piece of software but not free of imperfections.
Once I’d recognised the problem, it was merely a matter of pasting the following incantations into the top of the dump file and restoring from it again:
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8 */;
I’m sure not all these lines are strictly needed but the restore worked so I’m happy. Hopefully this little piece of knowledge will help save someone else from the desperate measures I’ve seen advocated elsewhere, such as using search and replace on each of the offending characters.
It’s good to understand how these character set mismatches can occur, as it’s not much fun restoring from a dump and finding that your database (or that of an important client) is now studded with peculiar-looking punctuation. If your database is UTF encoded, if possible make sure the dumps you take already have the magic recipe at the front, especially if someone else may be responsible for restoring data from them.