3.将操作结果从内部字符集编码转换为character_set_results编码. 更加详细的转换过程如下:
Client program sends SQL statement
|
| Encoding: A, defined as "character_set_client"
v
MySQL server - Convertion from encoding A to encoding B
|
| Encoding: B, defined as "character_set_connection"
v
MySQL server - Execution to store data
MySQL server - Conversion from encoding B to encoding C
|
| Encoding: C, defined by text column encoding
v
MySQL server - Storage
...
MySQL server - Storage
|
| Encoding: C, defined by text column encoding
v
MySQL server - Execution to fetch data
MySQL server - Convertion from encoding C to encoding D
|
| Encoding: D, defined as "character_set_results"
v
Client program receives result set
接下来就实例分析下mysql可能乱码的情况以及我认为的原因,不对之处请指出.
4.MySQL乱码实例分析
4.1 问题实例
我们创建一个测试的数据库db1,数据库编码为latin1,注意当前我的机器的终端编码为zh_CN.UTF-8,数据库的编码设定如下所第1部分所示,然后中db1中创建一个表test,sql语句如下:
CREATE TABLE `test` (
`gbk` varchar(2) CHARACTER SET gbk DEFAULT NULL,
`utf8` varchar(2) CHARACTER SET utf8 DEFAULT NULL,
`latin_utf8` varchar(6) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
注意到我们的表的编码是latin1,而表中三个字段的编码各不相同,分别为gbk编码,utf8编码以及latin1编码.之所以这样创建正是为了验证mysql字符集编码的转换过程.好了,重点来了,现在我们在mysql客户端执行:
mysql> insert into test values("中文", "中文", "中文");
Query OK, 1 row affected, 1 warning (0.00 sec)
安装了mysql的筒子可以测试下,在mysql没有开启strict模式的时候,这个插入语句会报一个警告,内容如下:
mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE4\xB8\xAD\xE6\x96\x87' for column 'latin_utf8' at row 1 |
+---------+------+-------------------------------------------------------------------------------------+
我们可以先select看看test表中的内容:
mysql> select * from test;
+--------+--------+------------+
| gbk | utf8 | latin_utf8 |
+--------+--------+------------+
| 中文 | 中文 | ?? |
+--------+--------+------------+
我们还可以查看下test表中实际存储的内容:
mysql> select hex(gbk), hex(utf8), hex(latin_utf8) from test;
+----------+--------------+-----------------+
| hex(gbk) | hex(utf8) | hex(latin_utf8) |
+----------+--------------+-----------------+
| D6D0CEC4 | E4B8ADE69687 | 3F3F |
+----------+--------------+-----------------+
可以发现直接select查看的时候latin_utf8字段乱码了,而通过hex函数查看发现原来latin_utf8字段存储的内容有问题. 出现这个问题的原因就是编码转换过程出了错,按照之前