MYSQL INNODB主键使用varchar和int的区别 - 数据库编程

TOP

MYSQL INNODB主键使用varchar和int的区别(一)

2017-07-14 10:22:52 【大中小】浏览:386次

今天同事讨论关于主键使用varchar和int的区别。

我现在总结的3个问题：
1、tablespace中空间浪费
当然我们知道使用varchar可能会导致辅助索引比较大，因为用到varchar可能存储的字符较多，同时
在行头也存在一个可变字段字符区域(1-2)字节
而辅助索引叶子结点毕竟都存储了主键值，这样至少会多varchar数据字节数量+1(或者2) 字节- 4(int)字节空间。
如果辅助索引比较多空间浪费是可想而知的。
2、辅助索引B+树扫描性能
由于辅助索引B+树的空间要求更大，虽然在B+树层次一般都是3层-4层，索引单值定位I/O消耗并不明显，如果涉及到
范围查询(比如PAGE_CUR_G),需要访问的块就更多，同时比如例如辅助索引的using index，需要访问的块自然
更多
3、比较更加复杂
innodb 在进行元组比较的时候，不管是DML，select都会涉及到元组的比较，同时回表的时候也涉及
到比较操作。而varchar类型的比较比int类型更为复杂一些。
那么我们就来分析第三个问题，第一个问题和第二个问题是显而易见的。
我这里数据库字符集为latin1\latin1_swedish_ci

其实在innodb底层进行比较的时候都调用cmp_data这个函数
在innodb中有自己的定义的数据类型如下：

/*-------------------------------------------*/

/* The 'MAIN TYPE' of a column */
#define DATA_MISSING 0 /* missing column */
#define DATA_VARCHAR 1 /* character varying of the
latin1_swedish_ci charset-collation; note
that the MySQL format for this, DATA_BINARY,
DATA_VARMYSQL, is also affected by whether the
'precise type' contains
DATA_MYSQL_TRUE_VARCHAR */
#define DATA_CHAR 2 /* fixed length character of the
latin1_swedish_ci charset-collation */
#define DATA_FIXBINARY 3 /* binary string of fixed length */
#define DATA_BINARY 4 /* binary string */
#define DATA_BLOB 5 /* binary large object, or a TEXT type;
if prtype & DATA_BINARY_TYPE == 0, then this is
actually a TEXT column (or a BLOB created
with < 4.0.14; since column prefix indexes
came only in 4.0.14, the missing flag in BLOBs
created before that does not cause any harm) */
#define DATA_INT 6 /* integer: can be any size 1 - 8 bytes */
#define DATA_SYS_CHILD 7 /* address of the child page in node pointer */
#define DATA_SYS 8 /* system column */

我们熟悉的int类型属于DATA_INT而varchar属于DATA_VARCHAR，rowid属于DATA_SYS
在函数cmp_data根据各种类型的不同进行了不同比较的方式，这里就将int和varchar
判断的方式进行说明：
1、innodb int类型比较
实际上是在cmp_data中进行了大概的方式如下

if (len) {

#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
/* Compare the first bytes with a loop to avoid the call
overhead of memcmp(). On x86 and x86-64, the GCC built-in
(repz cmpsb) seems to be very slow, so we will be calling the
libc version. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
tracks the slowness of the GCC built-in memcmp().

We compare up to the first 4..7 bytes with the loop.
The (len & 3) is used for "normalizing" or
"quantizing" the len parameter for the memcmp() call,
in case the whole prefix is equal. On x86 and x86-64,
the GNU libc memcmp() of equal strings is faster with
len=4 than with len=3.

On other architectures than the IA32 or AMD64, there could
be a built-in memcmp() that is faster than the loop.
We only use the loop where we know that it can improve
the performance. */
for (ulint i = 4 + (len & 3); i > 0; i--) {
cmp = int(*data1++) - int(*data2++);
if (cmp) {
return(cmp);
}

if (!--len) {
break;
}
}
my_strnncollsp_simple

if (len) {
#endif /* IA32 or AMD64 */
cmp = memcmp(data1, data2, len);

if (cmp) {
return(cmp);
}

data1 += len;
data2 += len;
#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
}
#endif /* IA32 or AMD64 */
}

cmp = (int) (len1 - len2);

if (!cmp || pad == ULINT_UNDE

首页上一页 1 2 3 4 5 下一页尾页 1/5/5
【大中小】【打印】【繁体】【投稿】【收藏】【推荐】【举报】【评论】【关闭】【返回顶部】

上一篇：使用 RMAN 备份Oracle数据库	下一篇：MySQL RC模式insert update 可能..