Redis源码分析(六)---ziplist压缩列表(一)

2014-11-24 11:56:31 · 作者: · 浏览: 5

ziplist和之前我解析过的adlist列表名字看上去的很像,但是作用却完全不同。之前的adlist主要针对的是普通的数据链表操作。而今天的ziplist指的是压缩链表,为什么叫压缩链表呢,因为链表中我们一般常用pre,next来指明当前的结点的前一个指针或当前的结点的下一个指针,这其实是在一定程度上占据了比较多的内存空间,ziplist采用了长度的表示方法,整个ziplist其实是超级长的字符串,通过里面各个结点的长度,上一个结点的长度等信息,通过快速定位实现相关操作,而且编写者,在长度上也做了动态分配字节的方法,表示长度,避免了一定的内存耗费,比如一个结点的字符串长度每个都很短,而你使用好几个字节表示字符串的长度,显然造成大量浪费,所以在长度表示方面,ziplist 就做到了压缩,也体现了压缩的性能。ziplist 用在什么地方呢,ziplist 就是用在我们平常最常用的一个命令rpush,lpush等这些往链表添加数据的方法,这些数据就是存在ziplist 中的。之后我们会看到相应的实现方法。

在学习ziplist的开始,一定要理解他的结构,关于这一点,必须花一定时间想想,要不然不太容易明白人家的设计。下面是我的理解,帮助大家理解:

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ziplist是一个编码后的列表,特殊的设计使得内存操作非常有效率,此列表可以同时存放
 * 字符串和整数类型,列表可以在头尾各边支持推加和弹出操作在O(1)常量时间,但是,因为每次
 * 操作设计到内存的重新分配释放,所以加大了操作的复杂性
 * ----------------------------------------------------------------------------
 *
 * ziplist的结构组成:
 * ZIPLIST OVERALL LAYOUT:
 * The general layout of the ziplist is as follows:
 * 
 *
 *  is an unsigned integer to hold the number of bytes that the
 * ziplist occupies. This value needs to be stored to be able to resize the
 * entire structure without the need to traverse it first.
 * 代表着ziplist占有的字节数,这方便当重新调整大小的时候不需要重新从头遍历
 * 
 * 
is the offset to the last entry in the list. This allows a pop * operation on the far side of the list without the need for full traversal. * 记录了最后一个entry的位置在列表中,可以方便快速在列表末尾弹出操作 * * is the number of entries.When this value is larger than 2**16-2, * we need to traverse the entire list to know how many items it holds. * 记录的是ziplist里面entry数据结点的总数 * * is a single byte special value, equal to 255, which indicates the * end of the list. * 代表的是结束标识别,用单字节表示,值是255,就是11111111 * * ZIPLIST ENTRIES: * Every entry in the ziplist is prefixed by a header that contains two pieces * of information. First, the length of the previous entry is stored to be * able to traverse the list from back to front. Second, the encoding with an * optional string length of the entry itself is stored. * 每个entry数据结点主要包含2部分信息,第一个,上一个结点的长度,主要就可以可以从任意结点从后往前遍历整个列表 * 第二个,编码字符串的方式的类型保存 * * The length of the previous entry is encoded in the following way: * If this length is smaller than 254 bytes, it will only consume a single * byte that takes the length as value. When the length is greater than or * equal to 254, it will consume 5 bytes. The first byte is set to 254 to * indicate a larger value is following. The remaining 4 bytes take the * length of the previous entry as value. * 之前的数据结点的字符串长度的长度少于254个字节,他将消耗单个字节,一个字节8位,最大可表示长度为2的8次方 * 当字符串的长度大于254个字节,则用5个字节表示,第一个字节被设置成254,其余的4个字节占据的长度为之前的数据结点的长度 * * The other header field of the entry itself depends on the contents of the * entry. When the entry is a string, the first 2 bits of this header will hold * the type of encoding used to store the length of the string, followed by the * actual length of the string. When the entry is an integer the first 2 bits * are both set to 1. The following 2 bits are used to specify what kind of * integer will be stored after this hea