Key Data Structures Used When Lua Parses a Script (Part 1)

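In this article I introduce the key data structures that Lua uses when it parses a script file, to lay a solid foundation for the series of code analyses to come. The most important source files in this process are llex.h, lparser.h, lobject.h, and lopcodes.h.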

In llex.h:

typedef struct Token {
  int token;
  SemInfo seminfo;
} Token;

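Token represents one lexical unit. Its token field holds the token type, such as TK_NAME or TK_NUMBER; for a token that is not one of these named types, it holds the character code of the lexeme itself. For example, the parsing code dispatches on the current token like this: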

switch (ls->t.token) {
  case '(': {
    //...
  }
  case TK_NAME: {
    //...
  }
  default: {
    //...
  }
}

The SemInfo member of Token holds the semantic information attached to the token:

typedef union {
  lua_Number r;
  TString *ts;
} SemInfo;  /* semantics information */

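When the token is a number, its value is stored in r; for tokens that carry text, such as names and strings, the content is stored in the TString that ts points to.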
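To make the relationship between token and seminfo concrete, here is a minimal sketch. Everything prefixed with Mini or MINI_ is a hypothetical stand-in rather than Lua's own definition: double stands in for lua_Number and const char * for TString *. It assumes that the named token kinds are numbered above the range of unsigned char (Lua's llex.h starts them at FIRST_RESERVED, which is 257), so that a single-character token can be stored as its own character code.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are Lua's real definitions. */
enum {
  MINI_FIRST_RESERVED = 257,          /* first code above any unsigned char */
  MINI_TK_NAME = MINI_FIRST_RESERVED,
  MINI_TK_NUMBER
};

typedef union {
  double r;        /* meaningful only when token == MINI_TK_NUMBER */
  const char *ts;  /* meaningful only when token == MINI_TK_NAME */
} MiniSemInfo;

typedef struct {
  int token;
  MiniSemInfo seminfo;
} MiniToken;

/* Builds a number token; the value goes into the r member. */
static MiniToken mini_number(double value) {
  MiniToken t;
  t.token = MINI_TK_NUMBER;
  t.seminfo.r = value;
  return t;
}

/* Builds a name token; the text goes into the ts member. */
static MiniToken mini_name(const char *name) {
  MiniToken t;
  t.token = MINI_TK_NAME;
  t.seminfo.ts = name;
  return t;
}

/* Builds a single-character token such as '('; it carries no text. */
static MiniToken mini_char(int c) {
  MiniToken t;
  t.token = c;
  t.seminfo.ts = NULL;
  return t;
}

/* Returns 1 if the token is a single character stored as its own code. */
static int mini_is_char_token(const MiniToken *t) {
  return t->token >= 0 && t->token < MINI_FIRST_RESERVED;
}
```

The union has no tag of its own: the token field decides which member may be read, which is why a parser has to check the token before it touches seminfo.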

Next comes one of the most important data structures:

typedef struct LexState {
  int current;  /* current character (charint) */
  int linenumber;  /* input line counter */
  int lastline;  /* line of last token `consumed' */
  Token t;  /* current token */
  Token lookahead;  /* look ahead token */
  struct FuncState *fs;  /* `FuncState' is private to the parser */
  struct lua_State *L;
  ZIO *z;  /* input stream */
  Mbuffer *buff;  /* buffer for tokens */
  TString *source;  /* current source name */
  char decpoint;  /* locale decimal point */
} LexState;

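LexState holds not only the current state of the lexer but also the global state of the whole compilation. current holds the current character, t holds the current token, and lookahead holds the look-ahead token. Since only one look-ahead token is kept, my guess is that Lua's grammar is LL(1), though I am not sure that is right. fs points to the information about the function the parser is currently working on, L points to the current lua_State, z points to the input stream, and buff points to the token buffer. For the remaining fields, see the comments.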
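The way t and lookahead cooperate can be sketched with a toy lexer in which every non-space character is one token. This is only an illustration loosely modelled on the luaX_next and luaX_lookahead functions in llex.c; the MiniLex type, the mini_ functions, and MINI_EOS are hypothetical names, and the real lexer reads from a ZIO stream rather than a C string.

```c
#define MINI_EOS (-1)  /* hypothetical "no token" / end-of-stream marker */

typedef struct {
  const char *input;  /* remaining input; stands in for the ZIO stream */
  int t;              /* current token */
  int lookahead;      /* look-ahead token, or MINI_EOS if none is buffered */
} MiniLex;

/* Scans one token: skips spaces, then returns the next character. */
static int mini_scan(MiniLex *ls) {
  while (*ls->input == ' ') ls->input++;
  if (*ls->input == '\0') return MINI_EOS;
  return (unsigned char)*ls->input++;
}

static void mini_init(MiniLex *ls, const char *source) {
  ls->input = source;
  ls->t = MINI_EOS;
  ls->lookahead = MINI_EOS;
}

/* Advances to the next token, consuming a buffered look-ahead first. */
static void mini_next(MiniLex *ls) {
  if (ls->lookahead != MINI_EOS) {
    ls->t = ls->lookahead;
    ls->lookahead = MINI_EOS;
  } else {
    ls->t = mini_scan(ls);
  }
}

/* Peeks at the token after the current one without consuming it. */
static int mini_peek(MiniLex *ls) {
  if (ls->lookahead == MINI_EOS)
    ls->lookahead = mini_scan(ls);
  return ls->lookahead;
}
```

With the input "a = b", calling mini_next once makes 'a' the current token; mini_peek then reports '=' while t stays 'a', and the following mini_next takes '=' from the buffer instead of scanning again.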

Now let's look at the important structures in lparser.h:

typedef struct expdesc {
  expkind k;
  union {
    struct { int info, aux; } s;
    lua_Number nval;
  } u;
  int t;  /* patch list of `exit when true' */
  int f;  /* patch list of `exit when false' */
} expdesc;

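expdesc holds the description of an expression. k is the kind of expression, and the meaning of u depends on that kind. t and f are the patch lists of jumps that leave the expression when it is true or false.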
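As an illustration of how k selects the meaning of u, the sketch below folds the addition of two numeric constants at compile time. It is a simplified, hypothetical model: the mini_ names are not Lua's, double stands in for lua_Number, only two expression kinds are represented, and the t and f patch lists are left out.

```c
/* Hypothetical, reduced set of expression kinds. */
typedef enum { MINI_VKNUM, MINI_VLOCAL } mini_expkind;

typedef struct {
  mini_expkind k;
  union {
    struct { int info, aux; } s;  /* k == MINI_VLOCAL: info is the register */
    double nval;                  /* k == MINI_VKNUM: the constant's value */
  } u;
} mini_expdesc;

/* Describes a numeric constant. */
static mini_expdesc mini_num(double value) {
  mini_expdesc e;
  e.k = MINI_VKNUM;
  e.u.nval = value;
  return e;
}

/* Describes a local variable living in register reg. */
static mini_expdesc mini_local(int reg) {
  mini_expdesc e;
  e.k = MINI_VLOCAL;
  e.u.s.info = reg;
  e.u.s.aux = 0;
  return e;
}

/* Tries to fold e1 + e2 into e1. Returns 1 on success, or 0 (leaving e1
   untouched) when either operand is not a numeric constant. */
static int mini_fold_add(mini_expdesc *e1, const mini_expdesc *e2) {
  if (e1->k != MINI_VKNUM || e2->k != MINI_VKNUM)
    return 0;
  e1->u.nval += e2->u.nval;
  return 1;
}
```

Folding succeeds for mini_num(2.0) and mini_num(3.0), leaving 5.0 in the first descriptor, and is refused as soon as one operand is a local, because then u holds a register number rather than a value.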

typedef struct upvaldesc {
  lu_byte k;
  lu_byte info;
} upvaldesc;

upvaldesc holds the description of an upvalue.

Finally, the most important structure in this file:

typedef struct FuncState {
  Proto *f;  /* current function header */
  Table *h;  /* table to find (and reuse) elements in `k' */
  struct FuncState *prev;  /* enclosing function */
  struct LexState *ls;  /* lexical state */
  struct lua_State *L;  /* copy of the Lua state */
  struct BlockCnt *bl;  /* chain of current blocks */
  int pc;  /* next position to code (equivalent to `ncode') */
  int lasttarget;  /* `pc' of last `jump target' */
  int jpc;  /* list of pending jumps to `pc' */
  int freereg;  /* first free register */
  int nk;  /* number of elements in `k' */
  int np;  /* number of elements in `p' */
  short nlocvars;  /* number of elements in `locvars' */
  lu_byte nactvar;  /* number of active local variables */
  upvaldesc upvalues[LUAI_MAXUPVALUES];  /* upvalues */
  unsigned short actvar[LUAI_MAXVARS];  /* declared-variable stack */
} FuncState;

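During compilation, a FuncState structure holds the state of one function being compiled. f points to the function's prototype (Proto). prev points to the FuncState of the enclosing function: because Lua allows one function to be defined inside another, when the parser reaches the definition of an inner function it creates a new FuncState for it, sets its prev to the outer function's FuncState, and starts parsing the inner function. In this way, all the FuncState objects whose analysis is not yet finished form a stack.

bl points to the block currently being parsed. A function contains many blocks, and Lua chains the blocks that belong to the same function into a linked list.

jpc is a list of OP_JMP instructions. Because Lua parses in a single pass, the target of some jump instructions cannot be determined at the moment they are emitted, so jpc chains these pending jumps together and they are backpatched once the target is known. freereg is the index of the first free register, the upvalues array holds all the upvalues of the current function, and nactvar is the number of local variables active in the current scope.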
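The idea of chaining pending jumps and backpatching them later can be sketched as follows. This is a deliberately simplified model, not Lua's code generator: the MiniFunc type and the mini_ functions are hypothetical, each code slot is a plain int rather than an encoded instruction, and jumps are patched with an absolute target, whereas lcode.c works with encoded instructions and relative offsets. The assumption borrowed from the description above is that an unresolved jump can store the index of the next pending jump in the slot where its target will eventually go, so the list needs no extra memory.

```c
#define MINI_NO_JUMP (-1)  /* hypothetical end-of-list marker */
enum { MINI_MAX_CODE = 64 };

typedef struct {
  int code[MINI_MAX_CODE];  /* a jump slot holds the next pending jump
                               until patched, then its target */
  int pc;                   /* next position to code */
  int jpc;                  /* head of the pending-jump list */
} MiniFunc;

static void mini_func_init(MiniFunc *fs) {
  fs->pc = 0;
  fs->jpc = MINI_NO_JUMP;
}

/* Emits a non-jump instruction (its content is irrelevant here).
   Returns its position, or MINI_NO_JUMP if the code array is full. */
static int mini_emit_plain(MiniFunc *fs) {
  if (fs->pc >= MINI_MAX_CODE) return MINI_NO_JUMP;
  fs->code[fs->pc] = 0;
  return fs->pc++;
}

/* Emits a jump whose target is not known yet and links it into jpc.
   Returns its position, or MINI_NO_JUMP if the code array is full. */
static int mini_emit_pending_jump(MiniFunc *fs) {
  int at;
  if (fs->pc >= MINI_MAX_CODE) return MINI_NO_JUMP;
  at = fs->pc++;
  fs->code[at] = fs->jpc;  /* remember the previous list head */
  fs->jpc = at;            /* this jump becomes the new head */
  return at;
}

/* Backpatches every pending jump so that it targets the current pc. */
static void mini_patch_pending_to_here(MiniFunc *fs) {
  int target = fs->pc;
  int j = fs->jpc;
  while (j != MINI_NO_JUMP) {
    int next = fs->code[j];  /* read the link before overwriting it */
    fs->code[j] = target;
    j = next;
  }
  fs->jpc = MINI_NO_JUMP;
}
```

If two pending jumps are emitted at positions 0 and 2 with ordinary instructions at 1 and 3, a call to mini_patch_pending_to_here rewrites both jump slots to 4 and empties the list.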

BlockCnt is defined in lparser.c:

/*
** nodes for block list (list of active blocks)
*/
typedef struct BlockCnt {
  struct BlockCnt *previous;  /* chain */
  int breaklist;  /* list of jumps out of this loop */
  lu_byte nactvar;  /* # active locals outside the breakab
