the way in which words are put together to form
phrases, clauses, or sentences.
上下文无关文法
context-free grammars, a more powerful method
of describing languages. Such grammars can describe certain features that have
a recursive structure, which makes them useful ina variety of applications.
Context-free grammars were first used inthe study of human languages. One
way of understanding the relationship of terms such as noun, verb, and preposition
and their respective phrases leads toa natural recursion because noun phrases
may appear inside verb phrases and vice versa. Context-free grammars help us
organize and understand these relationships.
An important application of context-free grammars occurs inthe specification
and compilation of programming languages. A grammar fora programming language often appears asa reference for people trying to learn the language syntax.
Designers of compilers and interpreters for programming languages often start
by obtaining a grammar forthe language. Most compilers and interpreters contain a component called a parser that extracts the meaning ofa program prior to
generating the compiled code or performing the interpreted execution. A numberof methodologies facilitate the construction ofa parser once a context-free
grammar is available. Some tools even automatically generate the parser fromthe grammar.
1
上下文无关语言
Any language that can
be generated by some context-free grammar is called a context-free language
(CFL)
产生式
A grammar consists of a collection of substitution rules, also called productions.
A → 0A1
A → B
B → #
FORMAL DEFINITION OF A CONTEXT-FREE GRAMMAR(形式化定义)
A context-free grammar is a4-tuple (V, Σ, R, S), where
1. V is a finite set called the variables,
2. Σ is a finite set, disjoint from V , called the terminals,
3. R is a finite setof rules, witheach rule being avariableandastringof variables and terminals, and4. S ∈ V is the start variable.
龙书关于上下文无关文法的形式化定义
a context-free grammar (grammar forshort) consists of terminals, nonterminals, a start symbol, and productions.
1. Terminals are the basic symbols from which strings are formed. The term
"token name" is a synonym for"terminal"and frequently we will use theword"token"for terminal when it is clear that we are talking about just
thetoken name. We assume that the terminals are thefirst components
ofthe tokens output bythe lexical analyzer. In (4.4), the terminals are
the keywords ifandelseandthe symbols " c"and") ."2. Nonterminals are syntactic variables that denote sets of strings. In (4.4),
stmt and expr are nonterminals. The sets of strings denoted by nonterminals help define the language generated bythe grammar. Nonterminals
impose a hierarchical structure onthelanguagethatiskeytosyntax
analysis and translation.
3. In a grammar, one nonterminal is distinguished asthe start symbol, andthesetof strings it denotes is the language generated bythe grammar.
Conventionally, the productions forthe start symbol are listed first.
4. The productions ofa grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production
consists of:
(a) A nonterminal called the head or left side ofthe production; this
production defines some ofthe strings denoted bythe head.
(b) The symbol --+. Sometimes : : = has been used in place of the arrow.
(c) A body orright side consisting ofzeroor more terminals and nonterminals. The components ofthe body describe one way in which
strings ofthe nonterminal atthe head can be constructed.
乔姆斯基范式 Chomsky normal form
When working with context-free grammars, it is often convenient to have them in simplified form. One of the simplest and most useful forms is called the Chomsky normal form.
文法定理
Any context-free language is generated by a context-free grammar in Chomsky normal form.
语法分析树
//语法树用图形的方式呈现
A parse tree pictorially shows how the start symbol ofa grammar deriv es astringinthe language. If non terminal A has a pro duction A --> X Y Z , then a parse tree may have an interior node labeled A with three children labeled X ,Y , and Z , from left to righ t:
A
X Y Z
F ormally , giv en a con text-free grammar, a p arse tr e e according tothe grammar is a tree withthe follo wing prop erties
文法二义性
设计好文法避免二义性,常用的就是添加规则
we need to design unam biguous grammars for compiling applications,
ortouse am biguous grammars with additional rules to resolve the ambiguities
expr --> expr + term | expr - term | term
term --> term * factor | term / factor | factorfactor --> digit | ( expr )
语法分析
语法分析是决定如何使用文法生成非终结符串的过程
语法树是用来辅助想象用的
Parsing is the process of determining how a string of terminals can be generated by a grammar. In discussing this problem, it is helpful to think of a parse tree being constructed, even though a compiler may not construct one, in practice.
However, a parser must be capable of constructing the tree in principle, or else the translation cannot be guaranteed correct
大多数语法分析归为 “自顶向下”、“自底向上”两种方法
Most parsing methods fall intooneoftwo classes, called the top-down and bottom-up methods. These terms refer tothe order in which nodes inthe parse tree are constructed.
In top-down parsers, construction starts atthe root and proceeds towards the leaves, whilein bottom-up parsers, construction starts atthe leaves and proceeds towards the root.
The popularity of top-down parsers is due tothe fact that efficient parsers can be constructed more easily by hand using top-down methods. Bottom-up parsing,
however, can handle a larger class of grammars and translation schemes, so software tools for generating parsers directly from grammars often use bottom-up methods.
向前看符号
The current terminal(终结符) being scanned in the input is frequently referred to as the lookahead symbol. Initially(刚开始地), the lookahead symbol is the first, i.e., leftmost(最左), terminal of the input string.
产生式试错
一般情况下,产生式不正确的时候就去回溯,为非终结符选择产生式是一个尝试并犯错的过程
In general, the selection of a production for a nonterminal may involve trialand-error; that is, we may have to try a production and backtrack to try another production if the first is found to be unsuitable.
A production is unsuitable if, after using the production, we cannot complete the tree to match the input string.
预测分析法
Predictive parsing relies oninformationaboutthefirstsymbolsthatcanbe
generated bya production body. More precisely, let a be astringof grammar
symbols (terminals and/or nonterminals). We define FIRsT(a) tb be thesetof
terminals that appear asthefirst symbols ofoneor more strings of terminals
generated froma. If a is E or can generate E, then E is also in, FIRsT(a) .