83a792d7ed
[SVN r67619]
105 lines
3.9 KiB
Plaintext
105 lines
3.9 KiB
Plaintext
[/==============================================================================
|
|
Copyright (C) 2001-2011 Joel de Guzman
|
|
Copyright (C) 2001-2011 Hartmut Kaiser
|
|
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
===============================================================================/]
|
|
[section Syntax Diagram]
|
|
|
|
In the next section, we will deal with Parsing Expression Grammars
|
|
(PEG) [footnote Bryan Ford: Parsing Expression Grammars: A Recognition-Based
|
|
Syntactic Foundation, [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]],
|
|
a variant of Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF:
|
|
A Notation to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
|
|
with a different interpretation. It is easier to understand PEG using Syntax
|
|
Diagrams. Syntax diagrams represent a grammar graphically. It was used extensively
|
|
by Niklaus Wirth [footnote Niklaus Wirth: The Programming Language
|
|
Pascal. (July 1973)] in the "Pascal User Manual". Syntax Diagrams are
|
|
easily understandable by programmers due to their similarity to flow
|
|
charts. The isomorphism of the diagrams and functions make them ideal for
|
|
representing __rd__ parsers which are essentially mutually recursive
|
|
functions.
|
|
|
|
Historically, Parsing Expression Grammars have been used for describing grammars
|
|
for parsers only (hence the name). In __spirit__ we use a very similar notation
|
|
for output generation as well. Almost all the concepts described here are
|
|
equally applicable both to __qi__ parsers and to __karma__ generators.
|
|
|
|
[heading Elements]
|
|
|
|
All diagrams have one entry and one exit point. Arrows connect all possible
|
|
paths through the grammar from the entry point to the exit point.
|
|
|
|
[:__sd_start_stop__]
|
|
|
|
Terminals are represented by round boxes. Terminals are atomic and
|
|
usually represent plain characters, strings or tokens.
|
|
|
|
[:__sd_terminals__]
|
|
|
|
Non-terminals are represented by boxes. Diagrams are modularized using
|
|
named non-terminals. A complex diagram can be broken down into a set of
|
|
non-terminals. Non-terminals also allow recursion (i.e. a non-terminal
|
|
can call itself).
|
|
|
|
[:__sd_non_terminals__]
|
|
|
|
[heading Constructs]
|
|
|
|
The most basic composition is the Sequence. B follows A:
|
|
|
|
[:__sd_sequence__]
|
|
|
|
The ordered choice henceforth we will call /alternatives/. In PEG,
|
|
ordered choice and alternatives are not quite the same. PEG allows
|
|
ambiguity of choice where one or more branches can succeed. In PEG, in
|
|
case of ambiguity, the first one always wins.
|
|
|
|
[:__sd_choice__]
|
|
|
|
The optional (zero-or-one):
|
|
|
|
[:__sd_optional__]
|
|
|
|
Now, the loops. We have the zero-or-more and one-or-more:
|
|
|
|
[:__sd_kleene__]
|
|
[:__sd_plus__]
|
|
|
|
Take note that, as in PEG, these loops behave greedily. If there is
|
|
another 'A' just before the end-point, it will always fail because the
|
|
preceding loop has already exhausted all 'A's and there is nothing more
|
|
left. This is a crucial difference between PEG and general Context Free
|
|
Grammars (CFGs). This behavior is quite obvious with syntax diagrams as
|
|
they resemble flow-charts.
|
|
|
|
[heading Predicates]
|
|
|
|
Now, the following are Syntax Diagram versions of PEG predicates. These
|
|
are not traditionally found in Syntax Diagrams. These are special
|
|
extensions we invented to closely follow PEGs.
|
|
|
|
First, we introduce a new element, the Predicate:
|
|
|
|
[:__sd_predicate__]
|
|
|
|
This is similar to the conditionals in flow charts where the 'No' branch
|
|
is absent and always signals a failed parse.
|
|
|
|
We have two versions of the predicate, the /And-Predicate/ and the
|
|
/Not-Predicate/:
|
|
|
|
[:__sd_and_predicate__]
|
|
[:__sd_not_predicate__]
|
|
|
|
The /And-Predicate/ tries the predicate, P, and succeeds if P succeeds,
|
|
or otherwise fail. The opposite is true with the /Not-Predicate/. It
|
|
tries the predicate, P, and fails if P succeeds, or otherwise succeeds.
|
|
Both versions do a look-ahead but do not consume any input regardless if
|
|
P succeeds or not.
|
|
|
|
[endsect]
|
|
|
|
|