97 lines
3.6 KiB
Plaintext
97 lines
3.6 KiB
Plaintext
[/==============================================================================
|
|
Copyright (C) 2001-2015 Joel de Guzman
|
|
Copyright (C) 2001-2011 Hartmut Kaiser
|
|
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
===============================================================================/]
|
|
[section Syntax Diagram]
|
|
|
|
In the next section, we will deal with Parsing Expression Grammars
|
|
(PEG) [footnote Bryan Ford: Parsing Expression Grammars: A Recognition-Based
|
|
Syntactic Foundation, [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]],
|
|
a variant of Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF:
|
|
A Notation to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
|
|
with a different interpretation. It is easier to understand PEG using Syntax
|
|
Diagrams. Syntax diagrams represent a grammar graphically. It was used
|
|
extensively by Niklaus Wirth [footnote Niklaus Wirth: The Programming Language
|
|
Pascal. (July 1973)] in the "Pascal User Manual". Syntax Diagrams are easily
|
|
understandable by programmers due to their similarity to flow charts. The
|
|
isomorphism of the diagrams and functions make them ideal for representing
|
|
__rd__ parsers which are essentially mutually recursive functions.
|
|
|
|
[heading Elements]
|
|
|
|
All diagrams have one entry and one exit point. Arrows connect all possible
|
|
paths through the grammar from the entry point to the exit point.
|
|
|
|
[:__sd_start_stop__]
|
|
|
|
Terminals are represented by round boxes. Terminals are atomic and
|
|
usually represent plain characters, strings or tokens.
|
|
|
|
[:__sd_terminals__]
|
|
|
|
Non-terminals are represented by boxes. Diagrams are modularized using
|
|
named non-terminals. A complex diagram can be broken down into a set of
|
|
non-terminals. Non-terminals also allow recursion (i.e. a non-terminal can call
|
|
itself).
|
|
|
|
[:__sd_non_terminals__]
|
|
|
|
[heading Constructs]
|
|
|
|
The most basic composition is the Sequence. B follows A:
|
|
|
|
[:__sd_sequence__]
|
|
|
|
The ordered choice henceforth we will call /alternatives/. In PEG, ordered
|
|
choice and alternatives are not quite the same. PEG allows ambiguity of choice
|
|
where one or more branches can succeed. In PEG, in case of ambiguity, the first
|
|
one always wins.
|
|
|
|
[:__sd_choice__]
|
|
|
|
The optional (zero-or-one):
|
|
|
|
[:__sd_optional__]
|
|
|
|
Now, the loops. We have the zero-or-more and one-or-more:
|
|
|
|
[:__sd_kleene__]
|
|
[:__sd_plus__]
|
|
|
|
Take note that, as in PEG, these loops behave greedily. If there is
|
|
another 'A' just before the end-point, it will always fail because the
|
|
preceding loop has already exhausted all 'A's and there is nothing more left.
|
|
This is a crucial difference between PEG and general Context Free Grammars
|
|
(CFGs). This behavior is quite obvious with syntax diagrams as they resemble
|
|
flow-charts.
|
|
|
|
[heading Predicates]
|
|
|
|
Now, the following are Syntax Diagram versions of PEG predicates. These are not
|
|
traditionally found in Syntax Diagrams. These are special extensions we invented
|
|
to closely follow PEGs.
|
|
|
|
First, we introduce a new element, the Predicate:
|
|
|
|
[:__sd_predicate__]
|
|
|
|
This is similar to the conditionals in flow charts where the 'No' branch is
|
|
absent and always signals a failed parse.
|
|
|
|
We have two versions of the predicate, the /And-Predicate/ and the
|
|
/Not-Predicate/:
|
|
|
|
[:__sd_and_predicate__]
|
|
[:__sd_not_predicate__]
|
|
|
|
The /And-Predicate/ tries the predicate, P, and succeeds if P succeeds,
|
|
or otherwise fail. The opposite is true with the /Not-Predicate/. It
|
|
tries the predicate, P, and fails if P succeeds, or otherwise succeeds. Both
|
|
versions do a look-ahead but do not consume any input regardless if P succeeds
|
|
or not.
|
|
|
|
[endsect]
|