83a792d7ed
[SVN r67619]
206 lines
9.3 KiB
Plaintext
206 lines
9.3 KiB
Plaintext
[/==============================================================================
|
|
Copyright (C) 2001-2011 Joel de Guzman
|
|
Copyright (C) 2001-2011 Hartmut Kaiser
|
|
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
===============================================================================/]
|
|
|
|
[section Introduction]
|
|
|
|
Boost Spirit is an object-oriented, recursive-descent parser and
|
|
output generation library for C++. It allows you to write grammars and
|
|
format descriptions using a format similar to Extended Backus Naur
|
|
Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
|
|
ISO-EBNF]] directly in C++. These inline grammar
|
|
specifications can mix freely with other C++ code and, thanks to the
|
|
generative power of C++ templates, are immediately executable. In
|
|
retrospect, conventional compiler-compilers or parser-generators have
|
|
to perform an additional translation step from the source EBNF code to
|
|
C or C++ code.
|
|
|
|
The syntax and semantics of the libraries' API directly form domain-specific
|
|
embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
|
|
user:
|
|
|
|
* one for creating parser grammars,
|
|
* one for the specification of the required tokens to be used for parsing,
|
|
* and one for the description of the required output formats.
|
|
|
|
Since the target input grammars and output formats are written entirely in C++
|
|
we do not need any separate tools to compile, preprocess or integrate those
|
|
into the build process. __spirit__ allows seamless integration of the parsing
|
|
and output generation process with other C++ code. This often allows for
|
|
simpler and more efficient code.
|
|
|
|
Both the created parsers and generators are fully attributed, which allows you
|
|
to easily build and handle hierarchical data structures in memory. These data
|
|
structures resemble the structure of the input data and can directly be used
|
|
to generate arbitrarily-formatted output.
|
|
|
|
The [link spirit.spiritstructure figure] below depicts the overall structure
|
|
of the Boost Spirit library. The library consists of 4 major parts:
|
|
|
|
* __classic__: This is the almost-unchanged code base taken from the
|
|
former Boost Spirit V1.8 distribution. It has been moved into the namespace
|
|
boost::spirit::classic. A special compatibility layer has been added to
|
|
ensure complete compatibility with existing code using Spirit V1.8.
|
|
* __qi__: This is the parser library allowing you to build recursive
|
|
descent parsers. The exposed domain-specific language can be used to describe
|
|
the grammars to implement, and the rules for storing the parsed information.
|
|
* __lex__: This is the library usable to create tokenizers (lexers). The
|
|
domain-specific language exposed by __lex__ allows you to define regular
|
|
expressions used to match tokens (create token definitions), associate these
|
|
regular expressions with code to be executed whenever they are matched, and
|
|
to add the token definitions to the lexical analyzer.
|
|
* __karma__: This is the generator library allowing you to create code for
|
|
recursive descent, data type-driven output formatting. The exposed
|
|
domain-specific language is almost equivalent to the parser description language
|
|
used in __qi__, except that it is used to describe the required output
|
|
format to generate from a given data structure.
|
|
|
|
[fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
|
|
|
|
|
|
The three components, __qi__, __karma__ and __lex__, are designed to be used
|
|
either stand alone, or together. The general methodology is to use the token
|
|
sequence generated by __lex__ as the input for a parser generated by __qi__.
|
|
On the opposite side of the equation, the hierarchical data structures generated
|
|
by __qi__ are used for the output generators created using __karma__.
|
|
However, there is nothing to stop you from using any of these components all
|
|
by themselves.
|
|
|
|
The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
|
|
some input being converted to some internal representation. After some
|
|
(optional) transformation these data are converted back into some different,
|
|
external representation. The picture highlights Spirit's place in this data
|
|
transformation flow.
|
|
|
|
[fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
|
|
|
|
[heading A Quick Overview of Parsing with __qi__]
|
|
|
|
__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
|
|
target grammar (essentially a format description of the input data to read).
|
|
|
|
A simple EBNF grammar snippet:
|
|
|
|
group ::= '(' expression ')'
|
|
factor ::= integer | group
|
|
term ::= factor (('*' factor) | ('/' factor))*
|
|
expression ::= term (('+' term) | ('-' term))*
|
|
|
|
is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
|
|
code snippet:
|
|
|
|
group = '(' >> expression >> ')';
|
|
factor = integer | group;
|
|
term = factor >> *(('*' >> factor) | ('/' >> factor));
|
|
expression = term >> *(('+' >> term) | ('-' >> term));
|
|
|
|
Through the magic of expression templates, this is perfectly valid and
|
|
executable C++ code. The production rule `expression` is, in fact, an object that
|
|
has a member function `parse` that does the work given a source code written in
|
|
the grammar that we have just declared. Yes, it's a calculator. We shall
|
|
simplify for now by skipping the type declarations and the definition of the
|
|
rule `integer` invoked by `factor`. Now, the production rule `expression` in our
|
|
grammar specification, traditionally called the `start` symbol, can recognize
|
|
inputs such as:
|
|
|
|
12345
|
|
-12345
|
|
+12345
|
|
1 + 2
|
|
1 * 2
|
|
1/2 + 3/4
|
|
1 + 2 + 3 + 4
|
|
1 * 2 * 3 * 4
|
|
(1 + 2) * (3 + 4)
|
|
(-1 + 2) * (3 + -4)
|
|
1 + ((6 * 200) - 20) / 6
|
|
(1 + (2 + (3 + (4 + 5))))
|
|
|
|
Certainly we have modified the original EBNF syntax. This is done to
|
|
conform to C++ syntax rules. Most notably we see the abundance of
|
|
shift >> operators. Since there are no 'empty' operators in C++, it is
|
|
simply not possible to write something like:
|
|
|
|
a b
|
|
|
|
as seen in math syntax, for example, to mean multiplication or, in our case,
|
|
as seen in EBNF syntax to mean sequencing (b should follow a). __qi__
|
|
uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
|
|
with arrows pointing to the right, to mean "is followed by". Thus we write:
|
|
|
|
a >> b
|
|
|
|
The alternative operator `|` and the parentheses `()` remain as is. The
|
|
assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
|
|
the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a
|
|
prefix. Instead of:
|
|
|
|
a* //... in EBNF syntax,
|
|
|
|
we write:
|
|
|
|
*a //... in Spirit.
|
|
|
|
since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
|
|
rule with the ubiquitous semi-colon, `;`.
|
|
|
|
|
|
[heading A Quick Overview of Output Generation with __karma__]
|
|
|
|
Spirit not only allows you to describe the structure of the input, it also enables
|
|
the specification of the output format for your data in a similar way, and based
|
|
on a single syntax and compatible semantics.
|
|
|
|
Let's assume we need to generate a textual representation from a simple data
|
|
structure such as a `std::vector<int>`. Conventional code probably would look like:
|
|
|
|
std::vector<int> v (initialize_and_fill());
|
|
std::vector<int>::iterator end = v.end();
|
|
for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
|
|
std::cout << *it << std::endl;
|
|
|
|
which is not very flexible and quite difficult to maintain when it comes to
|
|
changing the required output format. Spirit's sublibrary /Karma/ allows you to
|
|
specify output formats for arbitrary data structures in a very flexible way.
|
|
The following snippet is the /Karma/ format description used to create the
|
|
same output as the traditional code above:
|
|
|
|
*(int_ << eol)
|
|
|
|
Here are some more examples of format descriptions for different output
|
|
representations of the same `std::vector<int>`:
|
|
|
|
[table Different output formats for `std::vector<int>`
|
|
[ [Format] [Example] [Description] ]
|
|
[ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
|
|
[ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma separated list of integers in parenthesis] ]
|
|
[ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
|
|
[ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
|
|
]
|
|
|
|
We will see later in this documentation how it is possible to avoid printing
|
|
the trailing `','`.
|
|
|
|
Overall, the syntax is similar to __qi__ with the exception that we use the `<<`
|
|
operator for output concatenation. This should be easy to understand as it
|
|
follows the conventions used in the Standard's I/O streams.
|
|
|
|
Another important feature of __karma__ allows you to fully decouple the data
|
|
type from the output format. You can use the same output format with different
|
|
data types as long as these conform conceptually. The next table gives some
|
|
related examples.
|
|
|
|
[table Different data types usable with the output format `*(int_ << eol)`
|
|
[ [Data type] [Description] ]
|
|
[ [`int i[4]`] [C style arrays] ]
|
|
[ [`std::vector<int>`] [Standard vector] ]
|
|
[ [`std::list<int>`] [Standard list] ]
|
|
[ [`boost::array<long, 20>`] [Boost array] ]
|
|
]
|
|
|
|
[endsect]
|