d4c0179fa6
After #456 `parse_rule` is always called with value of `rule::attribute_type&` type. By removing attribute type deduction we also will turn any possible bug in `transform_attribute` from a linkage to compile error. Fixes #454
279 lines
9.5 KiB
Plaintext
279 lines
9.5 KiB
Plaintext
[/==============================================================================
|
|
Copyright (C) 2001-2015 Joel de Guzman
|
|
Copyright (C) 2001-2011 Hartmut Kaiser
|
|
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
===============================================================================/]
|
|
|
|
[section:roman Roman Numerals]
|
|
|
|
This example demonstrates:
|
|
|
|
* The Symbol Table
|
|
* Non-terminal rules
|
|
|
|
[heading Symbol Table]
|
|
|
|
The symbol table holds a dictionary of symbols where each symbol is a sequence
|
|
of characters. The template class, can work efficiently with 8, 16, 32 and even
|
|
64 bit characters. Mutable data of type T are associated with each symbol.
|
|
|
|
Traditionally, symbol table management is maintained separately outside the BNF
|
|
grammar through semantic actions. Contrary to standard practice, the Spirit
|
|
symbol table class `symbols` is a parser. An object of which may be used
|
|
anywhere in the EBNF grammar specification. It is an example of a dynamic
|
|
parser. A dynamic parser is characterized by its ability to modify its behavior
|
|
at run time. Initially, an empty symbols object matches nothing. At any time,
|
|
symbols may be added or removed, thus, dynamically altering its behavior.
|
|
|
|
Each entry in a symbol table may have an associated mutable data slot. In this
|
|
regard, one can view the symbol table as an associative container (or map) of
|
|
key-value pairs where the keys are strings.
|
|
|
|
The symbols class expects one template parameter to specify the data type
|
|
associated with each symbol: its attribute. There are a couple of
|
|
namespaces in X3 where you can find various versions of the symbols class
|
|
for handling different character encoding including ascii, standard,
|
|
standard_wide, iso8859_1, and unicode. The default symbol parser type in
|
|
the main x3 namespace is standard.
|
|
|
|
Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
|
|
mind that the data associated with each slot is the parser's attribute (which is
|
|
passed to attached semantic actions).
|
|
|
|
struct hundreds_ : x3::symbols<unsigned>
|
|
{
|
|
hundreds_()
|
|
{
|
|
add
|
|
("C" , 100)
|
|
("CC" , 200)
|
|
("CCC" , 300)
|
|
("CD" , 400)
|
|
("D" , 500)
|
|
("DC" , 600)
|
|
("DCC" , 700)
|
|
("DCCC" , 800)
|
|
("CM" , 900)
|
|
;
|
|
}
|
|
|
|
} hundreds;
|
|
|
|
Here's a parser for roman tens (10..90):
|
|
|
|
struct tens_ : x3::symbols<unsigned>
|
|
{
|
|
tens_()
|
|
{
|
|
add
|
|
("X" , 10)
|
|
("XX" , 20)
|
|
("XXX" , 30)
|
|
("XL" , 40)
|
|
("L" , 50)
|
|
("LX" , 60)
|
|
("LXX" , 70)
|
|
("LXXX" , 80)
|
|
("XC" , 90)
|
|
;
|
|
}
|
|
|
|
} tens;
|
|
|
|
and, finally, for ones (1..9):
|
|
|
|
struct ones_ : x3::symbols<unsigned>
|
|
{
|
|
ones_()
|
|
{
|
|
add
|
|
("I" , 1)
|
|
("II" , 2)
|
|
("III" , 3)
|
|
("IV" , 4)
|
|
("V" , 5)
|
|
("VI" , 6)
|
|
("VII" , 7)
|
|
("VIII" , 8)
|
|
("IX" , 9)
|
|
;
|
|
}
|
|
|
|
} ones;
|
|
|
|
Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
|
|
They are all parsers.
|
|
|
|
[heading Rules]
|
|
|
|
Up until now, we've been inlining our parser expressions, passing them directly
|
|
to the `phrase_parse` function. The expression evaluates into a temporary,
|
|
unnamed parser which is passed into the `phrase_parse` function, used, and then
|
|
destroyed. This is fine for small parsers. When the expressions get complicated,
|
|
you'd want to break the expressions into smaller easier-to-understand pieces,
|
|
name them, and refer to them from other parser expressions by name.
|
|
|
|
A parser expression can be assigned to what is called a "rule". There are
|
|
various ways to declare rules. The simplest form is:
|
|
|
|
rule<ID> const r = "some-name";
|
|
|
|
[heading Rule ID]
|
|
|
|
At the very least, the rule needs an identification tag. This ID can be any
|
|
struct or class type and need not be defined. Forward declaration would suffice.
|
|
In subsequent tutorials, we will see that the rule ID can have additional
|
|
functionalities for error handling and annotation.
|
|
|
|
[heading Rule Name]
|
|
|
|
The name is optional, but is useful for debugging and error handling, as
|
|
we'll see later. Notice that rule `r` is declared `const`. Rules are
|
|
immutable and are best declared as `const`. Rules are lightweight and can be
|
|
passed around by value. Its only member variable is a `std::string`: its
|
|
name.
|
|
|
|
[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
|
|
`parse` without having to specify the skip parser]
|
|
|
|
[heading Rule Attributes]
|
|
|
|
For our next example, there's one more rule form you should know about:
|
|
|
|
rule<ID, Attribute> const r = "some-name";
|
|
|
|
The Attribute parameter specifies the attribute type of the rule. You've seen
|
|
that our parsers can have an attribute. Recall that the `double_` parser has
|
|
an attribute of `double`. To be precise, these are /synthesized/ attributes.
|
|
The parser "synthesizes" the attribute value. If the parser is a function,
|
|
think of them as function return values.
|
|
|
|
[heading Rule Definition]
|
|
|
|
After having declared a rule, you need a definition for the rule. Example:
|
|
|
|
auto const r_def = double_ >> *(',' >> double_);
|
|
|
|
By convention, rule definitions have a _def suffix. Like rules, rule definitions
|
|
are immutable and are best declared as `const`.
|
|
|
|
[#__tutorial_spirit_define__]
|
|
[heading BOOST_SPIRIT_DEFINE]
|
|
|
|
Now that we have a rule and its definition, we tie the rule with a rule
|
|
definition using the `BOOST_SPIRIT_DEFINE` macro:
|
|
|
|
BOOST_SPIRIT_DEFINE(r);
|
|
|
|
Behind the scenes, what's actually happening is that we are defining a `parse_rule`
|
|
function in the client namespace that tells X3 how to invoke the rule. For example,
|
|
given a rule named `my_rule` and a corresponding definition named `my_rule_def`,
|
|
`BOOST_SPIRIT_DEFINE(my_rule)` expands to this code:
|
|
|
|
template <typename Iterator, typename Context>
|
|
inline bool parse_rule(
|
|
decltype(my_rule)
|
|
, Iterator& first, Iterator const& last
|
|
, Context const& context, decltype(my_rule)::attribute_type& attr)
|
|
{
|
|
using boost::spirit::x3::unused;
|
|
static auto const def_ = my_rule_def;
|
|
return def_.parse(first, last, context, unused, attr);
|
|
}
|
|
|
|
And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an
|
|
overloaded `parse_rule` function. At parse time, Spirit X3 recursively calls
|
|
the appropriate `parse_rule` function.
|
|
|
|
[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
|
|
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]
|
|
|
|
[heading Grammars]
|
|
|
|
Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
|
|
entity for encapsulating rules. In X3, a grammar is simply a logical group of
|
|
rules that work together, typically with a single top-level start rule which
|
|
serves as the main entry point. X3 grammars are grouped using namespaces.
|
|
The roman numeral grammar is a very nice and simple example of a grammar:
|
|
|
|
namespace parser
|
|
{
|
|
using x3::eps;
|
|
using x3::lit;
|
|
using x3::_val;
|
|
using x3::_attr;
|
|
using ascii::char_;
|
|
|
|
auto set_zero = [&](auto& ctx){ _val(ctx) = 0; };
|
|
auto add1000 = [&](auto& ctx){ _val(ctx) += 1000; };
|
|
auto add = [&](auto& ctx){ _val(ctx) += _attr(ctx); };
|
|
|
|
x3::rule<class roman, unsigned> const roman = "roman";
|
|
|
|
auto const roman_def =
|
|
eps [set_zero]
|
|
>>
|
|
(
|
|
-(+lit('M') [add1000])
|
|
>> -hundreds [add]
|
|
>> -tens [add]
|
|
>> -ones [add]
|
|
)
|
|
;
|
|
|
|
BOOST_SPIRIT_DEFINE(roman);
|
|
}
|
|
|
|
Things to take notice of:
|
|
|
|
* The start rule's attribute is `unsigned`.
|
|
|
|
* `_val(ctx)` gets a reference to the rule's synthesized attribute.
|
|
|
|
* `_attr(ctx)` gets a reference to the parser's synthesized attribute.
|
|
|
|
* `eps` is a special spirit parser that consumes no input but is always
|
|
successful. We use it to initialize the rule's synthesized
|
|
attribute, to zero before anything else. The actual parser starts at
|
|
`+lit('M')`, parsing roman thousands. Using `eps` this way is good
|
|
for doing pre and post initializations.
|
|
|
|
* The rule `roman` and the definition `roman_def` are const objects.
|
|
|
|
* The rule's ID is `class roman`. C++ allows you to declare the class
|
|
in the actual template declaration as you can see in the example:
|
|
|
|
x3::rule<class roman, unsigned> const roman = "roman";
|
|
|
|
[heading Let's Parse!]
|
|
|
|
bool r = parse(iter, end, roman, result);
|
|
|
|
if (r && iter == end)
|
|
{
|
|
std::cout << "-------------------------\n";
|
|
std::cout << "Parsing succeeded\n";
|
|
std::cout << "result = " << result << std::endl;
|
|
std::cout << "-------------------------\n";
|
|
}
|
|
else
|
|
{
|
|
std::string rest(iter, end);
|
|
std::cout << "-------------------------\n";
|
|
std::cout << "Parsing failed\n";
|
|
std::cout << "stopped at: \": " << rest << "\"\n";
|
|
std::cout << "-------------------------\n";
|
|
}
|
|
|
|
`roman` is our roman numeral parser. This time around we are using the
|
|
no-skipping version of the parse functions. We do not want to skip any spaces!
|
|
We are also passing in an attribute, `unsigned result`, which will receive the
|
|
parsed value.
|
|
|
|
The full cpp file for this example can be found here:
|
|
[@../../../example/x3/roman.cpp roman.cpp]
|
|
|
|
[endsect]
|