[/==============================================================================
    Copyright (C) 2001-2018 Joel de Guzman

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

    I would like to thank Rainbowverse, llc (https://primeorbial.com/)
    for sponsoring this work and donating it to the community.
===============================================================================/]

[section:annotation Annotations - Decorating the ASTs]

As a prerequisite to understanding this tutorial, please review the previous
[tutorial_employee employee example]. This example builds on top of that
example.

Stop and think about it... We were actually generating ASTs (abstract syntax
trees) in our previous examples. We parsed a single structure and generated
an in-memory representation of it in the form of a struct: the struct
employee. If we changed the implementation to parse one or more employees,
the result would be a std::vector<employee>. We can go on and add more
hierarchy: teams, departments, corporations, etc. We can have an AST
representation of it all.

This example shows how to annotate the AST with the iterator positions for
access to the source code when post processing using a client supplied
`on_success` handler. The example will show how to get the position in the
input source stream that corresponds to a given element in the AST.

In addition, this example shows how to "inject" client data, using the
"with" directive, that the `on_success` handler can access as it is called
within the parse traversal through the parser's context.

The full cpp file for this example can be found here:
[@../../../example/x3/annotation.cpp annotation.cpp]

[heading The AST]

First, we'll update our previous employee struct, this time separating the
person into its own struct. So now, we have two structs, the `person` and the
`employee`. Note, too, that `person` and `employee` now inherit from
`x3::position_tagged`, which provides positional information that we can use
to tell the AST's position in the input stream anytime.

    namespace client { namespace ast
    {
        struct person : x3::position_tagged
        {
            person(
                std::string const& first_name = ""
              , std::string const& last_name = ""
            )
              : first_name(first_name)
              , last_name(last_name)
            {}

            std::string first_name, last_name;
        };

        struct employee : x3::position_tagged
        {
            int age;
            person who;
            double salary;
        };
    }}

Like before, we need to tell __fusion__ about our structs to make them
first-class fusion citizens that the grammar can utilize:

    BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
        first_name, last_name
    )

    BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
        age, who, salary
    )

[heading x3::position_cache]

Before we proceed, let me introduce a helper class called the
`position_cache`. It is a simple class that collects iterator ranges that
point to where each element in the AST is located in the input stream. Given
an AST, you can query the `position_cache` about the AST's position. For example:

    auto pos = positions.position_of(my_ast);

Where `my_ast` is the AST and `positions` is the `position_cache`.
`position_of` returns an iterator range that points to the start and end
(`pos.begin()` and `pos.end()`) positions where the AST was parsed from.
`positions.begin()` and `positions.end()` point to the start and end of the
entire input stream.

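To make the bookkeeping concrete, here is a minimal, standard-library-only
sketch of the idea behind such a cache. It is a simplified stand-in written
for illustration, not X3's actual implementation: the `sketch` namespace, the
`tagged` base, the `range` struct and the `annotated_text` helper are all
hypothetical names, and the real `x3::position_cache` may differ in detail.
Each annotated node stores two indices into a container of iterators, and
`position_of` maps those indices back to an iterator pair:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch
{
    // Hypothetical stand-in for x3::position_tagged: a node remembers
    // where its start and end iterators live inside the cache.
    struct tagged
    {
        int id_first = -1;
        int id_last = -1;
    };

    // Hypothetical stand-in for the iterator range returned by position_of.
    template <typename Iterator>
    struct range
    {
        Iterator first, last;
        Iterator begin() const { return first; }
        Iterator end() const { return last; }
    };

    template <typename Iterator>
    class position_cache
    {
    public:
        // Record where `node` was parsed from.
        void annotate(tagged& node, Iterator first, Iterator last)
        {
            node.id_first = static_cast<int>(positions_.size());
            positions_.push_back(first);
            node.id_last = static_cast<int>(positions_.size());
            positions_.push_back(last);
        }

        // Look up the recorded range; at() throws if the node was never annotated.
        range<Iterator> position_of(tagged const& node) const
        {
            return { positions_.at(node.id_first), positions_.at(node.id_last) };
        }

    private:
        std::vector<Iterator> positions_;
    };

    struct person : tagged { std::string first_name, last_name; };

    // Annotate a node with the sub-range [from, to) of `input` and
    // return the text that the recorded position covers.
    inline std::string annotated_text(std::string const& input,
                                      std::size_t from, std::size_t to)
    {
        assert(from <= to && to <= input.size());
        position_cache<std::string::const_iterator> positions;
        person p;
        positions.annotate(p, input.begin() + from, input.begin() + to);
        auto pos = positions.position_of(p);
        return std::string(pos.begin(), pos.end());
    }
}
```

The point of the indirection is that the AST node only carries two small
integers, while the iterators themselves stay in the cache.
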
[heading on_success]

The `on_success` handler gives you everything you want from semantic actions
without the visual clutter. Declarative code can and should be free from
imperative code. `on_success` as a concept and mechanism is an important
departure from how things are done in Spirit's previous version: Qi.

As demonstrated in the previous [tutorial_employee employee example], the
preferred way to extract data from an input source is by having the parser
collect the data for us into C++ structs as it traverses the input stream.
Ideally, Spirit X3 grammars are fully attributed and declared in such a way
that you do not have to add any imperative code and there should be no need
for semantic actions at all. The parser simply works as declared and you get
your data back as a result.

However, there are certain cases where there's no way to avoid introducing
imperative code. But semantic actions mess up our clean declarative grammars.
If we care to keep our code clean, `on_success` handlers are alternative
callback hooks to client code that are executed by the parser after a
successful parse without polluting the grammar. Like semantic actions,
`on_success` handlers have access to the AST, the iterators, and context.
But, unlike semantic actions, `on_success` handlers are cleanly separated
from the actual grammar.

[heading Annotation Handler]

As discussed, we annotate the AST with its position in the input stream with
our `on_success` handler:

    // tag used to get the position cache from the context
    struct position_cache_tag;

    struct annotate_position
    {
        template <typename T, typename Iterator, typename Context>
        inline void on_success(Iterator const& first, Iterator const& last
        , T& ast, Context const& context)
        {
            auto& position_cache = x3::get<position_cache_tag>(context).get();
            position_cache.annotate(ast, first, last);
        }
    };

`position_cache_tag` is a special tag we will use to get a reference to the
actual `position_cache`, client data that we will inject at the very start,
when we call parse. More on that later.

Our `on_success` handler gets a reference to the actual `position_cache` and
calls its `annotate` member function, passing in the AST and the iterators.
`position_cache.annotate(ast, first, last)` annotates the AST with
information required by `x3::position_tagged`.

[heading The Parser]

Now we'll write a parser for our employee. To simplify, inputs will be of the
form:

    { age, "forename", "surname", salary }

[#__tutorial_annotated_employee_parser__]
Here we go:

    namespace parser
    {
        using x3::int_;
        using x3::double_;
        using x3::lexeme;
        using ascii::char_;

        struct quoted_string_class;
        struct person_class;
        struct employee_class;

        x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
        x3::rule<person_class, ast::person> const person = "person";
        x3::rule<employee_class, ast::employee> const employee = "employee";

        auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
        auto const person_def = quoted_string >> ',' >> quoted_string;

        auto const employee_def =
                '{'
            >>  int_ >> ','
            >>  person >> ','
            >>  double_
            >>  '}'
            ;

        auto const employees = employee >> *(',' >> employee);

        BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
    }

[heading Rule Declarations]

    struct quoted_string_class;
    struct person_class;
    struct employee_class;

    x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
    x3::rule<person_class, ast::person> const person = "person";
    x3::rule<employee_class, ast::employee> const employee = "employee";

Go back and review the original [link __tutorial_employee_parser__ employee parser].
What has changed?

* We split the single employee rule into three smaller rules: `quoted_string`,
  `person` and `employee`.
* We're using forward declared rule classes: `quoted_string_class`, `person_class`,
  and `employee_class`.

[heading Rule Classes]

Like before, in this example, the rule classes, `quoted_string_class`,
`person_class`, and `employee_class` provide statically known IDs for the
rules required by X3 to perform its tasks. In addition to that, the rule
class can also be extended to have some user-defined customization hooks that
are called:

* On success: After a rule successfully parses an input.
* On error: After a rule fails to parse.

We enable these hooks by deriving the rule class from a client-supplied
handler, such as our `annotate_position` handler above:

    struct person_class : annotate_position {};
    struct employee_class : annotate_position {};

The code above tells X3 to check whether the rule class has an `on_success`
or `on_error` member function and to call it when the corresponding event
occurs.

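How can a library call a member function only on the rule classes that have
one? The following standard-library-only sketch shows one common technique,
the detection idiom combined with tag dispatch. It is an illustration of the
general approach and not X3's actual machinery, which is more involved and
also covers `on_error`. All names in the `sketch` namespace
(`count_successes`, `plain_class`, `hooked_class`, `notify`,
`successes_after_match`) are hypothetical, and the handler signature is
simplified to take a counter instead of an AST and a context:

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch
{
    // A client-supplied handler, playing the role of annotate_position.
    struct count_successes
    {
        template <typename Iterator>
        void on_success(Iterator const&, Iterator const&, int& counter) const
        {
            ++counter;
        }
    };

    struct plain_class {};                      // rule class without hooks
    struct hooked_class : count_successes {};   // inherits on_success

    // Primary template: by default, ID has no usable on_success.
    template <typename ID, typename Iterator, typename = void>
    struct has_on_success : std::false_type {};

    // Chosen only if the on_success call expression is well-formed.
    template <typename ID, typename Iterator>
    struct has_on_success<ID, Iterator,
        decltype(void(std::declval<ID>().on_success(
            std::declval<Iterator const&>(),
            std::declval<Iterator const&>(),
            std::declval<int&>())))>
      : std::true_type {};

    // Tag dispatch: call the hook if present...
    template <typename ID, typename Iterator>
    void notify(Iterator const& first, Iterator const& last, int& counter,
                std::true_type)
    {
        ID().on_success(first, last, counter);
    }

    // ...and do nothing otherwise.
    template <typename ID, typename Iterator>
    void notify(Iterator const&, Iterator const&, int&, std::false_type)
    {
    }

    // What a toy "rule" might do right after its body has matched.
    template <typename ID>
    int successes_after_match(std::string const& input)
    {
        int counter = 0;
        notify<ID>(input.begin(), input.end(), counter,
                   has_on_success<ID, std::string::const_iterator>());
        return counter;
    }
}
```

Because the check happens at compile time, a rule class without hooks costs
nothing at run time in this sketch.
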
[#__tutorial_with_directive__]
[heading The with Directive]

For any parser `p`, one can inject supplementary data that semantic actions
and handlers can access later on when they are called. The general syntax is:

    with<tag>(data)[p]

For our particular example, we use it to inject the `position_cache` into the
parse so that our `annotate_position` `on_success` handler can access it:

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
`with` is a very lightweight operation. It is possible to inject as much data
as you want, even with multiple `with` directives:

    with<tag1>(data1)
    [
        with<tag2>(data2)[p]
    ]

Multiple `with` directives can (perhaps not obviously) be injected from
outside the called function. Here's an outline:

    template <typename Parser>
    void bar(Parser const& p)
    {
        // Inject data2
        auto const parser = with<tag2>(data2)[p];
        x3::parse(first, last, parser);
    }

    void foo()
    {
        // Inject data1
        auto const parser = with<tag1>(data1)[my_parser];
        bar(parser);
    }

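One way to picture why nesting `with` directives is cheap is to model the
context as a compile-time chain of tagged references. The sketch below uses
only the standard library and is a simplified model, not X3's real context
type. Everything in the `sketch` namespace (`context`, `inject`, `lookup`,
`get`, `use_both`) is a hypothetical name chosen for this illustration:

```cpp
#include <string>

namespace sketch
{
    struct empty_context {};

    // One link in the chain: a tag type, a reference to the client data,
    // and a reference to the enclosing (outer) context.
    template <typename Tag, typename T, typename Next>
    struct context
    {
        T& value;
        Next const& next;
    };

    // Wrap an existing context with one more tagged value, in the
    // spirit of with<Tag>(data)[p].
    template <typename Tag, typename T, typename Next>
    context<Tag, T, Next> inject(T& value, Next const& next)
    {
        return context<Tag, T, Next>{ value, next };
    }

    // Tag lookup, resolved entirely at compile time.
    template <typename Tag, typename Context>
    struct lookup;

    template <typename Tag, typename T, typename Next>
    struct lookup<Tag, context<Tag, T, Next>>       // tag matches this link
    {
        using type = T;
        static T& get(context<Tag, T, Next> const& ctx) { return ctx.value; }
    };

    template <typename Tag, typename Other, typename T, typename Next>
    struct lookup<Tag, context<Other, T, Next>>     // keep searching outwards
    {
        using type = typename lookup<Tag, Next>::type;
        static type& get(context<Other, T, Next> const& ctx)
        {
            return lookup<Tag, Next>::get(ctx.next);
        }
    };

    template <typename Tag, typename Context>
    typename lookup<Tag, Context>::type& get(Context const& ctx)
    {
        return lookup<Tag, Context>::get(ctx);
    }

    struct tag1;
    struct tag2;

    // Mirrors the foo/bar outline: the outer layer injects data1, the
    // inner layer adds data2, and a handler can reach both by tag.
    inline std::string use_both(int& data1, std::string& data2)
    {
        empty_context const root{};
        auto const outer = inject<tag1>(data1, root);
        auto const inner = inject<tag2>(data2, outer);
        sketch::get<tag1>(inner) += 1;      // modifies the caller's data1
        return sketch::get<tag2>(inner);
    }
}
```

In this model each layer stores just two references, and the lookup by tag
compiles down to direct member access.
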
[heading Let's Parse]

Now we have the complete parse mechanism with support for annotations:

    using iterator_type = std::string::const_iterator;
    using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

    std::vector<client::ast::employee>
    parse(std::string const& input, position_cache& positions)
    {
        using boost::spirit::x3::ascii::space;

        std::vector<client::ast::employee> ast;
        iterator_type iter = input.begin();
        iterator_type const end = input.end();

        using boost::spirit::x3::with;

        // Our parser
        using client::parser::employees;
        using client::parser::position_cache_tag;

        auto const parser =
            // we pass our position_cache to the parser so we can access
            // it later in our on_success handlers
            with<position_cache_tag>(std::ref(positions))
            [
                employees
            ];

        bool r = phrase_parse(iter, end, parser, space, ast);

        // ... Some error checking here

        return ast;
    }

Let's walk through the code.

First, we have two type aliases: `iterator_type`, the iterator type we are
using for the parser, and `position_cache`. The latter is an instantiation of
a template that accepts the type of container it will hold, in this case a
`std::vector<iterator_type>`.

The main parse function accepts an input (a `std::string`) and a reference to
a `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.

Inside the parse function, we first create an AST where parsed data will be
stored:

    std::vector<client::ast::employee> ast;

Then finally, we create a parser, injecting a reference to the `position_cache`,
and call `phrase_parse`:

    using client::parser::employees;
    using client::parser::position_cache_tag;

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

    bool r = phrase_parse(iter, end, parser, space, ast);

On successful parse, the AST, `ast`, will contain the actual parsed data.

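The error checking elided in the listing above usually needs to distinguish
two cases: the parser did not match at all, or it matched but stopped before
the end of the input. The helper below is one possible shape for that check,
written with the standard library only. `require_full_match` and `check` are
hypothetical names invented for this sketch, and throwing
`std::runtime_error` is just one of several reasonable ways to report the
problem:

```cpp
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>

namespace sketch
{
    // Hypothetical helper: throws unless the parse succeeded AND consumed
    // all of the input. `r` is the parse result and `iter` is the
    // iterator that the parse call advanced.
    template <typename Iterator>
    void require_full_match(bool r, Iterator begin, Iterator iter, Iterator end)
    {
        if (!r)
            throw std::runtime_error("parse failed");
        if (iter != end)
            throw std::runtime_error("unparsed input left at offset "
                + std::to_string(std::distance(begin, iter)));
    }

    // Small driver for demonstration: returns the error message, or an
    // empty string when the whole input was consumed.
    inline std::string check(bool r, std::string const& input, std::size_t consumed)
    {
        try
        {
            require_full_match(r, input.begin(), input.begin() + consumed, input.end());
            return "";
        }
        catch (std::runtime_error const& e)
        {
            return e.what();
        }
    }
}
```

Inside the tutorial's `parse` function, the corresponding call would be
`sketch::require_full_match(r, input.begin(), iter, end)`.
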
[heading Getting The Source Positions]

Now that we have our main parse function, let's have an example source file to
parse and show how we can obtain the position of an AST element, returned
after a successful parse.

Given this input:

    std::string input = R"(
    {
        23,
        "Amanda",
        "Stefanski",
        1000.99
    },
    {
        35,
        "Angie",
        "Chilcote",
        2000.99
    },
    {
        43,
        "Dannie",
        "Dillinger",
        3000.99
    },
    {
        22,
        "Dorene",
        "Dole",
        2500.99
    },
    {
        38,
        "Rossana",
        "Rafferty",
        5000.99
    }
    )";

We call our parse function after instantiating a `position_cache` object that
will hold the source stream positions:

    position_cache positions{input.begin(), input.end()};
    auto ast = parse(input, positions);

We now have an AST, `ast`, that contains the parsed results. Let us get the
source positions of the 2nd employee:

    auto pos = positions.position_of(ast[1]); // zero based of course!

`pos` is an iterator range that contains iterators to the start and end of
`ast[1]` in the input stream.

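A typical use of such a range is to turn it into something a human can read,
for example a line and column for a diagnostic message. The helpers below are
a minimal sketch using only the standard library. `line_of` and `column_of`
are hypothetical names, not part of X3, and they assume lines are separated
by `'\n'`:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

namespace sketch
{
    // 1-based line number of `where`, counting newlines from `begin`.
    template <typename Iterator>
    std::size_t line_of(Iterator begin, Iterator where)
    {
        return 1 + static_cast<std::size_t>(std::count(begin, where, '\n'));
    }

    // 1-based column of `where` within its line: walk backwards until
    // the previous newline or the start of the input.
    template <typename Iterator>
    std::size_t column_of(Iterator begin, Iterator where)
    {
        std::size_t column = 1;
        while (where != begin)
        {
            Iterator prev = where;
            --prev;
            if (*prev == '\n')
                break;
            where = prev;
            ++column;
        }
        return column;
    }
}
```

With the `pos` range from above, `sketch::line_of(input.cbegin(), pos.begin())`
would give the line on which the 2nd employee starts, and
`std::string(pos.begin(), pos.end())` would give the source text it was
parsed from. Note the use of `cbegin()`: both arguments must have the same
iterator type, and `pos` holds `std::string::const_iterator`s.
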
[heading Config]

If you read the previous [tutorial_minimal Program Structure] tutorial, where
we separated various logical modules of the parser into separate cpp and
header files, you may be wondering how to provide the context configuration
information (see [link tutorial_configuration Config Section]). We need to
supplement the context like this:

    using phrase_context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;

    typedef x3::context<
        position_cache_tag
      , std::reference_wrapper<position_cache>
      , phrase_context_type>
    context_type;

[endsect]