[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Employee - Parsing into structs]

It's a common question on the __spirit_list__: How do I parse and place
the results into a C++ struct? Of course, at this point, you already
know various ways to do it using semantic actions. There are many ways
to skin a cat. Spirit2, being fully attributed, makes it even easier.
The next example demonstrates some features of Spirit2 that make this
easy. In the process, you'll learn about:

* More about attributes
* Auto rules
* Some more built-in parsers
* Directives

[import ../../example/qi/employee.cpp]

First, let's create a struct representing an employee:

[tutorial_employee_struct]

Then, we need to tell __fusion__ about our employee struct to make it a first-class
fusion citizen that the grammar can utilize. If you don't know fusion yet,
it is a __boost__ library for working with heterogeneous collections of data,
commonly referred to as tuples. Spirit uses fusion extensively as part of its
infrastructure.

In fusion's view, a struct is just a form of a tuple. You can adapt any struct
to be a fully conforming fusion tuple:

[tutorial_employee_adapt_struct]

Now we'll write a parser for our employee. Inputs will be of the form:

    employee{ age, "surname", "forename", salary }

Here goes:

[tutorial_employee_parser]

The full cpp file for this example can be found here: [@../../example/qi/employee.cpp]

Let's walk through this one step at a time (not necessarily from top to bottom).

    template <typename Iterator>
    struct employee_parser : grammar<Iterator, employee(), space_type>

`employee_parser` is a grammar. Like before, we make it a template so that we can
reuse it for different iterator types. The grammar's signature is:

    employee()

meaning that the parser generates `employee` structs. `employee_parser` skips white
space using `space_type` as its skip parser.

    employee_parser() : employee_parser::base_type(start)

Initializes the base class with `start`, the rule where parsing begins.

    rule<Iterator, std::string(), space_type> quoted_string;
    rule<Iterator, employee(), space_type> start;

Declares two rules: `quoted_string` and `start`. `start` has the same template
parameters as the grammar itself. `quoted_string` has a `std::string` attribute.

[heading Lexeme]

    lexeme['"' >> +(char_ - '"') >> '"'];

`lexeme` inhibits space skipping from the opening bracket to the closing bracket.
The expression parses quoted strings.

    +(char_ - '"')

parses one or more chars, except the double quote. It stops when it sees
a double quote.

[heading Difference]

The expression:

    a - b

parses `a` but not `b`: it matches wherever `a` matches and `b` does not. Its
attribute is just `A`, the attribute of `a`. `b`'s attribute is ignored. Hence,
the attribute of:

    char_ - '"'

is just `char`.

[heading Plus]

    +a

is similar to the Kleene star. Rather than matching zero or more, `+a` matches one or more.
Like its relative, the Kleene star, its attribute is a `std::vector<A>`
where `A` is the attribute of `a`. So, putting all these together, the attribute
of

    +(char_ - '"')

is then:

    std::vector<char>

[heading Sequence Attribute]

Now what's the attribute of

    '"' >> +(char_ - '"') >> '"'

?

Well, typically, the attribute of:

    a >> b >> c

is:

    fusion::vector<A, B, C>

where `A` is the attribute of `a`, `B` is the attribute of `b` and `C` is the
attribute of `c`. What is `fusion::vector`? - a tuple.

[note If you don't know what I am talking about, see: [@http://tinyurl.com/6xun4j
Fusion Vector]. It might be a good idea to have a look into __fusion__ at this
point. You'll definitely see more of it in the coming pages.]

[heading Attribute Collapsing]

Some parsers, especially those very small literal parsers you see, like `'"'`,
do not have attributes.

Nodes without attributes are disregarded. In a sequence, like above, all nodes
with no attributes are filtered out of the `fusion::vector`. So, since `'"'` has
no attribute, and `+(char_ - '"')` has a `std::vector<char>` attribute, the
whole expression's attribute should have been:

    fusion::vector<std::vector<char> >

But wait, there's one more collapsing rule: If the attribute is a
single-element `fusion::vector`, the element is stripped naked from its container.
To make a long story short, the attribute of the expression:

    '"' >> +(char_ - '"') >> '"'

is:

    std::vector<char>

[heading Auto Rules]

It is typical to see rules like:

    r = p[_val = _1];

If you have a rule definition such as the above, where the attribute of the RHS
(right hand side) of the rule is compatible with the attribute of the LHS (left
hand side), then you can rewrite it as:

    r %= p;

The attribute of `p` automatically uses the attribute of `r`.

So, going back to our `quoted_string` rule:

    quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

is a simplified version of:

    quoted_string = lexeme['"' >> +(char_ - '"') >> '"'][_val = _1];

The attribute of the `quoted_string` rule: `std::string` *is compatible* with
the attribute of the RHS: `std::vector<char>`. The RHS extracts the parsed
attribute directly into the rule's attribute, in-situ.

[note `r %= p` and `r = p` are equivalent if there are no semantic actions
associated with `p`. ]

[heading Finally]

We're down to one rule, the start rule:

    start %=
        lit("employee")
        >> '{'
        >>  int_ >> ','
        >>  quoted_string >> ','
        >>  quoted_string >> ','
        >>  double_
        >>  '}'
        ;

Applying our collapsing rules above, the RHS has an attribute of:

    fusion::vector<int, std::string, std::string, double>

These nodes do not have an attribute:

* `lit("employee")`
* `'{'`
* `','`
* `'}'`

[note In case you are wondering, `lit("employee")` is the same as `"employee"`. We
had to wrap it inside `lit` because immediately after it is `>> '{'`. You can't
right-shift a `char[]` and a `char` - you know, C++ syntax rules.]

Recall that the attribute of `start` is the `employee` struct:

[tutorial_employee_struct]

Now everything is clear, right? The `struct employee` *IS* compatible with
`fusion::vector<int, std::string, std::string, double>`. So, the RHS of `start`
uses start's attribute (a `struct employee`) in-situ when it does its work.

[endsect]