590 lines
25 KiB
Plaintext
590 lines
25 KiB
Plaintext
[#manual]
|
|
[section User manual]
|
|
|
|
[section What is a parser]
|
|
|
|
See the [link parser parser] section of the [link reference reference] for the
|
|
explanation of what a parser is.
|
|
|
|
[section The input of the parsers]
|
|
|
|
Parsers take a [link string `string`] as input, which represents a string
|
|
for template metaprograms. For example the string `"Hello World!"` can be
|
|
defined the following way:
|
|
|
|
string<'H','e','l','l','o',' ','W','o','r','l','d','!'>
|
|
|
|
This syntax makes the input of the parsers difficult to read. Metaparse works
|
|
with compilers using C++98, but the input of the parsers has to be defined the
|
|
way it is described above.
|
|
|
|
Based on `constexpr`, a feature provided by C++11, Metaparse provides a macro,
|
|
[link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`] for defining strings:
|
|
|
|
BOOST_METAPARSE_STRING("Hello World!")
|
|
|
|
This defines a [link string `string`] as well, however, it is easier to
|
|
read. The maximum length of the string that can be defined this way is limited,
|
|
however, this limit is configurable. It is specified by the
|
|
`BOOST_METAPARSE_LIMIT_STRING_SIZE` macro.
|
|
|
|
[endsect]
|
|
|
|
[section Source positions]
|
|
|
|
A source position is described using a compile-time data structure. The
|
|
following functions can be used to query it:
|
|
|
|
* [link get_col `get_col`]
|
|
* [link get_line `get_line`]
|
|
|
|
The beginning of the input is [link start `start`] which requires
|
|
`<boost/metaparse/start.hpp>` to be included.
|
|
|
|
[endsect]
|
|
|
|
[section Error handling]
|
|
|
|
An error is described using a compile-time data structure. It contains
|
|
information about the source position where the error was detected and some
|
|
[link parsing_error_message description] about the error.
|
|
[link debug_parsing_error `debug_parsing_error`] can be used to display the
|
|
error message. Metaparse provides the
|
|
[link BOOST_METAPARSE_DEFINE_ERROR `BOOST_METAPARSE_DEFINE_ERROR`] macro for
|
|
defining simple [link parsing_error_message parsing error message]s.
|
|
|
|
[endsect]
|
|
|
|
[section Some examples of simple parsers]
|
|
|
|
* A parser that parses nothing and always succeeds is
|
|
[link return_ `return_`].
|
|
* A parser that always fails is [link fail `fail`].
|
|
* A parser that parses one character and returns the parsed character as the
|
|
result is [link one_char `one_char`].
|
|
|
|
[endsect]
|
|
|
|
[section Combining parsers]
|
|
|
|
Complex parsers can be built by combining simple parsers. The parser library
|
|
contains a number of parser combinators that build new parsers from already
|
|
existing ones.
|
|
|
|
For example
|
|
[link accept_when `accept_when`]`<Parser, Predicate, RejectErrorMsg>` is a
|
|
parser. It uses `Parser` to parse the input. When `Parser` rejects the input,
|
|
the combinator returns the error `Parser` failed with. When `Parser` is
|
|
successful, the combinator validates the result using `Predicate`. If the
|
|
predicate returns true, the combinator accepts the input, otherwise it generates
|
|
an error with the message `RejectErrorMsg`.
|
|
|
|
Having [link accept_when `accept_when`], [link one_char `one_char`] can be
|
|
used to build parsers that accept only digit characters, only whitespaces, etc.
|
|
For example [link digit `digit`] accepts only digit characters:
|
|
|
|
typedef
|
|
boost::metaparse::accept_when<
|
|
boost::metaparse::one_char,
|
|
boost::metaparse::util::is_digit,
|
|
boost::metaparse::errors::digit_expected
|
|
>
|
|
digit;
|
|
|
|
[endsect]
|
|
|
|
[section Sequence]
|
|
|
|
The result of a successful parsing is some value and the remaining string that
|
|
was not parsed. The remaining string can be processed by another parser. The
|
|
parser library provides a parser combinator, [link sequence `sequence`],
|
|
that takes a number of parsers as arguments and builds a new parser from them
|
|
that:
|
|
|
|
* Parses the input using the first parser
|
|
* If parsing succeeds, it parses the remaining string with the second parser
|
|
* It continues applying the parsers in order as long as they succeed
|
|
* If all of them succeed, it returns the list of results
|
|
* If any of the parsers fails, the combinator fails as well and returns the
|
|
error the first failing parser returned with
|
|
|
|
[endsect]
|
|
|
|
[#repetition]
|
|
[section Repetition]
|
|
|
|
It is a common thing to parse a list of things of unknown length. As an example
|
|
let's start with something simple: the text is a list of numbers. For example:
|
|
|
|
11 13 3 21
|
|
|
|
We want the result of parsing to be the sum of these values. Metaparse provides
|
|
the [link int_ `int_`] parser we can use to parse one of these numbers.
|
|
Metaparse provides the [link token `token`] combinator to consume the
|
|
whitespaces after the number. So the following parser parses one number and the
|
|
whitespaces after it:
|
|
|
|
using int_token = token<int_>;
|
|
|
|
The result of parsing is a boxed integer value: the value of the parsed number.
|
|
For example parsing
|
|
[link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`]`("13 ")` gives
|
|
`boost::mpl::int_<13>` as the result.
|
|
|
|
Our example input is a list of numbers. Each number can be parsed by
|
|
`int_token`:
|
|
|
|
[$images/metaparse/repeated_diag0.png [width 70%]]
|
|
|
|
This diagram shows how the repeated application of `int_token` can parse the
|
|
example input. Metaparse provides the [link repeated `repeated`] parser to
|
|
easily implement this. The result of parsing is a typelist: the list of the
|
|
individual numbers.
|
|
|
|
[$images/metaparse/repeated_diag1.png [width 70%]]
|
|
|
|
This diagram shows how [link repeated `repeated`]`<int_token>` works. It uses
|
|
the `int_token` parser repeatedly and builds a `boost::mpl::vector` from the
|
|
results it provides.
|
|
|
|
But we need the sum of these, so we need to summarise the result. We can do this
|
|
by wrapping our parser, [link repeated `repeated`]`<int_token>` with
|
|
[link transform `transform`]. That gives us the opportunity to specify a
|
|
function transforming this typelist to some other value - the sum of the
|
|
elements in our case. Initially let's ignore how to summarise the elements in
|
|
the vector. Let's assume that it can be implemented by a lambda expression and
|
|
use `boost::mpl::lambda<...>::type` representing that lambda expression. Here is
|
|
an example using [link transform `transform`] and this lambda expression:
|
|
|
|
using sum_parser =
|
|
transform<
|
|
repeated<int_token>,
|
|
boost::mpl::lambda<...>::type
|
|
>;
|
|
|
|
The [link transform `transform`]`<>` parser combinator wraps the
|
|
[link repeated `repeated`]`<int_token>` to build the parser we need. Here is a
|
|
diagram showing how it works:
|
|
|
|
[$images/metaparse/repeated_diag2.png [width 70%]]
|
|
|
|
As the diagram shows, the
|
|
[link transform `transform`]`<`[link repeated `repeated`]`<int_token>, ...>`
|
|
parser parses the input using [link repeated `repeated`]`<int_token>` and then
|
|
does some processing on the result of parsing.
|
|
|
|
Let's implement the missing lambda expression that tells
|
|
[link transform `transform`] how to change the result coming from
|
|
[link repeated `repeated`]`<int_token>`. We can summarise the numbers in a
|
|
typelist by using Boost.MPL's `fold` or `accumulate`. Here is an example doing
|
|
that:
|
|
|
|
using sum_op = mpl::lambda<mpl::plus<mpl::_1, mpl::_2>>::type;
|
|
|
|
using sum_parser =
|
|
transform<
|
|
repeated<int_token>,
|
|
mpl::lambda<
|
|
mpl::fold<mpl::_1, mpl::int_<0>, sum_op>
|
|
>::type
|
|
>;
|
|
|
|
Here is an extended version of the above diagram showing what happens here:
|
|
|
|
[$images/metaparse/repeated_diag3.png [width 70%]]
|
|
|
|
This example parses the input, builds the list of numbers and then loops over it
|
|
and summarises the values. It starts with the second argument of `fold`,
|
|
`int_<0>` and adds every item of the list of numbers (which is the result of
|
|
the parser [link repeated `repeated`]`<int_token>`) one by one.
|
|
|
|
[note
|
|
Note that [link transform `transform`] wraps another parser,
|
|
[link repeated `repeated`]`<int_token>` here. It parses the input with that
|
|
parser, gets the result of that parsing and changes that result.
|
|
[link transform `transform`] itself will be a parser returning that updated
|
|
result.
|
|
]
|
|
|
|
[#introducing-foldl]
|
|
[section Introducing foldl]
|
|
|
|
It works, however, this is rather inefficient: it has a loop parsing the
|
|
integers one by one, building a typelist and then it loops over this typelist to
|
|
summarise the result. Using template metaprograms in your applications can have
|
|
a serious impact on the compiler's memory usage and the speed of the
|
|
compilation, therefore I recommend being careful with these things.
|
|
|
|
Metaparse offers more efficient ways of achieving the same result. You don't
|
|
need two loops: you can merge them together and add every number to your summary
|
|
right after parsing it. Metaparse offers the [link foldl `foldl`] for this.
|
|
|
|
With [link foldl `foldl`] you specify:
|
|
|
|
* the parser to parse the individual elements of the list
|
|
(which is `int_token` in our example)
|
|
* the initial value used for folding (which is `int_<0>` in our example)
|
|
* the forward operation merging the sub-result we have so far and the value
|
|
coming from the last application of the parser (this was `sum_op` in our
|
|
example)
|
|
|
|
Our parser can be implemented this way:
|
|
|
|
using better_sum_parser = foldl<int_token, mpl::int_<0>, sum_op>;
|
|
|
|
As you can see the implementation of the parser is more compact.
|
|
Here is a diagram showing what happens when you use this parser to parse some
|
|
input:
|
|
|
|
[$images/metaparse/foldl_diag1.png [width 70%]]
|
|
|
|
As you can see, not only the implementation of the parser is more compact, but
|
|
it achieves the same result by doing less as well. It parses the input by
|
|
applying `int_token` repeatedly, just like the previous solution. But it
|
|
produces the final result without building a typelist as an internal step. Here
|
|
is how it works internally:
|
|
|
|
[$images/metaparse/foldl_diag2.png [width 70%]]
|
|
|
|
It summarises the results of the repeated `int_token` application using
|
|
`sum_op`. This implementation is more efficient. It accepts an empty string as a
|
|
valid input: the sum of it is `0`. It may be good for you, in which case you are
|
|
done. If you don't wan to accept it, you can use [link foldl1 `foldl1`] instead
|
|
of [link foldl `foldl`]. This is the same, but it rejects empty input.
|
|
(Metaparse offers [link repeated1 `repeated1`] as well if you choose the first
|
|
approach and would like to reject empty string)
|
|
|
|
[endsect]
|
|
|
|
[#introducing-foldr]
|
|
[section Introducing foldr]
|
|
|
|
[note
|
|
Note that if you are reading this manual for the first time, you probably want
|
|
to skip this section and proceed with
|
|
[link introducing-foldl_start_with_parser Introducing foldl_start_with_parser]
|
|
]
|
|
|
|
You might have noticed that Metaparse offers [link foldr `foldr`] as well. The
|
|
difference between [link foldl `foldl`] and [link foldr `foldr`] is the
|
|
direction in which the results are summarised. (`l` stands for ['from the Left]
|
|
and `r` stands for ['from the Right]) Here is a diagram showing how
|
|
`better_sum_parser` works if it is implemented using [link foldr `foldr`]:
|
|
|
|
[$images/metaparse/foldr_diag1.png [width 70%]]
|
|
|
|
As you can see this is very similar to using [link foldl `foldl`], but the
|
|
results coming out of the individual applications of `int_token` are summarised
|
|
in a right-to-left order. As `sum_op` is addition, it does not affect the end
|
|
result, but in other cases it might.
|
|
|
|
[note
|
|
Note that the implementation of [link foldl `foldl`] is more efficient than
|
|
[link foldr `foldr`]. Prefer [link foldl `foldl`] whenever possible.
|
|
]
|
|
|
|
As you might expect it, Metaparse offers [link foldr1 `foldr1`] as well, which
|
|
folds from the right and rejects empty input.
|
|
|
|
[endsect]
|
|
|
|
[#introducing-foldl_start_with_parser]
|
|
[section Introducing foldl_start_with_parser]
|
|
|
|
Let's change the grammar of our little language. Instead of a list of numbers,
|
|
let's expect numbers separated by a `+` symbol. Our example input becomes the
|
|
following:
|
|
|
|
BOOST_METAPARSE_STRING("11 + 13 + 3 + 21")
|
|
|
|
Parsing it with [link foldl `foldl`] or [link repeated `repeated`] is difficult:
|
|
there has to be a `+` symbol before every element ['except] the first one. None
|
|
of the already introduced repetition constructs offer a way of treating the
|
|
first element in a different way.
|
|
|
|
If we forget about the first number for a moment, the rest of the input is
|
|
`"+ 13 + 3 + 21"`. This can easily be parsed by [link foldl `foldl`] (or
|
|
[link repeated `repeated`]):
|
|
|
|
using plus_token = token<lit_c<'+'>>;
|
|
using plus_int = last_of<plus_token, int_token>;
|
|
|
|
using sum_parser2 = foldl<plus_int, int_<0>, sum_op>;
|
|
|
|
It uses `plus_int`, that is [link last_of `last_of`]`<plus_token, int_token>`
|
|
as the parser that is used repeatedly to get the numbers. It does the following:
|
|
|
|
* Uses `plus_token` to parse the `+` symbol and any whitespace that might follow
|
|
it.
|
|
* Uses then `int_token` to parse the number
|
|
* Combines the above two with [link last_of `last_of`] to use both parsers in
|
|
order and keep only the result of using the second one (the result of parsing
|
|
the `+` symbol is thrown away - we don't care about it).
|
|
|
|
This way [link last_of `last_of`]`<plus_token, int_token>` returns the value of
|
|
the number as the result of parsing, just like our previous parser, `int_token`
|
|
did. Because of this, it can be used as a drop-in replacement of `int_token` in
|
|
the previous example and we get a parser for our updated language. Or at least
|
|
for all number except the first one.
|
|
|
|
This [link foldl `foldl`] can not parse the first element, because it expects a
|
|
`+` symbol before every number. You might think of making the `+` symbol
|
|
optional in the above approach - don't do that. It makes the parser accept
|
|
`"11 + 13 3 21"` as well as the `+` symbol is now optional ['everywhere].
|
|
|
|
What you could do is parsing the first element with `int_token`, the rest of
|
|
the elements with the above [link foldl `foldl`]-based solution and add the
|
|
result of the two. This is left as an exercise to the reader.
|
|
|
|
Metaparse offers [link foldl_start_with_parser `foldl_start_with_parser`] to
|
|
implement this. [link foldl_start_with_parser `foldl_start_with_parser`] is the
|
|
same as [link foldl `foldl`]. The difference is that instead of an initial value
|
|
to combine the list elements with it takes an ['initial parser]:
|
|
|
|
using plus_token = token<lit_c<'+'>>;
|
|
using plus_int = last_of<plus_token, int_token>;
|
|
|
|
using sum_parser3 = foldl_start_with_parser<plus_int, int_token, sum_op>;
|
|
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] starts with applying
|
|
that initial parser and uses the result it returns as the initial value for
|
|
folding. It does the same as [link foldl `foldl`] after that. The following
|
|
diagram shows how it can be used to parse a list of numbers separated by `+`
|
|
symbols:
|
|
|
|
[$images/metaparse/foldl_start_with_parser_diag1.png [width 70%]]
|
|
|
|
As the diagram shows, it start parsing the list of numbers with `int_token`,
|
|
uses its value as the starting value for folding (earlier approaches were using
|
|
the value `int_<0>` as this starting value). Then it parses all elements of the
|
|
list by using `plus_int` multiple times.
|
|
|
|
[endsect]
|
|
|
|
[#introducing-foldr_start_with_parser]
|
|
[section Introducing foldr_start_with_parser]
|
|
|
|
[note
|
|
Note that if you are reading this manual for the first time, you probably want
|
|
to skip this section and try creating some parsers using
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] instead.
|
|
]
|
|
|
|
[@foldl_start_with_parser.hpp `foldl_start_with_parser`] has its
|
|
['from the right] pair,
|
|
[link foldr_start_with_parser `foldr_start_with_parser`]. It uses the same
|
|
elements as [link foldl_start_with_parser `foldl_start_with_parser`] but in a
|
|
different order. Here is a parser for our example language implemented with
|
|
[link foldr_start_with_parser `foldr_start_with_parser`]:
|
|
|
|
using plus_token = token<lit_c<'+'>>;
|
|
using int_plus = first_of<int_token, plus_token>;
|
|
|
|
using sum_parser4 = foldr_start_with_parser<int_plus, int_token, sum_op>;
|
|
|
|
Note that it uses `int_plus` instead of `plus_int`. This is because the parser
|
|
the initial value for folding comes from is used after `int_plus` has parsed the
|
|
input as many times as it could. It might sound strange for the first time, but
|
|
the following diagram should help you understand how it works:
|
|
|
|
[$images/metaparse/foldr_start_with_parser_diag1.png [width 70%]]
|
|
|
|
As you can see, it starts with the parser that is applied repeatedly on the
|
|
input, thus instead of parsing `plus_token int_token` repeatedly, we need to
|
|
parse `int_token plus_token` repeatedly. The last number is not followed by `+`,
|
|
thus `int_plus` fails to parse it and it stops the iteration.
|
|
[link foldr_start_with_parser `foldr_start_with_parser`] then uses the other
|
|
parser, `int_token` to parse the input. It succeeds and the result it returns is
|
|
used as the starting value for folding from the right.
|
|
|
|
[note
|
|
Note that as the above description also suggests, the implementation of
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] is more efficient
|
|
than [link foldr_start_with_parser `foldr_start_with_parser`]. Prefer
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] whenever possible.
|
|
]
|
|
|
|
[endsect]
|
|
|
|
[#introducing-foldl_reject_incomplete_start_with_parser]
|
|
[section Introducing foldl_reject_incomplete_start_with_parser]
|
|
|
|
Using a parser built with
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] we can parse the input
|
|
when the input is correct. However, it is not always the case. Consider the
|
|
following input for example:
|
|
|
|
BOOST_METAPARSE_STRING("11 + 13 + 3 + 21 +")
|
|
|
|
This is an invalid expression. However, if we parse it using the
|
|
[link foldl_start_with_parser `foldl_start_with_parser`]-based parser presented
|
|
earlier (`sum_parser3`), it accepts the input and the result is `48`. This is
|
|
because [link foldl_start_with_parser `foldl_start_with_parser`] parses the
|
|
input ['as long as it can]. It parses the first`int_token` (`11`) and then it
|
|
starts parsing the `plus_int` elements (`+ 13`, `+ 3`, `+ 21`). After parsing
|
|
all of these, it tries to parse the remaining `" +"` input using `plus_int`
|
|
which fails and therefore
|
|
[link foldl_start_with_parser `foldl_start_with_parser`] stops after `+ 21`.
|
|
|
|
The problem is that the parser parses the longest sub-expression starting from
|
|
the beginning, that represents a valid expression. The rest is ignored. The
|
|
parser can be wrapped by [link entire_input `entire_input`] to make sure to
|
|
reject expressions with invalid extra characters at the end, however, that
|
|
won't make the error message useful. ([link entire_input `entire_input`] can
|
|
only tell the author of the invalid expression that after `+ 21` is something
|
|
wrong).
|
|
|
|
Metaparse provides
|
|
[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`],
|
|
which does the same as [link foldl_start_with_parser `foldl_start_with_parser`],
|
|
except that once no further repetitions are found, it checks ['where] the
|
|
repeated parser (in our example `plus_int`) fails. When it can make any progress
|
|
(eg. it finds a `+` symbol), then
|
|
[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
|
|
assumes, that the expression's author intended to make the repetition longer,
|
|
but made a mistake and propagates the error message coming from that last broken
|
|
expression.
|
|
|
|
[$images/metaparse/foldl_reject_incomplete_start_with_parser_diag1.png [width 70%]]
|
|
|
|
The above diagram shows how
|
|
[link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`]
|
|
parses the example invalid input and how it fails. This can be used for better
|
|
error reporting from the parsers.
|
|
|
|
Other folding parsers also have their `f` version. (eg.
|
|
[link foldr_reject_incomplete `foldr_reject_incomplete`],
|
|
[link foldl_reject_incomplete1 `foldl_reject_incomplete1`], etc).
|
|
|
|
[endsect]
|
|
|
|
[#finding-the-right-folding-parser-combinator]
|
|
[section Finding the right folding parser combinator]
|
|
|
|
As you might have noticed, there are a lot of different folding parser
|
|
combinators. To help you find the right one, the following naming convention is
|
|
used:
|
|
|
|
[$images/metaparse/folds.png [width 70%]]
|
|
|
|
[note
|
|
Note that there is no `foldr_reject_incomplete_start_with_parser`. The `p`
|
|
version of the right-folding parsers applies the special parser, whose result
|
|
is the initial value, after the repeated elements. Therefore, when the parser
|
|
parsing one repeated element fails, `foldr_start_with_parser` would apply that
|
|
special final parser instead of checking how the repeated element's parser
|
|
failed.
|
|
]
|
|
|
|
[endsect]
|
|
|
|
[endsect]
|
|
|
|
[#result_types]
|
|
[section What can be built from a compile-time string?]
|
|
|
|
Parsers built using Metaparse are template metaprograms parsing text (or code)
|
|
at compile-time. Here is a list of things that can be the "result" of parsing:
|
|
|
|
* A ['type]. An example for this is a parser parsing a `printf` format string
|
|
and returning the typelist (eg. `boost::mpl::vector`) of the expected
|
|
arguments.
|
|
* A ['constant value]. An example for this is the result of a calculator
|
|
language. See the [link getting_started Getting Started] section for further
|
|
details.
|
|
* A ['runtime object]. A static runtime object can be generated that might be
|
|
used at runtime. An example for this is parsing regular expressions at
|
|
compile-time and building `boost::xpressive::sregex` objects. See the
|
|
`regex` example of Metaparse for an example.
|
|
* A C++ ['function], which might be called at runtime. A C++ function can be
|
|
generated that can be called at runtime. It is good for generating native
|
|
(and optimised) code from EDSLs. See the `compile_to_native_code` example of
|
|
Metaparse as an example for this.
|
|
* A [link metafunction_class ['template metafunction class]]. The result of
|
|
parsing might be a type, which is a
|
|
[link metafunction_class template metafunction class]. This is good for
|
|
building an EDSL for template metaprogramming. See the `meta_hs` example of
|
|
Metaparse as an example for this.
|
|
|
|
[endsect]
|
|
|
|
[section Grammars]
|
|
|
|
Metaparse provides a way to define grammars in a syntax that resembles EBNF. The
|
|
[link grammar `grammar`] template can be used to define a grammar. It can be
|
|
used the following way:
|
|
|
|
grammar<BOOST_METAPARSE_STRING("plus_exp")>
|
|
::import<BOOST_METAPARSE_STRING("int_token"), token<int_>>::type
|
|
|
|
::rule<BOOST_METAPARSE_STRING("ws ::= (' ' | '\n' | '\r' | '\t')*")>::type
|
|
::rule<BOOST_METAPARSE_STRING("plus_token ::= '+' ws"), front<_1>>::type
|
|
::rule<BOOST_METAPARSE_STRING("plus_exp ::= int_token (plus_token int_token)*"), plus_action>::type
|
|
|
|
The code above defines a parser from a grammar definition. The start symbol of
|
|
the grammar is `plus_exp`. The lines beginning with `::rule` define rules.
|
|
Rules optionally have a semantic action, which is a metafunction class that
|
|
transforms the result of parsing after the rule has been applied.
|
|
Existing parsers can be bound to names and be used in the rules by importing
|
|
them. Lines beginning with `::import` bind existing parsers to names.
|
|
|
|
The result of a grammar definition is a parser which can be given to other
|
|
parser combinators or be used directly. Given that grammars can import existing
|
|
parsers and build new ones, they are parser combinators as well.
|
|
|
|
[endsect]
|
|
|
|
[endsect]
|
|
|
|
[section Parsing based on `constexpr`]
|
|
|
|
Metaparse is based on template metaprogramming, however, C++11 provides
|
|
`constexpr`, which can be used for parsing at compile-time as well. While
|
|
implementing parsers based on `constexpr` is easier for a C++ developer, since
|
|
its syntax resembles the regular syntax of the language, the result of parsing
|
|
has to be a `constexpr` value. Parsers based on template metaprogramming can
|
|
build types as the result of parsing. These types may be boxed `constexpr`
|
|
values but can be metafunction classes, classes with static functions which can
|
|
be called at runtime, etc.
|
|
|
|
When a parser built with Metaparse needs a sub-parser for processing a part of
|
|
the input text and generating a `constexpr` value as the result of parsing, one
|
|
can implement the sub-parser based on `constexpr` functions. Metaparse
|
|
can be integrated with them and lift their results into C++ template
|
|
metaprogramming. An example demonstrating this feature can be found among the
|
|
examples (`constexpr_parser`). This capability makes it possible to integrate
|
|
Metaparse with parsing libraries based on `constexpr`.
|
|
|
|
[endsect]
|
|
|
|
[section What types of grammars can be used?]
|
|
|
|
It is possible to write parsers for ['context free grammars] using Metaparse.
|
|
However, this is not the most general category of grammars that can be used. As
|
|
Metaparse is a highly extendable framework, it is not clear what should be
|
|
considered to be the limit of Metaparse itself. For example Metaparse provides
|
|
the [link accept_when `accept_when`] [link parser_combinator parser combinator].
|
|
It can be used to provide arbitrary predicates for enabled/disabling a specific
|
|
rule. One can go as far as providing the Turing machine (as a
|
|
[link metafunction metafunction]) of the entire grammar as a predicate, so one
|
|
can build parsers for ['unrestricted grammars] that can be parsed using a Turing
|
|
machine. Note that such a parser would not be considered to be a parser built
|
|
with Metaparse, however, it is not clear how far a solution might go and still
|
|
be considered using Metaparse.
|
|
|
|
Metaparse assumes that the parsers are ['deterministic], as they have only "one"
|
|
result. It is of course possible to write parsers and combinators that return a
|
|
set (or list or some other container) of results as that "one" result, but that
|
|
can be considered building a new parser library. There is no clear boundary for
|
|
Metaparse.
|
|
|
|
Metaparse supports building ['top-down parsers] and ['left-recursion] is not
|
|
supported as it would lead to infinite recursion. ['Right-recursion] is
|
|
supported, however, in most cases the
|
|
[link repetition iterative parser combinators] provide better alternatives.
|
|
|
|
[endsect]
|
|
|
|
[endsect]
|
|
|