[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]

This module includes parsers for single characters. Currently, this
module includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding-specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`.

[heading char_]

The no-argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single-argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

`char_` with two arguments matches a range of characters.

    char_('a','z')      // alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must be /before/ the second,
according to the underlying __char_encoding_namespace__.
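Once the endpoints and the input are expressed as code points of one encoding, the range check is just two comparisons. The helper below, `in_char_range`, is a hypothetical illustration in plain C++ and is not part of Spirit. It assumes the characters have already been mapped to `char32_t` code points, which is the job the __char_encoding_namespace__ does for `char_`.

```cpp
#include <cassert>

// Hypothetical helper, not part of Spirit: models the test a char-range
// parser such as char_('a', 'z') performs. All three arguments are code
// points of one fixed encoding, represented here as char32_t.
bool in_char_range(char32_t ch, char32_t first, char32_t last)
{
    // A range is only well formed when first is not after last.
    assert(first <= last);

    // Both endpoints belong to the range.
    return first <= ch && ch <= last;
}
```

For example, `in_char_range(U'z', U'a', U'z')` is `true` because the upper endpoint is included. `in_char_range(U'A', U'a', U'z')` is `false`.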

Character mapping is inherently platform dependent. The standard does not
guarantee, for example, that `'A' < 'Z'`. That is why, in Spirit2, we
purposely attach a specific __char_encoding_namespace__ (such as ASCII or
ISO-8859-1) to the `char_` parser to eliminate such ambiguities.

[note *Sparse bit vectors*

To accommodate 16-, 32- and 64-bit characters, the char-set statically
switches from a `std::bitset` implementation, used when the character type
is not greater than 8 bits, to a sparse bit/boolean set which uses a
sorted vector of disjoint ranges (`range_run`). The set is constructed
from ranges such that adjacent or overlapping ranges are coalesced.

A `range_run` is very space-economical in situations where there are
lots of ranges and a few individual disjoint values. Searching is
O(log n), where n is the number of ranges.]
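Here is a minimal sketch of the sorted-range idea. The class name `sparse_char_set` and its members are hypothetical, and this is not Spirit's `range_run`. It only shows the two properties the note describes:

* inserting a range coalesces it with any overlapping or adjacent ranges;
* a membership test is a binary search, O(log n) in the number of stored ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch (not Spirit's range_run): a sparse character set kept
// as a sorted vector of disjoint, non-adjacent inclusive ranges.
class sparse_char_set {
    using range = std::pair<char32_t, char32_t>; // [first, last]
    std::vector<range> ranges_;

public:
    // Add [first, last] (requires first <= last), coalescing it with every
    // stored range it overlaps or touches.
    void set(char32_t first, char32_t last)
    {
        // Skip stored ranges that end before first - 1 (neither overlapping
        // nor adjacent). Written without "+ 1" so it cannot overflow.
        auto it = std::lower_bound(ranges_.begin(), ranges_.end(), first,
            [](const range& r, char32_t f) { return r.second < f && f - r.second > 1; });

        char32_t lo = first, hi = last;
        auto merge_end = it;
        // Absorb the following ranges that start no later than hi + 1.
        while (merge_end != ranges_.end() &&
               !(merge_end->first > hi && merge_end->first - hi > 1)) {
            lo = std::min(lo, merge_end->first);
            hi = std::max(hi, merge_end->second);
            ++merge_end;
        }
        it = ranges_.erase(it, merge_end);
        ranges_.insert(it, range{lo, hi});
    }

    // Binary search for the first range whose end is >= ch: O(log n).
    bool test(char32_t ch) const
    {
        auto it = std::lower_bound(ranges_.begin(), ranges_.end(), ch,
            [](const range& r, char32_t c) { return r.second < c; });
        return it != ranges_.end() && it->first <= ch;
    }

    // Number of stored (already coalesced) ranges.
    std::size_t range_count() const { return ranges_.size(); }
};
```

For example, adding `U'a'`..`U'f'` and then `U'g'`..`U'z'` leaves a single stored range, because the two ranges are adjacent.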

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string following
a syntax that resembles POSIX-style regular expression character sets
(except that double quotes delimit the set elements instead of square
brackets and there is no special negation `^` character). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E
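The definition syntax is small enough to sketch in a few lines. The function below, `parse_char_set`, is a hypothetical illustration for 8-bit characters only and is not Spirit's parser. It works as follows:

* it fills a `std::bitset<256>`;
* it treats `x-y` as an inclusive range;
* it treats any other character as itself.

Taking a leading or trailing `-` literally is an assumption of this sketch.

```cpp
#include <bitset>
#include <cstddef>
#include <string_view>

// Hypothetical sketch, not Spirit's implementation: turns a char-set
// definition string such as "a-zA-Z" into a 256-entry bit set for 8-bit
// characters. "x-y" is the inclusive range x..y; any other character stands
// for itself. Assumption of this sketch: a '-' with no character on one side
// (first or last in the string) is taken literally.
std::bitset<256> parse_char_set(std::string_view def)
{
    std::bitset<256> set;
    for (std::size_t i = 0; i < def.size(); ++i) {
        // Convert through unsigned char so bytes >= 0x80 are valid indices.
        const unsigned first = static_cast<unsigned char>(def[i]);
        if (i + 2 < def.size() && def[i + 1] == '-') {
            const unsigned last = static_cast<unsigned char>(def[i + 2]);
            // A reversed range such as "z-a" adds nothing.
            for (unsigned c = first; c <= last; ++c)
                set.set(c);
            i += 2; // skip the '-' and the upper bound
        } else {
            set.set(first);
        }
    }
    return set;
}
```

With this sketch, `parse_char_set("0-9a-fA-F").count()` is 22, and `parse_char_set("\x7f\x7e")` contains exactly the two bytes 0x7F and 0x7E.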

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single-argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single-element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]

Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c)      // c is a char
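`lit` and `char_` differ only in the attribute. The two functions below are a hypothetical plain-C++ model, not Spirit's API; the names `match_lit` and `match_char` are made up for this sketch. Both consume one matching character, but only the `char_`-like one hands that character back.

```cpp
#include <optional>
#include <string_view>

// Hypothetical model, not Spirit's API: each function tries to match `ch`
// at the front of `in` and, on success, advances `in` past it.

// Behaves like lit(ch): success or failure only, no attribute.
bool match_lit(std::string_view& in, char ch)
{
    if (in.empty() || in.front() != ch)
        return false;
    in.remove_prefix(1);
    return true;
}

// Behaves like char_(ch): on success the matched character is returned,
// standing in for the synthesized attribute.
std::optional<char> match_char(std::string_view& in, char ch)
{
    if (in.empty() || in.front() != ch)
        return std::nullopt;
    in.remove_prefix(1);
    return ch;
}
```

In this model, the `std::optional<char>` returned by `match_char` plays the role of the synthesized attribute. `match_lit` has nothing comparable to return, which mirrors the __unused__ attribute of `lit`.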

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'`, or anything that can be
                        converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
                        that evaluates to anything that can be converted to a `char`
                        or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                        that specifies a char-set definition string following a syntax
                        that resembles POSIX-style regular expression character sets
                        (without the square brackets and the negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

Semantics of an expression are defined only where they differ from, or are
not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                        `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters in the
                        range `f` to `l` (inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                        definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                        matches any character in the `ns` encoding except the
                        characters matched by `cp`.]]
]
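The note on sparse bit vectors earlier in this section says that, for character types of at most 8 bits, the char-set is a `std::bitset`. Negation then has a very direct model: flip every bit. The helpers `negate_set` and `matches` below are hypothetical and are not Spirit's implementation. They assume all 256 byte values belong to the encoding. For a 7-bit encoding such as ASCII, the flipped set would also have to be restricted to 0x00-0x7F so that it matches only characters in the `ns` encoding.

```cpp
#include <bitset>

// Hypothetical sketch, not Spirit's implementation: an 8-bit char set held
// in a std::bitset, with one bit per byte value.
using char_set = std::bitset<256>;

// Models ~cp: flipping every bit yields a set that matches exactly the
// characters cp rejects. Assumes all 256 byte values are in the encoding.
char_set negate_set(const char_set& cp)
{
    return ~cp;
}

// Membership test; the unsigned char cast keeps bytes >= 0x80 from becoming
// negative (and therefore invalid) indices.
bool matches(const char_set& set, char ch)
{
    return set.test(static_cast<unsigned char>(ch));
}
```

Flipping twice restores the original set, so negating a negated set gives back the characters of `cp`.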

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__, or, if `c` is a __qi_lazy_argument__, the character
                        type returned by invoking it.]]
    [[`lit(c)`]         [__unused__, or, if `c` is a __qi_lazy_argument__, the character
                        type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or wider) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single-character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__.

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]         [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

Semantics of an expression are defined only where they differ from, or are
not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alphanumeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lowercase letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches white space characters such as spaces, tabs,
                        returns, and newlines]]
    [[`ns::upper`]      [Matches uppercase letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]
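The classifiers share their names with the C library's `<cctype>` functions, which makes those functions a convenient model. The sketch below approximates four of them with `<cctype>` in the default "C" locale. The names `is_alnum`, `is_blank`, `is_space` and `is_xdigit` are hypothetical. The exact set of characters each Spirit classifier accepts is determined by the chosen __char_encoding_namespace__, not by this code.

```cpp
#include <cctype>

// Hypothetical helpers approximating four of the classifiers with <cctype>
// in the default "C" locale. The real character repertoire of each Spirit
// classifier is set by the chosen character encoding namespace.
// Casting to unsigned char first is required: passing a negative char value
// to the <cctype> functions is undefined behavior.
bool is_alnum(char ch)  { return std::isalnum(static_cast<unsigned char>(ch)) != 0; }
bool is_blank(char ch)  { return std::isblank(static_cast<unsigned char>(ch)) != 0; }
bool is_space(char ch)  { return std::isspace(static_cast<unsigned char>(ch)) != 0; }
bool is_xdigit(char ch) { return std::isxdigit(static_cast<unsigned char>(ch)) != 0; }
```

In the "C" locale, `std::isspace` accepts vertical tab and form feed as well as spaces, tabs, returns and newlines. `std::isblank` accepts only space and tab.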

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]