spirit/doc/qi/char.qbk
Hartmut Kaiser 83a792d7ed Spirit: updating copyrights
[SVN r67619]
2011-01-03 16:58:38 +00:00

[/==============================================================================
Copyright (C) 2001-2011 Joel de Guzman
Copyright (C) 2001-2011 Hartmut Kaiser
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]
This module provides parsers for single characters: literal chars
(e.g. `'x'`, `L'x'`), `char_` (single characters, ranges and character
sets), and the encoding-specific character classifiers (`alnum`,
`alpha`, `digit`, `xdigit`, etc.).
[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.
[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]
[heading Description]
The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.
There are various forms of `char_`.
[heading char_]
The no-argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character
[heading char_(ch)]
The single argument form of `char_` (with a character argument) matches
the supplied character.

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)
[heading char_(first, last)]
`char_` with two arguments matches a range of characters.

    char_('a','z')      // lower case alphabetic characters
    char_(L'0',L'9')    // digits
A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note, the first character must be /before/ the second,
according to the underlying __char_encoding_namespace__.
Character mapping is inherently platform dependent. The standard does
not guarantee, for example, that `'A' < 'Z'`. That is why Spirit2
purposely attaches a specific __char_encoding_namespace__ (such as ASCII
or ISO-8859-1) to the `char_` parser: it eliminates such ambiguities.
[note *Sparse bit vectors*
To accommodate 16/32 and 64 bit characters, the char-set statically
switches from a `std::bitset` implementation when the character type is
not greater than 8 bits, to a sparse bit/boolean set which uses a sorted
vector of disjoint ranges (`range_run`). The set is constructed from
ranges such that adjacent or overlapping ranges are coalesced.
`range_runs` are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n)
where n is the number of ranges.]
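
The idea behind such a sparse set can be sketched in plain C++. This is a simplified illustration under stated assumptions, not Spirit's actual `range_run` implementation: the class name `sparse_char_set` and its members are hypothetical, insertion is linear for brevity, and only lookup uses a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified sparse character set: a sorted vector of
// disjoint, non-adjacent inclusive ranges.
class sparse_char_set
{
public:
    typedef unsigned long code_type;                // fits 32-bit characters
    typedef std::pair<code_type, code_type> range;  // inclusive [first, second]

    // Insert [lo, hi], coalescing with overlapping or adjacent ranges.
    void insert(code_type lo, code_type hi)
    {
        if (hi < lo) std::swap(lo, hi);

        // Skip ranges that end before `lo` and do not touch it.
        std::vector<range>::iterator it = ranges_.begin();
        while (it != ranges_.end() && it->second < lo && lo - it->second > 1)
            ++it;

        // Absorb every range that overlaps or is adjacent to [lo, hi].
        std::vector<range>::iterator merge_end = it;
        while (merge_end != ranges_.end()
            && (merge_end->first <= hi || merge_end->first - hi == 1))
        {
            lo = std::min(lo, merge_end->first);
            hi = std::max(hi, merge_end->second);
            ++merge_end;
        }

        it = ranges_.erase(it, merge_end);
        ranges_.insert(it, range(lo, hi));
    }

    // O(log n) lookup, where n is the number of stored ranges.
    bool contains(code_type ch) const
    {
        // Find the first range that starts after `ch`.
        std::vector<range>::const_iterator it = std::upper_bound(
            ranges_.begin(), ranges_.end(), ch, starts_after);
        if (it == ranges_.begin())
            return false;
        --it;                       // last range starting at or before `ch`
        return ch <= it->second;
    }

    std::size_t range_count() const { return ranges_.size(); }

private:
    static bool starts_after(code_type ch, range const& r)
    {
        return ch < r.first;
    }

    std::vector<range> ranges_;
};
```

Inserting `'a'` to `'f'`, then `'g'` to `'m'`, then `'k'` to `'z'` leaves a single stored range, since the second range is adjacent to the first and the third overlaps the result.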
[heading char_(def)]
Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), `char_` regards the string as a char-set definition string. Its
syntax resembles POSIX style regular expression character sets, except
that double quotes delimit the set elements instead of square brackets
and there is no special negation `^` character. Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E
[heading lit(ch)]
`lit`, when passed a single character, behaves like the single argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.
[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.]
Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c)              // c is a char
[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.
[heading Namespace]
[table
[[Name]]
[[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
[[`ns::char_`]]
]
In the table above, `ns` represents a __char_encoding_namespace__.
[heading Model of]
[:__primitive_parser_concept__]
[variablelist Notation
[[`c`, `f`, `l`] [A literal char, e.g. `'x'`, `L'x'` or anything that can be
converted to a `char` or `wchar_t`, or a __qi_lazy_argument__
that evaluates to anything that can be converted to a `char`
or `wchar_t`.]]
[[`ns`] [A __char_encoding_namespace__.]]
[[`cs`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
that specifies a char-set definition string following a syntax
that resembles POSIX style regular expression character sets
(without the square brackets and the negation `^` character).]]
[[`cp`] [A char parser, a char range parser or a char set parser.]]
]
[heading Expression Semantics]
The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.
[table
[[Expression] [Semantics]]
[[`c`] [Create a char parser from a char, `c`.]]
[[`lit(c)`] [Create a char parser from a char, `c`.]]
[[`ns::char_`] [Create a char parser that matches any character in the
`ns` encoding.]]
[[`ns::char_(c)`] [Create a char parser with `ns` encoding from a char, `c`.]]
[[`ns::char_(f, l)`][Create a char-range parser that matches characters from
range (`f` to `l`, inclusive) with `ns` encoding.]]
[[`ns::char_(cs)`] [Create a char-set parser with `ns` encoding from a char-set
definition string, `cs`.]]
[[`~cp`] [Negate `cp`. The result is a negated char parser that
matches any character in the `ns` encoding except the
characters matched by `cp`.]]
]
[heading Attributes]
[table
[[Expression] [Attribute]]
[[`c`] [__unused__ or, if `c` is a __qi_lazy_argument__, the character
type returned by invoking it.]]
[[`lit(c)`] [__unused__ or, if `c` is a __qi_lazy_argument__, the character
type returned by invoking it.]]
[[`ns::char_`] [The character type of the __char_encoding_namespace__, `ns`.]]
[[`ns::char_(c)`] [The character type of the __char_encoding_namespace__, `ns`.]]
[[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
[[`ns::char_(cs)`] [The character type of the __char_encoding_namespace__, `ns`.]]
[[`~cp`] [The attribute of `cp`.]]
]
[heading Complexity]
[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]
[heading Example]
[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]
Some using declarations:
[reference_using_declarations_lit_char]
Basic literals:
[reference_char_literals]
Range:
[reference_char_range]
Character set:
[reference_char_set]
Lazy char_ using __phoenix__:
[reference_char_phoenix]
[endsect] [/ Char]
[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]
[heading Description]
The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.
[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.
[heading Namespace]
[table
[[Name]]
[[`ns::alnum`]]
[[`ns::alpha`]]
[[`ns::blank`]]
[[`ns::cntrl`]]
[[`ns::digit`]]
[[`ns::graph`]]
[[`ns::lower`]]
[[`ns::print`]]
[[`ns::punct`]]
[[`ns::space`]]
[[`ns::upper`]]
[[`ns::xdigit`]]
]
In the table above, `ns` represents a __char_encoding_namespace__.
[heading Model of]
[:__primitive_parser_concept__]
[variablelist Notation
[[`ns`] [A __char_encoding_namespace__.]]
]
[heading Expression Semantics]
The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.
[table
[[Expression] [Semantics]]
[[`ns::alnum`] [Matches alpha-numeric characters]]
[[`ns::alpha`] [Matches alphabetic characters]]
[[`ns::blank`] [Matches spaces or tabs]]
[[`ns::cntrl`] [Matches control characters]]
[[`ns::digit`] [Matches numeric digits]]
[[`ns::graph`] [Matches non-space printing characters]]
[[`ns::lower`] [Matches lower case letters]]
[[`ns::print`] [Matches printable characters]]
[[`ns::punct`] [Matches punctuation symbols]]
[[`ns::space`] [Matches spaces, tabs, returns, and newlines]]
[[`ns::upper`] [Matches upper case letters]]
[[`ns::xdigit`] [Matches hexadecimal digits]]
]
[heading Attributes]
[:The character type of the __char_encoding_namespace__, `ns`.]
[heading Complexity]
[:O(N)]
[heading Example]
[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]
Some using declarations:
[reference_using_declarations_char_class]
Basic usage:
[reference_char_class]
[endsect] [/ Char Classification]
[endsect]