xpressive/doc/traits.qbk
2014-01-11 00:44:37 -08:00

[/
/ Copyright (c) 2008 Eric Niebler
/
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/]
[section Localization and Regex Traits]
[h2 Overview]
Matching a regular expression against a string often requires locale-dependent information. For example,
how are case-insensitive comparisons performed? The locale-sensitive behavior is captured in a traits class.
xpressive provides three traits class templates: `cpp_regex_traits<>`, `c_regex_traits<>` and `null_regex_traits<>`.
The first wraps a `std::locale`, the second wraps the global C locale, and the third is a stub traits type for
use when searching non-character data. All traits templates conform to the
[link boost_xpressive.user_s_guide.concepts.traits_requirements Regex Traits Concept].
[h2 Setting the Default Regex Trait]
By default, xpressive uses `cpp_regex_traits<>` for all patterns. This causes all regex objects to use
the global `std::locale`. If you compile with `BOOST_XPRESSIVE_USE_C_TRAITS` defined, then xpressive will use
`c_regex_traits<>` by default.
[h2 Using Custom Traits with Dynamic Regexes]
To create a dynamic regex that uses a custom traits object, you must use _regex_compiler_.
The basic steps are shown in the following example:

    // Declare a regex_compiler that uses the global C locale
    regex_compiler<char const *, c_regex_traits<char> > crxcomp;
    cregex crx = crxcomp.compile( "\\w+" );

    // Declare a regex_compiler that uses a custom std::locale
    std::locale loc = /* ... create a locale here ... */;
    regex_compiler<char const *, cpp_regex_traits<char> > cpprxcomp(loc);
    cregex cpprx = cpprxcomp.compile( "\\w+" );

The `regex_compiler` objects act as regex factories. Once they have been imbued with a locale,
every regex object they create will use that locale.
[h2 Using Custom Traits with Static Regexes]
If you want a particular static regex to use a different set of traits, you can use the special `imbue()`
pattern modifier. For instance:

    // Define a regex that uses the global C locale
    c_regex_traits<char> ctraits;
    sregex crx = imbue(ctraits)( +_w );

    // Define a regex that uses a customized std::locale
    std::locale loc = /* ... create a locale here ... */;
    cpp_regex_traits<char> cpptraits(loc);
    sregex cpprx1 = imbue(cpptraits)( +_w );

    // A shorthand for above
    sregex cpprx2 = imbue(loc)( +_w );

The `imbue()` pattern modifier must wrap the entire pattern. It is an error to `imbue` only
part of a static regex. For example:

    // ERROR! Cannot imbue() only part of a regex
    sregex error = _w >> imbue(loc)( _w );

[h2 Searching Non-Character Data With [^null_regex_traits]]
With xpressive static regexes, you are not limited to searching for patterns in character sequences.
You can search for patterns in raw bytes, integers, or anything that conforms to the
[link boost_xpressive.user_s_guide.concepts.chart_requirements Char Concept]. The `null_regex_traits<>` template makes this simple. It is a
stub implementation of the [link boost_xpressive.user_s_guide.concepts.traits_requirements Regex Traits Concept]. It recognizes
no character classes and performs no case-insensitive mappings.
For example, with `null_regex_traits<>`, you can write a static regex to find a pattern in a
sequence of integers as follows:

    // some integral data to search
    int const data[] = {0, 1, 2, 3, 4, 5, 6};

    // create a null_regex_traits<> object for searching integers ...
    null_regex_traits<int> nul;

    // imbue a regex object with the null_regex_traits ...
    basic_regex<int const *> rex = imbue(nul)(1 >> +((set= 2,3) | 4) >> 5);
    match_results<int const *> what;

    // search for the pattern in the array of integers ...
    regex_search(data, data + 7, what, rex);

    assert(what[0].matched);
    assert(*what[0].first == 1);
    assert(*what[0].second == 6);

[endsect]