db22157874
[SVN r57395]
121 lines
3.7 KiB
Plaintext
121 lines
3.7 KiB
Plaintext
[/
|
|
/ Copyright (c) 2009 Eric Niebler
|
|
/
|
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
/]
|
|
|
|
[section Named Captures]
|
|
|
|
[h2 Overview]
|
|
|
|
For complicated regular expressions, dealing with numbered captures can be a
|
|
pain. Counting left parentheses to figure out which capture to reference is
|
|
no fun. Less fun is the fact that merely editing a regular expression could
|
|
cause a capture to be assigned a new number, invaliding code that refers back
|
|
to it by the old number.
|
|
|
|
Other regular expression engines solve this problem with a feature called
|
|
/named captures/. This feature allows you to assign a name to a capture, and
|
|
to refer back to the capture by name rather by number. Xpressive also supports
|
|
named captures, both in dynamic and in static regexes.
|
|
|
|
[h2 Dynamic Named Captures]
|
|
|
|
For dynamic regular expressions, xpressive follows the lead of other popular
|
|
regex engines with the syntax of named captures. You can create a named capture
|
|
with `"(?P<xxx>...)"` and refer back to that capture with `"(?P=xxx)"`. Here,
|
|
for instance, is a regular expression that creates a named capture and refers
|
|
back to it:
|
|
|
|
// Create a named capture called "char" that matches a single
|
|
// character and refer back to that capture by name.
|
|
sregex rx = sregex::compile("(?P<char>.)(?P=char)");
|
|
|
|
The effect of the above regular expression is to find the first doubled
|
|
character.
|
|
|
|
Once you have executed a match or search operation using a regex with named
|
|
captures, you can access the named capture through the _match_results_ object
|
|
using the capture's name.
|
|
|
|
std::string str("tweet");
|
|
sregex rx = sregex::compile("(?P<char>.)(?P=char)");
|
|
smatch what;
|
|
if(regex_search(str, what, rx))
|
|
{
|
|
std::cout << "char = " << what["char"] << std::endl;
|
|
}
|
|
|
|
The above code displays:
|
|
|
|
[pre
|
|
char = e
|
|
]
|
|
|
|
You can also refer back to a named capture from within a substitution string.
|
|
The syntax for that is `"\\g<xxx>"`. Below is some code that demonstrates how
|
|
to use named captures when doing string substitution.
|
|
|
|
std::string str("tweet");
|
|
sregex rx = sregex::compile("(?P<char>.)(?P=char)");
|
|
str = regex_replace(str, rx, "**\\g<char>**", regex_constants::format_perl);
|
|
std::cout << str << std::endl;
|
|
|
|
Notice that you have to specify `format_perl` when using named captures. Only
|
|
the perl syntax recognizes the `"\\g<xxx>"` syntax. The above code displays:
|
|
|
|
[pre
|
|
tw\*\*e\*\*t
|
|
]
|
|
|
|
[h2 Static Named Captures]
|
|
|
|
If you're using static regular expressions, creating and using named
|
|
captures is even easier. You can use the _mark_tag_ type to create
|
|
a variable that you can use like [globalref boost::xpressive::s1 `s1`],
|
|
[globalref boost::xpressive::s1 `s2`] and friends, but with a name
|
|
that is more meaningful. Below is how the above example would look
|
|
using static regexes:
|
|
|
|
mark_tag char_(1); // char_ is now a synonym for s1
|
|
sregex rx = (char_= _) >> char_;
|
|
|
|
After a match operation, you can use the `mark_tag` to index into the
|
|
_match_results_ to access the named capture:
|
|
|
|
std::string str("tweet");
|
|
mark_tag char_(1);
|
|
sregex rx = (char_= _) >> char_;
|
|
smatch what;
|
|
if(regex_search(str, what, rx))
|
|
{
|
|
std::cout << what[char_] << std::endl;
|
|
}
|
|
|
|
The above code displays:
|
|
|
|
[pre
|
|
char = e
|
|
]
|
|
|
|
When doing string substitutions with _regex_replace_, you can use named
|
|
captures to create /format expressions/ as below:
|
|
|
|
std::string str("tweet");
|
|
mark_tag char_(1);
|
|
sregex rx = (char_= _) >> char_;
|
|
str = regex_replace(str, rx, "**" + char_ + "**");
|
|
std::cout << str << std::endl;
|
|
|
|
The above code displays:
|
|
|
|
[pre
|
|
tw\*\*e\*\*t
|
|
]
|
|
|
|
[note You need to include [^<boost/xpressive/regex_actions.hpp>] to
|
|
use format expressions.]
|
|
|
|
[endsect]
|