[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section String Substitutions]

Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the
most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
searching and replacing.

[h2 regex_replace()]

Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
the input sequence as a bidirectional container such as `std::string` and return the result in a new container
of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
using string-based substitutions.

    std::string input("This is his face");
    sregex re = as_xpr("his");                // find all occurrences of "his" ...
    std::string format("her");                // ... and replace them with "her"

    // use the version of regex_replace() that operates on strings
    std::string output = regex_replace( input, re, format );
    std::cout << output << '\n';

    // use the version of regex_replace() that operates on iterators
    std::ostream_iterator< char > out_iter( std::cout );
    regex_replace( out_iter, input.begin(), input.end(), re, format );

The above program prints out the following:

[pre
Ther is her face
Ther is her face
]

Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`, including the one
inside `"This"`.

Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
a complete example program that shows how to use _regex_replace_, and check the _regex_replace_ reference
for a complete list of the available overloads.

[h2 Replace Options]

The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
possible values of the bitmask are:

[table Format Flags
    [[Flag]                     [Meaning]]
    [[`format_default`]         [Recognize the ECMA-262 format sequences (see below).]]
    [[`format_first_only`]      [Only replace the first match, not all of them.]]
    [[`format_no_copy`]         [Don't copy the parts of the input sequence that didn't match the regex
                                 to the output sequence.]]
    [[`format_literal`]         [Treat the format string as a literal; that is, don't recognize any
                                 escape sequences.]]
    [[`format_perl`]            [Recognize the Perl format sequences (see below).]]
    [[`format_sed`]             [Recognize the sed format sequences (see below).]]
    [[`format_all`]             [In addition to the Perl format sequences, recognize some
                                 Boost-specific format sequences.]]
]

These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
`format_all` are ignored.

[h2 The ECMA-262 Format Sequences]

When you haven't specified a substitution string dialect with one of the format flags above,
you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
the escape sequences recognized in ECMA-262 mode.

[table Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [the corresponding sub-match]]
    [[[^$&]]                [the full match]]
    [[[^$\`]]               [the match prefix]]
    [[[^$']]                [the match suffix]]
    [[[^$$]]                [a literal `'$'` character]]
]

Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were
`"$a"` then `"$a"` would be inserted into the output sequence.

[h2 The Sed Format Sequences]

When specifying the `format_sed` flag to _regex_replace_, the following escape sequences
are recognized:

[table Sed Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^&]]                 [the full match]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
]

[h2 The Perl Format Sequences]

When specifying the `format_perl` flag to _regex_replace_, the following escape sequences
are recognized:

[table Perl Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [the corresponding sub-match]]
    [[[^$&]]                [the full match]]
    [[[^$\`]]               [the match prefix]]
    [[[^$']]                [the match suffix]]
    [[[^$$]]                [a literal `'$'` character]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
    [[[^\\l]]               [Make the next character lowercase]]
    [[[^\\L]]               [Make the rest of the substitution lowercase until the next [^\\E]]]
    [[[^\\u]]               [Make the next character uppercase]]
    [[[^\\U]]               [Make the rest of the substitution uppercase until the next [^\\E]]]
    [[[^\\E]]               [Terminate [^\\L] or [^\\U]]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^\\g<name>]]         [The named backref /name/]]
]

[h2 The Boost-Specific Format Sequences]

When specifying the `format_all` flag to _regex_replace_, the escape sequences
recognized are the same as those above for `format_perl`. In addition, conditional
expressions of the following form are recognized:

[pre
?Ntrue-expression:false-expression
]

where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
participated in the full match, then the substitution is /true-expression/. Otherwise,
it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
want a literal paren, you must escape it as [^\\(].

[h2 Formatter Objects]

Format strings are not always expressive enough for all your text substitution
needs. Consider the simple example of wanting to map input strings to output
strings, as you may want to do with environment variables. Rather than a format
/string/, for this you would use a formatter /object/. Consider the following
code, which finds embedded environment variables of the form `"$(XYZ)"` and
computes the substitution string by looking up the environment variable in a
map.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    std::map<std::string, std::string> env;

    std::string const &format_fun(smatch const &what)
    {
        // what[1] is the variable name; operator[] inserts an
        // empty string into env if the name is not present
        return env[what[1].str()];
    }

    int main()
    {
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        // replace strings like "$(XYZ)" with the result of env["XYZ"]
        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, format_fun);
        std::cout << output << std::endl;
    }

In this case, we use a function, `format_fun()`, to compute the substitution string
on the fly. It accepts a _match_results_ object which contains the results of the
current match. `format_fun()` uses the first submatch as a key into the global `env`
map. The above code displays:

[pre
"this" has the value "that"
]

The formatter need not be an ordinary function. It may be an object of class type.
And rather than return a string, it may accept an output iterator into which it
writes the substitution. Consider the following, which is functionally equivalent
to the above.

    #include <map>
    #include <string>
    #include <iostream>
    #include <algorithm>    // for std::copy
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    struct formatter
    {
        typedef std::map<std::string, std::string> env_map;
        env_map env;

        template<typename Out>
        Out operator()(smatch const &what, Out out) const
        {
            // look up the variable name captured by the first submatch
            env_map::const_iterator where = env.find(what[1]);
            if(where != env.end())
            {
                // write the variable's value into the output iterator
                std::string const &sub = where->second;
                out = std::copy(sub.begin(), sub.end(), out);
            }
            return out;
        }

    };

    int main()
    {
        formatter fmt;
        fmt.env["X"] = "this";
        fmt.env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, fmt);
        std::cout << output << std::endl;
    }

The formatter must be a callable object -- a function or a function object --
that has one of three possible signatures, detailed in the table below. For
the table, `fmt` is a function pointer or function object, `what` is a
_match_results_ object, `out` is an OutputIterator, and `flags` is a value
of `regex_constants::match_flag_type`:

[table Formatter Signatures
[
    [Formatter Invocation]
    [Return Type]
    [Semantics]
]
[
    [`fmt(what)`]
    [Range of characters (e.g. `std::string`) or null-terminated string]
    [The string matched by the regex is replaced with the string returned by
     the formatter.]
]
[
    [`fmt(what, out)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.]
]
[
    [`fmt(what, out, flags)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.
     The `flags` parameter is the value of the match flags passed to the
     _regex_replace_ algorithm.]
]
]

[h2 Formatter Expressions]

In addition to format /strings/ and formatter /objects/, _regex_replace_ also
accepts formatter /expressions/. A formatter expression is a lambda expression
that generates a string. It uses the same syntax as that for
[link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
Semantic Actions], which are covered later. The above example, which uses
_regex_replace_ to substitute strings for environment variables, is repeated
here using a formatter expression.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, std::string> env;
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        // ref() is fully qualified so that the call cannot be confused with
        // std::ref(), which argument-dependent lookup may find in C++11 and later
        std::string output = regex_replace(input, envar, boost::xpressive::ref(env)[s1]);
        std::cout << output << std::endl;
    }

In the above, the formatter expression is `ref(env)[s1]`, written with the
`boost::xpressive::` qualification. This means to use the
value of the first submatch, `s1`, as a key into the `env` map. The purpose of
`xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
so that the index operation is deferred until we know what to replace `s1` with.

[endsect]