<html>
<head>
<title>Directives</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
<tr>
<td width="10">
</td>
<td width="85%">
<font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Directives</b></font>
</td>
<td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
</tr>
</table>
<br>
<table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="epsilon.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="scanner.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<p>Parser directives have the form: <b>directive[expression]</b></p>
<p>A directive modifies the behavior of its enclosed expression, essentially <em>decorating</em>
it. The framework pre-defines a few directives. Clients of the framework are
free to define their own directives as needed. Information on how this is done
will be provided later. For now, we shall deal only with predefined directives.</p>
<h2>lexeme_d</h2>
<p>Turns off white space skipping. At the phrase level, the parser ignores white
space, possibly including comments. Use <tt>lexeme_d</tt> in situations where
we want to work at the character level instead of the phrase level. Parsers
can be made to work at the character level by enclosing the pertinent parts
inside the <tt>lexeme_d</tt> directive. For example, let us complete the example presented
in the <a href="introduction.html">Introduction</a>. There, we skipped the definition
of the <tt>integer</tt> rule. Here's how it is actually defined:</p>
<pre><code><font color="#000000"><span class=identifier>    </span><span class=identifier>integer </span><span class=special>= </span><span class=identifier>lexeme_d</span><span class=special>[ </span><span class=special>!(</span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'+'</span><span class=special>) </span><span class=special>| </span><span class=literal>'-'</span><span class=special>) </span><span class=special>>> </span><span class=special>+</span><span class=identifier>digit </span><span class=special>];</span></font></code></pre>
<p>The <tt>lexeme_d</tt> directive instructs the parser to work at the character
level. Without it, the <tt>integer</tt> rule would allow erroneous embedded
white space: an input such as <span class="quotes">"1 2 345"</span>
would be parsed as <span class="quotes">"12345"</span>.</p>
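<p>To make the difference concrete, here is a minimal, library-free sketch of the
<tt>integer</tt> rule at both levels. It does not use Spirit and is not how Spirit
implements the directive; <tt>skip_ws</tt> and <tt>match_integer</tt> are hypothetical
helpers introduced only for this illustration, and the <tt>lexeme</tt> flag stands in
for wrapping the expression in <tt>lexeme_d</tt>.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit): advance past white space.
static std::size_t skip_ws(const std::string& s, std::size_t i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    return i;
}

// Models  !(ch_p('+') | '-') >> +digit  and reports whether the whole
// input was consumed.
//   lexeme == false: white space is skipped before every digit (phrase level).
//   lexeme == true : white space is skipped once on entry, then never again
//                    (character level).
bool match_integer(const std::string& s, bool lexeme) {
    std::size_t i = skip_ws(s, 0);
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;  // optional sign
    std::size_t digits = 0;
    for (;;) {
        if (!lexeme) i = skip_ws(s, i);
        if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) break;
        ++i;
        ++digits;
    }
    return digits > 0 && i == s.size();
}
```

<p>With these definitions, <tt>match_integer("1 2 345", false)</tt> accepts the input,
while <tt>match_integer("1 2 345", true)</tt> rejects it.</p>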
<h2>as_lower_d</h2>
<p>There are times when we want to inhibit case sensitivity. The <tt>as_lower_d</tt>
directive converts all characters from the input to lower case.</p>
<table width="80%" border="0" align="center">
<tr>
<td class="note_box"><img src="theme/alert.gif" width="16" height="16"><b>
as_lower_d behavior</b> <br>
<br>
It is important to note that only the input is converted to lower case.
Parsers enclosed inside <tt>as_lower_d</tt> that expect upper case characters
will fail to match. Example: <tt>as_lower_d[<span class="quotes">'X'</span>]</tt>
will never succeed because it expects an upper case <tt class="quotes">'X'</tt>
that the <tt>as_lower_d</tt> directive will never supply.</td>
</tr>
</table>
<p>For example, in Pascal, keywords and identifiers are case insensitive: the
identifiers <tt>Id</tt>, <tt>ID</tt> and <tt>id</tt> are indistinguishable.
Without the <tt>as_lower_d</tt> directive, it
would be awkward to define a rule that recognizes this. Here's a possibility:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>str_p</span><span class=special>(</span><span class=string>"id"</span><span class=special>) </span><span class=special>| </span><span class=string>"Id" </span><span class=special>| </span><span class=string>"iD" </span><span class=special>| </span><span class=string>"ID"</span><span class=special>;</span></font></code></pre>
<p>Now, try doing that with the case insensitive Pascal keyword <span class="quotes">"BEGIN"</span>.
The <tt>as_lower_d</tt> directive makes this simple:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>as_lower_d</span><span class=special>[</span><span class=string>"begin"</span><span class=special>];</span></font></code></pre>
<table width="80%" border="0" align="center">
<tr>
<td class="note_box"><div align="justify"><img src="theme/note.gif" width="16" height="16">
<b>Primitive arguments</b> <br>
<br>
The astute reader will notice that we did not explicitly wrap <span class="quotes">"begin"</span>
inside a <tt>str_p</tt>. Whenever appropriate, directives should
accept primitive types such as <tt>char</tt>, <tt>int</tt>, <tt>wchar_t</tt>,
<tt>char const<span class="operators">*</span></tt>, <tt>wchar_t const<span class="operators">*</span></tt>
and so on directly. Examples: <br>
<br>
<code><span class=identifier>as_lower_d</span><span class=special>[</span><span class=string>"hello"</span><span class=special>]
</span><span class=comment>// same as as_lower_d[str_p("hello")]</span><br>
<span class=identifier>as_lower_d</span><span class=special>[</span><span class=literal>'x'</span><span class=special>]
</span><span class=comment>// same as as_lower_d[ch_p('x')]</span></code></div></td>
</tr>
</table>
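<p>The one-sided nature of <tt>as_lower_d</tt> (the input is lowered, the parser is
not) can be sketched without Spirit as follows. <tt>match_as_lower</tt> is a
hypothetical helper written only for this illustration; it models
<tt>as_lower_d[literal]</tt> applied to a complete input string and is not the
library's implementation.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Models as_lower_d[literal]: every input character is converted to lower
// case before it is compared, while the literal is compared exactly as
// written. A literal containing upper case letters can therefore never match.
bool match_as_lower(const std::string& input, const std::string& literal) {
    if (input.size() != literal.size()) return false;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char seen = static_cast<char>(
            std::tolower(static_cast<unsigned char>(input[i])));
        if (seen != literal[i]) return false;
    }
    return true;
}
```

<p>Here <tt>match_as_lower("BEGIN", "begin")</tt> and <tt>match_as_lower("Begin", "begin")</tt>
both succeed, whereas <tt>match_as_lower("X", "X")</tt> fails, mirroring the
<tt>as_lower_d['X']</tt> case described in the note on as_lower_d behavior.</p>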
<h2>no_actions_d</h2>
<p>There are cases where you want <a href="semantic_actions.html">semantic actions</a>
not to be triggered. By enclosing a parser in the <tt>no_actions_d</tt> directive,
all semantic actions directly or indirectly attached to the parser will not
fire.</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>no_actions_d</span><span class=special>[</span><span class=identifier>expression</span><span class=special>]</span></font></code></pre>
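<p>As a rough, library-free illustration of the effect, the sketch below models a
parser of the form <tt>(+digit)[action]</tt>. The function <tt>match_digits</tt>
and its <tt>no_actions</tt> flag are hypothetical names used only here; the flag
stands in for wrapping the parser in <tt>no_actions_d</tt>. The point is that
matching still takes place, only the attached action is suppressed.</p>

```cpp
#include <cctype>
#include <functional>
#include <string>

// Models (+digit)[action], optionally wrapped in no_actions_d.
// The match result is the same either way; only the side effect differs.
bool match_digits(const std::string& s,
                  const std::function<void(const std::string&)>& action,
                  bool no_actions) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    if (!no_actions && action) action(s);  // fire the semantic action
    return true;
}
```

<p>Calling <tt>match_digits</tt> with <tt>no_actions</tt> set to <tt>true</tt> still
returns <tt>true</tt> for an input such as <tt>"456"</tt>, but the callback is never
invoked.</p>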
<h3>Tweaking the Scanner Type</h3>
<p><img src="theme/note.gif" width="16" height="16"> How do <tt>lexeme_d</tt>, <tt>as_lower_d</tt>
and <font color="#000000"><tt>no_actions_d</tt></font> work? These directives
do their magic by tweaking the scanner policies. You don't need to know
what that means for now; scanner policies are discussed <a href="indepth_the_scanner.html">later</a>.
However, it is important to note that when the scanner policy is tweaked, the
result is a different scanner. Why does this matter? The <a href="rule.html">rule</a>
is tied to a particular scanner (one or more scanners, to be precise). If you
wrap a rule inside a <tt>lexeme_d</tt>, <tt>as_lower_d</tt> or <font color="#000000"><tt>no_actions_d</tt></font>, the
compiler will complain about a <a href="faq.html#scanner_business">scanner mismatch</a>
unless you associate the required scanner with the rule.</p>
<p><tt>lexeme_scanner</tt>, <tt>as_lower_scanner</tt> and <tt>no_actions_scanner</tt>
are your friends if the need to wrap a rule inside these directives arises. Learn
about these beasts in the next chapter on <a href="scanner.html#lexeme_scanner">The
Scanner and Parsing</a>.</p>
<h2>longest_d</h2>
<p>Alternatives in the Spirit parser compiler are short-circuited (see <a href="operators.html">Operators</a>).
Sometimes, this is not what is desired. The <tt>longest_d</tt> directive instructs
the parser not to short-circuit alternatives enclosed inside this directive,
but instead makes the parser try all possible alternatives and choose the one
matching the longest portion of the input stream.</p>
<p>Consider the parsing of integers and real numbers:</p>
<pre><code><font color="#000000"><span class=comment>    </span><span class=identifier>number </span><span class=special>= </span><span class=identifier>real </span><span class=special>| </span><span class=identifier>integer</span><span class=special>;</span></font></code></pre>
<p>A number can be a real or an integer. This grammar is ambiguous: an input such as <span class="quotes">"1234"</span>
can match both real and integer. Recall though that alternatives
are short-circuited. Thus, for inputs such as the above, the real alternative always
wins. However, if we swap the alternatives:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>number </span><span class=special>= </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>real</span><span class=special>;</span></font></code></pre>
<p>we still have a problem. Now, an input <span class="quotes">"123.456"</span>
will be partially matched by integer up to the decimal point. This is not what
we want. The solution here is either to fix the ambiguity by factoring out the
common prefixes of real and integer or, if that is not possible or not desired,
to use the <tt>longest_d</tt> directive:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>number </span><span class=special>= </span><span class=identifier>longest_d</span><span class=special>[ </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>real </span><span class=special>];</span></font></code></pre>
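<p>The difference between short-circuited alternatives and <tt>longest_d</tt> can be
sketched without Spirit by having each alternative report how many characters it
matched. Everything below is an assumption made for illustration:
<tt>integer_len</tt> and <tt>real_len</tt> are deliberately simplified stand-ins
for the <tt>integer</tt> and <tt>real</tt> rules (no sign, no exponent), and
<tt>first_of</tt> and <tt>longest_of</tt> are hypothetical names for the two
strategies.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified integer: length of the leading run of digits, or -1 if none.
int integer_len(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    return i > 0 ? static_cast<int>(i) : -1;
}

// Simplified real: digits, optionally followed by '.' and more digits.
// Like the real rule in the text, it also accepts a plain "1234".
int real_len(const std::string& s) {
    const int whole = integer_len(s);
    if (whole < 0) return -1;
    const std::size_t dot = static_cast<std::size_t>(whole);
    if (dot >= s.size() || s[dot] != '.') return whole;
    const int frac = integer_len(s.substr(dot + 1));
    return frac < 0 ? whole : whole + 1 + frac;
}

// Models  integer | real : the first alternative that succeeds wins.
int first_of(const std::string& s) {
    const int n = integer_len(s);
    return n >= 0 ? n : real_len(s);
}

// Models  longest_d[ integer | real ] : try both, keep the longer match.
int longest_of(const std::string& s) {
    const int a = integer_len(s);
    const int b = real_len(s);
    return a > b ? a : b;
}
```

<p>For the input <tt>"123.456"</tt>, <tt>first_of</tt> stops after the three
characters matched by the integer alternative, while <tt>longest_of</tt> consumes
all seven.</p>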
<h2>shortest_d</h2>
<p>The opposite of the <tt>longest_d</tt> directive: the parser tries all the
enclosed alternatives and chooses the one matching the shortest portion of the
input stream.</p>
<table width="80%" border="0" align="center">
<tr>
<td class="note_box"><img src="theme/note.gif" width="16" height="16"> <b>Multiple
alternatives</b> <br>
<br>
The <tt>longest_d</tt> and <tt>shortest_d</tt> directives can accept two
or more alternatives. Examples:<br>
<br>
<font color="#000000"><code><span class=identifier>longest_d</span><span class=special>[ </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b </span><span class=special>| </span><span class=identifier>c </span><span class=special>];</span><br>
<span class=identifier>shortest_d</span><span class=special>[ </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b </span><span class=special>| </span><span class=identifier>c </span><span class=special>| </span><span class=identifier>d </span><span class=special>];</span></code></font></td>
</tr>
</table>
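<p>With more than two alternatives the selection rule stays the same. The sketch
below is library-free and purely illustrative: each alternative is a string
literal, and <tt>literal_len</tt> and <tt>pick</tt> are hypothetical helpers, with
the <tt>want_longest</tt> flag choosing between <tt>longest_d</tt>-like and
<tt>shortest_d</tt>-like behavior.</p>

```cpp
#include <string>
#include <vector>

// Length matched by a literal at the start of the input, or -1 on failure.
int literal_len(const std::string& input, const std::string& lit) {
    return input.compare(0, lit.size(), lit) == 0
               ? static_cast<int>(lit.size())
               : -1;
}

// Tries every alternative and returns the length of the longest
// (want_longest == true) or shortest (want_longest == false) successful
// match, or -1 if no alternative matches.
int pick(const std::string& input, const std::vector<std::string>& alts,
         bool want_longest) {
    int best = -1;
    for (const std::string& alt : alts) {
        const int n = literal_len(input, alt);
        if (n < 0) continue;  // this alternative failed
        if (best < 0 || (want_longest ? n > best : n < best)) best = n;
    }
    return best;
}
```

<p>Given the alternatives <tt>"ab"</tt>, <tt>"a"</tt> and <tt>"abc"</tt> and the
input <tt>"abcd"</tt>, the longest policy reports a match of three characters and
the shortest policy a match of one, regardless of the order in which the
alternatives are listed.</p>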
<h2>limit_d</h2>
<p>Ensures that the result of a parser is constrained to a given min..max range
(inclusive). If the result falls outside this range, the parser fails and returns a no-match.</p>
<p><b>Usage:</b></p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>limit_d</span><span class=special>(</span><span class=identifier>min</span><span class=special>, </span><span class=identifier>max</span><span class=special>)[</span><span class=identifier>expression</span><span class=special>]</span></font></code></pre>
<p>This directive is particularly useful in conjunction with parsers that parse
specific scalar ranges (for example, <a href="numerics.html">numeric parsers</a>).
Here's a practical example. Although the numeric parsers can be configured to
accept only a limited number of digits (say, 0..2), there is no way to limit
the result to a range (say -1.0..1.0). This design is deliberate. Building range
checking into the numeric parsers would
have undermined Spirit's design rule that <i><span class="quotes">"the
client should not pay for features that she does not use"</span></i>: the
min and max values would have to be stored in the numeric parser itself, whether
used or not. We could get by with static constants configured by non-type
template parameters, but that is not acceptable because it would
accommodate only integers. What about real numbers or user defined numbers such as
big-ints?</p>
<p><b>Example</b>, parse time of the form <b>HH:MM:SS</b>:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>uint_parser</span><span class=special>&lt;</span><span class=keyword>int</span><span class=special>, </span><span class=number>10</span><span class=special>, </span><span class=number>2</span><span class=special>, </span><span class=number>2</span><span class=special>&gt; </span><span class=identifier>uint2_p</span><span class=special>;

    </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>lexeme_d
    </span><span class=special>[
           </span><span class=identifier>limit_d</span><span class=special>(</span><span class=number>0u</span><span class=special>, </span><span class=number>23u</span><span class=special>)[</span><span class=identifier>uint2_p</span><span class=special>] </span><span class=special>>> </span><span class=literal>':'  </span><span class=comment>// Hours 00..23
        </span><span class=special>>> </span><span class=identifier>limit_d</span><span class=special>(</span><span class=number>0u</span><span class=special>, </span><span class=number>59u</span><span class=special>)[</span><span class=identifier>uint2_p</span><span class=special>] </span><span class=special>>> </span><span class=literal>':'  </span><span class=comment>// Minutes 00..59
        </span><span class=special>>> </span><span class=identifier>limit_d</span><span class=special>(</span><span class=number>0u</span><span class=special>, </span><span class=number>59u</span><span class=special>)[</span><span class=identifier>uint2_p</span><span class=special>]         </span><span class=comment>// Seconds 00..59
    </span><span class=special>];</span></font></code>
</pre>
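<p>For readers who want to see the same check spelled out by hand, here is a
library-free sketch of the <b>HH:MM:SS</b> rule. It is an illustration only:
<tt>two_digits_in_range</tt> is a hypothetical stand-in for
<tt>limit_d(min, max)[uint2_p]</tt>, and <tt>colon</tt> and <tt>match_time</tt>
are likewise names invented for this example.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Stand-in for limit_d(min, max)[uint2_p]: consume exactly two digits at pos
// and succeed only if their value lies in [min_value, max_value] (inclusive).
bool two_digits_in_range(const std::string& s, std::size_t& pos,
                         unsigned min_value, unsigned max_value) {
    if (pos + 2 > s.size()) return false;
    const unsigned char hi = static_cast<unsigned char>(s[pos]);
    const unsigned char lo = static_cast<unsigned char>(s[pos + 1]);
    if (!std::isdigit(hi) || !std::isdigit(lo)) return false;
    const unsigned value = static_cast<unsigned>(hi - '0') * 10u
                         + static_cast<unsigned>(lo - '0');
    if (value < min_value || value > max_value) return false;  // no-match
    pos += 2;
    return true;
}

// Consume a single ':' at pos.
bool colon(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] != ':') return false;
    ++pos;
    return true;
}

// Character-level match of HH:MM:SS over the whole input. No white space is
// skipped anywhere, which corresponds to the enclosing lexeme_d.
bool match_time(const std::string& s) {
    std::size_t pos = 0;
    return two_digits_in_range(s, pos, 0u, 23u) && colon(s, pos)   // hours
        && two_digits_in_range(s, pos, 0u, 59u) && colon(s, pos)   // minutes
        && two_digits_in_range(s, pos, 0u, 59u)                    // seconds
        && pos == s.size();
}
```

<p>Under these assumptions <tt>match_time("23:59:59")</tt> succeeds, while
<tt>match_time("24:00:00")</tt> and <tt>match_time("12:60:00")</tt> fail because a
field falls outside its range.</p>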
<h2>min_limit_d</h2>
<p>Sometimes it is useful to constrain only the minimum and leave the maximum
unbounded, giving an interval that is open-ended in one direction. The <tt>min_limit_d</tt>
directive ensures that the result of a parser is not less than the minimum. If it is, the
parser fails and returns a no-match.</p>
<p><b>Usage:</b></p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>min_limit_d</span><span class=special>(</span><span class=identifier>min</span><span class=special>)[</span><span class=identifier>expression</span><span class=special>]</span></font></code></pre>
<p><b>Example</b>, ensure that a year is not less than 1900:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>min_limit_d</span><span class=special>(</span><span class=number>1900u</span><span class=special>)[</span><span class=identifier>uint_p</span><span class=special>]</span></font></code></pre>
<h2>max_limit_d</h2>
<p>The opposite of <tt>min_limit_d</tt>: ensures that the result of a parser is not
greater than the maximum. Take note that <tt>limit_d(min, max)[p]</tt> is equivalent
to:</p>
<pre><code><font color="#000000"><span class=special>    </span><span class=identifier>min_limit_d</span><span class=special>(</span><span class=identifier>min</span><span class=special>)[</span><span class=identifier>max_limit_d</span><span class=special>(</span><span class=identifier>max</span><span class=special>)[</span><span class=identifier>p</span><span class=special>]]</span></font></code></pre>
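<p>The equivalence can be illustrated with three small predicates applied to an
already parsed value. This is a sketch, not Spirit code: <tt>min_limit</tt>,
<tt>max_limit</tt> and <tt>limit</tt> are hypothetical function templates. They
keep the bounds as ordinary run-time values of any type that supports
<tt>operator&lt;</tt>, so the same code serves integers, reals and user defined
number types.</p>

```cpp
// True when value is not less than min_value (models min_limit_d).
template <typename T>
bool min_limit(const T& value, const T& min_value) {
    return !(value < min_value);
}

// True when value is not greater than max_value (models max_limit_d).
template <typename T>
bool max_limit(const T& value, const T& max_value) {
    return !(max_value < value);
}

// Models limit_d(min, max): simply the two one-sided checks combined.
template <typename T>
bool limit(const T& value, const T& min_value, const T& max_value) {
    return min_limit(value, min_value) && max_limit(value, max_value);
}
```

<p>For instance, <tt>min_limit(1899u, 1900u)</tt> is <tt>false</tt>, and
<tt>limit(0.5, -1.0, 1.0)</tt> is <tt>true</tt> while <tt>limit(1.5, -1.0, 1.0)</tt>
is <tt>false</tt>.</p>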
<table border="0">
<tr>
<td width="10"></td>
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
<td width="30"><a href="epsilon.html"><img src="theme/l_arr.gif" border="0"></a></td>
<td width="30"><a href="scanner.html"><img src="theme/r_arr.gif" border="0"></a></td>
</tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2003 Joel de Guzman<br>
<br>
<font size="2">Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt) </font> </p>
<p>&nbsp;</p>
</body>
</html>