994d4e48cc
[SVN r44163]
232 lines
21 KiB
HTML
232 lines
21 KiB
HTML
<html>
|
|
<head>
|
|
<title>The Rule</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link rel="stylesheet" href="theme/style.css" type="text/css">
|
|
</head>
|
|
|
|
<body>
|
|
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
|
|
<tr>
|
|
<td width="10">
|
|
</td>
|
|
<td width="85%">
|
|
<font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>The Rule</b></font>
|
|
</td>
|
|
<td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
|
|
</tr>
|
|
</table>
|
|
<br>
|
|
<table border="0">
|
|
<tr>
|
|
<td width="10"></td>
|
|
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
|
|
<td width="30"><a href="numerics.html"><img src="theme/l_arr.gif" border="0"></a></td>
|
|
<td width="30"><a href="epsilon.html"><img src="theme/r_arr.gif" border="0"></a></td>
|
|
</tr>
|
|
</table>
|
|
<p>The <b>rule</b> is a polymorphic parser that acts as a named place-holder capturing
|
|
the behavior of an EBNF expression assigned to it. Naming an EBNF expression
|
|
allows it to be referenced later. The <tt>rule</tt> is a template class parameterized
|
|
by the type of the scanner (<tt>ScannerT</tt>), the rule's <a href="indepth_the_parser_context.html">context</a>
|
|
and its <a href="#tag">tag</a>. Default template parameters are provided to
|
|
make it easy to use the rule.</p>
|
|
<pre><code><font color="#000000"><span class=identifier> </span><span class=keyword>template</span><span class=special><
|
|
</span><span class=keyword>typename </span><span class=identifier>ScannerT </span><span class=special>= </span><span class=identifier>scanner</span><span class=special><>,
|
|
</span><span class=keyword>typename </span><span class=identifier>ContextT </span><span class=special>= </span><span class=identifier>parser_context</span><span class=special><></span><span class=identifier>,
|
|
</span><span class="keyword">typename</span><span class=identifier> TagT </span><span class="special">=</span><span class=identifier> parser_address_tag</span><span class=special>>
|
|
</span><span class=keyword>class </span><span class=identifier>rule</span><span class=special>;</span></font></code></pre>
|
|
<p>Default template parameters are supplied to handle the most common case. <tt>ScannerT</tt>
|
|
defaults to <tt>scanner<></tt>, a plain vanilla scanner that acts on <tt>char
|
|
const<span class="operators">*</span></tt> iterators and does nothing special
|
|
at all other than iterate through all the chars in the null terminated input
|
|
a character at a time. The rule tag, <tt>TagT</tt>, typically used with <a href="trees.html">ASTs</a>,
|
|
is used to identify a rule; it is explained <a href="#tag">here</a>. In trivial
|
|
cases, declaring a rule as <tt>rule<></tt> is enough. You need not be
|
|
concerned at all with the <tt>ContextT</tt> template parameter unless you wish
|
|
to tweak the low level behavior of the rule. Detailed information on the <tt>ContextT</tt>
|
|
template parameter is provided <a href="indepth_the_parser_context.html">elsewhere</a>.
|
|
</p>
|
|
<h3><a name="order_of_parameters"></a>Order of parameters</h3>
|
|
<p>As of v1.8.0, the <tt>ScannerT</tt>, <tt>ContextT</tt> and <tt>TagT</tt> can
|
|
be specified in any order. If a template parameter is missing, it will assume
|
|
the defaults. Examples:</p>
|
|
<pre><span class=identifier> rule</span><span class=special><> </span><span class=identifier>rx1</span><span class=special>;
|
|
</span><span class=identifier>rule</span><span class=special><</span><span class=identifier>scanner</span><span class=special><> </span><span class=special>> </span><span class=identifier>rx2</span><span class=special>;
|
|
</span> <span class=identifier>rule</span><span class=special><</span><span class=identifier>parser_context<code><font color="#000000"><span class=special><></span></font></code> </span><span class=special>> </span><span class=identifier>rx3</span><span class=special>;
|
|
</span><span class=identifier>rule</span><span class=special><</span><span class=identifier>parser_context<code><font color="#000000"><span class=special><></span></font></code></span><span class=special>, </span><span class=identifier>parser_address_tag</span><span class=special>> </span><span class=identifier>rx4</span><span class=special>;
|
|
</span> <span class=identifier>rule</span><span class=special><</span><span class=identifier>parser_address_tag</span><span class=special>> </span><span class=identifier>rx5</span><span class=special>;
|
|
</span> <span class=identifier>rule</span><span class=special><</span><span class=identifier>parser_address_tag</span><span class=special>, </span><span class=identifier>scanner</span><span class=special><>, </span><span class=identifier>parser_context<code><font color="#000000"><span class=special><></span></font></code> </span><span class=special>> </span><span class=identifier>rx6</span><span class=special>;
|
|
</span><span class=identifier>rule</span><span class=special><</span><span class=identifier>parser_context<code><font color="#000000"><span class=special><></span></font></code></span><span class=special>, </span><span class=identifier>scanner</span><span class=special><>, </span><span class=identifier>parser_address_tag</span><span class=special>> </span><span class=identifier>rx7</span><span class=special>;</span></pre>
|
|
<h3><a name="multiple_scanner_support" id="multiple_scanner_support"></a>Multiple scanners</h3>
|
|
<p>As of v1.8.0, rules can use one or more scanner types. There are cases, for
|
|
instance, where we need a rule that can work on the phrase and character levels.
|
|
Rule/scanner mismatch has been a source of confusion and is the no. 1 <a href="faq.html#scanner_business">FAQ</a>.
|
|
To address this issue, we now have multiple scanner support. Example:</p>
|
|
<pre><span class=special> </span><span class=keyword>typedef </span><span class=identifier>scanner_list</span><span class=special><</span><span class=identifier>scanner</span><span class=special><>, </span><span class=identifier>phrase_scanner_t</span><span class=special>> </span><span class=identifier>scanners</span><span class=special>;
|
|
|
|
</span><span class=identifier>rule</span><span class=special><</span><span class=identifier>scanners</span><span class=special>> </span><span class=identifier>r </span><span class=special>= </span><span class=special>+</span><span class=identifier>anychar_p</span><span class=special>;
|
|
</span><span class=identifier>assert</span><span class=special>(</span><span class=identifier>parse</span><span class=special>(</span><span class=string>"abcdefghijk"</span><span class=special>, </span><span class=identifier>r</span><span class=special>).</span><span class=identifier>full</span><span class=special>);
|
|
</span><span class=identifier>assert</span><span class=special>(</span><span class=identifier>parse</span><span class=special>(</span><span class=string>"a b c d e f g h i j k"</span><span class=special>, </span><span class=identifier>r</span><span class=special>, </span><span class=identifier>space_p</span><span class=special>).</span><span class=identifier>full</span><span class=special>);</span></pre>
|
|
<p>Notice how rule <tt>r</tt> is used in both the phrase and character levels.
|
|
</p>
|
|
<p>By default support for multiple scanners is disabled. The macro
|
|
<tt>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</tt> must be defined to the
|
|
maximum number of scanners allowed in a scanner_list. The value must
|
|
be greater than 1 to enable multiple scanners. Given the
|
|
example above, to define a limit of two scanners for the list, the
|
|
following line must be inserted into the source file before the
|
|
inclusion of Spirit headers:
|
|
</p>
|
|
<pre><span class=special> </span><span class=preprocessor>#define </span><span class=identifier>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</span> <span class=literal>2</span></pre>
|
|
<table width="80%" border="0" align="center">
|
|
<tr>
|
|
<td class="note_box"><img src="theme/bulb.gif" width="13" height="18"> See
|
|
the techniques section for an <a href="techniques.html#multiple_scanner_support">example</a>
|
|
of a <a href="grammar.html">grammar</a> using a multiple scanner enabled
|
|
rule, <a href="scanner.html#lexeme_scanner">lexeme_scanner</a> and <a href="scanner.html#as_lower_scanner">as_lower_scanner.</a></td>
|
|
</tr>
|
|
</table>
|
|
<h3>Rule Declarations</h3>
|
|
<p>The rule class models EBNF's production rule. Example:</p>
|
|
<pre><code><font color="#000000"> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>a_rule </span><span class=special>= </span><span class=special>*(</span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>) </span><span class=special>& </span><span class=special>+(</span><span class=identifier>c </span><span class=special>| </span><span class=identifier>d </span><span class=special>| </span><span class=identifier>e</span><span class=special>);</span></font></code></pre>
|
|
<p>The type and behavior of the right-hand (rhs) EBNF expression, which may be
|
|
arbitrarily complex, is encoded in the rule named a_rule. a_rule may now be
|
|
referenced elsewhere in the grammar:</p>
|
|
<pre><code><font color="#000000"> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>another_rule </span><span class=special>= </span><span class=identifier>f </span><span class=special>>> </span><span class=identifier>g </span><span class=special>>> </span><span class=identifier>h </span><span class=special>>> </span><span class=identifier>a_rule</span><span class=special>;</span></font></code></pre>
|
|
<table width="80%" border="0" align="center">
|
|
<tr>
|
|
<td class="note_box"><img src="theme/alert.gif" width="16" height="16"> <b>Referencing
|
|
rules <br>
|
|
</b><br>
|
|
When a rule is referenced anywhere in the right hand side of an EBNF expression,
|
|
the rule is held by the expression by reference. It is the responsibility
|
|
of the client to ensure that the referenced rule stays in scope and does
|
|
not get destructed while it is being referenced. </td>
|
|
</tr>
|
|
</table>
|
|
<pre><span class=special> </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>int_p</span><span class=special>;
|
|
</span><span class=identifier>b </span><span class=special>= </span><span class=identifier>a</span><span class=special>;
|
|
</span><span class=identifier>c </span><span class=special>= </span><span class=identifier>int_p </span><span class=special>>> </span><span class=identifier>b</span><span class=special>;</span></pre>
|
|
<h3>Copying Rules</h3>
|
|
<p>The rule is a weird C++ citizen, unlike any other C++ object. It does not have
|
|
the proper copy and assignment semantics and cannot be stored and passed around
|
|
by value. If you need to copy a rule you have to explicitly call its member
|
|
function <tt>copy()</tt>:</p>
|
|
<pre><span class=special> </span><span class=identifier>r</span><span class="special">.</span><span class=identifier>copy()</span><span class=special>;</span></pre>
|
|
<p>However, be warned that copying a rule will not deep copy other referenced
|
|
rules of the source rule being copied. This might lead to dangling references.
|
|
Again, it is the responsibility of the client to ensure that all referenced
|
|
rules stay in scope and does not get destructed while it is being referenced.
|
|
Caveat emptor.</p>
|
|
<p>If you copy a rule, then you'll want to place it in a storage somewhere. The
|
|
problem is how? The storage can't be another rule:</p>
|
|
<pre> <code><font color="#000000"><span class=identifier>rule</span><span class=special><></span></font></code> r2 <span class="special">=</span> <span class=identifier>r</span><span class="special">.</span><span class=identifier>copy()</span><span class=special>; </span><span class="comment">// BAD!</span></pre>
|
|
<p>because rules are weird and does not have the expected C++ copy-constructor
|
|
and assignment semantics! As a general rule: <strong>Don't put a copied rule
|
|
into another rule! </strong>Instead, use the <a href="stored_rule.html">stored_rule</a>
|
|
for that purpose.</p>
|
|
<h3>Forward declarations</h3>
|
|
<p>A <tt>rule</tt> may be declared before being defined to allow cyclic structures
|
|
typically found in BNF declarations. Example:</p>
|
|
<pre><code><font color="#000000"><span class=special> </span><span class=identifier>rule</span><span class=special><> </span><span class=identifier>a</span><span class=special>, </span><span class=identifier>b</span><span class=special>, </span><span class=identifier>c</span><span class=special>;
|
|
|
|
</span><span class=identifier>a </span><span class=special>= </span><span class=identifier>b </span><span class=special>| </span><span class=identifier>a</span><span class=special>;
|
|
</span><span class=identifier>b </span><span class=special>= </span><span class=identifier>c </span><span class=special>| </span><span class=identifier>a</span><span class=special>;</span></font></code></pre>
|
|
<h3>Recursion</h3>
|
|
<p>The right-hand side of a rule may reference other rules, including itself.
|
|
The limitation is that direct or indirect left recursion is not allowed (this
|
|
is an unchecked run-time error that results in an infinite loop). This is typical
|
|
of top-down parsers. Example:</p>
|
|
<pre><code><font color="#000000"><span class=special> </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>; </span><span class=comment>// infinite loop!</span></font></code></pre>
|
|
<table width="80%" border="0" align="center">
|
|
<tr>
|
|
<td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>What
|
|
is left recursion?<br>
|
|
</b><br>
|
|
Left recursion happens when you have a rule that calls itself before anything
|
|
else. A top-down parser will go into an infinite loop when this happens.
|
|
See the <a href="faq.html#left_recursion">FAQ</a> for details on how to
|
|
eliminate left recursion.</td>
|
|
</tr>
|
|
</table>
|
|
<h3>Undefined rules</h3>
|
|
<p>An undefined rule matches nothing and is semantically equivalent to <tt>nothing_p</tt>.</p>
|
|
<h3>Redeclarations</h3>
|
|
<p>Like any other C++ assignment, a second assignment to a rule is destructive
|
|
and will redefine it. The old definition is lost. Rules are dynamic. A rule
|
|
can change its definition anytime:</p>
|
|
<pre><code><font color="#000000"><span class=identifier> r </span><span class=special>= </span><span class=identifier>a_definition</span><span class=special>;
|
|
</span><span class=identifier> r </span><span class=special>= </span><span class=identifier>another_definition</span><span class=special>;</span></font></code></pre>
|
|
<p>Rule <tt>r</tt> loses the old definition when the second assignment is made.
|
|
As mentioned, an undefined rule matches nothing and is semantically equivalent
|
|
to <tt>nothing_p</tt>.
|
|
<h3>Dynamic Parsers</h3>
|
|
<p>Hosting declarative EBNF in imperative C++ yields an interesting blend. We
|
|
have the best of both worlds. We have the ability to conveniently modify the
|
|
grammar at run time using imperative constructs such as <tt>if</tt>, <tt>else</tt>
|
|
statements. Example:</p>
|
|
<pre><code><font color="#000000"><span class=special> </span><span class=keyword>if </span><span class=special>(</span><span class=identifier>feature_is_available</span><span class=special>)
|
|
</span><span class=identifier>r </span><span class=special>= </span><span class=identifier>add_this_feature</span><span class=special>;</span></font></code></pre>
|
|
<p>Rules are essentially dynamic parsers. A dynamic parser is characterized by
|
|
its ability to modify its behavior at run time. Initially, an undefined rule
|
|
matches nothing. At any time, the rule may be defined and redefined, thus, dynamically
|
|
altering its behavior.</p>
|
|
<h3>No start rule</h3>
|
|
<p>Typically, parsers have what is called a start symbol, chosen to be the root
|
|
of the grammar where parsing starts. The Spirit parser framework has no notion
|
|
of a start symbol. Any rule can be a start symbol. This feature promotes step-wise
|
|
creation of parsers. We can build parsers from the bottom up while fully testing
|
|
each level or module up untill we get to the top-most level.</p>
|
|
<h3><a name="tag"></a>Parser Tags</h3>
|
|
<p>Rules may be tagged for identification purposes. This is necessary, especially
|
|
when dealing with <a href="trees.html">parse trees and ASTs</a> to see which
|
|
rule created a specific AST/parse tree node. Each rule has an ID of type <tt>parser_id</tt>.
|
|
This ID can be obtained through the rule's <tt>id()</tt> member function:</p>
|
|
<pre><code><font color="#000000"><span class=identifier> my_rule</span><span class=special>.</span><span class=identifier>id</span><span class=special>(); </span><span class=comment>// get my_rule's id</span></font></code></pre>
|
|
<p>The <tt>parser_id</tt> class is declared as:</p>
|
|
<pre> <span class="keyword">class</span> <span class="identifier">parser_id</span><br> <span class="special">{</span><br> <span class="keyword">public</span><span class="special">:</span><br> parser_id<span class="special">();</span><br> <span class="keyword">explicit</span> parser_id<span class="special">(</span><span class="keyword">void const</span><span class="special">*</span> p<span class="special">);</span><br> parser_id<span class="special">(</span><span class="keyword">std::size_t</span> l<span class="special">);</span>
|
|
|
|
<span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">==(</span><span class="identifier">parser_id</span> <span class="keyword">const</span><span class="special">&</span> x<span class="special">)</span> const<span class="special">;</span><br> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">!=(</span><span class="identifier">parser_id</span> <span class="keyword">const</span><span class="special">&</span> x<span class="special">)</span> const<span class="special">;</span>
|
|
<span class="keyword">bool</span> <span class="keyword"> operator</span><span class="special"><(</span><span class="identifier">parser_id</span> <span class="keyword">const</span><span class="special">&</span> x<span class="special">)</span> const<span class="special">;</span>
|
|
<span class="special"></span><span class="keyword">std::size_t</span><span class="identifier"> to_long</span><span class="special">()</span> <span class="keyword">const</span><span class="special">;
|
|
};</span></pre>
|
|
<h3>parser_address_tag</h3>
|
|
<p>The rule's <tt>TagT</tt> template parameter supplies this ID. This defaults
|
|
to <tt>parser_address_tag</tt>. The <tt>parser_address_tag</tt> uses the address
|
|
of the rule as its ID. This is often not the most convenient, since it is not
|
|
always possible to get the address of a rule to compare against. </p>
|
|
<h3>parser_tag</h3>
|
|
<p>It is possible to have specific constant integers to identify a rule. For this
|
|
purpose, we can use the <tt>parser_tag<N></tt>, where N is a constant
|
|
integer:</p>
|
|
<pre><code><font color="#000000"><span class=identifier> rule</span><span class=special><</span><span class=identifier>parser_tag</span><span class="special"><</span><span class=identifier>123</span><span class="special">> > </span><span class="identifier">my_rule</span><span class="special">; </span><span class="comment">// set my_rule's id to 123</span></font></code></pre>
|
|
<h3>dynamic_parser_tag</h3>
|
|
<p>The <tt>parser_tag<N></tt> can only specifiy a <strong>static ID</strong>,
|
|
which is defined at compile time. If you need the ID to be <strong>dynamic</strong>
|
|
(changeable at runtime), you can use the <tt>dynamic_parser_tag</tt> class as
|
|
the <tt>TagT</tt> template parameter. This template parameter enables the <tt>set_id()</tt>
|
|
function, which may be used to set the required id at runtime:</p>
|
|
<pre><code><font color="#000000"><span class=identifier> rule</span><span class=special><</span><span class=identifier>dynamic_parser_tag</span><span class="special">> </span><span class="identifier">my_dynrule</span><span class="special">;</span>
|
|
my_dynrule.set_id(1234); <span class="comment">// set my_dynrule's id to 1234</span></font></code></pre>
|
|
<p>If the <tt>set_id()</tt> function isn't called, the parser id defaults to the
|
|
address of the rule as its ID, just like the <tt>parser_address_tag</tt> template
|
|
parameter would do. </p>
|
|
<table border="0">
|
|
<tr>
|
|
<td width="10"></td>
|
|
<td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
|
|
<td width="30"><a href="numerics.html"><img src="theme/l_arr.gif" border="0"></a></td>
|
|
<td width="30"><a href="epsilon.html"><img src="theme/r_arr.gif" border="0"></a></td>
|
|
</tr>
|
|
</table>
|
|
<br>
|
|
<hr size="1">
|
|
<p class="copyright">Copyright © 1998-2003 Joel de Guzman<br>
|
|
<br>
|
|
<font size="2">Use, modification and distribution is subject to the Boost Software
|
|
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt)</font></p>
|
|
</body>
|
|
</html>
|