<html>
|
||
|
||
<head>
|
||
<meta http-equiv="Content-Language" content="en-us">
|
||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
|
||
<meta name="ProgId" content="FrontPage.Editor.Document">
|
||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||
<title>The boost::fsm library - Rationale</title>
|
||
</head>
|
||
|
||
<body link="#0000ff" vlink="#800080">
|
||
|
||
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary="header">
|
||
<tr>
|
||
<td valign="top" width="300">
|
||
<h3><a href="../../../index.htm">
|
||
<img alt="C++ Boost" src="../../../c++boost.gif" border="0" width="277" height="86"></a></h3>
|
||
</td>
|
||
<td valign="top">
|
||
<h1 align="center">The boost::fsm library</h1>
|
||
<h2 align="center">Rationale</h2>
|
||
</td>
|
||
</tr>
|
||
</table>
|
||
<hr>
|
||
<dl class="index">
|
||
<dt><a href="#Introduction">Introduction</a></dt>
|
||
<dt><a href="#Why yet another state machine framework">Why yet another state
|
||
machine framework</a></dt>
|
||
<dt><a href="#State-local storage">State-local storage</a></dt>
|
||
<dt><a href="#Dynamic configurability">Dynamic configurability</a></dt>
|
||
<dt><a href="#Error handling">Error handling</a></dt>
|
||
<dt><a href="#Asynchronous state machines">Asynchronous state machines</a></dt>
|
||
<dt><a href="#User actions: Member functions vs. function objects">User
|
||
actions: Member functions vs. function objects</a></dt>
|
||
<dt><a href="#Speed versus scalability tradeoffs">Speed versus scalability
|
||
tradeoffs</a></dt>
|
||
<dt><a href="#Memory management customization">Memory management
|
||
customization</a></dt>
|
||
<dt><a href="#RTTI customization">RTTI customization</a></dt>
|
||
<dt><a href="#Double dispatch">Double dispatch</a></dt>
|
||
<dt><a href="#Resource usage">Resource usage</a></dt>
|
||
<dt><a href="#Limitations">Limitations</a></dt>
|
||
</dl>
|
||
<h2><a name="Introduction">Introduction</a></h2>
|
||
<p>Most of the design decisions made during the development of this library
|
||
are the result of the following requirements.</p>
|
||
<p>boost::fsm should ...</p>
|
||
<ol>
|
||
<li>be fully type-safe. Whenever possible, type mismatches should be flagged
|
||
with an error at compile-time</li>
|
||
<li>not require the use of a code generator. A lot of the existing FSM
|
||
solutions force the developer to design the state machine either graphically
|
||
or in a specialized language. All or part of the code is then generated</li>
|
||
<li>allow for easy transformation of a UML statechart (defined in
|
||
<a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">
|
||
http://www.omg.org/cgi-bin/doc?formal/03-03-01</a>) into a working state
|
||
machine. Vice versa, an existing C++ implementation of a state machine
|
||
should be fairly trivial to transform into a UML statechart. Specifically,
|
||
the following state machine features should be supported:
|
||
<ul>
|
||
<li>Hierarchical (composite, nested) states</li>
|
||
<li>Orthogonal (concurrent) states</li>
|
||
<li>Entry-, exit- and transition-actions</li>
|
||
<li>Guards</li>
|
||
<li>Shallow/deep history</li>
|
||
</ul>
|
||
</li>
|
||
<li>produce a customizable reaction when a C++ exception is propagated from
|
||
user code</li>
|
||
<li>support synchronous and asynchronous state machines and leave it to the
|
||
user which thread an asynchronous state machine will run in. Users should
|
||
also be able to use the threading library of their choice</li>
|
||
<li>support the development of arbitrarily large and complex state machines.
|
||
Multiple developers should be able to work on the same state machine
|
||
simultaneously</li>
|
||
<li>allow the user to customize all resource management so that the library
|
||
could be used for applications with hard real-time requirements</li>
|
||
<li>enforce as much as possible at compile time. Specifically, invalid state
|
||
machines should not compile</li>
|
||
<li>offer reasonable performance for a wide range of applications</li>
|
||
</ol>
|
||
<h2><a name="Why yet another state machine framework">Why yet another state
|
||
machine framework?</a></h2>
|
||
<p>Before I started to develop this library I had a look at the following
|
||
frameworks:</p>
|
||
<ul>
|
||
<li>The framework accompanying the book "Practical Statecharts in C/C++" by
|
||
Miro Samek, CMP Books, ISBN: 1-57820-110-1<br>
|
||
<a href="http://www.quantum-leaps.com">http://www.quantum-leaps.com<br>
|
||
</a>Fails to satisfy at least the requirements 1, 3, 4, 6, 8.</li>
|
||
<li>The framework accompanying "Rhapsody in C++" by ILogix (a code generator
|
||
solution)<br>
|
||
<a href="http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm">
|
||
http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm<br>
|
||
</a>This might look like comparing apples with oranges. However, there is no
|
||
inherent reason why a code generator couldn't produce code that can easily
|
||
be understood and modified by humans. Fails to satisfy at least the
|
||
requirements 2, 4, 5, 6, 8 (there is quite a bit of error checking before
|
||
code generation, though).</li>
|
||
<li>The framework accompanying the article "State Machine Design in C++"<br>
|
||
<a href="http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles">
|
||
http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles<br>
|
||
</a>Fails to satisfy at least the requirements 1, 3, 4, 5 (there is no
|
||
direct threading support), 6, 8.</li>
|
||
</ul>
|
||
<p>I believe boost::fsm satisfies all requirements.</p>
|
||
<h2><a name="State-local storage">State-local storage</a></h2>
|
||
<p>This not yet widely known state machine feature is enabled by the fact that
|
||
every state is represented by a class. Upon state-entry, an object of the
|
||
class is constructed and the object is later destructed when the state machine
|
||
exits the state. Any data that is useful only as long as the machine resides
|
||
in the state can (and should) thus be a member of the state. This feature
|
||
paired with the ability to spread a state machine over several translation
|
||
units makes possible virtually unlimited scalability. </p>
|
||
<p>In most existing FSM frameworks the whole state machine runs in one
|
||
environment (context). That is, all resource handles and variables local to
|
||
the state machine are stored in one place (normally as members of the class
|
||
that also derives from some state machine base class). For large state
|
||
machines this often leads to the class having a huge number of data members
|
||
most of which are needed only briefly in a tiny part of the machine. The state
|
||
machine class therefore often becomes a change hotspot, which leads to frequent
|
||
recompilations of the whole state machine. </p>
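<p>The idea can be illustrated without the library. The following is a minimal
sketch in plain C++, not boost::fsm code: <code>Machine</code>, <code>Idle</code>,
<code>Connected</code> and <code>g_log</code> are hypothetical names chosen for
this example. Entry and exit actions are the constructor and destructor of the
state class, and the receive buffer exists only while the machine resides in
<code>Connected</code>.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical log so the order of entry and exit actions can be observed.
std::vector<std::string> g_log;

// Common base so the machine can own whichever state is currently active.
struct StateBase {
    virtual ~StateBase() {}
};

// Entry action = constructor, exit action = destructor.
struct Idle : StateBase {
    Idle() { g_log.push_back("enter Idle"); }
    ~Idle() { g_log.push_back("exit Idle"); }
};

struct Connected : StateBase {
    Connected() : receive_buffer(1024) { g_log.push_back("enter Connected"); }
    ~Connected() { g_log.push_back("exit Connected"); }
    std::vector<char> receive_buffer;  // state-local storage
};

class Machine {
public:
    template <class State>
    void transit() {
        active_.reset();             // exit: destroys the old state and its data
        active_.reset(new State());  // entry: constructs the new state
    }
    bool in_connected() const {
        return dynamic_cast<const Connected*>(active_.get()) != nullptr;
    }
private:
    std::unique_ptr<StateBase> active_;
};
```

<p>Because <code>receive_buffer</code> is a member of <code>Connected</code>
rather than of <code>Machine</code>, adding or changing it touches only the
translation unit that defines that state.</p>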
|
||
<h2><a name="Dynamic configurability">Dynamic configurability</a></h2>
|
||
<h3>Two types of state machine frameworks</h3>
|
||
<ul>
|
||
<li>A state machine framework supports dynamic configurability if the whole
|
||
layout of a state machine can be defined at runtime ("layout" refers to
|
||
states and transitions, actions are still specified with normal C++ code).
|
||
That is, data only available at runtime can be used to build arbitrarily
|
||
large machines. See "A Multiple Substring Search Algorithm" by Moishe
|
||
Halibard and Moshe Rubin in June 2002 issue of CUJ for a good example
|
||
(unfortunately not available online).</li>
|
||
<li>On the other side are state machine frameworks which require the layout
|
||
to be specified at compile time.</li>
|
||
</ul>
|
||
<p>State machines that are built at runtime almost always get away with a
|
||
simple state model (no hierarchical states, no orthogonal states, no entry and
|
||
exit actions, no history) because the layout is very often <b>computed by an
|
||
algorithm</b>. On the other hand, machine layouts that are fixed at compile
|
||
time are almost always designed by humans, who frequently need/want a
|
||
sophisticated state model in order to keep the complexity at acceptable
|
||
levels. Dynamically configurable FSM frameworks are therefore often optimized
|
||
for simple flat machines while incarnations of the static variant tend to
|
||
offer more features for abstraction.</p>
|
||
<p>However, fully-featured dynamic FSM libraries do exist. So, the question
|
||
is:</p>
|
||
<h3>Why not use a dynamically configurable FSM library for all state machines?</h3>
|
||
<p>One might argue that a dynamically configurable FSM framework is all one
|
||
ever needs because <b>any</b> state machine can be implemented with it.
|
||
However, due to its nature such a framework has a number of disadvantages when
|
||
used to implement static machines:</p>
|
||
<ul>
|
||
<li>No compile-time optimizations and validations can be made. For example,
|
||
boost::fsm determines the
|
||
<a href="definitions.html#Innermost common outer state">innermost common
|
||
outer state</a> of the transition-source and destination state at compile
|
||
time. Moreover, compile time checks ensure that the state machine is valid
|
||
(e.g. that there are no transitions between orthogonal states).</li>
|
||
<li>Double dispatch must inevitably be implemented with some kind of a
|
||
table. As argued under <a href="#Double dispatch">Double dispatch</a>, this
|
||
scales badly.</li>
|
||
<li>To warrant fast table lookup, states and events must be represented with
|
||
an integer. To keep the table as small as possible, the numbering should be
|
||
continuous, e.g. if there are ten states, it's best to use the ids 0-9. To
|
||
ensure continuity of ids, all states are best defined in the same header
|
||
file. The same applies for the events. Again, this does not scale.</li>
|
||
<li>Because events carrying parameters are not represented by a type, some
|
||
sort of a generic event with a property map must be used and type-safety is
|
||
enforced at runtime rather than at compile time.</li>
|
||
</ul>
|
||
<p>It is for these reasons that boost::fsm was built from the ground up to <b>not</b>
|
||
support dynamic configurability. However, this does not mean that it's
|
||
impossible to dynamically shape a machine implemented with this library. For
|
||
example, guards can be used to make different transitions depending on input
|
||
only available at runtime. However, such layout changes will always be limited
|
||
to what can be foreseen before compilation. A somewhat related library, the
|
||
boost::spirit parser framework, allows for roughly the same runtime
|
||
configurability. </p>
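<p>The type-safety point from the list of disadvantages above can be sketched
as follows. This is an illustration only and does not use any FSM library:
<code>GenericEvent</code>, <code>ConnectRequested</code> and <code>port_of</code>
are hypothetical names. With a generic event, a misspelled or missing parameter
is detected only when the code runs; with a typed event, the same mistake does
not compile.</p>

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Dynamic style: one generic event type, parameters live in a property map.
struct GenericEvent {
    int id;
    std::map<std::string, std::string> properties;
};

// Throws std::out_of_range at runtime if the "port" property is missing.
int port_of(const GenericEvent& e) {
    return std::stoi(e.properties.at("port"));
}

// Static style: the event is a type and its parameter is a typed member.
struct ConnectRequested {
    int port;
};

// A misspelled member name here would be rejected by the compiler.
int port_of(const ConnectRequested& e) {
    return e.port;
}
```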
|
||
<h2><a name="Error handling">Error handling</a></h2>
|
||
<p>There is not a single word about error handling in the UML state machine
|
||
semantics specifications. Moreover, most existing FSM solutions also seem to
|
||
ignore the issue. </p>
|
||
<h3>Why an FSM library should support error handling</h3>
|
||
<p>Consider the following state configuration:</p>
|
||
<p><img border="0" src="A.gif" width="230" height="170"></p>
|
||
<p>Both states define entry actions (x() and y()). Whenever state A becomes
|
||
active, a call to x() will immediately be followed by a call to y(). y() could
|
||
depend on the side-effects of x(). Therefore, executing y() does not make
|
||
sense if x() fails. This is not an esoteric corner case but happens in
|
||
every-day state machines all the time. For example, x() could acquire memory
|
||
the contents of which is later modified by y(). There is a different but in
|
||
terms of error handling equally critical situation in the Tutorial under
|
||
<a href="tutorial.html#Getting state information out of the machine">Getting
|
||
state information out of the machine</a> when <code>Running::~Running()</code>
|
||
accesses its outer state <code>Active</code>. Had the entry action of <code>
|
||
Active</code> failed and had <code>Running</code> been entered anyway then
|
||
<code>Running</code>'s exit action would have invoked undefined behavior.<br>
|
||
The error handling situation with outer and inner states resembles the one
|
||
with base and derived classes: If a base class constructor fails (by throwing
|
||
an exception) the construction is aborted, the derived class constructor is
|
||
not called and the object never comes to life.</p>
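<p>The base and derived class analogy can be checked directly in plain C++.
In this sketch <code>Outer</code>, <code>Inner</code>, <code>try_enter</code>
and <code>g_trace</code> are hypothetical names, and the constructors merely
stand in for the entry actions x() and y(); it is not boost::fsm code.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical trace of what actually ran.
std::vector<std::string> g_trace;

struct Outer {
    explicit Outer(bool fail) {
        g_trace.push_back("x()");
        if (fail) throw std::runtime_error("x() failed");
    }
    ~Outer() { g_trace.push_back("~Outer"); }
};

struct Inner : Outer {
    explicit Inner(bool fail_outer) : Outer(fail_outer) {
        g_trace.push_back("y()");  // never reached if Outer's constructor throws
    }
    ~Inner() { g_trace.push_back("~Inner"); }
};

// Returns true if the object came to life, false if construction was aborted.
bool try_enter(bool fail_outer) {
    try {
        Inner state(fail_outer);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

<p>When <code>Outer</code>'s constructor throws, neither <code>y()</code> nor
any destructor runs, so no code ever operates on a half-constructed object.</p>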
|
||
<p>If an FSM framework does not account for failing actions, the user is
|
||
forced to adopt cumbersome workarounds. For example, a failing action would
|
||
have to post an appropriate error event and set a global error variable to
|
||
true. Every following action would first have to check the error variable
|
||
before doing anything. After all actions have completed (by doing nothing!),
|
||
the previously posted error event would have to be processed, which would lead
|
||
to the remedy action being executed. Please note that it is not sufficient to
|
||
simply queue the error event as other events could still be pending. Instead,
|
||
the error event has absolute priority and would have to be dealt with
|
||
immediately.</p>
|
||
<p>So, to be safe, programmers would have to encapsulate the code of <b>every</b>
|
||
action in <code>if ( !error ) { /* action */ }</code> blocks. Moreover, a
|
||
<code>try { /* action */ } catch ( ... ) { /* post error event */ error =
|
||
true; }</code> statement would often have to be added because called functions
|
||
might throw and letting an exception propagate out of a user action would at
|
||
best terminate the state machine immediately. Writing all this boiler-plate
|
||
code is simply boring and quite unnecessary.</p>
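<p>To make the cost of this workaround concrete, here is a minimal sketch of
what it could look like in a framework without error handling support. All
names (<code>g_error</code>, <code>g_posted_events</code>, <code>ERROR_EVENT</code>,
<code>guarded</code>, <code>enter_a</code>) are hypothetical, and the event
queue is reduced to a vector of integers.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

bool g_error = false;              // global error variable
std::vector<int> g_posted_events;  // stand-in for the machine's event queue
const int ERROR_EVENT = 1;

// The boilerplate every single action would have to be wrapped in.
template <class Action>
void guarded(Action action) {
    if (g_error) return;  // a previous action failed: do nothing
    try {
        action();
    } catch (...) {
        g_posted_events.push_back(ERROR_EVENT);  // post error event
        g_error = true;
    }
}

int g_y_calls = 0;

void x() { throw std::runtime_error("x() failed"); }  // failing entry action
void y() { ++g_y_calls; }                             // depends on x()

// Entering the state configuration runs x() and then y().
void enter_a() {
    guarded(x);
    guarded(y);
}
```

<p>Even in this reduced form, the error flag and the wrapper leak into every
action, and the machine still needs extra logic to process the error event
ahead of any pending events.</p>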
|
||
<h3>Error handling support in boost::fsm</h3>
|
||
<ul>
|
||
<li>C++ exceptions are used for all error handling. Except for exit actions
(which are mapped to state destructors, and exceptions should almost never be
|
||
propagated from destructors), exceptions can be propagated from all user
|
||
functions.</li>
|
||
<li>A customizable per state machine policy specifies how to convert all
|
||
exceptions propagated from user code. Out of the box, an <code>
|
||
exception_thrown</code> event is generated.</li>
|
||
<li>An exception event is always processed immediately and thus has absolute
|
||
priority over any possibly pending events. The event queue stays as it was
|
||
until the exception event has been processed.</li>
|
||
<li>The processing logic is as follows:
|
||
<ul>
|
||
<li>Exception events resulting from failed <code>react</code> functions
|
||
are sent to the <a href="definitions.html#Innermost state">innermost state</a>
|
||
that was last visited during <a href="definitions.html#Reaction">reaction</a>
|
||
search</li>
|
||
<li>Exception events resulting from failed entry actions are sent to the
|
||
outer state of the state that the machine tried to enter</li>
|
||
<li>Exception events resulting from failed transition actions are sent to
|
||
the <a href="definitions.html#Innermost common outer state">innermost
|
||
common outer state</a></li>
|
||
</ul>
|
||
<p>In the last two cases the state-machine is not in a stable state when the
|
||
exception event is generated and leaving it there (e.g. by ignoring the
|
||
exception event) would violate an invariant of state machines. So, the
|
||
exception event reaction must either be a transition or a termination to
|
||
bring the machine back into a stable state. That's why the framework checks
|
||
that the state machine is stable after processing an exception event. If
|
||
this is not the case the state machine is terminated and the exception is
|
||
rethrown. </li>
|
||
</ul>
|
||
<h2><a name="Asynchronous state machines">Asynchronous state machines</a></h2>
|
||
<h3>Requirements</h3>
|
||
<p>For asynchronous state machines different applications have rather varied
|
||
requirements:</p>
|
||
<ol>
|
||
<li>In some applications each state machine needs to run in its own thread,
|
||
other applications are single-threaded and run all machines in the same
|
||
thread</li>
|
||
<li>For some applications a FIFO scheduler is perfect, others need priority-
|
||
or EDF-schedulers</li>
|
||
<li>For some applications the boost::thread library is just fine, others
|
||
might want to use another threading library, yet other applications run on
|
||
OS-less platforms where ISRs are the only mode of (apparently) concurrent
|
||
execution</li>
|
||
</ol>
|
||
<h3>Out of the box behavior</h3>
|
||
<p>By default, <code>asynchronous_state_machine<></code> subclass objects are
|
||
serviced by a <code>fifo_scheduler<></code> object. <code>fifo_scheduler<></code>
|
||
does not lock or wait in single-threaded applications and uses boost::thread
|
||
primitives to do so in multi-threaded programs. Moreover, a <code>
|
||
fifo_scheduler<></code> object can service an arbitrary number of <code>
|
||
asynchronous_state_machine<></code> subclass objects. Under the hood, <code>
|
||
fifo_scheduler<></code> is just a thin wrapper around an object of its <code>
|
||
FifoWorker</code> template parameter (which manages the queue and ensures
|
||
thread safety) and a <code>processor_container<></code> (which manages the
|
||
lifetime of the state machines).</p>
|
||
<h3>Customization</h3>
|
||
<p>If a user needs to customize the scheduler behavior she can do so by
|
||
instantiating <code>fifo_scheduler<></code> with her own class modeling the
|
||
<code>FifoWorker</code> concept. I considered a much more generic design where
|
||
locking and waiting is implemented in a policy but I have so far failed to
|
||
come up with a clean and simple interface for it. Especially the waiting is a
|
||
bit difficult to model as some platforms have condition variables, others have
|
||
events and yet others don't have any notion of waiting whatsoever (they
|
||
instead loop until a new event arrives, presumably via an ISR). Given the
|
||
relatively few lines of code required to implement a custom <code>FifoWorker</code>
|
||
type and the fact that almost all applications will implement at most one such
|
||
class, it does not seem to be worthwhile anyway.</p>
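<p>To give a feeling for how little code such a worker needs on a platform
with mutexes and condition variables, here is a minimal sketch. It does
<b>not</b> reproduce the interface required by the <code>FifoWorker</code>
concept; <code>SimpleFifoWorker</code>, <code>post</code>, <code>run_one</code>
and <code>pending</code> are hypothetical names, and the sketch uses the
standard C++ threading primitives rather than boost::thread.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class SimpleFifoWorker {
public:
    typedef std::function<void()> WorkItem;

    // May be called from any thread.
    void post(WorkItem item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Blocks until an item is available, then runs exactly one item.
    void run_one() {
        WorkItem item;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            while (queue_.empty()) not_empty_.wait(lock);
            item = std::move(queue_.front());
            queue_.pop_front();
        }
        item();  // run outside the lock so the item may post further work
    }

    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<WorkItem> queue_;
};
```

<p>On a platform without any notion of waiting, the body of <code>run_one</code>
would instead poll the queue in a loop, which is exactly the part that is hard
to express as a single portable policy.</p>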
|
||
<p>Applications requiring a less or more sophisticated event processor
|
||
lifetime management can customize the behavior at a more coarse level, by
|
||
using a custom <code>Scheduler</code> type. This is currently also true for
|
||
applications requiring non-FIFO queuing schemes. However, boost::fsm will
|
||
probably provide a <code>priority_scheduler</code> in the future so that
|
||
custom schedulers need to be implemented only in rare cases.</p>
|
||
<h2><a name="User actions: Member functions vs. function objects">User
|
||
actions: Member functions vs. function objects</a></h2>
|
||
<p>All user-supplied functions (<code>react</code> member functions, entry-,
|
||
exit- and transition-actions) must be class members. The reasons for this are
|
||
as follows: </p>
|
||
<ul>
|
||
<li>The concept of state-local storage mandates that state-entry and
|
||
state-exit actions (mapped to constructors and destructors) are implemented
|
||
as members.</li>
|
||
<li><code>react</code> member functions and transition actions often access
|
||
state-local data. So, it is most natural to implement these functions as
|
||
members of the class the data of which the functions will operate on anyway.</li>
|
||
</ul>
|
||
<h2><a name="Speed versus scalability tradeoffs">Speed versus scalability
|
||
tradeoffs</a></h2>
|
||
<p>Quite a bit of effort has gone into making the library fast for small
|
||
simple machines <b>and</b> scalable at the same time (this applies only to
|
||
<code>state_machine<></code>, there still is some room for optimizing <code>
|
||
fifo_scheduler<></code>, especially for multi-threaded builds). While I
|
||
believe it should perform reasonably in most applications, the scalability
|
||
does not come for free. Small, carefully handcrafted state machines will thus
|
||
easily outperform equivalent boost::fsm machines. To get a picture of how big
|
||
the gap is, I implemented a simple benchmark in the BitMachine example. The
|
||
Handcrafted example is a handcrafted variant of the 1-bit-BitMachine
|
||
implementing the same benchmark.</p>
|
||
<p>I tried to create a fair but somewhat unrealistic <b>worst-case</b>
|
||
scenario:</p>
|
||
<ul>
|
||
<li>For both machines exactly one object of the only event is allocated
|
||
before starting the test. This same object is then sent to the machines over
|
||
and over.</li>
|
||
<li>The Handcrafted machine employs GOF-visitor double dispatch. The states
|
||
are preallocated so that event dispatch & transition amounts to nothing more
|
||
than two virtual calls and one pointer assignment.</li>
|
||
</ul>
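<p>For readers unfamiliar with this style, the following sketch shows the kind
of machine meant here. It is not the code of the Handcrafted example; the names
<code>Machine</code>, <code>Off</code>, <code>On</code> and <code>EvFlip</code>
are assumptions made for this illustration. Processing one event costs one
virtual call on the event, one virtual call on the state and one pointer
assignment.</p>

```cpp
#include <cassert>

class Machine;
struct EvFlip;

struct State {
    virtual ~State() {}
    virtual void react(Machine& m, const EvFlip& e) const = 0;  // 2nd virtual call
};

struct Event {
    virtual ~Event() {}
    virtual void dispatch(Machine& m, const State& s) const = 0;  // 1st virtual call
};

struct EvFlip : Event {
    void dispatch(Machine& m, const State& s) const { s.react(m, *this); }
};

struct Off : State {
    void react(Machine& m, const EvFlip&) const;
};

struct On : State {
    void react(Machine& m, const EvFlip&) const;
};

class Machine {
public:
    Machine() : current_(&off_) {}
    void process(const Event& e) { e.dispatch(*this, *current_); }
    bool is_on() const { return current_ == &on_; }
    void go_on() { current_ = &on_; }    // the single pointer assignment
    void go_off() { current_ = &off_; }
private:
    Off off_;  // states are preallocated as members
    On on_;
    const State* current_;
};

void Off::react(Machine& m, const EvFlip&) const { m.go_on(); }
void On::react(Machine& m, const EvFlip&) const { m.go_off(); }
```

<p>Note that <code>State</code> must declare one <code>react</code> overload
per event type, which is the dependency discussed under Double dispatch.</p>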
|
||
<p>The Benchmarks - compiled with MSVC7.1 (single threaded), running on an
|
||
Intel Pentium M 1600 - produced the following results:</p>
|
||
<ul>
|
||
<li>Handcrafted: 10 nanoseconds to dispatch one event and make the resulting
|
||
transition.</li>
|
||
<li>1-bit-BitMachine with customized memory management: 210 nanoseconds to
|
||
dispatch one event and make the resulting transition.</li>
|
||
</ul>
|
||
<p>Although this is a big difference I still think it will not be noticeable
|
||
in most real-world applications. No matter whether an application uses
|
||
handcrafted or boost::fsm machines it will...</p>
|
||
<ul>
|
||
<li>almost never run into a situation where a state machine is swamped with
|
||
as many events as in the benchmarks. Unless a state machine is abused for
|
||
parsing, it will typically spend a good deal of time waiting for events</li>
|
||
<li>often run state machines in their own threads. This adds considerable
|
||
locking and thread-switching overhead. Performance tests with the PingPong
|
||
example, where two asynchronous state machines exchange events, gave the
|
||
following times to process one event and perform the resulting in-state
|
||
reaction (using the library with <code>boost::fast_pool_allocator<></code>):<ul>
|
||
<li>Single-threaded (no locking and waiting): 590ns</li>
|
||
<li>Multi-threaded with one thread (the scheduler uses mutex locking but
|
||
never has to wait for events): 4300ns</li>
|
||
<li>Multi-threaded with two threads (both schedulers use mutex locking and
|
||
exactly one always waits for an event): 7000ns</li>
|
||
</ul>
|
||
<p>As mentioned above there definitely is some room to improve the
|
||
multi-threaded timings. However, a quick test has shown that the MT overhead
|
||
will always be well over 1000ns per event. Handcrafted machines will
|
||
inevitably have the same overhead, making raw single-threaded dispatch and
|
||
transition speed much less important</li>
|
||
<li>almost always allocate events with <code>new</code> and destroy them
|
||
after consumption. This will add a few cycles, even if event memory
|
||
management is customized</li>
|
||
<li>often use state machines that employ orthogonal states and other
|
||
advanced features. This forces the handcrafted machines to use a more
|
||
adequate and more time-consuming book-keeping</li>
|
||
</ul>
|
||
<p>Therefore, in real-world applications event dispatch and transition do not
normally constitute a bottleneck and the relative gap between handcrafted and
|
||
boost::fsm machines also becomes much smaller than in the worst-case scenario.</p>
|
||
<p>BitMachine measurements with more states and with different levels of
|
||
optimization:</p>
|
||
<table border="3" width="100%" id="AutoNumber2" cellpadding="2">
|
||
<tr>
|
||
<td width="25%" rowspan="2"><b>Machine configuration<br>
|
||
# states / # outgoing transitions per state</b></td>
|
||
<td width="75%" colspan="3"><b>Event dispatch & transition time
|
||
[nanoseconds]</b></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">Out of the box</td>
|
||
<td width="25%">Same as out of the box but with <code>
|
||
<a href="configuration.html#Application Defined Macros">
|
||
BOOST_FSM_USE_NATIVE_RTTI</a></code> defined</td>
|
||
<td width="25%">Same as out of the box but with customized memory
|
||
management</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">2 / 1</td>
|
||
<td width="25%">680</td>
|
||
<td width="25%">790</td>
|
||
<td width="25%">210</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">4 / 2</td>
|
||
<td width="25%">690</td>
|
||
<td width="25%">850</td>
|
||
<td width="25%">210</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">8 / 3</td>
|
||
<td width="25%">690</td>
|
||
<td width="25%">910</td>
|
||
<td width="25%">220</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">16 / 4</td>
|
||
<td width="25%">710</td>
|
||
<td width="25%">990</td>
|
||
<td width="25%">230</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">32 / 5</td>
|
||
<td width="25%">740</td>
|
||
<td width="25%">1090</td>
|
||
<td width="25%">240</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">64 / 6</td>
|
||
<td width="25%">820</td>
|
||
<td width="25%">1250</td>
|
||
<td width="25%">310</td>
|
||
</tr>
|
||
</table>
|
||
<h3>Possible optimizations</h3>
|
||
<p>Currently, <code>std::list<></code>s are used for event and state storage.
|
||
These could be replaced with an intrusive linked list container, which would
|
||
eliminate 50% of the <code>operator new()</code> and <code>operator delete()</code>
|
||
calls made during an event dispatch & transition cycle of the smallest
|
||
BitMachine. I would guess that this could speed it up by 25%-50%. However,
|
||
dispatch time is not affected and can quickly consume considerable time, as
|
||
the 6-bit-BitMachine shows. Moreover, most states of real-world machines are
|
||
quite deeply nested and the average transition involves the deallocation and
|
||
allocation of 2 states. Since <code>std::list<></code> allocations occur only
|
||
once per transition and orthogonal region, the relative performance gain of
|
||
this optimization becomes much smaller for typical machines and does not seem
|
||
to be worth the effort of hand-crafting an intrusive linked list.</p>
|
||
<h2><a name="Memory management customization">Memory management customization</a></h2>
|
||
<p>Out of the box, all internal data is allocated on the normal heap. This
|
||
should be satisfactory for applications where all the following prerequisites
|
||
are met:</p>
|
||
<ul>
|
||
<li>There are no deterministic reaction time (hard real-time) requirements.</li>
|
||
<li>The application will typically not process more than a handful of events
|
||
per second. This is just a general guideline, some platforms can easily cope
|
||
with more than 100000 events per second (see timings above).</li>
|
||
<li>The application will never run long enough for heap fragmentation to
|
||
become a problem. This is of course an issue for all long-running programs,
not only the ones employing this library. However, it should be noted that
|
||
fragmentation problems could show up earlier than with traditional FSM
|
||
frameworks.</li>
|
||
</ul>
|
||
<p>Should a system fail to meet any one of these prerequisites, customization of all
|
||
memory management (not just boost::fsm's) should be considered, which is
|
||
supported as follows:</p>
|
||
<ul>
|
||
<li>By passing a class offering a <code>std::allocator<></code> interface
|
||
for the <code>Allocator</code> parameter of the <code>state_machine</code>
|
||
class template. The <code>rebind</code> member template is used to customize
|
||
memory allocation of the internal containers.</li>
|
||
<li>By replacing the <code>simple_state</code>, <code>state</code> and <code>
|
||
event</code> class templates with ones that have a customized <code>operator
|
||
new()</code> and <code>operator delete()</code>. This can be as easy as
|
||
inheriting your customized class templates from the framework-supplied class
|
||
templates <b>and</b> your preferred small-object/deterministic/constant-time
|
||
allocator base class.</li>
|
||
</ul>
|
||
<p><code>simple_state<></code> and <code>state<></code> subclass objects are
|
||
constructed and destructed only by the state machine. It would therefore be
|
||
possible to use the <code>state_machine<></code> allocator instead of forcing
|
||
the user to overload <code>operator new()</code> and <code>operator delete()</code>.
|
||
However, a lot of systems employ at most one instance of a particular state
|
||
machine, which means that a) there is at most one object of a particular state
|
||
and b) this object is always constructed, accessed and destructed by one and
|
||
the same thread. We can exploit these facts in a much simpler (and faster)
|
||
<code>new</code>/<code>delete</code> implementation (for example, see
|
||
UniqueObject.hpp in the BitMachine example). However, this is only possible as
|
||
long as we have the freedom to customize memory management for state classes
|
||
separately.</p>
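<p>The following sketch shows how such a simplified <code>new</code>/<code>delete</code>
pair could look. It is written for this document and is not the content of
UniqueObject.hpp; <code>SingleInstanceStorage</code> and <code>Running</code>
are hypothetical names. It relies on exactly the two facts stated above: at
most one live object per class, and access from a single thread.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical mix-in providing storage for at most one live Derived object.
// Deliberately not thread-safe.
template <class Derived>
class SingleInstanceStorage {
public:
    static void* operator new(std::size_t size) {
        // Sketch only: refuse a second object instead of falling back to the heap.
        if (in_use_flag() || size > sizeof(Derived)) throw std::bad_alloc();
        in_use_flag() = true;
        return storage();
    }

    static void operator delete(void* p) {
        if (p) in_use_flag() = false;
    }

    static bool in_use() { return in_use_flag(); }

private:
    // Function-local statics: Derived is complete by the time these are used.
    static void* storage() {
        alignas(Derived) static unsigned char buffer[sizeof(Derived)];
        return buffer;
    }
    static bool& in_use_flag() {
        static bool flag = false;
        return flag;
    }
};

// A state class would inherit from the framework class template and the mix-in;
// here only the mix-in is shown.
struct Running : SingleInstanceStorage<Running> {
    Running() : ticks(0) {}
    int ticks;
};
```

<p>Allocation and deallocation reduce to setting and clearing a flag, which is
constant-time and never touches the heap.</p>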
|
||
<h2><a name="RTTI customization">RTTI customization</a></h2>
|
||
<p>RTTI is used for event dispatch and <code>state_downcast<>()</code>.
|
||
Currently, there are exactly two options:</p>
|
||
<ol>
|
||
<li>By default, a speed-optimized internal implementation is employed</li>
|
||
<li>The library can be instructed to use native C++ RTTI instead by defining
|
||
<code><a href="configuration.html#Application Defined Macros">
|
||
BOOST_FSM_USE_NATIVE_RTTI</a></code></li>
|
||
</ol>
|
||
<p>Just about the only reason to favor 2 is the fact that state and event
|
||
objects need to store one pointer less, meaning that in the best case the
|
||
memory footprint of a state machine object could shrink by 15%. However, on
|
||
most platforms executable size grows when C++ RTTI is turned on. So, given the
|
||
small per machine object savings, option 2 only makes sense in applications
|
||
where both of the following conditions hold:</p>
|
||
<ul>
|
||
<li>So few events are processed that event dispatch will never become a
|
||
bottleneck</li>
|
||
<li>There is a need to reduce the memory allocated at runtime (at the cost
|
||
of a larger executable)</li>
|
||
</ul>
|
||
<p>Obvious candidates are embedded systems where the executable resides in
|
||
ROM. Other candidates are applications running a large number of identical
|
||
state machines where this measure could even reduce the <b>overall</b> memory
|
||
footprint.</p>
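<p>One common way to implement type identification without native C++ RTTI is
to use the address of a per-type static object as the type id and to store
that address in every object. The sketch below illustrates this technique
under that assumption; it is not claimed to be the library's internal
implementation, and <code>TypeId</code>, <code>TypeTag</code>, <code>EventBase</code>,
<code>Event</code>, <code>EvStart</code> and <code>EvStop</code> are
hypothetical names.</p>

```cpp
#include <cassert>

typedef const void* TypeId;

// One static object per type; its address serves as the unique type id.
template <class T>
struct TypeTag {
    static const char marker;
};
template <class T>
const char TypeTag<T>::marker = 0;

class EventBase {
public:
    virtual ~EventBase() {}
    TypeId dynamic_type() const { return type_; }
protected:
    explicit EventBase(TypeId type) : type_(type) {}
private:
    TypeId type_;  // the extra pointer stored in every object
};

template <class MostDerived>
class Event : public EventBase {
public:
    static TypeId static_type() { return &TypeTag<MostDerived>::marker; }
protected:
    Event() : EventBase(static_type()) {}
};

struct EvStart : Event<EvStart> {};
struct EvStop : Event<EvStop> {};

// A type check is a single pointer comparison.
bool is_start(const EventBase& e) {
    return e.dynamic_type() == EvStart::static_type();
}
```

<p>The stored <code>type_</code> member corresponds to the one pointer per
object that option 2 saves.</p>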
|
||
<h2><a name="Double dispatch">Double dispatch</a></h2>
|
||
<p>At the heart of every state machine lies an implementation of double
|
||
dispatch. This is due to the fact that the incoming event <b>and</b> the
|
||
active state define exactly which <a href="definitions.html#Reaction">reaction</a>
|
||
the state machine will produce. For each event dispatch, one virtual call is
|
||
followed by a linear search for the appropriate reaction, using one RTTI
|
||
comparison per reaction. The following alternatives were considered but
|
||
rejected:</p>
|
||
<ul>
|
||
<li><a href="http://www.objectmentor.com/resources/articles/acv.pdf">Acyclic
|
||
visitor</a>: This double-dispatch variant satisfies all scalability
|
||
requirements but performs badly due to costly inheritance tree cross-casts.
|
||
Moreover, a state must store one v-pointer for <b>each</b> reaction, which
|
||
slows down construction and makes memory management customization
|
||
inefficient. In addition, C++ RTTI must inevitably be turned on, with
|
||
negative effects on executable size. boost::fsm originally employed acyclic
|
||
visitor and was about 4 times slower than it is now (MSVC7.1 on Intel
|
||
Pentium M). The dispatch speed might be better on other platforms but the
|
||
other negative effects will remain.</li>
|
||
<li>
|
||
<a href="http://www.isbiel.ch/~due/courses/c355/slides/patterns/visitor.pdf">
|
||
GOF Visitor</a>: The GOF Visitor pattern inevitably makes the whole machine
|
||
depend upon all events. That is, whenever a new event is added there is no
|
||
way around recompiling the whole state machine. This is contrary to the
|
||
scalability requirements.</li>
|
||
<li>Two-dimensional array of function pointers: To satisfy requirement 6, it
|
||
should be possible to spread a single state machine over several translation
|
||
units. This however means that the dispatch table must be filled at runtime
|
||
and the different translation units must somehow make themselves "known", so
|
||
that their part of the state machine can be added to the table. There simply
|
||
is no way to do this automatically <b>and</b> portably. The only portable
|
||
way that a state machine distributed over several translation units could
|
||
employ table-based double dispatch relies on the user. The programmer(s)
|
||
would somehow have to <b>manually</b> tie together the various pieces of the
|
||
state machine. Not only does this scale badly but is also quite error-prone.</li>
|
||
</ul>
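<p>The chosen scheme (one virtual call followed by a linear search with one
type comparison per reaction) can be sketched roughly as follows. This is a
simplified, self-contained illustration and not the library's actual
implementation: all names (<code>event_base</code>, <code>reaction</code>,
<code>state_base</code>, <code>dispatch</code>) are hypothetical, reactions
are reduced to the name of a target state, and standard C++
<code>typeid</code> stands in for whatever RTTI mechanism the library uses.</p>

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical event hierarchy; the names are illustrative only.
struct event_base { virtual ~event_base() = default; };
struct ev_start : event_base {};
struct ev_stop : event_base {};
struct ev_reset : event_base {};

// A reaction pairs the event type it handles with the name of a target state.
struct reaction
{
  std::type_index event_type;
  std::string target;
};

struct state_base
{
  virtual ~state_base() = default;
  // First dispatch: the virtual call selects the active state's reactions.
  virtual const std::vector< reaction > & reactions() const = 0;
};

struct stopped : state_base
{
  const std::vector< reaction > & reactions() const override
  {
    static const std::vector< reaction > list{
      reaction{ typeid( ev_start ), "running" } };
    return list;
  }
};

struct running : state_base
{
  const std::vector< reaction > & reactions() const override
  {
    static const std::vector< reaction > list{
      reaction{ typeid( ev_reset ), "running" },
      reaction{ typeid( ev_stop ), "stopped" } };
    return list;
  }
};

// Second dispatch: linear search, one type comparison per visited reaction.
// Returns the target state name, or an empty string if no reaction matches.
std::string dispatch( const state_base & active, const event_base & evt )
{
  const std::type_index incoming( typeid( evt ) );
  for ( const reaction & r : active.reactions() )
  {
    if ( r.event_type == incoming )
    {
      return r.target;
    }
  }
  return std::string();
}
```

<p>Because each state only lists the events it reacts to, adding a new event
type in this sketch touches only the states that handle it, which is the
scalability property the rejected alternatives fail to provide cheaply.</p>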
<h2><a name="Resource usage">Resource usage</a></h2>
<h3>Memory</h3>
<p>On a 32-bit box, one empty active state typically needs less than 50 bytes
of memory. Even <b>very</b> complex machines will usually have fewer than 20
simultaneously active states so just about every machine should run with less
than one kilobyte of memory (not counting event queues). Obviously, the
per-machine memory footprint is increased by whatever state-local members the
user adds.</p>
<h3>Processor cycles</h3>
<p>The following ranking should give a rough picture of which features
consume the most cycles:</p>
<ol>
  <li><code>state_cast<>()</code>: By far the most cycle-consuming feature.
  Searches linearly for a suitable state, using one <code>dynamic_cast</code>
  per visited state.</li>
  <li>State entry and exit: Profiling of the fully optimized 1-bit-BitMachine
  suggested that about 100ns of the 210ns total are spent destructing the
  exited state and constructing the entered state. Obviously, transitions
  where the <a href="definitions.html#Innermost common outer state">innermost
  common outer state</a> is "far" from the leaf states and/or with lots of
  orthogonal states can easily cause the destruction and construction of quite
  a few states, leading to a significant amount of time spent on a transition.</li>
  <li><code>state_downcast<>()</code>: Searches linearly for the requested
  state, using one virtual call and one RTTI comparison per visited state.</li>
  <li>History: For a state containing a history pseudo state, a binary search
  through the (usually small) history map must be performed on each entry and
  exit. History slot allocation is performed exactly once, before first entry.</li>
  <li>Event dispatch: One virtual call followed by a linear search for a
  suitable <a href="definitions.html#Reaction">reaction</a>, using one RTTI
  comparison per visited reaction.</li>
  <li>Orthogonal states: One additional virtual call for each exited state
  <b>if</b> there is more than one active leaf state before a transition. It
  should also be noted that the worst-case event dispatch time is multiplied
  in the presence of orthogonal states. For example, if two orthogonal leaf
  states are added to a given state configuration, the worst-case time is
  tripled.</li>
</ol>
<h2><a name="Limitations">Limitations</a></h2>
<h4>Deferring and posting events</h4>
<p>For performance reasons and because synchronous state machines often do not
need to queue events, it is possible to operate such machines entirely with
stack-allocated events. However, as soon as events need to be deferred and/or
posted there is no way around queuing and allocation with <code>new</code>.
The interface of <code>simple_state<>::post_event</code> enforces the use of
<code>boost::intrusive_ptr<></code> at compile time. But there is no way to do
the same for deferred events because allocation and deferral happen in
completely unrelated places. Of course, a "wrongly" allocated event could
easily be transformed into one allocated with <code>new</code> and pointed to
by <code>boost::intrusive_ptr<></code> with a virtual <code>clone()</code>
function. However, in my experience, event deferral is needed only very rarely
in synchronous state machines and the asynchronous variant enforces the use of
<code>boost::intrusive_ptr<></code> anyway, so most users won't run into this
limitation. I rejected the <code>clone()</code> idea because it could cause
inefficiencies casual users wouldn't be aware of. In addition, users not
needing event deferral would nevertheless pay with increased code size.</p>
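<p>To make the rejected <code>clone()</code> idea concrete, here is a minimal
sketch of how it might look. It is not part of the library: the names
<code>event_base</code>, <code>event</code>, <code>ev_data</code> and
<code>deferral_queue</code> are hypothetical, and <code>std::shared_ptr</code>
is used in place of <code>boost::intrusive_ptr<></code> only to keep the
example self-contained.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical event base class offering the virtual clone() discussed above.
struct event_base
{
  virtual ~event_base() = default;
  // Returns a heap-allocated copy of the most-derived event object.
  virtual std::shared_ptr< const event_base > clone() const = 0;
};

// CRTP helper so that concrete events get clone() without extra code.
template< class MostDerived >
struct event : event_base
{
  std::shared_ptr< const event_base > clone() const override
  {
    return std::make_shared< MostDerived >(
      static_cast< const MostDerived & >( *this ) );
  }
};

struct ev_data : event< ev_data >
{
  explicit ev_data( int v ) : value( v ) {}
  int value;
};

// Hypothetical deferral queue. It cannot know how the passed event was
// allocated, so it copies unconditionally, even if the event already lives
// on the heap. This is the hidden cost mentioned in the text.
class deferral_queue
{
  public:
    void defer( const event_base & evt )
    {
      queue_.push_back( evt.clone() );
    }

    std::size_t size() const { return queue_.size(); }

    // Removes and returns the oldest deferred event; the queue must not
    // be empty.
    std::shared_ptr< const event_base > pop()
    {
      assert( !queue_.empty() );
      std::shared_ptr< const event_base > front = queue_.front();
      queue_.pop_front();
      return front;
    }

  private:
    std::deque< std::shared_ptr< const event_base > > queue_;
};
```

<p>In this sketch every call to <code>defer()</code> performs an allocation
and a copy, and every event class carries a <code>clone()</code>
implementation whether or not deferral is ever used.</p>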
<h4>Junction points</h4>
<p>UML junction points are not supported because arbitrarily complex guard
expressions can easily be implemented with <code>custom_reaction<></code>s.</p>
<h4>Dynamic choice points</h4>
<p>Currently there is no direct support for this UML element because its
behavior can often be implemented with <code>custom_reaction<></code>s. In
rare cases this is not possible, namely when a choice point happens to be the
initial state. Then, the behavior can easily be implemented as follows:</p>
<pre>struct make_choice : fsm::event< make_choice > {};

// universal choice point base class template
template< class MostDerived, class Context >
struct choice_point : fsm::state< MostDerived, Context,
  fsm::custom_reaction< make_choice > >
{
  typedef fsm::state< MostDerived, Context,
    fsm::custom_reaction< make_choice > > base_type;
  typedef typename base_type::my_context my_context;
  typedef choice_point my_base;

  choice_point( my_context ctx ) : base_type( ctx )
  {
    base_type::post_event(
      boost::intrusive_ptr< make_choice >( new make_choice() ) );
  }
};

// ...

struct MyChoicePoint;
struct Machine : fsm::state_machine< Machine, MyChoicePoint > {};

struct Destination1;
struct Destination2;
struct Destination3;
struct MyChoicePoint : choice_point< MyChoicePoint, Machine >
{
  MyChoicePoint( my_context ctx ) : my_base( ctx ) {}

  fsm::result react( const make_choice & )
  {
    if ( /* ... */ )
    {
      return transit< Destination1 >();
    }
    else if ( /* ... */ )
    {
      return transit< Destination2 >();
    }
    else
    {
      return transit< Destination3 >();
    }
  }
};</pre>
<p><code>choice_point<></code> is not currently part of boost::fsm, mainly
because I fear that beginners could use it in places where they would be
better off with <code>custom_reaction<></code>. If the demand is high enough I
will add it to the library.</p>
<h4>Deep history of orthogonal regions</h4>
<p>Deep history of states with orthogonal regions is currently not supported:</p>
<p><img border="0" src="DeepHistoryLimitation1.gif" width="331" height="346"></p>
<p>Attempts to implement this state chart will lead to a compile-time error
because B has orthogonal regions and its direct or indirect outer state
contains a deep history pseudo state. In other words, a state containing a
deep history pseudo state must not have any direct or indirect inner states
which themselves have orthogonal regions. This limitation stems from the fact
that full deep history support would be more complicated to implement and
would consume more resources than the currently implemented limited deep
history support. Moreover, full deep history behavior can easily be
implemented with shallow history:</p>
<p><img border="0" src="DeepHistoryLimitation2.gif" width="332" height="347"></p>
<p>Of course, this only works if neither C, D, E nor any of their direct or
indirect inner states have orthogonal regions. Otherwise, this pattern has
to be applied recursively.</p>
<h4>Synchronization (join and fork) bars</h4>
<p><img border="0" src="JoinAndFork.gif" width="541" height="301"></p>
<p>Synchronization bars are not supported, that is, a transition always
originates at exactly one state and always ends at exactly one state. In my
experience join bars are sometimes useful but their behavior can easily be
emulated with guards. Fork bars are needed only rarely. Their support would
complicate the implementation quite a bit.</p>
<h4>Event dispatch to orthogonal regions</h4>
<p>The boost::fsm event dispatch algorithm differs from the one specified
in
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
David Harel's original paper</a> and in the
<a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">UML standard</a>.
Both mandate that each event is dispatched to all orthogonal regions of a
state machine. Example:</p>
<p><img border="0" src="EventDispatch.gif" width="436" height="211"></p>
<p>Here the Harel/UML dispatch algorithm specifies that the machine must
transition from (B,D) to (C,E) when an EvX event is processed. Because of the
subtleties that Harel describes in chapter 7 of
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
his paper</a>, an implementation of this algorithm is not only quite complex
but also much slower than the simplified version employed by boost::fsm, which
stops searching for <a href="definitions.html#Reaction">reactions</a> as soon
as it has found one suitable for the current event. That is, had the example
been implemented with this library, the machine would have transitioned
non-deterministically from (B,D) to either (C,D) or (B,E). This version was
chosen because, in my experience, in real-world machines different orthogonal
regions often do not specify transitions for the same events. For the rare
cases when they do, the UML behavior can easily be emulated as follows:</p>
<p><img border="0" src="SimpleEventDispatch.gif" width="466" height="226"></p>
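<p>The difference between the two dispatch strategies can be illustrated with
a small self-contained model of the (B,D) example. This is only a sketch under
simplifying assumptions: the types <code>transition</code> and
<code>region</code> and both dispatch functions are hypothetical, states are
plain strings, the subtleties Harel describes are ignored, and regions are
visited in a fixed order, so the first-match variant always ends in (C,D) here
whereas the library makes no such ordering promise.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical transition table entry: source state, event name, target.
struct transition
{
  std::string source;
  std::string event;
  std::string target;
};

// One orthogonal region with its currently active state.
struct region
{
  std::string active;
  std::vector< transition > transitions;

  // Performs the first matching transition; returns true if one was found.
  bool react( const std::string & event )
  {
    for ( const transition & t : transitions )
    {
      if ( t.source == active && t.event == event )
      {
        active = t.target;
        return true;
      }
    }
    return false;
  }
};

// Harel/UML style: every region gets a chance to react to the event.
// Returns the number of regions that reacted.
std::size_t dispatch_to_all(
  std::vector< region > & regions, const std::string & event )
{
  std::size_t reacted = 0;
  for ( region & r : regions )
  {
    if ( r.react( event ) )
    {
      ++reacted;
    }
  }
  return reacted;
}

// Simplified style: stop as soon as one region has reacted.
bool dispatch_first_match(
  std::vector< region > & regions, const std::string & event )
{
  for ( region & r : regions )
  {
    if ( r.react( event ) )
    {
      return true;
    }
  }
  return false;
}

// Builds the example configuration (B,D) with B->C and D->E on EvX.
std::vector< region > make_example()
{
  return std::vector< region >{
    region{ "B", { transition{ "B", "EvX", "C" } } },
    region{ "D", { transition{ "D", "EvX", "E" } } } };
}
```

<p>With <code>dispatch_to_all</code> the model ends in (C,E), while
<code>dispatch_first_match</code> leaves the second region untouched.</p>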
<h4>Transitions across orthogonal regions</h4>
<p>
<img border="0" src="TransitionsAcrossOrthogonalRegions.gif" width="226" height="271"></p>
<p>Such transitions are currently flagged with an error at compile time (the
UML specifications explicitly allow them while Harel does not mention them at
all). I decided not to support them because I have erroneously tried to
implement such a transition several times but have never come across a
situation where it would make any sense. If you need to make such transitions,
please do let me know!</p>
<hr>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->16 March, 2004<!--webbot bot="Timestamp" endspan i-checksum="28873" --></p>
<p><i>Copyright © <a href="mailto:ah2003@gmx.net">Andreas Huber Dönni</a>
2003-2004. Use, modification and distribution are subject to the Boost
Software License, Version 1.0. (See accompanying file
<a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or copy at
<a href="http://www.boost.org/LICENSE_1_0.txt">
http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>

</body>

</html>