
<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<title>The boost::fsm library - Rationale</title>
</head>

<body link="#0000ff" vlink="#800080">

<table border="0" cellpadding="7" cellspacing="0" width="100%" summary="header">
  <tr>
    <td valign="top" width="300">
    <h3><a href="../../../index.htm">
    <img alt="C++ Boost" src="../../../c++boost.gif" border="0" width="277" height="86"></a></h3>
    </td>
    <td valign="top">
    <h1 align="center">The boost::fsm library</h1>
    <h2 align="center">Rationale</h2>
    </td>
  </tr>
</table>
<hr>
<dl class="index">
  <dt><a href="#Introduction">Introduction</a></dt>
  <dt><a href="#Why yet another state machine framework">Why yet another state
  machine framework</a></dt>
  <dt><a href="#State-local storage">State-local storage</a></dt>
  <dt><a href="#Dynamic configurability">Dynamic configurability</a></dt>
  <dt><a href="#Error handling">Error handling</a></dt>
  <dt><a href="#Asynchronous state machines">Asynchronous state machines</a></dt>
  <dt><a href="#User actions: Member functions vs. function objects">User
  actions: Member functions vs. function objects</a></dt>
  <dt><a href="#Speed versus scalability tradeoffs">Speed versus scalability
  tradeoffs</a></dt>
  <dt><a href="#Memory management customization">Memory management
  customization</a></dt>
  <dt><a href="#RTTI customization">RTTI customization</a></dt>
  <dt><a href="#Double dispatch">Double dispatch</a></dt>
  <dt><a href="#Resource usage">Resource usage</a></dt>
  <dt><a href="#Limitations">Limitations</a></dt>
</dl>
<h2><a name="Introduction">Introduction</a></h2>
<p>Most of the design decisions made during the development of this library
are the result of the following requirements.</p>
<p>boost::fsm should ...</p>
<ol>
  <li>be fully type-safe. Whenever possible, type mismatches should be flagged
  with an error at compile-time</li>
  <li>not require the use of a code generator. A lot of the existing FSM
  solutions force the developer to design the state machine either graphically
  or in a specialized language. All or part of the code is then generated</li>
  <li>allow for easy transformation of a UML statechart (defined in
  <a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">
  http://www.omg.org/cgi-bin/doc?formal/03-03-01</a>) into a working state
  machine. Vice versa, an existing C++ implementation of a state machine
  should be fairly trivial to transform into a UML statechart. Specifically,
  the following state machine features should be supported:
  <ul>
    <li>Hierarchical (composite, nested) states</li>
    <li>Orthogonal (concurrent) states</li>
    <li>Entry-, exit- and transition-actions</li>
    <li>Guards</li>
    <li>Shallow/deep history</li>
  </ul>
  </li>
  <li>produce a customizable reaction when a C++ exception is propagated from
  user code</li>
  <li>support synchronous and asynchronous state machines and leave it to the
  user which thread an asynchronous state machine will run in. Users should
  also be able to use the threading library of their choice</li>
  <li>support the development of arbitrarily large and complex state machines.
  Multiple developers should be able to work on the same state machine
  simultaneously</li>
  <li>allow the user to customize all resource management so that the library
  could be used for applications with hard real-time requirements</li>
  <li>enforce as much as possible at compile time. Specifically, invalid state
  machines should not compile</li>
  <li>offer reasonable performance for a wide range of applications</li>
</ol>
<h2><a name="Why yet another state machine framework">Why yet another state
machine framework?</a></h2>
<p>Before I started to develop this library I had a look at the following
frameworks:</p>
<ul>
  <li>The framework accompanying the book &quot;Practical Statecharts in C/C++&quot; by
  Miro Samek, CMP Books, ISBN: 1-57820-110-1<br>
  <a href="http://www.quantum-leaps.com">http://www.quantum-leaps.com<br>
  </a>Fails to satisfy at least the requirements 1, 3, 4, 6, 8.</li>
  <li>The framework accompanying &quot;Rhapsody in C++&quot; by ILogix (a code generator
  solution)<br>
  <a href="http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm">
  http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm<br>
  </a>This might look like comparing apples with oranges. However, there is no
  inherent reason why a code generator couldn't produce code that can easily
  be understood and modified by humans. Fails to satisfy at least the
  requirements 2, 4, 5, 6, 8 (there is quite a bit of error checking before
  code generation, though).</li>
  <li>The framework accompanying the article &quot;State Machine Design in C++&quot;<br>
  <a href="http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles">
  http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles<br>
  </a>Fails to satisfy at least the requirements 1, 3, 4, 5 (there is no
  direct threading support), 6, 8.</li>
</ul>
<p>I believe boost::fsm satisfies all requirements.</p>
<h2><a name="State-local storage">State-local storage</a></h2>
<p>This not yet widely known state machine feature is enabled by the fact that
every state is represented by a class. Upon state-entry, an object of the
class is constructed and the object is later destructed when the state machine
exits the state. Any data that is useful only as long as the machine resides
in the state can (and should) thus be a member of the state. This feature,
paired with the ability to spread a state machine over several translation
units, makes virtually unlimited scalability possible.</p>
<p>In most existing FSM frameworks the whole state machine runs in one
environment (context). That is, all resource handles and variables local to
the state machine are stored in one place (normally as members of the class
that also derives from some state machine base class). For large state
machines this often leads to the class having a huge number of data members,
most of which are needed only briefly in a tiny part of the machine. The state
machine class therefore often becomes a change hotspot, which leads to frequent
recompilations of the whole state machine.</p>
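<p>The idea can be illustrated with a minimal, library-independent sketch. It
does not use boost::fsm; the names <code>Recording</code>, <code>Recorder</code>
and <code>action_log</code> are hypothetical and only show how a constructor
and destructor can act as entry and exit actions while a data member serves as
state-local storage.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Records the order of entry and exit actions so that it can be inspected.
inline std::vector< std::string > & action_log()
{
  static std::vector< std::string > log;
  return log;
}

// Hypothetical state class: the constructor is the entry action, the
// destructor is the exit action and buffer_ is state-local storage.
class Recording
{
  public:
    Recording() : buffer_( 1024 ) { action_log().push_back( "enter Recording" ); }
    ~Recording() { action_log().push_back( "exit Recording" ); }
    std::size_t capacity() const { return buffer_.size(); }
  private:
    std::vector< char > buffer_; // exists only while the state is active
};

// Hypothetical machine that owns at most one active state object.
class Recorder
{
  public:
    void start() { active_.reset( new Recording() ); } // state entry
    void stop() { active_.reset(); }                   // state exit
    bool is_recording() const { return active_.get() != 0; }
  private:
    std::unique_ptr< Recording > active_;
};
```

<p>In this sketch the buffer is allocated when the state is entered and
released when it is exited, so the machine class itself carries no members
that are only needed while recording.</p>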
<h2><a name="Dynamic configurability">Dynamic configurability</a></h2>
<h3>Two types of state machine frameworks</h3>
<ul>
  <li>A state machine framework supports dynamic configurability if the whole
  layout of a state machine can be defined at runtime (&quot;layout&quot; refers to
  states and transitions, actions are still specified with normal C++ code).
  That is, data only available at runtime can be used to build arbitrarily
  large machines. See &quot;A Multiple Substring Search Algorithm&quot; by Moishe
  Halibard and Moshe Rubin in the June 2002 issue of CUJ for a good example
  (unfortunately not available online).</li>
  <li>On the other side are state machine frameworks which require the layout
  to be specified at compile time.</li>
</ul>
<p>State machines that are built at runtime almost always get away with a
simple state model (no hierarchical states, no orthogonal states, no entry and
exit actions, no history) because the layout is very often <b>computed by an
algorithm</b>. On the other hand, machine layouts that are fixed at compile
time are almost always designed by humans, who frequently need/want a
sophisticated state model in order to keep the complexity at acceptable
levels. Dynamically configurable FSM frameworks are therefore often optimized
for simple flat machines while incarnations of the static variant tend to
offer more features for abstraction.</p>
<p>However, fully-featured dynamic FSM libraries do exist. So, the question
is:</p>
<h3>Why not use a dynamically configurable FSM library for all state machines?</h3>
<p>One might argue that a dynamically configurable FSM framework is all one
ever needs because <b>any</b> state machine can be implemented with it.
However, due to its nature such a framework has a number of disadvantages when
used to implement static machines:</p>
<ul>
  <li>No compile-time optimizations and validations can be made. For example,
  boost::fsm determines the
  <a href="definitions.html#Innermost common outer state">innermost common
  outer state</a> of the transition-source and destination state at compile
  time. Moreover, compile time checks ensure that the state machine is valid
  (e.g. that there are no transitions between orthogonal states).</li>
  <li>Double dispatch must inevitably be implemented with some kind of a
  table. As argued under <a href="#Double dispatch">Double dispatch</a>, this
  scales badly.</li>
  <li>To guarantee fast table lookup, states and events must be represented with
  an integer. To keep the table as small as possible, the numbering should be
  contiguous, e.g. if there are ten states, it's best to use the ids 0-9. To
  ensure contiguity of ids, all states are best defined in the same header
  file. The same applies to the events. Again, this does not scale.</li>
  <li>Because events carrying parameters are not represented by a type, some
  sort of a generic event with a property map must be used and type-safety is
  enforced at runtime rather than at compile time.</li>
</ul>
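<p>To make the disadvantages listed above more concrete, here is a minimal
sketch of a dynamically configurable, table-based machine. It is not taken
from any existing framework; <code>TableMachine</code> and its member names
are assumptions made for illustration. States and events are contiguous
integer ids and an invalid layout can only be rejected at runtime.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marks table cells for which no transition has been configured.
const int no_transition = -1;

// Hypothetical dynamically configurable machine: the layout is a
// two-dimensional table (stored row by row) that is filled at runtime.
class TableMachine
{
  public:
    TableMachine( std::size_t stateCount, std::size_t eventCount ) :
      stateCount_( stateCount ),
      eventCount_( eventCount ),
      table_( stateCount * eventCount, no_transition ),
      current_( 0 )
    {
    }

    // Layout errors can only be detected here, at runtime.
    bool add_transition( std::size_t from, std::size_t event, std::size_t to )
    {
      if ( from >= stateCount_ || event >= eventCount_ || to >= stateCount_ )
        return false;
      table_[ from * eventCount_ + event ] = static_cast< int >( to );
      return true;
    }

    // Returns true if the event triggered a transition.
    bool process_event( std::size_t event )
    {
      if ( event >= eventCount_ || current_ >= stateCount_ ) return false;
      const int target = table_[ current_ * eventCount_ + event ];
      if ( target == no_transition ) return false;
      current_ = static_cast< std::size_t >( target );
      return true;
    }

    std::size_t current_state() const { return current_; }

  private:
    std::size_t stateCount_;
    std::size_t eventCount_;
    std::vector< int > table_;
    std::size_t current_;
};
```

<p>Note that the compiler cannot tell a valid state id from an invalid one in
this sketch, since both are plain integers.</p>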
<p>It is for these reasons that boost::fsm was built from the ground up to <b>not</b>
support dynamic configurability. However, this does not mean that it's
impossible to dynamically shape a machine implemented with this library. For
example, guards can be used to make different transitions depending on input
only available at runtime. However, such layout changes will always be limited
to what can be foreseen before compilation. A somewhat related library, the
boost::spirit parser framework, allows for roughly the same runtime
configurability. </p>
<h2><a name="Error handling">Error handling</a></h2>
<p>There is not a single word about error handling in the UML state machine
semantics specifications. Moreover, most existing FSM solutions also seem to
ignore the issue.&nbsp;</p>
<h3>Why an FSM library should support error handling</h3>
<p>Consider the following state configuration:</p>
<p><img border="0" src="A.gif" width="230" height="170"></p>
<p>Both states define entry actions (x() and y()). Whenever state A becomes
active, a call to x() will immediately be followed by a call to y(). y() could
depend on the side-effects of x(). Therefore, executing y() does not make
sense if x() fails. This is not an esoteric corner case but happens in
everyday state machines all the time. For example, x() could acquire memory
whose contents are later modified by y(). There is a different but in
terms of error handling equally critical situation in the Tutorial under
<a href="tutorial.html#Getting state information out of the machine">Getting
state information out of the machine</a> when <code>Running::~Running()</code>
accesses its outer state <code>Active</code>. Had the entry action of <code>
Active</code> failed and had <code>Running</code> been entered anyway then
<code>Running</code>'s exit action would have invoked undefined behavior.<br>
The error handling situation with outer and inner states resembles the one
with base and derived classes: If a base class constructor fails (by throwing
an exception) the construction is aborted, the derived class constructor is
not called and the object never comes to life.</p>
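<p>The base and derived class analogy can be shown with plain C++. In the
following sketch the hypothetical classes <code>OuterA</code> and <code>InnerB</code>
stand for the outer and inner state, with the constructor bodies playing the
roles of x() and y(); <code>trace</code> and <code>try_enter</code> are helper
names introduced only for this illustration.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records which "entry actions" were actually executed.
inline std::vector< std::string > & trace()
{
  static std::vector< std::string > calls;
  return calls;
}

// Stands for the outer state; its constructor body plays the role of x().
struct OuterA
{
  explicit OuterA( bool fail )
  {
    trace().push_back( "x()" );
    if ( fail ) throw std::runtime_error( "x() failed" );
  }
};

// Stands for the inner state; its constructor body plays the role of y().
struct InnerB : OuterA
{
  explicit InnerB( bool fail ) : OuterA( fail )
  {
    trace().push_back( "y()" ); // never reached if OuterA's constructor throws
  }
};

// Returns true if the object could be constructed.
inline bool try_enter( bool fail )
{
  try
  {
    InnerB inner( fail );
    return true;
  }
  catch ( const std::runtime_error & )
  {
    return false;
  }
}
```

<p>When the base class constructor throws, the derived class constructor body
is skipped, just as y() should be skipped when x() fails.</p>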
<p>If an FSM framework does not account for failing actions, the user is
forced to adopt cumbersome workarounds. For example, a failing action would
have to post an appropriate error event and set a global error variable to
true. Every following action would first have to check the error variable
before doing anything. After all actions have completed (by doing nothing!),
the previously posted error event would have to be processed, which would lead
to the remedy action being executed. Please note that it is not sufficient to
simply queue the error event as other events could still be pending. Instead,
the error event has absolute priority and would have to be dealt with
immediately.</p>
<p>So, to be safe, programmers would have to encapsulate the code of <b>every</b>
action in <code>if ( !error ) { /* action */ }</code> blocks. Moreover, a
<code>try { /* action */ } catch ( ... ) { /* post error event */ error =
true; }</code> statement would often have to be added because called functions
might throw and letting an exception propagate out of a user action would at
best terminate the state machine immediately. Writing all this boiler-plate
code is simply boring and quite unnecessary.</p>
<h3>Error handling support in boost::fsm</h3>
<ul>
  <li>C++ exceptions are used for all error handling. Except for exit actions
  (which are mapped to state destructors, and exceptions should almost never
  be propagated from destructors), exceptions can be propagated from all user
  functions.</li>
  <li>A customizable per state machine policy specifies how to convert all
  exceptions propagated from user code. Out of the box, an <code>
  exception_thrown</code> event is generated.</li>
  <li>An exception event is always processed immediately and thus has absolute
  priority over any possibly pending events. The event queue stays as it was
  until the exception event has been processed.</li>
  <li>The processing logic is as follows:
  <ul>
    <li>Exception events resulting from failed <code>react</code> functions
    are sent to the <a href="definitions.html#Innermost state">innermost state</a>
    that was last visited during <a href="definitions.html#Reaction">reaction</a>
    search</li>
    <li>Exception events resulting from failed entry actions are sent to the
    outer state of the state that the machine tried to enter</li>
    <li>Exception events resulting from failed transition actions are sent to
    the <a href="definitions.html#Innermost common outer state">innermost
    common outer state</a></li>
  </ul>
  <p>In the last two cases the state-machine is not in a stable state when the
  exception event is generated and leaving it there (e.g. by ignoring the
  exception event) would violate an invariant of state machines. So, the
  exception event reaction must either be a transition or a termination to
  bring the machine back into a stable state. That's why the framework checks
  that the state machine is stable after processing an exception event. If
  this is not the case the state machine is terminated and the exception is
  rethrown. </li>
</ul>
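<p>The priority of exception events over pending events can be sketched as
follows. This is a simplified illustration, not the library's implementation:
<code>sketch_machine</code>, <code>queue_event</code> and
<code>process_queued_events</code> are hypothetical names, states are left out
entirely and the string <code>"exception_thrown"</code> merely stands in for
the exception event described above.</p>

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical machine that only demonstrates the ordering guarantee: an
// exception is converted and handled before any pending event is touched.
class sketch_machine
{
  public:
    typedef std::function< void() > action;

    void queue_event( const std::string & name, const action & reaction )
    {
      entry e = { name, reaction };
      queue_.push_back( e );
    }

    void process_queued_events()
    {
      while ( !queue_.empty() )
      {
        const entry current = queue_.front();
        queue_.pop_front();

        try
        {
          current.reaction(); // user code, may throw
          processed_.push_back( current.name );
        }
        catch ( const std::exception & )
        {
          // The exception is handled right here. The remaining queue is
          // left exactly as it was until the handling has completed.
          processed_.push_back( "exception_thrown" );
        }
      }
    }

    const std::vector< std::string > & processed() const { return processed_; }

  private:
    struct entry
    {
      std::string name;
      action reaction;
    };

    std::deque< entry > queue_;
    std::vector< std::string > processed_;
};
```

<p>In this sketch a failing reaction for the first queued event is dealt with
before the second queued event is dispatched.</p>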
<h2><a name="Asynchronous state machines">Asynchronous state machines</a></h2>
<h3>Requirements</h3>
<p>For asynchronous state machines different applications have rather varied
requirements:</p>
<ol>
  <li>In some applications each state machine needs to run in its own thread,
  other applications are single-threaded and run all machines in the same
  thread</li>
  <li>For some applications a FIFO scheduler is perfect, others need priority-
  or EDF-schedulers</li>
  <li>For some applications the boost::thread library is just fine, others
  might want to use another threading library, yet other applications run on
  OS-less platforms where ISRs are the only mode of (apparently) concurrent
  execution</li>
</ol>
<h3>Out of the box behavior</h3>
<p>By default, <code>asynchronous_state_machine&lt;&gt;</code> subclass objects are
serviced by a <code>fifo_scheduler&lt;&gt;</code> object. <code>fifo_scheduler&lt;&gt;</code>
does not lock or wait in single-threaded applications and uses boost::thread
primitives to do so in multi-threaded programs. Moreover, a <code>
fifo_scheduler&lt;&gt;</code> object can service an arbitrary number of <code>
asynchronous_state_machine&lt;&gt;</code> subclass objects. Under the hood, <code>
fifo_scheduler&lt;&gt;</code> is just a thin wrapper around an object of its <code>
FifoWorker</code> template parameter (which manages the queue and ensures
thread safety) and a <code>processor_container&lt;&gt;</code> (which manages the
lifetime of the state machines).</p>
<h3>Customization</h3>
<p>If a user needs to customize the scheduler behavior she can do so by
instantiating <code>fifo_scheduler&lt;&gt;</code> with her own class modeling the
<code>FifoWorker</code> concept. I considered a much more generic design where
locking and waiting is implemented in a policy but I have so far failed to
come up with a clean and simple interface for it. Especially the waiting is a
bit difficult to model as some platforms have condition variables, others have
events and yet others don't have any notion of waiting whatsoever (they
instead loop until a new event arrives, presumably via an ISR). Given the
relatively few lines of code required to implement a custom <code>FifoWorker</code>
type and the fact that almost all applications will implement at most one such
class, it does not seem to be worthwhile anyway.</p>
<p>Applications requiring more (or less) sophisticated management of event
processor lifetimes can customize the behavior at a coarser level, by
using a custom <code>Scheduler</code> type. This is currently also true for
applications requiring non-FIFO queuing schemes. However, boost::fsm will
probably provide a <code>priority_scheduler</code> in the future so that
custom schedulers need to be implemented only in rare cases.</p>
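<p>To give an impression of how little code a platform-specific worker may
need, the following sketch shows a queue that neither locks nor waits, as
might be appropriate on a single-threaded platform that polls for work. It
does <b>not</b> model the library's <code>FifoWorker</code> concept, whose
exact requirements are not reproduced here; <code>polling_worker</code>,
<code>queue_work_item</code> and <code>run</code> are assumed names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical worker for a single-threaded platform: no locking and no
// waiting, the caller simply polls by calling run() repeatedly.
class polling_worker
{
  public:
    typedef std::function< void() > work_item;

    void queue_work_item( const work_item & item ) { queue_.push_back( item ); }

    // Executes at most maxItems queued items in FIFO order and returns the
    // number of items actually executed.
    std::size_t run( std::size_t maxItems )
    {
      std::size_t executed = 0;

      while ( executed < maxItems && !queue_.empty() )
      {
        const work_item item = queue_.front();
        queue_.pop_front();
        item();
        ++executed;
      }

      return executed;
    }

    bool empty() const { return queue_.empty(); }

  private:
    std::deque< work_item > queue_;
};
```

<p>A multi-threaded variant would additionally protect the queue with a mutex
and block in <code>run</code> while the queue is empty, using whatever
primitives the platform offers.</p>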
<h2><a name="User actions: Member functions vs. function objects">User
actions: Member functions vs. function objects</a></h2>
<p>All user-supplied functions (<code>react</code> member functions, entry-,
exit- and transition-actions) must be class members. The reasons for this are
as follows: </p>
<ul>
  <li>The concept of state-local storage mandates that state-entry and
  state-exit actions (mapped to constructors and destructors) are implemented
  as members.</li>
  <li><code>react</code> member functions and transition actions often access
  state-local data. So, it is most natural to implement these functions as
  members of the class the data of which the functions will operate on anyway.</li>
</ul>
<h2><a name="Speed versus scalability tradeoffs">Speed versus scalability
tradeoffs</a></h2>
<p>Quite a bit of effort has gone into making the library fast for small
simple machines <b>and</b> scalable at the same time (this applies only to
<code>state_machine&lt;&gt;</code>, there still is some room for optimizing <code>
fifo_scheduler&lt;&gt;</code>, especially for multi-threaded builds). While I
believe it should perform reasonably in most applications, the scalability
does not come for free. Small, carefully handcrafted state machines will thus
easily outperform equivalent boost::fsm machines. To get a picture of how big
the gap is, I implemented a simple benchmark in the BitMachine example. The
Handcrafted example is a handcrafted variant of the 1-bit-BitMachine
implementing the same benchmark.</p>
<p>I tried to create a fair but somewhat unrealistic <b>worst-case</b>
scenario:</p>
<ul>
  <li>For both machines exactly one object of the only event is allocated
  before starting the test. This same object is then sent to the machines over
  and over.</li>
  <li>The Handcrafted machine employs GOF-visitor double dispatch. The states
  are preallocated so that event dispatch &amp; transition amounts to nothing more
  than two virtual calls and one pointer assignment.</li>
</ul>
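<p>A handcrafted machine of the kind described above could look roughly like
the following sketch. It is not the source of the Handcrafted example; the
names <code>EvFlip</code>, <code>Off</code>, <code>On</code> and
<code>HandcraftedMachine</code> are assumptions. The point is that dispatch
plus transition consists of two virtual calls and one pointer assignment.</p>

```cpp
#include <cassert>

class HandcraftedMachine;
struct StateBase;
struct EvFlip;

struct EventBase
{
  virtual ~EventBase() {}
  // First virtual call: resolves the concrete event type.
  virtual void dispatch( StateBase & state, HandcraftedMachine & machine ) const = 0;
};

struct StateBase
{
  virtual ~StateBase() {}
  // Second virtual call: resolves the concrete state type.
  virtual void react( const EvFlip & evt, HandcraftedMachine & machine ) = 0;
};

struct EvFlip : EventBase
{
  virtual void dispatch( StateBase & state, HandcraftedMachine & machine ) const
  {
    state.react( *this, machine );
  }
};

struct Off : StateBase
{
  virtual void react( const EvFlip & evt, HandcraftedMachine & machine );
};

struct On : StateBase
{
  virtual void react( const EvFlip & evt, HandcraftedMachine & machine );
};

class HandcraftedMachine
{
  public:
    HandcraftedMachine() : current_( &off_ ) {}

    void process_event( const EventBase & evt ) { evt.dispatch( *current_, *this ); }
    bool is_on() const { return current_ == &on_; }

  private:
    friend struct Off;
    friend struct On;

    Off off_;             // both states are preallocated ...
    On on_;
    StateBase * current_; // ... so a transition is one pointer assignment
};

inline void Off::react( const EvFlip &, HandcraftedMachine & machine )
{
  machine.current_ = &machine.on_;
}

inline void On::react( const EvFlip &, HandcraftedMachine & machine )
{
  machine.current_ = &machine.off_;
}
```

<p>Because the states are never constructed or destructed during a
transition, such a machine has no entry or exit actions and no state-local
storage, which is part of why it is so much cheaper.</p>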
<p>The benchmarks, compiled with MSVC7.1 (single threaded) and run on an
Intel Pentium M 1600, produced the following results:</p>
<ul>
  <li>Handcrafted: 10 nanoseconds to dispatch one event and make the resulting
  transition.</li>
  <li>1-bit-BitMachine with customized memory management: 210 nanoseconds to
  dispatch one event and make the resulting transition.</li>
</ul>
<p>Although this is a big difference I still think it will not be noticeable
in most&nbsp;real-world applications. No matter whether an application uses
handcrafted or boost::fsm machines it will...</p>
<ul>
  <li>almost never run into a situation where a state machine is swamped with
  as many events as in the benchmarks. Unless a state machine is abused for
  parsing, it will typically spend a good deal of time waiting for events</li>
  <li>often run state machines in their own threads. This adds considerable
  locking and thread-switching overhead. Performance tests with the PingPong
  example, where two asynchronous state machines exchange events, gave the
  following times to process one event and perform the resulting in-state
  reaction (using the library with <code>boost::fast_pool_allocator&lt;&gt;</code>):<ul>
    <li>Single-threaded (no locking and waiting): 590ns</li>
    <li>Multi-threaded with one thread (the scheduler uses mutex locking but
    never has to wait for events): 4300ns</li>
    <li>Multi-threaded with two threads (both schedulers use mutex locking and
    exactly one always waits for an event): 7000ns</li>
  </ul>
  <p>As mentioned above there definitely is some room to improve the
  multi-threaded timings. However, a quick test has shown that the MT overhead
  will always be well over 1000ns per event. Handcrafted machines will
  inevitably have the same overhead, making raw single-threaded dispatch and
  transition speed much less important</li>
  <li>almost always allocate events with <code>new</code> and destroy them
  after consumption. This will add a few cycles, even if event memory
  management is customized</li>
  <li>often use state machines that employ orthogonal states and other
  advanced features. This forces the handcrafted machines to use a more
  adequate and more time-consuming book-keeping</li>
</ul>
<p>Therefore, in real-world applications event dispatch and transition do not
normally constitute a bottleneck and the relative gap between handcrafted and
boost::fsm machines also becomes much smaller than in the worst-case scenario.</p>
<p>BitMachine measurements with more states and with different levels of
optimization:</p>
<table border="3" width="100%" id="AutoNumber2" cellpadding="2">
  <tr>
    <td width="25%" rowspan="2"><b>Machine configuration<br>
    # states / # outgoing transitions per state</b></td>
    <td width="75%" colspan="3"><b>Event dispatch &amp; transition time
    [nanoseconds]</b></td>
  </tr>
  <tr>
    <td width="25%">Out of the box</td>
    <td width="25%">Same as out of the box but with <code>
    <a href="configuration.html#Application Defined Macros">
    BOOST_FSM_USE_NATIVE_RTTI</a></code> defined</td>
    <td width="25%">Same as out of the box but with customized memory
    management</td>
  </tr>
  <tr>
    <td width="25%">2 / 1</td>
    <td width="25%">680</td>
    <td width="25%">790</td>
    <td width="25%">210</td>
  </tr>
  <tr>
    <td width="25%">4 / 2</td>
    <td width="25%">690</td>
    <td width="25%">850</td>
    <td width="25%">210</td>
  </tr>
  <tr>
    <td width="25%">8 / 3</td>
    <td width="25%">690</td>
    <td width="25%">910</td>
    <td width="25%">220</td>
  </tr>
  <tr>
    <td width="25%">16 / 4</td>
    <td width="25%">710</td>
    <td width="25%">990</td>
    <td width="25%">230</td>
  </tr>
  <tr>
    <td width="25%">32 / 5</td>
    <td width="25%">740</td>
    <td width="25%">1090</td>
    <td width="25%">240</td>
  </tr>
  <tr>
    <td width="25%">64 / 6</td>
    <td width="25%">820</td>
    <td width="25%">1250</td>
    <td width="25%">310</td>
  </tr>
</table>
<h3>Possible optimizations</h3>
<p>Currently, <code>std::list&lt;&gt;</code>s are used for event and state storage.
These could be replaced with an intrusive linked list container, which would
eliminate 50% of the <code>operator new()</code> and <code>operator delete()</code>
calls made during an event dispatch &amp; transition cycle of the smallest
BitMachine. I would guess that this could speed it up by 25%-50%. However,
dispatch time is not affected and can quickly consume considerable time, as
the 6-bit-BitMachine shows. Moreover, most states of real-world machines are
quite deeply nested and the average transition involves the deallocation and
allocation of 2 states. Since <code>std::list&lt;&gt;</code> allocations occur only
once per transition and orthogonal region, the relative performance gain of
this optimization becomes much smaller for typical machines and does not seem
to be worth the effort of hand-crafting an intrusive linked list.</p>
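<p>For readers unfamiliar with the term, the following sketch shows what makes
a list intrusive: the link is a member of the element itself, so putting an
element into the list requires no separate node allocation. The names
<code>list_hook</code>, <code>intrusive_stack</code> and <code>queued_event</code>
are hypothetical, and the container is deliberately reduced to a singly linked
stack.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook: elements derive from it and thereby carry their own link.
struct list_hook
{
  list_hook() : next( 0 ) {}
  list_hook * next;
};

// Minimal intrusive container. It never allocates, it only relinks hooks.
class intrusive_stack
{
  public:
    intrusive_stack() : head_( 0 ), size_( 0 ) {}

    void push_front( list_hook & node )
    {
      node.next = head_;
      head_ = &node;
      ++size_;
    }

    // Returns 0 if the container is empty.
    list_hook * pop_front()
    {
      list_hook * node = head_;

      if ( node != 0 )
      {
        head_ = node->next;
        node->next = 0;
        --size_;
      }

      return node;
    }

    std::size_t size() const { return size_; }

  private:
    list_hook * head_;
    std::size_t size_;
};

// Example element type that can be linked without any extra allocation.
struct queued_event : list_hook
{
  explicit queued_event( int value ) : id( value ) {}
  int id;
};
```

<p>The elements themselves still have to be allocated somewhere; only the
per-node allocation that a non-intrusive list performs is avoided.</p>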
<h2><a name="Memory management customization">Memory management customization</a></h2>
<p>Out of the box, all internal data is allocated on the normal heap. This
should be satisfactory for applications where all the following prerequisites
are met:</p>
<ul>
  <li>There are no deterministic reaction time (hard real-time) requirements.</li>
  <li>The application will typically not process more than a handful of events
  per second. This is just a general guideline, some platforms can easily cope
  with more than 100000 events per second (see timings above).</li>
  <li>The application will never run long enough for heap fragmentation to
  become a problem. This is of course an issue for all long running programs
  not only the ones employing this library. However, it should be noted that
  fragmentation problems could show up earlier than with traditional FSM
  frameworks.</li>
</ul>
<p>Should a system fail to meet any one of these prerequisites, customization of all
memory management (not just boost::fsm's) should be considered, which is
supported as follows:</p>
<ul>
  <li>By passing a class offering a <code>std::allocator&lt;&gt;</code> interface
  for the <code>Allocator</code> parameter of the <code>state_machine</code>
  class template. The <code>rebind</code> member template is used to customize
  memory allocation of the internal containers.</li>
  <li>By replacing the <code>simple_state</code>, <code>state</code> and <code>
  event</code> class templates with ones that have a customized <code>operator
  new()</code> and <code>operator delete()</code>. This can be as easy as
  inheriting your customized class templates from the framework-supplied class
  templates <b>and</b> your preferred small-object/deterministic/constant-time
  allocator base class.</li>
</ul>
<p><code>simple_state&lt;&gt;</code> and <code>state&lt;&gt;</code> subclass objects are
constructed and destructed only by the state machine. It would therefore be
possible to use the <code>state_machine&lt;&gt;</code> allocator instead of forcing
the user to overload <code>operator new()</code> and <code>operator delete()</code>.
However, a lot of systems employ at most one instance of a particular state
machine, which means that a) there is at most one object of a particular state
and b) this object is always constructed, accessed and destructed by one and
the same thread. We can exploit these facts in a much simpler (and faster)
<code>new</code>/<code>delete</code> implementation (for example, see
UniqueObject.hpp in the BitMachine example). However, this is only possible as
long as we have the freedom to customize memory management for state classes
separately.</p>
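<p>A much simplified allocator exploiting these two facts might look like the
sketch below. It is not the UniqueObject.hpp from the BitMachine example;
<code>single_instance</code> and <code>Idle</code> are hypothetical names, and
the code assumes that at most one object of the class exists at any time and
that it is only ever touched by one thread.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical base class template (CRTP). Objects of Derived are placed in
// one static buffer, so allocation and deallocation are constant-time and
// never touch the heap. NOT thread-safe, by design.
template< class Derived >
class single_instance
{
  public:
    static void * operator new( std::size_t size )
    {
      // Only one object of exactly the expected size may exist at a time.
      if ( in_use() || size != sizeof( Derived ) ) throw std::bad_alloc();
      in_use() = true;
      return storage();
    }

    static void operator delete( void * p )
    {
      if ( p != 0 ) in_use() = false;
    }

  private:
    static bool & in_use()
    {
      static bool used = false;
      return used;
    }

    static void * storage()
    {
      alignas( Derived ) static unsigned char buffer[ sizeof( Derived ) ];
      return buffer;
    }
};

// Example class using the sketch above.
struct Idle : single_instance< Idle >
{
  Idle() : ticks( 0 ) {}
  int ticks;
};
```

<p>A state class could combine such a base class with the framework-supplied
class templates, as outlined in the second bullet point above.</p>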
<h2><a name="RTTI customization">RTTI customization</a></h2>
<p>RTTI is used for event dispatch and <code>state_downcast&lt;&gt;()</code>.
Currently, there are exactly two options:</p>
<ol>
  <li>By default, a speed-optimized internal implementation is employed</li>
  <li>The library can be instructed to use native C++ RTTI instead by defining
  <code><a href="configuration.html#Application Defined Macros">
  BOOST_FSM_USE_NATIVE_RTTI</a></code></li>
</ol>
<p>Just about the only reason to favor 2 is the fact that state and event
objects need to store one pointer less, meaning that in the best case the
memory footprint of a state machine object could shrink by 15%. However, on
most platforms executable size grows when C++ RTTI is turned on. So, given the
small per machine object savings, option 2 only makes sense in applications
where both of the following conditions hold:</p>
<ul>
  <li>So few events are processed that event dispatch will never become a
  bottleneck</li>
  <li>There is a need to reduce the memory allocated at runtime (at the cost
  of a larger executable)</li>
</ul>
<p>Obvious candidates are embedded systems where the executable resides in
ROM. Other candidates are applications running a large number of identical
state machines where this measure could even reduce the <b>overall</b> memory
footprint.</p>
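<p>The extra pointer mentioned above can be understood with a sketch of how a
hand-rolled type identification scheme might work. This is an assumption about
the general technique, not a description of the library's internals; all names
(<code>type_id</code>, <code>id_holder</code>, <code>sketch_event</code> and
so on) are hypothetical. Each object stores the address of a per-type static
variable, and a type comparison is a single pointer comparison.</p>

```cpp
#include <cassert>

// A type is identified by the address of a static variable unique to it.
typedef const void * type_id;

template< class T >
struct id_holder
{
  static type_id id()
  {
    static const char tag = 0;
    return &tag;
  }
};

class sketch_event_base
{
  public:
    virtual ~sketch_event_base() {}

    // No virtual call needed: the id is stored in the object. This is the
    // additional pointer per object.
    type_id dynamic_type() const { return id_; }

  protected:
    explicit sketch_event_base( type_id id ) : id_( id ) {}

  private:
    type_id id_;
};

template< class MostDerived >
class sketch_event : public sketch_event_base
{
  public:
    static type_id static_type() { return id_holder< MostDerived >::id(); }

  protected:
    sketch_event() : sketch_event_base( static_type() ) {}
};

struct EvStart : sketch_event< EvStart > {};
struct EvStop : sketch_event< EvStop > {};

// One pointer comparison decides whether evt is an EvStart.
inline bool is_start( const sketch_event_base & evt )
{
  return evt.dynamic_type() == EvStart::static_type();
}
```

<p>With native C++ RTTI the stored pointer becomes unnecessary because the
type can be queried through the object's v-table instead, at the cost of
slower comparisons.</p>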
<h2><a name="Double dispatch">Double dispatch</a></h2>
<p>At the heart of every state machine lies an implementation of double
dispatch. This is due to the fact that the incoming event <b>and</b> the
active state define exactly which <a href="definitions.html#Reaction">reaction</a>
the state machine will produce. For each event dispatch, one virtual call is
followed by a linear search for the appropriate reaction, using one RTTI
comparison per reaction. The following alternatives were considered but
rejected:</p>
<ul>
  <li><a href="http://www.objectmentor.com/resources/articles/acv.pdf">Acyclic
  visitor</a>: This double-dispatch variant satisfies all scalability
  requirements but performs badly due to costly inheritance tree cross-casts.
  Moreover, a state must store one v-pointer for <b>each</b> reaction, which
  slows down construction and makes memory management customization
  inefficient. In addition, C++ RTTI must inevitably be turned on, with
  negative effects on executable size. boost::fsm originally employed acyclic
  visitor and was about 4 times slower than it is now (MSVC7.1 on Intel
  Pentium M). The dispatch speed might be better on other platforms but the
  other negative effects will remain.</li>
  <li>
  <a href="http://www.isbiel.ch/~due/courses/c355/slides/patterns/visitor.pdf">
  GOF Visitor</a>: The GOF Visitor pattern inevitably makes the whole machine
  depend upon all events. That is, whenever a new event is added there is no
  way around recompiling the whole state machine. This is contrary to the
  scalability requirements.</li>
  <li>Two-dimensional array of function pointers: To satisfy requirement 6, it
  should be possible to spread a single state machine over several translation
  units. This however means that the dispatch table must be filled at runtime
  and the different translation units must somehow make themselves &quot;known&quot;, so
  that their part of the state machine can be added to the table. There simply
  is no way to do this automatically <b>and</b> portably. The only portable
  way that a state machine distributed over several translation units could
  employ table-based double dispatch relies on the user. The programmer(s)
  would somehow have to <b>manually</b> tie together the various pieces of the
  state machine. Not only does this scale badly, it is also quite error-prone.</li>
</ul>
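<p>The dispatch scheme described at the beginning of this section (one virtual
call followed by a linear search with one RTTI comparison per reaction) can be
sketched as follows. The sketch uses native C++ RTTI for brevity and a plain
array instead of compile-time type lists; <code>dd_event_base</code>,
<code>dd_state_base</code>, <code>reaction_entry</code> and the event and state
names are hypothetical and not part of the library.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

struct dd_event_base { virtual ~dd_event_base() {} };
struct EvPlay : dd_event_base {};
struct EvOpen : dd_event_base {};
struct EvPause : dd_event_base {};

// One entry per reaction: the triggering event type and, for simplicity,
// just the name of the target state.
struct reaction_entry
{
  const std::type_info * eventType;
  const char * targetState;
};

// Linear search using one RTTI comparison per reaction.
inline const char * find_target(
  const reaction_entry * reactions, std::size_t count, const dd_event_base & evt )
{
  for ( std::size_t i = 0; i < count; ++i )
  {
    if ( *reactions[ i ].eventType == typeid( evt ) )
      return reactions[ i ].targetState;
  }

  return 0; // no reaction found
}

struct dd_state_base
{
  virtual ~dd_state_base() {}
  // The single virtual call per dispatch selects the active state's reactions.
  virtual const char * react( const dd_event_base & evt ) const = 0;
};

struct Stopped : dd_state_base
{
  virtual const char * react( const dd_event_base & evt ) const
  {
    static const reaction_entry reactions[] =
    {
      { &typeid( EvPlay ), "Playing" },
      { &typeid( EvOpen ), "Open" }
    };

    return find_target(
      reactions, sizeof( reactions ) / sizeof( reactions[ 0 ] ), evt );
  }
};
```

<p>Each state only needs to know the events it reacts to, so adding a new
event does not force a recompilation of states that ignore it.</p>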
<h2><a name="Resource usage">Resource usage</a></h2>
<h3>Memory</h3>
<p>On a 32-bit box, one empty active state typically needs less than 50 bytes
of memory. Even <b>very</b> complex machines will usually have less than 20
simultaneously active states so just about every machine should run with less
than one kilobyte of memory (not counting event queues). Obviously, the
per-machine memory footprint grows by the size of whatever state-local members the
user adds.</p>
<h3>Processor cycles</h3>
<p>The following ranking should give a rough picture of what feature will
consume how many cycles:</p>
<ol>
  <li><code>state_cast&lt;&gt;()</code>: By far the most cycle-consuming feature.
  Searches linearly for a suitable state, using one <code>dynamic_cast</code>
  per visited state.</li>
  <li>State entry and exit: Profiling of the fully optimized 1-bit-BitMachine
  suggested that about 100ns of the 210ns total are spent destructing the
  exited state and constructing the entered state. Obviously, transitions
  where the <a href="definitions.html#Innermost common outer state">innermost
  common outer state</a> is &quot;far&quot; from the leaf states and/or with lots of
  orthogonal states can easily cause the destruction and construction of quite
  a few states leading to significant amounts of time spent for a transition.</li>
  <li><code>state_downcast&lt;&gt;()</code>: Searches linearly for the requested
  state, using one virtual call and one RTTI comparison per visited state.</li>
  <li>History: For a state containing a history pseudo state a binary search
  through the (usually small) history map must be performed on each entry and
  exit. History slot allocation is performed exactly once, before first entry.</li>
  <li>Event dispatch: One virtual call followed by a linear search for a
  suitable <a href="definitions.html#Reaction">reaction</a>, using one RTTI
  comparison per visited reaction.</li>
  <li>Orthogonal states: One additional virtual call for each exited state <b>
  if</b> there is more than one active leaf state before a transition. It
  should also be noted that the worst-case event dispatch time is multiplied
  in the presence of orthogonal states. For example, if two orthogonal leaf
  states are added to a given state configuration, the worst-case time is
  tripled.</li>
</ol>
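<p>To make the cost of the first item more tangible, the following sketch
shows a linear search that performs one <code>dynamic_cast</code> per visited
state. It is only an illustration of the technique under simplifying
assumptions, not the library's implementation: <code>state_base</code>,
<code>find_state</code> and the three state types are hypothetical, the active
states are kept in a plain <code>std::vector</code>, and a null pointer is
returned when no state matches.</p>
<pre>#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical polymorphic base of all states.
struct state_base
{
  virtual ~state_base() {}
};

struct Idle : state_base {};
struct Running : state_base {};
struct Stopped : state_base {};

// Visits the active states in order and tries one dynamic_cast per
// visited state. Returns a null pointer if no state is suitable.
template&lt; class Target &gt;
Target * find_state( const std::vector&lt; state_base * &gt; &amp; active_states )
{
  for ( std::size_t i = 0; i &lt; active_states.size(); ++i )
  {
    if ( Target * p = dynamic_cast&lt; Target * &gt;( active_states[ i ] ) )
    {
      return p;
    }
  }

  return 0;
}</pre>
<p>The worst case is a failed lookup, which pays for one
<code>dynamic_cast</code> on every active state before giving up.</p>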
<h2><a name="Limitations">Limitations</a></h2>
<h4>Deferring and posting events</h4>
<p>For performance reasons and because synchronous state machines often do not
need to queue events, it is possible to operate such machines entirely with
stack-allocated events. However, as soon as events need to be deferred and/or
posted there is no way around queuing and allocation with <code>new</code>.
The interface of <code>simple_state&lt;&gt;::post_event</code> enforces the use of
<code>boost::intrusive_ptr&lt;&gt;</code> at compile time. But there is no way to do
the same for deferred events because allocation and deferral happen in
completely unrelated places. Of course, a &quot;wrongly&quot; allocated event could
easily be transformed into one allocated with <code>new</code> and pointed to
by <code>boost::intrusive_ptr&lt;&gt;</code> with a virtual <code>clone()</code>
function. However, in my experience, event deferral is needed only very rarely
in synchronous state machines and the asynchronous variant enforces the use of
<code>boost::intrusive_ptr&lt;&gt;</code> anyway. So, most users won't run into this
limitation and I rejected the <code>clone()</code> idea because it could cause
inefficiencies casual users wouldn't be aware of. In addition, users not
needing event deferral would nevertheless pay with increased code size.</p>
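<p>For illustration, the rejected <code>clone()</code> approach could look
roughly like the sketch below. It is not part of boost::fsm. The names
<code>event_base</code>, <code>EvTimeout</code> and <code>defer</code> are
hypothetical, and <code>std::unique_ptr&lt;&gt;</code> stands in for
<code>boost::intrusive_ptr&lt;&gt;</code> only to keep the example
self-contained.</p>
<pre>#include &lt;memory&gt;

// Hypothetical event base class with the rejected clone() hook.
struct event_base
{
  virtual ~event_base() {}
  virtual std::unique_ptr&lt; event_base &gt; clone() const = 0;
};

struct EvTimeout : event_base
{
  int ticks;

  explicit EvTimeout( int t ) : ticks( t ) {}

  std::unique_ptr&lt; event_base &gt; clone() const
  {
    return std::unique_ptr&lt; event_base &gt;( new EvTimeout( *this ) );
  }
};

// Deferral copies the event to the heap regardless of how the original
// was allocated. This extra copy is the hidden cost of the approach.
std::unique_ptr&lt; event_base &gt; defer( const event_base &amp; evt )
{
  return evt.clone();
}</pre>
<p>Every event class would have to implement <code>clone()</code>, whether or
not the machine ever defers an event, which is where the additional code size
comes from.</p>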
<h4>Junction points</h4>
<p>UML junction points are not supported because arbitrarily complex guard
expressions can easily be implemented with <code>custom_reaction&lt;&gt;</code>s.</p>
<h4>Dynamic choice points</h4>
<p>Currently there is no direct support for this UML element because its
behavior can often be implemented with <code>custom_reaction&lt;&gt;</code>s. In
rare cases this is not possible, namely when a choice point happens to be the
initial state. Then, the behavior can easily be implemented as follows:</p>
<pre>struct make_choice : fsm::event&lt; make_choice &gt; {};

// universal choice point base class template
template&lt; class MostDerived, class Context &gt;
struct choice_point : fsm::state&lt; MostDerived, Context,
  fsm::custom_reaction&lt; make_choice &gt; &gt;
{
  typedef fsm::state&lt; MostDerived, Context,
    fsm::custom_reaction&lt; make_choice &gt; &gt; base_type;
  typedef typename base_type::my_context my_context;
  typedef choice_point my_base;

  choice_point( my_context ctx ) : base_type( ctx )
  {
    base_type::post_event(
      boost::intrusive_ptr&lt; make_choice &gt;( new make_choice() ) );
  }
};

// ...

struct MyChoicePoint;
struct Machine : fsm::state_machine&lt; Machine, MyChoicePoint &gt; {};

struct Destination1;
struct Destination2;
struct Destination3;
struct MyChoicePoint : choice_point&lt; MyChoicePoint, Machine &gt;
{
  MyChoicePoint( my_context ctx ) : my_base( ctx ) {}

  fsm::result react( const make_choice &amp; )
  {
    if ( /* ... */ )
    {
      return transit&lt; Destination1 &gt;();
    }
    else if ( /* ... */ )
    {
      return transit&lt; Destination2 &gt;();
    }
    else
    {
      return transit&lt; Destination3 &gt;();
    }
  }
};</pre>
<p><code>choice_point&lt;&gt;</code> is not currently part of boost::fsm, mainly
because I fear that beginners could use it in places where they would be
better off with <code>custom_reaction&lt;&gt;</code>. If the demand is high enough I
will add it to the library.</p>
<h4>Deep history of orthogonal regions</h4>
<p>Deep history of states with orthogonal regions is currently not supported:</p>
<p><img border="0" src="DeepHistoryLimitation1.gif" width="331" height="346"></p>
<p>Attempts to implement this state chart will lead to a compile-time error
because B has orthogonal regions and its direct or indirect outer state
contains a deep history pseudo state. In other words, a state containing a
deep history pseudo state must not have any direct or indirect inner states
which themselves have orthogonal regions. This limitation stems from the fact
that full deep history support would be more complicated to implement and
would consume more resources than the currently implemented limited deep
history support. Moreover, full deep history behavior can easily be
implemented with shallow history:</p>
<p><img border="0" src="DeepHistoryLimitation2.gif" width="332" height="347"></p>
<p>Of course, this only works if none of C, D, E or their direct or indirect
inner states have orthogonal regions. Otherwise, this pattern has to be
applied recursively.</p>
<h4>Synchronization (join and fork) bars</h4>
<p><img border="0" src="JoinAndFork.gif" width="541" height="301"></p>
<p>Synchronization bars are not supported, that is, a transition always
originates at exactly one state and always ends at exactly one state. In my
experience join bars are sometimes useful but their behavior can easily be
emulated with guards. Fork bars are needed only rarely. Their support would
complicate the implementation quite a bit.</p>
<h4>Event dispatch to orthogonal regions</h4>
<p>The boost::fsm event dispatch algorithm is different from the one specified
in
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
David Harel's original paper</a> and in the
<a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">UML standard</a>.
Both mandate that each event is dispatched to all orthogonal regions of a
state machine. Example:</p>
<p><img border="0" src="EventDispatch.gif" width="436" height="211"></p>
<p>Here the Harel/UML dispatch algorithm specifies that the machine must
transition from (B,D) to (C,E) when an EvX event is processed. Because of the
subtleties that Harel describes in chapter 7 of
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
his paper</a>, an implementation of this algorithm is not only quite complex
but also much slower than the simplified version employed by boost::fsm, which
stops searching for <a href="definitions.html#Reaction">reactions</a> as soon
as it has found one suitable for the current event. That is, had the example
been implemented with this library, the machine would have transitioned
non-deterministically from (B,D) to either (C,D) or (B,E). This version was
chosen because, in my experience, in real-world machines different orthogonal
regions often do not specify transitions for the same events. For the rare
cases when they do, the UML behavior can easily be emulated as follows:</p>
<p><img border="0" src="SimpleEventDispatch.gif" width="466" height="226"></p>
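<p>The difference between the two dispatch strategies can also be sketched in
a few lines of code. The model below is deliberately simplistic and is not how
boost::fsm is implemented: <code>region</code>,
<code>dispatch_first_match</code>, <code>dispatch_to_all</code> and
<code>make_example</code> are hypothetical names, only the single event EvX is
modelled, the subtleties Harel describes are ignored, and the first-match
variant visits the regions in a fixed order, whereas the order must be
considered unspecified in the library.</p>
<pre>#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical model of one orthogonal region: it stores the name of its
// active state and the state it switches to when EvX is processed.
struct region
{
  std::string active;
  std::string target_on_evx; // empty: the region has no reaction for EvX

  bool react_to_evx()
  {
    if ( target_on_evx.empty() ) return false;
    active = target_on_evx;
    return true;
  }
};

// Simplified dispatch: stop at the first region that reacts.
void dispatch_first_match( std::vector&lt; region &gt; &amp; regions )
{
  for ( std::size_t i = 0; i &lt; regions.size(); ++i )
  {
    if ( regions[ i ].react_to_evx() ) return;
  }
}

// Harel/UML-like dispatch: offer the event to every region.
void dispatch_to_all( std::vector&lt; region &gt; &amp; regions )
{
  for ( std::size_t i = 0; i &lt; regions.size(); ++i )
  {
    regions[ i ].react_to_evx();
  }
}

// Builds the (B,D) configuration of the first example in this section.
std::vector&lt; region &gt; make_example()
{
  std::vector&lt; region &gt; regions( 2 );
  regions[ 0 ].active = "B"; regions[ 0 ].target_on_evx = "C";
  regions[ 1 ].active = "D"; regions[ 1 ].target_on_evx = "E";
  return regions;
}</pre>
<p>Starting from (B,D), <code>dispatch_first_match()</code> changes exactly one
region while <code>dispatch_to_all()</code> ends in (C,E).</p>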
<h4>Transitions across orthogonal regions</h4>
<p>
<img border="0" src="TransitionsAcrossOrthogonalRegions.gif" width="226" height="271"></p>
<p>Such transitions are currently flagged with an error at compile time (the
UML specifications explicitly allow them while Harel does not mention them at
all). I decided not to support them because I have erroneously tried to
implement such a transition several times but have never come across a
situation where it would make any sense. If you need to make such transitions,
please do let me know!</p>
<hr>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->16 March, 2004<!--webbot bot="Timestamp" endspan i-checksum="28873" --></p>
<p><i>Copyright &copy; <a href="mailto:ah2003@gmx.net">Andreas Huber D&ouml;nni</a>
2003-2004. Use, modification and distribution are subject to the Boost
Software License, Version 1.0. (See accompanying file
<a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or copy at
<a href="http://www.boost.org/LICENSE_1_0.txt">
http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>

</body>

</html>