safe_numerics/doc/boostbook/tutorial.xml
2019-02-24 09:54:13 -08:00

348 lines
16 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section PUBLIC "-//Boost//DTD BoostBook XML V1.1//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<section id="safe_numerics.tutorial">
<title>Tutorial and Motivating Examples</title>
<section id="safe_numerics.tutorial.1">
<title>Arithmetic Expressions Can Yield Incorrect Results</title>
<para>When some operation on signed integer types results in a result
which exceeds the capacity of a data variable to hold it, the result is
undefined. In the case of unsigned integer types a similar situation
results in a value wrap as per modulo arithmetic. In either case the
result is different than in integer number arithmetic in the mathematical
sense. This is called "overflow". Since word size can differ between
machines, code which produces mathematically correct results in one set of
circumstances may fail when re-compiled on a machine with different
hardware. When this occurs, most C++ programs will continue to execute
with no indication that the results are wrong. It is the programmer's
responsibility to ensure such undefined behavior is avoided.</para>
<para>This program demonstrates this problem. The solution is to replace
instances of built in integer types with corresponding safe types.</para>
<programlisting><xi:include href="../../example/example1.cpp" parse="text"
xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting>
<screen>example 1:undetected erroneous expression evaluation
Not using safe numerics
error NOT detected!
-127 != 127 + 2
Using safe numerics
error detected:converted signed value too large: positive overflow error
Program ended with exit code: 0</screen>
</section>
<section id="safe_numerics.tutorial.2">
<title>Arithmetic Operations Can Overflow Silently</title>
<para>A variation of the above is when a value is incremented/decremented
beyond its domain.</para>
<programlisting><xi:include href="../../example/example2.cpp" parse="text"
xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting>
<screen>example 2:undetected overflow in data type
Not using safe numerics
-2147483648 != 2147483647 + 1
error NOT detected!
Using safe numerics
addition result too large
error detected!</screen>
<para>When variables of unsigned integer type are decremented below zero,
they "roll over" to the highest possible unsigned version of that integer
type. This is a common problem which is generally never detected.</para>
</section>
<section id="safe_numerics.tutorial.3">
<title>Arithmetic on Unsigned Integers Can Yield Incorrect Results</title>
<para>Subtracting two unsigned values of the same size will result in an
unsigned value. If the first operand is less than the second the result
will be arithmetically in correct. But if the size of the unsigned types
is less than that of an <code>unsigned int</code>, C/C++ will promote the
types to <code>signed int</code> before subtracting resulting in an
correct result. In either case, there is no indication of an error.
Somehow, the programmer is expected to avoid this behavior. Advice usually
takes the form of "Don't use unsigned integers for arithmetic". This is
well and good, but often not practical. C/C++ itself uses unsigned for
<code>sizeof(T)</code> which is then used by users in arithmetic.</para>
<para>This program demonstrates this problem. The solution is to replace
instances of built in integer types with corresponding safe types.</para>
<programlisting><para><xi:include href="../../example/example8.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></para></programlisting>
<screen>example 8:undetected erroneous expression evaluation
Not using safe numerics
error NOT detected!
4294967171 != 2 - 127
Using safe numerics
error detected:subtraction result cannot be negative: negative overflow error
Program ended with exit code: 0</screen>
</section>
<section id="safe_numerics.tutorial.4">
<title>Implicit Conversions Can Lead to Erroneous Results</title>
<para>At CPPCon 2016 Jon Kalb gave a very entertaining (and disturbing)
<ulink url="https://www.youtube.com/watch?v=wvtFGa6XJDU">lightning
talk</ulink> related to C++ expressions.</para>
<para>The talk included a very, very simple example similar to the
following:</para>
<para><programlisting><xi:include href="../../example/example4.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 3: implicit conversions change data values
Not using safe numerics
a is -1 b is 1
b is less than a
error NOT detected!
Using safe numerics
a is -1 b is 1
converted negative value to unsigned: domain error
error detected!
</screen></para>
<para>A normal person reads the above code and has to be dumbfounded by
this. The code doesn't do what the text - according to the rules of
algebra - says it does. But C++ doesn't follow the rules of algebra - it
has its own rules. There is generally no compile time error. You can get a
compile time warning if you set some specific compile time switches. The
explanation lies in reviewing how C++ reconciles binary expressions
(<code>a &lt; b</code> is an expression here) where operands are different
types. In processing this expression, the compiler:</para>
<para><itemizedlist>
<listitem>
<para>Determines the "best" common type for the two operands. In
this case, application of the rules in the C++ standard dictate that
this type will be an <code>unsigned int</code>.</para>
</listitem>
<listitem>
<para>Converts each operand to this common type. The signed value of
-1 is converted to an unsigned value with the same bit-wise
contents, 0xFFFFFFFF, on a machine with 32 bit integers. This
corresponds to a decimal value of 4294967295.</para>
</listitem>
<listitem>
<para>Performs the calculation - in this case it's
<code>&lt;</code>, the "less than" operation. Since 1 is less than
4294967295 the program prints "b is less than a".</para>
</listitem>
</itemizedlist></para>
<para>In order for a programmer to detect and understand this error he
should be pretty familiar with the implicit conversion rules of the C++
standard. These are available in a copy of the standard and also in the
canonical reference book <citetitle><link linkend="stroustrup">The C++
Programming Language</link></citetitle> (both are over 1200 pages long!).
Even experienced programmers won't spot this issue and know to take
precautions to avoid it. And this is a relatively easy one to spot. In the
more general case this will use integers which don't correspond to easily
recognizable numbers and/or will be buried as a part of some more complex
expression.</para>
<para>This example generated a good amount of web traffic along with
everyone's pet suggestions. See for example <ulink
url="https://bulldozer00.com/2016/10/16/the-unsigned-conundrum/">a blog
post with everyone's favorite "solution"</ulink>. All the proposed
"solutions" have disadvantages and attempts to agree on how handle this
are ultimately fruitless in spite of, or maybe because of, the <ulink
url="https://twitter.com/robertramey1/status/795742870045016065">emotional
content</ulink>. Our solution is by far the simplest: just use the safe
numerics library as shown in the example above.</para>
<para>Note that in this particular case, usage of the safe types results
in no runtime overhead in using the safe integer library. Code generated
will either equal or exceed the efficiency of using primitive integer
types.</para>
</section>
<section id="safe_numerics.tutorial.5">
<title>Mixing Data Types Can Create Subtle Errors</title>
<para>C++ contains signed and unsigned integer types. In spite of their
names, they function differently which often produces surprising results
for some operands. Program errors from this behavior can be exceedingly
difficult to find. This has lead to recommendations of various ad hoc
"rules" to avoid these problems. It's not always easy to apply these
"rules" to existing code without creating even more bugs. Here is a
typical example of this problem:</para>
<para><programlisting><xi:include href="../../example/example10.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting>Here
is the output of the above program:<screen>example 4: mixing types produces surprising results
Not using safe numerics
10000
4294957296
error NOT detected!
Using safe numerics
10000
error detected!converted negative value to unsigned: domain error
</screen>This solution is simple, just replace instances of <code>int</code>
with <code>safe&lt;int&gt;</code>.</para>
</section>
<section id="safe_numerics.tutorial.6">
<title>Array Index Value Can Exceed Array Limits</title>
<para>Using an intrinsic C++ array, it's very easy to exceed array limits.
This can fail to be detected when it occurs and create bugs which are hard
to find. There are several ways to address this, but one of the simplest
would be to use safe_unsigned_range;</para>
<para><programlisting><xi:include href="../../example/example5.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 5: array index values can exceed array bounds
Not using safe numerics
error NOT detected!
Using safe numerics
error detected:Value out of range for this safe type: domain error
</screen></para>
<para>Collections like standard arrays and vectors do array index checking
in some function calls and not in others so this may not be the best
example. However it does illustrate the usage of
<code>safe_range&lt;T&gt;</code> for assigning legal ranges to variables.
This will guarantee that under no circumstances will the variable contain
a value outside of the specified range.</para>
</section>
<section id="safe_numerics.tutorial.7">
<title>Checking of Input Values Can Be Easily Overlooked</title>
<para>It's way too easy to overlook the checking of parameters received
from outside the current program.<programlisting><xi:include
href="../../example/example6.cpp" parse="text"
xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 6: checking of externally produced value can be overlooked
Not using safe numerics
2147483647 0
error NOT detected!
Using safe numerics
error detected:error in file input: domain error
</screen>Without safe integer, one will have to insert new code every time an
integer variable is retrieved. This is a tedious and error prone
procedure. Here we have used program input. But in fact this problem can
occur with any externally produced input.</para>
</section>
<section id="safe_numerics.tutorial.8">
<title>Cannot Recover From Arithmetic Errors</title>
<para>If a divide by zero error occurs in a program, it's detected by
hardware. The way this manifests itself to the program can and will depend
upon</para>
<para><itemizedlist>
<listitem>
<para>data type - int, float, etc</para>
</listitem>
<listitem>
<para>setting of compile time command line switches</para>
</listitem>
<listitem>
<para>invocation of some configuration functions which convert these
hardware events into C++ exceptions</para>
</listitem>
</itemizedlist>It's not all that clear how one would detect and recover
from a divide by zero error in a simple portable way. Usually, users just
ignore the issue which usually results in immediate program termination
when this situation occurs.</para>
<para>This library will detect divide by zero errors before the operation
is invoked. Any errors of this nature are handled according to the <link
linkend="safe_numeric.exception_policies">exception_policy</link> selected
by the library user.</para>
<para><programlisting><xi:include href="../../example/example13.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 7: cannot recover from arithmetic errors
Not using safe numerics
error NOT detectable!
Using safe numerics
error detected:divide by zero: domain error
</screen></para>
</section>
<section id="safe_numerics.tutorial.9">
<title>Compile Time Arithmetic is Not Always Correct</title>
<para>If a divide by zero error occurs while a program is being compiled,
there is not guarantee that it will be detected. This example shows a real
example compiled with a recent version of CLang.</para>
<itemizedlist>
<listitem>
<para>Source code includes a constant expression containing a simple
arithmetic error.</para>
</listitem>
<listitem>
<para>The compiler emits a warning but otherwise calculates the wrong
result.</para>
</listitem>
<listitem>
<para>Replacing int with safe&lt;int&gt; will guarantee that the error
is detected at runtime</para>
</listitem>
<listitem>
<para>Operations using safe types are marked constexpr. So we can
force the operations to occur at runtime by marking the results as
constexpr. This will result in an error at compile time if the
operations cannot be correctly calculated.</para>
</listitem>
</itemizedlist>
<para><programlisting><xi:include href="../../example/example14.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 8: cannot detect compile time arithmetic errors
Not using safe numerics
0error NOT detected!
Using safe numerics
error detected:positive overflow error
Program ended with exit code: 0</screen></para>
</section>
<section id="safe_numerics.tutorial.10">
<title>Programming by Contract is Too Slow</title>
<para>Programming by Contract is a highly regarded technique. There has
been much written about it and it has been proposed as an addition to the
C++ language <citation><xref linkend="garcia"/></citation><citation><xref
linkend="crowl2"/></citation> It (mostly) depends upon runtime checking of
parameter and object values upon entry to and exit from every function.
This can slow the program down considerably which in turn undermines the
main motivation for using C++ in the first place! One popular scheme for
addressing this issue is to enable parameter checking only during
debugging and testing which defeats the guarantee of correctness which we
are seeking here! Programming by Contract will never be accepted by
programmers as long as it is associated with significant additional
runtime cost.</para>
<para>The Safe Numerics Library has facilities which, in many cases, can
check guaranteed parameter requirements with little or no runtime
overhead. Consider the following example:</para>
<para><programlisting><xi:include href="../../example/example7.cpp"
parse="text" xmlns:xi="http://www.w3.org/2001/XInclude"/></programlisting><screen>example 8:
enforce contracts with zero runtime cost
parameter error detected</screen></para>
<para>In the example above, the function <code>convert</code> incurs
significant runtime cost every time the function is called. By using
"safe" types, this cost is moved to the moment when the parameters are
constructed. Depending on how the program is constructed, this may totally
eliminate extraneous computations for parameter requirement type checking.
In this scenario, there is no reason to suppress the checking for release
mode and our program can be guaranteed to be always arithmetically
correct.</para>
</section>
</section>