numeric_conversion/doc/definitions.qbk
2009-02-23 18:39:32 +00:00

551 lines
22 KiB
Plaintext

[/
Boost.Optional
Copyright (c) 2003-2007 Fernando Luis Cacciola Carballal
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
]
[section Definitions]
[section Introduction]
This section provides definitions of terms used in the Numeric Conversion library.
[blurb [*Notation]
[_underlined text] denotes terms defined in the C++ standard.
[*bold face] denotes terms defined here but not in the standard.
]
[endsect]
[section Types and Values]
As defined by the [_C++ Object Model] (§1.7) the [_storage] or memory on which a
C++ program runs is a contiguous sequence of [_bytes] where each byte is a
contiguous sequence of bits.
An [_object] is a region of storage (§1.8) and has a type (§3.9).
A [_type] is a discrete set of values.
An object of type `T` has an [_object representation] which is the sequence of
bytes stored in the object (§3.9/4)
An object of type `T` has a [_value representation] which is the set of
bits that determine the ['value] of an object of that type (§3.9/4).
For [_POD] types (§3.9/10), this bitset is given by the object representation,
but not all the bits in the storage need to participate in the value
representation (except for character types): for example, some bits might
be used for padding or there may be trap-bits.
__SPACE__
The [*typed value] that is held by an object is the value which is determined
by its value representation.
An [*abstract value] (untyped) is the conceptual information that is
represented in a type (i.e. the number π).
The [*intrinsic value] of an object is the binary value of the sequence of
unsigned characters which form its object representation.
__SPACE__
['Abstract] values can be [*represented] in a given type.
To [*represent] an abstract value `V` in a type `T` is to obtain a typed value
`v` which corresponds to the abstract value `V`.
The operation is denoted using the `rep()` operator, as in: `v=rep(V)`.
`v` is the [*representation] of `V` in the type `T`.
For example, the abstract value π can be represented in the type
`double` as the `double value M_PI` and in the type `int` as the
`int value 3`
__SPACE__
Conversely, ['typed values] can be [*abstracted].
To [*abstract] a typed value `v` of type `T` is to obtain the abstract value `V`
whose representation in `T` is `v`.
The operation is denoted using the `abt()` operator, as in: `V=abt(v)`.
`V` is the [*abstraction] of `v` of type `T`.
Abstraction is just an abstract operation (you can't do it); but it is
defined nevertheless because it will be used to give the definitions in the
rest of this document.
[endsect]
[section C++ Arithmetic Types]
The C++ language defines [_fundamental types] (§3.9.1). The following subsets of
the fundamental types are intended to represent ['numbers]:
[variablelist
[[[_signed integer types] (§3.9.1/2):][
`{signed char, signed short int, signed int, signed long int}`
Can be used to represent general integer numbers (both negative and positive).
]]
[[[_unsigned integer types] (§3.9.1/3):][
`{unsigned char, unsigned short int, unsigned int, unsigned long int}`
Can be used to represent positive integer numbers with modulo-arithmetic.
]]
[[[_floating-point types] (§3.9.1/8):][
`{float,double,long double}`
Can be used to represent real numbers.
]]
[[[_integral or integer types] (§3.9.1/7):][
`{{signed integers},{unsigned integers}, bool, char and wchar_t}`
]]
[[[_arithmetic types] (§3.9.1/8):][
`{{integer types},{floating types}}`
]]
]
The integer types are required to have a ['binary] value representation.
Additionally, the signed/unsigned integer types of the same base type
(`short`, `int` or `long`) are required to have the same value representation,
that is:
int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits)
unsigned int u = i ; // u is required to have the same 10011 as its value representation.
In other words, the integer types signed/unsigned X use the same value
representation but a different ['interpretation] of it; that is, their
['typed values] might differ.
Another consequence of this is that the range for signed X is always a smaller
subset of the range of unsigned X, as required by §3.9.1/3.
[note
Always remember that unsigned types, unlike signed types, have modulo-arithmetic;
that is, they do not overflow.
This means that:
[*-] Always be extra careful when mixing signed/unsigned types
[*-] Use unsigned types only when you need modulo arithmetic or very very large
numbers. Don't use unsigned types just because you intend to deal with
positive values only (you can do this with signed types as well).
]
[endsect]
[section Numeric Types]
This section introduces the following definitions intended to integrate
arithmetic types with user-defined types which behave like numbers.
Some definitions are purposely broad in order to include a vast variety of
user-defined number types.
Within this library, the term ['number] refers to an abstract numeric value.
A type is [*numeric] if:
* It is an arithmetic type, or,
* It is a user-defined type which
* Represents numeric abstract values (i.e. numbers).
* Can be converted (either implicitly or explicitly) to/from at least one arithmetic type.
* Has [link boost_numericconversion.definitions.range_and_precision range] (possibly unbounded)
and [link boost_numericconversion.definitions.range_and_precision precision] (possibly dynamic or
unlimited).
* Provides an specialization of `std::numeric_limits`.
A numeric type is [*signed] if the abstract values it represent include negative numbers.
A numeric type is [*unsigned] if the abstract values it represent exclude negative numbers.
A numeric type is [*modulo] if it has modulo-arithmetic (does not overflow).
A numeric type is [*integer] if the abstract values it represent are whole numbers.
A numeric type is [*floating] if the abstract values it represent are real numbers.
An [*arithmetic value] is the typed value of an arithmetic type
A [*numeric value] is the typed value of a numeric type
These definitions simply generalize the standard notions of arithmetic types and
values by introducing a superset called [_numeric]. All arithmetic types and values are
numeric types and values, but not vice versa, since user-defined numeric types are not
arithmetic types.
The following examples clarify the differences between arithmetic and numeric
types (and values):
// A numeric type which is not an arithmetic type (is user-defined)
// and which is intended to represent integer numbers (i.e., an 'integer' numeric type)
class MyInt
{
MyInt ( long long v ) ;
long long to_builtin();
} ;
namespace std {
template<> numeric_limits<MyInt> { ... } ;
}
// A 'floating' numeric type (double) which is also an arithmetic type (built-in),
// with a float numeric value.
double pi = M_PI ;
// A 'floating' numeric type with a whole numeric value.
// NOTE: numeric values are typed valued, hence, they are, for instance,
// integer or floating, despite the value itself being whole or including
// a fractional part.
double two = 2.0 ;
// An integer numeric type with an integer numeric value.
MyInt i(1234);
[endsect]
[section Range and Precision]
Given a number set `N`, some of its elements are representable in a numeric type `T`.
The set of representable values of type `T`, or numeric set of `T`, is a set of numeric
values whose elements are the representation of some subset of `N`.
For example, the interval of `int` values `[INT_MIN,INT_MAX]` is the set of representable
values of type `int`, i.e. the `int` numeric set, and corresponds to the representation
of the elements of the interval of abstract values `[abt(INT_MIN),abt(INT_MAX)]` from
the integer numbers.
Similarly, the interval of `double` values `[-DBL_MAX,DBL_MAX]` is the `double`
numeric set, which corresponds to the subset of the real numbers from `abt(-DBL_MAX)` to
`abt(DBL_MAX)`.
__SPACE__
Let [*`next(x)`] denote the lowest numeric value greater than x.
Let [*`prev(x)`] denote the highest numeric value lower then x.
Let [*`v=prev(next(V))`] and [*`v=next(prev(V))`] be identities that relate a numeric
typed value `v` with a number `V`.
An ordered pair of numeric values `x`,`y` s.t. `x<y` are [*consecutive] iff `next(x)==y`.
The abstract distance between consecutive numeric values is usually referred to as a
[_Unit in the Last Place], or [*ulp] for short. A ulp is a quantity whose abstract
magnitude is relative to the numeric values it corresponds to: If the numeric set
is not evenly distributed, that is, if the abstract distance between consecutive
numeric values varies along the set -as is the case with the floating-point types-,
the magnitude of 1ulp after the numeric value `x` might be (usually is) different
from the magnitude of a 1ulp after the numeric value y for `x!=y`.
Since numbers are inherently ordered, a [*numeric set] of type `T` is an ordered sequence
of numeric values (of type `T`) of the form:
REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}
where `l` and `h` are respectively the lowest and highest values of type `T`, called
the boundary values of type `T`.
__SPACE__
A numeric set is discrete. It has a [*size] which is the number of numeric values in the set,
a [*width] which is the abstract difference between the highest and lowest boundary values:
`[abt(h)-abt(l)]`, and a [*density] which is the relation between its size and width:
`density=size/width`.
The integer types have density 1, which means that there are no unrepresentable integer
numbers between `abt(l)` and `abt(h)` (i.e. there are no gaps). On the other hand,
floating types have density much smaller than 1, which means that there are real numbers
unrepresented between consecutive floating values (i.e. there are gaps).
__SPACE__
The interval of [_abstract values] `[abt(l),abt(h)]` is the range of the type `T`,
denoted `R(T)`.
A range is a set of abstract values and not a set of numeric values. In other
documents, such as the C++ standard, the word `range` is ['sometimes] used as synonym
for `numeric set`, that is, as the ordered sequence of numeric values from `l` to `h`.
In this document, however, a range is an abstract interval which subtends the
numeric set.
For example, the sequence `[-DBL_MAX,DBL_MAX]` is the numeric set of the type
`double`, and the real interval `[abt(-DBL_MAX),abt(DBL_MAX)]` is its range.
Notice, for instance, that the range of a floating-point type is ['continuous]
unlike its numeric set.
This definition was chosen because:
* [*(a)] The discrete set of numeric values is already given by the numeric set.
* [*(b)] Abstract intervals are easier to compare and overlap since only boundary
values need to be considered.
This definition allows for a concise definition of `subranged` as given in the last section.
The width of a numeric set, as defined, is exactly equivalent to the width of a range.
__SPACE__
The [*precision] of a type is given by the width or density of the numeric set.
For integer types, which have density 1, the precision is conceptually equivalent
to the range and is determined by the number of bits used in the value representation:
The higher the number of bits the bigger the size of the numeric set, the wider the
range, and the higher the precision.
For floating types, which have density <<1, the precision is given not by the width
of the range but by the density. In a typical implementation, the range is determined
by the number of bits used in the exponent, and the precision by the number of bits
used in the mantissa (giving the maximum number of significant digits that can be
exactly represented). The higher the number of exponent bits the wider the range,
while the higher the number of mantissa bits, the higher the precision.
[endsect]
[section Exact, Correctly Rounded and Out-Of-Range Representations]
Given an abstract value `V` and a type `T` with its corresponding range `[abt(l),abt(h)]`:
If `V < abt(l)` or `V > abt(h)`, `V` is [*not representable] (cannot be represented) in
the type `T`, or, equivalently, it's representation in the type `T` is [*out of range],
or [*overflows].
* If `V < abt(l)`, the [*overflow is negative].
* If `V > abt(h)`, the [*overflow is positive].
If `V >= abt(l)` and `V <= abt(h)`, `V` is [*representable] (can be represented) in the
type `T`, or, equivalently, its representation in the type `T` is [*in range], or
[*does not overflow].
Notice that a numeric type, such as a C++ unsigned type, can define that any `V` does
not overflow by always representing not `V` itself but the abstract value
`U = [ V % (abt(h)+1) ]`, which is always in range.
Given an abstract value `V` represented in the type `T` as `v`, the [*roundoff] error
of the representation is the abstract difference: `(abt(v)-V)`.
Notice that a representation is an ['operation], hence, the roundoff error corresponds
to the representation operation and not to the numeric value itself
(i.e. numeric values do not have any error themselves)
* If the roundoff is 0, the representation is [*exact], and `V` is exactly representable
in the type `T`.
* If the roundoff is not 0, the representation is [*inexact], and `V` is inexactly
representable in the type `T`.
If a representation `v` in a type `T` -either exact or inexact-, is any of the adjacents
of `V` in that type, that is, if `v==prev` or `v==next`, the representation is
faithfully rounded. If the choice between `prev` and `next` matches a given
[*rounding direction], it is [*correctly rounded].
All exact representations are correctly rounded, but not all inexact representations are.
In particular, C++ requires numeric conversions (described below) and the result of
arithmetic operations (not covered by this document) to be correctly rounded, but
batch operations propagate roundoff, thus final results are usually incorrectly
rounded, that is, the numeric value `r` which is the computed result is neither of
the adjacents of the abstract value `R` which is the theoretical result.
Because a correctly rounded representation is always one of adjacents of the abstract
value being represented, the roundoff is guaranteed to be at most 1ulp.
The following examples summarize the given definitions. Consider:
* A numeric type `Int` representing integer numbers with a
['numeric set]: `{-2,-1,0,1,2}` and
['range]: `[-2,2]`
* A numeric type `Cardinal` representing integer numbers with a
['numeric set]: `{0,1,2,3,4,5,6,7,8,9}` and
['range]: `[0,9]` (no modulo-arithmetic here)
* A numeric type `Real` representing real numbers with a
['numeric set]: `{-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0}` and
['range]: `[-2.0,+2.0]`
* A numeric type `Whole` representing real numbers with a
['numeric set]: `{-2.0,-1.0,0.0,+1.0,+2.0}` and
['range]: `[-2.0,+2.0]`
First, notice that the types `Real` and `Whole` both represent real numbers,
have the same range, but different precision.
* The integer number `1` (an abstract value) can be exactly represented
in any of these types.
* The integer number `-1` can be exactly represented in `Int`, `Real` and `Whole`,
but cannot be represented in `Cardinal`, yielding negative overflow.
* The real number `1.5` can be exactly represented in `Real`, and inexactly
represented in the other types.
* If `1.5` is represented as either `1` or `2` in any of the types (except `Real`),
the representation is correctly rounded.
* If `0.5` is represented as `+1.5` in the type `Real`, it is incorrectly rounded.
* `(-2.0,-1.5)` are the `Real` adjacents of any real number in the interval
`[-2.0,-1.5]`, yet there are no `Real` adjacents for `x < -2.0`, nor for `x > +2.0`.
[endsect]
[section Standard (numeric) Conversions]
The C++ language defines [_Standard Conversions] (§4) some of which are conversions
between arithmetic types.
These are [_Integral promotions] (§4.5), [_Integral conversions] (§4.7),
[_Floating point promotions] (§4.6), [_Floating point conversions] (§4.8) and
[_Floating-integral conversions] (§4.9).
In the sequel, integral and floating point promotions are called [*arithmetic promotions],
and these plus integral, floating-point and floating-integral conversions are called
[*arithmetic conversions] (i.e, promotions are conversions).
Promotions, both Integral and Floating point, are ['value-preserving], which means that
the typed value is not changed with the conversion.
In the sequel, consider a source typed value `s` of type `S`, the source abstract
value `N=abt(s)`, a destination type `T`; and whenever possible, a result typed value
`t` of type `T`.
Integer to integer conversions are always defined:
* If `T` is unsigned, the abstract value which is effectively represented is not
`N` but `M=[ N % ( abt(h) + 1 ) ]`, where `h` is the highest unsigned typed
value of type `T`.
* If `T` is signed and `N` is not directly representable, the result `t` is
[_implementation-defined], which means that the C++ implementation is required to
produce a value `t` even if it is totally unrelated to `s`.
Floating to Floating conversions are defined only if `N` is representable;
if it is not, the conversion has [_undefined behavior].
* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the two
adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.
Floating to Integer conversions represent not `N` but `M=trunc(N)`, were
`trunc()` is to truncate: i.e. to remove the fractional part, if any.
* If `M` is not representable in `T`, the conversion has [_undefined behavior]
(unless `T` is `bool`, see §4.12).
Integer to Floating conversions are always defined.
* If `N` is exactly representable, `t` is required to be the exact representation.
* If `N` is inexactly representable, `t` is required to be one of the
two adjacents, with an implementation-defined choice of rounding direction;
that is, the conversion is required to be correctly rounded.
[endsect]
[section Subranged Conversion Direction, Subtype and Supertype]
Given a source type `S` and a destination type `T`, there is a
[*conversion direction] denoted: `S->T`.
For any two ranges the following ['range relation] can be defined:
A range `X` can be ['entirely contained] in a range `Y`, in which case
it is said that `X` is enclosed by `Y`.
[: [*Formally:] `R(S)` is enclosed by `R(T)` iif `(R(S) intersection R(T)) == R(S)`.]
If the source type range, `R(S)`, is not enclosed in the target type range,
`R(T)`; that is, if `(R(S) & R(T)) != R(S)`, the conversion direction is said
to be [*subranged], which means that `R(S)` is not entirely contained in `R(T)`
and therefore there is some portion of the source range which falls outside
the target range. In other words, if a conversion direction `S->T` is subranged,
there are values in `S` which cannot be represented in `T` because they are
out of range.
Notice that for `S->T`, the adjective subranged applies to `T`.
Examples:
Given the following numeric types all representing real numbers:
* `X` with numeric set `{-2.0,-1.0,0.0,+1.0,+2.0}` and range `[-2.0,+2.0]`
* `Y` with numeric set `{-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0}` and range `[-2.0,+2.0]`
* `Z` with numeric set `{-1.0,0.0,+1.0}` and range `[-1.0,+1.0]`
For:
[variablelist
[[(a) X->Y:][
`R(X) & R(Y) == R(X)`, then `X->Y` is not subranged.
Thus, all values of type `X` are representable in the type `Y`.
]]
[[(b) Y->X:][
`R(Y) & R(X) == R(Y)`, then `Y->X` is not subranged.
Thus, all values of type `Y` are representable in the type `X`, but in this case,
some values are ['inexactly] representable (all the halves).
(note: it is to permit this case that a range is an interval of abstract values and
not an interval of typed values)
]]
[[(b) X->Z:][
`R(X) & R(Z) != R(X)`, then `X->Z` is subranged.
Thus, some values of type `X` are not representable in the type `Z`, they fall
out of range `(-2.0 and +2.0)`.
]]
]
It is possible that `R(S)` is not enclosed by `R(T)`, while neither is `R(T)` enclosed
by `R(S)`; for example, `UNSIG=[0,255]` is not enclosed by `SIG=[-128,127]`;
neither is `SIG` enclosed by `UNSIG`.
This implies that is possible that a conversion direction is subranged both ways.
This occurs when a mixture of signed/unsigned types are involved and indicates that
in both directions there are values which can fall out of range.
Given the range relation (subranged or not) of a conversion direction `S->T`, it
is possible to classify `S` and `T` as [*supertype] and [*subtype]:
If the conversion is subranged, which means that `T` cannot represent all possible
values of type `S`, `S` is the supertype and `T` the subtype; otherwise, `T` is the
supertype and `S` the subtype.
For example:
[: `R(float)=[-FLT_MAX,FLT_MAX]` and `R(double)=[-DBL_MAX,DBL_MAX]` ]
If `FLT_MAX < DBL_MAX`:
* `double->float` is subranged and `supertype=double`, `subtype=float`.
* `float->double` is not subranged and `supertype=double`, `subtype=float`.
Notice that while `double->float` is subranged, `float->double` is not,
which yields the same supertype,subtype for both directions.
Now consider:
[: `R(int)=[INT_MIN,INT_MAX]` and `R(unsigned int)=[0,UINT_MAX]` ]
A C++ implementation is required to have `UINT_MAX > INT_MAX` (§3.9/3), so:
* 'int->unsigned' is subranged (negative values fall out of range)
and `supertype=int`, `subtype=unsigned`.
* 'unsigned->int' is ['also] subranged (high positive values fall out of range)
and `supertype=unsigned`, `subtype=int`.
In this case, the conversion is subranged in both directions and the
supertype,subtype pairs are not invariant (under inversion of direction).
This indicates that none of the types can represent all the values of the other.
When the supertype is the same for both `S->T` and `T->S`, it is effectively
indicating a type which can represent all the values of the subtype.
Consequently, if a conversion `X->Y` is not subranged, but the opposite `(Y->X)` is,
so that the supertype is always `Y`, it is said that the direction `X->Y` is [*correctly
rounded value preserving], meaning that all such conversions are guaranteed to
produce results in range and correctly rounded (even if inexact).
For example, all integer to floating conversions are correctly rounded value preserving.
[endsect]
[endsect]