293 lines
11 KiB
Plaintext
293 lines
11 KiB
Plaintext
[section:next_float Floating-Point Representation Distance (ULP),
|
|
and Finding Adjacent Floating-Point Values]
|
|
|
|
[@http://en.wikipedia.org/wiki/Unit_in_the_last_place Unit of Least Precision or Unit in the Last Place]
|
|
is the gap between two different, but as close as possible, floating-point numbers.
|
|
|
|
Most decimal values, for example 0.1, cannot be exactly represented as floating-point values,
|
|
but will be stored as the
|
|
[@http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
|
|
closest representable floating-point].
|
|
|
|
Functions are provided for finding adjacent greater and lesser floating-point values,
|
|
and estimating the number of gaps between any two floating-point values.
|
|
|
|
The floating-point type (FPT) must have has a fixed number of bits in the representation.
|
|
The number of bits may set at runtime, but must be the same for all numbers.
|
|
For example, __NTL_quad_float type (fixed 128-bit representation),
|
|
__NTL_RR type (arbitrary but fixed decimal digits, default 150) or
|
|
__multiprecision __cpp_dec_float and__cpp_bin_float are fixed at runtime,
|
|
but [*not] a type that extends the representation to provide an exact representation
|
|
for any number, for example [@http://keithbriggs.info/xrc.html XRC eXact Real in C].
|
|
|
|
The accuracy of mathematical functions can be assessed and displayed in terms of __ULP,
|
|
often as a ulps plot or by binning the differences as a histogram.
|
|
Samples are evaluated using the implementation under test and compared with 'known good'
|
|
representation obtained using a more accurate method. Other implementations, often using
|
|
arbitrary precision arithmetic, for example __WolframAlpha are one source of references
|
|
values. The other method, used widely in Boost.Math special functions, it to carry out
|
|
the same algorithm, but using a higher precision type, typically using Boost.Multiprecision
|
|
types like `cpp_bin_float_quad` for 128-bit (about 35 decimal digit precision), or
|
|
`cpp_bin_float_50` (for 50 decimal digit precision).
|
|
|
|
When converted to a particular machine representation, say `double`, say using a `static_cast`,
|
|
the value is the nearest representation possible for the `double` type. This value
|
|
cannot be 'wrong' by more than half a __ulp, and can be obtained using the Boost.Math function `ulp`.
|
|
(Unless the algorithm is fundamentally flawed, something that should be revealed by 'sanity'
|
|
checks using some independent sources).
|
|
|
|
See some discussion and example plots by Cleve Moler of Mathworks
|
|
[@https://blogs.mathworks.com/cleve/2017/01/23/ulps-plots-reveal-math-function-accurary/
|
|
ulps plots reveal math-function accuracy].
|
|
|
|
[section:nextafter Finding the Next Representable Value in a Specific Direction (nextafter)]
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/next.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT nextafter(FPT val, FPT direction);
|
|
|
|
}} // namespaces
|
|
|
|
[h4 Description - nextafter]
|
|
|
|
This is an implementation of the `nextafter` function included in the C99 standard.
|
|
(It is also effectively an implementation of the C99 `nexttoward` legacy function
|
|
which differs only having a `long double` direction,
|
|
and can generally serve in its place if required).
|
|
|
|
[note The C99 functions must use suffixes f and l to distinguish `float` and `long double` versions.
|
|
C++ uses the template mechanism instead.]
|
|
|
|
Returns the next representable value after /x/ in the direction of /y/. If
|
|
`x == y` then returns /x/. If /x/ is non-finite then returns the result of
|
|
a __domain_error. If there is no such value in the direction of /y/ then
|
|
returns an __overflow_error.
|
|
|
|
[warning The template parameter FTP must be a floating-point type.
|
|
An integer type, for example, will produce an unhelpful error message.]
|
|
|
|
[tip Nearly always, you just want the next or prior representable value,
|
|
so instead use `float_next` or `float_prior` below.]
|
|
|
|
[h4 Examples - nextafter]
|
|
|
|
The two representations using a 32-bit float either side of unity are:
|
|
``
|
|
The nearest (exact) representation of 1.F is 1.00000000
|
|
nextafter(1.F, 999) is 1.00000012
|
|
nextafter(1/f, -999) is 0.99999994
|
|
|
|
The nearest (not exact) representation of 0.1F is 0.100000001
|
|
nextafter(0.1F, 10) is 0.100000009
|
|
nextafter(0.1F, 10) is 0.099999994
|
|
``
|
|
|
|
[endsect] [/section:nextafter Finding the Next Representable Value in a Specific Direction (nextafter)]
|
|
|
|
[section:float_next Finding the Next Greater Representable Value (float_next)]
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/next.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT float_next(FPT val);
|
|
|
|
}} // namespaces
|
|
|
|
[h4 Description - float_next]
|
|
|
|
Returns the next representable value which is greater than /x/.
|
|
If /x/ is non-finite then returns the result of
|
|
a __domain_error. If there is no such value greater than /x/ then
|
|
returns an __overflow_error.
|
|
|
|
Has the same effect as
|
|
|
|
nextafter(val, (std::numeric_limits<FPT>::max)());
|
|
|
|
[endsect] [/section:float_next Finding the Next Greater Representable Value (float_prior)]
|
|
|
|
[section:float_prior Finding the Next Smaller Representable Value (float_prior)]
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/next.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT float_prior(FPT val);
|
|
|
|
}} // namespaces
|
|
|
|
|
|
[h4 Description - float_prior]
|
|
|
|
Returns the next representable value which is less than /x/.
|
|
If /x/ is non-finite then returns the result of
|
|
a __domain_error. If there is no such value less than /x/ then
|
|
returns an __overflow_error.
|
|
|
|
Has the same effect as
|
|
|
|
nextafter(val, -(std::numeric_limits<FPT>::max)()); // Note most negative value -max.
|
|
|
|
[endsect] [/section:float_prior Finding the Next Smaller Representable Value (float_prior)]
|
|
|
|
[section:float_distance Calculating the Representation Distance
|
|
Between Two floating-point Values (ULP) float_distance]
|
|
|
|
Function float_distance finds the number of gaps/bits/ULP between any two floating-point values.
|
|
If the significands of floating-point numbers are viewed as integers,
|
|
then their difference is the number of ULP/gaps/bits different.
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/next.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT float_distance(FPT a, FPT b);
|
|
|
|
}} // namespaces
|
|
|
|
[h4 Description - float_distance]
|
|
|
|
Returns the distance between /a/ and /b/: the result is always
|
|
a signed integer value (stored in floating-point type FPT)
|
|
representing the number of distinct representations between /a/ and /b/.
|
|
|
|
Note that
|
|
|
|
* `float_distance(a, a)` always returns 0.
|
|
* `float_distance(float_next(a), a)` always returns -1.
|
|
* `float_distance(float_prior(a), a)` always returns 1.
|
|
|
|
The function `float_distance` is equivalent to calculating the number
|
|
of ULP (Units in the Last Place) between /a/ and /b/ except that it
|
|
returns a signed value indicating whether `a > b` or not.
|
|
|
|
If the distance is too great then it may not be able
|
|
to be represented as an exact integer by type FPT,
|
|
but in practice this is unlikely to be a issue.
|
|
|
|
[endsect] [/section:float_distance Calculating the Representation Distance
|
|
Between Two floating-point Values (ULP) float_distance]
|
|
|
|
[section:float_advance Advancing a floating-point Value by a Specific
|
|
Representation Distance (ULP) float_advance]
|
|
|
|
Function `float_advance` advances a floating-point number by a specified number
|
|
of ULP.
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/next.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT float_advance(FPT val, int distance);
|
|
|
|
}} // namespaces
|
|
|
|
[h4 Description - float_advance]
|
|
|
|
Returns a floating-point number /r/ such that `float_distance(val, r) == distance`.
|
|
|
|
[endsect] [/section:float_advance]
|
|
|
|
[section:ulp Obtaining the Size of a Unit In the Last Place - ULP]
|
|
|
|
Function `ulp` gives the size of a unit-in-the-last-place for a specified floating-point value.
|
|
|
|
[h4 Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/ulp.hpp>
|
|
``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class FPT>
|
|
FPT ulp(const FPT& x);
|
|
|
|
template <class FPT, class Policy>
|
|
FPT ulp(const FPT& x, const Policy&);
|
|
|
|
}} // namespaces
|
|
|
|
[h4 Description - ulp]
|
|
|
|
Returns one [@http://en.wikipedia.org/wiki/Unit_in_the_last_place unit in the last place] of ['x].
|
|
|
|
Corner cases are handled as follows:
|
|
|
|
* If the argument is a NaN, then raises a __domain_error.
|
|
* If the argument is an infinity, then raises an __overflow_error.
|
|
* If the argument is zero then returns the smallest representable value: for example for type
|
|
`double` this would be either `std::numeric_limits<double>::min()` or `std::numeric_limits<double>::denorm_min()`
|
|
depending whether denormals are supported (which have the values `2.2250738585072014e-308` and `4.9406564584124654e-324` respectively).
|
|
* If the result is too small to represent, then returns the smallest representable value.
|
|
* Always returns a positive value such that `ulp(x) == ulp(-x)`.
|
|
|
|
[*Important:] The behavior of this function is aligned to that of [@http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#ulp%28double%29
|
|
Java's ulp function], please note
|
|
however that this function should only ever be used for rough and ready calculations as there are enough
|
|
corner cases to trap even careful programmers. In particular:
|
|
|
|
* The function is asymetrical, which is to say, given `u = ulp(x)` if `x > 0` then `x + u` is the
|
|
next floating-point value, but `x - u` is not necessarily the previous value. Similarly, if
|
|
`x < 0` then `x - u` is the previous floating-point value, but `x + u` is not necessarily the next
|
|
value. The corner cases occur at power of 2 boundaries.
|
|
* When the argument becomes very small, it may be that there is no floating-point value that
|
|
represents one ULP. Whether this is the case or not depends not only on whether the hardware
|
|
may ['sometimes] support denormals (as signalled by `std::numeric_limits<FPT>::has_denorm`), but also whether these are
|
|
currently enabled at runtime (for example on SSE hardware, the DAZ or FTZ flags will disable denormal support).
|
|
In this situation, the `ulp` function may return a value that is many orders of magnitude too large.
|
|
|
|
In light of the issues above, we recomend that:
|
|
|
|
* To move between adjacent floating-point values always use __float_next, __float_prior or __nextafter (`std::nextafter`
|
|
is another candidate, but our experience is that this also often breaks depending which optimizations and
|
|
hardware flags are in effect).
|
|
* To move several floating-point values away use __float_advance.
|
|
* To calculate the edit distance between two floats use __float_distance.
|
|
|
|
There is none the less, one important use case for this function:
|
|
|
|
If it is known that the true result of some function is x[sub t] and the calculated result
|
|
is x[sub c], then the error measured in ulp is simply [^fabs(x[sub t] - x[sub c]) / ulp(x[sub t])].
|
|
|
|
[endsect] [/section ulp]
|
|
|
|
[endsect] [/ section:next_float Floating-Point Representation Distance (ULP),
|
|
and Finding Adjacent Floating-Point Values]
|
|
|
|
[/
|
|
Copyright 2008 John Maddock and Paul A. Bristow.
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|
|
|