817 lines
41 KiB
Plaintext
817 lines
41 KiB
Plaintext
[section:lambert_w Lambert /W/ function]
|
|
|
|
[h4:synopsis Synopsis]
|
|
|
|
``
|
|
#include <boost/math/special_functions/lambert_w.hpp>
|
|
``
|
|
|
|
namespace boost { namespace math {
|
|
|
|
template <class T>
|
|
``__sf_result`` lambert_w0(T z); // W0 branch, default policy.
|
|
template <class T>
|
|
``__sf_result`` lambert_wm1(T z); // W-1 branch, default policy.
|
|
template <class T>
|
|
``__sf_result`` lambert_w0_prime(T z); // W0 branch 1st derivative.
|
|
template <class T>
|
|
``__sf_result`` lambert_wm1_prime(T z); // W-1 branch 1st derivative.
|
|
|
|
template <class T, class ``__Policy``>
|
|
``__sf_result`` lambert_w0(T z, const ``__Policy``&); // W0 with policy.
|
|
template <class T, class ``__Policy``>
|
|
``__sf_result`` lambert_wm1(T z, const ``__Policy``&); // W-1 with policy.
|
|
template <class T, class ``__Policy``>
|
|
``__sf_result`` lambert_w0_prime(T z, const ``__Policy``&); // W0 derivative with policy.
|
|
template <class T, class ``__Policy``>
|
|
``__sf_result`` lambert_wm1_prime(T z, const ``__Policy``&); // W-1 derivative with policy.
|
|
|
|
} // namespace boost
|
|
} // namespace math
|
|
|
|
|
|
[h4:description Description]
|
|
|
|
The __Lambert_W is the solution of the equation /W/(/z/)/e/[super /W/(/z/)] = /z/.
|
|
It is also called the Omega function, the inverse of /f/(/W/) = /We/[super /W/].
|
|
|
|
On the interval \[0, [infin]\), there is just one real solution.
|
|
On the interval (-/e/[super -1], 0), there are two real solutions, generating two branches which we will denote by /W/[sub 0] and /W/[sub -1].
|
|
In Boost.Math, we call these principal branches `lambert_w0` and `lambert_wm1`; their derivatives are labelled `lambert_w0_prime` and `lambert_wm1_prime`.
|
|
|
|
[import ../../example/lambert_w_graph.cpp]
|
|
|
|
[graph lambert_w_graph]
|
|
[graph lambert_w_graph_big_w]
|
|
[graph lambert_w0_prime_graph]
|
|
[graph lambert_wm1_prime_graph]
|
|
|
|
There is a singularity where the branches meet at /e/[super -1] [cong] [^ -0.367879].
|
|
Approaching this point, the condition number of function evaluation tends to infinity,
|
|
and the only method of recovering high accuracy is use of higher precision.
|
|
|
|
This implementation computes the two real branches /W/[sub 0] and /W/[sub -1]
|
|
with the functions `lambert_w0` and `lambert_wm1`,
|
|
and their derivatives, `lambert_w0_prime` and `lambert_wm1_prime`.
|
|
Complex arguments are not supported.
|
|
|
|
The final __Policy argument is optional and can be used to control how the function deals with errors.
|
|
Refer to __policy_section for more details and see examples below.
|
|
|
|
[h5:applications Applications of the Lambert /W/ function]
|
|
|
|
The Lambert /W/ function has a myriad of applications.
|
|
[@http://www.apmaths.uwo.ca/~djeffrey/Offprints/W-adv-cm.pdf Corless et al.] provide a summary of applications,
|
|
from the mathematical, like iterated exponentiation and asymptotic roots of trinomials,
|
|
to the real-world, such as the range of a jet plane, enzyme kinetics, water movement in soil,
|
|
epidemics, and diode current (an example replicated [@../../example/lambert_w_diode.cpp here]).
|
|
Since the publication of their landmark paper, there have been many more applications, and
|
|
also many new implementations of the function, upon which this implementation builds.
|
|
|
|
[h4:examples Examples]
|
|
|
|
[import ../../example/lambert_w_simple_examples.cpp]
|
|
|
|
The most basic usage of the Lambert-/W/ function is demonstrated below:
|
|
|
|
[lambert_w_simple_examples_includes]
|
|
|
|
[lambert_w_simple_examples_0]
|
|
|
|
Other floating-point types can be used too, here `float`,
|
|
including user-defined types like __multiprecision.
|
|
It is convenient to use a function like `show_value`
|
|
to display all (and only) potentially significant decimal digits, including any significant trailing zeros,
|
|
(`std::numeric_limits<T>::max_digits10`) for the type `T`.
|
|
|
|
[lambert_w_simple_examples_1]
|
|
|
|
Example of an integer argument to `lambert_w0`,
|
|
showing that an `int` literal is correctly promoted to a `double`.
|
|
|
|
[lambert_w_simple_examples_2]
|
|
|
|
Using __multiprecision types to get much higher precision is painless.
|
|
|
|
[lambert_w_simple_examples_3]
|
|
|
|
[warning When using multiprecision, take very great care not to
|
|
construct or assign non-integers from `double`, `float` ... silently losing precision.
|
|
Use `"1.2345678901234567890123456789"` rather than `1.2345678901234567890123456789`.]
|
|
|
|
Using multiprecision types, it is all too easy to get multiprecision precision wrong!
|
|
|
|
[lambert_w_simple_examples_4]
|
|
|
|
[note See spurious non-seven decimal digits appearing after digit #17 in the argument 0.7777777777777777...!]
|
|
|
|
And similarly constructing from a literal `double 0.9`, with more random digits after digit number 17.
|
|
|
|
[lambert_w_simple_examples_4a]
|
|
|
|
Note how the `cpp_float_dec_50` result is only as correct as from a `double = 0.9`.
|
|
|
|
Now see the correct result for all 50 decimal digits constructing from a decimal digit string "0.9":
|
|
|
|
[lambert_w_simple_examples_4b]
|
|
|
|
Note the expected zeros for all places up to 50 - and the correct Lambert /W/ result!
|
|
|
|
(It is just as easy to compute even higher precisions,
|
|
at least to thousands of decimal digits, but not shown here for brevity.
|
|
See [@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
|
|
for comparison of an evaluation at 1000 decimal digit precision with __WolframAlpha).
|
|
|
|
Policies can be used to control what action to take on errors:
|
|
|
|
[lambert_w_simple_examples_error_policies]
|
|
|
|
An example error message:
|
|
|
|
[lambert_w_simple_examples_error_message_1]
|
|
|
|
Showing an error reported if a value is passed to `lambert_w0` that is out of range,
|
|
(and was probably meant to be passed to `lambert_wm1` instead).
|
|
|
|
[lambert_w_simple_examples_out_of_range]
|
|
|
|
The full source of these examples is at [@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
|
|
|
|
[h5:diode_resistance Diode Resistance Example]
|
|
|
|
[import ../../example/lambert_w_diode_graph.cpp]
|
|
|
|
A typical example of a practical application is estimating the current flow
|
|
through a diode with series resistance from a paper by Banwell and Jayakumar.
|
|
|
|
Having the Lambert /W/ function available makes it simple to reproduce the plot
|
|
in their paper (Fig 2) comparing estimates using with Lambert /W/ function
|
|
and some actual measurements.
|
|
The colored curves show the effect of various series resistance on the current
|
|
compared to an extrapolated line in grey with no internal (or external) resistance.
|
|
|
|
Two formulae relating the diode current and effect of series resistance can be combined,
|
|
but yield an otherwise intractable equation relating the current versus voltage
|
|
with a varying series resistance. This was reformulated as a
|
|
generalized equation in terms of the Lambert W function:
|
|
|
|
Banwell and Jakaumar equation 5
|
|
|
|
[expression I(V) = [mu] V[sub T]/ R [sub S] [dot] W[sub 0](I[sub 0] R[sub S] / ([mu] V[sub T]))]
|
|
|
|
Using these variables
|
|
|
|
[lambert_w_diode_graph_1]
|
|
|
|
the formulas can be rendered in C++
|
|
|
|
[lambert_w_diode_graph_2]
|
|
|
|
to reproduce their Fig 2:
|
|
|
|
[graph diode_iv_plot]
|
|
|
|
The plotted points for no external series resistance
|
|
(derived from their published plot as the raw data are not publicly available)
|
|
are used to extrapolate back to estimate the intrinsic emitter resistance as 0.3 ohm.
|
|
The effect of external series resistance is visible when the colored lines start to curve away from the straight line as voltage increases.
|
|
|
|
See [@../../example/lambert_w_diode.cpp lambert_w_diode.cpp] and
|
|
[@../../example/lambert_w_diode_graph.cpp lambert_w_diode_graph.cpp]
|
|
for details of the calculation.
|
|
|
|
[h5:implementations Existing implementations]
|
|
|
|
The principal value of the Lambert /W/ function is implemented in the [@http://mathworld.wolfram.com/LambertW-Function.html Wolfram Language] as `ProductLog[k, z]`,
|
|
where `k` is the branch.
|
|
|
|
The symbolic algebra program __Maple also computes Lambert /W/ to an arbitrary precision.
|
|
|
|
[h4:precision Controlling the compromise between Precision and Speed]
|
|
|
|
[h5:small_floats Floating-point types `double` and `float`]
|
|
This implementation provides good precision and excellent speed for __fundamental `float` and `double`.
|
|
|
|
All the functions usually return values within a few __ulp
|
|
for the floating-point type, except for very small arguments very near zero,
|
|
and for arguments very close to the singularity at the branch point.
|
|
|
|
By default, this implementation provides the best possible speed.
|
|
Very slightly average higher precision and less bias might be obtained
|
|
by adding a __halley step refinement,
|
|
but at the cost of more than doubling the runtime.
|
|
|
|
[h5:big_floats Floating-point types larger than double]
|
|
|
|
For floating-point types with precision greater than `double` and `float` __fundamental_types,
|
|
a `double` evaluation is used as a first approximation followed by Halley refinement,
|
|
using a single step where it can be predicted that this will be sufficient,
|
|
and only using __halley iteration when necessary.
|
|
Higher precision types are always going to be [*very, very much slower].
|
|
|
|
The 'best' evaluation (the nearest __representable) can be achieved by `static_cast`ing from a
|
|
higher precision type, typically a __multiprecision type like `cpp_bin_float_50`,
|
|
but at the cost of increasing run-time 100-fold;
|
|
this has been used here to provide some of our reference values for testing.
|
|
|
|
[import ../../example/lambert_w_precision_example.cpp]
|
|
|
|
For example, we get a reference value using a high precision type, for example;
|
|
|
|
using boost::multiprecision::cpp_bin_float_50;
|
|
|
|
that uses Halley iteration to refine until it is as precise as possible for this `cpp_bin_float_50` type.
|
|
|
|
As a further check we can compare this with a __WolframAlpha computation
|
|
using command [^N\[ProductLog\[10.\], 50\]] to get 50 decimal digits
|
|
and similarly [^N\[ProductLog\[10.\], 17\]] to get the nearest representable for 64-bit `double` precision.
|
|
|
|
[lambert_w_precision_reference_w]
|
|
|
|
giving us the same nearest representable using 64-bit `double` as `1.7455280027406994`.
|
|
|
|
However, the rational polynomial and Fukushima Schroder approximations are so good for type `float` and `double`
|
|
that negligible improvement is gained from a `double` Halley step.
|
|
|
|
This is shown with [@../../example/lambert_w_precision_example.cpp lambert_w_precision_example.cpp]
|
|
for Lambert /W/[sub 0]:
|
|
|
|
[lambert_w_precision_1]
|
|
|
|
with this output:
|
|
|
|
[lambert_w_precision_output_1]
|
|
|
|
and then for /W/[sub -1]:
|
|
|
|
[lambert_w_precision_2]
|
|
|
|
with this output:
|
|
|
|
[lambert_w_precision_output_2]
|
|
|
|
[h5:differences_distribution Distribution of differences from 'best' `double` evaluations]
|
|
|
|
The distribution of differences from 'best' are shown in these graphs comparing `double` precision evaluations
|
|
with reference 'best' z50 evaluations using `cpp_bin_float_50` type reduced to `double` with `static_cast<double(z50)` :
|
|
|
|
[graph lambert_w0_errors_graph]
|
|
|
|
[graph lambert_wm1_errors_graph]
|
|
|
|
As noted in the implementation section, the distribution of these differences is somewhat biased
|
|
for Lambert /W/[sub -1] and this might be reduced using a `double` Halley step at small runtime cost.
|
|
But if you are seriously concerned to get really precise computations,
|
|
the only way is using a higher precision type and then reduce to the desired type.
|
|
Fortunately, __multiprecision makes this very easy to program, if much slower.
|
|
|
|
[h4:edge_cases Edge and Corner cases]
|
|
|
|
[h5:w0_edges The /W/[sub 0] Branch]
|
|
|
|
The domain of /W/[sub 0] is \[-/e/[super -1], [infin]\). Numerically,
|
|
|
|
* `lambert_w0(-1/e)` is exactly -1.
|
|
* `lambert_w0(z)` for `z < -1/e` throws a `domain_error`, or returns `NaN` according to the policy.
|
|
* `lambert_w0(std::numeric_limits<T>::infinity())` throws an `overflow_error`.
|
|
|
|
(An infinite argument probably indicates that something has already gone wrong,
|
|
but if it is desired to return infinity,
|
|
this case should be handled before calling `lambert_w0`).
|
|
|
|
[h5:wm1_edges /W/[sub -1] Branch]
|
|
|
|
The domain of /W/[sub -1] is \[-/e/[super -1], 0\). Numerically,
|
|
|
|
* `lambert_wm1(-1/e)` is exactly -1.
|
|
* `lambert_wm1(0)` returns -[infin] (or the nearest equivalent if `std::has_infinity == false`).
|
|
* `lambert_wm1(-std::numeric_limits<T>::min())` returns the maximum (most negative) possible value of Lambert /W/ for the type T. [br]
|
|
For example, for `double`: lambert_wm1(-2.2250738585072014e-308) = -714.96865723796634 [br]
|
|
and for `float`: lambert_wm1(-1.17549435e-38) = -91.8567734 [br]
|
|
|
|
* `z < -std::numeric_limits<T>::min()`, means that z is zero or denormalized (if `std::numeric_limits<T>::has_denorm_min == true`),
|
|
for example: `r = lambert_wm1(-std::numeric_limits<double>::denorm_min());` and an overflow_error exception is thrown,
|
|
and will give a message like:
|
|
|
|
Error in function boost::math::lambert_wm1<RealType>(<RealType>):
|
|
Argument z = -4.9406564584124654e-324 is too small
|
|
(z < -std::numeric_limits<T>::min so denormalized) for Lambert W-1 branch!
|
|
|
|
Denormalized values are not supported for Lambert /W/[sub -1] (because not all floating-point types denormalize),
|
|
and anyway it only covers a tiny fraction of the range of possible z arguments values.
|
|
|
|
[h4:compilers Compilers]
|
|
|
|
The `lambert_w.hpp` code has been shown to work on most C++98 compilers.
|
|
(Apart from requiring C++11 extensions for using of `std::numeric_limits<>::max_digits10` in some diagnostics.
|
|
Many old pre-c++11 compilers provide this extension but may require enabling to use,
|
|
for example using b2/bjam the lambert_w examples use this command:
|
|
|
|
[ run lambert_w_basic_example.cpp : : : [ requires cxx11_numeric_limits ] ]
|
|
|
|
See [@../../example/Jamfile.v2 jamfile.v2].)
|
|
|
|
For details of which compilers are expected to work see lambert_w tests and examples in:[br]
|
|
[@https://www.boost.org/development/tests/master/developer/math.html Boost Test Summary report for master branch (used for latest release)][br]
|
|
[@https://www.boost.org/development/tests/develop/developer/math.html Boost Test Summary report for latest developer branch].
|
|
|
|
As expected, debug mode is very much slower than release.
|
|
|
|
[h5:diagnostics Diagnostics Macros]
|
|
|
|
Several macros are provided to output diagnostic information (potentially [*much] output).
|
|
These can be statements, for example:
|
|
|
|
`#define BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`
|
|
|
|
placed [*before] the `lambert_w` include statement
|
|
|
|
`#include <boost/math/special_functions/lambert_w.hpp>`,
|
|
|
|
or defined on the project compile command-line: `/DBOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`,
|
|
|
|
or defined in a jamfile.v2: `<define>BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`
|
|
|
|
[import ../../include/boost/math/special_functions/lambert_w.hpp]
|
|
|
|
[boost_math_instrument_lambert_w_macros]
|
|
|
|
[h4:implementation Implementation]
|
|
|
|
There are many previous implementations, each with increasing accuracy and/or speed.
|
|
See __lambert_w_references below.
|
|
|
|
For most of the range of ['z] arguments, some initial approximation followed by a single refinement,
|
|
often using Halley or similar method, gives a useful precision.
|
|
For speed, several implementations avoid evaluation of a iteration test using the exponential function,
|
|
estimating that a single refinement step will suffice,
|
|
but these rarely get to the best result possible.
|
|
To get a better precision, additional refinements, probably iterative, are needed
|
|
for example, using __halley or __schroder methods.
|
|
|
|
For C++, the most precise results possible, closest to the nearest __representable for the C++ type being used,
|
|
it is usually necessary to use a higher precision type for intermediate computation,
|
|
finally static-casting back to the smaller desired result type.
|
|
This strategy is used by __Maple and __WolframAlpha, for example, using arbitrary precision arithmetic,
|
|
and some of their high-precision values are used for testing this library.
|
|
This method is also used to provide some __boost_test values using __multiprecision,
|
|
typically, a 50 decimal digit type like `cpp_bin_float_50`
|
|
`static_cast` to a `float`, `double` or `long double` type.
|
|
|
|
For ['z] argument values near the singularity and near zero, other approximations may be used,
|
|
possibly followed by refinement or increasing number of series terms until a desired precision is achieved.
|
|
At extreme arguments near to zero or the singularity at the branch point,
|
|
even this fails and the only method to achieve a really close result is to cast from a higher precision type.
|
|
|
|
In practical applications, the increased computation required
|
|
(often towards a thousand-fold slower and requiring much additional code for __multiprecision)
|
|
is not justified and the algorithms here do not implement this.
|
|
But because the Boost.Lambert_W algorithms has been tested using __multiprecision,
|
|
users who require this can always easily achieve the nearest representation for __fundamental_types
|
|
- if the application justifies the very large extra computation cost.
|
|
|
|
[h5 Evolution of this implementation]
|
|
|
|
One compact real-only implementation was based on an algorithm by __Luu_thesis,
|
|
(see routine 11 on page 98 for his Lambert W algorithm)
|
|
and his Halley refinement is used iteratively when required.
|
|
A first implementation was based on Thomas Luu's code posted at
|
|
[@https://svn.boost.org/trac/boost/ticket/11027 Boost Trac \#11027].
|
|
It has been implemented from Luu's algorithm but templated on `RealType` parameter and result
|
|
and handles both __fundamental_types (`float, double, long double`), __multiprecision,
|
|
and also has been tested successfully with a proposed fixed_point type.
|
|
|
|
A first approximation was computed using the method of Barry et al (see references 5 & 6 below).
|
|
This was extended to the widely used [@https://people.sc.fsu.edu/~jburkardt/f_src/toms443/toms443.html TOMS443]
|
|
FORTRAN and C++ versions by John Burkardt using Schroeder refinement(s).
|
|
(For users only requiring an accuracy of relative accuracy of 0.02%, Barry's function alone might suffice,
|
|
but a better __rational_function approximation method has since been developed for this implementation).
|
|
|
|
We also considered using __newton method.
|
|
``
|
|
f(w) = w e^w -z = 0 // Luu equation 6.37
|
|
f'(w) = e^w (1 + w), Wolfram alpha (d)/(dw)(f(w) = w exp(w) - z) = e^w (w + 1)
|
|
if (f(w) / f'(w) -1 < tolerance
|
|
w1 = w0 - (expw0 * (w0 + 1)); // Refine new Newton/Raphson estimate.
|
|
``
|
|
but concluded that since the Newton-Raphson method takes typically 6 iterations to converge within tolerance,
|
|
whereas Halley usually takes only 1 to 3 iterations to achieve an result within 1 __ulp,
|
|
so the Newton-Raphson method is unlikely to be quicker
|
|
than the additional cost of computing the 2nd derivative for Halley's method.
|
|
|
|
Halley refinement uses the simplified formulae obtained from
|
|
[@http://www.wolframalpha.com/input/?i=%5B2(z+exp(z)-w)+d%2Fdx+(z+exp(z)-w)%5D+%2F+%5B2+(d%2Fdx+(z+exp(z)-w))%5E2+-+(z+exp(z)-w)+d%5E2%2Fdx%5E2+(z+exp(z)-w)%5D Wolfram Alpha]
|
|
|
|
[2(z exp(z)-w) d/dx (z exp(z)-w)] / [2 (d/dx (z exp(z)-w))^2 - (z exp(z)-w) d^2/dx^2 (z exp(z)-w)]
|
|
|
|
[h4:compact_implementation Implementing Compact Algorithms]
|
|
|
|
The most compact algorithm can probably be implemented using the log approximation of Corless et al.
|
|
followed by Halley iteration (but is also slowest and least precise near zero and near the branch singularity).
|
|
|
|
[h4:faster_implementation Implementing Faster Algorithms]
|
|
|
|
More recently, the Tosio Fukushima has developed an even faster algorithm,
|
|
avoiding any transcendental function calls as these are necessarily expensive.
|
|
The current implementation of Lambert /W/[sub -1] is based on his algorithm
|
|
starting with a translation from Fukushima's FORTRAN into C++ by Darko Veberic.
|
|
|
|
Many applications of the Lambert W function make many repeated evaluations for Monte Carlo methods;
|
|
for these applications speed is very important.
|
|
Luu, and Chapeau-Blondeau and Monir provide typical usage examples.
|
|
|
|
Fukushima improves the important observation that much of the execution time of all
|
|
previous iterative algorithms was spent evaluating transcendental functions, usually `exp`.
|
|
He has put a lot of work into avoiding any slow transcendental functions by using lookup tables and
|
|
bisection, finishing with a single Schroeder refinement,
|
|
without any check on the final precision of the result (necessarily evaluating an expensive exponential).
|
|
|
|
Theoretical and practical tests confirm that Fukushima's algorithm gives Lambert W estimates
|
|
with a known small error bound (several __ulp) over nearly all the range of ['z] argument.
|
|
|
|
A mean difference was computed to express the typical error and is often about 0.5 epsilon,
|
|
the theoretical minimum. Using the __float_distance, we can also express this as the number of
|
|
bits that are different from the nearest representable or 'exact' or 'best' value.
|
|
The number and distribution of these few bits differences was studied by binning, including their sign.
|
|
Bins for (signed) 0, 1, 2 and 3 and 4 bits proved suitable.
|
|
|
|
However, though these give results within a few __epsilon of the nearest representable result,
|
|
they do not get as close as is very often possible with further refinement,
|
|
nrealy always to within one or two __epsilon.
|
|
|
|
More significantly, the evaluations of the sum of all signed differences using the Fukshima algorithm
|
|
show a slight bias, being more likely to be a bit or few below the nearest representation than above;
|
|
bias might have unwanted effects on some statistical computations.
|
|
|
|
Fukushima's method also does not cover the full range of z arguments of 'float' precision and above.
|
|
|
|
For this implementation of Lambert /W/[sub 0], John Maddock used the Boost.Math __remez method program
|
|
to devise a __rational_function for several ranges of argument for the /W/[sub 0] branch of Lambert /W/ function.
|
|
These minimax rational approximations are combined for an algorithm that is both smaller and faster.
|
|
|
|
Sadly it has not proved practical to use the same __remez method for Lambert /W/[sub -1] branch
|
|
and so the Fukushima algorithm is retained for this branch.
|
|
|
|
An advantage of both minimax rational __remez approximations
|
|
is that the [*distribution] from the reference values is reasonably random and insignificantly biased.
|
|
|
|
For example, table below a test of Lambert /W/[sub 0] 10000 values of argument covering the main range of possible values,
|
|
10000 comparisons from z = 0.0501 to 703, in 0.001 step factor 1.05 when module 7 == 0
|
|
|
|
[table:lambert_w0_Fukushima Fukushima Lambert /W/[sub 0] and typical improvement from a single Halley step.
|
|
[[Method] [Exact] [One_bit] [Two_bits] [Few_bits] [inexact] [bias]]
|
|
[[Schroeder /W/[sub 0] ] [8804 ] [ 1154 ] [ 37 ] [ 5 ] [1243 ] [-1193]]
|
|
[[after Halley step] [ 9710 ] [ 288 ] [ 2 ] [0] [ 292 ] [22]]
|
|
] [/table:lambert_w0_Fukushima /W/[sub 0] ]
|
|
|
|
Lambert /W/[sub 0] values computed using the Fukushima method with Schroeder refinement gave about 1/6 `lambert_w0` values that are
|
|
one bit different from the 'best', and < 1% that are a few bits 'wrong'.
|
|
If a Halley refinement step is added, only 1 in 30 are even one bit different, and only 2 two-bits 'wrong'.
|
|
|
|
[table:lambert_w0_plus_halley Rational polynomial Lambert /W/[sub 0] and typical improvement from a single Halley step.
|
|
[[Method] [Exact] [One_bit] [Two_bits] [Few_bits] [inexact] [bias]]
|
|
[[rational/polynomial] [7135] [ 2863 ] [ 2 ] [0] [ 2867 ] [ -59] ]
|
|
[[after Halley step ] [ 9724 ] [273] [ 3 ] [0] [ 279 ] [5] ]
|
|
] [/table:lambert_w0_plus_halley]
|
|
|
|
With the rational polynomial approximation method, there are a third one-bit from the best and none more than two-bits.
|
|
Adding a Halley step (or iteration) reduces the number that are one-bit different from about a third down to one in 30;
|
|
this is unavoidable 'computational noise'.
|
|
An extra Halley step would double the runtime for a tiny gain and so is not chosen for this implementation,
|
|
but remains a option, as detailed above.
|
|
|
|
For the Lambert /W/[sub -1] branch, the Fukushima algorithm is used.
|
|
|
|
[table:lambert_wm1_fukushima Lambert /W/[sub -1] using Fukushima algorithm.
|
|
[[Method] [Exact] [One_bit] [Two_bits] [Few_bits] [inexact] [bias]]
|
|
[[Fukushima /W/[sub -1]] [ 7167] [2704 ] [ 129 ] [0] [2962 ] [-160]]
|
|
[[plus Halley step] [ 7379 ] [ 2529 ] [92 ] [0] [ 2713 ] [ 549]]
|
|
] [/table:lambert_wm1_fukushima]
|
|
|
|
[/ generated using PAB j:\Cpp\Misc\lambert_w_compare_jm2\lambert_w_compare_jm2.cpp]
|
|
|
|
[h5:lookup_tables Lookup tables]
|
|
|
|
For speed during the bisection, Fukushima's algorithm computes lookup tables of powers of e and z for integral Lambert W.
|
|
There are 64 elements in these tables. The FORTRAN version (and the C++ translation by Veberic)
|
|
computed these (once) as `static` data. This is slower, may cause trouble with multithreading, and
|
|
is slightly inaccurate because of rounding errors from repeated(64) multiplications.
|
|
|
|
In this implementation the array values have been computed using __multiprecision 50 decimal digit
|
|
and output as C++ arrays 37 decimal digit `long double` literals using `max_digits10` precision
|
|
|
|
std::cout.precision(std::numeric_limits<cpp_bin_float_quad>::max_digits10);
|
|
|
|
The arrays are as `const` and `constexpr` and `static` as possible (for the compiler version),
|
|
using BOOST_STATIC_CONSTEXPR macro.
|
|
(See [@../../tools/lambert_w_lookup_table_generator.cpp lambert_w_lookup_table_generator.cpp]
|
|
The precision was chosen to ensure that if used as `long double` arrays,
|
|
then the values output to
|
|
[@ ../../include/boost/math/special_functions/detail/lambert_w_lookup_table.ipp lambert_w_lookup_table.ipp]
|
|
will be the nearest representable value for the type chose by a `typedef` in
|
|
[@../../include/boost/math/special_functions/lambert_w.hpp lambert_w.hpp].
|
|
|
|
typedef double lookup_t; // Type for lookup table (`double` or `float`, or even `long double`?)
|
|
|
|
This is to allow for future use at higher precision, up to platforms that use 128-bit
|
|
(hardware or software) for their `long double` type.
|
|
|
|
The accuracy of the tables was confirmed using __WolframAlpha and agrees at the 37th decimal place,
|
|
so ensuring that the value is exactly read into even 128-bit `long double` to the nearest representation.
|
|
|
|
[h5:higher_precision Higher precision]
|
|
|
|
For types more precise than `double`, Fukushima reported that it was best to use the `double` estimate
|
|
as a starting point, followed by refinement using __halley iterations or other methods;
|
|
our experience confirms this.
|
|
|
|
Using __multiprecision it is simple to compute very high precision values of Lambert
|
|
W at least to thousands of decimal digits over most of the range of z arguments.
|
|
|
|
For this reason, the lookup tables and bisection are only carried out at low precision,
|
|
usually `double`, chosen by the `typedef double lookup_t`. Unlike the FORTRAN version,
|
|
the lookup tables of Lambert_W of integral values are precomputed as C++ static arrays of floating-point literals.
|
|
The default is a `typedef` setting the type to `double`.
|
|
To allow users to vary the precision from `float` to `long double` these are computed to
|
|
128-bit precision to ensure that even platforms with `long double` do not lose precision.
|
|
|
|
The FORTRAN version and translation only permits the z argument to be the largest
|
|
items in these lookup arrays, `wm0s[64] = 3.99049`, producing an error message and returning `NaN`.
|
|
So 64 is the largest possible value ever returned from the `lambert_w0` function.
|
|
This is far from the `std::numeric_limits<>::max()` for even `float`s.
|
|
Therefore this implementation uses an approximation or 'guess' and Halley's method to refine the result.
|
|
Logarithmic approximation is discussed at length by R.M.Corless et al. (page 349).
|
|
Here we use the first two terms of equation 4.19:
|
|
|
|
T lz = log(z);
|
|
T llz = log(lz);
|
|
guess = lz - llz + (llz / lz);
|
|
|
|
This gives a useful precision suitable for Halley refinement.
|
|
|
|
Similarly, for Lambert /W/[sub -1] branch, tiny values very near zero,
|
|
W > 64 cannot be computed using the lookup table.
|
|
For this region, an approximation followed by a few (usually 3) Halley refinements.
|
|
See __lambert_w_wm1_near_zero.
|
|
|
|
For the less well-behaved regions for Lambert /W/[sub 0] ['z] arguments near zero,
|
|
and near the branch singularity at ['-1/e], some series functions are used.
|
|
|
|
[h5:small_z Small values of argument z near zero]
|
|
|
|
When argument ['z] is small and near zero, there is an efficient and accurate series evaluation method available
|
|
(implemented in `lambert_w0_small_z`).
|
|
There is no equivalent for the /W/[sub -1] branch as this only covers argument `z < -1/e`.
|
|
The cutoff used `abs(z) < 0.05` is as found by trial and error by Fukushima.
|
|
|
|
Coefficients of the inverted series expansion of the Lambert W function around `z = 0`
|
|
are computed following Fukushima using 17 terms of a Taylor series
|
|
computed using __Mathematica with
|
|
|
|
InverseSeries[Series[z Exp[z],{z,0,17}]]
|
|
|
|
See Tosio Fukushima, Journal of Computational and Applied Mathematics 244 (2013), page 86.
|
|
|
|
To provide higher precision constants (34 decimal digits) for types larger than `long double`,
|
|
|
|
InverseSeries[Series[z Exp[z],{z,0,34}]]
|
|
|
|
were also computed, but for current hardware it was found that evaluating a `double` precision
|
|
and then refining with Halley's method was quicker and more accurate.
|
|
|
|
Decimal values of specifications for built-in floating-point types below
|
|
are 21 digits precision == `std::numeric_limits<T>::max_digits10` for `long double`.
|
|
|
|
Specializations for `lambert_w0_small_z` are provided for
|
|
`float`, `double`, `long double`, `float128` and for __multiprecision types.
|
|
|
|
The `tag_type` selection is based on the value `std::numeric_limits<T>::max_digits10`
|
|
(and [*not] on the floating-point type T).
|
|
This distinguishes between `long double` types that commonly vary between 64 and 80-bits,
|
|
and also compilers that have a `float` type using 64 bits and/or `long double` using 128-bits.
|
|
|
|
As noted in the __lambert_w_implementation section above,
|
|
it is only possible to ensure the nearest representable value by casting from a higher precision type,
|
|
computed at very, very much greater cost.
|
|
|
|
For multiprecision types, first several terms of the series are tabulated and evaluated as a polynomial:
|
|
(this will save us a bunch of expensive calls to `pow`).
|
|
Then our series functor is initialized "as if" it had already reached term 18,
|
|
enough evaluation of built-in 64-bit double and float (and 80-bit `long double`) types.
|
|
Finally the functor is called repeatedly to compute as many additional series terms
|
|
as necessary to achive the desired precision, set from `get_epsilon`
|
|
(or terminated by `evaluation_error` on reaching the set iteration limit `max_series_iterations`).
|
|
|
|
A little more than one decimal digit of precision is gained by each additional series term.
|
|
This allows computation of Lambert W near zero to at least 1000 decimal digit precision,
|
|
given sufficient compute time.
|
|
|
|
[h4:near_singularity Argument z near the singularity at -1/e between branches /W/[sub 0] and /W/[sub -1] ]
|
|
|
|
Variants of Function `lambert_w_singularity_series` are used to handle ['z] arguments
|
|
which are near to the singularity at `z = -exp(-1) = -3.6787944` where the branches /W/[sub 0] and /W/[sub -1] join.
|
|
|
|
T. Fukushima / Journal of Computational and Applied Mathematics 244 (2013) 77-89
|
|
describes using __Mathematica
|
|
|
|
InverseSeries\[Series\[sqrt\[2(p Exp\[1 + p\] + 1)\], {p,-1, 20}\]\]
|
|
|
|
to provide his Table 3, page 85.
|
|
|
|
This implementation used __Mathematica to obtain 40 series terms at 50 decimal digit precision
|
|
``
|
|
N\[InverseSeries\[Series\[Sqrt\[2(p Exp\[1 + p\] + 1)\], { p,-1,40 }\]\], 50\]
|
|
|
|
-1+p-p^2/3+(11 p^3)/72-(43 p^4)/540+(769 p^5)/17280-(221 p^6)/8505+(680863 p^7)/43545600 ...
|
|
``
|
|
These constants are computed at compile time for the full precision for any `RealType T`
|
|
using the original rationals from Fukushima Table 3.
|
|
|
|
Longer decimal digits strings are rationals pre-evaluated using __Mathematica.
|
|
Some integer constants overflow, so largest size available is used, suffixed by `uLL`.
|
|
|
|
Above the 14th term, the rationals exceed the range of `unsigned long long` and are replaced
|
|
by pre-computed decimal values at least 21 digits precision == `max_digits10` for `long double`.
|
|
|
|
A macro `BOOST_MATH_TEST_VALUE` (defined in [@../../test/test_value.hpp test_value.hpp])
|
|
taking a decimal floating-point literal was used
|
|
to allow testing with both built-in floating-point types like `double`
|
|
which have contructors taking literal decimal values like `3.14`,
|
|
[*and] also multiprecision and other User-defined Types that only provide full-precision construction
|
|
from decimal digit strings like `"3.14"`.
|
|
(Construction of multiprecision types from built-in floating-point types
|
|
only provides the precision of the built-in type, like `double`, only 17 decimal digits).
|
|
|
|
[tip Be exceeding careful not to silently lose precision by constructing multiprecision types from literal decimal types, usually [^double]. Use decimal digit strings like "3.1459" instead. See examples.]
|
|
|
|
Fukushima's implementation used 20 series terms; it was confirmed that using more terms does not usefully increase accuracy.
|
|
|
|
[h5:wm1_near_zero Lambert /W/[sub -1] arguments values very near zero.]
|
|
|
|
The lookup tables of Fukushima have only 64 elements,
|
|
so that the z argument nearest zero is -1.0264389699511303e-26,
|
|
that corresponds to a maximum Lambert /W/[sub -1] value of 64.0.
|
|
Fukushima's implementation did not cater for z argument values that are smaller (nearer to zero),
|
|
but this implementation adds code to accept smaller (but not denormalised) values of z.
|
|
A crude approximation for these very small values is to take the exponent and multiply by ln[10] ~= 2.3.
|
|
We also tried the approximation first proposed by Corless et al. using ln(-z), (equation 4.19 page 349)
|
|
and then tried improving by a 2nd term -ln(ln(-z)), and finally the ratio term -ln(ln(-z))/ln(-z).
|
|
|
|
For a z very close to z = -1.0264389699511303e-26 when W = 64, when effect of ln(ln(-z) term, and ratio L1/L2 is greatest,
|
|
the possible 'guesses' are
|
|
|
|
z = -1.e-26, w = -64.02, guess = -64.0277, ln(-z) = -59.8672, ln(-ln(-z) = 4.0921, llz/lz = -0.0684
|
|
|
|
whereas at the minimum (unnormalized) z
|
|
|
|
z = -2.2250e-308, w = -714.9, guess = -714.9687, ln(-z) = -708.3964, ln(-ln(-z) = 6.5630, llz/lz = -0.0092
|
|
|
|
Although the addition of the 3rd ratio term did not reduce the number of Halley iterations needed,
|
|
it might allow return of a better low precision estimate [*without any Halley iterations].
|
|
For the worst case near w = 64, the error in the 'guess' is 0.008, ratio 0.0001 or 1 in 10,000 digits 10 ~= 4.
|
|
Two log evalutations are still needed, but is probably over an order of magnitude faster.
|
|
|
|
Halley's method was then used to refine the estimate of Lambert /W/[sub -1] from this guess.
|
|
Experiments showed that although all approximations reached with __ulp of the closest representable value,
|
|
the computational cost of the log functions was easily paid by far fewer iterations
|
|
(typically from 8 down to 4 iterations for double or float).
|
|
[h5:halley Halley refinement]
|
|
|
|
After obtaining a double approximation, for `double`, `long double` and `quad` 128-bit precision,
|
|
a single iteration should suffice because
|
|
Halley iteration should triple the precision with each step
|
|
(as long as the function is well behaved - and it is),
|
|
and since we have at least half of the bits correct already,
|
|
one Halley step is ample to get to 128-bit precision.
|
|
|
|
|
|
[h5:lambert_w_derivatives Lambert W Derivatives]
|
|
|
|
The derivatives are computed using the formulae in [@https://en.wikipedia.org/wiki/Lambert_W_function#Derivative Wikipedia].
|
|
|
|
[h4:testing Testing]
|
|
|
|
Initial testing of the algorithm was done using a small number of spot tests.
|
|
|
|
After it was established that the underlying algorithm (including unlimited Halley refinements with a tight terminating criterion) was correct,
|
|
some tables of Lambert W values were computed using a 100 decimal digit precision __multiprecision `cpp_dec_float_100` type and saved as
|
|
a C++ program that will initialise arrays of values of z arguments and lambert_W0 (`lambert_w_mp_high_values.ipp` and `lambert_w_mp_low_values.ipp` ).
|
|
|
|
(A few of these pairs were checked against values computed by Wolfram Alpha to try to guard against mistakes;
|
|
all those tested agreed to the penultimate decimal place, so they can be considered reliable to at least 98 decimal digits precision).
|
|
|
|
A macro `BOOST_MATH_TEST_VALUE` was used to allow tests with any real type, both __fundamental_types and __multiprecision.
|
|
(This is necessary because __fundamental_types have a constructor from floating-point literals like 3.1459F, 3.1459 or 3.1459L
|
|
whereas __multiprecision types may lose precision unless constructed from decimal digits strings like "3.1459").
|
|
|
|
The 100-decimal digits precision pairs were then used to assess the precision of less-precise types, including
|
|
__multiprecision `cpp_bin_float_quad` and `cpp_bin_float_50`. `static_cast`ing from the high precision types should
|
|
give the closest representable value of the less-precise type; this is then be used to assess the precision of
|
|
the Lambert W algorithm.
|
|
|
|
Tests using
|
|
confirm that over nearly all the range of z arguments,
|
|
nearly all estimates are the nearest __representable value, a minority are within 1 __ulp and only a very few 2 ULP.
|
|
|
|
[graph lambert_w0_errors_graph]
|
|
[graph lambert_wm1_errors_graph]
|
|
|
|
For the range of z arguments over the range -0.35 to 0.5, a different algorithm is used, but the same
|
|
technique of evaluating reference values using a __multiprecision `cpp_dec_float_100` was used.
|
|
For extremely small z arguments, near zero, and those extremely near the singularity at the branch point,
|
|
precision can be much lower, as might be expected.
|
|
|
|
See source at:
|
|
[@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
|
|
[@../../test/test_lambert_w.cpp test_lambert_w.cpp] contains routine tests using __boost_test.
|
|
[@../../tools/lambert_w_errors_graph.cpp lambert_w_errors_graph.cpp] generating error graphs.
|
|
|
|
[h5:quadrature_testing Testing with quadrature]
|
|
|
|
A further method of testing over a wide range of argument z values was devised by Nick Thompson
|
|
(cunningly also to test the recently written quadrature routines including __multiprecision !).
|
|
These are definite integral formulas involving the W function that are exactly known constants,
|
|
for example, LambertW0(1/(z[sup2]) == [sqrt](2[pi]),
|
|
see [@https://en.wikipedia.org/wiki/Lambert_W_function#Definite_integrals Definite Integrals].
|
|
Some care was needed to avoid overflow and underflow as the integral function must evaluate to a finite result over the entire range.
|
|
|
|
[h5 Other implementations]
|
|
|
|
The Lambert W has also been discussed in a [@http://lists.boost.org/Archives/boost/2016/09/230819.php Boost thread].
|
|
|
|
This also gives link to a prototype version by which also gives complex results [^(x < -exp(-1)], about -0.367879).
|
|
[@https://github.com/CzB404/lambert_w/ Balazs Cziraki 2016]
|
|
Physicist, PhD student at Eotvos Lorand University, ELTE TTK Institute of Physics, Budapest.
|
|
has also produced a prototype C++ library that can compute the Lambert W function for floating point
|
|
[*and complex number types].
|
|
This is not implemented here but might be completed in the future.
|
|
|
|
[h4:acknowledgements Acknowledgements]
|
|
|
|
* Thanks to Wolfram for use of their invaluable online Wolfram Alpha service.
|
|
* Thanks for Mark Chapman for performing offline Wolfram computations.
|
|
|
|
[h4:references References]
|
|
|
|
# NIST Digital Library of Mathematical Functions. [@http://dlmf.nist.gov/4.13.F1].
|
|
|
|
# [@http://www.orcca.on.ca/LambertW/ Lambert W Poster],
|
|
R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffery and D. E. Knuth,
|
|
On the Lambert W function Advances in Computational Mathematics, Vol 5, (1996) pp 329-359.
|
|
|
|
# [@https://people.sc.fsu.edu/~jburkardt/f_src/toms443/toms443.html TOMS443],
|
|
Andrew Barry, S. J. Barry, Patricia Culligan-Hensley,
|
|
Algorithm 743: WAPR - A Fortran routine for calculating real values of the W-function,[br]
|
|
ACM Transactions on Mathematical Software, Volume 21, Number 2, June 1995, pages 172-181.[br]
|
|
BISECT approximates the W function using bisection (GNU licence).
|
|
Original FORTRAN77 version by Andrew Barry, S. J. Barry, Patricia Culligan-Hensley,
|
|
this version by C++ version by John Burkardt.
|
|
|
|
# [@https://people.sc.fsu.edu/~jburkardt/f_src/toms743/toms743.html TOMS743] Fortran 90 (updated 2014).
|
|
|
|
Initial guesses based on:
|
|
|
|
# R.M.Corless, G.H.Gonnet, D.E.G.Hare, D.J.Jeffrey, and D.E.Knuth,
|
|
On the Lambert W function, Adv.Comput.Math., vol. 5, pp. 329 to 359, (1996).
|
|
|
|
# D.A. Barry, J.-Y. Parlange, L. Li, H. Prommer, C.J. Cunningham, and
|
|
F. Stagnitti. Analytical approximations for real values of the Lambert
|
|
W-function. Mathematics and Computers in Simulation, 53(1), 95-103 (2000).
|
|
|
|
# D.A. Barry, J.-Y. Parlange, L. Li, H. Prommer, C.J. Cunningham, and
|
|
F. Stagnitti. Erratum to analytical approximations for real values of the
|
|
Lambert W-function. Mathematics and Computers in Simulation, 59(6):543-543, 2002.
|
|
|
|
# C++ __CUDA version of Luu algorithm, [@https://github.com/thomasluu/plog/blob/master/plog.cu plog].
|
|
|
|
# __Luu_thesis, see routine 11, page 98 for Lambert W algorithm.
|
|
|
|
# Having Fun with Lambert W(x) Function, Darko Veberic
|
|
University of Nova Gorica, Slovenia IK, Forschungszentrum Karlsruhe, Germany, J. Stefan Institute, Ljubljana, Slovenia.
|
|
|
|
# Fran[ccedil]ois Chapeau-Blondeau and Abdelilah Monir, Numerical Evaluation of the Lambert W Function and Application to Generation of Generalized
|
|
Gaussian Noise With Exponent 1/2, IEEE Transactions on Signal Processing, 50(9) (2002) 2160 - 2165.
|
|
|
|
# Toshio Fukushima, Precise and fast computation of Lambert W-functions without transcendental function evaluations, Journal of Computational and Applied
|
|
Mathematics, 244 (2013) 77-89.
|
|
|
|
# T.C. Banwell and A. Jayakumar, Electronic Letter, Feb 2000, 36(4), pages 291-2.
|
|
Exact analytical solution for current flow through diode with series resistance.
|
|
[@https://doi.org/10.1049/el:20000301]
|
|
|
|
# Princeton Companion to Applied Mathematics,
|
|
'The Lambert-W function', Section 1.3: Series and Generating Functions.
|
|
|
|
# Cleve Moler, Mathworks blog [@https://blogs.mathworks.com/cleve/2013/09/02/the-lambert-w-function/#bfba4e2d-e049-45a6-8285-fe8b51d69ce7 The Lambert W Function]
|
|
|
|
# Digital Library of Mathematical Function, [@https://dlmf.nist.gov/4.13 Lambert W function].
|
|
|
|
[endsect] [/section:lambert_w Lambert W function]
|
|
|
|
[/
|
|
Copyright 2016 John Maddock, Paul A. Bristow, Thomas Luu, Nicholas Thompson.
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt
|
|
or copy at http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|