% Copyright Matthew Pulver 2018 - 2019.
% Distributed under the Boost Software License, Version 1.0.
% (See accompanying file LICENSE_1_0.txt or copy at
% https://www.boost.org/LICENSE_1_0.txt)

\documentclass{article}
\usepackage{amsmath} %\usepackage{mathtools}
\usepackage{amssymb} %\mathbb
\usepackage{array} % m{} column in tabular
\usepackage{csquotes} % displayquote
\usepackage{fancyhdr}
\usepackage{fancyvrb}
\usepackage[margin=0.75in]{geometry}
\usepackage{graphicx}
\usepackage{hyperref}
%\usepackage{listings}
\usepackage{multirow}
\usepackage[super]{nth}
\usepackage{wrapfig}
\usepackage{xcolor}

\hypersetup{%
colorlinks=false,% hyperlinks will be black
linkbordercolor=blue,% hyperlink borders will be blue
urlbordercolor=blue,%
pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
}

\pagestyle{fancyplain}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}
\cfoot[]{\thepage\\
\scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
Distributed under the Boost Software License, Version 1.0.\\
(See accompanying file LICENSE\_1\_0.txt or copy at
\url{https://www.boost.org/LICENSE\_1\_0.txt})}

\DeclareMathOperator{\sinc}{sinc}
\begin{document}

\title{Autodiff\\
\large Automatic Differentiation C++ Library}
\author{Matthew Pulver}
\maketitle

%\date{}

%\begin{abstract}
%\end{abstract}

\tableofcontents

%\section{Synopsis}
%\begingroup
%\fontsize{10pt}{10pt}\selectfont
%\begin{verbatim}
% example/synopsis.cpp
%\end{verbatim}
%\endgroup

\newpage

\section{Description}

Autodiff is a header-only C++ library that facilitates the
\href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
mathematical functions of single and multiple variables.

This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion of
an analytic function $f$ at the point $x_0$:

\begin{align*}
f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
&= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
\end{align*}
The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm
to compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
$f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
its respective order, divided by the factorial of the order.

In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include powers
of $\varepsilon$ greater than $N$ can be discarded. (This is due to the fact that each term in a polynomial depends
only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$ provides a
polynomial-to-polynomial transformation:

\[
f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
\sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
\]
C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
(\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$. Thus
the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when
written to accept and return variables of a generic (template) type, is also used to calculate the polynomial
$\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the
product of the respective factorial $n!$ and coefficient $y_n$:

\[ \frac{d^nf}{dx^n}(x_0)=n!y_n. \]
\section{Examples}

\subsection{Example 1: Single-variable derivatives}

\subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}

In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. The {\tt Order=5}
means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during the ensuing
computation.

Internally, this is modeled by a {\tt std::array<double,6>} whose elements {\tt \{2, 1, 0, 0, 0, 0\}} correspond
to the 6 coefficients of the polynomial upon initialization. Its fourth power, at the end of the computation, is
a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The derivatives are obtained using the formula
$f^{(n)}(2)=n!\cdot{\tt y[n]}$.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;  // retval in operator*() uses x4's memory via NRVO.
  x4 *= x4;      // No copies of x4 are made within operator*=() even when squaring.
  return x4;     // x4 uses y's memory in main() via NRVO.
}

int main() {
  using namespace boost::math::differentiation;

  constexpr unsigned Order = 5;                  // Highest order derivative to be calculated.
  auto const x = make_fvar<double, Order>(2.0);  // Find derivatives at x=2.
  auto const y = fourth_power(x);
  for (unsigned i = 0; i <= Order; ++i)
    std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;
  return 0;
}
/*
Output:
y.derivative(0) = 16
y.derivative(1) = 32
y.derivative(2) = 48
y.derivative(3) = 48
y.derivative(4) = 24
y.derivative(5) = 0
*/
\end{verbatim}
The above calculates

\begin{alignat*}{3}
{\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
{\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
{\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
{\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
{\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
{\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
\end{alignat*}
\subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}
\subsubsection{Calculate $\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
with a precision of about 50 decimal digits,\\
where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}

In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
independent {\tt fvar} variables, with values of 11, 12, 13, and 14, whose maximum derivative orders to be
calculated are 3, 2, 4, and 3, respectively. The order of the variables is important, as it is the same
order used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

using namespace boost::math::differentiation;

template <typename W, typename X, typename Y, typename Z>
promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
  using namespace std;
  return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
}

int main() {
  using float50 = boost::multiprecision::cpp_bin_float_50;

  constexpr unsigned Nw = 3;  // Max order of derivative to calculate for w
  constexpr unsigned Nx = 2;  // Max order of derivative to calculate for x
  constexpr unsigned Ny = 4;  // Max order of derivative to calculate for y
  constexpr unsigned Nz = 3;  // Max order of derivative to calculate for z
  // Declare 4 independent variables together into a std::tuple.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);  // Up to Nw derivatives at w=11
  auto const& x = std::get<1>(variables);  // Up to Nx derivatives at x=12
  auto const& y = std::get<2>(variables);  // Up to Ny derivatives at y=13
  auto const& z = std::get<3>(variables);  // Up to Nz derivatives at z=14
  auto const v = f(w, x, y, z);
  // Calculated from Mathematica symbolic differentiation.
  float50 const answer("1976.319600747797717779881875290418720908121189218755");
  std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
            << "mathematica   : " << answer << '\n'
            << "autodiff      : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
            << std::setprecision(3)
            << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
  return 0;
}
/*
Output:
mathematica   : 1976.3196007477977177798818752904187209081211892188
autodiff      : 1976.3196007477977177798818752904187209081211892188
relative error: 2.67e-50
*/
\end{verbatim}
\subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}
\subsubsection{Calculate Greeks directly from the Black-Scholes pricing function.}

Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
(sigma), time to expiration (tau), and interest rate are template parameters. This means that any Greek based on
these 4 variables can be calculated using autodiff. The example below calculates delta and gamma, where the variable
of differentiation is only the price. For examples of more exotic Greeks, see {\tt example/black\_scholes.cpp}.
\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

// Equations and function/variable names are from
// https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks

// Standard normal cumulative distribution function
template <typename X>
X Phi(X const& x) {
  return 0.5 * erfc(-one_div_root_two<X>() * x);
}

enum class CP { call, put };

// Assume zero annual dividend yield (q=0).
template <typename Price, typename Sigma, typename Tau, typename Rate>
promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp,
                                                            double K,
                                                            Price const& S,
                                                            Sigma const& sigma,
                                                            Tau const& tau,
                                                            Rate const& r) {
  using namespace std;
  auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  switch (cp) {
    case CP::call:
      return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
    case CP::put:
      return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
  }
}

int main() {
  double const K = 100.0;                    // Strike price.
  auto const S = make_fvar<double, 2>(105);  // Stock price.
  double const sigma = 5;                    // Volatility.
  double const tau = 30.0 / 365;             // Time to expiration in years. (30 days).
  double const r = 1.25 / 100;               // Interest rate.
  auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
  auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);

  std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
            << "black-scholes put price  = " << put_price.derivative(0) << '\n'
            << "call delta = " << call_price.derivative(1) << '\n'
            << "put delta  = " << put_price.derivative(1) << '\n'
            << "call gamma = " << call_price.derivative(2) << '\n'
            << "put gamma  = " << put_price.derivative(2) << '\n';
  return 0;
}
/*
Output:
black-scholes call price = 56.5136
black-scholes put price  = 51.4109
call delta = 0.773818
put delta  = -0.226182
call gamma = 0.00199852
put gamma  = 0.00199852
*/
\end{verbatim}
\section{Advantages of Automatic Differentiation}
The above examples illustrate some of the advantages of using autodiff:
\begin{itemize}
\item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
of code redundancy, with all the liabilities that come with it:
\begin{itemize}
\item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
consider how much larger and inter-dependent the above code base would be if a separate function were
written for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}
{each Greek} value.
\item Dependencies upon a derivative function for a different purpose will break when changes are made to
the original function. What doesn't need to exist cannot break.
\item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when
the code base is smaller and able to be intuitively grasped.
\end{itemize}
\item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
With autodiff, there are no free variables to set, and the accuracy of the answer is generally superior to finite
difference methods even with the best choice of $\Delta x$, as the sketch following this list illustrates.
\end{itemize}
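Below is a minimal sketch (not one of the library's bundled examples) contrasting a central finite difference
against autodiff for $f(x)=e^x$ at $x=1$, where the exact derivative is $e$. The step size {\tt dx} is an
arbitrary choice:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>
#include <iostream>

template <typename T>
T f(T const& x) {
  using std::exp;  // ADL finds the autodiff overload for fvar; std::exp for double.
  return exp(x);
}

int main() {
  using namespace boost::math::differentiation;
  double const dx = 1e-6;  // Arbitrarily chosen finite-difference step size.
  double const fd = (f(1.0 + dx) - f(1.0 - dx)) / (2 * dx);  // Central difference.
  auto const y = f(make_fvar<double, 1>(1.0));  // Autodiff needs no step size.
  std::cout << "finite difference: " << fd << '\n'
            << "autodiff         : " << y.derivative(1) << '\n'
            << "exact            : " << std::exp(1.0) << '\n';
  return 0;
}
\end{Verbatim}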
\section{Mathematics}

A basic understanding of the underlying mathematics is helpful for making effective use of the autodiff library.

\subsection{Truncated Taylor Series}

Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
$f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
$x_0\in D\subseteq\mathbb{R}$:

\[
f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots
\]
One way of thinking about this form is that given the value of an analytic function $f(x_0)$ and its derivatives
$f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, the value of the function
$f(x)$ can be obtained at any other point $x\in D$ using the above formula.

Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:

\[
f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots
\]
Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
one does in basic algebra with variables like $x$ or $y$. For example, we can still manipulate entities
like $xy$ and $(1+2x+3x^2)$ without having to assign specific numbers to them.

Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
$f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$
in the resulting computation. The general coefficient for $\varepsilon^n$ is

\[\frac{f^{(n)}(x_0)}{n!}.\]
Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.
\subsubsection{Example}

Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.

The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
$\varepsilon$ as an abstract algebraic entity:

\begin{align*}
f(x_0+\varepsilon) &= f(2+\varepsilon) \\
&= (2+\varepsilon)^4 \\
&= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
&= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
\end{align*}
Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-Taylor expansion
yields the following equalities:

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
\frac{f''(2)}{2!} = 24, \qquad
\frac{f'''(2)}{3!} = 8, \qquad
\frac{f^{(4)}(2)}{4!} = 1, \qquad
\frac{f^{(5)}(2)}{5!} = 0.
\]
Multiplying both sides by the respective factorials gives

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
f''(2) = 48, \qquad
f'''(2) = 48, \qquad
f^{(4)}(2) = 24, \qquad
f^{(5)}(2) = 0.
\]
These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule}
applied to $f(x)=x^4$.
\subsection{Arithmetic}

What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$,
and instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on
values of the form

\[
{\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N
\]
and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
$+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of C++
operators and functions, floating point data types are replaced with data types that represent these polynomials. More
specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>}, which holds the above
$N+1$ coefficients $x_i$, and is wrapped in a {\tt class} that overloads all of the arithmetic operators.

The logic of these arithmetic operators simply mirrors that which is applied to polynomials. We'll look at
each of the 4 arithmetic operators in detail.
\subsubsection{Addition}

The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:

\begin{align*}
{\bf z} &= {\bf x} + {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Subtraction}

Subtraction follows the same form as addition:

\begin{align*}
{\bf z} &= {\bf x} - {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Multiplication}

Multiplication produces higher-order terms:

\begin{align*}
{\bf z} &= {\bf x} \times {\bf y} \\
&= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
&= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots +
\left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
&= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted
by $O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
depend on any of these discarded terms, so there is no loss of precision in the final answer. The only information
that is lost is the values of the higher order derivatives, which we are not interested in anyway. If we were, then
we would have simply chosen a larger value of $N$ to begin with.

\subsubsection{Division}

Division is not directly calculated as the others are. Instead, to find the components of
${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields
a recursive formula for the components $z_i$:

\begin{align*}
x_i &= \sum_{j=0}^iy_jz_{i-j} \\
&= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$
depends on the previously calculated values $z_0, z_1, ..., z_{i-1}$.
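To make these rules concrete, the following is a minimal, self-contained sketch of truncated polynomial
arithmetic over {\tt std::array} coefficients. It is for illustration only and is not the library's
implementation:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>

constexpr std::size_t N = 5;             // Truncation order.
using Poly = std::array<double, N + 1>;  // Coefficients of eps^0, eps^1, ..., eps^N.

Poly add(Poly const& x, Poly const& y) {
  Poly z{};
  for (std::size_t i = 0; i <= N; ++i) z[i] = x[i] + y[i];  // Component-wise.
  return z;
}

Poly mul(Poly const& x, Poly const& y) {
  Poly z{};  // Truncated Cauchy product: z_i = sum of x_j * y_(i-j) for j <= i.
  for (std::size_t i = 0; i <= N; ++i)
    for (std::size_t j = 0; j <= i; ++j) z[i] += x[j] * y[i - j];
  return z;
}

Poly div(Poly const& x, Poly const& y) {
  Poly z{};  // Solve x = y*z for z, sequentially in i; requires y[0] != 0.
  for (std::size_t i = 0; i <= N; ++i) {
    double s = x[i];
    for (std::size_t j = 1; j <= i; ++j) s -= y[j] * z[i - j];
    z[i] = s / y[0];
  }
  return z;
}

int main() {
  Poly const x{2, 1};           // 2 + eps, as in Example 1.
  Poly const x2 = mul(x, x);    // {4, 4, 1, 0, 0, 0}
  Poly const x4 = mul(x2, x2);  // {16, 32, 24, 8, 1, 0}
  return 0;
}
\end{Verbatim}
Subtraction is identical to {\tt add} with {\tt -} in place of {\tt +}. Note that {\tt div} computes each $z_i$
from the previously computed $z_0,\ldots,z_{i-1}$, exactly as in the recursive formula above.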
\subsection{General Functions}

Calling standard mathematical functions such as {\tt log()}, {\tt cos()}, etc. should return accurate higher
order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial to
approximate $e^x$ when $0<x<1$. This would mean that the \nth{15} derivative, and all higher order derivatives, would
be 0; however, we know that $\frac{d^{15}}{dx^{15}}e^x=e^x$. How should such functions whose derivatives are known
be written to provide accurate higher order derivatives? The answer again comes back to the function's Taylor series.

To simplify notation, for a given polynomial ${\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+
x_N\varepsilon^N$ define

\[
{\bf x}_\varepsilon = x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N = \sum_{i=1}^Nx_i\varepsilon^i.
\]
This allows for a concise expression of a general function $f$ of $\bf x$:

\begin{align*}
f({\bf x}) &= f(x_0 + {\bf x}_\varepsilon) \\
& = f(x_0) + f'(x_0){\bf x}_\varepsilon + \frac{f''(x_0)}{2!}{\bf x}_\varepsilon^2 + \frac{f'''(x_0)}{3!}{\bf x}_\varepsilon^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}{\bf x}_\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
& = \sum_{i=0}^N\frac{f^{(i)}(x_0)}{i!}{\bf x}_\varepsilon^i + O\left(\varepsilon^{N+1}\right)
\end{align*}
where $\varepsilon$ has been substituted with ${\bf x}_\varepsilon$ in the $\varepsilon$-Taylor series
for $f(x)$. This form gives a recipe for calculating $f({\bf x})$ in general from regular numeric calculations
$f(x_0)$, $f'(x_0)$, $f''(x_0)$, ... and successive powers of the epsilon terms ${\bf x}_\varepsilon$.

For an application in which we are interested in up to $N$ derivatives in $x$, the data structure to hold
this information is an $(N+1)$-element array {\tt v} whose general element is

\[ {\tt v[i]} = \frac{f^{(i)}(x_0)}{i!} \qquad \text{for}\; i\in\{0,1,2,...,N\}. \]
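For instance, truncating at $N=2$, the composition recipe above expands to
\begin{align*}
f(x_0 + x_1\varepsilon + x_2\varepsilon^2) &= f(x_0) + f'(x_0)\left(x_1\varepsilon + x_2\varepsilon^2\right)
+ \frac{f''(x_0)}{2!}\left(x_1\varepsilon + x_2\varepsilon^2\right)^2 + O\left(\varepsilon^3\right) \\
&= f(x_0) + f'(x_0)x_1\varepsilon
+ \left(f'(x_0)x_2 + \frac{f''(x_0)}{2}x_1^2\right)\varepsilon^2 + O\left(\varepsilon^3\right),
\end{align*}
and multiplying the $\varepsilon^2$ coefficient by $2!$ recovers the familiar second-order chain rule
$(f\circ g)'' = f''\cdot(g')^2 + f'\cdot g''$ with $g'=x_1$ and $g''=2x_2$.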
\subsection{Multiple Variables}

In C++, the generalization to mixed partial derivatives with multiple independent variables is conveniently achieved
with recursion. To begin to see the recursive pattern, consider a two-variable function $f(x,y)$. Since $x$
and $y$ are independent, they require their own independent epsilons $\varepsilon_x$ and $\varepsilon_y$,
respectively.

Expand $f(x,y)$ for $x=x_0+\varepsilon_x$:
\begin{align*}
f(x_0+\varepsilon_x,y) &= f(x_0,y)
+ \frac{\partial f}{\partial x}(x_0,y)\varepsilon_x
+ \frac{1}{2!}\frac{\partial^2 f}{\partial x^2}(x_0,y)\varepsilon_x^2
+ \frac{1}{3!}\frac{\partial^3 f}{\partial x^3}(x_0,y)\varepsilon_x^3
+ \cdots
+ \frac{1}{M!}\frac{\partial^M f}{\partial x^M}(x_0,y)\varepsilon_x^M
+ O\left(\varepsilon_x^{M+1}\right) \\
&= \sum_{i=0}^M\frac{1}{i!}\frac{\partial^i f}{\partial x^i}(x_0,y)\varepsilon_x^i + O\left(\varepsilon_x^{M+1}\right).
\end{align*}
Next, expand $f(x_0+\varepsilon_x,y)$ for $y=y_0+\varepsilon_y$:

\begin{align*}
f(x_0+\varepsilon_x,y_0+\varepsilon_y) &= \sum_{j=0}^N\frac{1}{j!}\frac{\partial^j}{\partial y^j}
\left(\sum_{i=0}^M\varepsilon_x^i\frac{1}{i!}\frac{\partial^if}{\partial x^i}\right)(x_0,y_0)\varepsilon_y^j
+ O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right) \\
&= \sum_{i=0}^M\sum_{j=0}^N\frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
\varepsilon_x^i\varepsilon_y^j + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right).
\end{align*}

Similar to the single-variable case, for an application in which we are interested in up to $M$ derivatives in
$x$ and $N$ derivatives in $y$, the data structure to hold this information is an $(M+1)\times(N+1)$
array {\tt v} whose element at $(i,j)$ is

\[
{\tt v[i][j]} = \frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
\qquad \text{for}\; (i,j)\in\{0,1,2,...,M\}\times\{0,1,2,...,N\}.
\]
The generalization to additional independent variables follows the same pattern.
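As a minimal two-variable sketch using the same {\tt make\_ftuple} interface as Section~\ref{multivar}: for
$f(x,y)=x^2y$, the mixed partial $\frac{\partial^3 f}{\partial x^2\partial y}=2$ everywhere, which
{\tt v.derivative(2, 1)} reproduces:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::differentiation;

template <typename X, typename Y>
promote<X, Y> f(X const& x, Y const& y) {
  return x * x * y;
}

int main() {
  // Up to 2 derivatives in x and 1 derivative in y, evaluated at (x,y) = (3,4).
  auto const variables = make_ftuple<double, 2, 1>(3, 4);
  auto const& x = std::get<0>(variables);
  auto const& y = std::get<1>(variables);
  auto const v = f(x, y);
  std::cout << v.derivative(2, 1) << '\n';  // Prints 2.
  return 0;
}
\end{Verbatim}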
\subsubsection{Declaring Multiple Variables}

Internally, independent variables are represented by vectors within orthogonal vector spaces. Because of this,
one must be careful when declaring more than one independent variable so that they do not end up in
parallel vector spaces. This can easily be achieved by following one rule:
\begin{itemize}
\item When declaring more than one independent variable, call {\tt make\_ftuple<>()} once and only once.
\end{itemize}
The values in the returned tuple are independent. Though it is possible to achieve the same result with multiple calls
to {\tt make\_fvar}, this is an easier and less error-prone method. See Section~\ref{multivar} for example usage.
%\section{Usage}
%
%\subsection{Single Variable}
%
%To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be
%specified at compile-time:
%
%\begin{enumerate}
%\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double},
%   {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc.
%\item The maximum derivative order $M$ that is to be calculated with respect to $x$.
%\end{enumerate}
%Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array<T,N>}. {\tt T}
%and {\tt N} must be set at compile-time, but which elements in the array are accessed can be determined at run-time,
%just as the choice of what derivatives to query in autodiff can be made during run-time.
%
%To declare and initialize $x$:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%\end{verbatim}
%where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 +
%\varepsilon$. Internally, the member variable of type {\tt std::array<T,M+1>} is {\tt v = \{ x0, 1, 0, 0, ... \}},
%consistent with the above mathematical treatise.
%
%To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function
%$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template
%
%\begin{verbatim}
%    template<typename T>
%    T f(T x);
%\end{verbatim}
%Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\
%{\tt boost::math::differentiation::autodiff\_fvar<>} types.
%
%Internal calls to mathematical functions must allow for
%\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions
%are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)}
%from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix.
%
%Calling $f$ and retrieving the calculated value and derivatives:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%    autodiff_fvar<T,M> y = f(x);
%    for (int n=0 ; n<=M ; ++n)
%        std::cout << "y.derivative("<<n<<") == " << y.derivative(n) << std::endl;
%\end{verbatim}
%{\tt y.derivative(0)} returns the undifferentiated value $f(x_0)$, and {\tt y.derivative(n)} returns $f^{(n)}(x_0)$.
%Casting {\tt y} to type {\tt T} also gives the undifferentiated value. In other words, the following 3 values
%are equal:
%
%\begin{enumerate}
%\item {\tt f(x0)}
%\item {\tt y.derivative(0)}
%\item {\tt static\_cast<T>(y)}
%\end{enumerate}
%
%\subsection{Multiple Variables}
%
%Independent variables are represented in autodiff as independent dimensions within a multi-dimensional array.
%This is perhaps best illustrated with examples. The {\tt namespace boost::math::differentiation} is assumed.
%
%The following instantiates a variable of $x=13$ with up to 3 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,3> x = make_fvar<double,3>(13);
%\end{verbatim}
%This instantiates {\bf an independent} value of $y=14$ with up to 4 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,0,4> y = make_fvar<double,0,4>(14);
%\end{verbatim}
%Combining them together {\bf promotes} their data type automatically to the smallest multidimensional array that
%accommodates both.
%
%\begin{verbatim}
%    // z is promoted to autodiff_fvar<double,3,4>
%    auto z = 10*x*x + 50*x*y + 100*y*y;
%\end{verbatim}
%The object {\tt z} holds a 2-dimensional array, thus {\tt derivative(...)} is a 2-parameter method:
%
%\[
%{\tt z.derivative(i,j)} = \frac{\partial^{i+j}f}{\partial x^i\partial y^j}(13,14)
%    \qquad \text{for}\; (i,j)\in\{0,1,2,3\}\times\{0,1,2,3,4\}.
%\]
%A few values of the result can be confirmed through inspection:
%
%\begin{verbatim}
%    z.derivative(2,0) == 20
%    z.derivative(1,1) == 50
%    z.derivative(0,2) == 200
%\end{verbatim}
%Note how the positions of the parameters in {\tt derivative(...)} match how {\tt x} and {\tt y} were declared.
%This will be clarified next.
%
%\subsubsection{Two Rules of Variable Initialization}
%
%In general, there are two rules to keep in mind when dealing with multiple variables:
%
%\begin{enumerate}
%\item Independent variables correspond to parameter position, in both the initialization {\tt make\_fvar<T,...>}
%   and calls to {\tt derivative(...)}.
%\item The last template position in {\tt make\_fvar<T,...>} determines which variable a derivative will be
%   taken with respect to.
%\end{enumerate}
%Both rules are illustrated with an example in which there are 3 independent variables $x,y,z$ and 1 dependent
%variable $w=f(x,y,z)$, though the following code readily generalizes to any number of independent variables, limited
%only by the C++ compiler/memory/platform. The maximum derivative order of each variable is {\tt Nx}, {\tt Ny}, and
%{\tt Nz}, respectively. Then the type for {\tt w} is {\tt boost::math::differentiation::autodiff\_fvar<T,Nx,Ny,Nz>}
%and all possible mixed partial derivatives are available via
%
%\[
%{\tt w.derivative(nx,ny,nz)} =
%    \frac{\partial^{n_x+n_y+n_z}f}{\partial x^{n_x}\partial y^{n_y}\partial z^{n_z} }(x_0,y_0,z_0)
%\]
%for $(n_x,n_y,n_z)\in\{0,1,2,...,N_x\}\times\{0,1,2,...,N_y\}\times\{0,1,2,...,N_z\}$ where $x_0, y_0, z_0$ are
%the numerical values at which the function $f$ and its derivatives are evaluated.
%
%In code:
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    using var = autodiff_fvar<double,Nx,Ny,Nz>; // Nx, Ny, Nz are constexpr size_t.
%
%    var x = make_fvar<double,Nx>(x0);        // x0 is of type double
%    var y = make_fvar<double,Nx,Ny>(y0);     // y0 is of type double
%    var z = make_fvar<double,Nx,Ny,Nz>(z0);  // z0 is of type double
%
%    var w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%Note how {\tt x}, {\tt y}, and {\tt z} are initialized: the last template parameter determines which variable
%$x, y,$ or $z$ a derivative is taken with respect to. In terms of the $\varepsilon$-polynomials
%above, this determines whether to add $\varepsilon_x, \varepsilon_y,$ or $\varepsilon_z$ to
%$x_0, y_0,$ or $z_0$, respectively.
%
%In contrast, the following initialization of {\tt x} would be INCORRECT:
%
%\begin{verbatim}
%    var x = make_fvar<T,Nx,0>(x0); // WRONG
%\end{verbatim}
%Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the
%$y$ variable, and thus the resulting value will be invalid.
%
%\subsubsection{Type Promotion}
%
%The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays,
%and relying on autodiff's automatic type-promotion:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    autodiff_fvar<double,Nx> x = make_fvar<double,Nx>(x0);
%    autodiff_fvar<double,0,Ny> y = make_fvar<double,0,Ny>(y0);
%    autodiff_fvar<double,0,0,Nz> z = make_fvar<double,0,0,Nz>(z0);
%
%    autodiff_fvar<double,Nx,Ny,Nz> w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%For example, if one of the first steps in the computation of $f$ was {\tt z*z}, then significantly fewer
%multiplications and additions may occur if {\tt z} is declared as {\tt autodiff\_fvar<double,0,0,Nz>} as opposed to \\
%{\tt autodiff\_fvar<double,Nx,Ny,Nz>}. There is no loss of precision with the former, since the extra dimensions
%represent 0 values. Once {\tt z} is combined with {\tt x} and {\tt y} during the computation, the types will be
%promoted as necessary. This is the recommended way to initialize variables in autodiff.
\section{Writing Functions for Autodiff Compatibility}\label{compatibility}

In this section, a general procedure is given for writing new, and transforming existing, C++ mathematical
functions for compatibility with autodiff.

There are 3 categories of functions that require different strategies:
\begin{enumerate}
\item Piecewise-rational functions. These are simply piecewise quotients of polynomials. All that is needed is to
turn the function parameters and return value into generic (template) types. This will then allow the function
to accept and return autodiff's {\tt fvar} types, thereby using autodiff's overloaded arithmetic operators
which calculate the derivatives automatically.
\item Functions that call existing autodiff functions. This is the same as the previous, but may also include
calls to functions that are in the autodiff library. Examples: {\tt exp()}, {\tt log()}, {\tt tgamma()}, etc.
\item New functions for which the derivatives can be calculated. This is the most general technique, as it
allows for the development of functions which do not fall into the previous two categories.
\end{enumerate}
Functions written in any of these ways may then be added to the autodiff library.

\subsection{Piecewise-Rational Functions}
\[
f(x) = \frac{1}{1+x^2}
\]
By simply writing this as a template function, autodiff can calculate derivatives for it:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iomanip>
#include <iostream>

template <typename T>
T rational(T const& x) {
  return 1 / (1 + x * x);
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 10>(0);
  auto const y = rational(x);
  std::cout << std::setprecision(std::numeric_limits<double>::digits10)
            << "y.derivative(10) = " << y.derivative(10) << std::endl;
  return 0;
}
/*
Output:
y.derivative(10) = -3628800
*/
\end{Verbatim}
As simple as $f(x)$ may seem, its derivatives become increasingly complex as higher orders are taken.
For example, the \nth{10} derivative has the form

\[
f^{(10)}(x) = -3628800\frac{1 - 55x^2 + 330x^4 - 462x^6 + 165x^8 - 11x^{10}}{(1 + x^2)^{11}}.
\]
Derivatives of $f(x)$ are useful, and in fact used, in calculating higher order derivatives for $\arctan(x)$,
for instance, since

\[
\arctan^{(n)}(x) = \left(\frac{d}{dx}\right)^{n-1} \frac{1}{1+x^2}\qquad\text{for}\quad 1\le n.
\]
\subsection{Functions That Call Existing Autodiff Functions}

Many of the standard library math functions are overloaded in autodiff. It is recommended to use
\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL) in order for functions to
be written in a way that is general enough to accommodate standard types ({\tt double}) as well as autodiff types
({\tt fvar}).
\\
Example:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  return 0.5 * std::erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
Though {\tt Phi(x)} is general enough to handle the various fundamental floating point types, this will
not work if {\tt x} is an autodiff {\tt fvar} variable, since {\tt std::erfc} does not include a specialization
for {\tt fvar}. The recommended solution is to remove the namespace prefix {\tt std::} from {\tt erfc}:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
In this form, when {\tt x} is of type {\tt fvar}, the C++ compiler will search for and find a function {\tt erfc}
within the same namespace as {\tt fvar}, which is in the autodiff library, via ADL. Because of the using-declaration,
it will also call {\tt std::erfc} when {\tt x} is a fundamental type such as {\tt double}.
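As a usage sketch (not one of the library's bundled examples), the first derivative of $\Phi$ at 0 is the
standard normal density $\frac{1}{\sqrt{2\pi}}\approx0.39894$:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}

int main() {
  auto const y = Phi(make_fvar<double, 1>(0));  // Same template, fvar argument.
  std::cout << "Phi(0)  = " << y.derivative(0) << '\n'   // 0.5
            << "Phi'(0) = " << y.derivative(1) << '\n';  // 0.398942...
  return 0;
}
\end{Verbatim}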
\subsection{New Functions For Which The Derivatives Can Be Calculated}\label{new_functions}

Mathematical functions which do not fall into the previous two categories can be constructed using autodiff helper
functions. This requires a separate function for calculating the derivatives. In case you are asking yourself what
good an autodiff library is if one needs to supply the derivatives, the answer is that the new function will fit
in with the rest of the autodiff library, thereby allowing for the creation of additional functions via all of
the arithmetic operators, plus function composition, which was not readily available without the library.

The example given here is for {\tt cos}:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> cos(fvar<RealType, Order> const& cr) {
  using std::cos;
  using std::sin;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = cos(static_cast<root_type>(cr));
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else {
    root_type const d1 = -sin(static_cast<root_type>(cr));
    root_type const derivatives[4]{d0, d1, -d0, -d1};
    return cr.apply_derivatives(order,
                                [&derivatives](size_t i) { return derivatives[i & 3]; });
  }
}
\end{Verbatim}
This uses the helper function {\tt fvar::apply\_derivatives} which takes two parameters:
\begin{enumerate}
\item The highest order derivative to be calculated.
\item A function that maps derivative order to derivative value.
\end{enumerate}
The highest order derivative necessary to be calculated is generally equal to {\tt fvar::order\_sum}. In the case
of {\tt sin} and {\tt cos}, the derivatives are cyclical with period 4. Thus it is sufficient to store only these
4 values into an array, and take the derivative order modulo 4 as the index into this array.
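As another illustration in the same style (a sketch only; the library's actual {\tt exp} overload may differ),
every derivative of $e^x$ is $e^x$, so a single value suffices for the derivative-supplying function:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> exp(fvar<RealType, Order> const& cr) {
  using std::exp;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = exp(static_cast<root_type>(cr));  // d0 is every derivative.
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else
    return cr.apply_derivatives(order, [&d0](size_t) { return d0; });
}
\end{Verbatim}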
A second helper function, not shown here, is {\tt apply\_coefficients}. This is used in the same way as
{\tt apply\_derivatives} except that the supplied function calculates coefficients instead of derivatives.
The relationship between a coefficient $C_n$ and derivative $D_n$ for derivative order $n$ is

\[
C_n = \frac{D_n}{n!}.
\]
Internally, {\tt fvar} holds coefficients rather than derivatives, so in case the coefficient values are more readily
available than the derivatives, it can save some unnecessary computation to use {\tt apply\_coefficients}.
See the definition of {\tt atan} for an example.

Both of these helper functions use Horner's method when calculating the resulting polynomial {\tt fvar}. This works
well when the derivatives are finite, but in cases where derivatives are infinite, this can quickly result in NaN
values as the computation progresses. In these cases, one can call non-Horner versions of both functions, which
better ``isolate'' infinite values so that they are less likely to evolve into NaN values.

The four helper functions available for constructing new autodiff functions from known coefficients/derivatives are:
\begin{enumerate}
\item {\tt fvar::apply\_coefficients}
\item {\tt fvar::apply\_coefficients\_nonhorner}
\item {\tt fvar::apply\_derivatives}
\item {\tt fvar::apply\_derivatives\_nonhorner}
\end{enumerate}
\section{Function Writing Guidelines}

At a high level there is one fairly simple principle, loosely and intuitively speaking, for writing functions
whose derivatives autodiff can effectively calculate: \\

{\bf Autodiff Function Principle (AFP)}
\begin{displayquote}
A function whose branches in logic correspond to piecewise analytic calculations over non-singleton intervals,
with smooth transitions between the intervals, and which is free of indeterminate forms in the calculated value and
higher order derivatives, will work fine with autodiff.
\end{displayquote}
This could be stated with greater mathematical rigor; however, what seems more practical, in this
case, is to give examples and categories of examples of what works, what doesn't, and how to remedy some of the
common problems that may be encountered. That is the approach taken here.
\subsection{Example 1: $f(x)=\max(0,x)$}

One potential implementation of $f(x)=\max(0,x)$ is:

\begin{verbatim}
template<typename T>
T f(const T& x)
{
  return 0 < x ? x : 0;
}
\end{verbatim}
Though this is consistent with Section~\ref{compatibility}, there are two problems with it:

\begin{enumerate}
\item {\tt f(nan) = 0}. This problem is independent of autodiff, but is worth addressing anyway. If there is
an indeterminate form that arises within a calculation and is input into $f$, then it gets ``covered up'' by
this implementation, leading to an unknowingly incorrect result. It is better for functions in general to propagate
NaN values, so that the user knows something went wrong and doesn't rely on an incorrect result, and likewise
the developer can track down where the NaN originated from and remedy it.
\item $f'(0) = 0$ when autodiff is applied. This is because {\tt f} returns 0 as a constant when {\tt x==0}, wiping
out any of the derivatives (or sensitivities) that {\tt x} was holding as an autodiff variable. Instead, let us
apply the AFP and identify the two intervals over which $f$ is defined: $(-\infty,0]\cup(0,\infty)$.
Though the function itself is not analytic at $x=0$, we can attempt somewhat to smooth out this transition
point by averaging the calculation of $f(x)$ at $x=0$ from each interval. If $x<0$ then the result is simply
0, and if $0<x$ then the result is $x$. The average is $\frac{1}{2}(0 + x)$, which will allow autodiff to
calculate $f'(0)=\frac{1}{2}$. This is a more reasonable answer.
\end{enumerate}
A better implementation that resolves both issues is:
\begin{verbatim}
template<typename T>
T f(const T& x)
{
  if (x < 0)
    return 0;
  else if (x == 0)
    return 0.5*x;
  else
    return x;
}
\end{verbatim}
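A quick check of the improved implementation at the transition point (a sketch, not one of the library's bundled
examples; the expected values follow from the reasoning above):
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template<typename T>
T f(const T& x)
{
  if (x < 0)
    return 0;
  else if (x == 0)
    return 0.5*x;  // Average of the two adjacent branches.
  else
    return x;
}

int main() {
  using namespace boost::math::differentiation;
  auto const y = f(make_fvar<double, 1>(0));
  std::cout << "f(0)  = " << y.derivative(0) << '\n'   // 0
            << "f'(0) = " << y.derivative(1) << '\n';  // 0.5
  return 0;
}
\end{Verbatim}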
\subsection{Example 2: $f(x)=\sinc(x)$}

The definition of $\sinc:\mathbb{R}\rightarrow\mathbb{R}$ is

\[
\sinc(x) = \begin{cases}
1 &\text{if}\; x = 0 \\
\frac{\sin(x)}{x} &\text{otherwise.}\end{cases}
\]
A potential implementation is:

\begin{verbatim}
template<typename T>
T sinc(const T& x)
{
  using std::sin;
  return x == 0 ? 1 : sin(x) / x;
}
\end{verbatim}
Though this is again consistent with Section~\ref{compatibility}, and returns correct non-derivative values,
it returns a constant when {\tt x==0}, thereby losing all derivative information contained in {\tt x} and
contributions from $\sinc$. For example, $\sinc''(0)=-\frac{1}{3}$; however, {\tt y.derivative(2) == 0} when
{\tt y = sinc(make\_fvar<double,2>(0))} using the above incorrect implementation. Applying the AFP, the intervals
upon which separate branches of logic are applied are $(-\infty,0)\cup[0,0]\cup(0,\infty)$. The violation occurs
due to the singleton interval $[0,0]$, even though a constant function of 1 is technically analytic. The remedy
is to define a custom $\sinc$ overload and add it to the autodiff library. This has been done. Mathematically, it
is well-defined and free of indeterminate forms, as is the \nth{3} expression in the equalities

\[
\frac{1}{x}\sin(x) = \frac{1}{x}\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}
= \sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n}.
\]
The autodiff library contains helper functions to help write function overloads when the derivatives of a function
are known. This is an advanced feature and documentation for this may be added at a later time.

For now, it is worth understanding the ways in which indeterminate forms can occur within a mathematical calculation,
and avoiding them when possible by rewriting the function. Table~\ref{3nans} compares 3 types of indeterminate
forms. Assume the product {\tt a*b} is a positive finite value.

\begin{table}[h]
\centering\begin{tabular}{m{7em}||c|c|c}
& $\displaystyle f(x)=\left(\frac{a}{x}\right)\times(bx^2)$
& $\displaystyle g(x)=\left(\frac{a}{x}\right)\times(bx)$
& $\displaystyle h(x)=\left(\frac{a}{x^2}\right)\times(bx)$ \\[0.618em]
\hline\hline
Mathematical\newline Limit
& $\displaystyle\lim_{x\rightarrow0}f(x) = 0$
& $\displaystyle\lim_{x\rightarrow0}g(x) = ab$
& $\displaystyle\lim_{x\rightarrow0}h(x) = \infty$ \\
\hline
Floating Point\newline Arithmetic
& {\tt f(0) = inf*0 = nan} & {\tt g(0) = inf*0 = nan} & {\tt h(0) = inf*0 = nan}
\end{tabular}
\caption{Automatic differentiation does not compute limits.
Indeterminate forms must be simplified manually. (These cases are not meant to be exhaustive.)}\label{3nans}
\end{table}

Indeterminate forms result in NaN values within a calculation. Mathematically, if they occur at locally isolated
points, then we generally prefer the mathematical limit as the result, even if it is infinite. As demonstrated in
Table~\ref{3nans}, depending upon the nature of the indeterminate form, the mathematical limit can be 0 (no matter
the values of $a$ or $b$), or $ab$, or $\infty$, but these 3 cases cannot be distinguished by the floating point
result of nan. Floating point arithmetic does not perform limits (directly), and neither does the autodiff library.
Thus it is up to the diligence of the developer to keep a watchful eye over where indeterminate forms can arise.
\subsection{Example 3: $f(x)=\sqrt x$ and $f'(0)=\infty$}

When working with functions that have infinite higher order derivatives, {\tt nan} values can very quickly appear
in higher order derivatives as the computation progresses, since {\tt inf-inf}, {\tt inf/inf}, and {\tt 0*inf}
all result in {\tt nan}. See Table~\ref{sqrtnan} for an example.

\begin{table}[h]
\centering\begin{tabular}{c||c|c|c|c}
$f(x)$ & $f(0)$ & $f'(0)$ & $f''(0)$ & $f'''(0)$ \\
\hline\hline
{\tt sqrt(x)} & {\tt 0} & {\tt inf} & {\tt -inf} & {\tt inf} \\
\hline
{\tt sqr(sqrt(x)+1)} & {\tt 1} & {\tt inf} & {\tt nan} & {\tt nan} \\
\hline
{\tt x+2*sqrt(x)+1} & {\tt 1} & {\tt inf} & {\tt -inf}& {\tt inf}
\end{tabular}
\caption{Indeterminate forms in higher order derivatives. {\tt sqr(x) == x*x}.}\label{sqrtnan}
\end{table}

Calling the autodiff-overloaded implementation of $f(x)=\sqrt x$ at the value {\tt x==0} results in the
\nth{1} row (after the header row) of Table~\ref{sqrtnan}, as is mathematically correct. The \nth{2} row shows
$f(x)=(\sqrt{x}+1)^2$ resulting in {\tt nan} values for $f''(0)$ and all higher order derivatives. This is due to
the internal arithmetic in which {\tt inf} is added to {\tt -inf} during the squaring, resulting in a {\tt nan}
value for $f''(0)$ and all higher orders. This is typical of {\tt inf} values in autodiff. Where they show up,
they are correct; however, they can quickly introduce {\tt nan} values into the computation upon the addition of
oppositely signed {\tt inf} values, division by {\tt inf}, or multiplication by {\tt 0}. It is worth noting that
the infection of {\tt nan} only spreads upward in the order of derivatives, since lower orders do not depend upon
higher orders (which is also why dropping higher order terms in an autodiff computation does not result in any
loss of precision for lower order terms).

The resolution in this case is to manually perform the squaring in the computation, replacing the \nth{2} row
with the \nth{3}: $f(x)=x+2\sqrt{x}+1$. Though mathematically equivalent, it allows autodiff to avoid {\tt nan}
values since $\sqrt x$ is more ``isolated'' in the computation. That is, the {\tt inf} values that unavoidably
show up in the derivatives of {\tt sqrt(x)} for {\tt x==0} do not have the chance to interact with other {\tt inf}
values as they do with the squaring.
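The difference can be seen directly (a sketch, not one of the library's bundled examples; the printed values
correspond to the \nth{2} and \nth{3} rows of Table~\ref{sqrtnan}):
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

int main() {
  using namespace boost::math::differentiation;
  using std::sqrt;
  auto const x = make_fvar<double, 3>(0);
  auto const squared  = (sqrt(x) + 1) * (sqrt(x) + 1);  // sqr(sqrt(x)+1)
  auto const isolated = x + 2 * sqrt(x) + 1;            // Same function, sqrt isolated.
  for (unsigned i = 0; i <= 3; ++i)
    std::cout << "squared.derivative(" << i << ")  = " << squared.derivative(i)
              << ", isolated.derivative(" << i << ") = " << isolated.derivative(i) << '\n';
  return 0;
}
\end{Verbatim}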
\subsection{Summary}

The AFP gives a high-level unified guiding principle for writing C++ template functions whose derivatives autodiff
can effectively evaluate.

Examples have been given to illustrate some common pitfalls to avoid:

\begin{enumerate}
\item It is not enough for functions to be piecewise continuous. On boundary points between intervals, consider
returning the average expression of both intervals, rather than just one of them. Example: $\max(0,x)$ at $x=0$.
In cases where the limits from both sides must match, and they do not, then {\tt nan} may be a more appropriate
value depending on the application.
\item Avoid returning individual constant values (e.g.\ $\sinc(0)=1$). Values must be computed uniformly along
with other values in their local interval. If that is not possible, then the function must be overloaded to
compute the derivatives precisely using the helper functions from Section~\ref{new_functions}.
\item Avoid intermediate indeterminate values in both the value ($\sinc(x)$ at $x=0$) and derivatives
($(\sqrt{x}+1)^2$ at $x=0$). Try to isolate expressions that may contain infinite values/derivatives so
that they do not introduce NaN values into the computation.
\end{enumerate}
\section{Acknowledgments}

\begin{itemize}
\item Kedar Bhat --- C++11 compatibility, Boost Special Functions compatibility testing, codecov integration,
and feedback.
\item Nick Thompson --- Initial feedback and help with Boost integration.
\item John Maddock --- Initial feedback and help with Boost integration.
\end{itemize}

\begin{thebibliography}{1}
\bibitem{ad} \url{https://en.wikipedia.org/wiki/Automatic\_differentiation}
\bibitem{ed} Andreas Griewank, Andrea Walther. \textit{Evaluating Derivatives}. SIAM, 2nd ed. 2008.
\end{thebibliography}

\end{document}