Flexible Automatic differentiation using templates and operator overloading
in ANSI C++
Content:
Authors:
FADBAD++ were developed by Ole Stauning and Claus Bendtsen. Any comments,
questions, flames etc. should be sent to the authors <fadbad@uning.dk>. We are also very interested in hearing if you have
used FADBAD++ in your own application(s).
Copyright:
FADBAD++ is copyright © 1996-2003 Claus Bendtsen and Ole Stauning.
You are free to use FADBAD++ for non-commercial purposes. You should contact
the authors if you wish to use FADBAD++ in a commercial product. See also
the copyright message which is distributed with the source code.
General introduction to automatic differentiation
and FADBAD++:
The importance of differentiation as a mathematical tool
is obvious. One of the first things we learn in elementary school is
how to manually differentiate expressions using a few elementary formulas.
Unfortunately the use of derivatives in scientific computing has been
quite limited due to the misunderstanding that derivatives are hard to
obtain. Many people still think that the only alternative to the symbolic
way of obtaining derivatives is to use divided differences in which the
difficulties in finding an expression for the derivatives are avoided.
But by using divided differences, truncation errors are introduced and
this usually has a negative effect on further computations - in fact it
can lead to very inaccurate results.
The use of a symbolic differentiation package such as Maple or
Mathematica can solve the problem of obtaining expressions for the derivatives.
This method obviously avoids truncation errors but usually these packages
have problems in handling large expressions and the time/space usage
for computing derivatives can be enormous. In worst case it can even
cause a program to crash. Furthermore, common sub expressions are usually
not identified in the expressions and this leads to unnecessary computations
during the evaluation of the derivatives.
Automatic differentiation is an alternative to the above methods.
Here derivatives are computed by using the very well known chain rule
for composite functions, in a clever way. In automatic differentiation
the evaluation of a function and its derivatives are calculated simultaneously,
using the same code and common temporary values. If the code for the
evaluation is optimised, then the computation of the derivatives will
automatically be optimised. The resulting differentiation is free from
truncation errors, and if we calculate the derivatives using interval
analysis we will obtain enclosures of the true derivatives. Automatic
differentiation is easy to implement in languages with operator overloading
such as C++, Ada and PASCAL-XSC.
FADBAD++ contains templates for performing automatic differentiation
on functions implemented in C++ code. If the source code of a program,
which is an implementation of a differentiable function, is available,
then FADBAD++ can be applied to obtain the derivatives of this function.
To apply automatic differentiation on a program, the arithmetic type used
in the program (normally double) is changed to an overloading type
( normally F<double>, B<double> or T<double>
). Since the overloading type behaves just like the usual arithmetic
type and the functionality therefore is kept, the program, which performs
the function evaluation, should not be changed in any other way. When calling
the program, the user should specify what output variables, returned from
the program, to differentiate, and what input variables to differentiate
with respect to. Since a program using automatic differentiation, itself
can be differentiated using FADBAD++; it is possible to obtain higher order
derivatives by applying multiple layers of automatic differentiation. Any
type of combinations of the automatic differentiation methods that are implemented
in FADBAD++ is possible. This way derivatives can be obtained in several
ways, making it possible to optimise with respect to the time and space used
in the process.
Examples of applications:
Forward automatic differentiation on a function f : R^{n}->
R^{n}, evaluated in interval arithmetic, using the BIAS/PROFIL package,
to obtain function values and derivatives. Used with the interval Newton
method to obtain guaranteed enclosures of all solutions to the non-linear
equation f(x)=0.
Forward-Backward automatic differentiation on a function f : R^{n}->R,
evaluated in interval arithmetic, using the BIAS/PROFIL package and the
backward method to obtain first order derivatives (the gradient) and the
forward method to differentiate the first order derivatives to obtain
the second order derivatives (the Hessian). Used with the interval Krawczyk
method to perform global optimisation obtaining a guaranteed enclosure of
the global minimum.
Numerical integration of a function f : RxR^{n}->R, using
a three-point-two-derivative formula. The Backward method has been used
to differentiate the Numerical integration program obtaining the n partial
derivatives of the integral with respect to the n parameters in the function.
Taylor expansion of the solution to an ordinary differential equation,
used to solve initial value problems. The Forward method has been used
to differentiate the initial value problem solver to obtain the solution
of the variational problem.
Taylor expansion of the solution to an ordinary differential equation
using interval arithmetic and using the Forward method to obtain derivatives
of the Taylor coefficients with respect to the point of expansion, which
are the values of the Taylor coefficients for the solution of the variational
problem. Used in a method that solves initial value problems with guaranteed
enclosures.
Crash course in using FADBAD++:
FADBAD++ works by overloading all elementary arithmetic operations to include
calculation of derivatives. To enable these overloaded operations you have
to change the arithmetic type (normally double) to the appropriate
AD-type. The AD-types are defined by the three templates F<> for forward
automatic differentiation, B<> for backward automatic differentiation
and T<> for Taylor expansion.
Forward automatic differentiation
Lets see what that looks like when using forward automatic differentiation.
Assume that we have a function, func, that we want to differentiate.
double func(const double& x, const double& y)
{
double z=sqrt(x);
return y*z+sin(z);
}
Do a search and replace on "double" with "F<double>". Our code now
looks like:
F<double> func(const F<double>& x, const F<double>& y)
{
F<double> z=sqrt(x);
return y*z+sin(z);
}
We have to include the file "fadiff.h" to define the F<> template.
Our function is now prepared for computing derivatives. Before we call
the function we have to specify the variables we want to differentiate with
respect to. After the call we obtain the function value and the derivatives.
This can be done with the following code:
F<double> x,y,f; // Declare variables x,y,f
x=1; // Initialize variable x
x.diff(0,2); // Differentiate with respect to x (index 0 of 2)
y=2; // Initialize variable y
y.diff(1,2); // Differentiate with respect to y (index 1 of 2)
f=func(x,y); // Evaluate function and derivatives
double fval=f.x(); // Value of function
double dfdx=f.d(0); // Value of df/dx (index 0 of 2)
double dfdy=f.d(1); // Value of df/dy (index 1 of 2)
cout << "f(x,y)=" << fval << endl;
cout << "df/dx(x,y)=" << dfdx << endl;
cout << "df/dy(x,y)=" << dfdy << endl;
Notice that the method diff(i,n) is
called on the independent variables before the function call.
Where n is the total number of independent variables
that we want to differentiate with respect to and i denotes the index
of the variable i=0...n-1. After
the function call the actual function value is obtained from the dependent
variable(s) with the x() method and the derivatives by the d(i) method,
where i=0...n-1 is the index of the independent variable.
Backward automatic differentiation
Now, lets wee what this looks like when using backward automatic differentiation.
Given the function func evaluated in doubles. We do a search and replace
on "double" with "B<double>". Now the code looks like:
B<double> func(const B<double>& x, const B<double>& y)
{
B<double> z=sqrt(x);
return y*z+sin(z);
}
The file "badiff.h" have to be included for defining the B<> template.
The code is now prepared for computing derivatives. Actually the function
diff will now "record" a directed acyclic graph (DAG) representing the function,
while computing the function value. After the function call we specify what
variable we want to differentiate. When this has been done on all dependent
variables we can obtain the derivatives. This is done with the following
code:
B<double> x,y,f; // Declare variables x,y,f
x=1; // Initialize variable x
y=2; // Initialize variable y
f=func(x,y); // Evaluate function and record DAG
f.diff(0,1); // Differentiate f (index 0 of 1)
double fval=f.x(); // Value of function
double dfdx=x.d(0); // Value of df/dx (index 0 of 1)
double dfdy=y.d(0); // Value of df/dy (index 0 of 1)
cout << "f(x,y)=" << fval << endl;
cout << "df/dx(x,y)=" << dfdx << endl;
cout << "df/dy(x,y)=" << dfdy << endl;
Notice that this time the method diff(i,m) is
called on the dependent variable(s) after the function-call.
Where m is the total number of dependent variables we want to differentiate
and i denotes the index of the variable i=0..m-1. The function value is obtained
by the x() method on the dependent variables, while the derivaties are obtained
by calling the d(i) on the independent variables, where i=0...m-1
is the index of the dependent variable.
The DAG is deallocated automatically when the derivatives are being computed
backwards through the DAG.
It is very important that
diff(i,m)
is called on all dependent
variables, i.e. variables that are dependent on the variables we want to differentiate
with respect to. If this is not done then the derivatives are not propagated correctly
into the independent variables, causing the derivatives to be wrong when obtaining them
by calling d(i).
A way to make a variable independent is to assign a constant to it.
E.g. f=0; in the above example. Another way to avoid temporary dependent variables
is to let them go out of scope by using {...} or by defining a functions that
encapsulates the evaluation and only returns the dependent variables of
interest.
Simple Taylor expansion
Lets asume we want to Taylor expand the function func with respect to the
variable x. We now do a search and replace on "double" with "T<double>"
and get:
T<double> func(const T<double>& x, const T<double>& y)
{
T<double> z=sqrt(x);
return y*z+sin(z);
}
Where the template T<> is defined in the file "tadiff.h". The function
will now "record" a directed acyclic graph (DAG) while computing the function
value (which is the 0'th order Taylor-coefficient). This DAG can then be
used to find the Taylor coefficients. This is done in the following code:
T<double> x,y,f; // Declare variables x,y,f
x=1; // Initialize variable x
y=2; // Initialize variable y
x[1]=1; // Taylor-expand wrt. x (dx/dx=1)
f=func(x,y); // Evaluate function and record DAG
double fval=f[0]; // Value of function
f.eval(10); // Taylor-expand f to degree 10
// f[0]...f[10] now contains the Taylor-coefficients.
cout << "f(x,y)=" << fval << endl;
for(int i=0;i<=10;i++)
{
double c=f[i];// The i'th taylor coefficient
cout << "(1/k!)*(d^"<<i<<"f/dx^" << i << ")=" << c << endl;
}
We manually initialise the 1'st order Taylor coefficient of x to the value
1 to denote the independent variable in which we want to expand the
function. We can now expand the function by calling the eval(k) method on
the dependent variable(s), where k is the order in which we want
to expand the function.
The DAG is not de-allocated automatically (like in the backward method)
when the Taylor coefficients have been computed by the eval(k) method. The
DAG will instead hold all Taylor coefficients of degree 0...k of all intermediate
variables for calculating the function. If we call eval(l), where l>k,
the Taylor coefficients that have already been calculated will be reused
and only the coefficients (k+1)..l will be calculated. This is useful when
Taylor expanding the solution to an ordinary differential equation, which
we will see later. The coefficients in the DAG can be deleted by calling
the method reset() on the dependent variable(s). This is done in the following
code:
f.reset(); // Reset the values in the DAG
x[0]=3; // New value for x
y[0]=4; // New value for y
y[1]=1; // Taylor-expand wrt. y (dy/dy=1)
f.eval(10); // Taylor-expand f to degree 10
// f[0]...f[10] now contains the Taylor-coefficients.
for(int i=0;i<=10;i++)
{
double c=f[i];// The i'th taylor coefficient
cout << "(1/k!)*(d^"<<i<<"f/dy^" << i << ")=" << c << endl;
}
After the coefficients in the DAG has been deleted we can re-initialise
the values of x and y (0'th order coefficients) and now specify that we want
to expand with respect to the variable y.
Finally the DAG will be de-allocated when the independent variables is
de-allocated.
Taylor expanding the solution of an ODE
The feature that the coefficients in the DAG is not de-allocated can be
used for Taylor expanding the solution of an ODE: x'=f(x). Assume that we
want to expand the solution of x'=cos(x). We can encapsulate the DAG of the
function in an object by defining a class:
class TODE
{
public:
T<double> x; // Independent variables
T<double> xp; // Dependent variables
TODE(){xp=cos(x);} // record DAG at construction
};
Objects of this class will have a DAG representing the ODE. The Taylor
coefficients can then be computed one order at a time with the following
code:
TODE ode; // Construct ODE:
ode.x[0]=1; // Set point of expansion:
for(int i=0;i<10;i++)
{
ode.xp.eval(i); // Evaluate i'th Taylor coefficient
ode.x[i+1]=ode.xp[i]/double(i+1);// Use dx/dt=ode(x).
}
// ode.x[0]...ode.x[10] now contains the Taylor-coefficients
// of the solution of the ODE.
// Print out the Taylor coefficients for the solution
// of the ODE:
for(int i=0;i<=10;i++)
{
cout << "x[" << i << "]=" << ode.x[i] << endl;
}
We see that the ODE links the i'th coefficient of the dependent variable
to the (i+1)'th coefficient of the independent variable. This dependency
is specified explicitly in the loop.
Combinations of automatic differentiation
One of the very unique things of FADBAD++ is the ability to compute high
order derivatives in a very flexible way by combining the methods of automatic
differentiation. These combinations are produced by applying the templates
on themselves. For example the type B< F< double > > can be used
in optimisation for computing first order derivatives by using the backward
method and second order derivatives by using a backward-forward method. The
combination T< F<double> > can be used to Taylor expand the solution
of an ODE, while computing derivatives of the coefficients with respect to
the point of expansion. The derivatives are then by definition the Taylor
coefficients to the solution of the variational equation. All these possibilities
are described in more detail in the documents which can be downloaded from
this site and in the examples that comes with the source code of FADBAD++.
Avoiding search/replace by using functors
One way to avoid searching and replacing of types when applying automatic
differentiation is to prepare the code for automatic differentiation. This
preparation can be done by using functors:
class Func
{
public:
template <class T>
T operator()(const T& x, const T& y)
{
T z=sqrt(x);
return y*z+sin(z);
}
};
Objects of class Func can now be used to evaluate the function with different
arithmetic types, which are determined at compile-time. We can even create
forward- and backward differentiating functors that takes a functor-class
as template arguments and differentiates it:
template <class C>
class FDiff
{
public:
template <class T>
T operator()(
T& o_dfdx, T& o_dfdy,
const T& i_x, const T& i_y)
{
F<T> x(i_x),y(i_y); // Initialize arguments
x.diff(0,2); // Differentiate wrt. x
y.diff(1,2); // Differentiate wrt. y
C func; // Instantiate functor
F<T> f(func(x,y)); // Evaluate function and record DAG
o_dfdx=f.d(0); // Value of df/dx
o_dfdy=f.d(1); // Value of df/dy
return f.x(); // Return function value
}
};
template <class C>
class BDiff
{
public:
template <class T>
T operator()(
T& o_dfdx, T& o_dfdy,
const T& i_x, const T& i_y)
{
B<T> x(i_x),y(i_y); // Initialize arguments
C func; // Instantiate functor
B<T> f(func(x,y)); // Evaluate function and record DAG
f.diff(0,1); // Differentiate
o_dfdx=x.d(0); // Value of df/dx
o_dfdy=y.d(0); // Value of df/dy
return f.x(); // Return function value
}
};
The differentiating functors are used in the following code:
double x,y,f,dfdx,dfdy; // Declare variables
x=1; // Initialize variable x
y=2; // Initialize variable y
FDiff<Func> FFunc; // Functor for function and derivatives
f=FFunc(dfdx,dfdy,x,y); // Evaluate function and derivatives
cout << "f(x,y)=" << f << endl;
cout << "df/dx(x,y)=" << dfdx << endl;
cout << "df/dy(x,y)=" << dfdy << endl;
BDiff<Func> BFunc; // Functor for function and derivatives
f=BFunc(dfdx,dfdy,x,y); // Evaluate function and derivatives
cout << "f(x,y)=" << f << endl;
cout << "df/dx(x,y)=" << dfdx << endl;
cout << "df/dy(x,y)=" << dfdy << endl;
It is possible to make completely generic differentiating functors that
differentiates any function f:R^{n}->R^{m}, by using STL
vectors as input and output variables. See the test-code in FADBAD++.
Using FADBAD++ with other types than double
It is possible to change the underlying arithmetic type for all function
and derivative calculations - also called the BaseType. Normally the underlying
arithmetic type is doubles, but automatic differentiation is also very useful
in the field of interval arithmetic. E.g. to obtain narrow enclosures by
using the mean value form or in global optimization.
The templates that are defined in FADBAD++ requre that some standard operations
are defined for the underlying type: comparison operators, simple operations
"+", "-" "*", "/" and simple functions "sin", "cos", "exp", etc.
One of the main ideas of FADBAD++ is the ability of changing the underlying
arithmetic type from double to e.g. intervals. This is done by defining BaseType
to the name of the underlying type, before including the FADBAD++ headerfiles.
If the functions for the BaseType is named in a non-standard way, e.g. "cos"
are named "Cos", then these names has to be redefined, before including the
FADBAD++ headerfiles. For example to use FADBAD++ with the interval type
that are provided by Profil/BIAS we have to include the FADBAD++ headerfiles
with the following code:
#define pow Power
#define sqr Sqr
#define exp Exp
#define log Log
#define sqrt Sqrt
#define sin Sin
#define cos Cos
#define tan Tan
#define asin ArcSin
#define acos ArcCos
#define atan ArcTan
#define BaseType INTERVAL
#include "tadiff.h"
#include "fadiff.h"
#undef BaseType
#undef pow
#undef exp
#undef log
#undef sqrt
#undef sin
#undef cos
#undef tan
#undef asin
#undef acos
#undef atan
It is possible to configure FADBAD++ even more. See the headerfile "fadbad.h"
for details.
Download source code and documentation for FADBAD++
( and FADBAD-TADIFF ):
Notice that the packages FADBAD-TADIFF are old packages, while FADBAD++
is the new package. The old packages require you to run a configure script
in a unix shell. FADBAD++ can be used right away - without any configuring.
FADBAD++ has been tested with GCC 3.2 on Linux, Visual C++ 6.0 and .NET
on Windows.
- Get the sourcecode for FADBAD++
(as zip).
- Get the sourcecode for FADBAD (as
zip) and
the technical
report (as pdf), "FADBAD,
a Flexible C++ Package for Automatic Differentiation, using the forward and
backward methods".
- Get the sourcecode for TADIFF (as
zip) and
the technical
report (as pdf), "TADIFF,
a Flexible C++ Package for Automatic Differentiation, using Taylor series
expansion".
Work related to FADBAD++ (FADBAD-TADIFF):
- Ole Stauning. "Flexible Automatic Differentiation using Templates
and Operator Overloading in C++". Talk presented at the Automatic Differentiation
Workshop at Shrivenham Campus, Cranfield University, June 6, 2003. Download
PDF.
- N. S. Nedialkov and K. R. Jackson. "The Design
and Implementation of an Object-Oriented Validated ODE Solver".
- N. S. Nedialkov. VNODE,
A Validated Solver for Initial Value Problems for Ordinary Differential Equations.
- Ole Stauning, "ADIODES: Automatic Differentiation Interval Ordinary
Differential Equation Solver". The C++ program package for enclosing solutions
of ordinary differential equations which is descibed in the above Ph.D.-thesis.
Download
sourcecode (as zip).
- Ole Stauning, "ADIODES, a Self-Validating ODE Solver". Slides
for a talk presented at the SIAM Annual Meeting, University of Toronto,
July 15, 1998. Download PS, PDF.
- Ole
Stauning, "Automatic Validation of Numerical Solutions". Ph.D.-thesis,
September 1997. Contains some applications of FADBAD-TADIFF. Download PS,
PDF.
- Ole Stauning. "Introduction to FADBAD, a C++ Program package for
Automatic Differentiation". Slides for a talk presented in the course "C0202
-- Topics in Numerical Analysis", March 11, 1997, at the Technical University
of Denmark. Download PS, PDF.
- Ole Stauning. "Automatic Differentiation in Theory and Practice".
Slides for a talk presented Oct 24, 1996, at the Technical University of
Denmark. Download PS, PDF.
- Ole Stauning. "Obtaining 2.nd Order Derivatives Using a Mixed Forward-
and Backward Automatic Differentiation Strategy for use in Interval Optimization".
Poster presented at INTERVAL'96 Sep 30 - Oct 2, 1996 Würzburg University,
Germany. Download PS, PDF.
Other Automatic differentiation Sites:
You were visitor number
since '97. This page was last modified by Ole Stauning, Feb. 3, 2005. The counter is sponsored by
www.digits.com.