Automatic Differentiation (AD) is a computational technique that efficiently calculates exact derivatives of functions implemented as computer programs. Unlike numerical differentiation, which approximates derivatives using finite differences, or symbolic differentiation, which manipulates mathematical expressions, AD breaks complex functions down into elementary operations with known derivatives and applies the chain rule systematically. This enables accurate gradient computation for machine learning, optimization, and scientific computing while avoiding the numerical errors of finite differences and the expression explosion of symbolic methods.
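To make the chain-rule idea concrete, here is a minimal, self-contained sketch using forward-mode dual numbers. It is illustrative only and not part of this library (which instead builds a computation graph and runs a backward pass, i.e. reverse mode):

```cpp
// Minimal forward-mode AD: each value carries its derivative, and every
// elementary operation applies the corresponding differentiation rule.
#include <cmath>
#include <iostream>

struct Dual {
    double val;  // function value
    double der;  // derivative with respect to the chosen input
};

Dual operator*(Dual a, Dual b) {   // product rule
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

Dual sin(Dual a) {                 // chain rule: (sin u)' = cos(u) * u'
    return {std::sin(a.val), std::cos(a.val) * a.der};
}

int main() {
    Dual x{2.0, 1.0};              // seed dx/dx = 1
    Dual f = x * sin(x);           // f(x) = x * sin(x)
    std::cout << "f(2) = " << f.val
              << ", f'(2) = " << f.der  // exact: sin(2) + 2*cos(2)
              << '\n';
}
```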
The derivative is computed in four steps:
- Apply recursive-descent parsing to construct an abstract syntax tree (AST) from the input string.
- Convert the AST into a computation graph.
- Perform a forward pass through the computation graph with the given inputs, depositing intermediate values in their respective nodes.
- Perform a backward pass through the computation graph, accumulating local gradients via the chain rule (a simplified sketch of the forward and backward passes follows this list).
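As a rough illustration of steps 3 and 4 (with the graph built by hand rather than parsed), a node can store a value on the forward pass and accumulate gradients on the backward pass. The types below are a hypothetical simplification, not the library's actual internals:

```cpp
// Sketch of reverse-mode AD over a tiny hand-built graph: y = a * b + c.
// Hypothetical simplified structure; the real ComputationGraph builds its
// graph from the parsed AST instead of by hand.
#include <iostream>

struct Node {
    double value = 0.0;  // deposited by the forward pass
    double grad = 0.0;   // accumulated by the backward pass
};

int main() {
    Node a{2.0}, b{3.0}, c{1.0}, mul, add;

    // Forward pass in topological order, storing intermediate values.
    mul.value = a.value * b.value;    // mul = a * b
    add.value = mul.value + c.value;  // add = mul + c

    // Backward pass in reverse order: seed dy/dy = 1, then apply the
    // chain rule using each operation's local gradients.
    add.grad = 1.0;
    mul.grad += add.grad * 1.0;      // d(add)/d(mul) = 1
    c.grad   += add.grad * 1.0;      // d(add)/d(c)   = 1
    a.grad   += mul.grad * b.value;  // d(mul)/d(a)   = b
    b.grad   += mul.grad * a.value;  // d(mul)/d(b)   = a

    std::cout << "y = " << add.value       // 7
              << ", dy/da = " << a.grad    // 3
              << ", dy/db = " << b.grad    // 2
              << ", dy/dc = " << c.grad    // 1
              << '\n';
}
```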
*Figure: the computation graph for the example expression below, annotated with its forward and backward passes (image linked in the original README).*
```cpp
// Include dependencies
#include "ComputationGraph.h"
#include <iostream>

int main() {
    // Expression to analyze
    ComputationGraph cg("sigmoid(w1 * x1 + w2 * x2 + b)");

    // Initialize parameters
    cg.setValue("w1", 0.3);
    cg.setValue("w2", -0.12);
    cg.setValue("b", 0.2);

    // Initialize inputs
    cg.setValue("x1", 0.5);
    cg.setValue("x2", 0.8);

    // Forward pass; compute the value of the expression
    double output = cg.forward();
    std::cout << "Output: " << output << '\n';

    // Backward pass; compute the gradient of the output with respect
    // to each parameter
    double grad_w1 = cg.backward("w1");
    double grad_w2 = cg.backward("w2");
    double grad_b = cg.backward("b");
    std::cout << "Grad_w1: " << grad_w1 << '\n';
    std::cout << "Grad_w2: " << grad_w2 << '\n';
    std::cout << "Grad_b: " << grad_b << '\n';

    return 0;
}
```
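As a sanity check against the analytic gradient: with z = w1·x1 + w2·x2 + b = 0.254, sigmoid(z) ≈ 0.563 and sigmoid'(z) = sigmoid(z)(1 − sigmoid(z)) ≈ 0.246. Assuming `backward("p")` returns the partial derivative of the output with respect to `p`, the program should print grad_w1 ≈ 0.246 · 0.5 ≈ 0.123, grad_w2 ≈ 0.246 · 0.8 ≈ 0.197, and grad_b ≈ 0.246.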
To compile and run the examples:
```bash
g++ Examples.cpp ComputationGraph.cpp Parser.cpp -o examples
./examples
```
- Example 1 (machine learning flavored): a simple binary classifier on a single feature.
- Example 2 (physics flavored): kinetic energy of a point mass in 2D motion; see the sketch after this list.
- Example 3 (statistics flavored): the probability density function of a Normal random variable with mean 1 and variance 1.
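For instance, Example 2 might look like the following, using the same API shown above. This is a sketch: the actual Examples.cpp may differ, and the expression assumes the parser supports the operators shown earlier:

```cpp
// Kinetic energy of a point mass in 2D: KE = 0.5 * m * (vx^2 + vy^2).
// Sketch only; the real Examples.cpp may be written differently.
#include "ComputationGraph.h"
#include <iostream>

int main() {
    ComputationGraph cg("0.5 * m * (vx * vx + vy * vy)");
    cg.setValue("m", 2.0);   // mass in kg
    cg.setValue("vx", 3.0);  // velocity components in m/s
    cg.setValue("vy", 4.0);

    std::cout << "KE: " << cg.forward() << '\n';            // 25 J
    std::cout << "dKE/dvx: " << cg.backward("vx") << '\n';  // m * vx = 6
    std::cout << "dKE/dvy: " << cg.backward("vy") << '\n';  // m * vy = 8
    return 0;
}
```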
I plan on implementing support for multi-dimensional array/tensor operations, additional built-in composite function nodes, and optimizers. These changes will make the project a full-fledged deep learning framework.
This work was inspired by the course material discussed in CS8803: Differential & Probabilistic Programming at Georgia Institute of Technology ❤️.