The Automatic (Stochastic) Gradient Method (AutoSGM) is a framework for stochastic gradient learning that unifies the gradient-based methods used in deep learning, such as Polyak's Heavy Ball (PHB), Nesterov's Accelerated Gradient (NAG), and Adaptive Moment Estimation (Adam).
Learning is seen as an interconnection of a gradient-generating system, such as an artificial neural network (a well-defined differentiable function), with the SGM learning system or control function.
This suggests that there is only one (stochastic) gradient method (SGM), with different approaches or metrics for setting up the learning rate and for smoothing (lowpass filtering) the gradient.
This repo. contains implementation(s) of AutoSGM. Its expected input is the gradient from the gradient-generating system, and its output is the learned parameter (weight). It consists of:

- a time-integration component $\mathbb{I}_{t, \alpha_t}$, controlled by a proportional learning-rate parameter $\alpha_t$;
- a lowpass smoothing component $\mathbb{E}_{t, \beta}$ regularizing the gradient-generating system, with a lowpass parameter $\beta$, applied at the input (where $\beta := \beta_i$) and at the output (where $\beta := \beta_o$).
The framework explains the acceleration observed in SGM as a consequence of lowpass smoothing. It leads to many possible implementations and makes sense of the many variants seen in the deep learning literature today.
It also allows one to derive an optimal choice of learning rate; Adam can be seen as one approximation of this optimal choice (via normalized gradients).
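For intuition only, here is a minimal sketch of this structure for a single parameter tensor, assuming a first-order lowpass (exponential moving average) for $\mathbb{E}_{t, \beta}$ and a simple scaled integration for $\mathbb{I}_{t, \alpha_t}$; the names and details below are illustrative, not the repository's exact update rule.

```python
import torch

# Illustrative AutoSGM-style step (hypothetical helper, not the repo's API):
# lowpass-smooth the gradient at the input, integrate it with a proportional
# learning rate, then lowpass-smooth the weight at the output.
def sketch_step(w, grad, state, lr=1e-4, beta_i=0.9, beta_o=0.1):
    g_s = state.get("g_smooth", torch.zeros_like(grad))
    g_s = beta_i * g_s + (1 - beta_i) * grad      # input smoothing, parameter beta_i
    w_new = w - lr * g_s                          # time-integration scaled by the learning rate
    w_s = state.get("w_smooth", w)
    w_s = beta_o * w_s + (1 - beta_o) * w_new     # output smoothing, parameter beta_o
    state["g_smooth"], state["w_smooth"] = g_s, w_s
    return w_s, state
```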
Code is entirely in Python, using PyTorch.
Download or clone locally with `git`:

```
git clone https://github.com/somefunagba/autosgm.git
```

Assume this repository was directly git cloned to the root path of your project.
```python
from opts.autosgml import AutoSGM
```

This loads an AutoSGM implementation.
Some examples from the PyTorch Examples repository have been added as demos; see the `cases` folder.
Possible options are documented in `opts/autosgml`. Most of the defaults likely need not be changed.
Given a neural network model `mdl` constructed with PyTorch, the following examples illustrate how the model's parameters, `mdl.parameters()`, may be optimized (learnt) with this AutoSGM implementation.

By default, this implementation auto-tunes an initial learning rate iteratively, which in the code snippet below has been set as `lr_init=1e-4`.
```python
optimizer = AutoSGM(mdl.parameters(), lr_init=1e-4)
```
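Assuming the returned optimizer follows the standard `torch.optim.Optimizer` interface (as the constructor call above suggests), it plugs into the usual PyTorch training loop; `loader` and `loss_fn` below are placeholders for your data loader and loss function:

```python
for data, target in loader:
    optimizer.zero_grad()                 # clear accumulated gradients
    loss = loss_fn(mdl(data), target)     # forward pass through the gradient-generating system
    loss.backward()                       # backpropagate to produce gradients
    optimizer.step()                      # one AutoSGM learning step
```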
To use only moment estimation (a normalized gradient) in tuning the learning rate for all iterations, the code snippet below uses a single constant value `lr_init=3e-4`.
```python
optimizer = AutoSGM(mdl.parameters(), autolr=False, lr_init=3e-4)
```
The code snippet below disables any optimal learning-rate estimation and uses a single constant learning rate, `lr_init=5e-4`.
```python
optimizer = AutoSGM(mdl.parameters(), lr_init=5e-4, autolr=None)
```
Apart from the initial learning rate, the other important parameters to configure are the three main lowpass (often called momentum) parameters in `beta_cfg`. The first two are for iteratively smoothing the gradient input and smoothing the weight output, respectively. The third is for estimating the gradient's variance (moment), which also adapts the learning rate.
By *smoothing*, we mean the lowpass filter is used to carefully filter high-frequency noise components from its input signal. By *averaging*, we mean the lowpass filter is used to estimate a statistical expectation. Note that when using a first-order lowpass filter, the lowpass parameter is often less than or equal to 0.9 for smoothing, but often greater than 0.9 for averaging.
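As a quick illustration of this distinction (not code from the repository), a first-order lowpass filter is just an exponential moving average, and the lowpass parameter controls how much history it keeps:

```python
# y_t = beta * y_{t-1} + (1 - beta) * x_t
# beta <= 0.9 : smoothing (follows the input, removes high-frequency noise)
# beta  > 0.9 : averaging (approximates a statistical expectation of the input)
def lowpass(xs, beta):
    y, out = 0.0, []
    for x in xs:
        y = beta * y + (1 - beta) * x
        out.append(y)
    return out
```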
By default, the values in `beta_cfg` are sensible theoretical values, which may be changed depending on what works in practice and a feel for the nature of the learning system (neural network). Annotations on the available options are documented in `opts/autosgml`.
```python
optimizer = AutoSGM(mdl.parameters(), lr_init=1e-4, beta_cfg=(0.9,0.1,0.999,0))
```
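For reference, the same call written out with the roles of the first three `beta_cfg` entries (as described above) annotated as comments; the meaning of the fourth entry is documented in `opts/autosgml` and is not restated here:

```python
optimizer = AutoSGM(
    mdl.parameters(),
    lr_init=1e-4,
    beta_cfg=(
        0.9,    # lowpass parameter: smoothing the gradient input
        0.1,    # lowpass parameter: smoothing the weight output
        0.999,  # lowpass parameter: estimating the gradient's variance/moment
        0,      # fourth entry: see the annotations in opts/autosgml
    ),
)
```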
The code and style in this repository are still undergoing active development as part of my PhD work. Feel free to raise an issue if you detect any bug or have any questions.