
Commit e8f0cd1

Introduce PIC approximation

This commit implements the Partially Independent Conditional sparse GP approximation introduced in https://proceedings.mlr.press/v2/snelson07a/snelson07a.pdf. Although I'm not aware of any outstanding bugs, this approximation remains somewhat experimental at the time of writing.

1 parent f6b8ab1 commit e8f0cd1

File tree: 12 files changed (+1916, -71 lines)

BUILD.bazel (+4)

@@ -48,6 +48,7 @@ cc_library(
 "include/albatross/serialize/Ransac",
 "include/albatross/src/cereal/ThreadPool.hpp",
 "include/albatross/src/cereal/block_utils.hpp",
+"include/albatross/src/cereal/block_diagonal.hpp",
 "include/albatross/src/cereal/dataset.hpp",
 "include/albatross/src/cereal/distribution.hpp",
 "include/albatross/src/cereal/eigen.hpp",
@@ -127,8 +128,10 @@ cc_library(
 "include/albatross/src/models/gp.hpp",
 "include/albatross/src/models/least_squares.hpp",
 "include/albatross/src/models/null_model.hpp",
+"include/albatross/src/models/pic_gp.hpp",
 "include/albatross/src/models/ransac.hpp",
 "include/albatross/src/models/ransac_gp.hpp",
+"include/albatross/src/models/sparse_common.hpp",
 "include/albatross/src/models/sparse_gp.hpp",
 "include/albatross/src/samplers/callbacks.hpp",
 "include/albatross/src/samplers/ensemble.hpp",
@@ -228,6 +231,7 @@ swift_cc_test(
 "tests/test_models.cc",
 "tests/test_models.h",
 "tests/test_parameter_handling_mixin.cc",
+"tests/test_pic_gp.cc",
 "tests/test_prediction.cc",
 "tests/test_radial.cc",
 "tests/test_random_utils.cc",

doc/src/crappy-pic.rst (+164, new file)

#################################################
PIC math scratchpad
#################################################

.. _crappy-pic:

-------------------------------------------------------------------
Relationship between :math:`\sigma_\cancel{B}` and :math:`\sigma_f`
-------------------------------------------------------------------

It seems like what we need first of all is some way to get the PITC posterior variance based only on :math:`\cancel{B}`, using only the full PITC posterior and calculations that scale only with :math:`|B|`.  I'm not totally finished working out how :math:`(\sigma^2_{*})^{\cancel{B} \cancel{B}}` fits into the whole PIC posterior :math:`(\sigma^2_*)^{PIC}`.

Observe that since :math:`\mathbf{\Lambda}` is a block-diagonal matrix, its inverse is too, so when it sits in the middle of a quadratic form the cross terms disappear.  As far as I can tell, this is the only place we can additively separate :math:`\mathbf{Q}_{\cancel{B} \cancel{B}}` and :math:`\mathbf{K}_{B B}` without explicit cross terms.  Start with the usual application of Woodbury's lemma to the PITC posterior covariance:

.. math::

   \newcommand{\Lam}{\mathbf{\Lambda}}
   \newcommand{\Kuu}{\mathbf{K}_{uu}}
   \newcommand{\Kuf}{\mathbf{K}_{uf}}
   \newcommand{\Kfu}{\mathbf{K}_{fu}}
   (\sigma_*^2) &= K_* - \mathbf{Q}_{* f} \left(\mathbf{K}_{fu} \mathbf{K}_{uu}^{-1} \mathbf{K}_{uf} + \mathbf{\Lambda} \right)^{-1} \mathbf{Q}_{f *} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \Kuf \left(\Kfu \Kuu^{-1} \Kuf + \Lam\right)^{-1} \Kfu \Kuu^{-1} \mathbf{K}_{u*} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \Kuf \Lam^{-1} \Kfu \right)^{-1} \mathbf{K}_{u*}
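
As a sanity check -- this is just a NumPy sketch with made-up inputs (the kernel, the input locations, the noise level, and the block layout are all invented for illustration), not the albatross implementation -- the first and last lines above can be compared numerically:

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   def se_kernel(a, b):
       # squared-exponential kernel on 1-d inputs (arbitrary choice)
       return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

   x_u = np.linspace(0.0, 10.0, 8)             # M = 8 inducing inputs
   x_f = np.sort(rng.uniform(0.0, 10.0, 20))   # N = 20 training inputs
   x_s = np.array([2.5, 7.1])                  # test inputs
   blocks = np.array_split(np.arange(20), 4)   # PITC grouping of the training set

   K_uu = se_kernel(x_u, x_u) + 1e-8 * np.eye(8)
   K_uf = se_kernel(x_u, x_f)
   K_ff = se_kernel(x_f, x_f)
   K_su = se_kernel(x_s, x_u)
   K_ss = se_kernel(x_s, x_s)

   # Lambda: block diagonal of (K_ff - Q_ff) plus measurement noise
   Q_ff = K_uf.T @ np.linalg.solve(K_uu, K_uf)
   Lam = 1e-2 * np.eye(20)
   for b in blocks:
       Lam[np.ix_(b, b)] += (K_ff - Q_ff)[np.ix_(b, b)]

   Q_sf = K_su @ np.linalg.solve(K_uu, K_uf)

   # First line above: the PITC posterior covariance, written directly
   direct = K_ss - Q_sf @ np.linalg.solve(Q_ff + Lam, Q_sf.T)

   # Last line above: the same thing after Woodbury's lemma
   A = K_uu + K_uf @ np.linalg.solve(Lam, K_uf.T)
   woodbury = (K_ss - K_su @ np.linalg.solve(K_uu, K_su.T)
               + K_su @ np.linalg.solve(A, K_su.T))

   assert np.allclose(direct, woodbury)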

Now break up :math:`\Kuf` and :math:`\Lam`:

.. math::

   \Kuf &= \begin{bmatrix} \mathbf{K}_{u \cancel{B}} & \mathbf{K}_{u B} \end{bmatrix} \\
   \Lam &= \begin{bmatrix} \Lam_{\cancel{B}} & \mathbf{0} \\ \mathbf{0} & \Lam_{B} \end{bmatrix}

so that

.. math::

   \Kuf \Lam^{-1} \Kfu &= \begin{bmatrix} \mathbf{K}_{u \cancel{B}} & \mathbf{K}_{u B} \end{bmatrix}
   \begin{bmatrix} \Lam_{\cancel{B}}^{-1} & \mathbf{0} \\ \mathbf{0} & \Lam_{B}^{-1} \end{bmatrix}
   \begin{bmatrix} \mathbf{K}_{\cancel{B} u} \\ \mathbf{K}_{B u} \end{bmatrix} \\
   &= \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} +
   \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u} \\
   \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} &= \Kuf \Lam^{-1} \Kfu - \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u}

This last line gives the term inside the inverse of the training-data-dependent part of the PITC posterior covariance using only the training points in :math:`\cancel{B}`, which I think is what we want to combine with the dense posterior from the training points in :math:`B`:

.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} = K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} \right)^{-1} \mathbf{K}_{u*}

which we can write in terms of only the full PITC prior and the prior for group :math:`B`:

.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} = K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \Kuf \Lam^{-1} \Kfu - \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u} \right)^{-1} \mathbf{K}_{u*}
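
Continuing the NumPy sketch above (still just an illustrative check; ``B`` is arbitrarily taken to be the first block), this expression can be verified by comparing it against the PITC posterior covariance computed directly from the training points outside :math:`B`:

.. code-block:: python

   B = blocks[0]                           # arbitrarily treat the first block as B
   notB = np.setdiff1d(np.arange(20), B)

   # PITC posterior covariance computed directly from the points outside B
   A_notB_direct = K_uu + K_uf[:, notB] @ np.linalg.solve(
       Lam[np.ix_(notB, notB)], K_uf[:, notB].T)
   direct_notB = (K_ss - K_su @ np.linalg.solve(K_uu, K_su.T)
                  + K_su @ np.linalg.solve(A_notB_direct, K_su.T))

   # The same thing via the full-data term minus the group-B term, as above
   A_notB = A - K_uf[:, B] @ np.linalg.solve(Lam[np.ix_(B, B)], K_uf[:, B].T)
   subtracted = (K_ss - K_su @ np.linalg.solve(K_uu, K_su.T)
                 + K_su @ np.linalg.solve(A_notB, K_su.T))

   assert np.allclose(direct_notB, subtracted)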

If you set :math:`B = \emptyset`, removing the correction for the training points in block :math:`B` entirely, you recover the original PITC posterior covariance based on the full set of training points :math:`f`, as expected.  Now we only have to compute :math:`\Kuf \Lam^{-1} \Kfu` (the term that scales with the full training set :math:`f`) once, and nothing that scales with :math:`|\cancel{B}|`.  This still involves an :math:`O(M^3)` decomposition for each group :math:`B`; since I have been assuming our group size :math:`|B|` is significantly smaller than the number of inducing points :math:`M`, that's not great.  But an interesting thing happens if we use the "downdate" form of Woodbury's lemma in the opposite direction to the usual way:

.. math::

   \left(A - B C^{-1} B^T \right)^{-1} &= A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}

Letting :math:`\mathbf{A} = \Kuu + \Kuf \Lam^{-1} \Kfu` (playing the part of :math:`A` in Woodbury's lemma):

.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \mathbf{A}^{-1} + \mathbf{A}^{-1} \mathbf{K}_{u B} \left(\Lam_{B} - \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u B}\right)^{-1} \mathbf{K}_{B u} \mathbf{A}^{-1} \right) \mathbf{K}_{u*} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \mathbf{A}^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \mathbf{A}^{-1} \mathbf{K}_{u B} \left(\Lam_{B} - \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u B}\right)^{-1} \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u*}

I find the use of Woodbury's lemma to downdate a bit uncomfortable -- does this work with real symmetric matrices?  But we would only need to decompose :math:`\mathbf{A}` (and also :math:`\Kuu`) at a cost of :math:`O(M^3)` once; the inverse to be calculated per group costs only :math:`O(|B|^3)`.  I also don't immediately spot a slick QR decomposition way to compute this per-group term; it's pretty symmetric.  The only thing that jumps out is precomputing :math:`\mathbf{K}_{B u} \mathbf{A}^{-\frac{1}{2}}`.
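
To make the cost argument concrete, here is a continuation of the same sketch: handle :math:`\mathbf{A}` once, after which the per-group correction only requires solving a :math:`|B| \times |B|` system (the explicit ``np.linalg.inv`` calls are just to keep the check short; a factorization would do):

.. code-block:: python

   A_inv = np.linalg.inv(A)        # invert (or factor) A once, O(M^3)
   K_uB = K_uf[:, B]
   Lam_B = Lam[np.ix_(B, B)]

   # Per-group downdate: only a |B| x |B| matrix to invert
   small = Lam_B - K_uB.T @ A_inv @ K_uB
   downdated = A_inv + A_inv @ K_uB @ np.linalg.solve(small, K_uB.T @ A_inv)

   # Matches the directly computed inverse with group B removed ...
   assert np.allclose(downdated, np.linalg.inv(A_notB))

   # ... and hence gives the same posterior covariance as before
   downdate_form = (K_ss - K_su @ np.linalg.solve(K_uu, K_su.T)
                    + K_su @ downdated @ K_su.T)
   assert np.allclose(downdate_form, direct_notB)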

----------------------------------------------
Derivation of Woodbury's Lemma for Differences
----------------------------------------------

This is specifically for symmetric matrices, to convince myself the downdate formula is legit.  We would like to compute a rank-:math:`k` downdate to the inverse of a matrix :math:`A`:

.. math::

   \left(A - B C^{-1} B^T \right)^{-1} &= A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}

To be extra clear, let's say :math:`A \in \mathbb{R}^{n \times n}`, :math:`B \in \mathbb{R}^{n \times k}`, and :math:`C \in \mathbb{R}^{k \times k}`.  We put the relevant matrices into a block matrix :math:`M \in \mathbb{R}^{m \times m}` for :math:`m = n + k`, and label the conforming blocks of its inverse:

.. math::

   M &= \begin{pmatrix} A & B \\ B^T & C \end{pmatrix} \\
   M^{-1} &= \begin{pmatrix} W & X \\ X^T & Y \end{pmatrix} \\
   M M^{-1} &= \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix} \\
   M^{-1} M &= \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix}

The blocks of :math:`M M^{-1}` yield the following equations:

.. math::

   AW + BX^T &= \mathbf{I} \\
   AX + BY &= \mathbf{0} \\
   B^T W + CX^T &= \mathbf{0} \\
   B^T X + CY &= \mathbf{I}

Rearrange the middle two:

.. math::

   X &= -A^{-1} B Y \\
   X^T &= -C^{-1} B^T W

If we do the same for :math:`M^{-1} M`:

.. math::

   X &= -W B C^{-1} \\
   X^T &= -Y B^T A^{-1}

These blocks are equal:

.. math::

   C^{-1} B^T W &= Y B^T A^{-1} \\
   A^{-1} B Y &= W B C^{-1}

Now plug the rearranged middle two equations of :math:`M M^{-1}` back into its first and last equations, and left-multiply by :math:`A^{-1}` and :math:`C^{-1}` respectively:

.. math::

   W - A^{-1} B C^{-1} B^T W &= A^{-1} \\
   \left(I - A^{-1} B C^{-1} B^T\right) W &= A^{-1} \\
   -C^{-1} B^T A^{-1} B Y + Y &= C^{-1} \\
   \left(I - C^{-1} B^T A^{-1} B\right) Y &= C^{-1}

Assuming :math:`\left(I - A^{-1} B C^{-1} B^T\right)` and :math:`\left(I - C^{-1} B^T A^{-1} B\right)` are invertible, rearrange:

.. math::

   W &= \left(I - A^{-1} B C^{-1} B^T\right)^{-1} A^{-1} \\
   &= \left(A - B C^{-1} B^T\right)^{-1} \\
   Y &= \left(I - C^{-1} B^T A^{-1} B\right)^{-1} C^{-1} \\
   &= \left(C - B^T A^{-1} B\right)^{-1}
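
A quick numerical spot check of this block identification, using arbitrary random symmetric positive definite :math:`A` and :math:`C` (these have nothing to do with the GP matrices in the sketch above):

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(1)
   n, k = 6, 3
   A = rng.standard_normal((n, n))
   A = A @ A.T + n * np.eye(n)            # symmetric positive definite
   C = rng.standard_normal((k, k))
   C = C @ C.T + k * np.eye(k)            # symmetric positive definite
   B = 0.5 * rng.standard_normal((n, k))  # small, so the downdates stay invertible

   M = np.block([[A, B], [B.T, C]])
   M_inv = np.linalg.inv(M)
   W, Y = M_inv[:n, :n], M_inv[n:, n:]

   assert np.allclose(W, np.linalg.inv(A - B @ np.linalg.solve(C, B.T)))
   assert np.allclose(Y, np.linalg.inv(C - B.T @ np.linalg.solve(A, B)))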

Now use equality of the off-diagonal blocks from the two ways to multiply :math:`M` and :math:`M^{-1}` above (don't you wish Sphinx would number equations?) and substitute:

.. math::

   C^{-1} B^T \left(A - B C^{-1} B^T\right)^{-1} &= \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T\right)^{-1} B C^{-1} &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1}

Let's look just at the term :math:`\left(A - B C^{-1} B^T\right)` and right-multiply it by :math:`-A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) &= -\mathbf{I} + B C^{-1} B^T A^{-1} \\
   \mathbf{I} + \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) &= B C^{-1} B^T A^{-1}

Now return to the second of the off-diagonal equations above and right-multiply it by :math:`B^T A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right)^{-1} B C^{-1} B^T A^{-1} &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}

Substitute the previous result for :math:`B C^{-1} B^T A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right)^{-1} \left( \mathbf{I} + \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) \right) &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T\right)^{-1} - A^{-1} &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T \right)^{-1} &= A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \blacksquare

which is the difference form of Woodbury's lemma, or a rank-:math:`k` downdate of :math:`A^{-1}`.  The main assumption seems to be that :math:`\left(I - A^{-1} B C^{-1} B^T\right)` and :math:`\left(I - C^{-1} B^T A^{-1} B\right)` are invertible.
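
A numerical check of the difference form itself, reusing the random symmetric matrices from the previous snippet, at least reassures me that it does behave for real symmetric matrices as long as the downdated matrix stays invertible:

.. code-block:: python

   # Difference (downdate) form of Woodbury's lemma, with A, B, C from above
   lhs = np.linalg.inv(A - B @ np.linalg.solve(C, B.T))
   rhs = (np.linalg.inv(A)
          + np.linalg.inv(A) @ B
          @ np.linalg.inv(C - B.T @ np.linalg.solve(A, B))
          @ B.T @ np.linalg.inv(A))
   assert np.allclose(lhs, rhs)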

------------------------
Blockwise inversion of S
------------------------

I tried to invert :math:`\mathbf{S}` blockwise; it didn't work out, but I'm leaving it in here just to look back at:

.. math::

   \newcommand{\VV}{\mathbf{V}}
   \mathbf{U} &= \mathbf{Q}_{* \cancel{B}} \mathbf{S}^{-1}_{\cancel{B} B} \VV_{B *} \\
   &= \mathbf{Q}_{* \cancel{B}} \left( \mathbf{Q}_{\cancel{B} \cancel{B}}^{-1} \mathbf{Q}_{\cancel{B} B} \left(\mathbf{Q}_{B B} - \mathbf{Q}_{B \cancel{B}} \mathbf{Q}_{\cancel{B} \cancel{B}}^{-1} \mathbf{Q}_{\cancel{B} B}\right)^{-1} \right) \VV_{B *}

How are we going to avoid the :math:`O(|\cancel{B}|^3)` inversion for :math:`\mathbf{Q}_{\cancel{B} \cancel{B}}`?  Also, we are going to get cross terms involving both :math:`\mathbf{Q}_{* \cancel{B}}` and :math:`\mathbf{Q}_{B B}`, but maybe those are OK because they are only :math:`O(|\cancel{B}| |B|)`?

doc/src/index.rst (+2)

@@ -97,6 +97,8 @@ and plot the results (though this'll require a numerical python environment),
 custom-models
 references
 
+crappy-pic
+
 
 #########
 Credit
