fix the memory bloat of a saved emulator #25

Open
abigailsnyder opened this issue Mar 22, 2019 · 6 comments

@abigailsnyder
Contributor

Option 1:
Use something other than approxfun to characterize the empirical CDF for each grid cell with fewer variables in R/normalizeresiduals.R (#14)

Option 2:
If Option 1 doesn't bring down the size of a saved emulator by enough, cry and think of something else

@rplzzz
Contributor

rplzzz commented Nov 9, 2019

I've been thinking about this today, and I'm adding an option to downsample the points in the ECDF. For the drought experiment runs we have something like 500 samples in each grid cell, and that's a lot more than we really need. I think we could easily reduce that by a factor of 10 and still be getting good fidelity. This can be pretty easily implemented by adding a decimation function right before we construct the interpolator.
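
A minimal sketch of the decimation step (the helper name and default are illustrative, not the actual implementation):

```r
## Thin the sorted residuals to roughly npts points before building the
## interpolator; seq() keeps both endpoints, so the support is unchanged.
decimate <- function(x, npts = 50) {
    x <- sort(x)
    keep <- unique(round(seq(1, length(x), length.out = npts)))
    x[keep]
}
```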

Option 1A (before we resort to Option 2) would be to stop storing the closures produced by approxfun, and instead just store the table of x and y values for the interpolation function. This is a win for us in two ways. First, there is bound to be some memory overhead associated with storing the closures, and it's unnecessary because all of those closures are just the same function with different parameters. Second, we actually end up storing the entire table twice, once for the CDF and once for the quantile function. We could save a factor of 2 on the memory usage by storing the table just once and passing its columns in the opposite order to a single linear interpolation function.

Now, the closures returned by approxfun are actually implemented with a call to stats:::.approxfun, which does exactly what we want, but I'm a little hesitant to use an unexported function from another package. Still, it wouldn't be hard to whip up our own implementation. .approxfun is implemented in C, which should make it pretty fast, but we can go with a simpler implementation and then rewrite in Rcpp if it looks like the lower performance is costing us too much.
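
For what it's worth, a pure-R stand-in might look like this (a sketch, assuming the knot vector x is sorted and that we want approxfun's rule = 2 behavior of clamping at the endpoints):

```r
## Piecewise-linear interpolation over stored knot vectors (x, y),
## evaluated at query points v. Because both the CDF and the quantile
## function are monotone, clamping the output to the range of y mimics
## rule = 2 outside the data range.
interp_linear <- function(x, y, v) {
    i <- findInterval(v, x, all.inside = TRUE)  # left knot for each query
    frac <- (v - x[i]) / (x[i + 1] - x[i])
    pmin(pmax(y[i] + frac * (y[i + 1] - y[i]), min(y)), max(y))
}
```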

@rplzzz
Contributor

rplzzz commented Nov 9, 2019

So, as a quick experiment, I took a full grid of residuals and compared the decimated (by a factor of 19 -- leaving about 50 samples per grid cell) empirical CDF and quantile functions to the fully-sampled version. I computed the mean and maximum differences in both output probability and quantile values for each grid cell. (Quantile differences were normalized by the mean absolute value of the input values; CDF probabilities are already on a scale from 0-1.) Then I took the mean and the max of all four indicators over the entire grid. Here's what I came up with:

```
> apply(materr, 1, mean)
  pmeandiff   qmeandiff    pmaxdiff    qmaxdiff 
0.002130431 0.002671451 0.010091029 0.134635401 
> apply(materr, 1, max)
  pmeandiff   qmeandiff    pmaxdiff    qmaxdiff 
0.002758295 0.018301984 0.017009315 0.663785502 
```

The maximum differences for the quantiles are a little larger than I'm happy with, so I've got a test running now to see how it looks when I downsample the CDF to 100 samples per grid cell. That's still a factor of 10 savings, which should help a lot.
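
For the record, the comparison was along these lines (a reconstruction of the procedure described above, not the actual test code; all names are illustrative):

```r
## Per-cell discrepancy between full and decimated versions, evaluated
## on common query points v (values) and p (probabilities); quantile
## differences are scaled by the mean absolute value of the inputs.
err_cell <- function(full, dec, v, p) {
    pdiff <- abs(full$cdf(v) - dec$cdf(v))
    qdiff <- abs(full$quant(p) - dec$quant(p)) / mean(abs(v))
    c(pmeandiff = mean(pdiff), qmeandiff = mean(qdiff),
      pmaxdiff = max(pdiff), qmaxdiff = max(qdiff))
}
## materr <- sapply(cells, function(g) err_cell(g$full, g$dec, g$v, g$p))
```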

@rplzzz
Contributor

rplzzz commented Nov 9, 2019

Here's what the results of the 100-sample(-ish) test look like:

```
> apply(materr10, 1, mean)
  pmeandiff   qmeandiff    pmaxdiff    qmaxdiff 
0.001307984 0.001537848 0.006743266 0.087171459 
> apply(materr10, 1, max)
  pmeandiff   qmeandiff    pmaxdiff    qmaxdiff 
0.001620835 0.008162966 0.010638218 0.663785502 
```

All of the stats are improved, except for the grid max of qmaxdiff, which is identical (and, unsurprisingly, occurs in the same grid cell). I'm still not sure whether this is worth being concerned about. I'm going to implement this version and check how much difference it makes in the output residual fields.

@rplzzz
Contributor

rplzzz commented Nov 10, 2019

Ok, I've spent way more time on this than it's probably worth, but I believe this represents the final word on memory usage in fldgen.

tl;dr: When applying fldgen to these inputs (ISIMIP half-degree land-only), we have about 2.5 GB of unused data. We can save somewhere between 500--1000 MB by sampling the CDF and quantile functions at 100 samples per grid cell (vs. roughly 950 in the unmodified version). Both of these changes are easy to make. In the long run, we can realize significant memory savings by representing the CDF and quantile functions as vector data, instead of as lists of closures, as they are now. However, this will be a bit more work and will require more testing.

The tables below give a detailed comparison of memory usage in two versions of the emulator, one with 450 CDF samples per grid cell and the other with 100. (I couldn't run the full-resolution version on my workstation; I ran out of memory and crashed.)

Total emulator size (calculated with object_size):

| emu450 | emu100 |
|--------|--------|
| 7.39 GB | 6.4 GB |
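
For reference, the per-object numbers here and below were pulled roughly like this (a sketch of the calls, assuming the pryr implementation of object_size):

```r
library(pryr)
object_size(emu450)          # total size of the trained emulator
sapply(emu450, object_size)  # size of each top-level component
```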

Here's the breakdown by component:

| component | size (emu450) | size (emu100) |
|-----------|---------------|---------------|
| griddataT | 518 MB | 518 MB |
| griddataP | 2,500 MB | 2,500 MB |
| tgav | 0.00786 MB | 0.00786 MB |
| meanfldT | 517 MB | 517 MB |
| meanfldP | 517 MB | 517 MB |
| tfuns | 1,380 MB | 890 MB |
| pfuns | 1,380 MB | 890 MB |
| reof | 569 MB | 568 MB |
| fx | 5.06 MB | 5.05 MB |
| infiles | 0.00152 MB | 0.00152 MB |

The griddataP element is surprisingly large. Here is its breakdown.

| component | size (emu450) | size (emu100) |
|-----------|---------------|---------------|
| vardata | 516 MB | 516 MB |
| globalop | 0.539 MB | 0.539 MB |
| lat | 0.0031 MB | 0.0031 MB |
| lon | 0.00598 MB | 0.00598 MB |
| time | 0.0077 MB | 0.0077 MB |
| tags | 0.000976 MB | 0.000976 MB |
| vardata_raw | 1,980 MB | 1,980 MB |
| pvarconvert_fcn | 0.0634 MB | 0.0641 MB |
| ncol_full | 0.000056 MB | 0.000056 MB |
| gridid_full | 0.27 MB | 0.27 MB |
| coord | 1.08 MB | 1.08 MB |

The griddataT element lacks the vardata_raw component, but is otherwise the same. It turns out the vardata_raw component isn't actually used anywhere except in a test:

```
WE27755% grep vardata_raw **/*.R
R/trainTP.R:        griddataT$vardata_raw <- griddataT$vardata
R/trainTP.R:        griddataT$vardata_raw <- NULL
R/trainTP.R:        griddataP$vardata_raw <- griddataP$vardata
R/trainTP.R:        griddataP$vardata_raw <- NULL
tests/testthat/test_varfield.R:              diff <- prscl - griddataP$vardata_raw
tests/testthat/test_varfield.R:              diff2 <- pscl2 - griddataP$vardata_raw
```

So, we could get a quick win out of eliminating this component, provided that we're either willing to forgo that particular test or find another way to do it. That's all assuming that we don't have any future plans for this component, of course.

The other big memory users are the tfuns and pfuns, which from a memory point of view are the same.

| component | size (emu450) | size (emu100) |
|-----------|---------------|---------------|
| cdf | 826.6 MB | 445 MB |
| quant | 826.6 MB | 445 MB |

Each one of these is a list of 67420 closures, and each one of those looks like this:

| component | size (emu450) | size (emu100) |
|-----------|---------------|---------------|
| formals | 168 B | 168 B |
| body | 896 B | 896 B |
| env | 6.54 kB | 2.9 kB |
| x | 2.74 kB | 912 B |
| y | 2.74 kB | 912 B |
| method | 56 B | 56 B |
| yleft | 56 B | 56 B |
| yright | 56 B | 56 B |
| f | 56 B | 56 B |
| total | 10.9 kB | 7.23 kB |
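
The per-closure numbers come from poking at a single element of the list, along these lines (a sketch; the emu$tfuns$cdf path is illustrative):

```r
f <- emu$tfuns$cdf[[1]]       # one of the 67420 approxfun closures
object_size(formals(f))       # the formals row
object_size(body(f))          # the body row
object_size(environment(f))   # the env row
sapply(as.list(environment(f)), object_size)  # x, y, method, yleft, ...
```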

It's not clear to me whether the closure's environment and the contents of the environment should be counted separately, but you don't get the full object size reported by object_size unless you count both. Also, even if the data in the environment is being double counted, the environment itself carries a bit of overhead beyond the storage associated with its contents.

Note also that the function formals and body are taking up a little over 1 kB each (these are 1000-based kB, not 1024-based). Multiplied by 2 lists and 67420 elements per list (massively more, if we weren't using the land-only data from ISIMIP), this is a pretty substantial chunk of memory that isn't doing anything for us: the formals and body are the same for every function in the list. Also, the x and y values stored in the cdf closures are the same as the y and x values stored in the quant closures, so we could save a big chunk of memory by storing those only once.

So, what have we learned here? Reducing the resolution of the CDF and quantile functions only saved us about a GB (actually more, since the 450-variant is already downsampled by a factor of 2 from the full-resolution version). We could get a quick 2.5 GB just by eliminating the unused vardata_raw component. If we still want to keep it for purposes of running the test, maybe we can add a flag that allows us to save it when we're running tests, but not in production use.
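
That flag could be as simple as something like this inside trainTP (keep_raw is a hypothetical argument name, not part of the current signature):

```r
## Hypothetical: keep the raw copy only when explicitly requested
## (e.g., by the test suite); drop it in production training runs.
if (keep_raw) {
    griddataP$vardata_raw <- griddataP$vardata
} else {
    griddataP$vardata_raw <- NULL
}
```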

Finally, storing the CDF and quantile functions as lists of closures (instead of as arrays of data that are fed to a single function stored in the package namespace) has really been costing us. It should be easy enough to implement an R function that takes as input the x and y arrays, along with the v array (the only thing that is input right now). If it looks like this is an important contributor to run time, we can reimplement it in Rcpp as a function that works directly on matrices of coefficients and values.
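
Concretely, the per-cell storage would shrink to a single pair of knot vectors used in both directions (names hypothetical, building on the interp_linear sketch above):

```r
cell <- list(x = sort(resid), y = probs)  # stored once per grid cell
p <- interp_linear(cell$x, cell$y, v)     # CDF: probability of value v
q <- interp_linear(cell$y, cell$x, p)     # quantile: value at probability p
```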

I still need to do some more testing on the equivalence between the residual fields generated when we downsample the CDF and quantile values. As it stands, they look a lot more different than I would have expected, given the relatively small differences I saw when I tested the CDF and quantile functions directly. This leads me to suspect that I've messed up something else in the process of generating the test batches of residual fields for comparison. I should have a branch to push up sometime later this week.

@bpbond
Member

bpbond commented Nov 11, 2019

Really interesting--thanks for the detail.

@abigailsnyder
Contributor Author

An intermediate fix is provided by the emulator_reducer function introduced in PR #49. If the resulting bare-bones trained emulators are still too big in the pipeline, addressing the size of the empirical CDFs becomes more pressing.
