regenerate geojson for RES=6 #19

Closed
MathewBiddle opened this issue Aug 12, 2024 · 14 comments · Fixed by #20

Comments

@MathewBiddle
Owner

https://h3geo.org/docs/core-library/restable/#average-area-in-km2

Record how much time it takes.

@MathewBiddle
Owner Author

MathewBiddle commented Aug 12, 2024

Couldn't generate the indicators at H3 resolution 6 (see resolutions) for the entire snapshot. Memory usage reached 24.23 GB before the run failed.

image

Error: memory exhausted (limit reached?)

Code:

> ptm <- proc.time()
> # grid resolution
> # 3 = 15 mins
> # 6 = 
> RES <- 6
> 
> # map defaults
> column <- "es"
> label <- "ES(50)"
> trans <- "identity"
> crs <- "+proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs"
> limits <- c(0,50)
> 
> 
> # make grid
> grid_dec <- res_changes(occ, RES)
Error: memory exhausted (limit reached?)

So now we have to figure out what the capacity is for running this locally, or explore more performant solutions, like h3o, cited at marinebon/obisindicators#31 (comment).

@MathewBiddle
Owner Author

My system info:

Processor	Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz   2.11 GHz
Installed RAM	32.0 GB (31.8 GB usable)
System type	64-bit operating system, x64-based processor

@MathewBiddle
Owner Author

The next step is probably subsetting to US marine waters.

@mimidiorio do you have a polygon for the region of interest? That will help with compute resources.

@mimidiorio

@MathewBiddle
H3 grids (H2-H6) clipped to US waters (and coasts) are here

USwaters shapefile is here

@MathewBiddle
Owner Author

I tried to run the H3 grid at resolution 6 for one year (1970-1971) between 0 < Lat < 74.7 and longitudes from 160 eastward across the antimeridian to -40, i.e. 160 to 180 plus -180 to -40 (an extremely rough approximation of a US waters bounding box).

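library(dplyr)

dec_beg <- 1970

# keep one year of records inside the rough US waters bounding box;
# the longitude filter wraps across the antimeridian (160 to 180 and -180 to -40)
occ_dec <-
  occ %>%
  filter(between(decimalLatitude, 0.00, 74.7)) %>%
  filter(between(decimalLongitude, -180.0, -40.0) | between(decimalLongitude, 160.0, 180.0)) %>%
  filter(
    date_year >= dec_beg,
    date_year <= dec_beg + 1)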

Even with this significant reduction in data, I ran into Error: memory exhausted (limit reached?).

It used 24.37 GB of memory before it bailed.

What do we do now?

At H3 resolution 6 we are creating hexagons with an average area of 36 km^2 (according to the spec). I think resolution 4 (1,770 km^2) or 5 (252 km^2) would be more feasible and could significantly reduce the processing needed to generate the data.

I'll try to run the grid for resolution 5 and see if that succeeds with this massive reduction in data.
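To put the area figures in terms of grid size: the H3 documentation gives the total number of cells at resolution r as 2 + 120 * 7^r, so each step up in resolution multiplies the global cell count by roughly 7. A minimal base R sketch of that arithmetic (the function name `h3_cell_count` is made up for illustration):

```r
# Total number of H3 cells on the globe at a given resolution,
# per the H3 resolution table: 2 + 120 * 7^res
h3_cell_count <- function(res) 2 + 120 * 7^res

resolutions <- 3:6
cells <- h3_cell_count(resolutions)

cell_table <- data.frame(
  res    = resolutions,
  cells  = cells,
  # ratio to the previous resolution (NA for the first row)
  growth = c(NA, cells[-1] / cells[-length(cells)])
)
print(cell_table)
```

A global grid at resolution 6 therefore holds about 14.1 million cells, compared with about 2 million at resolution 5 and about 288 thousand at resolution 4.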

@mimidiorio

Phooey. If you can get H4 to run, let's check that out. Have asked for NODD space/access. Not really sure what exactly we need, but seeing what might be available to us. Any specs or requirements I can relay to the NODD team would help.

@MathewBiddle
Owner Author

I was able to run obisindicators on one year of US coastal waters data at resolutions 3, 4, and 5. Here's what that looks like:
image

Details on runtime and memory:

| resolution | approx. area (km^2) | time to run (sec) | output size (MB) | memory used (GB) |
|---|---|---|---|---|
| 3 | 12393 | 82.65 | 1.0 | 3.51 |
| 4 | 1770 | 561.95 | 2.7 | 2.25 |
| 5 | 252 | 6351.06 | 5.8 | 12.94 |
| 6 | 36 | crashes with "Error: memory exhausted (limit reached?)" | | 24.37 (before crash) |
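For a quick read on how the runtime scales, the three successful timings can be compared step by step. This is only arithmetic on the numbers reported above; the data frame name `runs` is made up for illustration.

```r
# Timings reported above for the one-year runs that completed
runs <- data.frame(
  res     = c(3, 4, 5),
  seconds = c(82.65, 561.95, 6351.06)
)

runs$minutes  <- round(runs$seconds / 60, 1)
# how many times slower each run is than the previous resolution
runs$slowdown <- c(NA, round(runs$seconds[-1] / head(runs$seconds, -1), 1))
print(runs)
```

Going from resolution 3 to 4 is about 6.8 times slower, and from 4 to 5 about 11.3 times slower, so the cost grows at least as fast as the roughly sevenfold increase in cell count per resolution step.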

@MathewBiddle
Owner Author

@mimidiorio

Looks like the res5 file covers a lot more than coastal waters. Can you clip to US waters before you run? Looking into the NODD as an option too.
h5_19701971

@MathewBiddle
Owner Author

I have been testing with the OBIS snapshot subset to the polygon @mimidiorio provided (.shp added to the PR above, let me know if this is not okay). Even with that significant reduction in data (35 million records down to 39k), it still takes a long time to run the H3 gridding process at resolution 6.

One positive is that I am no longer running into memory limitations. But, I am still only testing with one year of data (1970-1971).

The next step is to dig into the grid-generating function to see why it is taking so long. Is it generating a global grid and then trying to compute indicators for every cell? We only need it to generate cells for the regions of interest. We might be able to optimize obisindicators.

@MathewBiddle
Owner Author

I think I'm getting a clearer picture. obisindicators first builds a global H3 grid at the resolution you request.

https://github.com/marinebon/obisindicators/blob/9b1ef2fed175f47b21f65280cfa0f903666502ed/R/h3.R#L35-L38

Building a grid at resolution 6 takes a long time! So, how can we optimize that process to build a grid only for data we have?
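One possible approach, sketched here under assumptions rather than taken from obisindicators: index each occurrence point to its H3 cell first, then build polygons only for the cells that actually contain data. The sketch assumes the h3jsr package (version 1.3 or later, where `point_to_cell()` and `cell_to_polygon()` are available) and sf are installed; `occ_demo` and its coordinates are made-up stand-ins for the real occurrence table.

```r
library(sf)
library(h3jsr)  # assumption: h3jsr >= 1.3 is installed

# Hypothetical occurrence records (WGS84 lon/lat), standing in for `occ`
occ_demo <- data.frame(
  decimalLongitude = c(-70.50, -70.51, -122.40),
  decimalLatitude  = c(42.30, 42.31, 37.80)
)

RES <- 6

# Convert the records to sf points
pts <- sf::st_as_sf(
  occ_demo,
  coords = c("decimalLongitude", "decimalLatitude"),
  crs = 4326
)

# Assign each record to the H3 cell that contains it
occ_demo$cell <- h3jsr::point_to_cell(pts, res = RES)

# Build polygons only for the occupied cells, not for the whole globe
cells <- unique(occ_demo$cell)
grid  <- sf::st_sf(cell = cells, geometry = h3jsr::cell_to_polygon(cells))
```

With this layout the grid can never have more rows than there are occurrence records, so its size is bounded by the data rather than by the number of cells on the globe at that resolution.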

@MathewBiddle
Owner Author

xref: marinebon/obisindicators#45

@MathewBiddle
Owner Author

MathewBiddle commented Aug 29, 2024

@mimidiorio I was able to make the res=6 indicator for 1970-1971.

See the geojson at:
https://github.com/MathewBiddle/globe/blob/main/data/indicators_1970_1971_res6.geojson

@MathewBiddle
Owner Author

Here is my memory usage for res6 US waters for one year:
image

> gc()
            used   (Mb) gc trigger   (Mb)  max used    (Mb)
Ncells  24865732 1328.0  177079315 9457.1 345858035 18470.9
Vcells 290764946 2218.4  814872988 6217.0 766190872  5845.6
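Summing the two "max used (Mb)" values in the output above gives 18470.9 + 5845.6 = 24316.5 Mb, so peak R memory for this run was still close to 24 GB. To record both runtime and peak memory in one go for future runs, something like the following base R sketch could wrap the gridding call; the `sort(runif(...))` line is only a stand-in workload, not the real computation.

```r
# Reset the "max used" counters so the peak reflects only this run
invisible(gc(reset = TRUE))

timing <- system.time({
  x <- sort(runif(1e6))  # stand-in for the gridding / indicator call
})

mem <- gc()
# The last column of gc() is "max used" in Mb; sum Ncells and Vcells
peak_mb <- sum(mem[, ncol(mem)])

cat(sprintf("elapsed: %.2f s, peak R memory: %.1f Mb\n",
            timing[["elapsed"]], peak_mb))
```

Note that `gc()` only accounts for memory managed by R itself, so the figure can differ from what the operating system's task manager reports.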
