Add Anna Virkkala's Carbon Flux Dataset onto the PDG #99

Open
justinkadi opened this issue Oct 4, 2024 · 12 comments

Comments

@justinkadi

Anna Virkkala requested that we tile her files and prepare them to be added as layers to the PDG. Her dataset is hosted at ORNL DAAC, but they have a long queue and will likely not be able to create a WMTS endpoint for us in time for her publication. As such, they have requested that the ADC host the files necessary to get the layer onto the PDG.

@justinkadi justinkadi added the layer Displaying a specific data product in the PDG portal label Oct 4, 2024
@justinkadi justinkadi self-assigned this Oct 4, 2024
@justinkadi
Author

Email from Anna on September 2nd

Our manuscript about Arctic-boreal CO2 fluxes is undergoing a second (and hopefully last) round of revisions at Nature Climate Change. I am expecting to hear back from the editor in the coming few weeks, and if all goes well and smoothly we might have the paper published in a month or so. But of course anything can still happen.

Debjani has now received all the data layers and metadata related to those that we were planning to publish at ORNL DAAC as part of the manuscript. One part of those files are aggregated annual NEE layers showing the distribution of net CO2 sinks and CO2 sources; these are the same files that we would also like to visualize at the Permafrost Discovery Gateway. These final aggregated files can be found here. Juliet - I sent you a very similar set of files some time ago but please use the ones on this new folder.

So I guess right now we would need to come up with a path forward for how to publish and store the three files so that they could be used in Permafrost Discovery Gateway as well. Let me know how you would like to proceed with this.

Thanks a lot with your help making the dataset available for broad audiences!

Best,

Anna

@justinkadi
Author

ORNL DAAC's pre-release version of the dataset https://doi.org/10.3334/ORNLDAAC/2377

@justinkadi justinkadi added the viz-workflow ready The data is available to go through the viz-workflow processing steps label Oct 7, 2024
@justinkadi justinkadi moved this to In Progress in Data Layers Oct 7, 2024
@justinkadi
Author

Finalized files have been published https://doi.org/10.3334/ORNLDAAC/2377

@justinkadi
Author

From Rushiraj:

I set up both packages (viz-staging and viz-raster) on the datateam server and passed a config to the workflow, but the workflow is failing to read the .tif file.
Error:

  File "fiona/ogrext.pyx", line 588, in fiona.ogrext.Session.start
  File "fiona/ogrext.pyx", line 143, in fiona.ogrext.gdal_open_vector
fiona.errors.DriverError: './input/CO2Fluxes_Arctic_Boreal_NEE_2002_2020_annual_trend_senslope.tif' not recognized as a supported file format.

So, a couple of questions:

  • It looks like staging only expects vector files for tiling. Do you know if we have ever used the workflow to tile raster files? Staging uses fiona to read files, and I'm not sure it has drivers to read rasters. We also have a dependency on rasterio for viz_raster, so I'm wondering if I should use viz_raster directly. Thoughts?
  • Any other approaches or thoughts to make progress? (I was thinking of writing a custom script to convert the .tif to a vector gdf and then proceeding with staging, but I'm not sure whether that would result in issues with the data.)
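
The DriverError above is consistent with fiona, a vector reader, being handed a GeoTIFF. As a minimal sketch, a pre-flight check could classify inputs by their TIFF signature before staging tries to open them. The helper name `looks_like_tiff` is hypothetical and is not part of viz-staging; it only inspects the first four bytes and does not confirm that the file is a valid GeoTIFF.

```python
from pathlib import Path

# First four bytes of classic TIFF and BigTIFF files (little- and big-endian).
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(path):
    """Return True if the file starts with a TIFF or BigTIFF signature.

    Hypothetical helper: a cheap guard to route rasters away from a
    vector-only reader, not a full format validation.
    """
    with Path(path).open("rb") as handle:
        return handle.read(4) in TIFF_SIGNATURES
```

A caller could skip or reroute any input for which `looks_like_tiff` returns True instead of passing it to the vector staging step.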

@justinkadi
Author

From Robyn:

  • Yes, the workflow was created to process vector data only, we never got to extending it to take geotiff as input. We processed the raster tile sets manually. I will share some examples with you!
  • I think the right approach would be to create a new "arm" of the workflow that takes raster input directly. I think it would be inefficient to convert input rasters to gdf, unless there's a plan to make 3d tiles. The process might be: raster input -> raster-specific tiling + deduplicating -> the web tiling part that already exists in the workflow

There are lots of tools that exist already for deduplicating and tiling rasters, so it actually isn't too complicated to do that manually

I found that there was a lot of debugging to do each time I tried to manually tile rasters though, because they seem to be so variable in their structure. I ran into problems with interpreting the "mask" layer especially. That might be the challenge with generalizing the process. But I still have a lot to learn about geotiffs so 🤷

Okay, on datateam, all the examples are in /home/thiessenbock/PDG-test/manual_processing. The following dirs all contain examples of manually converting a large geotiff to tile geotiffs + web tiles:

  • bartsch_infrastructure
  • bergstedt-2021-lake-basin
  • circumpolar_arctic_vegetation
  • mishra-soil-carbon
  • webb-2022

let me know if you can't access those dirs

@justinkadi
Author

From Rushiraj:

Quoting Robyn: "I think it would be inefficient to convert input rasters to gdf, unless there's a plan to make 3d tiles."

Yes, this makes sense, we intend to have the layer up as WMTS geotiffs.

@justinkadi
Author

From Rushiraj:

I think we can potentially utilize gdal to do the conversion. Here is what I found:
https://gdal.org/en/latest/programs/gdal2tiles.html

Unfortunately, we do not have this library installed on datateam, so I'm testing on my machine. (Maybe in the future we can talk with Matt and Nick and get it installed on datateam to handle similar cases.)

@justinkadi
Author

From Rushiraj:

gdal_translate -of VRT -ot Byte -scale CO2Fluxes_Arctic_Boreal_NEE_2002_2020_annual_trend_senslope.tif temp.vrt
gdal2tiles --zoom=7 temp.vrt temp_7

If you have gdal on your machine, give this a try. Note that it produces .png output instead of .tif, so I'm still looking into this.
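
For repeat runs across the three layers, the two commands above could be wrapped in a small script. This is only a sketch: `build_tiling_commands` and `run_tiling` are hypothetical names, the defaults mirror the values in the comment (`temp.vrt`, `temp_7`, zoom 7), and it assumes the GDAL command-line tools are on PATH (some installs expose the second tool as `gdal2tiles.py`).

```python
import shutil
import subprocess


def build_tiling_commands(src_tif, vrt_path="temp.vrt", out_dir="temp_7", zoom=7):
    """Return the two argv lists from the comment above, without running them."""
    translate = [
        "gdal_translate", "-of", "VRT", "-ot", "Byte", "-scale",
        src_tif, vrt_path,
    ]
    tiles = ["gdal2tiles", f"--zoom={zoom}", vrt_path, out_dir]
    return [translate, tiles]


def run_tiling(src_tif):
    """Run both steps in order, stopping at the first failure."""
    for argv in build_tiling_commands(src_tif):
        # Fail early with a readable message if GDAL is not installed.
        if shutil.which(argv[0]) is None:
            raise RuntimeError(f"{argv[0]} not found on PATH; is GDAL installed?")
        subprocess.run(argv, check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the exact commands on a machine without GDAL, such as datateam at the time of this comment.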

@justinkadi
Author

From Rushiraj:

update:

  • Saw some success with tiling using gdal: all three layers are now under my home directory on datateam. gdal_retile.py produced the initial tile structure, and the organize.py Python module arranged the files into the hierarchical z/x/y format that WMTS requires. I've added the command to a READ_ME.md and the helper script to the approach folder under Virrkala.
  • I noticed that both our accounts are in the datateam group, so I've made the Virrkala directory owned by that group. You should be able to access and examine those directories if you like (for future reference).
  • There are two different scripts for this workflow: a simple runner.py and parsl_runner.py, which uses the Python parsl framework to parallelize the processing (rasterization can be very slow). I'm currently setting up the next part of the process, rasterizing.
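
The z/x/y reorganization mentioned above can be illustrated with a short sketch. This is not the actual organize.py: the input naming `<base>_<row>_<col>.tif`, the mapping of column to x and row to y, and the explicit `zoom` argument are all assumptions made for illustration and should be checked against the real gdal_retile.py output.

```python
import re
from pathlib import PurePosixPath

# Assumed tile naming: <base>_<row>_<col>.tif (hypothetical; verify on real output).
TILE_NAME = re.compile(r"^(?P<base>.+)_(?P<row>\d+)_(?P<col>\d+)\.tif$")


def zxy_path(tile_name, zoom):
    """Map an assumed flat tile name to a z/x/y.tif relative path."""
    match = TILE_NAME.match(tile_name)
    if match is None:
        raise ValueError(f"unexpected tile name: {tile_name}")
    x = int(match["col"])  # assumption: column index maps to x
    y = int(match["row"])  # assumption: row index maps to y
    return PurePosixPath(str(zoom)) / str(x) / f"{y}.tif"
```

For example, `zxy_path("trend_03_12.tif", 7)` yields the relative path `7/12/3.tif`; a real script would then move or copy each file to that location.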

@justinkadi
Author

We moved the workflow folder into var/data/curation/Virkkala/workflow so that we can both access the files.

@rushirajnenuji
Member

Final version: https://test.arcticdata.io/portals/permafrost-ORNLDAAC1934-16

Latest feedback from Anna:

Could you change one thing:

In the temporal trend description, change:

This layer represents the temporal trend (change) in annual terrestrial NEE over 2002-2020.

Negative values indicate increasing net ecosystem CO2 uptake by land (i.e. plant photosynthesis is higher than respiratory losses).
Positive values indicate increasing net ecosystem CO2 emissions to the atmosphere (i.e. plant photosynthesis is lower than respiratory losses).

to:

This layer represents the temporal trend (change) in annual terrestrial NEE over 2002-2020.

Negative values indicate increasing net ecosystem CO2 uptake by land (i.e. more plant photosynthesis over time than respiratory losses).
Positive values indicate increasing net ecosystem CO2 emissions to the atmosphere (i.e. higher respiratory losses over time than plant photosynthesis).

Also, when looking at the viewfinder, would it be possible to make the description that you see there first the average NEE one instead of the temporal trend one? Our product's strength is more in the average maps than in the temporal change maps, so it would be good for the temporal trend layer to be less highlighted. This is not so critical, so no worries if the change cannot be made.

Status: ready for production, waiting to hear from PI on when to publish this to prod PDG portal.

@rushirajnenuji
Member

  • Carbon layer deployed to production (completed on 01/20/2025)
  • Disable Carbon layer as default on PDG

From Anna L. on Slack:

I notice that Anna V's carbon paper is the default layer when you get to the PDG. That made sense for the time of the press release, which was two weeks ago now. I'd like to move back to our one big IWP dataset as the default now.


@rushirajnenuji rushirajnenuji moved this from In Progress to In Review in Data Layers Feb 3, 2025
@rushirajnenuji rushirajnenuji moved this from In Review to Done in Data Layers Feb 15, 2025