Commit acdbe5f: rename file to fix broken link
jules32 committed Jun 4, 2024 (1 parent: 596d6ae)

Showing 2 changed files with 14 additions and 0 deletions.
_freeze/how-tos/find-data/find-r/execute-results/html.json (14 additions, 0 deletions)
The new file is a Quarto freeze result (`"hash": "21ed51cc781131eaca6eb6db34ed5496"`); its `markdown` field contains the source of the rendered page, which follows.
"markdown": "---\ntitle: \"How do I find data using R?\"\nexecute:\n eval: false\n---\n\n\n*This is a work in progress, and may be split up into different modules/tutorials\nas we continue to work on it.*\n\nHere are our recommended approaches for finding data with R, from the command line or a notebook.\n\n### Using the web interface\n\nSee [**How do I find data using Earthdata Search?**](earthdata_search.md) - in that tutorial, we found the dataset *ECCO Sea Surface Height - Monthly Mean 0.5 Degree (Version 4 Release 4)*, with the shortname `ECCO_L4_SSH_05DEG_MONTHLY_V4R4`.\n\n### Searching programmatically\n\nThe NASA cloud data is searchable programmatically via two methods - NASA's own\nsearch service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find\ndata in R, we'll mainly rely on the \n[rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us\ninteract with the NASA [STAC](https://stacspec.org/en) API to search for our\ndata, and at the same time learn about STAC more generally. This will be useful\nas it is a common standard for distributing spatial data.\n\nWe will also search for data using the \n[NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), \nwhich is the service that powers the \n[Earthdata web search portal](https://search.earthdata.nasa.gov/search).\n\nFor both of these services, the\n[earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will\nhelp us get set up and provide a few handy functions to smooth some rough edges.\n\n### Authentication for NASA Earthdata \n\nAn Earthdata Login account is required to access data from the NASA Earthdata\nsystem. Please visit <https://urs.earthdata.nasa.gov> to register and manage\nyour Earthdata Login account. This account is free to create and only takes a\nmoment to set up.\n\nOnce you have created your Earthdata Login account, you can use the \n[earthdatalogin](https://github.com/boettiger-lab/earthdatalogin/) R package to help you manage your authentication within R.\n\nThere is some functionality currently only available in the development version\nof earthdatalogin, so we will install it from GitHub using the [pak](https://pak.r-lib.org/) package:\n\n``` r\ninstall.packages(\"pak\")\npak::pak(\"boettiger-lab/earthdatalogin\")\n```\n\nThe easiest and most portable method of access is using the `netrc` basic\nauthentication protocol for HTTP. Call `edl_netrc()` to set this up given your\nusername and password. The easiest way to store your credentials for use in\n`edl_netrc()` is to set them as environment variables in your `~/.Renviron` file.\nThe usethis package has a handy function to find and open this file for editing:\n\n```r\n# install.packages(\"usethis\")\nusethis::edit_r_environ()\n```\n\nThen add the following lines to the file, save the file, and restart R.\n\n```\nEARTHDATA_USER=\"your_user_name\"\nEARTHDATA_PASSWORD=\"your_password\"\n```\n\nNow, when you call `edl_netrc()`, it will consult these environment variables to\ncreate a `.netrc` file that will be used to authenticate with the NASA Earthdata\nservices. If you don't have your credentials saved as environment variables, \n`earthdatalogin` will provide its own credentials, but you may experience rate \nlimits more readily. 
Now, when you call `edl_netrc()`, it will consult these environment variables to create a `.netrc` file that will be used to authenticate with the NASA Earthdata services. If you don't have your credentials saved as environment variables, `earthdatalogin` will provide its own credentials, but you may experience rate limits more readily. You can also manually type your credentials into the `username` and `password` arguments of `edl_netrc()`, but this is not recommended, as it is too easy to accidentally share them in your code.

::: {.cell}

```{.r .cell-code}
library(earthdatalogin)

edl_netrc()
```
:::

Once `edl_netrc()` has been called in your R session, most spatial packages in R can seamlessly access NASA Earthdata over HTTP links.

### Finding data in NASA STACs with rstac

All of the NASA STAC catalogues can be viewed here: <https://radiantearth.github.io/stac-browser/#/external/cmr.earthdata.nasa.gov/stac/>.

We will use the rstac package to first browse the collections in the PO DAAC catalogue (POCLOUD):

::: {.cell}

```{.r .cell-code}
## In R
## load R libraries
# install.packages("pak")
# pak::pak(c("tidyverse", "rstac", "boettiger-lab/earthdatalogin"))

library(rstac)
library(earthdatalogin)

po_collections <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD/") |>
  collections() |>
  get_request()

po_collections
```
:::

This only gives us the first 10 collections in the catalogue; to get the rest we can use the `collections_fetch()` function:

::: {.cell}

```{.r .cell-code}
all_po_collections <- collections_fetch(po_collections)
all_po_collections

length(all_po_collections$collections)

head(all_po_collections$collections)

# Just look at the titles:
sapply(all_po_collections$collections, `[[`, "title")

# Get shortnames from the 'id' field and search for a match:
grep("ECCO_L4_SSH", sapply(all_po_collections$collections, `[[`, "id"), value = TRUE)
```
:::

Once we have searched through the collections, we can choose the one we want and query it for the items (granules) we are interested in:

::: {.cell}

```{.r .cell-code}
start <- "2015-01-01"
end <- "2015-12-31"

items <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD") |>
  stac_search(collections = "ECCO_L4_SSH_05DEG_MONTHLY_V4R4",
              datetime = paste(start, end, sep = "/")) |>
  post_request() |>
  items_fetch()

items
```
:::

There are 13 'items' representing the same 13 granules [we found when we searched using Earthdata Search](https://search.earthdata.nasa.gov/search/granules?p=C1990404799-POCLOUD&pg%5B0%5D%5Bv%5D=f&pg%5B0%5D%5Bgsk%5D=-start_date&q=ECCO%20monthly%20SSH&qt=2015-01-01T00:00:00.000Z,2015-12-31T23:59:59.999Z&ff=Available%20in%20Earthdata%20Cloud&tl=1713291266.503!3!!&long=0.0703125).
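Before printing the full feature list, rstac's helper functions give a quick way to check what came back. This is an optional check on the search results above:

``` r
# Expect 13 items, one per monthly granule in 2015
items_length(items)

# The datetime associated with each granule
items_datetime(items)
```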
We can see more details of the items (granules) by printing the `features` list item:

::: {.cell}

```{.r .cell-code}
items$features

# And the urls:
edl_stac_urls(items)
```
:::

### Finding data in NASA Earthdata Search using earthdatalogin

Once we know the shortname of a collection (usually by looking in the Earthdata Search portal), we can supply it to `edl_search()` to get the metadata and file urls of the individual granules:

::: {.cell}

```{.r .cell-code}
granules <- edl_search(
  short_name = "ECCO_L4_SSH_05DEG_MONTHLY_V4R4",
  temporal = c(start, end),
  parse_results = FALSE
)

granules

granules[[1]]

# See the granule titles
sapply(granules, `[`, "title")

# Note these are the same urls obtained via the rstac workflow demonstrated above
granule_urls <- edl_extract_urls(granules)
granule_urls
```
:::

#### Accessing data using the `{terra}` package

We can read any of these urls using `terra::rast()`. We supply the `vsi = TRUE` argument, which prepends `"/vsicurl/"` to the url, indicating that the file should be opened as a "virtual" remote dataset. This allows random partial reading of files without first downloading the entire file, which vastly speeds up most operations, as only the subset of the data required is ever actually downloaded.

::: {.cell}

```{.r .cell-code}
library(terra)

rast <- terra::rast(granule_urls[1], vsi = TRUE)

# This does not come with an extent and CRS embedded, so we supply them manually
granules[[1]]$boxes
ext(rast) <- c(-180, 180, -90, 90)
crs(rast) <- "EPSG:4326"

plot(rast[["SSH"]])

# We notice that this plot is upside-down - it is likely that the NetCDF file
# does not conform properly to the conventions. But we can flip it:
rast <- flip(rast, direction = "vertical")
plot(rast)
```
:::

If we want to crop the raster, we can define an extent and crop it to that area. Because we previously called `edl_netrc()`, we are not only authenticated with the server so we can access the data, but `terra` and `stars` are also set up to use the underlying `GDAL` library in such a way that only the subset of the data we have requested is actually downloaded.

::: {.cell}

```{.r .cell-code}
crop_box <- rast(extent = c(-150, -120, 35, 60), crs = "EPSG:4326")

rast_cropped <- crop(rast, crop_box)

plot(rast_cropped[["SSH"]])
```
:::

#### Accessing data using the `{stars}` package

The `read_*` functions in stars do not have the `vsi` argument, but we can do the same thing simply by prepending `"/vsicurl/"` to the url ourselves. Here we will use the `read_mdim()` function.

::: {.cell}

```{.r .cell-code}
library(stars)

ssh_stars <- read_mdim(paste0("/vsicurl/", granule_urls[1]))

plot(ssh_stars)
```
:::

We can again crop this using the same bounding box as we did with terra:

::: {.cell}

```{.r .cell-code}
st_crop(
  ssh_stars,
  st_bbox(c(xmin = -150, xmax = -120, ymin = 35, ymax = 60))
) |>
  plot()
```
:::

### Accessing data using gdalcubes

*Coming soon!*
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
The second changed file was renamed without changes.
