Problem downloading grids on the fly #1474

Open
snowman2 opened this issue Feb 12, 2025 · 5 comments
Labels
bug proj Bug or issue related to PROJ

Comments

@snowman2
Member

Discussed in #1454

Originally posted by j-carson October 25, 2024
The following code used to work with pyproj==3.6.1

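    from pyproj import CRS, Transformer
    from pyproj.transformer import TransformerGroup

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    # Download any grids the transformation needs that are not available locally.
    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    transformer = Transformer.from_crs(CRS(EGM96), CRS(WGS84), always_xy=True)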

But with pyproj==3.7 I'm getting a no-op transform - the grid I tried to download isn't actually being used.

This bug only manifested itself in my CI system, which starts with a completely clean environment. If I look in the .local/share/proj directory on the build system, I see that the grid, us_nga_egm96_15.tif, is there. On my "messy" system, where the test wasn't failing, that directory also contains files.geojson.

I found the call to get_transform_grid_list() which puts the missing files.geojson there.

With that call added, the test still fails the first time I run it. Once both files are in place, the test succeeds on the second run of the program.
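
For anyone debugging the same thing in CI, the directory inspection described above can be scripted. This is only a sketch using the standard library: `missing_files` is a hypothetical helper name, not part of pyproj, and the file names are the two mentioned in this report.

```python
from pathlib import Path


def missing_files(data_dir, expected_names):
    """Return the names in expected_names that are not regular files in data_dir."""
    directory = Path(data_dir)
    return [name for name in expected_names if not (directory / name).is_file()]
```

With pyproj installed, the directory could be taken from `pyproj.datadir.get_user_data_dir()`, for example `missing_files(get_user_data_dir(), ["us_nga_egm96_15.tif", "files.geojson"])`. Note that this only reports what is on disk; it says nothing about what the current PROJ context has cached.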

How can I reliably download grids on the fly?

@snowman2 snowman2 added the bug label Feb 12, 2025
@snowman2
Member Author

#1454 (comment)

After some debugging, it appears that it doesn't matter if the files.geojson file is there. You just need to start a new python session for it to work as expected.

The issue appears to be related to the change in #1419.

To reproduce the issue, first clear out the contents of the user proj data directory.

Then, run this script:

    import concurrent.futures

    from pyproj.transformer import Transformer, TransformerGroup

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
    print(transformer)

    def transform_repr(idx):
        return str(Transformer.from_crs(EGM96, WGS84, always_xy=True))


    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for result in executor.map(transform_repr, range(5)):
            print(result)

The output:

    proj=noop ellps=GRS80
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called
    unavailable until proj_trans is called

The main thread is the only one with the issue. The other threads get a fresh context that has something reset.
This tells me that PROJ is caching something in the context that needs to be cleared after downloading the files in the python session.
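
Since the failure is silent, a CI job may want to flag it explicitly. The sketch below checks a transformer's string form for the `proj=noop` step seen in the output above. `looks_like_noop` is a hypothetical helper name, and a no-op can be a legitimate result for other CRS pairs, so treat a match as a hint rather than proof.

```python
def looks_like_noop(definition):
    """Return True if a PROJ definition string contains a proj=noop step."""
    # Tokens may or may not carry a leading "+", depending on how the string was produced.
    return any(token.lstrip("+") == "proj=noop" for token in definition.split())
```

In the failing session above, `looks_like_noop(str(transformer))` would be expected to return True for the main-thread transformer, since its string form is `proj=noop ellps=GRS80`.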

@snowman2
Member Author

It appears that the PROJ context caches the results of its file lookups. That cache gets reset when the search paths are set.

I verified that calling set_data_dir resets it, and the Transformer is then populated correctly:

    from pyproj.transformer import Transformer, TransformerGroup
    from pyproj.datadir import get_data_dir, set_data_dir

    WGS84 = "EPSG:4979"  # https://epsg.io/4979
    EGM96 = "EPSG:9707"  # https://epsg.io/9707

    tg = TransformerGroup(WGS84, EGM96)
    if tg.unavailable_operations:
        tg.download_grids(verbose=True)

    set_data_dir(get_data_dir())
    transformer = Transformer.from_crs(EGM96, WGS84, always_xy=True)
    print(transformer)

The output:

    unavailable until proj_trans is called
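
The workaround can be wrapped in a small helper so the reset is never forgotten after a download. This is a sketch only: `download_and_refresh` is a hypothetical name, and it assumes that re-setting the data directory is enough to reset the context's lookup cache, which is what the test above suggests but is not a documented guarantee.

```python
from pyproj.datadir import get_data_dir, set_data_dir


def download_and_refresh(group, verbose=False):
    """Download missing grids for a TransformerGroup, then re-set the search paths.

    Returns True if a download was attempted, False if nothing was missing.
    """
    if not group.unavailable_operations:
        return False
    group.download_grids(verbose=verbose)
    # Re-setting the data directory to its current value is the workaround
    # from this thread; it is assumed to reset the cached file lookups.
    set_data_dir(get_data_dir())
    return True
```

Usage would be `download_and_refresh(TransformerGroup(WGS84, EGM96), verbose=True)` before creating the `Transformer` in the same session.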

@snowman2
Member Author

@rouault, what are your thoughts on this behavior? Do you think it would be helpful to add a method that lets the user invalidate the cache on the context? Or, alternatively, have proj_download_file update the cache for the downloaded grid?

@rouault

rouault commented Feb 12, 2025

Or, alternatively, have proj_download_file update the cache for the downloaded grid?

Oh yes, proj_download_file should definitely invalidate lookupedFiles. Please file an OSGeo/PROJ issue about that.

@j-carson

It appears that the PROJ context caches the results of its file lookups. That cache gets reset when the search paths are set.

I verified that calling set_data_dir resets it, and the Transformer is then populated correctly:

Thanks for the work-around!

@snowman2 snowman2 pinned this issue Feb 17, 2025