
Bad error message when GUNW file missing in S3 bucket #648

Closed
jhkennedy opened this issue May 8, 2024 · 8 comments · Fixed by #658
Labels
bug Something isn't working

Comments

@jhkennedy
Collaborator

jhkennedy commented May 8, 2024

When running RAiDER in HyP3 independently against a previous INSAR_ISCE job, RAiDER looks for a GUNW product in the provided S3 bucket and job_id prefix. However, there are cases where there won't be a GUNW product in the bucket, such as:

  • the INSAR_ISCE job failed on the DockerizedTopsApp step and so no GUNW was produced
  • the ARIA_RAIDER job was provided a job_id for a different job type which does not produce a GUNW

When this happens, aws.get_s3_file returns None here:
https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L523
and then either:

  1. for the HRRR model, the None gets converted to the string 'None', which becomes the GUNW ID (illustrated just after this list):
    https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L525-L527
    and this line
    https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L528
    raises an exception like:

    Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/opt/conda/envs/RAiDER/lib/python3.12/site-packages/RAiDER/cli/__main__.py", line 44, in <module>
      main()
    File "/opt/conda/envs/RAiDER/lib/python3.12/site-packages/RAiDER/cli/__main__.py", line 40, in main
      process_entry_point.load()()
    File "/opt/conda/envs/RAiDER/lib/python3.12/site-packages/RAiDER/cli/raider.py", line 539, in calcDelaysGUNW
      if not RAiDER.aria.prepFromGUNW.check_hrrr_dataset_availablity_for_s1_azimuth_time_interpolation(gunw_id):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/envs/RAiDER/lib/python3.12/site-packages/RAiDER/aria/prepFromGUNW.py", line 61, in check_hrrr_dataset_availablity_for_s1_azimuth_time_interpolation
      ref_acq_time = _get_acq_time_from_gunw_id(gunw_id, 'reference')
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/envs/RAiDER/lib/python3.12/site-packages/RAiDER/aria/prepFromGUNW.py", line 36, in _get_acq_time_from_gunw_id
      date_tokens = tokens[6].split('_')
                    ~~~~~~^^^
    IndexError: list index out of range
    
  2. for other weather models, it likely fails in a similarly opaque way in these lines:
    https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L533-L542
    either when checking the weather model availability or trying to load the GUNW ingest JSON file. If somehow it manages to get through all that, then it will raise this exception, which still doesn't explain the problem adequately:
    https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L544-L545
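As a short illustration of the first failure mode (the variable names here are hypothetical; the real code builds the GUNW ID from the file path that aws.get_s3_file returns):

# What aws.get_s3_file hands back when the GUNW .nc object is missing
nc_file_path = None

# Building the GUNW ID from that path yields the literal string 'None'...
gunw_id = str(nc_file_path).replace('.nc', '')  # -> 'None'

# ...which can't be split like a real GUNW ID, hence the IndexError above
gunw_id.split('-')[6]  # IndexError: list index out of range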

Ideally, an exception would be thrown like:

if iargs.file is None:
    raise ValueError(f'GUNW product file could not be found in s3://{iargs.bucket}/{iargs.bucket_prefix}')

here:
https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L523

And one like:

if json_file_path is None:
    raise ValueError(f'GUNW metadata file could not be found in s3://{iargs.bucket}/{iargs.bucket_prefix}')

here:
https://github.com/jhkennedy/RAiDER/blob/dev/tools/RAiDER/cli/raider.py#L539
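Alternatively, both checks could share a single guard. A minimal sketch, assuming aws.get_s3_file keeps returning None for a missing object (the helper name and its exact signature are hypothetical):

def get_s3_file_or_raise(bucket: str, prefix: str, suffix: str) -> str:
    """Fetch a matching file from S3, raising a clear error when it is missing."""
    path = aws.get_s3_file(bucket, prefix, suffix)
    if path is None:
        raise ValueError(f'No file matching *{suffix} could be found in s3://{bucket}/{prefix}')
    return path

Either way, the message should name the bucket and prefix that were searched, so the user can tell that the referenced job simply never produced a GUNW.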

jhkennedy added the bug label on May 8, 2024
garlic-os added a commit to garlic-os/RAiDER that referenced this issue Jun 26, 2024
@garlic-os
Contributor

Hi, I'm working on a PR to add your improved error messages, and right now I'm writing tests to make sure they are triggered instead of the old errors from now on. Can you please send steps to reproduce the error so I can put them into the tests?

@jlmaurer
Collaborator

jlmaurer commented Jul 2, 2024

@bbuzz31 do you have an example yaml that calls a GUNW product that would work for testing this? I think the only thing we really need is a yaml that references a GUNW product directly.

@bbuzz31
Collaborator

bbuzz31 commented Jul 2, 2024

@cmarshak can you chime in? I'm useless with HyP3

@cmarshak
Collaborator

cmarshak commented Jul 3, 2024

This happens in the cloud and can't be reproduced exactly locally; tests for this would be mocked (see the GUNW tests for examples).

I wouldn't recommend doing this right now; Joe did an excellent job documenting the roadmap to fix this.
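For reference, a rough mocked-test sketch along those lines (the patch target, flag names, and the way calcDelaysGUNW accepts an argument list are assumptions here; a real test should follow the existing GUNW test fixtures):

from unittest import mock

import pytest

from RAiDER.cli import raider


def test_missing_gunw_raises_clear_error():
    # Simulate the GUNW .nc object being absent from the bucket, so
    # aws.get_s3_file returns None as described in the issue
    with mock.patch('RAiDER.cli.raider.aws.get_s3_file', return_value=None):
        with pytest.raises(ValueError, match='could not be found'):
            raider.calcDelaysGUNW([
                '--bucket', 'dummy-bucket',
                '--input-bucket-prefix', 'dummy-prefix',
                '--weather-model', 'HRRR',
            ])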

@jhkennedy
Collaborator Author

@garlic-os to see the happy path, you can use this ARIA GUNW job for the next 5 days (expires 2024-07-07T00:00:00+00:00)

You can list the files for this job like:

aws s3 ls --no-sign-request s3://hyp3-tibet-jpl-contentbucket-81rn23hp7ppf/53ad52c3-51fa-43a9-af9c-5cd3d1481437/

(After it expires, you can have @cmarshak show you how to run/find new ones)

Likewise, for the failure case, you can use this ARIA GUNW job since it's expired and the GUNW .nc and .json files have been deleted, but the browse image remains.

aws s3 ls --no-sign-request s3://hyp3-tibet-jpl-contentbucket-81rn23hp7ppf/668d2ef5-b2a5-4fa7-9d86-9b88a864d824/

But yes, for adding an integration test, @cmarshak is right; you'll want to mock similar to the GUNW tests or stage test cases in a testing bucket somewhere.

@garlic-os
Contributor

Thanks for the info, folks. I'm getting a 401 Not Authorized on both those job links. Is there something I need to do to authenticate?

@jhkennedy
Collaborator Author

Yes, you need an ASF cookie, which you can get by signing into Vertex with an Earthdata Login. You can also use the job UUID (bucket prefix) and the HyP3 SDK.
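For example, a minimal HyP3 SDK lookup (the deployment API URL here is a guess based on the bucket name; the job UUID is the one from the listing above):

import hyp3_sdk

# Prompts for your Earthdata Login credentials
hyp3 = hyp3_sdk.HyP3('https://hyp3-tibet-jpl.asf.alaska.edu', prompt=True)
job = hyp3.get_job_by_id('53ad52c3-51fa-43a9-af9c-5cd3d1481437')

# A succeeded job's files point at the same S3 objects as the listing above
print(job.files)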

I think you only really need the info in the S3 path: --bucket and UUID for --input-bucket-prefix.

Note: RAiDER will bonk when trying to upload the final product to --bucket and --prefix, as you won't have permission for that unless you set up an S3 bucket specifically for testing.

@garlic-os
Contributor

Thanks again for your advice everyone. I have code + tests ready in #658 now, and I went ahead and marked that PR ready for review.
