Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTC-3084: Add new endpoints that proxy to geostore microservice (for now) #621

Merged
merged 47 commits into from
Jan 28, 2025

Conversation

dmannarino
Copy link
Member

@dmannarino dmannarino commented Jan 17, 2025

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Issue Number: GTC-3084

What is the new behavior?

  • There are new routes which mimic and proxy to those of the RW geostore microservice, with Pydantic models to support them. The only mildly complicated routes are those to that conflict with previously existing GFW Data API routes, specifically creating a geostore and those getting a geostore by ID. Fortunately for the former the payload differs, and in the case of the latter the IDs are slightly different (GUIDs for RW, UUIDs for GFW). Tests have been added to ensure that in these cases the correct branch/function is chosen, though there are not comprehensive tests for the purely proxied routes.

Does this introduce a breaking change?

  • Yes
  • No

Other information

Gary made a great ticket with more context which can be found here: https://gfw.atlassian.net/browse/GTC-3084

@codecov-commenter
Copy link

codecov-commenter commented Jan 17, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 62.50000% with 48 lines in your changes missing coverage. Please review.

Project coverage is 80.67%. Comparing base (73d9f44) to head (c61a2d0).
Report is 13 commits behind head on develop.

Files with missing lines Patch % Lines
app/utils/rw_api.py 21.42% 33 Missing ⚠️
app/routes/geostore/geostore.py 62.50% 15 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #621      +/-   ##
===========================================
- Coverage    81.10%   80.67%   -0.44%     
===========================================
  Files          131      131              
  Lines         5897     6014     +117     
===========================================
+ Hits          4783     4852      +69     
- Misses        1114     1162      +48     
Flag Coverage Δ
unittests 80.67% <62.50%> (-0.44%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmannarino dmannarino marked this pull request as ready for review January 21, 2025 17:30
Copy link
Contributor

@gtempus gtempus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on getting tests worked in here, too! 💪
I just had some basic questions/ideas for ya.

Thanks, @dmannarino!

async def create_rw_geostore(
payload: RWGeostoreIn, x_api_key: str | None = None
) -> RWGeostoreResponse:
url = f"{RW_API_URL}/v1/geostore"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be v2, right?

Suggested change
url = f"{RW_API_URL}/v1/geostore"
url = f"{RW_API_URL}/v2/geostore"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be v2, right?

Hmm, I've never tried with v2. The original ticket didn't say, and I got the v1 from https://resource-watch.github.io/doc-api/reference.html#overview-of-available-endpoints

I just tried, and they both work. Seemingly identical results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with your current implementation, @dmannarino. We want bug-for-bug compatibility right now.

I looked at the code for the above v1 endpoints and compared with v2. They are different.
But, I looked at Flagship's usage of the create endpoint. It looks like it just calls /geostore with no version and that seems to make the microservice default to v1. I could not find evidence of calculate area being called.

We'll need to reconcile the v1 vs v2 endpoints at a later date.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for checking that out!

async def calc_area(
payload: Dict, x_api_key: str | None = None
) -> RWCalcAreaForGeostoreResponse:
url = f"{RW_API_URL}/v1/geostore/area"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be v2, right?

Suggested change
url = f"{RW_API_URL}/v1/geostore/area"
url = f"{RW_API_URL}/v2/geostore/area"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be v2, right?

Same as the create geostore endpoint (v1 and v2 both work).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in previous comment: #621 (comment)

Comment on lines 103 to 105
# Checking for hyphens to see if the provided ID is a GUID or a UUID
# is a bit iffy. Happy to hear better ideas.
if "-" in geostore_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. RW API MongoDB uses ObjectId (24 hex digits). Luckily, that's different from the postgres uuid (32 hex digits plus hyphens).

One example could be:

from bson.objectid import ObjectId
import uuid

def check_id_type(value):
    try:
        ObjectId(value)
        return "ObjectId"
    except (TypeError, ValueError):
        try:
            uuid.UUID(value)
            return "UUID"
        except ValueError:
            return "Neither ObjectId nor UUID"

# Example usage
value1 = "60c0a4754483e60015000000"
value2 = "c78f40f4-3516-4099-8676-85507701973f"
value3 = "not an id"

print(check_id_type(value1))  # Output: ObjectId
print(check_id_type(value2))  # Output: UUID
print(check_id_type(value3))  # Output: Neither ObjectId nor UUID

bson is in the pymongo package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, and code snippet!

Copy link
Member Author

@dmannarino dmannarino Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it sure looks like the get geostore endpoint is not using ObjectIds. I have tried instantiating an ObjectId with a bunch of valid identifiers and always get
ObjectId("ca38fa80a4ffa9ac6217a7e0bf71e6df") InvalidId: 'ca38fa80a4ffa9ac6217a7e0bf71e6df' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string

If I try turning it into a bytes object first with unhexlify I get this rather confusing error:
In [17]: foo = unhexlify("ca38fa80a4ffa9ac6217a7e0bf71e6df") In [18]: ObjectId(foo) TypeError: id must be an instance of (bytes, str, ObjectId), not <class 'bytes'>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad! 🤦 I was thinking about the actual document ID this whole time and not the hash. It doesn't help that the gfw-geostore-api source refers to the hash as an ID frequently.

You're totally right, both are uuids.

Comment on lines 224 to 230
async with AsyncClient() as client:
if x_api_key is not None:
response: HTTPXResponse = await client.get(
url, headers={"x-api-key": x_api_key}
)
else:
response = await client.get(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you experiment with generalizing this at all to cut down on duplication? Just curious?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about writing a proxy_to_rw function to reduce duplication, but I wanted to see what they looked like first. And then I ran out of time. However I also suspected that a proxy function would lose value as we started modifying the routes to extend them. I tried to keep the pattern similar though, to make refactoring in the future easy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you should probably make a common function for all the get proxies out of lines 220-231, excluding line 229 (as Gary is asking) That is, the common function issues the request with api-key or not, and throws an error if status code is not 200. The only difference between all the GET proxies is the URL and the response type.

@jterry64
Copy link
Member

@dmannarino @gtempus this PR is a little surprising to me, maybe I didn't fully understand the needs for this proxy. What are all the places this will need to be used? Will these endpoints be public? It does feel like we're suddenly adding the API structure of RW to Data API. What do we do to support endpoints like land use (which are likely very out of date) if they need to be updated? Maybe discuss in standup.

Comment on lines 224 to 230
async with AsyncClient() as client:
if x_api_key is not None:
response: HTTPXResponse = await client.get(
url, headers={"x-api-key": x_api_key}
)
else:
response = await client.get(url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you should probably make a common function for all the get proxies out of lines 220-231, excluding line 229 (as Gary is asking) That is, the common function issues the request with api-key or not, and throws an error if status code is not 200. The only difference between all the GET proxies is the URL and the response type.

Comment on lines 9 to 21
class LandUseType(str, Enum):
fiber = "fiber"
logging = "logging"
# mining = "mining" # Present in the docs, but yields a 404
oilpalm = "oilpalm"
tiger_conservation_landscapes = "tiger_conservation_landscapes"


class LandUseTypeUseString(str, Enum):
fiber = "gfw_wood_fiber"
logging = "gfw_logging"
oilpalm = "gfw_oil_palm"
tiger_conservation_landscapes = "tcl"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having enums for land use will not work since the param is open-ended in the gfw-geostore-api microservice.

I quick look at the geostore logs for the past 10 days shows that default is frequently used with values such as:

  • river_basins
  • biodiversity_hotspots
  • endemic_bird_areas

to name a few.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for re-discovering/remembering this! It's fixed now, I'm just passing through the string for land_use as-is. Just tested with these examples in dev!

@dmannarino dmannarino merged commit 986405d into develop Jan 28, 2025
2 checks passed
@dmannarino dmannarino deleted the gtc-3084_new_geostore_endpts branch January 28, 2025 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants