Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Response Parsing Protocol Tests #3247

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from
41 changes: 33 additions & 8 deletions botocore/parsers.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@
from botocore.compat import ETree, XMLParseError
from botocore.eventstream import EventStream, NoInitialResponseError
from botocore.utils import (
ensure_boolean,
is_json_value_header,
lowercase_dict,
merge_dicts,
Expand Down Expand Up @@ -201,6 +202,10 @@ class ResponseParser:

DEFAULT_ENCODING = 'utf-8'
EVENT_STREAM_PARSER_CLS = None
# This is a list of known values for the "location" key in the
# serialization dict. The location key tells us where in the response
# to parse the value. Locations not in this list will be ignored.
KNOWN_LOCATIONS = ('statusCode', 'header', 'headers')

def __init__(self, timestamp_parser=None, blob_parser=None):
if timestamp_parser is None:
Expand Down Expand Up @@ -356,14 +361,21 @@ def _has_unknown_tagged_union_member(self, shape, value):
if shape.is_tagged_union:
cleaned_value = value.copy()
cleaned_value.pop("__type", None)
cleaned_value = {
k: v for k, v in cleaned_value.items() if v is not None
}
Comment on lines +364 to +366
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how we're getting to a state where we have None values arriving here. Do we have an example of what that case might look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rust SDK has experienced issues (aws-sdk-rust#1095) when parsing responses from services like quicksight who populate additional union members with null values. Although the quicksight case doesn't specifically affect boto3 due to differences in modeled shape between Smithy and C2J, it’s possible for any other service to send us this type of response for a union structure.

Related Protocol Test Id: AwsJson10DeserializeAllowNulls

if len(cleaned_value) != 1:
error_msg = (
"Invalid service response: %s must have one and only "
"one member set."
)
raise ResponseParserError(error_msg % shape.name)
tag = self._get_first_key(cleaned_value)
if tag not in shape.members:
serialized_member_names = [
shape.members[member].serialization.get('name', member)
for member in shape.members
]
if tag not in serialized_member_names:
Comment on lines +374 to +378
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When parsing union structure members, botocore will use the raw member name from the HTTP response to determine if the member is know or not. If unknown, the union is populated with the following member:

“SDK_UNKNOWN_MEMBER”: {
    “name”: “<unknown member name>”
}

Botocore fails to consider union members modeled with a locationName which is used to provide an alias for a member’s name.

Solution: When determining if a union member is know, we should consider serialized names for members modeled with the locationName trait.

Realted Protocol Test Ids: PostUnionWithJsonNameResponse1, PostUnionWithJsonNameResponse2

msg = (
"Received a tagged union response with member "
"unknown to client: %s. Please upgrade SDK for full "
Expand Down Expand Up @@ -427,11 +439,12 @@ def _handle_structure(self, shape, node):
return self._handle_unknown_tagged_union_member(tag)
for member_name in members:
member_shape = members[member_name]
location = member_shape.serialization.get('location')
if (
'location' in member_shape.serialization
location in self.KNOWN_LOCATIONS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specific test case is this handling? I'm not sure how we'd introduce a new location in a model that's unhandled. This is mostly for my own understanding of what we defined in the protocol tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request object members can be modeled with location values that aren't applied to response parsing such as uri and querystring. In the case where the same structure is modeled for the input and output of an operation and one of location values mentioned above is modeled, we would incorrectly assume the member has already been handled and fail to parse the related member.

Solution: Add a list (KNOWN_LOCATIONS = ('statusCode', 'header', 'headers') of know locations used to determine if a member is parsed in a location that is not the body. We'll only skip the member if one of these locations is specified, preventing the case where we unintentionally skip when "location": "uri" is specified.

Related Protocol Test Id: IgnoreQueryParamsInResponse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I'm just reading this wrong, but what you wrote doesn't make sense to me.

Request object members can be modeled with location values that aren't applied to response parsing such as uri and querystring.

This seems to contradict what you say here:

We'll only skip the member if one of these locations is specified, preventing the case where we unintentionally skip when "location": "uri" is specified.

So we do want to parse members that are in the uri after all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a reason we aren't doing this here where we handle the rest of the locations? Is this somehow specific to XML parsing?

Copy link
Contributor Author

@jonathan343 jonathan343 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uri and querystring locations are specific to input serialization (handled in serialize.py) and shouldn't be handled by our parsers. We shouldn't care about these locations when parsing responses (in parsers.py), but we do today. This changes is defining a list of the location values we do care about: KNOWN_LOCATIONS = ('statusCode', 'header', 'headers').

Lmk if you need further clarification!

Copy link
Contributor

@SamRemis SamRemis Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this makes more sense now. To give an example for anyone else confused by this, here's my current understanding:

  1. These traits are specific to XML parsing, which is why they were put here. This still needs to be confirmed.
  2. This solves an extremely specific case. Best way to describe it is an example:

Let's say a service team models a FooShape and uses that shape as a top level input and a top level output on an operation. If FooShape has a member called BarString that targets the querystring, we want to serialize that on the input by adding ?barstring=<input value> to the query string; this all makes sense and is in line with how many of our APIs behave.

Since the service has modeled FooShape as both the input and output, we need to have some way to parse the result for BarString that the service returns. But since the query string is controlled entirely by the requester, the service can't reply with a value there, so instead they put it in the body. This code ensures that BarString won't get skipped during parsing and instead will be returned to the customer.

Note: Jonathan is looking into this more to confirm all of the above and make sure this really makes sense to add.

or member_shape.serialization.get('eventheader')
):
# All members with locations have already been handled,
# All members with known locations have already been handled,
# so we don't need to parse these members.
continue
xml_name = self._member_key_name(member_shape, member_name)
Expand Down Expand Up @@ -707,6 +720,8 @@ def _do_error_parse(self, response, shape):
# code has a couple forms as well:
# * "com.aws.dynamodb.vAPI#ProvisionedThroughputExceededException"
# * "ResourceNotFoundException"
if ':' in code:
code = code.split(':', 1)[0]
Comment on lines +723 to +724
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now 3 distinct if statements here that aren't using elif/else. Is the intent to continually mutate the code value as it goes through each check?

e.g.

com.test.mypackage#this:that#1 -- split(':', 1) --> com.test.mypackage#this
com.test.mypackage# -- split('#', 1) --> this

Then this is evaluated for the x-amzn-query-error code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that's correct. Smithy provides the following examples:

All of the following error values resolve to FooError:

FooError
FooError:http://internal.amazon.com/coral/com.amazon.coral.validate/
aws.protocoltests.restjson#FooError
aws.protocoltests.restjson#FooError:http://internal.amazon.com/coral/com.amazon.coral.validate/

if '#' in code:
code = code.rsplit('#', 1)[1]
if 'x-amzn-query-error' in headers:
Expand Down Expand Up @@ -1020,19 +1035,29 @@ def _inject_error_code(self, error, response):
# The "Code" value can come from either a response
# header or a value in the JSON body.
body = self._initial_body_parse(response['body'])
code = None
if 'x-amzn-errortype' in response['headers']:
code = response['headers']['x-amzn-errortype']
# Could be:
# x-amzn-errortype: ValidationException:
code = code.split(':')[0]
error['Error']['Code'] = code
elif 'code' in body or 'Code' in body:
error['Error']['Code'] = body.get('code', body.get('Code', ''))
code = body.get('code', body.get('Code', ''))
if code is not None:
if ':' in code:
code = code.split(':', 1)[0]
if '#' in code:
code = code.rsplit('#', 1)[1]
error['Error']['Code'] = code

def _handle_boolean(self, shape, value):
return ensure_boolean(value)

def _handle_integer(self, shape, value):
return int(value)

def _handle_float(self, shape, value):
return float(value)

_handle_long = _handle_integer
_handle_double = _handle_float


class RestXMLParser(BaseRestParser, BaseXMLResponseParser):
Expand Down
Loading
Loading