From 3042c980fe76685cb2d98934ccf3ae45e7e7da9a Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Wed, 21 Dec 2022 04:10:03 +0000 Subject: [PATCH 01/15] commit RFC --- ...c0023_serialization_and_deserialization.md | 349 ++++++++++++++++++ 1 file changed, 349 insertions(+) create mode 100644 design/src/rfcs/rfc0023_serialization_and_deserialization.md diff --git a/design/src/rfcs/rfc0023_serialization_and_deserialization.md b/design/src/rfcs/rfc0023_serialization_and_deserialization.md new file mode 100644 index 0000000000..a1445d6140 --- /dev/null +++ b/design/src/rfcs/rfc0023_serialization_and_deserialization.md @@ -0,0 +1,349 @@ +RFC: Serialization and Deserialization +============= + +> Status: RFC +> +> Applies to: Output, Input and Builder types as well as `DateTime`, `Document`, `Blob` and `Number` implemented in `aws_smithy_types` crate. + +# Terminology +- Builder + Refers to data types prefixed with `Builder`, which converts itself into a corresponding data type upon being built. e.g. `aws_sdk_dynamodb::input::PutItemInput`. +- serde + Refers to `serde` crate. +- `Serialize` + Refers to `Serialize` trait avaialble on `serde` crate. +- `Deserialize` + Refers to `Deserialize` trait available on `serde` crate. + + +# Overview +We are going to implement Serialize and Deserialize traits from `serde` crate to some data types. +Data types that are going to be affected are; +- builder data types +- operation `Input` types +- operation `Output` types +- data types that builder types may have on its field(s) +- `aws_smithy_types::DateTime` +- `aws_smithy_types::Document` +- `aws_smithy_types::Blob` +- `aws_smithy_types::Number` + +`DateTime` and `Blob` implements different serialization/deserialization format for human-readable and non-human readable format; We must emphasize that these 2 formats are not compatible to each other. Reason for this is explained at the [Blob](#blob) section and [Date Time](#datetime). + +Additionally, we add `fn set_fields` to fluent builders to allow users to set the data they deserialized to fluent builders. + +Lastly, we emphasize that this RFC does NOT aim to serialize the entire response or request or implement `serde` traits on data types for server-side code. + +# Use Case +Users have requested `serde` traits to be implemented on data types implemented in rust SDK. +We have created this RFC with the following use cases in mind. +1. [[request]: Serialize/Deserialize of models for Lambda events #269](https://github.com/awslabs/aws-sdk-rust/issues/269) +2. [Tests](https://awslabs.github.io/smithy-rs/design/faq.html#why-dont-the-sdk-service-crates-implement-serdeserialize-or-serdedeserialize-for-any-types) as suggested in the design FAQ. +3. Building tools + +# Implementation +## Smithy Types +`aws_smithy_types` is a crate that implements smithy's data types. +These data types must implement serde traits as well since SDK uses the data types. + +### Blob +`Serialize` and `Deserialize` is not implemented with derive macro. + +In human-readable format, `Blob` is serialized as a base64 encoded string and any data to be deserialized as this data type must be encoded in base 64. +Encoding must be carried out by `base64::encode` function available from `aws_smithy_types` crate. +Non-human readable format serializes `Blob` with `fn serialize_bytes`. + +- Reason behind the implementation of human-readable format + +`aws_smithy_types` crate comes with functions for encoding/decoding base 64, which makes the implementation simpler. +Additionally, AWS CLI and AWS SDK for other languages require data to be encoded in base 64 when it requires `Blob` type as input. + +We also considered serializing them with `serialize_bytes`, without encoding them with `serialize_bytes`. +In this case, the implementation will depend on the implementation of the library author. + +There are many different crates, so we decided to survey how some of the most popular crates implement this feature. + +| library | version | implementation | all time downloads on crate.io as of writing (Dec, 2022) | +| ---------- | ------- | --------------- | -------------------------------------------------------- | +| serde_json | 1.0 | Array of number | 109,491,713 | +| toml | 0.5.9 | Array of number | 63,601,994 | +| serde_yaml | 0.9.14 | Unsupported | 23,767,300 | + +First of all, bytes could have hundreds of elements; reading an array of hundred of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. +Additionally, it has come to our attention that some crates just doesn't support it, which would hinder users ability to be productive and tie users' hand. + +For the reasons described above, we believe that it is crucial to encode them to string and base64 is favourable over other encoding schemes such as base 16, 32, or Ascii85. + +- Reason behind the implementation of non-human readable format +We considered using the same logic for non-human readable format as well. +However, readable-ness is not necessary for non-human readable format. +Additionally, non-human readable format tends to emphasize resource efficiency over human-readable format; Base64 encoded string would take up more space, which is not what the users would want. + +Thus, we believe that implementing a tailored serialization logic would be beneficial to the users. + + +### DateTime +`Serialize` and `Deserialize` is not implemented with derive macro. +For human-readable format, `DateTime` is serialized in RFC-3339 format; +It expects the value to be in RFC-3339 format when it is Deserialized. + +Non-human readable implements `DateTime` as a tuple of `u32` and `i64`; the latter corresponds to `seconds` field and the first is the `seubsecond_nanos`. + +- Reason behind the implementation of a human-readable format + +For serialization, `DateTime` format already implements a function to encode itself into RFC-3339 format. +For deserialization, it is possible to accept other formats, we can add this later if we find it reasonable. + +- Reason behind the implementation of a non-human readable format + +Serializing them as tuples of two integers results in a smaller data size and requires less computing power than any string-based format. +Tuple will be smaller in size as it does not require tagging like in maps. + +### Document +`Serialize` and `Deserialize` is implemented with derive macro. +Additionally, it implements container attribute `#[serde(untagged)]`. +Serde can distinguish each variant without tagging thanks to the difference in each variant's datatypes. + +### Number +`Serialize` and `Deserialize` is implemented with derive macro. +Additionally, it implements container attribute `#[serde(untagged)]`. + +Serde can distinguish each variant without a tag as each variant's content is different. + +## Builder Types and Non-Builder Types +Builder types and non Builder types implement `Serialize` and `Deserialize` with derive macro. +Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, user must enable `unstable` until the stablization of this RFC. + +```rust +#[allow(missing_docs)] // documentation missing in model +#[cfg_attr( + all(feature = "unstable", feature = "serialize"), + derive(serde::Serialize) +)] +#[cfg_attr( + all(feature = "unstable", feature = "deserialize"), + derive(serde::Deserialize) +)] +#[non_exhaustive] +#[derive(std::clone::Clone, std::cmp::PartialEq)] +pub struct UploadPartCopyOutput { + ... +} +``` + +## Enum Representation +`serde` allows programmers to use one of four different tagging ([internal, external, adjacent and untagged](https://serde.rs/enum-representations.html)) when serializing an enum. +### untagged + You cannot deserialize serialized data in some cases. + For example, [aws_sdk_dynamodb::model::AttributeValue](https://docs.rs/aws-sdk-dynamodb/latest/aws_sdk_dynamodb/model/enum.AttributeValue.html) has `Null(bool)` and `Bool(bool)`, which you cannot distinguish serialized values without a tag. + + +### internal + This results in compile time error. + [*Using a #[serde(tag = "...")] attribute on an enum containing a tuple variant is an error at compile time*](https://serde.rs/enum-representations.html). + +### external and adjacent +We are left with `external` and `adjacent` tagging. +External tagging is the default way. +This RFC can be achieved either way. + +The resulting size of the serialized data is smaller when tagged externally, as adjacent tagging will require a tag even when a variant has no content. + +For the reasons mentioned above, we implement an enum that is externally tagged. + +## Data Types to Skip +We are going to skip serialization and deserialization of fields that have the following datatypes. + +- `aws_smithy_http::byte_stream::ByteStream` +- `aws_smithy_http::event_stream::Receiver` +- `aws_smithy_http::event_stream::EventStreamSender` + +Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serilaization/de-serialization")]`. + +Here are some examples of data types affected by this decision: +- `aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription` +- `aws_sdk_s3::input::put_object_input::PutObjectInput` + +We considered serializing them as bytes, however, it could take some time for a stream to reach the end, and the resulting serialized data may be too big for itself to fit into the ram. +Additionally, those data types are sometimes used to represent bi-directional data transfer, which is not serializable. + +Here is an example of struct with a field which comes with custom serialization/de-serialization logic. +```rust +#[allow(missing_docs)] +#[cfg_attr( + all(feature = "unstable", feature = "serialize"), + derive(serde::Serialize) +)] +#[cfg_attr( + all(feature = "unstable", feature = "deserialize"), + derive(serde::Deserialize) +)] +#[non_exhaustive] +#[derive(std::fmt::Debug)] +pub struct ExampleStreamTranscriptionOutput { + #[cfg_attr( + any( + all(feature = "unstable", feature = "deserialize"), + all(feature = "unstable", feature = "serialize") + ), + serde( + skip, + default = "aws_smithy_http::event_stream::Receiver::deserialized_receiver" + ) + )] + pub transcript_result_stream: aws_smithy_http::event_stream::Receiver< + crate::model::ExampleTranscriptResultStream, + crate::error::ExampleTranscriptResultStreamError, + > +} +``` + +## `Serde` traits implemented on Builder of Output Types +Output data, such as `aws_sdk_dynamodb::output::UpdateTableOutput` has builder types. +These builder types are available to users, however, no API requires users to build data types by themselves. + +We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. +We conclude that we must avoid such a technical challenge to bring this RFC to life. + +## `fn set_fields` to allow users supply externally created `Input`s + +SDK does not have a method that allows users supply deserialized inputs. +Thus, we add a new method `fn set_fields` to `Client` types. +This method accepts inputs and replace all parameters that `Client` has with the new one. + +```rust +pub fn set_fields(mut self, builder: path::to::builder_type) -> path::to::builder_type { + self.inner = new_parameter; + self +} +``` + +# Other Concerns +## Model evolution +SDK will introduce new fields and it is possible that we may see new data types in future. + +We believe that this will not be a problem. + +### Introduction of New Fields +Most fields are `Option` type. +When user de-serializes data written for a format before the new fields were introduced, new field will be assigned with `None` type. + +If a field isn't `Option`, `serde` uses `Default` trait unless a custom de-serialization/serialization is specified to generate data to fill the field. +If the new field is not an `Option` type and has no `Default` implementation, we must implement a custom de-serialization logic. + +In case of serilization, introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC. + +## Introduction of New Data Type +If a new field introduces a new data type, it will not require any additional work if the data type can derive `serde` traits. + +If the data cannot derive `serde` traits on it's own, then we have two options. +To clarify, this is the same approach we took on `Data Type to skip` section. +1. skip + We will simply skip serializing/de-serializing. However, we may need to implement custom serialization/de-serialization logic if a value is not wrapped with `Option`. +2. custom serialization/de-serialization logic + We can implement tailored serialization/de-serialization logic. + +Either way, we will mention this on the generated docs to avoid suprising users. + +e.g. +```rust +#[derive(serde::Serialize, serde::Deserialize)] +struct OutputV1 { + string_field: Option +} + +#[derive(serde::Serialize, serde::Deserialize)] +struct OutputV2 { + string_field: Option, + // this will always be treated as None value by serde + #[serde(skip)] + skip_not_serializable: Option, + // We can implement a custom serialization logic + #[serde(serialize_with = "custom_serilization_logic", deserialize_with = "custom_deserilization_logic")] + not_derive_able: SomeComplexDataType, + // Serialization will be skipped, and de-serialization will be handled with the function provided on default tag + #[serde(skip, default = "default_value")] + skip_with_custom: DataTypeWithoutDefaultTrait, +} +``` +# Discussions + +## Serialization and de-serialization support for an entire response/request +The problem with serialization/de-serialization of an entire response/request the lack of data type that can be mapped to `serde`'s data model field by field. + +Currently, SDK has no data type that represents an entire response or request that can be mapped to `serde`'s data model; Thus, you must introduce a schema and implement logics that allows users to serialize/de-serialize their data. + +Although this RFC does not solve this issue, we believe that this RFC will help future contirbutor who wishes to implement serialization and de-serialization support for an entire response/request. + + +## Sensitive Information +If serialized data contains sensitive information, it will not be masked. +We mention that fields can compromised such information on every struct field to ensure that users know this. + +## Compile Time +We ran the following benchmark on C6a.2xlarge instance with 50gb of GP2 SSD. +The commit hash of the code is a8e2e19129aead4fbc8cf0e3d34df0188a62de9f. + +It clearly shows an increase in compile time. +Users are advised to consider the use of software such as [sccache](https://github.com/mozilla/sccache) or [mold](https://github.com/rui314/mold) to reduce the compile time. + +- `aws-sdk-dynamodb` + +| command | real time | user time | sys time | +| ----------------------------------------------------------- | --------- | --------- | --------- | +| cargo build | 0m35.728s | 2m24.243s | 0m11.868s | +| cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | +| cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | +| cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s | +| cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | +| cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s | +| cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | +| cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s | + +- `aws-sdk-ec2` + +| command | real time | user time | sys time | +| ----------------------------------------------------------- | --------- | ---------- | --------- | +| cargo build | 1m20.041s | 2m14.592s | 0m6.611s | +| cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s | +| cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s | +| cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s | +| cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s | +| cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s | +| cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s | +| cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s | + + +## Misleading Results +SDK team previously expressed concern that serialized data may be misleading. +We believe that features implemented as part of this RFC does not produce misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. + +# Feature Gate +`Serde` traits are implemented behind feature gates. +`Serialize` is implemented behind `serialize`, while `Deserialize` is implemented behind `deserialize`. +Users must enable the `unstable` feature to expose those features. + +We considered giving each feature a dedicated feature gate such as `unstable-serde-serialize`. +In this case, we will need to change the name of feature gates entirely once it leaves the unstable status which will cause users to make changes to their code base. +We conclude that this brings no benefit to the users. + +## Keeping both features behind the same feature gate +We considered keeping both features behind the same feature gate. +There is no significant difference in the complexity of implementation. +We do not see any benefit in keeping them behind a same feature-gate as this will only result in increase of compile time when users are not in need of one of the feature. + +## Different feature gates for different data types +We considered implementing different feature gates for output, input and their corresponding data types. +For example, output and input types can have `output-serde-*` and `input-serde-*`. +We are unable to do this as relevant meta data is not available during the code-gen. + + +Changes checklist +----------------- +- [ ] Implement human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` +- [ ] Implement non-human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` +- [ ] Implement `Serialize` and `Deserialize` for relevant data types in `aws_smithy_types` +- [ ] Modify Kotlin's codegen so that generated Builder and non-Builder types implement `Serialize` and `Deserialize` +- [ ] Add feature gate for `Serialize` and `Deserialize` +- [ ] Prepare examples +- [ ] Prepare reproducible compile time benchmark From 2010fbca1e286aae83ffc7d48ec84b560b87d522 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Wed, 21 Dec 2022 16:13:23 +0000 Subject: [PATCH 02/15] commit --- design/src/SUMMARY.md | 11 +- ...c0028_serialization_and_deserialization.md | 349 ++++++++++++++++++ 2 files changed, 358 insertions(+), 2 deletions(-) create mode 100644 design/src/rfcs/rfc0028_serialization_and_deserialization.md diff --git a/design/src/SUMMARY.md b/design/src/SUMMARY.md index 8067cc5a9c..29676ae455 100644 --- a/design/src/SUMMARY.md +++ b/design/src/SUMMARY.md @@ -15,10 +15,12 @@ - [Backwards Compatibility](smithy/backwards-compat.md) - [Server](./server/overview.md) + - [Middleware](./server/middleware.md) + - [Instrumentation](./server/instrumentation.md) + - [Accessing Un-modelled Data](./server/from_parts.md) + - [The Anatomy of a Service](./server/anatomy.md) - [Generating Common Service Code](./server/code_generation.md) - [Generating the Pokémon Service](./server/pokemon_service.md) - - [Instrumentation](./server/instrumentation.md) - - [RFCs](./rfcs/overview.md) - [RFC-0001: Sharing configuration between multiple clients](./rfcs/rfc0001_shared_config.md) @@ -43,6 +45,11 @@ - [RFC-0020: Service Builder Improvements](./rfcs/rfc0020_service_builder.md) - [RFC-0021: Dependency Versions](./rfcs/rfc0021_dependency_versions.md) - [RFC-0022: Error Context and Compatibility](./rfcs/rfc0022_error_context_and_compatibility.md) + - [RFC-0023: Evolving the new service builder API](./rfcs/rfc0023_refine_builder.md) + - [RFC-0024: RequestID](./rfcs/rfc0024_request_id.md) + - [RFC-0025: Constraint traits](./rfcs/rfc0025_constraint_traits.md) + - [RFC-0026: Client Crate Organization](./rfcs/rfc0026_client_crate_organization.md) + - [RFC-0028: Serialization And Deserialization](./rfcs/rfc0028_serialization_and_deserialization.md) - [Contributing](./contributing/overview.md) - [Writing and debugging a low-level feature that relies on HTTP](./contributing/writing_and_debugging_a_low-level_feature_that_relies_on_HTTP.md) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md new file mode 100644 index 0000000000..a1445d6140 --- /dev/null +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -0,0 +1,349 @@ +RFC: Serialization and Deserialization +============= + +> Status: RFC +> +> Applies to: Output, Input and Builder types as well as `DateTime`, `Document`, `Blob` and `Number` implemented in `aws_smithy_types` crate. + +# Terminology +- Builder + Refers to data types prefixed with `Builder`, which converts itself into a corresponding data type upon being built. e.g. `aws_sdk_dynamodb::input::PutItemInput`. +- serde + Refers to `serde` crate. +- `Serialize` + Refers to `Serialize` trait avaialble on `serde` crate. +- `Deserialize` + Refers to `Deserialize` trait available on `serde` crate. + + +# Overview +We are going to implement Serialize and Deserialize traits from `serde` crate to some data types. +Data types that are going to be affected are; +- builder data types +- operation `Input` types +- operation `Output` types +- data types that builder types may have on its field(s) +- `aws_smithy_types::DateTime` +- `aws_smithy_types::Document` +- `aws_smithy_types::Blob` +- `aws_smithy_types::Number` + +`DateTime` and `Blob` implements different serialization/deserialization format for human-readable and non-human readable format; We must emphasize that these 2 formats are not compatible to each other. Reason for this is explained at the [Blob](#blob) section and [Date Time](#datetime). + +Additionally, we add `fn set_fields` to fluent builders to allow users to set the data they deserialized to fluent builders. + +Lastly, we emphasize that this RFC does NOT aim to serialize the entire response or request or implement `serde` traits on data types for server-side code. + +# Use Case +Users have requested `serde` traits to be implemented on data types implemented in rust SDK. +We have created this RFC with the following use cases in mind. +1. [[request]: Serialize/Deserialize of models for Lambda events #269](https://github.com/awslabs/aws-sdk-rust/issues/269) +2. [Tests](https://awslabs.github.io/smithy-rs/design/faq.html#why-dont-the-sdk-service-crates-implement-serdeserialize-or-serdedeserialize-for-any-types) as suggested in the design FAQ. +3. Building tools + +# Implementation +## Smithy Types +`aws_smithy_types` is a crate that implements smithy's data types. +These data types must implement serde traits as well since SDK uses the data types. + +### Blob +`Serialize` and `Deserialize` is not implemented with derive macro. + +In human-readable format, `Blob` is serialized as a base64 encoded string and any data to be deserialized as this data type must be encoded in base 64. +Encoding must be carried out by `base64::encode` function available from `aws_smithy_types` crate. +Non-human readable format serializes `Blob` with `fn serialize_bytes`. + +- Reason behind the implementation of human-readable format + +`aws_smithy_types` crate comes with functions for encoding/decoding base 64, which makes the implementation simpler. +Additionally, AWS CLI and AWS SDK for other languages require data to be encoded in base 64 when it requires `Blob` type as input. + +We also considered serializing them with `serialize_bytes`, without encoding them with `serialize_bytes`. +In this case, the implementation will depend on the implementation of the library author. + +There are many different crates, so we decided to survey how some of the most popular crates implement this feature. + +| library | version | implementation | all time downloads on crate.io as of writing (Dec, 2022) | +| ---------- | ------- | --------------- | -------------------------------------------------------- | +| serde_json | 1.0 | Array of number | 109,491,713 | +| toml | 0.5.9 | Array of number | 63,601,994 | +| serde_yaml | 0.9.14 | Unsupported | 23,767,300 | + +First of all, bytes could have hundreds of elements; reading an array of hundred of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. +Additionally, it has come to our attention that some crates just doesn't support it, which would hinder users ability to be productive and tie users' hand. + +For the reasons described above, we believe that it is crucial to encode them to string and base64 is favourable over other encoding schemes such as base 16, 32, or Ascii85. + +- Reason behind the implementation of non-human readable format +We considered using the same logic for non-human readable format as well. +However, readable-ness is not necessary for non-human readable format. +Additionally, non-human readable format tends to emphasize resource efficiency over human-readable format; Base64 encoded string would take up more space, which is not what the users would want. + +Thus, we believe that implementing a tailored serialization logic would be beneficial to the users. + + +### DateTime +`Serialize` and `Deserialize` is not implemented with derive macro. +For human-readable format, `DateTime` is serialized in RFC-3339 format; +It expects the value to be in RFC-3339 format when it is Deserialized. + +Non-human readable implements `DateTime` as a tuple of `u32` and `i64`; the latter corresponds to `seconds` field and the first is the `seubsecond_nanos`. + +- Reason behind the implementation of a human-readable format + +For serialization, `DateTime` format already implements a function to encode itself into RFC-3339 format. +For deserialization, it is possible to accept other formats, we can add this later if we find it reasonable. + +- Reason behind the implementation of a non-human readable format + +Serializing them as tuples of two integers results in a smaller data size and requires less computing power than any string-based format. +Tuple will be smaller in size as it does not require tagging like in maps. + +### Document +`Serialize` and `Deserialize` is implemented with derive macro. +Additionally, it implements container attribute `#[serde(untagged)]`. +Serde can distinguish each variant without tagging thanks to the difference in each variant's datatypes. + +### Number +`Serialize` and `Deserialize` is implemented with derive macro. +Additionally, it implements container attribute `#[serde(untagged)]`. + +Serde can distinguish each variant without a tag as each variant's content is different. + +## Builder Types and Non-Builder Types +Builder types and non Builder types implement `Serialize` and `Deserialize` with derive macro. +Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, user must enable `unstable` until the stablization of this RFC. + +```rust +#[allow(missing_docs)] // documentation missing in model +#[cfg_attr( + all(feature = "unstable", feature = "serialize"), + derive(serde::Serialize) +)] +#[cfg_attr( + all(feature = "unstable", feature = "deserialize"), + derive(serde::Deserialize) +)] +#[non_exhaustive] +#[derive(std::clone::Clone, std::cmp::PartialEq)] +pub struct UploadPartCopyOutput { + ... +} +``` + +## Enum Representation +`serde` allows programmers to use one of four different tagging ([internal, external, adjacent and untagged](https://serde.rs/enum-representations.html)) when serializing an enum. +### untagged + You cannot deserialize serialized data in some cases. + For example, [aws_sdk_dynamodb::model::AttributeValue](https://docs.rs/aws-sdk-dynamodb/latest/aws_sdk_dynamodb/model/enum.AttributeValue.html) has `Null(bool)` and `Bool(bool)`, which you cannot distinguish serialized values without a tag. + + +### internal + This results in compile time error. + [*Using a #[serde(tag = "...")] attribute on an enum containing a tuple variant is an error at compile time*](https://serde.rs/enum-representations.html). + +### external and adjacent +We are left with `external` and `adjacent` tagging. +External tagging is the default way. +This RFC can be achieved either way. + +The resulting size of the serialized data is smaller when tagged externally, as adjacent tagging will require a tag even when a variant has no content. + +For the reasons mentioned above, we implement an enum that is externally tagged. + +## Data Types to Skip +We are going to skip serialization and deserialization of fields that have the following datatypes. + +- `aws_smithy_http::byte_stream::ByteStream` +- `aws_smithy_http::event_stream::Receiver` +- `aws_smithy_http::event_stream::EventStreamSender` + +Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serilaization/de-serialization")]`. + +Here are some examples of data types affected by this decision: +- `aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription` +- `aws_sdk_s3::input::put_object_input::PutObjectInput` + +We considered serializing them as bytes, however, it could take some time for a stream to reach the end, and the resulting serialized data may be too big for itself to fit into the ram. +Additionally, those data types are sometimes used to represent bi-directional data transfer, which is not serializable. + +Here is an example of struct with a field which comes with custom serialization/de-serialization logic. +```rust +#[allow(missing_docs)] +#[cfg_attr( + all(feature = "unstable", feature = "serialize"), + derive(serde::Serialize) +)] +#[cfg_attr( + all(feature = "unstable", feature = "deserialize"), + derive(serde::Deserialize) +)] +#[non_exhaustive] +#[derive(std::fmt::Debug)] +pub struct ExampleStreamTranscriptionOutput { + #[cfg_attr( + any( + all(feature = "unstable", feature = "deserialize"), + all(feature = "unstable", feature = "serialize") + ), + serde( + skip, + default = "aws_smithy_http::event_stream::Receiver::deserialized_receiver" + ) + )] + pub transcript_result_stream: aws_smithy_http::event_stream::Receiver< + crate::model::ExampleTranscriptResultStream, + crate::error::ExampleTranscriptResultStreamError, + > +} +``` + +## `Serde` traits implemented on Builder of Output Types +Output data, such as `aws_sdk_dynamodb::output::UpdateTableOutput` has builder types. +These builder types are available to users, however, no API requires users to build data types by themselves. + +We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. +We conclude that we must avoid such a technical challenge to bring this RFC to life. + +## `fn set_fields` to allow users supply externally created `Input`s + +SDK does not have a method that allows users supply deserialized inputs. +Thus, we add a new method `fn set_fields` to `Client` types. +This method accepts inputs and replace all parameters that `Client` has with the new one. + +```rust +pub fn set_fields(mut self, builder: path::to::builder_type) -> path::to::builder_type { + self.inner = new_parameter; + self +} +``` + +# Other Concerns +## Model evolution +SDK will introduce new fields and it is possible that we may see new data types in future. + +We believe that this will not be a problem. + +### Introduction of New Fields +Most fields are `Option` type. +When user de-serializes data written for a format before the new fields were introduced, new field will be assigned with `None` type. + +If a field isn't `Option`, `serde` uses `Default` trait unless a custom de-serialization/serialization is specified to generate data to fill the field. +If the new field is not an `Option` type and has no `Default` implementation, we must implement a custom de-serialization logic. + +In case of serilization, introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC. + +## Introduction of New Data Type +If a new field introduces a new data type, it will not require any additional work if the data type can derive `serde` traits. + +If the data cannot derive `serde` traits on it's own, then we have two options. +To clarify, this is the same approach we took on `Data Type to skip` section. +1. skip + We will simply skip serializing/de-serializing. However, we may need to implement custom serialization/de-serialization logic if a value is not wrapped with `Option`. +2. custom serialization/de-serialization logic + We can implement tailored serialization/de-serialization logic. + +Either way, we will mention this on the generated docs to avoid suprising users. + +e.g. +```rust +#[derive(serde::Serialize, serde::Deserialize)] +struct OutputV1 { + string_field: Option +} + +#[derive(serde::Serialize, serde::Deserialize)] +struct OutputV2 { + string_field: Option, + // this will always be treated as None value by serde + #[serde(skip)] + skip_not_serializable: Option, + // We can implement a custom serialization logic + #[serde(serialize_with = "custom_serilization_logic", deserialize_with = "custom_deserilization_logic")] + not_derive_able: SomeComplexDataType, + // Serialization will be skipped, and de-serialization will be handled with the function provided on default tag + #[serde(skip, default = "default_value")] + skip_with_custom: DataTypeWithoutDefaultTrait, +} +``` +# Discussions + +## Serialization and de-serialization support for an entire response/request +The problem with serialization/de-serialization of an entire response/request the lack of data type that can be mapped to `serde`'s data model field by field. + +Currently, SDK has no data type that represents an entire response or request that can be mapped to `serde`'s data model; Thus, you must introduce a schema and implement logics that allows users to serialize/de-serialize their data. + +Although this RFC does not solve this issue, we believe that this RFC will help future contirbutor who wishes to implement serialization and de-serialization support for an entire response/request. + + +## Sensitive Information +If serialized data contains sensitive information, it will not be masked. +We mention that fields can compromised such information on every struct field to ensure that users know this. + +## Compile Time +We ran the following benchmark on C6a.2xlarge instance with 50gb of GP2 SSD. +The commit hash of the code is a8e2e19129aead4fbc8cf0e3d34df0188a62de9f. + +It clearly shows an increase in compile time. +Users are advised to consider the use of software such as [sccache](https://github.com/mozilla/sccache) or [mold](https://github.com/rui314/mold) to reduce the compile time. + +- `aws-sdk-dynamodb` + +| command | real time | user time | sys time | +| ----------------------------------------------------------- | --------- | --------- | --------- | +| cargo build | 0m35.728s | 2m24.243s | 0m11.868s | +| cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | +| cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | +| cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s | +| cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | +| cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s | +| cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | +| cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s | + +- `aws-sdk-ec2` + +| command | real time | user time | sys time | +| ----------------------------------------------------------- | --------- | ---------- | --------- | +| cargo build | 1m20.041s | 2m14.592s | 0m6.611s | +| cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s | +| cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s | +| cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s | +| cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s | +| cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s | +| cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s | +| cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s | + + +## Misleading Results +SDK team previously expressed concern that serialized data may be misleading. +We believe that features implemented as part of this RFC does not produce misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. + +# Feature Gate +`Serde` traits are implemented behind feature gates. +`Serialize` is implemented behind `serialize`, while `Deserialize` is implemented behind `deserialize`. +Users must enable the `unstable` feature to expose those features. + +We considered giving each feature a dedicated feature gate such as `unstable-serde-serialize`. +In this case, we will need to change the name of feature gates entirely once it leaves the unstable status which will cause users to make changes to their code base. +We conclude that this brings no benefit to the users. + +## Keeping both features behind the same feature gate +We considered keeping both features behind the same feature gate. +There is no significant difference in the complexity of implementation. +We do not see any benefit in keeping them behind a same feature-gate as this will only result in increase of compile time when users are not in need of one of the feature. + +## Different feature gates for different data types +We considered implementing different feature gates for output, input and their corresponding data types. +For example, output and input types can have `output-serde-*` and `input-serde-*`. +We are unable to do this as relevant meta data is not available during the code-gen. + + +Changes checklist +----------------- +- [ ] Implement human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` +- [ ] Implement non-human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` +- [ ] Implement `Serialize` and `Deserialize` for relevant data types in `aws_smithy_types` +- [ ] Modify Kotlin's codegen so that generated Builder and non-Builder types implement `Serialize` and `Deserialize` +- [ ] Add feature gate for `Serialize` and `Deserialize` +- [ ] Prepare examples +- [ ] Prepare reproducible compile time benchmark From 75b78684f9d176b78eb8923147cd61a9cada018d Mon Sep 17 00:00:00 2001 From: Thomas Cameron <68596478+thomas-k-cameron@users.noreply.github.com> Date: Thu, 22 Dec 2022 01:14:59 +0900 Subject: [PATCH 03/15] Delete rfc0023_serialization_and_deserialization.md --- ...c0023_serialization_and_deserialization.md | 349 ------------------ 1 file changed, 349 deletions(-) delete mode 100644 design/src/rfcs/rfc0023_serialization_and_deserialization.md diff --git a/design/src/rfcs/rfc0023_serialization_and_deserialization.md b/design/src/rfcs/rfc0023_serialization_and_deserialization.md deleted file mode 100644 index a1445d6140..0000000000 --- a/design/src/rfcs/rfc0023_serialization_and_deserialization.md +++ /dev/null @@ -1,349 +0,0 @@ -RFC: Serialization and Deserialization -============= - -> Status: RFC -> -> Applies to: Output, Input and Builder types as well as `DateTime`, `Document`, `Blob` and `Number` implemented in `aws_smithy_types` crate. - -# Terminology -- Builder - Refers to data types prefixed with `Builder`, which converts itself into a corresponding data type upon being built. e.g. `aws_sdk_dynamodb::input::PutItemInput`. -- serde - Refers to `serde` crate. -- `Serialize` - Refers to `Serialize` trait avaialble on `serde` crate. -- `Deserialize` - Refers to `Deserialize` trait available on `serde` crate. - - -# Overview -We are going to implement Serialize and Deserialize traits from `serde` crate to some data types. -Data types that are going to be affected are; -- builder data types -- operation `Input` types -- operation `Output` types -- data types that builder types may have on its field(s) -- `aws_smithy_types::DateTime` -- `aws_smithy_types::Document` -- `aws_smithy_types::Blob` -- `aws_smithy_types::Number` - -`DateTime` and `Blob` implements different serialization/deserialization format for human-readable and non-human readable format; We must emphasize that these 2 formats are not compatible to each other. Reason for this is explained at the [Blob](#blob) section and [Date Time](#datetime). - -Additionally, we add `fn set_fields` to fluent builders to allow users to set the data they deserialized to fluent builders. - -Lastly, we emphasize that this RFC does NOT aim to serialize the entire response or request or implement `serde` traits on data types for server-side code. - -# Use Case -Users have requested `serde` traits to be implemented on data types implemented in rust SDK. -We have created this RFC with the following use cases in mind. -1. [[request]: Serialize/Deserialize of models for Lambda events #269](https://github.com/awslabs/aws-sdk-rust/issues/269) -2. [Tests](https://awslabs.github.io/smithy-rs/design/faq.html#why-dont-the-sdk-service-crates-implement-serdeserialize-or-serdedeserialize-for-any-types) as suggested in the design FAQ. -3. Building tools - -# Implementation -## Smithy Types -`aws_smithy_types` is a crate that implements smithy's data types. -These data types must implement serde traits as well since SDK uses the data types. - -### Blob -`Serialize` and `Deserialize` is not implemented with derive macro. - -In human-readable format, `Blob` is serialized as a base64 encoded string and any data to be deserialized as this data type must be encoded in base 64. -Encoding must be carried out by `base64::encode` function available from `aws_smithy_types` crate. -Non-human readable format serializes `Blob` with `fn serialize_bytes`. - -- Reason behind the implementation of human-readable format - -`aws_smithy_types` crate comes with functions for encoding/decoding base 64, which makes the implementation simpler. -Additionally, AWS CLI and AWS SDK for other languages require data to be encoded in base 64 when it requires `Blob` type as input. - -We also considered serializing them with `serialize_bytes`, without encoding them with `serialize_bytes`. -In this case, the implementation will depend on the implementation of the library author. - -There are many different crates, so we decided to survey how some of the most popular crates implement this feature. - -| library | version | implementation | all time downloads on crate.io as of writing (Dec, 2022) | -| ---------- | ------- | --------------- | -------------------------------------------------------- | -| serde_json | 1.0 | Array of number | 109,491,713 | -| toml | 0.5.9 | Array of number | 63,601,994 | -| serde_yaml | 0.9.14 | Unsupported | 23,767,300 | - -First of all, bytes could have hundreds of elements; reading an array of hundred of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. -Additionally, it has come to our attention that some crates just doesn't support it, which would hinder users ability to be productive and tie users' hand. - -For the reasons described above, we believe that it is crucial to encode them to string and base64 is favourable over other encoding schemes such as base 16, 32, or Ascii85. - -- Reason behind the implementation of non-human readable format -We considered using the same logic for non-human readable format as well. -However, readable-ness is not necessary for non-human readable format. -Additionally, non-human readable format tends to emphasize resource efficiency over human-readable format; Base64 encoded string would take up more space, which is not what the users would want. - -Thus, we believe that implementing a tailored serialization logic would be beneficial to the users. - - -### DateTime -`Serialize` and `Deserialize` is not implemented with derive macro. -For human-readable format, `DateTime` is serialized in RFC-3339 format; -It expects the value to be in RFC-3339 format when it is Deserialized. - -Non-human readable implements `DateTime` as a tuple of `u32` and `i64`; the latter corresponds to `seconds` field and the first is the `seubsecond_nanos`. - -- Reason behind the implementation of a human-readable format - -For serialization, `DateTime` format already implements a function to encode itself into RFC-3339 format. -For deserialization, it is possible to accept other formats, we can add this later if we find it reasonable. - -- Reason behind the implementation of a non-human readable format - -Serializing them as tuples of two integers results in a smaller data size and requires less computing power than any string-based format. -Tuple will be smaller in size as it does not require tagging like in maps. - -### Document -`Serialize` and `Deserialize` is implemented with derive macro. -Additionally, it implements container attribute `#[serde(untagged)]`. -Serde can distinguish each variant without tagging thanks to the difference in each variant's datatypes. - -### Number -`Serialize` and `Deserialize` is implemented with derive macro. -Additionally, it implements container attribute `#[serde(untagged)]`. - -Serde can distinguish each variant without a tag as each variant's content is different. - -## Builder Types and Non-Builder Types -Builder types and non Builder types implement `Serialize` and `Deserialize` with derive macro. -Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, user must enable `unstable` until the stablization of this RFC. - -```rust -#[allow(missing_docs)] // documentation missing in model -#[cfg_attr( - all(feature = "unstable", feature = "serialize"), - derive(serde::Serialize) -)] -#[cfg_attr( - all(feature = "unstable", feature = "deserialize"), - derive(serde::Deserialize) -)] -#[non_exhaustive] -#[derive(std::clone::Clone, std::cmp::PartialEq)] -pub struct UploadPartCopyOutput { - ... -} -``` - -## Enum Representation -`serde` allows programmers to use one of four different tagging ([internal, external, adjacent and untagged](https://serde.rs/enum-representations.html)) when serializing an enum. -### untagged - You cannot deserialize serialized data in some cases. - For example, [aws_sdk_dynamodb::model::AttributeValue](https://docs.rs/aws-sdk-dynamodb/latest/aws_sdk_dynamodb/model/enum.AttributeValue.html) has `Null(bool)` and `Bool(bool)`, which you cannot distinguish serialized values without a tag. - - -### internal - This results in compile time error. - [*Using a #[serde(tag = "...")] attribute on an enum containing a tuple variant is an error at compile time*](https://serde.rs/enum-representations.html). - -### external and adjacent -We are left with `external` and `adjacent` tagging. -External tagging is the default way. -This RFC can be achieved either way. - -The resulting size of the serialized data is smaller when tagged externally, as adjacent tagging will require a tag even when a variant has no content. - -For the reasons mentioned above, we implement an enum that is externally tagged. - -## Data Types to Skip -We are going to skip serialization and deserialization of fields that have the following datatypes. - -- `aws_smithy_http::byte_stream::ByteStream` -- `aws_smithy_http::event_stream::Receiver` -- `aws_smithy_http::event_stream::EventStreamSender` - -Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serilaization/de-serialization")]`. - -Here are some examples of data types affected by this decision: -- `aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription` -- `aws_sdk_s3::input::put_object_input::PutObjectInput` - -We considered serializing them as bytes, however, it could take some time for a stream to reach the end, and the resulting serialized data may be too big for itself to fit into the ram. -Additionally, those data types are sometimes used to represent bi-directional data transfer, which is not serializable. - -Here is an example of struct with a field which comes with custom serialization/de-serialization logic. -```rust -#[allow(missing_docs)] -#[cfg_attr( - all(feature = "unstable", feature = "serialize"), - derive(serde::Serialize) -)] -#[cfg_attr( - all(feature = "unstable", feature = "deserialize"), - derive(serde::Deserialize) -)] -#[non_exhaustive] -#[derive(std::fmt::Debug)] -pub struct ExampleStreamTranscriptionOutput { - #[cfg_attr( - any( - all(feature = "unstable", feature = "deserialize"), - all(feature = "unstable", feature = "serialize") - ), - serde( - skip, - default = "aws_smithy_http::event_stream::Receiver::deserialized_receiver" - ) - )] - pub transcript_result_stream: aws_smithy_http::event_stream::Receiver< - crate::model::ExampleTranscriptResultStream, - crate::error::ExampleTranscriptResultStreamError, - > -} -``` - -## `Serde` traits implemented on Builder of Output Types -Output data, such as `aws_sdk_dynamodb::output::UpdateTableOutput` has builder types. -These builder types are available to users, however, no API requires users to build data types by themselves. - -We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. -We conclude that we must avoid such a technical challenge to bring this RFC to life. - -## `fn set_fields` to allow users supply externally created `Input`s - -SDK does not have a method that allows users supply deserialized inputs. -Thus, we add a new method `fn set_fields` to `Client` types. -This method accepts inputs and replace all parameters that `Client` has with the new one. - -```rust -pub fn set_fields(mut self, builder: path::to::builder_type) -> path::to::builder_type { - self.inner = new_parameter; - self -} -``` - -# Other Concerns -## Model evolution -SDK will introduce new fields and it is possible that we may see new data types in future. - -We believe that this will not be a problem. - -### Introduction of New Fields -Most fields are `Option` type. -When user de-serializes data written for a format before the new fields were introduced, new field will be assigned with `None` type. - -If a field isn't `Option`, `serde` uses `Default` trait unless a custom de-serialization/serialization is specified to generate data to fill the field. -If the new field is not an `Option` type and has no `Default` implementation, we must implement a custom de-serialization logic. - -In case of serilization, introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC. - -## Introduction of New Data Type -If a new field introduces a new data type, it will not require any additional work if the data type can derive `serde` traits. - -If the data cannot derive `serde` traits on it's own, then we have two options. -To clarify, this is the same approach we took on `Data Type to skip` section. -1. skip - We will simply skip serializing/de-serializing. However, we may need to implement custom serialization/de-serialization logic if a value is not wrapped with `Option`. -2. custom serialization/de-serialization logic - We can implement tailored serialization/de-serialization logic. - -Either way, we will mention this on the generated docs to avoid suprising users. - -e.g. -```rust -#[derive(serde::Serialize, serde::Deserialize)] -struct OutputV1 { - string_field: Option -} - -#[derive(serde::Serialize, serde::Deserialize)] -struct OutputV2 { - string_field: Option, - // this will always be treated as None value by serde - #[serde(skip)] - skip_not_serializable: Option, - // We can implement a custom serialization logic - #[serde(serialize_with = "custom_serilization_logic", deserialize_with = "custom_deserilization_logic")] - not_derive_able: SomeComplexDataType, - // Serialization will be skipped, and de-serialization will be handled with the function provided on default tag - #[serde(skip, default = "default_value")] - skip_with_custom: DataTypeWithoutDefaultTrait, -} -``` -# Discussions - -## Serialization and de-serialization support for an entire response/request -The problem with serialization/de-serialization of an entire response/request the lack of data type that can be mapped to `serde`'s data model field by field. - -Currently, SDK has no data type that represents an entire response or request that can be mapped to `serde`'s data model; Thus, you must introduce a schema and implement logics that allows users to serialize/de-serialize their data. - -Although this RFC does not solve this issue, we believe that this RFC will help future contirbutor who wishes to implement serialization and de-serialization support for an entire response/request. - - -## Sensitive Information -If serialized data contains sensitive information, it will not be masked. -We mention that fields can compromised such information on every struct field to ensure that users know this. - -## Compile Time -We ran the following benchmark on C6a.2xlarge instance with 50gb of GP2 SSD. -The commit hash of the code is a8e2e19129aead4fbc8cf0e3d34df0188a62de9f. - -It clearly shows an increase in compile time. -Users are advised to consider the use of software such as [sccache](https://github.com/mozilla/sccache) or [mold](https://github.com/rui314/mold) to reduce the compile time. - -- `aws-sdk-dynamodb` - -| command | real time | user time | sys time | -| ----------------------------------------------------------- | --------- | --------- | --------- | -| cargo build | 0m35.728s | 2m24.243s | 0m11.868s | -| cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | -| cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | -| cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s | -| cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | -| cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s | -| cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | -| cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s | - -- `aws-sdk-ec2` - -| command | real time | user time | sys time | -| ----------------------------------------------------------- | --------- | ---------- | --------- | -| cargo build | 1m20.041s | 2m14.592s | 0m6.611s | -| cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s | -| cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s | -| cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s | -| cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s | -| cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s | -| cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s | -| cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s | - - -## Misleading Results -SDK team previously expressed concern that serialized data may be misleading. -We believe that features implemented as part of this RFC does not produce misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. - -# Feature Gate -`Serde` traits are implemented behind feature gates. -`Serialize` is implemented behind `serialize`, while `Deserialize` is implemented behind `deserialize`. -Users must enable the `unstable` feature to expose those features. - -We considered giving each feature a dedicated feature gate such as `unstable-serde-serialize`. -In this case, we will need to change the name of feature gates entirely once it leaves the unstable status which will cause users to make changes to their code base. -We conclude that this brings no benefit to the users. - -## Keeping both features behind the same feature gate -We considered keeping both features behind the same feature gate. -There is no significant difference in the complexity of implementation. -We do not see any benefit in keeping them behind a same feature-gate as this will only result in increase of compile time when users are not in need of one of the feature. - -## Different feature gates for different data types -We considered implementing different feature gates for output, input and their corresponding data types. -For example, output and input types can have `output-serde-*` and `input-serde-*`. -We are unable to do this as relevant meta data is not available during the code-gen. - - -Changes checklist ------------------ -- [ ] Implement human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` -- [ ] Implement non-human-redable serialization for `DateTime` and `Blob` in `aws_smithy_types` -- [ ] Implement `Serialize` and `Deserialize` for relevant data types in `aws_smithy_types` -- [ ] Modify Kotlin's codegen so that generated Builder and non-Builder types implement `Serialize` and `Deserialize` -- [ ] Add feature gate for `Serialize` and `Deserialize` -- [ ] Prepare examples -- [ ] Prepare reproducible compile time benchmark From ec4a9e7209fcfbbf5492aa4080f61c1e40a30127 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 08:25:02 +0000 Subject: [PATCH 04/15] fix formatting --- design/src/rfcs/rfc0028_serialization_and_deserialization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index a1445d6140..0c7ec859cc 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -66,7 +66,7 @@ There are many different crates, so we decided to survey how some of the most po | library | version | implementation | all time downloads on crate.io as of writing (Dec, 2022) | | ---------- | ------- | --------------- | -------------------------------------------------------- | | serde_json | 1.0 | Array of number | 109,491,713 | -| toml | 0.5.9 | Array of number | 63,601,994 | +| toml | 0.5.9 | Array of number | 63,601,994 | | serde_yaml | 0.9.14 | Unsupported | 23,767,300 | First of all, bytes could have hundreds of elements; reading an array of hundred of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. From 1c2f1f03d203344156b763ec636309ef7124796e Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 08:32:26 +0000 Subject: [PATCH 05/15] https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1054637168 --- ...c0028_serialization_and_deserialization.md | 55 ++++++++++++------- 1 file changed, 34 insertions(+), 21 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index 0c7ec859cc..a6ac438b40 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -289,29 +289,42 @@ Users are advised to consider the use of software such as [sccache](https://gith - `aws-sdk-dynamodb` -| command | real time | user time | sys time | -| ----------------------------------------------------------- | --------- | --------- | --------- | -| cargo build | 0m35.728s | 2m24.243s | 0m11.868s | -| cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | -| cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | -| cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s | -| cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | -| cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s | -| cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | -| cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s | + - when compiled with debug profile + + | command | real time | user time | sys time | + | ------------------------------------------------- | --------- | --------- | --------- | + | cargo build | 0m35.728s | 2m24.243s | 0m11.868s | + | cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | + | cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | + | cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | + + - when compiled with release profile + + | command | real time | user time | sys time | + | ----------------------------------------------------------- | --------- | --------- | --------- | + | cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | + | cargo build --release --features unstable-serde-serialize | 0m53.153s | 5m4.069s | 0m11.577s | + | cargo build --release --features unstable-serde-deserialize | 1m0.107s | 5m10.231s | 0m11.699s | + | cargo build --release --all-features | 1m3.198s | 5m26.076s | 0m12.311s | - `aws-sdk-ec2` - -| command | real time | user time | sys time | -| ----------------------------------------------------------- | --------- | ---------- | --------- | -| cargo build | 1m20.041s | 2m14.592s | 0m6.611s | -| cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s | -| cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s | -| cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s | -| cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s | -| cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s | -| cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s | -| cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s | + - when compiled with debug profile + + | command | real time | user time | sys time | + | ------------------------------------------------- | --------- | --------- | --------- | + | cargo build | 1m20.041s | 2m14.592s | 0m6.611s | + | cargo build --features unstable-serde-serialize | 2m0.555s | 4m24.881s | 0m16.131s | + | cargo build --features unstable-serde-deserialize | 3m10.857s | 5m34.246s | 0m18.844s | + | cargo build --all-features | 3m31.473s | 6m1.052s | 0m19.681s | + + - when compiled with release profile + + | command | real time | user time | sys time | + | ----------------------------------------------------------- | --------- | ---------- | --------- | + | cargo build --release | 2m29.480s | 9m19.530s | 0m15.957s | + | cargo build --release --features unstable-serde-serialize | 2m45.002s | 9m43.098s | 0m16.886s | + | cargo build --release --features unstable-serde-deserialize | 3m47.531s | 10m52.017s | 0m18.404s | + | cargo build --release --all-features | 3m45.208s | 8m46.168s | 0m10.211s | ## Misleading Results From d9d4e80276d14e138ae4863e9cbc5ae391dca3e9 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 16:28:56 +0000 Subject: [PATCH 06/15] https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1054632202 --- ...c0028_serialization_and_deserialization.md | 36 ++++++++++++++++--- 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index a6ac438b40..8e465eff78 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -205,19 +205,47 @@ These builder types are available to users, however, no API requires users to bu We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. We conclude that we must avoid such a technical challenge to bring this RFC to life. -## `fn set_fields` to allow users supply externally created `Input`s +## `fn set_fields` to allow users to use externally created `Input`s -SDK does not have a method that allows users supply deserialized inputs. +Currently, to set value to fluent builders, users must call setter methods for each field. +SDK does not have a method that allows users use deserialized `Input` . Thus, we add a new method `fn set_fields` to `Client` types. This method accepts inputs and replace all parameters that `Client` has with the new one. ```rust -pub fn set_fields(mut self, builder: path::to::builder_type) -> path::to::builder_type { - self.inner = new_parameter; +pub fn set_fields(mut self, input_type: path::to::input_type) -> path::to::input_type { + self.inner = input_type; self } ``` +Users can use `fn set_fields` to replace the parameters in fluent builders. + +Example: +```rust +use aws_sdk_dynamodb::{Client, Error}; + +#[tokio::main] +async fn main() -> Result<(), Error> { + let shared_config = aws_config::load_from_env().await; + let client = Client::new(&shared_config); + let input = if cfg!(builder) { + let mut parameter: aws_sdk_dynamodb::input::list_tables_input::Builder = serde_json::from_str(include_str!("./builder.json")) + parameter.set_exclusive_start_table_name("some_name").build() + } else { + let input: aws_sdk_dynamodb::input::ListTablesInput = serde_json::from_str(include_str!("./input.json")) + input + }; + + let res = client.list_tables().set_fields(input).send().await?; + println!("Current DynamoDB tables: {:?}", res.table_names); + + let json_output = serde_json::to_string(res).unwrap(); + save_serialized_output(json_output); + Ok(()) +} +``` + # Other Concerns ## Model evolution SDK will introduce new fields and it is possible that we may see new data types in future. From bb196cc77e24d310190db33e658b7f2a8aba4759 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 16:49:51 +0000 Subject: [PATCH 07/15] applied grammarly pre-commit --- ...c0028_serialization_and_deserialization.md | 62 +++++++++---------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index 8e465eff78..a4a6ed9608 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -3,7 +3,7 @@ RFC: Serialization and Deserialization > Status: RFC > -> Applies to: Output, Input and Builder types as well as `DateTime`, `Document`, `Blob` and `Number` implemented in `aws_smithy_types` crate. +> Applies to: Output, Input, and Builder types as well as `DateTime`, `Document`, `Blob`, and `Number` implemented in `aws_smithy_types` crate. # Terminology - Builder @@ -22,13 +22,13 @@ Data types that are going to be affected are; - builder data types - operation `Input` types - operation `Output` types -- data types that builder types may have on its field(s) +- data types that builder types may have on their field(s) - `aws_smithy_types::DateTime` - `aws_smithy_types::Document` - `aws_smithy_types::Blob` - `aws_smithy_types::Number` -`DateTime` and `Blob` implements different serialization/deserialization format for human-readable and non-human readable format; We must emphasize that these 2 formats are not compatible to each other. Reason for this is explained at the [Blob](#blob) section and [Date Time](#datetime). +`DateTime` and `Blob` implements different serialization/deserialization format for human-readable and non-human readable format; We must emphasize that these 2 formats are not compatible with each other. The reason for this is explained in the [Blob](#blob) section and [Date Time](#datetime). Additionally, we add `fn set_fields` to fluent builders to allow users to set the data they deserialized to fluent builders. @@ -63,18 +63,18 @@ In this case, the implementation will depend on the implementation of the librar There are many different crates, so we decided to survey how some of the most popular crates implement this feature. -| library | version | implementation | all time downloads on crate.io as of writing (Dec, 2022) | +| library | version | implementation | all-time downloads on crate.io as of writing (Dec 2022) | | ---------- | ------- | --------------- | -------------------------------------------------------- | | serde_json | 1.0 | Array of number | 109,491,713 | | toml | 0.5.9 | Array of number | 63,601,994 | | serde_yaml | 0.9.14 | Unsupported | 23,767,300 | -First of all, bytes could have hundreds of elements; reading an array of hundred of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. -Additionally, it has come to our attention that some crates just doesn't support it, which would hinder users ability to be productive and tie users' hand. +First of all, bytes could have hundreds of elements; reading an array of hundreds of numbers will never be a pleasing experience, and it is especially troubling when you are writing data for test cases. +Additionally, it has come to our attention that some crates just doesn't support it, which would hinder users' ability to be productive and tie users' hand. For the reasons described above, we believe that it is crucial to encode them to string and base64 is favourable over other encoding schemes such as base 16, 32, or Ascii85. -- Reason behind the implementation of non-human readable format +- Reason behind the implementation of a non-human readable format We considered using the same logic for non-human readable format as well. However, readable-ness is not necessary for non-human readable format. Additionally, non-human readable format tends to emphasize resource efficiency over human-readable format; Base64 encoded string would take up more space, which is not what the users would want. @@ -112,7 +112,7 @@ Serde can distinguish each variant without a tag as each variant's content is di ## Builder Types and Non-Builder Types Builder types and non Builder types implement `Serialize` and `Deserialize` with derive macro. -Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, user must enable `unstable` until the stablization of this RFC. +Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, users must enable `unstable` until the stabilization of this RFC. ```rust #[allow(missing_docs)] // documentation missing in model @@ -132,7 +132,7 @@ pub struct UploadPartCopyOutput { ``` ## Enum Representation -`serde` allows programmers to use one of four different tagging ([internal, external, adjacent and untagged](https://serde.rs/enum-representations.html)) when serializing an enum. +`serde` allows programmers to use one of four different tagging ([internal, external, adjacent, and untagged](https://serde.rs/enum-representations.html)) when serializing an enum. ### untagged You cannot deserialize serialized data in some cases. For example, [aws_sdk_dynamodb::model::AttributeValue](https://docs.rs/aws-sdk-dynamodb/latest/aws_sdk_dynamodb/model/enum.AttributeValue.html) has `Null(bool)` and `Bool(bool)`, which you cannot distinguish serialized values without a tag. @@ -158,7 +158,7 @@ We are going to skip serialization and deserialization of fields that have the f - `aws_smithy_http::event_stream::Receiver` - `aws_smithy_http::event_stream::EventStreamSender` -Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serilaization/de-serialization")]`. +Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serialization/de-serialization")]`. Here are some examples of data types affected by this decision: - `aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription` @@ -167,7 +167,7 @@ Here are some examples of data types affected by this decision: We considered serializing them as bytes, however, it could take some time for a stream to reach the end, and the resulting serialized data may be too big for itself to fit into the ram. Additionally, those data types are sometimes used to represent bi-directional data transfer, which is not serializable. -Here is an example of struct with a field which comes with custom serialization/de-serialization logic. +Here is an example of struct with a field that comes with custom serialization/de-serialization logic. ```rust #[allow(missing_docs)] #[cfg_attr( @@ -205,12 +205,12 @@ These builder types are available to users, however, no API requires users to bu We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. We conclude that we must avoid such a technical challenge to bring this RFC to life. -## `fn set_fields` to allow users to use externally created `Input`s +## `fn set_fields` to allow users to use externally created `Input` -Currently, to set value to fluent builders, users must call setter methods for each field. -SDK does not have a method that allows users use deserialized `Input` . +Currently, to set the value to fluent builders, users must call setter methods for each field. +SDK does not have a method that allows users to use deserialized `Input`. Thus, we add a new method `fn set_fields` to `Client` types. -This method accepts inputs and replace all parameters that `Client` has with the new one. +This method accepts inputs and replaces all parameters that `Client` has with the new one. ```rust pub fn set_fields(mut self, input_type: path::to::input_type) -> path::to::input_type { @@ -236,7 +236,7 @@ async fn main() -> Result<(), Error> { let input: aws_sdk_dynamodb::input::ListTablesInput = serde_json::from_str(include_str!("./input.json")) input }; - + let res = client.list_tables().set_fields(input).send().await?; println!("Current DynamoDB tables: {:?}", res.table_names); @@ -248,30 +248,30 @@ async fn main() -> Result<(), Error> { # Other Concerns ## Model evolution -SDK will introduce new fields and it is possible that we may see new data types in future. +SDK will introduce new fields and we may see new data types in the future. We believe that this will not be a problem. ### Introduction of New Fields Most fields are `Option` type. -When user de-serializes data written for a format before the new fields were introduced, new field will be assigned with `None` type. +When the user de-serializes data written for a format before the new fields were introduced, new fields will be assigned with `None` type. If a field isn't `Option`, `serde` uses `Default` trait unless a custom de-serialization/serialization is specified to generate data to fill the field. If the new field is not an `Option` type and has no `Default` implementation, we must implement a custom de-serialization logic. -In case of serilization, introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC. +In the case of serialization, the introduction of new fields will not be an issue unless the data format requires a schema. (e.g. parquet, avro) However, this is outside the scope of this RFC. ## Introduction of New Data Type If a new field introduces a new data type, it will not require any additional work if the data type can derive `serde` traits. -If the data cannot derive `serde` traits on it's own, then we have two options. +If the data cannot derive `serde` traits on its own, then we have two options. To clarify, this is the same approach we took on `Data Type to skip` section. 1. skip We will simply skip serializing/de-serializing. However, we may need to implement custom serialization/de-serialization logic if a value is not wrapped with `Option`. 2. custom serialization/de-serialization logic We can implement tailored serialization/de-serialization logic. -Either way, we will mention this on the generated docs to avoid suprising users. +Either way, we will mention this on the generated docs to avoid surprising users. e.g. ```rust @@ -297,16 +297,16 @@ struct OutputV2 { # Discussions ## Serialization and de-serialization support for an entire response/request -The problem with serialization/de-serialization of an entire response/request the lack of data type that can be mapped to `serde`'s data model field by field. +The problem with serialization/de-serialization of an entire response/request is the lack of data type that can be mapped to `serde`'s data model field by field. Currently, SDK has no data type that represents an entire response or request that can be mapped to `serde`'s data model; Thus, you must introduce a schema and implement logics that allows users to serialize/de-serialize their data. -Although this RFC does not solve this issue, we believe that this RFC will help future contirbutor who wishes to implement serialization and de-serialization support for an entire response/request. +Although this RFC does not solve this issue, we believe that this RFC will help future contributors who wish to implement serialization and de-serialization support for an entire response/request. ## Sensitive Information If serialized data contains sensitive information, it will not be masked. -We mention that fields can compromised such information on every struct field to ensure that users know this. +We mention that fields can compromise such information on every struct field to ensure that users know this. ## Compile Time We ran the following benchmark on C6a.2xlarge instance with 50gb of GP2 SSD. @@ -318,16 +318,16 @@ Users are advised to consider the use of software such as [sccache](https://gith - `aws-sdk-dynamodb` - when compiled with debug profile - + | command | real time | user time | sys time | | ------------------------------------------------- | --------- | --------- | --------- | | cargo build | 0m35.728s | 2m24.243s | 0m11.868s | | cargo build --features unstable-serde-serialize | 0m38.079s | 2m26.082s | 0m11.631s | | cargo build --features unstable-serde-deserialize | 0m45.689s | 2m34.000s | 0m11.978s | | cargo build --all-features | 0m48.959s | 2m45.688s | 0m13.359s | - + - when compiled with release profile - + | command | real time | user time | sys time | | ----------------------------------------------------------- | --------- | --------- | --------- | | cargo build --release | 0m52.040s | 5m0.841s | 0m11.313s | @@ -357,7 +357,7 @@ Users are advised to consider the use of software such as [sccache](https://gith ## Misleading Results SDK team previously expressed concern that serialized data may be misleading. -We believe that features implemented as part of this RFC does not produce misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. +We believe that features implemented as part of this RFC do not produce a misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. # Feature Gate `Serde` traits are implemented behind feature gates. @@ -371,12 +371,12 @@ We conclude that this brings no benefit to the users. ## Keeping both features behind the same feature gate We considered keeping both features behind the same feature gate. There is no significant difference in the complexity of implementation. -We do not see any benefit in keeping them behind a same feature-gate as this will only result in increase of compile time when users are not in need of one of the feature. +We do not see any benefit in keeping them behind the same feature gate as this will only increase compile time when users do not need one of the features. ## Different feature gates for different data types -We considered implementing different feature gates for output, input and their corresponding data types. +We considered implementing different feature gates for output, input, and their corresponding data types. For example, output and input types can have `output-serde-*` and `input-serde-*`. -We are unable to do this as relevant meta data is not available during the code-gen. +We are unable to do this as relevant metadata is not available during the code-gen. Changes checklist From 856e370cb0efb8512dd155ec588e5c8976e0fa47 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 17:37:29 +0000 Subject: [PATCH 08/15] better example wip --- ...c0028_serialization_and_deserialization.md | 64 +++++++++++-------- 1 file changed, 38 insertions(+), 26 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index a4a6ed9608..8a562b22ab 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -202,8 +202,9 @@ pub struct ExampleStreamTranscriptionOutput { Output data, such as `aws_sdk_dynamodb::output::UpdateTableOutput` has builder types. These builder types are available to users, however, no API requires users to build data types by themselves. -We considered removing traits from these data types, however, the code-gen framework does not carry the necessary metadata to determine whether the data is the builder type of an output type or not. -We conclude that we must avoid such a technical challenge to bring this RFC to life. +We considered removing traits from these data types. + +Removing serde traits on these types will help reduce compile time, however, this will not be benefitial to users who wishes to ## `fn set_fields` to allow users to use externally created `Input` @@ -220,31 +221,8 @@ pub fn set_fields(mut self, input_type: path::to::input_type) -> path::to::input ``` Users can use `fn set_fields` to replace the parameters in fluent builders. +You can find an example at [use case examples](#UseCaseExamples). -Example: -```rust -use aws_sdk_dynamodb::{Client, Error}; - -#[tokio::main] -async fn main() -> Result<(), Error> { - let shared_config = aws_config::load_from_env().await; - let client = Client::new(&shared_config); - let input = if cfg!(builder) { - let mut parameter: aws_sdk_dynamodb::input::list_tables_input::Builder = serde_json::from_str(include_str!("./builder.json")) - parameter.set_exclusive_start_table_name("some_name").build() - } else { - let input: aws_sdk_dynamodb::input::ListTablesInput = serde_json::from_str(include_str!("./input.json")) - input - }; - - let res = client.list_tables().set_fields(input).send().await?; - println!("Current DynamoDB tables: {:?}", res.table_names); - - let json_output = serde_json::to_string(res).unwrap(); - save_serialized_output(json_output); - Ok(()) -} -``` # Other Concerns ## Model evolution @@ -378,6 +356,40 @@ We considered implementing different feature gates for output, input, and their For example, output and input types can have `output-serde-*` and `input-serde-*`. We are unable to do this as relevant metadata is not available during the code-gen. +# Appendix +## [Use Case Examples](UseCaseExamples) +```rust +use aws_sdk_dynamodb::{Client, Error}; + +async fn example(read_builder: bool) -> Result<(), Error> { + // getting the client + let shared_config = aws_config::load_from_env().await; + let client = Client::new(&shared_config); + + // de-serializing input's builder types and input types from json + let deserialized_input = if read_builder { + let mut parameter: aws_sdk_dynamodb::input::list_tables_input::Builder = serde_json::from_str(include_str!("./builder.json")); + parameter.set_exclusive_start_table_name("some_name").build() + } else { + let input: aws_sdk_dynamodb::input::ListTablesInput = serde_json::from_str(include_str!("./input.json")); + input + }; + + // sending request using the deserialized input + let res = client.list_tables().set_fields(deserialized_input).send().await?; + println!("DynamoDB tables: {:?}", res.table_names); + + // serializing json output + let json_output = serde_json::to_string(res).unwrap(); + // you can save the serialized input + save_serialized_output(json_output); + Ok(()) +} +``` + +## [Use Case Examples](UseCaseExamples) + + Changes checklist ----------------- From 0e25b5ce71a89222d681760b7b780633d8a9de2e Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 17:39:06 +0000 Subject: [PATCH 09/15] FIX --- design/src/rfcs/rfc0028_serialization_and_deserialization.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index 8a562b22ab..0377638776 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -387,9 +387,6 @@ async fn example(read_builder: bool) -> Result<(), Error> { } ``` -## [Use Case Examples](UseCaseExamples) - - Changes checklist ----------------- From f734f51d3ca82274792f65a2ed676cb0a8090673 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 20:29:40 +0000 Subject: [PATCH 10/15] https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1054632468 https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1054818444 https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1054820192 --- ...c0028_serialization_and_deserialization.md | 77 +++++++++++++------ 1 file changed, 55 insertions(+), 22 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index 0377638776..bbe1a8f29a 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -41,6 +41,58 @@ We have created this RFC with the following use cases in mind. 2. [Tests](https://awslabs.github.io/smithy-rs/design/faq.html#why-dont-the-sdk-service-crates-implement-serdeserialize-or-serdedeserialize-for-any-types) as suggested in the design FAQ. 3. Building tools +# Feature Gate + +## Enabling Feature +To enable any of the features from this RFC, user must pass `--cfg aws-sdk-unstable` to rustc. + +You can do this by specifying it on env-variable or by config.toml. + +- specifying it on .cargo/config.toml + +```toml +[build] +rustflags = ["--cfg", "aws-sdk-unstable"] +``` + +- As an environment variable + +```bash +export RUSTFLAGS="--cfg aws-sdk-unstable" +cargo build +``` + +We considered allowing users to enable this feature on a crate-level. + +e.g. +```toml +[dependencies] +aws_sdk_dynamodb = { version = "0.22.0", features = ["unstable", "serialize"] } +``` + +Compared to the cfg approach, it is lot easier for the users to enable this feature. +However, we believe that cfg approach ensure users won't enable this feature by surprise, and communicate to users that features behind this feature gate can be taken-away or exprience breaking changes any time in future. + +## Feature Gate for Serialization and De-serialization +`Serde` traits are implemented behind feature gates. +`Serialize` is implemented behind `serialize`, while `Deserialize` is implemented behind `deserialize`. +Users must enable the `unstable` feature to expose those features. + +We considered giving each feature a dedicated feature gate such as `unstable-serde-serialize`. +In this case, we will need to change the name of feature gates entirely once it leaves the unstable status which will cause users to make changes to their code base. +We conclude that this brings no benefit to the users. + +## Keeping both features behind the same feature gate +We considered keeping both features behind the same feature gate. +There is no significant difference in the complexity of implementation. +We do not see any benefit in keeping them behind the same feature gate as this will only increase compile time when users do not need one of the features. + +## Different feature gates for different data types +We considered implementing different feature gates for output, input, and their corresponding data types. +For example, output and input types can have `output-serde-*` and `input-serde-*`. +We are unable to do this as relevant metadata is not available during the code-gen. + + # Implementation ## Smithy Types `aws_smithy_types` is a crate that implements smithy's data types. @@ -152,14 +204,14 @@ The resulting size of the serialized data is smaller when tagged externally, as For the reasons mentioned above, we implement an enum that is externally tagged. ## Data Types to Skip -We are going to skip serialization and deserialization of fields that have the following datatypes. +We are going to skip serialization and deserialization of fields that have the datatype that corresponds to `@streaming blob` and `@streaming union` from smithy. +Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serialization/de-serialization")]`. +As of writing, this decision affects following Rust data types. - `aws_smithy_http::byte_stream::ByteStream` - `aws_smithy_http::event_stream::Receiver` - `aws_smithy_http::event_stream::EventStreamSender` -Any fields with these data types are tagged with `#[serde(skip, default = "..name of the function for tailored serialization/de-serialization")]`. - Here are some examples of data types affected by this decision: - `aws_sdk_transcribestreaming::client::fluent_builders::StartMedicalStreamTranscription` - `aws_sdk_s3::input::put_object_input::PutObjectInput` @@ -337,25 +389,6 @@ Users are advised to consider the use of software such as [sccache](https://gith SDK team previously expressed concern that serialized data may be misleading. We believe that features implemented as part of this RFC do not produce a misleading result as we focus on builder types and it's corresponding data types which are mapped to serde's data type model with the derive macro. -# Feature Gate -`Serde` traits are implemented behind feature gates. -`Serialize` is implemented behind `serialize`, while `Deserialize` is implemented behind `deserialize`. -Users must enable the `unstable` feature to expose those features. - -We considered giving each feature a dedicated feature gate such as `unstable-serde-serialize`. -In this case, we will need to change the name of feature gates entirely once it leaves the unstable status which will cause users to make changes to their code base. -We conclude that this brings no benefit to the users. - -## Keeping both features behind the same feature gate -We considered keeping both features behind the same feature gate. -There is no significant difference in the complexity of implementation. -We do not see any benefit in keeping them behind the same feature gate as this will only increase compile time when users do not need one of the features. - -## Different feature gates for different data types -We considered implementing different feature gates for output, input, and their corresponding data types. -For example, output and input types can have `output-serde-*` and `input-serde-*`. -We are unable to do this as relevant metadata is not available during the code-gen. - # Appendix ## [Use Case Examples](UseCaseExamples) ```rust From bc7f5227bdfd3f1d2a8a534afd49480937a0b742 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 21:03:44 +0000 Subject: [PATCH 11/15] add examples for output's builer --- ...rfc0028_serialization_and_deserialization.md | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index bbe1a8f29a..a9355a7791 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -256,7 +256,8 @@ These builder types are available to users, however, no API requires users to bu We considered removing traits from these data types. -Removing serde traits on these types will help reduce compile time, however, this will not be benefitial to users who wishes to +Removing serde traits on these types will help reduce compile time, however, builder type can be useful, for example, for testing. +We have prepared an example here [Use Case Examples](UseCaseExamples). ## `fn set_fields` to allow users to use externally created `Input` @@ -412,10 +413,22 @@ async fn example(read_builder: bool) -> Result<(), Error> { let res = client.list_tables().set_fields(deserialized_input).send().await?; println!("DynamoDB tables: {:?}", res.table_names); + let out: aws_sdk_dynamodb::output::ListTablesOutput = { + // say you want some of the field to have certain values + let mut out_builder: aws_sdk_dynamodb::output::list_tables_output::Builder = serde_json::from_str(r#" + { + table_names: [ "table1", "table2" ] + } + "#); + // but you don't really care about some other values + out_builder.set_last_evaluated_table_name(res.last_evaluated_table_name()).build() + }; + assert_eq!(res, out); + // serializing json output let json_output = serde_json::to_string(res).unwrap(); // you can save the serialized input - save_serialized_output(json_output); + println!(json_output); Ok(()) } ``` From eea1ab831765a9b4b25dd132b2b0b8823fb8c9a1 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Thu, 22 Dec 2022 21:55:17 +0000 Subject: [PATCH 12/15] fixes --- design/src/rfcs/rfc0028_serialization_and_deserialization.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index a9355a7791..a5b042d57e 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -257,7 +257,7 @@ These builder types are available to users, however, no API requires users to bu We considered removing traits from these data types. Removing serde traits on these types will help reduce compile time, however, builder type can be useful, for example, for testing. -We have prepared an example here [Use Case Examples](UseCaseExamples). +We have prepared examples [here](UseCaseExamples). ## `fn set_fields` to allow users to use externally created `Input` @@ -274,8 +274,7 @@ pub fn set_fields(mut self, input_type: path::to::input_type) -> path::to::input ``` Users can use `fn set_fields` to replace the parameters in fluent builders. -You can find an example at [use case examples](#UseCaseExamples). - +You can find examples [here](#UseCaseExamples). # Other Concerns ## Model evolution From 8c2a5b7eee5180ad1a1cd8deebee328af46e6e12 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Fri, 23 Dec 2022 11:55:23 +0000 Subject: [PATCH 13/15] fixing unstable feature gate snippet --- .../rfc0028_serialization_and_deserialization.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index a5b042d57e..0d7d294a58 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -164,16 +164,15 @@ Serde can distinguish each variant without a tag as each variant's content is di ## Builder Types and Non-Builder Types Builder types and non Builder types implement `Serialize` and `Deserialize` with derive macro. -Derive macro will be implemented behind a feature-gate; Users must enable `serialize` to use `serde::Serialize`, and `deserialize` to use `serde::Deserialize` respectively. Additionally, users must enable `unstable` until the stabilization of this RFC. +Example: ```rust -#[allow(missing_docs)] // documentation missing in model #[cfg_attr( - all(feature = "unstable", feature = "serialize"), + all(aws-sdk-unstable, feature = "serialize"), derive(serde::Serialize) )] #[cfg_attr( - all(feature = "unstable", feature = "deserialize"), + all(aws-sdk-unstable, feature = "deserialize"), derive(serde::Deserialize) )] #[non_exhaustive] @@ -223,11 +222,11 @@ Here is an example of struct with a field that comes with custom serialization/d ```rust #[allow(missing_docs)] #[cfg_attr( - all(feature = "unstable", feature = "serialize"), + all(aws-sdk-unstable, feature = "serialize"), derive(serde::Serialize) )] #[cfg_attr( - all(feature = "unstable", feature = "deserialize"), + all(aws-sdk-unstable, feature = "deserialize"), derive(serde::Deserialize) )] #[non_exhaustive] @@ -235,8 +234,8 @@ Here is an example of struct with a field that comes with custom serialization/d pub struct ExampleStreamTranscriptionOutput { #[cfg_attr( any( - all(feature = "unstable", feature = "deserialize"), - all(feature = "unstable", feature = "serialize") + all(aws-sdk-unstable, feature = "deserialize"), + all(aws-sdk-unstable, feature = "serialize") ), serde( skip, From 770edc1372e96b9c978a91951ba70dbd868cb340 Mon Sep 17 00:00:00 2001 From: Thomas Cameron <68596478+thomas-k-cameron@users.noreply.github.com> Date: Thu, 5 Jan 2023 00:33:26 +0900 Subject: [PATCH 14/15] Update rfc0028_serialization_and_deserialization.md Section is deleted in response to the discussion https://github.com/awslabs/smithy-rs/pull/1944#discussion_r1061577656 --- .../src/rfcs/rfc0028_serialization_and_deserialization.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0028_serialization_and_deserialization.md index 0d7d294a58..023ca4cb58 100644 --- a/design/src/rfcs/rfc0028_serialization_and_deserialization.md +++ b/design/src/rfcs/rfc0028_serialization_and_deserialization.md @@ -325,14 +325,6 @@ struct OutputV2 { ``` # Discussions -## Serialization and de-serialization support for an entire response/request -The problem with serialization/de-serialization of an entire response/request is the lack of data type that can be mapped to `serde`'s data model field by field. - -Currently, SDK has no data type that represents an entire response or request that can be mapped to `serde`'s data model; Thus, you must introduce a schema and implement logics that allows users to serialize/de-serialize their data. - -Although this RFC does not solve this issue, we believe that this RFC will help future contributors who wish to implement serialization and de-serialization support for an entire response/request. - - ## Sensitive Information If serialized data contains sensitive information, it will not be masked. We mention that fields can compromise such information on every struct field to ensure that users know this. From e8f3a1c30a95f4bd9cc0ee6590828a9120b8f5d7 Mon Sep 17 00:00:00 2001 From: Thomas Cameron Date: Wed, 4 Jan 2023 15:57:55 +0000 Subject: [PATCH 15/15] file name fix --- design/src/SUMMARY.md | 2 +- ...lization.md => rfc0030_serialization_and_deserialization.md} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename design/src/rfcs/{rfc0028_serialization_and_deserialization.md => rfc0030_serialization_and_deserialization.md} (100%) diff --git a/design/src/SUMMARY.md b/design/src/SUMMARY.md index 0b0c9a727d..2b79d89d61 100644 --- a/design/src/SUMMARY.md +++ b/design/src/SUMMARY.md @@ -52,7 +52,7 @@ - [RFC-0027: Endpoints 2.0](./rfcs/rfc0027_endpoints_20.md) - [RFC-0028: SDK Credential Cache Type Safety](./rfcs/rfc0028_sdk_credential_cache_type_safety.md) - [RFC-0029: Finding New Home for Credential Types](./rfcs/rfc0029_new_home_for_cred_types.md) - - [RFC-0030: Serialization And Deserialization](./rfcs/rfc0028_serialization_and_deserialization.md) + - [RFC-0030: Serialization And Deserialization](./rfcs/rfc0030_serialization_and_deserialization.md) - [Contributing](./contributing/overview.md) - [Writing and debugging a low-level feature that relies on HTTP](./contributing/writing_and_debugging_a_low-level_feature_that_relies_on_HTTP.md) diff --git a/design/src/rfcs/rfc0028_serialization_and_deserialization.md b/design/src/rfcs/rfc0030_serialization_and_deserialization.md similarity index 100% rename from design/src/rfcs/rfc0028_serialization_and_deserialization.md rename to design/src/rfcs/rfc0030_serialization_and_deserialization.md