
Make Hadoop 2/3 compatible with Java SDK for IAM #7634

Merged (9 commits) on Apr 10, 2024

Isan-Rivkin (Contributor) commented Apr 7, 2024

Our hadoop-lakeFS client supports both the Hadoop 2 and Hadoop 3 contracts, which pulls in the following dependencies:

  1. Hadoop-AWS 2 - AWS SDK 1.7.4
  2. Hadoop-AWS 3 - AWS SDK 1.11+

These AWS SDK versions have breaking changes between them, so code written against one version does not compile against the other.
To solve this, I copied the minimum amount of code required to make it work with both versions.

Isan-Rivkin self-assigned this Apr 7, 2024

github-actions bot commented Apr 7, 2024

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged


github-actions bot commented Apr 7, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

Isan-Rivkin requested a review from arielshaqed April 8, 2024 07:12
Isan-Rivkin changed the base branch from master to feature/m1-lakefsfs-iam April 8, 2024 07:13
Isan-Rivkin added the exclude-changelog (PR description should not be included in next release changelog) and minor-change (Used for PRs that don't require issue attached) labels Apr 8, 2024
arielshaqed (Contributor) left a comment

Thanks!

This one is hard to test; I suppose the real test will be that a system test works with lakeFS.

Comment on lines 24 to 27
* GetCallerIdentityV4Presigner is that knows how to generate a presigned URL for the GetCallerIdentity API. The presigned URL is signed using SigV4.
* This class is extending AWS4Signer of AWS SDK version 1.7.4 and copies some functions from https://github.com/aws/aws-sdk-java/blob/1.7.4/src/main/java/com/amazonaws/auth/AWS4Signer.java
* The reason we copy some functions is that we need to support aws-hadoop-2 which depends on aws sdk 1.7.4 while aws-hadoop-3 depends on aws sdk 1.11.375.
* Everything that is copied starts with "overridden" prefix, a reasonable alternative would be to use @Override but, AWS made those functions final.
Contributor

Nit:

Suggested change
- * GetCallerIdentityV4Presigner is that knows how to generate a presigned URL for the GetCallerIdentity API. The presigned URL is signed using SigV4.
- * This class is extending AWS4Signer of AWS SDK version 1.7.4 and copies some functions from https://github.com/aws/aws-sdk-java/blob/1.7.4/src/main/java/com/amazonaws/auth/AWS4Signer.java
- * The reason we copy some functions is that we need to support aws-hadoop-2 which depends on aws sdk 1.7.4 while aws-hadoop-3 depends on aws sdk 1.11.375.
- * Everything that is copied starts with "overridden" prefix, a reasonable alternative would be to use @Override but, AWS made those functions final.
+ * GetCallerIdentityV4Presigner is generates a presigned URL for the GetCallerIdentity API, signed using SigV4.
+ * This class extends AWS4Signer of AWS SDK version 1.7.4 and copies some functions from https://github.com/aws/aws-sdk-java/blob/1.7.4/src/main/java/com/amazonaws/auth/AWS4Signer.java
+ * The copied functions exist in AWS SDK 1.7.4 but not AWS SDK 1.11.375, so they are
+ not available on Hadoop AWS 3.
+ * Copied code has an "overridden" prefix: cannot use @Override as those functions are final.

Contributor Author

updated in the new implementation
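The doc comment describes the presigner only at a high level. SigV4 query-string presigning builds a canonical request, hashes it into a "string to sign", derives a signing key from the secret key, date, region and service, and appends the resulting signature to the URL. The sketch below does this for STS `GetCallerIdentity` using only JDK classes, so it compiles no matter which AWS SDK version Hadoop pulls in. It is not the PR's `GetCallerIdentityV4Presigner`: the class name `StsPresignSketch`, the `sts.amazonaws.com` host, the `us-east-1` region and signing only the `host` header are all assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, JDK-only sketch of SigV4 query-string presigning for STS GetCallerIdentity.
// Not the PR's implementation; host, region and signed headers are assumptions.
final class StsPresignSketch {
    static final String HOST = "sts.amazonaws.com"; // assumed global STS endpoint
    static final String REGION = "us-east-1";       // assumed signing region
    static final String SERVICE = "sts";

    static String presign(String accessKey, String secretKey, String sessionToken,
                          ZonedDateTime now, int expiresSeconds) throws Exception {
        String amzDate = now.withZoneSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'"));
        String date = amzDate.substring(0, 8);
        String scope = date + "/" + REGION + "/" + SERVICE + "/aws4_request";

        // TreeMap keeps keys sorted, as SigV4 requires for the canonical query string.
        Map<String, String> q = new TreeMap<>();
        q.put("Action", "GetCallerIdentity");
        q.put("Version", "2011-06-15");
        q.put("X-Amz-Algorithm", "AWS4-HMAC-SHA256");
        q.put("X-Amz-Credential", accessKey + "/" + scope);
        q.put("X-Amz-Date", amzDate);
        q.put("X-Amz-Expires", Integer.toString(expiresSeconds));
        q.put("X-Amz-SignedHeaders", "host");
        if (sessionToken != null) {
            q.put("X-Amz-Security-Token", sessionToken);
        }

        StringJoiner canonicalQuery = new StringJoiner("&");
        for (Map.Entry<String, String> e : q.entrySet()) {
            canonicalQuery.add(uriEncode(e.getKey()) + "=" + uriEncode(e.getValue()));
        }
        String canonicalRequest = "GET\n/\n" + canonicalQuery + "\n"
                + "host:" + HOST + "\n\n" // canonical headers, each ending in '\n'
                + "host\n"                // signed headers
                + hex(sha256(""));        // hash of the empty payload
        String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
                + hex(sha256(canonicalRequest));

        // Derive the signing key: secret -> date -> region -> service -> "aws4_request".
        byte[] key = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        key = hmac(key, REGION);
        key = hmac(key, SERVICE);
        key = hmac(key, "aws4_request");
        String signature = hex(hmac(key, stringToSign));
        return "https://" + HOST + "/?" + canonicalQuery + "&X-Amz-Signature=" + signature;
    }

    // URLEncoder does form encoding; patch it into the RFC 3986 style SigV4 expects.
    static String uriEncode(String s) throws Exception {
        return URLEncoder.encode(s, "UTF-8")
                .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
    }

    static byte[] sha256(String s) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
    }

    static byte[] hmac(byte[] key, String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Calling `StsPresignSketch.presign(accessKey, secretKey, null, ZonedDateTime.now(), 60)` returns a URL of the form `https://sts.amazonaws.com/?Action=GetCallerIdentity&...&X-Amz-Signature=<64 hex chars>`. Any header added to `X-Amz-SignedHeaders` would also have to be sent with the eventual request; this sketch keeps it to `host`.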

@@ -108,15 +107,39 @@ public Request<GeneratePresignGetCallerIdentityRequest> newPresignedRequest() th

public String newPresignedGetCallerIdentityToken() throws Exception {
Request<GeneratePresignGetCallerIdentityRequest> signedRequest = this.newPresignedRequest();
Map<String, ?> rawQueryParams = signedRequest.getParameters();
Map<String, String> params = new HashMap<>();
// check if the value is an array and join it with commas depends on the AWS SDK version
Contributor

Not sure what "depends on the AWS SDK version" means here.

Contributor Author

deleted
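The deleted comment most likely referred to the shape of the request parameters: as far as we know, in AWS SDK for Java 1.11.x they are exposed as a `Map<String, List<String>>`, while in 1.7.4 they are a `Map<String, String>`. Reading them through `Map<String, ?>` lets one source file compile against both. A hypothetical helper (`QueryParams.flatten` is not from the PR, and it checks for a `List` rather than an array) that normalizes either shape might look like this:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: normalizes query parameters whose values may be a single
// String (assumed AWS SDK 1.7.4 shape) or a List<String> (assumed 1.11.x shape).
final class QueryParams {
    private QueryParams() {}

    static Map<String, String> flatten(Map<String, ?> raw) {
        Map<String, String> out = new LinkedHashMap<>(); // preserves input order
        for (Map.Entry<String, ?> e : raw.entrySet()) {
            Object value = e.getValue();
            if (value instanceof List) {
                // Multi-valued parameter: join the values with commas.
                StringJoiner joined = new StringJoiner(",");
                for (Object item : (List<?>) value) {
                    joined.add(String.valueOf(item));
                }
                out.put(e.getKey(), joined.toString());
            } else {
                // Single-valued parameter: use it as-is.
                out.put(e.getKey(), String.valueOf(value));
            }
        }
        return out;
    }
}
```

With the list-valued shape, a parameter mapped to `["GetCallerIdentity"]` flattens to `"GetCallerIdentity"`, the same string the single-valued shape yields directly.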

Comment on lines 43 to 52
StringBuilder pattern = new StringBuilder();

pattern
.append(Pattern.quote("+"))
.append("|")
.append(Pattern.quote("*"))
.append("|")
.append(Pattern.quote("%7E"))
.append("|")
.append(Pattern.quote("%2F"));
Contributor

Consider:

Suggested change
- StringBuilder pattern = new StringBuilder();
- pattern
- .append(Pattern.quote("+"))
- .append("|")
- .append(Pattern.quote("*"))
- .append("|")
- .append(Pattern.quote("%7E"))
- .append("|")
- .append(Pattern.quote("%2F"));
+ StringBuilder pattern = new StringBuilder()
+ .append(Pattern.quote("+"))
+ .append("|")
+ .append(Pattern.quote("*"))
+ .append("|")
+ .append(Pattern.quote("%7E"))
+ .append("|")
+ .append(Pattern.quote("%2F"));
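The quoted alternation matches the four substrings where `java.net.URLEncoder`'s form encoding differs from the RFC 3986 encoding SigV4 expects. The sketch below shows how such a pattern is typically consumed; it is modeled on the `urlEncode(String value, boolean path)` helper in AWS SDK for Java 1.x. Both that provenance and the class name `SigV4Encoding` are our assumptions, not something stated in the PR.

```java
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: post-processes URLEncoder output into SigV4-style encoding.
final class SigV4Encoding {
    private static final Pattern ENCODED_CHARACTERS = Pattern.compile(
            new StringBuilder()
                    .append(Pattern.quote("+"))
                    .append("|")
                    .append(Pattern.quote("*"))
                    .append("|")
                    .append(Pattern.quote("%7E"))
                    .append("|")
                    .append(Pattern.quote("%2F"))
                    .toString());

    private SigV4Encoding() {}

    static String urlEncode(String value, boolean path) throws Exception {
        String encoded = URLEncoder.encode(value, "UTF-8");
        Matcher m = ENCODED_CHARACTERS.matcher(encoded);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement;
            switch (m.group(0)) {
                case "+":   replacement = "%20"; break; // form-encoded space becomes %20
                case "*":   replacement = "%2A"; break; // '*' is not unreserved, so encode it
                case "%7E": replacement = "~";   break; // '~' is unreserved, so decode it
                default:    replacement = path ? "/" : "%2F"; // keep '/' only in paths
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, `urlEncode("a b*c~d/e", false)` yields `a%20b%2Ac~d%2Fe`; with `path = true` the slash survives as `/`, which is what path segments in a canonical URI need.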

@@ -28,7 +28,7 @@ public TemporaryAWSCredentialsLakeFSTokenProvider(String scheme, Configuration c
}
AWSCredentialsProvider awsProvider = new AWSCredentialsProvider() {
@Override
public AWSCredentials getCredentials() {
public AWSCredentials getCredentials() {
Contributor

nit:

Suggested change
public AWSCredentials getCredentials() {
public AWSCredentials getCredentials() {

Comment on lines 11 to 14
@Test
public void name() {
}

Contributor

Why do we need this?

Isan-Rivkin (Contributor Author)

Hey @arielshaqed, thanks for the review so far. Sorry in advance: after you mentioned that CI failed, I noticed a big flaw in my approach, so I had to rethink the issue and take a different one.
Please take a look: I completely rewrote clients/hadoopfs/src/main/java/io/lakefs/auth/GetCallerIdentityV4Presigner.java as a minimal version common to both AWS SDKs.

For GetCallerIdentityV4Presigner specifically, there's no need to compare changes; it's all new.
🙏

Isan-Rivkin requested a review from arielshaqed April 8, 2024 22:14

arielshaqed (Contributor) left a comment

Yay, this is good stuff!

Please avoid "copy from v2"; instead, state the exact version of the AWS SDK the code was copied from, or the specific Hadoop version if that is what matters.

* GetCallerIdentityV4Presigner is that knows how to generate a presigned URL for the GetCallerIdentity API.
* The presigned URL is signed using SigV4.
* TODO: when we move to AWS SDK v2, we can use the AWS SDK's implementation of this (depends on hadoop-aws upgrading their own AWS SDK dependency).
* * GetCallerIdentityV4Presigner is generates a presigned URL for the GetCallerIdentity API, signed using SigV4.
Contributor

Suggested change
- * * GetCallerIdentityV4Presigner is generates a presigned URL for the GetCallerIdentity API, signed using SigV4.
+ * * GetCallerIdentityV4Presigner generates a presigned URL for the GetCallerIdentity API, signed using SigV4.

@@ -17,14 +18,22 @@ public void testProviderIdentityTokenSerde() throws Exception {
conf.set("fs.lakefs." + Constants.TOKEN_AWS_CREDENTIALS_PROVIDER_SESSION_TOKEN_KEY_SUFFIX, "sessionToken");
conf.set("fs.lakefs." + Constants.TOKEN_AWS_STS_ENDPOINT, "https://sts.amazonaws.com");

AWSLakeFSTokenProvider provider = (AWSLakeFSTokenProvider)LakeFSTokenProviderFactory.newLakeFSTokenProvider(Constants.DEFAULT_SCHEME, conf);
AWSLakeFSTokenProvider provider = (AWSLakeFSTokenProvider) LakeFSTokenProviderFactory.newLakeFSTokenProvider(Constants.DEFAULT_SCHEME, conf);
Contributor

Not sure we need to cast here; it seems like an upcast, which is usually implicit.

Contributor Author

I need it here because newLakeFSTokenProvider() returns a LakeFSTokenProvider, while I'm testing code specific to AWSLakeFSTokenProvider (AWS) and need its newPresignedGetCallerIdentityToken method.
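The disagreement here is about direction: casting from the interface type the factory returns to the AWS-specific class is a downcast, which Java never performs implicitly. A minimal sketch with hypothetical stand-in types (`TokenProvider`, `AwsTokenProvider` and `TokenProviderFactory` are not the lakeFS classes) shows why the test needs the explicit cast:

```java
// Hypothetical stand-ins: the factory's declared return type is the general interface.
interface TokenProvider {
    String getToken();
}

final class AwsTokenProvider implements TokenProvider {
    @Override
    public String getToken() {
        return "token";
    }

    // Exists only on the concrete type, like newPresignedGetCallerIdentityToken in the PR.
    String presignedIdentityToken() {
        return "presigned";
    }
}

final class TokenProviderFactory {
    private TokenProviderFactory() {}

    static TokenProvider newProvider() {
        return new AwsTokenProvider(); // static type seen by callers is TokenProvider
    }
}
```

Going the other way, assigning an `AwsTokenProvider` to a `TokenProvider` variable, is the upcast the reviewer had in mind, and that one is implicit.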

Isan-Rivkin merged commit 280599a into feature/m1-lakefsfs-iam Apr 10, 2024
10 of 11 checks passed
Isan-Rivkin deleted the temp-test-migrate-sdkl branch April 10, 2024 07:37
2 participants