-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PPS-411): Add bucket selection support for multi part upload #1112
Conversation
fence/blueprints/data/blueprint.py
Outdated
@@ -176,17 +177,9 @@ def upload_data_file(): | |||
) | |||
|
|||
protocol = params["protocol"] if "protocol" in params else None | |||
bucket = params.get("bucket") | |||
bucket = params.get("bucket", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If should already default to None, you don't need this
fence/blueprints/data/blueprint.py
Outdated
logger.debug(f"Bucket '{bucket}' not in ALLOWED_DATA_UPLOAD_BUCKETS config") | ||
raise Forbidden(f"Uploading to bucket '{bucket}' is not allowed") | ||
|
||
_is_allowed_data_upload_bucket_configured(bucket) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this function already exists, but the name is a bit confusing. The prefix "is" usually implies a boolean return. Since this is a check I would change this to something like "verify_data_upload_bucket_configuration"
fence/blueprints/data/blueprint.py
Outdated
@@ -257,12 +256,18 @@ def generate_multipart_upload_presigned_url(): | |||
default=default_expires_in, | |||
) | |||
|
|||
bucket = params.get("bucket", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.get() already returns None if it's not found, so the , None
is redundant. You could just .get("bucket")
fence/blueprints/data/blueprint.py
Outdated
if bucket: | ||
_is_allowed_data_upload_bucket_configured(bucket) | ||
|
||
# TODO add bucket param here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a real TODO? like we need to do something in the future, can we do it now?
If this is just a note for the future in case we need to add things, don't use TODO
, just say note:
or something
fence/blueprints/data/blueprint.py
Outdated
@@ -285,6 +290,7 @@ def complete_multipart_upload(): | |||
raise UserError("missing required arguments: {}".format(list(missing))) | |||
|
|||
default_expires_in = flask.current_app.config.get("MAX_PRESIGNED_URL_TTL", 3600) | |||
bucket = params.get("bucket", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as prev comments, can just use .get(
openapis/swagger.yaml
Outdated
bucket: | ||
type: string | ||
required: false | ||
description: bucket to upload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: bucket to upload | |
description: bucket to upload to |
openapis/swagger.yaml
Outdated
bucket: | ||
type: string | ||
required: false | ||
description: bucket to upload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: bucket to upload | |
description: bucket to upload to |
fence/blueprints/data/indexd.py
Outdated
raise InternalError( | ||
"fence not configured with data upload bucket; can't create signed URL" | ||
if bucket: | ||
s3_buckets = get_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this logic exactly what _is_allowed_data_upload_bucket_configured(bucket)
is doing? Can you reuse that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you still left a chunk of code you don't need. If you're calling verify you don't need to get the s3_buckets here
fence/blueprints/data/blueprint.py
Outdated
@@ -9,6 +9,7 @@ | |||
BlankIndex, | |||
IndexedFile, | |||
get_signed_url_for_file, | |||
_is_allowed_data_upload_bucket_configured, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't/shouldn't import private functions or variables. The prefixed _
implies private in Python naming conventions so if you need the function to be used here, you need to rename it to remove the _
prefix and add a docstring
from fence.errors import InternalError | ||
|
||
MAX_TRIES = 5 | ||
|
||
logger = get_logger(__name__) | ||
|
||
|
||
def initilize_multipart_upload(bucket, key, credentials): | ||
def initilize_multipart_upload(bucket_name, key, credentials): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor name for typo: initialize
, not initilize
fence/blueprints/data/indexd.py
Outdated
if not bucket: | ||
try: | ||
bucket = flask.current_app.config["DATA_UPLOAD_BUCKET"] | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error will never happen b/c DATA_UPLOAD_BUCKET
is part of the default config, so it will always be set, it might just be ''. And this isn't catching the case where the bucket is actually empty ''.
So you can simplify/fix this whole block with:
bucket = bucket or flask.current_app.config["DATA_UPLOAD_BUCKET"]
if not bucket:
raise InternalError("fence not configured with data upload bucket; can't create signed URL")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a test that deletes that configuration so this will fail. https://github.com/uc-cdis/fence/blob/master/tests/data/test_blank_index.py#L351
I don't think it gets another default value.
Should I just set the test to empty the field rather than delete the config since that would be more realistic?
fence/blueprints/data/indexd.py
Outdated
|
||
def verify_data_upload_bucket_configuration(bucket): | ||
s3_buckets = get_value( | ||
flask.current_app.config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to use get_value
b/c ALLOWED_DATA_UPLOAD_BUCKETS
is part of the default configuration, which guarantees its existence in the config dictionary. If you're trying to check if it's an empty list, then this isn't catching that either.
s3_buckets = flask.current_app.config["ALLOWED_DATA_UPLOAD_BUCKETS"]
if not s3_buckets:
raise InternalError("ALLOWED_DATA_UPLOAD_BUCKETS not configured")
fence/blueprints/data/indexd.py
Outdated
raise InternalError( | ||
"fence not configured with data upload bucket; can't create signed URL" | ||
if bucket: | ||
s3_buckets = get_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you still left a chunk of code you don't need. If you're calling verify you don't need to get the s3_buckets here
fence/blueprints/data/indexd.py
Outdated
verify_data_upload_bucket_configuration(bucket) | ||
else: | ||
try: | ||
bucket = flask.current_app.config["DATA_UPLOAD_BUCKET"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment about default configs. this will never KeyError
fence/blueprints/data/indexd.py
Outdated
else: | ||
try: | ||
bucket = flask.current_app.config["DATA_UPLOAD_BUCKET"] | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment
@@ -95,7 +125,7 @@ def complete_multipart_upload(bucket, key, credentials, uploadId, parts): | |||
|
|||
|
|||
def generate_presigned_url_for_uploading_part( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all positive tests. I know they were here before, but can you add some for the error cases like when DATA_UPLOAD and/or ALLOWED_DATA_UPLOAD_BUCKETS buckets aren't specified / are empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like my previous comment might resolve this test case? 🤔
689d11e
to
5f710c8
Compare
bbaa9bb
to
517566d
Compare
Jira Ticket: PXP-xxxx
cdis-data-client PR (uc-cdis/cdis-data-client#123)
New Features
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes