-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PPS-411): Add bucket selection support for multi part upload #123
Conversation
Pull Request Test Coverage Report for Build 7589691499
💛 - Coveralls |
@@ -105,6 +105,7 @@ type RetryObject struct { | |||
GUID string | |||
RetryCount int | |||
Multipart bool | |||
Bucket *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go conventions prefer the type to be all lined up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why not just use a string instead of a string pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using just string was giving me some weird errors when I tried to build it. Im not too familiar with it but I can change it to a string and give it another Go. 🥁
gen3-client/g3cmd/retry-upload.go
Outdated
@@ -79,6 +81,10 @@ func retryUpload(failedLogMap map[string]commonUtils.RetryObject) { | |||
log.Printf("Sleep for %.0f seconds\n", GetWaitTime(ro.RetryCount).Seconds()) | |||
time.Sleep(GetWaitTime(ro.RetryCount)) // exponential wait for retry | |||
|
|||
if ro.Bucket !=nil || *ro.Bucket != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a space != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this is a little confusing. I think if Bucket was just a string like GUID this would be a little easier.
gen3-client/g3cmd/utils.go
Outdated
@@ -38,6 +38,7 @@ type ManifestObject struct { | |||
// InitRequestObject represents the payload that sends to FENCE for getting a singlepart upload presignedURL or init a multipart upload for new object file | |||
type InitRequestObject struct { | |||
Filename string `json:"file_name"` | |||
Bucket *string `json:"bucket,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a string pointer and not just a string?
gen3-client/g3cmd/retry-upload.go
Outdated
@@ -53,6 +53,8 @@ func retryUpload(failedLogMap map[string]commonUtils.RetryObject) { | |||
var guid string | |||
var presignedURL string | |||
var err error | |||
var bucketPtr string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is suffixed with Ptr but it doesn't look like it's a pointer? It's just a string
gen3-client/g3cmd/utils.go
Outdated
@@ -57,13 +58,15 @@ type MultipartUploadRequestObject struct { | |||
Key string `json:"key"` | |||
UploadID string `json:"uploadId"` | |||
PartNumber int `json:"partNumber"` | |||
Bucket *string `json:"bucket,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here. The pattern for other variables here seem to be to use the raw type (string, int) rather than a pointer
gen3-client/g3cmd/utils.go
Outdated
} | ||
|
||
// MultipartCompleteRequestObject represents the payload that sends to FENCE for completeing a multipart upload | ||
type MultipartCompleteRequestObject struct { | ||
Key string `json:"key"` | ||
UploadID string `json:"uploadId"` | ||
Parts []MultipartPartObject `json:"parts"` | ||
Bucket *string `json:"bucket,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: pointers
gen3-client/g3cmd/utils.go
Outdated
request := new(jwt.Request) | ||
configure := new(jwt.Configure) | ||
function := new(jwt.Functions) | ||
|
||
function.Config = configure | ||
function.Request = request | ||
|
||
multipartCompleteObject := MultipartCompleteRequestObject{Key: key, UploadID: uploadID, Parts: parts} | ||
var bucketPtr *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you just passed through the bucketName as a string you wouldn't have to do all this every time
gen3-client/g3cmd/upload-multiple.go
Outdated
if batch { | ||
if forceMultipart { | ||
uploadPath := []string{furObjects[i].FilePath} | ||
_, multipartFilePaths := validateFilePath(uploadPath, forceMultipart) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validateFilePath returns two arrays (singlepartFilePaths, multipartFilePaths). It looks as if the singlepartFilePaths
array is ignored. @BinamB
if len(batchFURObjects) < workers { | ||
batchFURObjects = append(batchFURObjects, furObject) | ||
} else { | ||
batchUpload(gen3Interface, batchFURObjects, workers, respCh, errCh) | ||
batchUpload(gen3Interface, batchFURObjects, workers, respCh, errCh, bucketName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't believe multipartUpload
gets called at all if part of a batch? @BinamB
Hey @BinamB thank you for your work adding support for small files with the multipart upload! The files are being uploaded successfully to the bucket which is fantastic, unfortunately I'm encountering a couple of potential issues during this process (that might be due to my testing setup) that could interfere with potential end users: My Testing SetupTesting directory and files: .
├── DATA
│ ├── file-1MB.1.txt
│ ├── file-1MB.2.txt
│ ├── file-1MB.3.txt
│ ├── ...
│ └── file-1MB.99.txt
└── file-manifest.json
2 directories, 101 files Command: gen3-client upload-multiple --profile=development \
--manifest=./file-manifest.json \
--bucket aced-ohsu-development \
--upload-path . \
--force-multipart \
--batch \
--numparallel=10 File manifest sample (for one file): [
{
"object_id": "bda45761-f6d5-543c-82ab-e459b94b96c1",
"file_name": "DATA/file-1MB.1.txt",
"size": 1048576,
"modified": "2023-11-02T11:11:10.707559",
"md5": "b6d81b360a5672d80c27430f39153e2c",
}
] Issue: Slow file validationThe upload process tries to validate files that aren't in the file manifest which can result in a somewhat long time (a few minutes) to start the actual upload. Could this be due to the following logic in upload-multiple.go?
for i, furObject := range furObjects {
if forceMultipart {
uploadPath, _ := commonUtils.GetAbsolutePath(uploadPath)
filePaths, err := commonUtils.ParseFilePaths(uploadPath, false) <-- Includes non-manifest files?
if err != nil {
log.Fatalf("Error when parsing file paths: " + err.Error())
}
singlepartFilePaths, multipartFilePaths := validateFilePath(filePaths, forceMultipart) Issue: Extra files uploadedThe upload process started much quicker after removing all files in the testing directory that are not included in the file manifest. However it seems like the file manifest itself was uploaded judging by the resulting output?
Confirmed by adding a "Secret" file to the upload path (but not to the manifest): echo "Secret password" > DATA/secret.txt Rerunning the ~/.gen3/logs/development_succeeded_log.json: ...
"DATA/secret.txt": "PREFIX/bf46fc01-0d27-43ca-b014-0af89334d94d"
... Issue: Erroneous "Skipped" log after successful uploadThe test files themselves were uploaded successfully, but the resulting output included the following error for each file (despite removing all upload logs prior to the test:
Potential Tester Error?Let me know if I'm missing something big or small in my testing setup! I've been known to miss the forest for the trees so it could be something obvious, can also provide any additional info on the testing process. Thank you! |
gen3-client/g3cmd/upload-multiple.go
Outdated
} | ||
if forceMultipart { | ||
uploadPath, _ := commonUtils.GetAbsolutePath(uploadPath) | ||
// filePaths, err := commonUtils.ParseFilePaths(uploadPath, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not leave dead code, if we don't need this, remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i forgot to post comments from when i started reviewing this
…o feat/bucket_param_multipart
* Use file_name provided in manifest * Use file_name provided in manifest * Ensure file_name used in url * Ensure bucket passed to GeneratePresignedURL
…o feat/bucket_param_multipart
endPointPostfix := commonUtils.FenceDataUploadEndpoint + "/" + furObject.GUID | ||
msg, err := g3.DoRequestWithSignedHeader(&profileConfig, endPointPostfix, "", nil) | ||
if furObject.PresignedURL == "" { | ||
endPointPostfix := commonUtils.FenceDataUploadEndpoint + "/" + furObject.GUID + "?file_name=" + url.QueryEscape(furObject.Filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did this ever work before? the file_name
param is required in the fence data upload endpoint 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure. I followed the code and there are cases for a nil
body but it ends here and still sends this request somehow
https://github.com/uc-cdis/cdis-data-client/blob/master/gen3-client/jwt/functions.go#L66
already reviewed and approved by someone else
Jira Ticket: PXP-xxxx
Fence PR (uc-cdis/fence#1112)
New Features
upload-multiple
upload-multiple
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes