-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add StorageID to ObjectPointer #8522
Conversation
🎊 PR Preview b361b34 has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-8522.surge.sh 🕐 Build time: 0.01s 🤖 By surge-preview |
modules/block/factory/go.work.sum
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No clue why it wasn't added before, but I think it should be in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in it's own PR then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2091,6 +2096,7 @@ func (c *Controller) ensureStorageNamespace(ctx context.Context, storageNamespac | |||
// this serves two purposes, first, we maintain safety check for older lakeFS version. | |||
// second, in scenarios where lakeFS shouldn't have access to the root namespace (i.e pre-sign URL only). | |||
if c.Config.Graveler.EnsureReadableRootNamespace { | |||
// TODO (gilo): ObjectPointer init - add StorageID here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change delayed? It's going in a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the scope of this one, and making it more reviewable (yet complete enough to be able to merge it in the end).
var blob *upload.Blob | ||
if mediaType != "multipart/form-data" { | ||
// handle non-multipart, direct content upload | ||
address := c.PathProvider.NewPath() | ||
blob, err = upload.WriteBlob(ctx, c.BlockAdapter, repo.StorageNamespace, address, r.Body, r.ContentLength, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this code style change indicate a change in our best practices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. I don't think we have a code style guide for go,
And the formatter didn't seem to care.
These lines became too long.
pkg/api/controller_test.go
Outdated
@@ -2550,7 +2550,7 @@ func TestController_ObjectsHeadObjectHandler(t *testing.T) { | |||
buf := new(bytes.Buffer) | |||
buf.WriteString("this is file content made up of bytes") | |||
address := upload.DefaultPathProvider.NewPath() | |||
blob, err := upload.WriteBlob(context.Background(), deps.blocks, onBlock(deps, "ns1"), address, buf, 37, block.PutOpts{StorageClass: &expensiveString}) | |||
blob, err := upload.WriteBlob(context.Background(), deps.blocks, "", onBlock(deps, "ns1"), address, buf, 37, block.PutOpts{StorageClass: &expensiveString}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valid to provide an empty string for the storageid? Shouldn't this trigger some sort of "invalid parameter" exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, empty StorageID is the only valid option.
@@ -63,6 +63,7 @@ const DefaultPreSignExpiryDuration = 15 * time.Minute | |||
// ObjectPointer is a unique identifier of an object in the object | |||
// store: the store is a 1:1 mapping between pointers and objects. | |||
type ObjectPointer struct { | |||
StorageID string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation: This is such an important core class class, but there is no explanation what sort of values these fields are ment to take. Is "S3" a valid storage ID? Or maybe a URL is expected like "https://bucketname.s3.amazonaws.com"? I have no idea when reading this code?
Same goes for the other two fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is out of scope for this PR - we should however do this for the config part and document that well.
@itaigilo This is another thing we need to bring up with product
bucketName = "test" | ||
ObjectBlockSize = 1024 * 3 | ||
storageNamespace = "test" | ||
ObjectBlockSize = 1024 * 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this size measured in bits? bytes? gigabytes? Would be nice if I didn't have to guess(const name or docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No clue -
only changed whitespaces here...
@@ -110,6 +110,7 @@ func (controller *GetObject) Handle(w http.ResponseWriter, req *http.Request, o | |||
contentLength := entry.Size | |||
contentRange := "" | |||
objectPointer := block.ObjectPointer{ | |||
StorageID: o.Repository.StorageID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these will be "" fore existing repos? Is that valid? Don't we need a migrator to initialize the values for existing repos? Is that in this PR? Another issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "" is valid, for existing repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - most of it is pretty straightforward.
I am however asking for changes on some of the open issues + some cleanups.
pkg/upload/write_blob.go
Outdated
@@ -17,11 +17,12 @@ type Blob struct { | |||
CreationDate time.Time | |||
} | |||
|
|||
func WriteBlob(ctx context.Context, adapter block.Adapter, bucketName, address string, body io.Reader, contentLength int64, opts block.PutOpts) (*Blob, error) { | |||
func WriteBlob(ctx context.Context, adapter block.Adapter, storageID, storageNamespace, address string, body io.Reader, contentLength int64, opts block.PutOpts) (*Blob, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I think we should just consider passing the ObjectPointer instead of all these params
@@ -419,6 +419,7 @@ func (tfs *TierFS) newLocalFileRef(namespace, nsPath, filename string) localFile | |||
} | |||
|
|||
func (tfs *TierFS) objPointer(namespace, filename string) block.ObjectPointer { | |||
// TODO (gilo): ObjectPointer init - add StorageID here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we planning on doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically we need to propagate this in the hierarchy -
I'll soon do this in a separate PR, to not overload this PR.
pkg/catalog/catalog_test.go
Outdated
objReader, err := c.BlockAdapter.Get(ctx, block.ObjectPointer{ | ||
Identifier: obj, | ||
IdentifierType: block.IdentifierTypeFull, | ||
StorageID: "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary - can be removed
pkg/api/controller.go
Outdated
storageNamespace := repo.StorageNamespace | ||
storageID := repo.StorageID | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant - please remove
@@ -63,6 +63,7 @@ const DefaultPreSignExpiryDuration = 15 * time.Minute | |||
// ObjectPointer is a unique identifier of an object in the object | |||
// store: the store is a 1:1 mapping between pointers and objects. | |||
type ObjectPointer struct { | |||
StorageID string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is out of scope for this PR - we should however do this for the config part and document that well.
@itaigilo This is another thing we need to bring up with product
pkg/block/azure/adapter.go
Outdated
@@ -139,6 +139,7 @@ func ResolveBlobURLInfoFromURL(pathURL *url.URL) (BlobURLInfo, error) { | |||
|
|||
func resolveBlobURLInfo(obj block.ObjectPointer) (BlobURLInfo, error) { | |||
key := obj.Identifier | |||
// TODO (gilo): verify there's no need for StorageID here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point you are already in the context of a specific adapter instance so storage ID is not needed here
pkg/block/s3/adapter.go
Outdated
@@ -411,6 +411,7 @@ func (a *Adapter) GetPreSignedURL(ctx context.Context, obj block.ObjectPointer, | |||
|
|||
log := a.log(ctx).WithFields(logging.Fields{ | |||
"operation": "GetPreSignedURL", | |||
"sid": obj.StorageID, // TODO (gilo): is "sid" a good enough name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not good enough :)
Let's be explicit (we don't have to be stingy in our logs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I haven't done this is that StorageNamespace is logged as namespace
and not as storage_namespace
😖
pkg/block/s3/adapter.go
Outdated
@@ -466,6 +467,7 @@ func (a *Adapter) GetPresignUploadPartURL(ctx context.Context, obj block.ObjectP | |||
|
|||
log := a.log(ctx).WithFields(logging.Fields{ | |||
"operation": "GetPresignUploadPartURL", | |||
"sid": obj.StorageID, // TODO (gilo): is "sid" a good enough name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
pkg/catalog/catalog.go
Outdated
@@ -2630,7 +2630,7 @@ type UncommittedParquetObject struct { | |||
CreationDate int64 `parquet:"name=creation_date, type=INT64, convertedtype=INT_64"` | |||
} | |||
|
|||
func (c *Catalog) uploadFile(ctx context.Context, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { | |||
func (c *Catalog) uploadFile(ctx context.Context, sid graveler.StorageID, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use sid, it's overloaded and confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the ns
"convention".
pkg/catalog/catalog.go
Outdated
@@ -2630,7 +2630,7 @@ type UncommittedParquetObject struct { | |||
CreationDate int64 `parquet:"name=creation_date, type=INT64, convertedtype=INT_64"` | |||
} | |||
|
|||
func (c *Catalog) uploadFile(ctx context.Context, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { | |||
func (c *Catalog) uploadFile(ctx context.Context, sid graveler.StorageID, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point consider just passing the repository object, will be cleaner than just adding more params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@N-o-Z fixed, PTAL again.
pkg/block/s3/adapter.go
Outdated
@@ -411,6 +411,7 @@ func (a *Adapter) GetPreSignedURL(ctx context.Context, obj block.ObjectPointer, | |||
|
|||
log := a.log(ctx).WithFields(logging.Fields{ | |||
"operation": "GetPreSignedURL", | |||
"sid": obj.StorageID, // TODO (gilo): is "sid" a good enough name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I haven't done this is that StorageNamespace is logged as namespace
and not as storage_namespace
😖
pkg/catalog/catalog.go
Outdated
@@ -2630,7 +2630,7 @@ type UncommittedParquetObject struct { | |||
CreationDate int64 `parquet:"name=creation_date, type=INT64, convertedtype=INT_64"` | |||
} | |||
|
|||
func (c *Catalog) uploadFile(ctx context.Context, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { | |||
func (c *Catalog) uploadFile(ctx context.Context, sid graveler.StorageID, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the ns
"convention".
@@ -419,6 +419,7 @@ func (tfs *TierFS) newLocalFileRef(namespace, nsPath, filename string) localFile | |||
} | |||
|
|||
func (tfs *TierFS) objPointer(namespace, filename string) block.ObjectPointer { | |||
// TODO (gilo): ObjectPointer init - add StorageID here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically we need to propagate this in the hierarchy -
I'll soon do this in a separate PR, to not overload this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful, thanks!!
-1, | ||
block.PutOpts{StorageClass: params.StorageClass}, | ||
) | ||
objectPointer := block.ObjectPointer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@@ -2630,7 +2630,7 @@ type UncommittedParquetObject struct { | |||
CreationDate int64 `parquet:"name=creation_date, type=INT64, convertedtype=INT_64"` | |||
} | |||
|
|||
func (c *Catalog) uploadFile(ctx context.Context, sid graveler.StorageID, ns graveler.StorageNamespace, location string, fd *os.File, size int64) (string, error) { | |||
func (c *Catalog) uploadFile(ctx context.Context, repo *graveler.RepositoryRecord, location string, fd *os.File, size int64) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@@ -411,7 +411,7 @@ func (a *Adapter) GetPreSignedURL(ctx context.Context, obj block.ObjectPointer, | |||
|
|||
log := a.log(ctx).WithFields(logging.Fields{ | |||
"operation": "GetPreSignedURL", | |||
"sid": obj.StorageID, // TODO (gilo): is "sid" a good enough name? | |||
"storage_id": obj.StorageID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
# Conflicts: # pkg/api/controller.go
Closes #8496.
Change Description
Adding
StorageID
toObjectPointer
.Adding
StorageID
to the init ofObjectPointer
, that cover the basic flows of the server.My approach in this PR
Search for all initializations of ObjectPointer, and:
TODO
if repo isn't present.What's Next
I'll open new Issues for the following, to be handled later on:
StorageID
toobjPointer
of TierFS.StorageID
to GarbageCollectionManager.StorageID
toResolveNamespace
:DefaultResolveNamespace
relevant toStorageID
?ObjectID
references.ensureStorageNamespace
(it probably need to be moved to the Block interface, and addstorageID
to its signature).OutputWrite
should haveStorageID
, and so doesHookOutputWriter
.