Adding operation information
We have Transaction information, but it isn't well surfaced. Along with each version, it would be nice to list the operations performed to produce that version.
Cleanup
When we clean up old versions, they are removed from the list, so their history is lost.
Performance
The current logic for listing versions is:
1. Use CommitHandler to list the manifests.
2. For each manifest:
   1. Open it.
   2. Read off the timestamp and metadata.
Because every manifest has to be opened, this can be expensive when there are many versions.
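To make the cost concrete, here is a minimal Python sketch of that loop. FakeManifestStore, list_manifest_versions(), and open_manifest() are hypothetical stand-ins for the CommitHandler and object-store reads, not Lance's actual API. The opens counter shows that listing N versions costs one listing call plus N manifest reads.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Manifest:
    version: int
    timestamp: datetime
    metadata: Dict[str, str] = field(default_factory=dict)


class FakeManifestStore:
    """Hypothetical stand-in for CommitHandler + object store (not Lance's API)."""

    def __init__(self, manifests: List[Manifest]):
        self._manifests = {m.version: m for m in manifests}
        self.opens = 0  # number of manifest reads performed

    def list_manifest_versions(self) -> List[int]:
        # Step 1: a single listing call returns every manifest's version.
        return sorted(self._manifests)

    def open_manifest(self, version: int) -> Manifest:
        # Step 2.1: each open is a separate, possibly remote, read.
        self.opens += 1
        return self._manifests[version]


def list_versions(store: FakeManifestStore) -> List[dict]:
    versions = []
    for v in store.list_manifest_versions():
        m = store.open_manifest(v)
        # Step 2.2: read off the timestamp and metadata.
        versions.append({"version": m.version, "timestamp": m.timestamp, "metadata": m.metadata})
    return versions


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = FakeManifestStore([Manifest(v, t0) for v in range(1, 1001)])
list_versions(store)
print(store.opens)  # 1000: one read per version
```

The per-version reads are what a history cache would avoid.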
API
We would create a new history() command, replacing the versions() API, that provides more information about each commit.
TBD: how should we add a created_by field? Should we just add a general metadata field? Then we could record arbitrary keys such as lance_version, lancedb_version, author, etc.
The only_tagged parameter lets users filter to just the versions that currently have tags.
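A minimal sketch of the kind of record history() might return, assuming a hypothetical VersionInfo type whose operation field comes from the Transaction and whose metadata dict could hold created_by-style keys. Neither the type nor this history() signature is an existing Lance API; the example only illustrates the proposed only_tagged filter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass
class VersionInfo:
    version: int
    timestamp: datetime
    operation: str  # hypothetical: operation name taken from the Transaction
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. author, lance_version


def history(entries: Iterable[VersionInfo], only_tagged: bool = False) -> List[VersionInfo]:
    """Hypothetical history(): newest version first, optionally only tagged ones."""
    selected = [e for e in entries if e.tags] if only_tagged else list(entries)
    return sorted(selected, key=lambda e: e.version, reverse=True)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
entries = [
    VersionInfo(1, t0, "Overwrite", metadata={"author": "alice"}),
    VersionInfo(2, t0, "Append", tags=["v1.0"]),
    VersionInfo(3, t0, "Delete"),
]
print([e.version for e in history(entries)])                    # [3, 2, 1]
print([e.version for e in history(entries, only_tagged=True)])  # [2]
```

A free-form metadata dict keeps the schema stable while still allowing keys like lance_version or author to be added later.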
Improving performance and handling cleanup
We can store a new file to speed up history listing: _history/{version}.lance. This would be a Lance file containing a cache of the history up to that version.
Tags are mutable, but we can still include them in the cache as (name, etag) pairs. When reading the cache, we can list the _refs/tags directory: if a cached tag is missing there, we ignore it; if a tag has an etag we don't recognize, we read it and add it.
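The tag reconciliation can be sketched as follows, assuming the cache maps (name, etag) pairs to versions and that a listing of _refs/tags yields each tag's current etag. reconcile_tags() and read_tag_version() are hypothetical helpers for illustration only. A tag that moved shows up with a new etag, so its stale cached pair is dropped and the tag is read once.

```python
from typing import Callable, Dict, Tuple

TagKey = Tuple[str, str]  # (tag name, etag)


def reconcile_tags(
    cached: Dict[TagKey, int],
    listed: Dict[str, str],
    read_tag_version: Callable[[str], int],
) -> Tuple[Dict[str, int], Dict[TagKey, int]]:
    """Merge cached (name, etag) -> version pairs with a fresh listing of _refs/tags.

    `listed` maps tag name -> current etag; `read_tag_version` reads one tag file.
    Returns (tag name -> version, refreshed cache).
    """
    resolved: Dict[str, int] = {}
    refreshed: Dict[TagKey, int] = {}
    for name, etag in listed.items():
        key = (name, etag)
        if key in cached:
            version = cached[key]  # etag unchanged: trust the cache
        else:
            version = read_tag_version(name)  # new or moved tag: read it once
        resolved[name] = version
        refreshed[key] = version
    # Cached pairs whose tag no longer appears in the listing are simply dropped.
    return resolved, refreshed


reads = []


def read_tag_version(name: str) -> int:
    reads.append(name)
    return {"moved": 4, "new": 5}[name]


cached = {("v1", "e1"): 1, ("old", "e9"): 3, ("moved", "e2"): 2}
listed = {"v1": "e1", "moved": "e5", "new": "e7"}
resolved, refreshed = reconcile_tags(cached, listed, read_tag_version)
print(resolved)  # {'v1': 1, 'moved': 4, 'new': 5}
print(reads)     # ['moved', 'new']
```

Only tags whose etag changed cost a read, so the common case (no tag changes) needs just the directory listing.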
When we run history(), we can write this file if it doesn't already exist. When we run cleanup_old_versions(), we can also create this file before deleting old versions. By caching this data, we keep the full history even after cleanup. If desired, we could also add a parameter to cleanup_old_versions() that prunes this history so it isn't retained longer than some time period.
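The write-before-delete ordering can be sketched with an in-memory toy, assuming a dict keyed by version stands in for the _history/{version}.lance files. ToyDataset and the prune_history_before parameter are hypothetical names for illustration, not Lance's implementation. history() builds on the newest existing cache file and appends any manifests committed after it, so versions deleted by cleanup stay visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    version: int
    timestamp: datetime


class ToyDataset:
    """In-memory stand-in for a dataset; not Lance's real storage layout."""

    def __init__(self, entries: List[Entry]):
        self.manifests: Dict[int, Entry] = {e.version: e for e in entries}
        # Stands in for _history/{version}.lance: cached history up to that version.
        self.history_cache: Dict[int, List[Entry]] = {}

    def history(self) -> List[Entry]:
        latest = max(self.manifests)
        if latest not in self.history_cache:
            # Start from the newest cache file at or below `latest`, if there is one,
            # then append the live manifests committed after it.
            base = max((v for v in self.history_cache if v <= latest), default=None)
            entries = list(self.history_cache[base]) if base is not None else []
            start = base if base is not None else 0
            entries += [self.manifests[v] for v in sorted(self.manifests) if v > start]
            self.history_cache[latest] = entries  # write the cache file
        return list(self.history_cache[latest])

    def cleanup_old_versions(
        self, before: datetime, prune_history_before: Optional[datetime] = None
    ) -> None:
        entries = self.history()  # ensure the cache exists before deleting anything
        latest = max(self.manifests)
        if prune_history_before is not None:
            # Hypothetical pruning option: keep only recent history entries.
            pruned = [e for e in entries if e.timestamp >= prune_history_before]
            self.history_cache = {latest: pruned}
        for v in list(self.manifests):
            if v != latest and self.manifests[v].timestamp < before:
                del self.manifests[v]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ds = ToyDataset([Entry(v, t0 + timedelta(days=v)) for v in range(1, 6)])
ds.cleanup_old_versions(before=t0 + timedelta(days=4))
print(sorted(ds.manifests))               # [4, 5]
print([e.version for e in ds.history()])  # [1, 2, 3, 4, 5]
```

Writing the cache first means a crash between the two steps at worst leaves an extra cache file, never a gap in history.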