Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between the nature of images #1532

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

lubosmj
Copy link
Member

@lubosmj lubosmj commented Mar 3, 2024

closes #1437

@lubosmj lubosmj force-pushed the distinguish-image-nature-1437 branch 2 times, most recently from 8fbcc78 to 37b5879 Compare March 3, 2024 20:04
@lubosmj
Copy link
Member Author

lubosmj commented Mar 3, 2024

This is the output after syncing data https://github.com/pulp/pulp_container/pkgs/container/bootc-labeled.

http :5001/pulp/api/v3/content/container/manifests/?is_bootable=True
{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "artifact": "/pulp/api/v3/artifacts/018e05e1-d54a-743b-87c3-e641a2305f64/",
            "blobs": [],
            "config_blob": null,
            "digest": "sha256:6a4a6aded92e720d91ec9bfdb0bf80a37846ac837be652f4d87140a46dc4ba71",
            "is_bootable": true,
            "is_flatpak": false,
            "listed_manifests": [
                "/pulp/api/v3/content/container/manifests/018e05e1-d6e9-7212-b323-24c60c40579e/"
            ],
            "media_type": "application/vnd.oci.image.index.v1+json",
            "pulp_created": "2024-03-03T19:54:39.966583Z",
            "pulp_href": "/pulp/api/v3/content/container/manifests/018e05e1-d563-77dc-af34-65cedfd92aa5/",
            "pulp_last_updated": "2024-03-03T19:54:39.966598Z",
            "schema_version": 2
        },
        {
            "artifact": "/pulp/api/v3/artifacts/018e05e1-d6d0-7260-9ff2-a483de045536/",
            "blobs": [
                "/pulp/api/v3/content/container/blobs/018e05e1-d6eb-72ed-b420-a71f6c0fa12c/"
            ],
            "config_blob": "/pulp/api/v3/content/container/blobs/018e05e1-d6ed-7f53-86c8-6fd9b87173a1/",
            "digest": "sha256:23973d32066ae1cb15ce3cef7fe66a8aa87038936b6c828cc5c129d46a3a9233",
            "is_bootable": true,
            "is_flatpak": false,
            "listed_manifests": [],
            "media_type": "application/vnd.oci.image.manifest.v1+json",
            "pulp_created": "2024-03-03T19:54:39.948038Z",
            "pulp_href": "/pulp/api/v3/content/container/manifests/018e05e1-d6e9-7212-b323-24c60c40579e/",
            "pulp_last_updated": "2024-03-03T19:54:39.948052Z",
            "schema_version": 2
        }
    ]
}
http :5001/pulp/api/v3/content/container/manifests/018e05e1-d6e9-7212-b323-24c60c40579e/
{
    "annotations": {
        "org.opencontainers.image.base.digest": "sha256:695ae78b4957fef4e53adc51febd07f5401eb36fcd80fff3e5107a2b4aa42ace",
        "org.opencontainers.image.base.name": "docker.io/library/alpine:3.18"
    },
    "artifact": "/pulp/api/v3/artifacts/018e05e1-d6d0-7260-9ff2-a483de045536/",
    "blobs": [
        "/pulp/api/v3/content/container/blobs/018e05e1-d6eb-72ed-b420-a71f6c0fa12c/"
    ],
    "config_blob": "/pulp/api/v3/content/container/blobs/018e05e1-d6ed-7f53-86c8-6fd9b87173a1/",
    "digest": "sha256:23973d32066ae1cb15ce3cef7fe66a8aa87038936b6c828cc5c129d46a3a9233",
    "is_bootable": true,
    "is_flatpak": false,
    "labels": {
        "containers.bootc": "1",
        "io.buildah.version": "1.33.5"
    },
    "listed_manifests": [],
    "media_type": "application/vnd.oci.image.manifest.v1+json",
    "pulp_created": "2024-03-03T19:54:39.948038Z",
    "pulp_href": "/pulp/api/v3/content/container/manifests/018e05e1-d6e9-7212-b323-24c60c40579e/",
    "pulp_last_updated": "2024-03-03T19:54:39.948052Z",
    "schema_version": 2
}

Detail exposes additional fields for annotations and labels. Manifest lists inherit the nature of listed containers. However, annotations and labels are not a subject of the inheritance.

@lubosmj lubosmj force-pushed the distinguish-image-nature-1437 branch 9 times, most recently from 0a1d1ee to cad750c Compare March 4, 2024 12:25
@lubosmj lubosmj marked this pull request as ready for review March 4, 2024 15:45
Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR it does exactly what it meant to, however, I would definitely rework how you are initializing the metadata because you are reading from file multiple times data you have access to and can just pass it. This is a concern especially with objects storage because you pay for every get operation on the file.
Also this PR code path is tightly related to the artifactless manifest, so Michal will need to rework after you, how init_metadata is done. If you don't want to address my previous point I would suggest then wait until artifactless manifests/config is done.

@@ -460,6 +460,9 @@ async def init_pending_content(self, digest, manifest_data, media_type, artifact
media_type=media_type,
config_blob=config_blob,
)

await sync_to_async(manifest.init_metadata)(artifact)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would you be passing artifact here, that you will need to open and parse again if you already have parsed manifest data on line 447

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to have init_metadata(manifest_data, config_data=None) where you would be passing json loadd data, this way you will not need to open and parse files multiple times

you will need to adjust save_config_blob to return both config_blob, config_blob_data

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I like the idea.

@@ -1181,6 +1181,11 @@ def put(self, request, path, pk=None):
)
manifest = manifest_list

# once relations for listed manifests are established,
# it is possible to initialize the metadata, like labels
manifest.init_metadata(artifact)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and further down

@@ -35,6 +35,9 @@
# a repository containing 4 manifest lists and 5 manifests
PULP_FIXTURE_1 = "pulp/test-fixture-1"

# a dummy repository containing two manifests with an arbitrary bootc label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two manifests(index and image)

class ManifestViewSet(ContainerContentQuerySetMixin, ReadOnlyContentViewSet):
class ManifestViewSetDetailMixin:
"""
A mixin that allows differentiating between listing manifests and retrieving a single manifest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the nutshell this is nice idea, in practice i think users will rely to see same fields for manifests whether on list of get. I.e. they will need to write 2 different parsers(a bit of inconvenience) that will be able to cope with no labels/annotations fields during listing and take them into account when getting a specific instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do recommend returning annotations and labels for the list view too, right? Or, have you considered not exposing such fields at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no no, i mean exposing them for list and get.
@mdellweg any opinion on the approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard question that should ideally be answered on the application level. Do we promise our users in general that the objects from list and view calls are equivalent.

I fear that ship has sailed already. We do have similar things for tasks and rpm packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to use one and only ManifestSerializer to not do any other gimmicks. Thanks for your inputs!


def update_manifest(manifest):
with suppress(ObjectDoesNotExist):
artifact = manifest._artifacts.get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know if we should keep this as a migration or rather a management command. this is expensive

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Let me write a new populate-annotations-labels command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 0033_raise_warning_for_repair.py

@lubosmj
Copy link
Member Author

lubosmj commented Mar 5, 2024

We reached a consensus on adopting the requested changes and merging this before #1530.

Thanks for pointing out the possible cost of each artifact.file.read operation! I will fix that!

@lubosmj lubosmj marked this pull request as draft March 6, 2024 17:17
@lubosmj lubosmj force-pushed the distinguish-image-nature-1437 branch 14 times, most recently from 3cc2588 to 7fe828c Compare March 7, 2024 15:02
@lubosmj lubosmj marked this pull request as ready for review March 7, 2024 15:04
Comment on lines 9 to 11
"Manifests containing additional metadata, like image manifest annotations and "
"image configuration labels, need to be updated. Please, run 'pulpcore-manager "
"init-image-nature' to extract and initiliaze the metadata from related artifacts."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this to a more friendlier manner: If you are interested in initializing annotations and labels, run this django command...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without interest :D , just 'run django command to initialize...'

@@ -107,6 +118,73 @@ class Manifest(Content):
through_fields=("image_manifest", "manifest_list"),
)

def init_metadata(self, manifest_data=None, manifest_artifact=None, config_artifact=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def init_metadata(self, manifest_data=None, manifest_artifact=None, config_artifact=None):
def init_metadata(self, manifest_data=None):

These arguments are not used anywhere right now.

Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

@@ -0,0 +1,2 @@
Incorporated a notion of container images' characteristics. Users can now filter manifests by their
nature using the ``is_flatpak`` or ``is_bootable`` field on the corresponding Manifest endpoint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please mention that you've exposed labels and annotations

self.update_manifests(manifest_lists)

def update_manifests(self, manifests_qs):
paginator = Paginator(manifests_qs, PAGE_CHUNK_SIZE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can safely go to 1000

Comment on lines 9 to 11
"Manifests containing additional metadata, like image manifest annotations and "
"image configuration labels, need to be updated. Please, run 'pulpcore-manager "
"init-image-nature' to extract and initiliaze the metadata from related artifacts."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without interest :D , just 'run django command to initialize...'

@@ -674,3 +675,8 @@ def _post_save(self, batch):
BlobManifest.objects.bulk_create(blob_manifests, ignore_conflicts=True)
if manifest_list_manifests:
ManifestListManifest.objects.bulk_create(manifest_list_manifests, ignore_conflicts=True)

# the nature of manifest lists can be determined after creating the listed manifest relation
for ml in manifest_lists:
Copy link
Member

@ipanova ipanova Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this operations is not really related to relation creation, can you rename the variable from manifest_lits to something more explicit, like manifests_lists_nature_to_init

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to say in the comment that we can run this initialization only after creating the relations between listed manifests and manifest lists. It appears that the combination of the comment and variable name causes confusion. 😞 Let me remove the comment and rename the variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the comment is helpful. you can leave it + rename the var ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I decided to rewrite the comment instead:

        # after creating the relation between listed manifests and manifest lists,
        # it is possible to initialize the nature of the corresponding manifest lists

How do you like it now? The suggested explicit name is hard to read and understand. We iterate over manifest lists and thus it does make sense to call the list "manifest_lists".

@lubosmj lubosmj force-pushed the distinguish-image-nature-1437 branch from 7fe828c to 5b3c4f2 Compare March 8, 2024 20:16
@lubosmj lubosmj requested a review from ipanova March 8, 2024 20:19
Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some non-blocking comments.

def print_warning_for_initializing_image_nature(apps, schema_editor):
warnings.warn(
"Run 'pulpcore-manager init-image-nature' to initialize metadata (i.e., annotations "
"and labels) for all manifests. The metadata are stored in associated artifacts and "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata is singular, hence is stored

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove last sentence, what kind of important info it brings? "their contents should be accessible through the Manifest model now" what contents? all contents? or just selected ones.
Users are not interested in tech.details of how a manifests is composed, imo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a shorter concise version: "Run 'pulpcore-manager init-image-nature' to initialize and expose metadata (i.e., annotations and labels) for the manifests"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be explicit when it comes to reading data from the storage back-end. If administrators run this migration, they might be a bit surprised by the the rise in expenses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the migration itself does not read from disk anything, it is the django management command that it does

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the django-management command migration. We are getting lost in this terminology because we always talk about migrating data, xixi.

@lubosmj lubosmj force-pushed the distinguish-image-nature-1437 branch from 5b3c4f2 to fa1c4c4 Compare March 12, 2024 10:26
@lubosmj
Copy link
Member Author

lubosmj commented Mar 12, 2024

Merging because the failures are unrelated to the PR.

@lubosmj lubosmj merged commit f27f3af into pulp:main Mar 12, 2024
11 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As a user I want to differentiate between different nature of images
3 participants