Lzdownload service - download all packages of a given channel #9679

Open
wants to merge 34 commits into
base: master

Conversation

waterflow80
Collaborator

What does this PR change?

The lzdownload service is in charge of using the cached package metadata to download the actual binaries (or source RPMs).

In this PR, we implement a minimal version of the download-all strategy: all the packages of a given channel are downloaded using their cached metadata.

Usage

pip install -e .
lzdownload --channel channel_label

# you should see something like
00:55:40 Downloading total 423 files from 1 queues.
00:55:40     1/423 : gstreamer-plugins-bad-1.22.0-lp155.3.10.1.x86_64.rpm
00:55:40     2/423 : gstreamer-plugins-bad-1.22.0-lp155.3.14.1.x86_64.rpm
...

And the packages will be downloaded to the specified location in the filesystem:

├── 5f41124267d1c6329289e83f06f8e33bced9862222b5a85f1e0dea92d763463a
│   └── gstreamer-plugins-bad-1.22.0-lp155.3.14.1.src.rpm
├── c7a4d7475745bb2c9b747a5d821e147031aca786e43c19948a5fff51a50df177
│   └── gstreamer-plugins-bad-1.22.0-lp155.3.4.1.src.rpm
├── f3e52634346fb67c6c92a74cd308124693edac848f0521fecdaf671e3ec9c209
│   └── gstreamer-plugins-bad-1.22.0-lp155.3.10.1.src.rpm
├── f690d6767df5e1ae68bb1442739b502d8f5baade733121e8ac98ceb34cfbbb0b
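The layout above can be sketched as a small path helper. This is only an illustration, assuming that each directory is named after the package checksum; `package_destination` and `DOWNLOAD_ROOT` are hypothetical names and are not taken from the PR.

```python
from pathlib import Path

# Hypothetical download root. The PR notes that the real location is
# hard-coded for now and still has to be discussed.
DOWNLOAD_ROOT = Path("/tmp/lzdownload")


def package_destination(checksum: str, filename: str, root: Path = DOWNLOAD_ROOT) -> Path:
    """Return <root>/<checksum>/<filename> (assumed layout)."""
    if not checksum or "/" in checksum:
        raise ValueError(f"invalid checksum: {checksum!r}")
    # Keep only the last path component so that a remote path such as
    # "src/foo.src.rpm" cannot create extra directories under the checksum.
    return root / checksum / Path(filename).name
```

Grouping by checksum keeps two builds with the same file name from overwriting each other, which may be why the tree above has one directory per hash.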

Independence

We have separated lzdownload from lzreposync so that each service can run independently of the other. This helps with scaling and with the separation of tasks.
We may consider moving functions used by both services to a common location.

GUI diff

No difference.

  • DONE

Documentation

  • No documentation needed: only internal and user invisible changes

  • DONE

Test coverage

ℹ️ If a major new functionality is added, it is strongly recommended that tests for the new functionality are added to the Cucumber test suite

  • No tests: Unit tests will be added on the fly.

  • DONE

Links

Issue(s): #
Port(s): # add downstream PR(s), if any

  • DONE

Changelogs

Make sure the changelog entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Re-run a test

If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:

  • Re-run test "changelog_test"
  • Re-run test "backend_unittests_pgsql"
  • Re-run test "java_pgsql_tests"
  • Re-run test "schema_migration_test_pgsql"
  • Re-run test "susemanager_unittests"
  • Re-run test "javascript_lint"
  • Re-run test "spacecmd_unittests"

Before you merge

Check How to branch and merge properly!

agraul and others added 30 commits January 2, 2025 13:39
lzreposync will be a spacewalk-repo-sync replacement written in Python.
It uses a src layout and a pyproject.toml. The target Python version is
3.11; compatibility with older Python versions is explicitly not a goal.
Added the remote_path column that will hold the remote path/URL
of a given package.

This information will help locate the package later on in the
remote repository and download it.
A boolean argument that controls whether we should call
header.hdr.fullFilelist().

We added this argument to disable the header.hdr.fullFilelist()
call only for the lzreposync service.
The inspect.getargspec() function is deprecated in Python 3.
It can be replaced by inspect.getfullargspec().
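As a short illustration of the replacement, the snippet below calls inspect.getfullargspec() on a made-up function. The `sync` function and its parameters are hypothetical and exist only to show the returned fields.

```python
import inspect


def sync(channel, no_packages=False, *extra, import_signatures=True, **options):
    """Hypothetical function used only to show the returned fields."""


spec = inspect.getfullargspec(sync)

# Fields shared with the old ArgSpec tuple.
print(spec.args)      # positional parameter names
print(spec.varargs)   # name of the *args parameter, or None
print(spec.defaults)  # defaults of the trailing positional parameters

# The old "keywords" field is called "varkw" here, and keyword-only
# parameters are reported in their own fields.
print(spec.varkw)
print(spec.kwonlyargs)
print(spec.kwonlydefaults)
```

Code that unpacked the four-element result of getargspec() needs adjusting, since getfullargspec() returns seven fields.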
The import_signatures is a boolean argument that specifies
whether we should execute the _import_signatures() method.

We added this parameter to disable the _import_signatures()
method for the lzreposync service.
Parsing the rpm's Primary.xml packages metadata file using the
pulldom xml parser as a memory-efficient parsing library.

Note that some attributes in the returned parsed object are
faked, and may be filled in elsewhere.

Some of the data is faked because some attributes are
required by the importer service.
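The lazy parsing described here can be sketched with xml.dom.pulldom as follows. This is not the parser from the PR: the sample document is simplified (real primary.xml files use XML namespaces and many more fields), and `iter_packages` and `_text` are hypothetical helper names.

```python
from xml.dom import pulldom

# Simplified stand-in for a primary.xml file (no namespaces, few fields).
SAMPLE = """<metadata>
  <package type="rpm">
    <name>foo</name>
    <arch>x86_64</arch>
    <location href="x86_64/foo-1.0-1.x86_64.rpm"/>
  </package>
  <package type="rpm">
    <name>bar</name>
    <arch>noarch</arch>
    <location href="noarch/bar-2.0-1.noarch.rpm"/>
  </package>
</metadata>"""


def _text(parent, tag):
    """Return the text content of the first <tag> child, or ''."""
    elements = parent.getElementsByTagName(tag)
    if not elements:
        return ""
    return "".join(
        child.data
        for child in elements[0].childNodes
        if child.nodeType == child.TEXT_NODE
    )


def iter_packages(xml_text):
    """Yield one dict per <package>, expanding only that subtree."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "package":
            # Build the DOM for this single package only; the rest of the
            # document is never materialized in memory.
            events.expandNode(node)
            location = node.getElementsByTagName("location")
            yield {
                "name": _text(node, "name"),
                "arch": _text(node, "arch"),
                "remote_path": location[0].getAttribute("href") if location else "",
            }


for package in iter_packages(SAMPLE):
    print(package)
```

For a real file, pulldom.parse() accepts a file name or an open stream, so a gzip stream could presumably be passed in the same way.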
Parsing the rpm's filelists.xml metadata file using
pulldom xml parser as a memory efficient parsing library.

The parser parses the given filelists.xml file (normally in gz
format), and caches the filelist information of each package
in a separate file in the cache directory, using the package's
hash as the filename, with no file extension.
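A minimal sketch of such a per-package cache is shown below. The on-disk format (one path per line) and the names `cache_filelist` and `load_filelist` are assumptions made for illustration; the PR may store the data differently.

```python
from pathlib import Path


def cache_filelist(cache_dir: Path, pkg_hash: str, files: list[str]) -> Path:
    """Write the file list of one package to <cache_dir>/<pkg_hash>."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / pkg_hash  # the hash is the file name, no extension
    # Assumed format: one file path per line.
    target.write_text("\n".join(files), encoding="utf-8")
    return target


def load_filelist(cache_dir: Path, pkg_hash: str) -> list[str]:
    """Read a cached file list back; an unknown hash gives an empty list."""
    target = cache_dir / pkg_hash
    if not target.exists():
        return []
    return target.read_text(encoding="utf-8").splitlines()
```

Keeping one small file per package means a single package's file list can be looked up later without re-reading the whole filelists.xml.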
Using both primary_parser and filelists_parser, return the full
packages' metadata, package by package, using lazy parsing.

Note that there are some attributes that are faked, because we
can't fetch them now, and they're required by the package
importer later on.
However, we can fake them more efficiently, using less memory.
Parsed the update-info.xml file and imported the parsed
patches/updates to the database.

We used pretty much the same code from the old Reposync class.
Import the parsed rpm and debian packages to the database in
batches, and associate each package with the corresponding
channel.
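Importing in batches from a lazy parser can be sketched with a small generator. `in_batches` is a hypothetical helper, and the batch size used by the PR is not stated here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items without loading everything first."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


for batch in in_batches(range(5), 2):
    print(batch)
```

Because the input is consumed through an iterator, only one batch of parsed packages needs to be held in memory at a time.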
Parsed the debian Packages metadata file in a lazy way,
yielding the metadata of each package separately.
Parsed the debian Translation file that contains the full
description of packages, grouped by description-md5, and
cached the parsed descriptions in a cache directory.
Using both packages_parser and translation_parser, return the
full packages' metadata, package by package, using lazy
parsing.

Also set the debian repository's information in a DebRepo
class
Given the channel label, fetch the important repository
information from the database, and store it in a temporary
object RepoDTO
Added the necessary command line arguments.

Identify the target repositories, prepare the data structures,
and execute the lazy synchronization of repositories/packages.
Added a new dependency python-gnupg used to verify repo
signature.
Ignored two linting complaints about raising exceptions, following the
approach in the old reposync.

We can enhance the code instead of doing this though.
This commit completes almost all the logic and use cases
of the new lazy reposync.

**Note** that this commit will be restructured and possibly
divided into smaller and more convenient commits.
This commit is for review purposes.
Seemingly this error happened because we reached the maximum number
of unclosed db connections. We thought that this might be due to
the fact that the close() method in the Database class was not
implemented, and the rhnSQL.closeDB() was not closing any connection.

However, we're still unsure whether this is the root cause
of the problem, because the old (current) reposync was using it
without any error.
This is the latest and almost the final version of the
lzreposync service. (gpg sig check not complete)

It contains pretty much all the necessary tests,
including the ones for updates/patches import.

Some of the remaining 'todos' are either for code
enhancements or some unclear concepts that
will be discussed with the team.

Of course, this commit will be split into smaller
ones later after rebase.
- Removed some todos.
- Changed some sql queries with equivalent ones using
JOIN...ON.
- Some other minor cleanup
Simplified some code by replacing classes and methods
with free functions in some places.

Consolidated the debian repo parsing.
Completed the gpg signature check for rpm repositories,
mainly for the repomd.xml file.

This is done by downloading the signature file from the
remote rpm repo, and executing 'gpg verify' to verify the
repomd.xml file against its signature using the already
added gpg keys on the filesystem.

So, if you haven't already added the required gpg keyring
on your system, you'll not be able to verify the repo.

You should ideally run this version directly on the uyuni-
server, because the gpg keyring will probably be present
there.
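The verification step described above can be sketched with subprocess. This is an illustration under assumptions, not the code from the PR: the helper names are hypothetical, and the file names and keyring directory in the usage comment are placeholders. The `--homedir` and `--verify` options are standard gpg options.

```python
import subprocess
from pathlib import Path


def build_verify_command(data_file: Path, signature_file: Path, gpg_home: Path) -> list[str]:
    """Build the gpg call that checks a detached signature."""
    return [
        "gpg",
        "--homedir", str(gpg_home),  # keyring directory holding the trusted keys
        "--verify", str(signature_file), str(data_file),
    ]


def verify_detached_signature(data_file: Path, signature_file: Path, gpg_home: Path) -> bool:
    """Return True if gpg accepts the signature.

    Raises FileNotFoundError if the gpg binary is not installed.
    """
    result = subprocess.run(
        build_verify_command(data_file, signature_file, gpg_home),
        capture_output=True,
        check=False,  # a bad signature is reported by the exit code
    )
    return result.returncode == 0


# Example call (paths are placeholders):
# verify_detached_signature(Path("repomd.xml"), Path("repomd.xml.asc"), Path.home() / ".gnupg")
```

If the signing key is missing from the keyring, gpg exits with a non-zero status, which matches the note above that the repo cannot be verified without the required keyring.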
makedirs() in uyuni.common.fileutils now accepts relative paths that
consist of only a directory name or paths with trailing slashes.
Completed the gpg signature check for debian repositories.

If you haven't already added the required gpg keyring
on your system, you'll not be able to verify the repo,
and you'll normally get a GeneralRepoException.

You should ideally run this version directly on the uyuni-
server, because the gpg keyring will probably be present
there.
Mocked the SPACEWALK_GPG_HOMEDIR value to `~/.gnupg/`, which is the
default directory for gpg, in order to execute the gpg tests outside
the uyuni-server
Made the lzreposync service continuously loop over the existing
channels and synchronize the corresponding repositories.

Added a status column in the rhnchannel table to indicate the
sync status of a given channel.

Also added some helper arguments to the service that allow us to
perform test operations, like creating a test channel and
associating repositories to it, etc.
Implemented a first, minimal, working version of the download
service, using the download all strategy, meaning that for a
given channel, we download all the packages that are linked to
that channel.

The download directory is hard-coded for now; the final location
still needs to be discussed.
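The download-all strategy can be sketched as a loop over the cached metadata of one channel. Everything here is hypothetical: `CachedPackage`, its fields, and the injected `fetch` callable stand in for whatever the service actually uses to read the metadata and transfer the files.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CachedPackage:
    """Hypothetical view of one cached metadata row."""
    filename: str
    remote_path: str
    checksum: str


def download_all(
    packages: Iterable[CachedPackage],
    fetch: Callable[[CachedPackage], None],
) -> list[CachedPackage]:
    """Fetch every package of a channel; return the ones that failed."""
    queue = list(packages)
    total = len(queue)
    failed = []
    for index, pkg in enumerate(queue, start=1):
        print(f"{index}/{total} : {pkg.filename}")
        try:
            fetch(pkg)
        except OSError:
            # Keep going so that one broken package does not stop the run.
            failed.append(pkg)
    return failed
```

Passing the transfer function in as a parameter keeps the loop testable without network access, and leaves room for other strategies later.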
2 participants