Lzdownload service - download all packages of a given channel #9679

Open
wants to merge 34 commits into
base: master
34 commits
9f01bc6
Create lzreposync subdirectory
agraul May 23, 2024
bb6f7c8
Correct error message
waterflow80 Aug 4, 2024
98d796b
Add remote_path column
waterflow80 Aug 4, 2024
cc52fc1
Add expand_full_filelist parameter
waterflow80 Aug 4, 2024
7c6b531
Update deprecated method
waterflow80 Aug 4, 2024
48f8180
Add import_signatures parameter
waterflow80 Aug 4, 2024
be993a4
Implement Primary.xml file parser
waterflow80 Aug 4, 2024
89c6669
Implement filelists.xml file parser
waterflow80 Aug 4, 2024
500fd54
Implement full rpm metadata parsing
waterflow80 Aug 4, 2024
d9a3d16
Parse and import rpm patches/updates
waterflow80 Aug 4, 2024
9a1b04d
Import parsed rpm & deb packages to db
waterflow80 Aug 4, 2024
b46ccfe
Implement the deb Packages md file
waterflow80 Aug 4, 2024
4d49b37
Implement the Translation file parser
waterflow80 Aug 4, 2024
267f13a
Implement full deb metadata parsing
waterflow80 Aug 4, 2024
e51aada
Fetch repository information from the db
waterflow80 Aug 4, 2024
ccb67f9
Complete lzreposync service entry point
waterflow80 Aug 4, 2024
62c7130
Add new dependency
waterflow80 Aug 4, 2024
b9d0f24
Add unit tests for rpm metadata parsers
waterflow80 Aug 4, 2024
326a4d2
Delete no longer used files
waterflow80 Aug 4, 2024
659ca77
Remove already defined function
waterflow80 Aug 4, 2024
0d967cf
Fix linting complain
waterflow80 Aug 4, 2024
3892029
Complete code for lzreposync version 0.1
waterflow80 Aug 15, 2024
a233acb
Complete tests for lzreposync service
waterflow80 Aug 15, 2024
89b21c5
Fix error: too many clients already
waterflow80 Aug 15, 2024
4506bed
Complete latest version
waterflow80 Aug 17, 2024
ec0e318
Optimize code and do some cleanup
waterflow80 Aug 26, 2024
8b2d525
Optimize and consolidate code
waterflow80 Aug 29, 2024
f172342
Fix cachedir path formatting issue
waterflow80 Aug 29, 2024
4db663d
Complete gpg signature check for rpm
waterflow80 Sep 9, 2024
4dedf5f
Refactor: Allow more input variants in makedirs()
agraul Sep 9, 2024
487727d
Complete gpg signature check for debian
waterflow80 Sep 11, 2024
183638f
Mock spacewak gpg home directory
waterflow80 Oct 24, 2024
cbb73da
Add loop functionality to service
waterflow80 Dec 7, 2024
f504fca
Implement the download all strategy
waterflow80 Jan 24, 2025
8 changes: 6 additions & 2 deletions .gitignore
@@ -68,8 +68,6 @@ yarn-error.log
# This should never be used since we use Yarn, but avoid anyone accidentally committing it
package-lock.json

rel-eng/custom/__pycache__

# Intellij IDEA
.idea/
*.iml
@@ -89,6 +87,12 @@ python/.vscode
# Python
venv/
.venv/
*.egg-info/
*.egg
wheels/
__pycache__/
build/
.pytest_cache/

# Schema

14 changes: 14 additions & 0 deletions python/lzdownload/pyproject.toml
@@ -0,0 +1,14 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "lzdownload"
version = "0.1"
dependencies = [
"psycopg2-binary==2.9.10",
"urlgrabber"
]

[project.scripts]
lzdownload = "lzdownload:main"
29 changes: 29 additions & 0 deletions python/lzdownload/src/lzdownload/__init__.py
@@ -0,0 +1,29 @@
"""
Entry point for the download service of the lazy reposync
"""

import argparse
import logging

from lzdownload import lzdownloader


def main():
    parser = argparse.ArgumentParser(
        description="Lazy reposync download service",
        conflict_handler="resolve",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "-c",
        "--channel",
        help="The label of the channel whose packages you want to download",
        dest="channel",
        type=str,
        required=True,
    )
    args = parser.parse_args()

    # pylint: disable-next=logging-fstring-interpolation
    logging.info(f"Downloading packages for channel {args.channel}")
    lzdownloader.download_all(args.channel)
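The entry point above takes a single required `--channel` option. A minimal sketch of that parsing in isolation, so it can be exercised without the `spacewalk` modules installed. `build_parser` and the `argv` parameter are hypothetical and not part of this PR. The `logging.basicConfig` call is also an assumption: with Python's default WARNING level, an INFO message is not displayed unless logging is configured somewhere.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: builds the same --channel option as main()."""
    parser = argparse.ArgumentParser(description="Lazy reposync download service")
    parser.add_argument(
        "-c",
        "--channel",
        dest="channel",
        type=str,
        required=True,
        help="The label of the channel whose packages you want to download",
    )
    return parser


def main(argv=None) -> str:
    """Parse the arguments, log the selected channel and return its label."""
    # Assumption: configure logging so that INFO messages are actually emitted.
    logging.basicConfig(level=logging.INFO)
    args = build_parser().parse_args(argv)
    # Lazy %-style formatting avoids the need for the pylint suppression.
    logging.info("Downloading packages for channel %s", args.channel)
    return args.channel
```

Passing `argv` explicitly, for example `main(["-c", "my-channel"])`, keeps the function testable; with `argv=None`, argparse falls back to `sys.argv` as usual.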
75 changes: 75 additions & 0 deletions python/lzdownload/src/lzdownload/db_utils.py
@@ -0,0 +1,75 @@
"""
Database helper functions
"""

from lzdownload.repo_dto import RepoDTO
from spacewalk.server import rhnSQL


class NoSourceFoundForChannel(Exception):
    """Raised when no source (repository) was found"""

    def __init__(self, channel_label):
        self.msg = f"No resource found for channel {channel_label}"
        super().__init__(self.msg)


def get_repositories_by_channel_label(channel_label):
    """
    Fetch the repository information of a given channel from the database, and return a list of
    RepoDTO objects
    """
    rhnSQL.initDB()
    h = rhnSQL.prepare(
        """
        SELECT c.label as channel_label, c_ark.name as channel_arch, s.id, s.source_url, s.metadata_signed, s.label as repo_label, cst.label as repo_type_label
        FROM rhnChannel c
        INNER JOIN rhnChannelArch c_ark ON c.channel_arch_id = c_ark.id
        INNER JOIN rhnChannelContentSource cs ON c.id = cs.channel_id
        INNER JOIN rhnContentSource s ON cs.source_id = s.id
        INNER JOIN rhnContentSourceType cst ON s.type_id = cst.id
        WHERE c.label = :channel_label
        """
    )
    h.execute(channel_label=channel_label)
    sources = h.fetchall_dict()
    rhnSQL.closeDB()  # close before the check below so a raise does not leak the connection
    if not sources:
        raise NoSourceFoundForChannel(channel_label)
    repositories = map(
        lambda source: RepoDTO(
            channel_label=source["channel_label"],
            repo_id=source["id"],
            channel_arch=source["channel_arch"],
            repo_label=source["repo_label"],
            repo_type=source["repo_type_label"],
            source_url=source["source_url"],
            metadata_signed=source["metadata_signed"],
        ),
        sources,
    )

    return list(repositories)


def get_all_packages_metadata_from_channel(channel) -> list[dict]:
    """
    Return the metadata of all packages that are linked to the given channel
    :channel: channel label
    """
    rhnSQL.initDB()
    h = rhnSQL.prepare(
        """
        SELECT p.remote_path, cks.checksum, ckstype.label as checksum_type, rpm.name source_rpm from rhnpackage p
        INNER JOIN rhnChannelPackage chpkg on chpkg.package_id = p.id
        INNER JOIN rhnChannel ch on ch.id = chpkg.channel_id
        INNER JOIN rhnChecksum cks on cks.id = p.checksum_id
        INNER JOIN rhnChecksumType ckstype on ckstype.id = cks.checksum_type_id
        INNER JOIN rhnSourceRpm rpm ON rpm.id = p.source_rpm_id
        WHERE ch.label = :channel_label
        """
    )
    h.execute(channel_label=channel)
    packages_metadata = h.fetchall_dict()
    rhnSQL.closeDB()
    return packages_metadata
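Both helpers above open and close the database connection by hand. One way to guarantee the close also runs when a query or a later check raises is a small context manager. This is only a sketch: `db_session` and `FakeSQL` are hypothetical names, and `FakeSQL` merely stands in for a module exposing `initDB()` and `closeDB()` the way `rhnSQL` is used in this file.

```python
from contextlib import contextmanager


class FakeSQL:
    """Hypothetical stand-in for a module with initDB()/closeDB(), used for illustration only."""

    def __init__(self):
        self.is_open = False

    def initDB(self):
        self.is_open = True

    def closeDB(self):
        self.is_open = False


@contextmanager
def db_session(sql):
    """Open the connection on entry and always close it on exit, even on errors."""
    sql.initDB()
    try:
        yield sql
    finally:
        sql.closeDB()


sql = FakeSQL()
try:
    with db_session(sql):
        # Simulate a failure while the connection is open
        raise LookupError("no source found")
except LookupError:
    pass
print(sql.is_open)  # False: the connection was closed despite the exception
```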
53 changes: 53 additions & 0 deletions python/lzdownload/src/lzdownload/lzdownloader.py
@@ -0,0 +1,53 @@
"""
Download functions for the lazy reposync
"""

import os.path

from lzdownload import db_utils
from spacewalk.satellite_tools.download import ThreadedDownloader, TextLogger

DOWNLOAD_DIR = "/tmp/spacewalk/packages/1/stage" # TODO: to be reviewed


def prepare_download_params_from_package(package: dict):
    params = {}
    url = "/".join(package["remote_path"].split("/")[:-2]) + "/"  # base URL
    relative_path = "/".join(package["remote_path"].split("/")[-2:])  # last two components
    params["urls"] = [url]
    params["relative_path"] = relative_path
    params["authtoken"] = None
    params["target_file"] = os.path.join(
        DOWNLOAD_DIR, package["checksum"], package["source_rpm"]
    )
    params["ssl_ca_cert"] = None
    params["ssl_client_cert"] = None
    params["ssl_client_key"] = None
    params["checksum_type"] = package["checksum_type"]
    params["checksum"] = package["checksum"]
    params["bytes_range"] = None
    params["http_headers"] = tuple()
    params["timeout"] = 300
    params["minrate"] = 1000
    params["proxies"] = {}
    params["urlgrabber_logspec"] = None

    return params


def download_all(channel):
    """
    Download all packages of the given channel
    :channel: channel label
    """
    packages = db_utils.get_all_packages_metadata_from_channel(channel)
    downloader = ThreadedDownloader()
    to_download_count = 0
    for package in packages:
        params = prepare_download_params_from_package(package)
        downloader.add(params)
        to_download_count += 1

    logger = TextLogger(None, to_download_count)
    downloader.set_log_obj(logger)
    downloader.run()
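To make the path handling in `prepare_download_params_from_package` concrete, here is a small self-contained sketch of the same split. `split_remote_path` and `target_file` are hypothetical helper names, and the package row is invented sample data shaped like the dicts returned by the query; only the slicing logic and the `DOWNLOAD_DIR` value are taken from the diff.

```python
import os.path

DOWNLOAD_DIR = "/tmp/spacewalk/packages/1/stage"  # same placeholder value as in the diff


def split_remote_path(remote_path: str) -> tuple[str, str]:
    """Split a package URL into (base URL, relative path made of the last two components)."""
    parts = remote_path.split("/")
    base_url = "/".join(parts[:-2]) + "/"
    relative_path = "/".join(parts[-2:])
    return base_url, relative_path


def target_file(checksum: str, source_rpm: str) -> str:
    """Build the local target path: DOWNLOAD_DIR/<checksum>/<source_rpm>."""
    return os.path.join(DOWNLOAD_DIR, checksum, source_rpm)


# Hypothetical sample row, for illustration only
package = {
    "remote_path": "https://example.com/repo/x86_64/foo-1.0-1.x86_64.rpm",
    "checksum": "abc123",
    "source_rpm": "foo-1.0-1.src.rpm",
}
base_url, relative_path = split_remote_path(package["remote_path"])
print(base_url)  # https://example.com/repo/
print(relative_path)  # x86_64/foo-1.0-1.x86_64.rpm
print(target_file(package["checksum"], package["source_rpm"]))
```

Note that the split assumes the package always sits exactly one directory below the repository base URL, and that the target file name comes from `source_rpm` rather than from the last component of `remote_path`.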
35 changes: 35 additions & 0 deletions python/lzdownload/src/lzdownload/repo_dto.py
@@ -0,0 +1,35 @@
# pylint: disable=missing-module-docstring


# pylint: disable-next=missing-class-docstring
class RepoDTO:
    def __init__(
        self,
        channel_label,
        repo_id,
        channel_arch,
        repo_label,
        repo_type,
        source_url,
        metadata_signed="N",
    ):
        self.repo_id = repo_id
        self.channel_arch = channel_arch
        self.repo_label = repo_label
        self.repo_type = repo_type
        self.source_url = source_url
        self.metadata_signed = metadata_signed
        self.channel_label = channel_label

    def __str__(self):
        return (
            "Repo("
            + str(self.repo_id)
            + ", "
            + self.repo_label
            + ", "
            + self.repo_type
            + ", "
            + self.source_url
            + ")"
        )
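As an aside, a plain data holder like `RepoDTO` could also be written as a dataclass, which generates `__init__` from the field list and so avoids attribute-name slips between parameter and attribute. This is a sketch of an alternative, not what the PR does; the type annotations are assumptions, since the original class is untyped.

```python
from dataclasses import dataclass


@dataclass
class RepoDTO:
    """Repository information for one channel source (field types are assumed)."""

    channel_label: str
    repo_id: int
    channel_arch: str
    repo_label: str
    repo_type: str
    source_url: str
    metadata_signed: str = "N"

    def __str__(self) -> str:
        # Same textual form as the hand-written __str__ in the diff
        return f"Repo({self.repo_id}, {self.repo_label}, {self.repo_type}, {self.source_url})"


repo = RepoDTO(
    channel_label="my-channel",
    repo_id=1,
    channel_arch="x86_64",
    repo_label="my-repo",
    repo_type="yum",
    source_url="https://example.com/repo/",
)
print(repo)  # Repo(1, my-repo, yum, https://example.com/repo/)
```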
1 change: 1 addition & 0 deletions python/lzreposync/.gitignore
@@ -0,0 +1 @@
.cache/
73 changes: 73 additions & 0 deletions python/lzreposync/README.md
@@ -0,0 +1,73 @@
# lzreposync

TODO: project description

## How to work in this project

1. Create a new virtual environment
```sh
$ python3.11 -m venv .venv
$ . .venv/bin/activate
```
2. Install `lzreposync` in *editable* mode
``` sh
$ pip install -e .
```
3. Install other required dependencies (required by spacewalk and other modules)
```sh
pip install rpm
pip install salt
```
4. Add a path configuration file (**Important!**)
```
echo "absolute/path/to/uyuni/python/" > .venv/lib64/python3.11/site-packages/uyuni_python_paths.pth
# This is a temporary solution that will allow the lzreposync service to recognize/locate other modules like spacewalk, etc...
```
5. Add the configuration settings to `/etc/rhn/rhn.conf`
```sh
vim /etc/rhn/rhn.conf  # create the directory/file if it does not exist

DB_BACKEND=postgresql
DB_USER=spacewalk
DB_PASSWORD=spacewalk
DB_NAME=susemanager
DB_HOST=127.0.0.1 # might not work with 'localhost'
DB_PORT=5432
PRODUCT_NAME=any
TRACEBACK_MAIL=any
DB_SSL_ENABLED=
DB_SSLROOTCERT=any
DEBUG=1
ENABLE_NVREA=1
MOUNT_POINT=/tmp
SYNC_SOURCE_PACKAGES=0

# Some values might not be the right ones
```
6. Try `lzreposync`
``` sh
$ lzreposync -u https://download.opensuse.org/update/leap/15.5/oss/ --type yum [--no-errata]
$ lzreposync --type deb --url 'https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu?uyuni_suite=jammy&uyuni_component=main&uyuni_arch=amd64'
```
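Step 4 above hard-codes `lib64/python3.11` in the path of the `.pth` file. As a hedged alternative, the file can be written from Python, asking the interpreter where its site-packages directory is. `write_pth` and `active_site_packages` are hypothetical helper names; the file name matches the one used in step 4.

```python
import sysconfig
from pathlib import Path


def active_site_packages() -> Path:
    """Return the site-packages directory of the running interpreter (the venv, if active)."""
    return Path(sysconfig.get_paths()["purelib"])


def write_pth(
    site_packages: Path,
    uyuni_python_dir: Path,
    name: str = "uyuni_python_paths.pth",
) -> Path:
    """Write a .pth file so that uyuni_python_dir is added to sys.path at interpreter startup."""
    pth_file = site_packages / name
    # A .pth file holds one path per line; use the absolute path
    pth_file.write_text(f"{uyuni_python_dir.resolve()}\n", encoding="utf-8")
    return pth_file
```

With the virtual environment activated, a call such as `write_pth(active_site_packages(), Path("path/to/uyuni/python"))` would be the equivalent of the `echo` command; the path argument here is a placeholder.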

### How do I ...?

- add a new dependency? Add the *pypi* name to the `dependencies` list in the `[project]` section in `pyproject.toml`.

## Tests
The tests run against a PostgreSQL container image that ships with the full `susemanager` database schema already built.

To pull and start the database, run:
```sh
cd /uyuni/java
sudo make -f Makefile.docker EXECUTOR=podman dockerrun_pg
# Wait a few seconds until the db is fully initialized
```

After installing with `pip install .` (or `pip install -e .`), `python3.11 -m pytest tests/` runs all tests. You may need to run `rehash` so that your shell picks up `.venv/bin/pytest`.

You can connect to the test database with:
```sh
psql -h localhost -d susemanager -U spacewalk # password: spacewalk
```
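The same connection can be opened from Python. A minimal sketch, assuming the `psycopg2-binary` dependency listed in `pyproject.toml` is installed and the test container is running; `connection_params` and `connect` are hypothetical helpers, and the default values mirror the `psql` command above plus the default PostgreSQL port.

```python
def connection_params(
    host: str = "localhost",
    dbname: str = "susemanager",
    user: str = "spacewalk",
    password: str = "spacewalk",
    port: int = 5432,
) -> dict:
    """Collect the keyword arguments for psycopg2.connect() in one place."""
    return {
        "host": host,
        "dbname": dbname,
        "user": user,
        "password": password,
        "port": port,
    }


def connect(**overrides):
    """Open a connection to the test database (needs psycopg2 and a running container)."""
    # Imported lazily so that connection_params() is usable without the driver installed
    import psycopg2

    return psycopg2.connect(**connection_params(**overrides))
```

If `localhost` does not work in your setup, passing `host="127.0.0.1"` is the variant suggested by the configuration in step 5.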

22 changes: 22 additions & 0 deletions python/lzreposync/pyproject.toml
@@ -0,0 +1,22 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "lzreposync"
version = "0.1"
dependencies = [
"memory_profiler",
"pytest",
"requests",
"python-gnupg",
"pycurl",
"pyopenssl",
"psycopg2-binary",
"urlgrabber",
"python-debian",
"python-dateutil"
]

[project.scripts]
lzreposync = "lzreposync:main"