Install python apps in venv #1207

Closed
jooola opened this issue May 27, 2021 · 17 comments

@jooola
Contributor

jooola commented May 27, 2021

I see that most of the apps, if not all, install their requirements system-wide.

To further decouple and isolate each component of the project, maybe the apps should be installed in a venv.

This would be a first step towards fully isolating, and independently testing, apps such as the libretime analyzer.

@paddatrapper
Contributor

Is there a way of using a venv while providing system-wide scripts in $PATH? For a production install I see more use in breaking out the Python apps into separate containers and using some container orchestration instead. Though decoupling things is tricky, thanks to the legacy code we're working with.

@jooola
Contributor Author

jooola commented May 31, 2021

I've been thinking about this, and it overlaps with some other topics/tickets I opened, such as how to install libretime (ansible/deb package), how to build a docker image for each component, and how to manage dependencies and supported systems for libretime.

EDIT: I forgot to answer your question...
Yes, you can add system-wide Python packages to the virtual environment (I don't know if it is good practice though), but if you are talking about other binaries, they should be reachable through PATH. If I understand it correctly, the virtual environment only changes Python stuff.
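
As a minimal sketch of exposing system-wide packages inside a virtual environment: the thread talks about virtualenv, so using the standard library `venv` module here is an assumption, and the function name is made up for illustration. The command line equivalent is `python3 -m venv --system-site-packages <dir>`.

```python
import venv
from pathlib import Path


def create_venv_with_system_packages(env_dir):
    """Create a virtual environment that can also import system-wide packages."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # expose the distribution's site-packages
        with_pip=False,             # skip bootstrapping pip to keep the sketch light
    )
    builder.create(env_dir)
    # The choice is recorded in pyvenv.cfg at the root of the environment.
    return (Path(env_dir) / "pyvenv.cfg").read_text()
```

Packages installed inside the environment still take precedence over the system ones, so this mainly helps when a dependency is only available as a distribution package.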

Here is what I see as the big picture:

Split libretime into multiple components

This is mostly the case already, but not when it comes to packaging, as the libretime package is a huge bundle with everything.

So far I've been working with the analyzer, which is designed to work standalone, and the other Python apps seem to do the same (or we can move in that direction). I would leverage this and manage each component individually for packaging, testing, and docker images.

Packaging

With the above idea, libretime could be split into multiple packages:

  • libretime-analyzer is a standalone package with the minimal dependencies for this component to work.
  • libretime-playout is a standalone package with the minimal dependencies for this component to work.
  • libretime-core is the php application with all main dependencies (rabbitmq, postgresql)
  • libretime-api is the future api that will replace core at some point
  • libretime is a virtual package that depends on all of the above, and some specific tweaks.

And this schema would also apply to docker images, so we could deploy libretime using a docker-compose file and this would work really smoothly (Wow, this would be awesome).

Testing

Since every component is isolated, it is much easier to test each running part. For instance, one can build the deb, install it in CI, run some e2e tests on it and, if everything works, ship it, thank contributors, and move on to the next improvement / bugfix.

Currently it looks like the CI installs everything and tests everything as a whole, and the testing scripts are hard to maintain.

Python dependencies

With all of the above, we can push isolation one step further and install each component in its own virtual environment.
I am still not convinced by @paddatrapper's preference for system-wide packages over pip packages, as they add the burden of handling versions across distributions.
We don't get the latest fixes or features for some libraries, and it prevents us from using the tons of libraries already available on pip.
In addition, we already install some packages from pip, so we are mixing things.

I think using a venv for dependencies will ease a lot of processes in this project. By installing each component's dependencies from pip inside a venv, we work like containers: isolated from the rest, with up-to-date libraries.

I understand the concern of requiring a stable, if not ultra-stable, setup for a radio broadcast service. Shipping upgrades so frequently might scare a lot of people, but I would argue that we can also provide a beta/testing channel for packages, and recommend that radio stations have a staging environment to test things out before deploying to production.

Wrapping everything together

To wrap everything together, there are many tools, but here is what I would recommend:
Handling the venv for package installation is provided by this awesome project: https://github.com/spotify/dh-virtualenv.
We can build each package using fpm or sbuild.

So this is my idea of where libretime should be heading. I wonder what your thoughts are, and whether such plans could be marked as a future project / milestone. I also don't know if such an enhancement could target libretime 3, or if we should already talk about libretime 4?

@paddatrapper
Contributor

Yes, you can add system-wide Python packages to the virtual environment (I don't know if it is good practice though), but if you are talking about other binaries, they should be reachable through PATH. If I understand it correctly, the virtual environment only changes Python stuff.

You misunderstand what I meant - We provide binaries in the python packages. These need to be accessible in the system $PATH. I'm not talking about accessing site-packages within the virtualenv.

@jooola
Contributor Author

jooola commented May 31, 2021

Yes, you can add system-wide Python packages to the virtual environment (I don't know if it is good practice though), but if you are talking about other binaries, they should be reachable through PATH. If I understand it correctly, the virtual environment only changes Python stuff.

You misunderstand what I meant - We provide binaries in the python packages. These need to be accessible in the system $PATH. I'm not talking about accessing site-packages within the virtualenv.

Ok, my bad. Then I am not sure I fully understand your question; do you have an example?

@paddatrapper
Contributor

For example, the import functionality @Robbt is building in https://github.com/LibreTime/libretime/pull/514/files#diff-cea22264cd86756dbd865ac7d3405bdbc942c7885e44cf7d5d02a401be5b4613R16 is a script that a user can execute from the command line. This should be in the user's $PATH without them needing to activate a virtualenv before using it. This is because many of our users are not very familiar with the command line and Linux in general, so the less required of them the better

@jooola
Contributor Author

jooola commented May 31, 2021

Aah ok, gotcha. Yes, this is doable, but it requires some extra tweaks.

You can activate a virtualenv from within a Python script, so I would load the virtualenv at the beginning of the main file, and simply add a symlink from /usr/local/bin/<my-script> to the main file.
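
The symlink step could be sketched roughly as follows. The function name and the default `bin_dir` are assumptions made for illustration, and the linked script only runs without prior activation if it loads the virtualenv itself or if its shebang already points at the virtualenv's interpreter.

```python
import os
from pathlib import Path


def expose_script(venv_dir, script_name, bin_dir="/usr/local/bin"):
    """Symlink a script from a virtualenv's bin directory into bin_dir."""
    target = Path(venv_dir) / "bin" / script_name
    if not target.is_file():
        raise FileNotFoundError(f"{target} does not exist")
    link = Path(bin_dir) / script_name
    if link.is_symlink():
        link.unlink()  # replace a stale link left by a previous install
    os.symlink(target, link)
    return link
```

Writing to /usr/local/bin normally requires root, so a step like this would belong to an install or packaging script rather than to the application itself.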

@paddatrapper
Contributor

The other issue with virtualenvs is that it would make it impossible to ever get into Debian and Ubuntu archives. They are not the way distro packages happen and to make the packages policy compliant, I would need to rip all of it out anyway

@paddatrapper
Contributor

You can activate a virtualenv from within a Python script, so I would load the virtualenv at the beginning of the main file, and simply add a symlink from /usr/local/bin/<my-script> to the main file.

This sounds very hacky...

@jooola
Contributor Author

jooola commented May 31, 2021

This activation script is shipped with virtualenv, so it doesn't seem so hacky to me. I'd suggest you try it out, or I could provide a PoC if that would help change your mind:

jo@jofix: setup-pre-commit ⚑1 ~/git/github.com/jooola/libretime/venv $ ll bin 
total 56
-rw-r--r-- 1 jo jo 2391 May 31 15:16 activate
-rw-r--r-- 1 jo jo 1453 May 31 15:16 activate.csh
-rw-r--r-- 1 jo jo 3084 May 31 15:16 activate.fish
-rw-r--r-- 1 jo jo 1751 May 31 15:16 activate.ps1
-rw-r--r-- 1 jo jo 1199 May 31 15:16 activate_this.py
-rw-r--r-- 1 jo jo 1175 May 31 15:16 activate.xsh
-rwxr-xr-x 1 jo jo  261 May 31 15:16 pip
-rwxr-xr-x 1 jo jo  261 May 31 15:16 pip3
-rwxr-xr-x 1 jo jo  261 May 31 15:16 pip-3.7
-rwxr-xr-x 1 jo jo  261 May 31 15:16 pip3.7
lrwxrwxrwx 1 jo jo   16 May 31 15:16 python -> /usr/bin/python3
lrwxrwxrwx 1 jo jo    6 May 31 15:16 python3 -> python
lrwxrwxrwx 1 jo jo    6 May 31 15:16 python3.7 -> python
-rwxr-xr-x 1 jo jo  248 May 31 15:16 wheel
-rwxr-xr-x 1 jo jo  248 May 31 15:16 wheel3
-rwxr-xr-x 1 jo jo  248 May 31 15:16 wheel-3.7
-rwxr-xr-x 1 jo jo  248 May 31 15:16 wheel3.7
jo@jofix: setup-pre-commit ⚑1 ~/git/github.com/jooola/libretime/venv $ cat bin/activate_this.py
# -*- coding: utf-8 -*-
"""Activate virtualenv for current interpreter:

Use exec(open(this_file).read(), {'__file__': this_file}).

This can be used when you must use an existing Python interpreter, not the virtualenv bin/python.
"""
import os
import site
import sys

try:
    abs_file = os.path.abspath(__file__)
except NameError:
    raise AssertionError("You must use exec(open(this_file).read(), {'__file__': this_file}))")

bin_dir = os.path.dirname(abs_file)
base = bin_dir[: -len("bin") - 1]  # strip away the bin part from the __file__, plus the path separator

# prepend bin to PATH (this file is inside the bin directory)
os.environ["PATH"] = os.pathsep.join([bin_dir] + os.environ.get("PATH", "").split(os.pathsep))
os.environ["VIRTUAL_ENV"] = base  # virtual env is right above bin directory

# add the virtual environments libraries to the host python import mechanism
prev_length = len(sys.path)
for lib in "../lib/python3.7/site-packages".split(os.pathsep):
    path = os.path.realpath(os.path.join(bin_dir, lib))
    site.addsitedir(path.decode("utf-8") if "" else path)
sys.path[:] = sys.path[prev_length:] + sys.path[0:prev_length]

sys.real_prefix = sys.prefix
sys.prefix = base
jo@jofix: setup-pre-commit ⚑1 ~/git/github.com/jooola/libretime/venv $ 
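
A launcher built on the `activate_this.py` file shown above might look like the sketch below. The helper name is an assumption; the `exec` call is the one given in the file's own docstring. Note that `activate_this.py` is generated by virtualenv, while environments created with the standard library `venv` module do not include it.

```python
from pathlib import Path


def activate_virtualenv(venv_dir):
    """Run the virtualenv's activate_this.py inside the current interpreter."""
    activate_this = Path(venv_dir) / "bin" / "activate_this.py"
    # Same call as in the file's docstring: exec the file with __file__ set,
    # so that it can locate the environment's bin and site-packages directories.
    exec(activate_this.read_text(), {"__file__": str(activate_this)})
```

The main script would call this once, before importing any third-party dependency, for example `activate_virtualenv("/opt/libretime/venv")`, where the path is purely hypothetical.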

The other issue with virtualenvs is that it would make it impossible to ever get into Debian and Ubuntu archives. They are not the way distro packages happen and to make the packages policy compliant, I would need to rip all of it out anyway

With all the constraints this implies, I don't see the benefits of distributing libretime in the Debian main package repository. I think having access to stable software such as libreoffice, ffmpeg, apache2, or anything similar is precious in the Debian ecosystem. But shipping exotic software to the main repo does not seem worth it, since a self-hosted/shared repository would be as efficient to use.

I mean, many large software projects manage their own repository, and it works well; users only need to add an extra entry to their sources.list!

That said, if everything is available in the main repo, and we can push this software to the main repo, let's do it. But I would only do this if it doesn't restrict us too much.
Additionally, using pip packages allows us to package for other distributions (centos, etc.) without too much headache.

Or one could use some bundler to ship a single file with its venv, but I really don't know how this works: https://pex.readthedocs.io/en/latest/

@jooola
Contributor Author

jooola commented May 31, 2021

It integrates well with setup.py command line scripts (https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html#command-line-scripts), and maybe there is no need for a symlink at all.
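
A rough sketch of what such a command line script declaration could look like. The command name, module path and `main` function are all hypothetical, and `ENTRY_POINTS` is only a stand-in for the value that would be passed to `setup()` in setup.py.

```python
import argparse


def main(argv=None):
    """Hypothetical command line entry point for an import tool."""
    parser = argparse.ArgumentParser(prog="libretime-import-demo")
    parser.add_argument("path", help="file or directory to import")
    args = parser.parse_args(argv)
    print(f"would import {args.path}")
    return 0


# In setup.py this mapping would be passed as setup(..., entry_points=ENTRY_POINTS).
# Format of each entry: "<command name> = <module path>:<function>".
ENTRY_POINTS = {
    "console_scripts": [
        "libretime-import-demo = libretime_import_demo.cli:main",
    ],
}
```

When pip installs a package declaring such an entry into a virtual environment, it generates a wrapper in the environment's bin directory whose shebang points at the environment's own interpreter, so the wrapper can be run, or linked into $PATH, without activating anything first.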

@paddatrapper
Contributor

Disclaimer: I am a Debian Developer, so I strongly prefer distribution package repositories to language-specific ones

I don't have an issue with virtualenv as a concept, I think that it is a great way to install application specific dependencies for development. It ensures that packages do not conflict with other things on the system and that specific version requirements can be used without causing issues on the system. For development, this is great because you can test specific versions and customize your development environment as you need to replicate bugs, etc. These advantages become defunct on a dedicated server running in production.

On a production server, you achieve separation using containers. There is no need to separate the dependencies of the service and the applications on the server. There are plenty of containerization platforms out there (VMs, docker, podman, LXC, LXD, etc) all of which provide separation between instances, thus no conflict between the system and the application dependencies.

Specific dependency versioning is supported by either targeting versions of dependencies in the distro you are running in the container or through language-specific repositories. Both methods allow you to target a fixed version of a dependency; however, the distribution dependencies also allow you to get security updates without needing to bump the version you are supporting. This is good in a broadcast/production environment where downtime is expensive. Further, the distro dependencies change on a regular schedule. They do not require tracking all the upstream projects that the project depends on. Using language-specific repositories requires either that we update on day 0 every time a dependency has a new release, and that our users do the same to ensure that known security issues are patched, or that we leave systems open to publicly known security vulnerabilities without giving users an option to protect themselves. Updating the same day every time one of our many dependencies releases a new version is not feasible. We do not have the work-power to support it. It is also extremely unstable, potentially causing issues for our users.

Distribution packages give us security support that is done by the distribution maintainers, guaranteed support periods and known update times. This allows us to focus our limited work-power on implementing new features and fixing bugs instead of constantly updating dependencies to new versions.

The argument that you can leave it to the language-specific repositories to install the latest version of dependencies also does not stand up to scrutiny. Even just looking at the PyPI dependencies we currently use, you see issues where working installations silently break because dependencies released new versions that are incompatible with LibreTime. The solution most often is to pin the dependency until we can fix the incompatibility. If there are security fixes included in the update, we miss out on them. Dependencies also require manual updating when their versions are not pinned: pip, for example, will not update a dependency if it is already installed; you have to manually tell it to --upgrade. This doesn't work when you are handling many different virtualenvs, making a production virtualenv-based solution even more infeasible.
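
To make the manual step concrete, here is a small sketch that only builds the `pip install --upgrade` command each environment would need. The function name and the mapping from environment directory to requirements file are assumptions for illustration, not something LibreTime ships.

```python
from pathlib import Path


def upgrade_commands(environments):
    """Build one `pip install --upgrade` command per virtual environment.

    `environments` maps a virtualenv directory to its requirements file.
    """
    commands = []
    for venv_dir, requirements in environments.items():
        # Each environment has its own interpreter, hence its own pip.
        python = Path(venv_dir) / "bin" / "python"
        commands.append(
            [str(python), "-m", "pip", "install", "--upgrade", "-r", str(requirements)]
        )
    return commands
```

Each command would then have to be run separately, for instance with `subprocess.run(cmd, check=True)`, and repeated on every host, which is the maintenance burden described above.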

@jooola
Contributor Author

jooola commented May 31, 2021

Haha, I knew before I wrote all my "don't ship huge project in debian main repo" arguments that you are a Debian developer. Somehow I have a déjà vu with our discussion.

I am a bit confused, maybe I don't understand it correctly: is it allowed for a Debian package (in the main repo) to install dependencies from pip?

@jooola
Contributor Author

jooola commented May 31, 2021

I found this https://askubuntu.com/questions/327543/how-can-a-debian-package-install-python-modules-from-pypi and I am not sure if this is still accurate.

@paddatrapper
Contributor

I am a bit confused, maybe I don't understand it correctly: is it allowed for a Debian package (in the main repo) to install dependencies from pip?

No, the package must only depend on things in the Debian repos. So if a dependency isn't in Debian, we would need to package it for Debian before we could get LT in. Fortunately most of the Python dependencies are in the repos. Our current PPA only installs rgain3 from pip. We bundle the JavaScript dependencies, which isn't great, but not as bad as downloading them in the maintainer scripts.

@jooola
Contributor Author

jooola commented Jun 1, 2021

Regarding the initial issue, I do agree with you. The virtualenv is not necessarily the best idea, and we should rely on distribution packages if available. And with a long release cycle, using packages from pip (or any language-specific repository) is a bad idea.

From what I understand, the discussion is now about "should LibreTime be shipped to the Debian main repository".

I don't know if this has been discussed already, or if a choice has been made?
When do you plan to ship libretime in Debian main?
Should we open a new ticket for this discussion?

Because this raises some concerns:

The Debian release cycle is quite long, and this will constrain the project to follow it. Nothing wrong with this, but libretime should not constrain itself while migrating from the legacy app to django. I could see this happening in the future, once libretime has replaced the legacy app and the whole project has been modernized.

It's a matter of when. For example, we want to ship a django 3 API, but django hasn't been accepted in Debian testing yet, so as long as not all of our dependencies are shipped in Debian main, it will take years to reach stable and oldstable. Until then, we have time to package libretime ourselves with our own infrastructure?

For me it is incoherent to refuse packages from pip and yet want to ship a bundle of javascript or composer dependencies. These dependencies need to be upgraded as well.

To reach stability and quality, we should be testing and automating processes, not freezing every dependency we have. This eases development, as one can add features, upgrade, or change anything with some confidence about not breaking something else.

What is the real benefit of being in the main repository? Say our package is compliant with the policy, we try hard to only use distribution dependencies, and our release pipeline is as strict as Debian's, but we keep the freedom to manage this repository ourselves and to ship rolling updates.

Ideally, I don't think we disagree. But I don't want to hit the Debian main repo wall for every new contribution and every possibility to move libretime to the next stage.

For now, I'll try to use tools already shipped in Debian main, but I will not restrict myself from using other tools, and I hope that when this packaging thing happens we will find a way to make it work.

@jooola
Contributor Author

jooola commented Jun 1, 2021

I am closing this issue as I have my answer. But I would be happy to keep talking about this Debian packaging and what it involves, whether here, in some other issue, or on the community channels.

@jooola jooola closed this as completed Jun 1, 2021
@paddatrapper
Contributor

Debian archive discussion - libretime/libretime-debian-packaging#3
