Environment YAML for Compute doesn't work out of the box #84

Closed
B0r1sD opened this issue Jan 16, 2025 · 3 comments

Comments

@B0r1sD

B0r1sD commented Jan 16, 2025

Hi Harpy maintainers

Every 3 months, Data Core checks whether the usage example for running Harpy on the Compute cluster still works as intended. Because your documentation on this is excellent, the strategy we agreed on together was for our docs to simply refer to yours (this part), so that users always get the latest changes.

After rerunning the example with release 0.0.2, it didn't work out of the box. The following was seen and done to make it work:

  1. I think environment_vib_compute.yml was pruned a bit too much: I had to load the jax/0.3.25-foss-2022a-CUDA-11.7.0 module in the Jupyter session and run python -m pip install textalloc spatialdata-plot in the Harpy conda env before the Python libraries would load successfully.

    The error after running the first cell was the following (included for the traceback):
    [image: screenshot of the traceback]

  2. The versions of the dependencies for the Conda env are not locked. This caused a failure with a clone of the older version of the repo, so I can share the versions that were used previously (and which I tested), which may help:

name: harpy
channels:
 - conda-forge
dependencies:
 - python=3.10.8
 - opencv=4.10.0
 - pip=24.3.1
 - rasterio=1.4.1
 - pip:
     - basicpy==1.0.0
     - torch==2.4.1
     - cellpose==3.0.11
     - jax==0.4.6
     - jaxlib==0.4.6
     - ipython==8.28.0
     - ipykernel==6.29.5
  3. The plots on Compute somehow never get rendered unless the magic command %matplotlib inline is run before any plotting code is executed. This might be Compute-specific, but as it's not clear why this is necessary, I'll open a pull request to add it to the notebook that the docs refer to (Harpy_feature_calculation.ipynb).
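
As a rough illustration of the import problem in item 1, a notebook's first cell could check up front which modules are missing instead of failing partway through. This is only a sketch using the standard library; the list of module names is an assumption based on the packages mentioned above (the pip package spatialdata-plot is imported as spatialdata_plot), and the helper name missing_modules is hypothetical, not part of Harpy.

```python
from importlib.util import find_spec

# Assumed import names for the packages mentioned in item 1;
# "spatialdata-plot" on PyPI is imported as "spatialdata_plot".
REQUIRED_MODULES = ["jax", "textalloc", "spatialdata_plot"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Running this inside the Jupyter kernel (rather than a login shell) matters, since the kernel may see a different environment than the terminal.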

We have a PR ready to merge that makes the usage example even more future-proof by relying entirely on your documentation. We can keep you posted on the usability of the part where users run Harpy on Compute, the same way I did here.

Let me know your thoughts!
Boris D

@berombau
Member

berombau commented Jan 17, 2025

Great, thanks for checking up on it!

  1. I think it's solved if you install harpy with the optional [testing,clustering] dependencies. Either way, if jax is missing I would add it as a Conda dependency rather than as a module, since modules are sometimes cluster-specific. It's important to keep the default package installation free from accelerators like jax or CUDA packages so that it stays easy to install on all platforms. The conda environment used for the Spatial Catalyst training is also interesting: https://github.com/vibspatial/targeted_transcriptomics_training/blob/main/environment_vib_compute.yml. Since there is now a PyPI package called harpy-analysis, the conda file can look like this:
channels:
  - conda-forge
dependencies:
  - python=3.10.8
  - opencv=4.10.0
  - rasterio=1.4.1
  - pip=24.3.1
  - pip:
      - harpy-analysis[testing,clustering]==0.0.2
  2. I'd rather not maintain locked conda files as well, since the accelerator packages in them make them very platform-specific and we're still constantly updating or switching dependencies. But for a fixed platform like VIB Compute, that's interesting, so please open a PR with an updated environment_vib_compute.yml that works well with your docs and VIB Compute.

  3. I think that may be specific to your documentation site and how you render it. That magic command is no longer needed in recent versions of JupyterLab. It's more important to show only the code a user needs when running it themselves, so be sure to check whether that's the case. If rendering is the only issue, maybe there is an extra parameter you need to set there, or you could use an environment variable, e.g. MPLBACKEND=inline.

@berombau
Member

I added support for frozen dependencies in #87, but it requires using uv or pixi instead of pip. So something like uv sync --frozen --extra clustering (Python only) would install exact versions. You can manage the Python installation with uv venv, so you don't need conda. The conda file above is still valid, though, and works for more users. But if you really want to pin everything exactly, I would work with uv and its --frozen parameter.
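
Whichever route is used to pin versions, a quick check that the active environment actually matches the pins can catch drift early. The following is a minimal sketch using only the standard library; the pins are copied from the pip section of the environment file earlier in this thread, and the helper names (pin_mismatches, _installed) are hypothetical, not part of Harpy, uv, or conda.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip section of the environment file in this thread.
PINS = {
    "basicpy": "1.0.0",
    "torch": "2.4.1",
    "cellpose": "3.0.11",
    "jax": "0.4.6",
    "jaxlib": "0.4.6",
}


def _installed(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def pin_mismatches(pins, lookup=_installed):
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in pin_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {found}")
```

The lookup parameter exists only so the comparison can be exercised without installing anything; in normal use the default reads the metadata of the active environment.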

@ArneDefauw
Collaborator

I've updated the documentation at https://github.com/saeyslab/harpy/blob/main/docs/tutorials/hpc/vib_compute.md. The only change necessary was replacing

pip install git+ssh://git@github.com/saeyslab/harpy.git

to

pip install 'harpy-analysis[clustering] @ git+https://github.com/saeyslab/harpy.git'

and then I was able to run the notebook on compute.

Jax is an optional dependency, and not necessary for the example notebook.

I would prefer not to include %matplotlib inline in the notebook. Maybe it can temporarily be added to the Data Core documentation, because it is probably related to the JupyterLab version, as Benjamin mentioned.

I will close this issue; feel free to reopen it if there are still problems.
