
Cactus Fails with “Not Enough Memory” on 12-species Alignment (SLURM, Toil) #1611

cs1890 opened this issue Feb 12, 2025 · 2 comments

Comments


cs1890 commented Feb 12, 2025

I successfully ran Progressive Cactus v2.9.3 on a SLURM-based HPC cluster for an alignment of 5 species, using the following commands inside a tmux session:

module load python/3.8.2
source /scratch/cs1890/cactus-bin-v2.9.3/venv-cactus-v2.9.3/bin/activate
cd /scratch/cs1890/cactus_data/output

TOIL_SLURM_ARGS="partition=mem --time=3-00:00:00"
cactus jobstore /scratch/cs1890/cactus_data/input/sequenceFile.txt \
    /scratch/cs1890/cactus_data/output/output.hal \
    --batchSystem slurm \
    --batchLogsDir /scratch/cs1890/cactus_logs \
    --coordinationDir /scratch/cs1890/tmp \
    --workDir /scratch/cs1890/tmp \
    --consCores 64 \
    --maxMemory 1.4Ti \
    --doubleMem true \
    --maxJobs 100

However, after increasing the alignment to 12 species, the job consistently fails with memory errors:

Not enough memory! User limited to 153931627886 bytes but we only have 135285168640 bytes.
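
For context, those two byte counts work out to roughly 143 GiB ("User limited to") versus roughly 126 GiB ("we only have"). A quick shell conversion, independent of Cactus itself:

# Floor-divide the two byte counts from the error message into GiB
echo "limit:     $((153931627886 / 1024**3)) GiB"   # -> 143
echo "available: $((135285168640 / 1024**3)) GiB"   # -> 125 (just under 126 GiB)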

This has been posted before, but I have already tried everything that was recommended, and all of my genomes have been repeat-masked.

Other error messages I received:

slurmstepd: error: Detected 1 oom_kill event in StepId=41103254.batch. Some of the step tasks have been OOM Killed. Exit reason: MEM LIMIT

Due to failure we are reducing the remaining try count of job 'CutHeadersJob' kind-CutHeadersJob/instance-bnhuhvlgv1 with ID kind-CutHeadersJob/instance-bnhuhvlg to 1

(I attached my log so you can see the exact messages)
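
To see how much memory the OOM-killed step actually used versus what was requested, SLURM accounting (if it is enabled on this cluster) can report it; the job ID below is the one from the slurmstepd message above:

sacct -j 41103254 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State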

I have tried adjusting the flags and parameters, but nothing has worked. I also tried breaking the run into separate steps, which failed as well. I contacted the cluster support staff, and they were unable to come up with a solution.

The scratch space where my working directory and coordination directory are located is 2 terabytes.

cactus_run.log

glennhickey (Collaborator) commented

Try lowering --maxMemory from 1.4Ti.
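
A minimal sketch of what that might look like, with a purely illustrative lower cap (the right value depends on how much RAM the cluster's nodes actually provide; check with sinfo or the cluster admins):

# Same invocation as before, but with --maxMemory set below a single node's physical RAM.
# 120Gi is an illustrative placeholder, not a recommendation for this cluster.
# (If the old jobstore directory still exists, remove it or resume with Toil's --restart option.)
cactus jobstore /scratch/cs1890/cactus_data/input/sequenceFile.txt \
    /scratch/cs1890/cactus_data/output/output.hal \
    --batchSystem slurm \
    --batchLogsDir /scratch/cs1890/cactus_logs \
    --coordinationDir /scratch/cs1890/tmp \
    --workDir /scratch/cs1890/tmp \
    --consCores 64 \
    --maxMemory 120Gi \
    --doubleMem true \
    --maxJobs 100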

cs1890 (Author) commented Feb 12, 2025 via email
