Error: "An output file is marked as pipe, but consuming jobs are part of conflicting groups." #38
I created a new conda environment with the recommended Snakemake version, and now the error is no longer there. It seems that Snakemake 6.6.0 is not backwards compatible with Snakemake 5.18.0? Thanks!
Hi @Rubbert, thanks for reporting this issue and for all of the helpful information. Yes, I think there might be a few problems with the pipeline that prevent it from being used with more recent versions of Snakemake, but support for v6.0 is definitely on the list!

I'd also like to improve the test dataset; I want it to be smaller and faster to test VarCA with. And yet another issue that I've been having is trying to figure out the best way to distribute the test dataset. Currently, I distribute it as an asset as part of every release. But this requires that I reupload the file after every release of VarCA, and I can sometimes forget to upload it, as you've noticed with release v0.3.1 (which I have since fixed - thanks for letting me know!). The solution I'm currently leaning towards is hosting it on Google Drive, since our institution has storage there for free, but if I do that, I don't think users will be able to download it from the command line.

Anyway, here is where all of those things fall on my current list of priorities, in case you're interested. I'm tracking support for Snakemake v6.0 in #19.
Hi @aryarm, I was able to run part of the pipeline on our cluster. However, it looks like Snakemake 5.18.0 does not play well with the Slurm cluster profiles (the --profile option) for all jobs. It basically will not run "normalize_vcf", and throws different errors for two different profile definitions/YAMLs that worked with Snakemake 6.x.x. Do you happen to know what causes the error I mentioned above about the "conflicting groups", and is there an easy hack I can implement to see if it will run with Snakemake 6?

Update: I tested the pipeline with different Snakemake versions. I still can't get it to run on our cluster, but Snakemake 5.27.4 shows the "An output file is marked as pipe, but consuming jobs are part of conflicting groups." error, and version 5.26.1 is still OK. So it does not appear to be a Snakemake 6.x.x problem, but rather a change between 5.26.1 and 5.27.4. Thanks!

Cheers, Rob
Hi @Rubbert, apologies for all of the trouble that this has been causing you! I'm not entirely sure why Snakemake behaves differently between those versions. You might want to try converting the pipe() outputs to temp() outputs, as well. I think it substantially simplifies the DAG resolution because it allows Snakemake to group fewer jobs together. The downside is that it will probably make execution of the pipeline a bit slower because it increases file IO.

I'm hoping to get the upcoming release out soon. It's just proving to be a bit large - there will be a lot of updates!
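For context, the difference between the two approaches can be sketched in a minimal Snakefile fragment (rule names, file paths, and tool names here are hypothetical, not VarCA's actual rules):

```
# pipe() streams one rule's output directly into the consumer, so
# Snakemake must schedule both jobs together as a group; temp() writes
# a real intermediate file that is deleted once all consumers finish,
# letting Snakemake schedule the jobs independently.

rule call_variants:
    input: "mapped/{sample}.bam"
    output: pipe("calls/{sample}.vcf")        # streamed; never hits disk
    shell: "call_tool {input} > {output}"

rule normalize_vcf:
    input: "calls/{sample}.vcf"
    output: temp("normalized/{sample}.vcf")   # concrete file, auto-deleted
    shell: "norm_tool {input} > {output}"
```

Converting the first rule's output from pipe() to temp() trades the scheduling constraint for extra disk IO, which is the tradeoff discussed below.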
Hi @Rubbert, I haven't forgotten about this! I'm just making a note here for myself later: snakemake/snakemake#975 seems to indicate that piped output no longer works for some versions of 6.x 😞 If that's the case, then I'll probably just convert all of the pipe() outputs to temp() outputs.
The original problem I posted is still present in version 6.8. When you say convert pipe() to temp(), do you mean create concrete files instead of streaming between steps? That won't work for us, since using pipe() is an important optimization due to the size of our data files.
Oh, boy. I'm sorry to hear that :(
Yes, that was what I was proposing. If it doesn't work for you, then I'm fresh out of ideas. What sort of issues did you have when you tried converting the pipe() outputs to temp()?
@Rubbert, once you have a chance to try the solution I recommended, can you let me know here? It will save me some time when I go to update VarCA to the newest version of Snakemake.
I'm only working in development mode now, so the data files are small. I haven't actually tried it with our actual data files, which can be hundreds of GB in size. I did some performance testing last year and found a 3x performance improvement when using piped output instead of writing intermediate files. It's not really a question of disk space.
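As a toy illustration of the two execution strategies under discussion, with `seq` and `grep` standing in for the real pipeline tools (the actual commands and file names are placeholders):

```shell
# Streamed version: data flows through a pipe; nothing hits the disk.
seq 1 100000 | grep '7$' > streamed.txt

# Intermediate-file version: the full dataset is written to disk and
# then read back, roughly doubling the IO for this step.
seq 1 100000 > intermediate.txt
grep '7$' intermediate.txt > from_file.txt

# Both strategies produce identical results; only the IO pattern differs.
cmp streamed.txt from_file.txt
```

With files in the hundreds of GB, that extra write-then-read round trip is where the reported slowdown comes from.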
hmm... yeah, I can imagine that we'll see similar performance differences if we convert the pipe() outputs to temp(). It would be really nice if someone could resolve the Snakemake issue.
You might want to briefly explore zipping, anyway? Even small amounts of compression can significantly reduce file IO, and my understanding is that gzipping can be relatively fast.
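For what it's worth, a compressed intermediate might look like the following sketch (placeholder commands again; `gzip -1` is the fastest, lightest compression level):

```shell
# Compress the intermediate to trade some CPU time for less disk IO.
seq 1 100000 | gzip -1 > intermediate.txt.gz

# gzip -dc streams the decompressed data straight into the consumer,
# so no uncompressed temporary file is ever created.
gzip -dc intermediate.txt.gz | grep '7$' > filtered.txt
```

Whether this helps depends on how compressible the data is and how many pipeline steps would each pay the compress/decompress cost, as noted in the reply below.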
Based on experience, I doubt that would help. There are many steps in the pipeline, and hundreds of GB of data would be redundantly read and written (and now also compressed) to and from disk. Compression is not going to solve this problem.
Hi @aryarm ,
I am running your pipeline starting with called peaks and a .BAM file. However, when doing a dry-run, I get the following error:
My sample.tsv file looks like:
In my config.yaml I define:
I cloned your repository on Jul 26 13:11.
I am running Snakemake 6.6.0.
I run the following command to do a dry-run:

```
snakemake --config out="$out_path" -p -n
```
I might have misunderstood something in the way different folders need to be defined, so apologies in advance if that is the case.
Thanks!
PS. Could it be that the example data is no longer available?