What is the correct workflow for creating tracks DBs? #35
Comments
No, this file does not need to be saved. It is only needed temporarily in memory. Writing this file takes quite a bit of time (if you have a lot of regions). The second step would only be needed if you ran the first step in multiple steps (when using |
@eboileau What was the full command that you ran? I suspect that your input feather file you used was not in the proper format.
|
Thanks for your reply. I ran the first step
Using the output of step 1, I then ran step 2
which resulted in the above error. The files are here: https://data.dieterichlab.org/s/8Npf6r5bGdfaASM |
Thanks for providing the test files. In case you would need to run 1 |
Hi @ghuls thanks for taking time to fix this. I'd appreciate your comments on some more general questions:
Ok, so now I have my *regions_vs_tracks.rankings.feather to run pySCENIC. What about the difference between "regions" and "genes", more in terms of overall performance: do you have any experience?
Do I have to generate this from scratch? Are you aware of any resource for mouse?
? |
pySCENIC will only work with gene-based rankings databases, as it also uses the expression data. Region-based databases are used in SCENIC+, where you have access to ATAC data. In general, region databases are "better" than gene databases because they have a higher resolution (there are far more regions than genes). The disadvantage of region-based databases is that you need to associate each region with a downstream affected gene. In the gene-based databases we assume that e.g. a 10 kb area around the TSS is correlated with the expression of the gene, but this misses distal enhancers.
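To make the gene-based assumption concrete, here is a minimal sketch of assigning a region to genes by distance to the TSS. It is not how the cisTarget databases are built; the function name, the coordinates, and the reading of "10 kb area" as a 10 kb distance cutoff are assumptions for illustration only.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def genes_near_region(region, tss_by_gene, window=10_000):
    """Return genes whose TSS lies within `window` bp of the region."""
    hits = []
    for gene, (chrom, tss) in tss_by_gene.items():
        if chrom != region.chrom:
            continue
        # Distance is 0 when the TSS falls inside the region itself.
        distance = max(region.start - tss, tss - (region.end - 1), 0)
        if distance <= window:
            hits.append(gene)
    return sorted(hits)


# Hypothetical coordinates, for illustration only.
tss_by_gene = {"GeneA": ("chr1", 50_000), "GeneB": ("chr1", 500_000)}
promoter_peak = Region("chr1", 48_000, 48_500)
distal_enhancer = Region("chr1", 300_000, 300_500)

print(genes_near_region(promoter_peak, tss_by_gene))
print(genes_near_region(distal_enhancer, tss_by_gene))
```

The promoter-proximal peak is assigned to GeneA, while the distal enhancer is assigned to no gene at all, which is the limitation of the fixed-window assumption described above.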
https://resources.aertslab.org/cistarget/track2tf/ You will need a file with the following header: The format was originally only meant for motifs and not tracks, so the header is a bit weird:
BTW, our SCENIC+ public motif collection is now available: https://resources.aertslab.org/cistarget/motif_collections/ motif2TF snapshots for that collection can be found at: https://resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots/ To create annotations for other species (if you want to make one for a species other than human, mouse or fly), try to find the ortholog in your species for the gene named in the gene_name column (column 6) and replace it with the gene name from your species. If a gene has no ortholog, delete the line. |
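The ortholog replacement described above could be sketched roughly as follows. This assumes a tab-separated annotation file whose header line starts with `#`; the function name, the placeholder column names other than gene_name, the example rows, and the ortholog table are all hypothetical.

```python
def translate_annotations(lines, orthologs, gene_col=5):
    """Replace the gene name in column 6 (index 5) with its ortholog.

    Rows whose gene has no ortholog are dropped; header lines are kept.
    """
    out = []
    for line in lines:
        if line.startswith("#"):  # assumed header marker
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ortholog = orthologs.get(fields[gene_col])
        if ortholog is None:
            continue  # no ortholog for this gene: delete the line
        fields[gene_col] = ortholog
        out.append("\t".join(fields) + "\n")
    return out


# Hypothetical rows and ortholog table, for illustration only.
rows = [
    "#c1\tc2\tc3\tc4\tc5\tgene_name\tc7\n",
    "track1\tt1\tdesc\tsrc\t1.0\tGATA1\tx\n",
    "track2\tt2\tdesc\tsrc\t1.0\tNOORTHO\tx\n",
]
orthologs = {"GATA1": "gata1a"}
result = translate_annotations(rows, orthologs)
print("".join(result), end="")
```

In practice the ortholog table would come from whatever orthology resource you trust for your species; a one-to-many ortholog mapping would need an extra decision (duplicate the line or pick one gene) that this sketch does not make.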
Thanks @ghuls this is really useful. Let me take a few days to digest all this and go back to my project, then I'll mark this issue as resolved. |
First off, thanks for putting this together for the community.
I am trying to create a cisTarget track DB for mouse to use with pySCENIC, using a number of ChIP-seq (bigWig) files.
I am aware of the multiple (still open) issues on related topics, and a tutorial e.g. PBMC10k_SCENIC-protocol-CLI-tracks, etc., but I must say, despite all that and the instructions given here, I am still a bit unsure about what is a correct workflow.
First step, score-all-tracks-at-once-and-create-rankings. Here, I am using --tracks to pass a list of bigWig files, and for --bed I am using the files that you provide under regions, e.g. mm10-limited-upstream500-tss-downstream100-full-transcript.bed. This step outputs the files described in the instructions, except *.tracks_vs_regions.rankings.feather. I have 1 question for this step in particular:
create_cisTarget_databases/create_cistarget_track_databases.py
Line 414 in fef07ae
Second step, I think from here the instructions are a bit unclear for track-based annotations... According to my understanding, I need to run convert_motifs_or_tracks_vs_regions_or_genes_scores_to_rankings_cistarget_dbs.py using the output *.tracks_vs_regions.scores.feather from step 1. But it looks like it's not even reading in the file (same whether we use regions or genes with --genes):
If Step 1 (and Step 2) above complete successfully, this would give me a "ranking database" to run pySCENIC. Here I have a few questions:
For the "ranking database", which feather file should I actually use (regions_vs_tracks.rankings.feather? tracks_vs_tracks.rankings.feather? I don't know what the output of Step 2 would be)? And do you recommend using "regions" or "genes" (e.g. regions_vs_tracks.rankings vs. genes_vs_tracks.rankings)?
Most importantly, how do I generate a matching "motif database" to use with --annotations_fname (files like those under track2tf)?
Finally, a more general question: How does this compare with the region-based databases that you provide? Are these only usable with pycisTarget/SCENIC+?
Thanks for taking time to clarify, I'm sure this will help others as well.
I am running under:
SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux
and that's my environment: