I have a GBZ file from Minigraph-Cactus and I want to subset the graph by extracting a single chromosome or a region of a chromosome. Is it possible to do this directly with the GBZ as the input graph? I looked into `vg chunk` and `vg find` but didn't see any flags that explicitly support extracting a chromosome or a chromosome region. The best I could do was run the following command: `vg chunk -x <input>.gbz -S <input>.snarl -p GRCh38#0#chr20:2000000-3000000 -O gfa > subgraph.gfa`. I assume this extracts the snarls on chr20 that fall within the given base-pair range.
Any pointers to the proper documentation or tutorials would be greatly appreciated. I looked through the wiki but didn't find what I needed, so I apologize if I missed something obvious.
Thank you.
Your command looks right to me. Is it giving you an unexpected output?
If you want the path and all the nested variants too, then I think your command is right. Using the snarls can be a bit slow though.
If you're only interested in the path and the stuff close to it, you can use `--context-steps` or `--context-length` to walk out from the nodes along the path.
If you want the whole chromosome, you can use the `--components` flag, which gives you the whole connected component.
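As a rough sketch of the two alternatives above (not tested against any particular vg version), the commands might look like the following. The flag names come from this thread; the file name `input.gbz`, the values 10 and 500, and the `run` dry-run helper are placeholders made up for illustration, so check `vg chunk --help` before relying on them.

```shell
# Placeholder inputs; substitute your own graph and region.
graph="input.gbz"
region="GRCh38#0#chr20:2000000-3000000"

# Hypothetical dry-run helper (not part of vg): prints the command,
# or executes it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Region plus whatever is reachable within 10 node steps of the path
# (10 is an arbitrary illustrative value).
run vg chunk -x "$graph" -p "$region" --context-steps 10 -O gfa

# Same idea, but measuring the context in base pairs
# (500 is an arbitrary illustrative value).
run vg chunk -x "$graph" -p "$region" --context-length 500 -O gfa

# Whole connected component that contains the chr20 reference path.
run vg chunk -x "$graph" --components -p "GRCh38#0#chr20" -O gfa
```

By default this only prints the three command lines, which makes it easy to review them before running anything on a large graph. The original command in this thread redirects the output to a file such as subgraph.gfa.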
It was giving the expected output; however, I just wanted to get a region without the snarls. When I tried to run the command with only the `-p` flag, it required me to also pass either `-S` or `-c`.
Thank you for the clarification. I think that answers my question. Just to confirm: to extract a single chromosome's subgraph from the graph, can I run something like `vg chunk -x <input>.gbz --components -p GRCh38#0#chr20 -O gfa > subgraph.gfa`?
Now, is there a way to filter the reads that are mapped within a certain region? For instance, if I have an alig.gam file obtained from Giraffe, how can I extract only the reads that are mapped between node A and node B of the graph?
And is there a way to chunk the graph so that the haplotype paths are retained in the subgraph?