Extract the alignment to a file? #11
Comments
Hi Sam, The amount of identity can be calculated by summing the lengths of all aligned sequence segments (each multiplied by the number of genomes in which that segment was observed; see the ORI tag of each segment) and dividing that sum by the total amount of sequence that was input into the alignment. Note that the parameters -m, -e, -n, -c etc. have a large impact on the number of segments that are aligned, and thus on the identity calculation. Good luck, |
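The calculation described above could be sketched roughly as follows. This is only an illustration, not code from reveal: the function names, the `ori_sep` separator and the `min_genomes` cutoff are hypothetical, and it assumes that each GFA `S` line carries its sequence inline plus an `ORI:Z:` tag listing the genomes in which the segment was observed. It also assumes that "aligned" means a segment shared by at least two genomes.

```python
def parse_segments(gfa_text, ori_sep=";"):
    """Yield (segment_length, genomes) for every S line of a GFA string.

    Assumption: the ORI:Z: tag holds genome names joined by `ori_sep`.
    Check the separator against your own GFA file before relying on this.
    """
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip headers, links, paths and malformed lines
        sequence = fields[2]
        genomes = []
        for tag in fields[3:]:
            if tag.startswith("ORI:Z:"):
                value = tag[len("ORI:Z:"):]
                genomes = [g for g in value.split(ori_sep) if g]
        yield len(sequence), genomes


def identity(gfa_text, total_input_length, ori_sep=";", min_genomes=2):
    """Sum length * number of genomes over aligned segments, then divide
    by the total amount of sequence that went into the alignment.

    Assumption: a segment counts as aligned when at least `min_genomes`
    genomes share it.
    """
    aligned = sum(
        length * len(genomes)
        for length, genomes in parse_segments(gfa_text, ori_sep)
        if len(genomes) >= min_genomes
    )
    return aligned / total_input_length


# Toy graph: three made-up genomes a, b and c with 6 + 7 + 7 = 20 input bases.
toy_gfa = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT\tORI:Z:a;b;c",  # shared by all three genomes
    "S\t2\tGG\tORI:Z:a",        # private to genome a, so not aligned
    "S\t3\tTTT\tORI:Z:b;c",     # shared by two genomes
])
toy_identity = identity(toy_gfa, total_input_length=20)
print(toy_identity)
```

The total input length is passed in separately because it comes from the original fasta files rather than from the graph.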
Hi Jasper, Thank you for your fast answer and your advice. I will let you know what I get. Regards, |
Hi Jasper, It was indeed pretty trivial to convert the gfa file into xmfa. On another note, I ran several alignments on the same dataset of 3 collinear genomes to test the different options of the tool, and I noticed that the standard output remains the same, although counting the MEMs (depending on the option chosen and the default minimum length) would indicate that the final alignments are different, as expected. Note that the standard output does change depending on the dataset. Regards, |
Hi Sam, Cheers, |
Hi @samaln, could you share the approach you used to generate an XMFA from a reveal GFA? Thanks, |
Hi @jasperlinthorst and @fbemm, I think it shouldn't be too difficult to add other constraints such as -n, -g ... since you can play with the values given by ORI:Z: . But I have no idea how you could apply constraints from options such as -c or -e that @jasperlinthorst mentioned above, which would undoubtedly be very interesting. Please let me know if either of you has any remarks or ideas about that approach. @jasperlinthorst, 1/ -s option : 2/ -x option 3/ -g option 4/ -c option The standard output I get each time : As I understand the way reveal works, there should be at least slight differences in the number of bases aligned. Sorry to bother you, but I would like to use those figures since I cannot properly convert the file. Best Regards, |
Hi Sam, -c: This parameter determines whether a match in an alignment is incorporated into the graph. For instance, suppose you input two fasta files, there's a match of 1000bp between the two genomes, and the indel penalty (given some penalty scheme) for this match adds up to 200. Then I calculate the score for this match as (1000*3)-(200*1)=2800, where the 3 and the 1 come from the --ws and --wp parameters. Now if you set -c to a value higher than 2800, this match will not be incorporated into the graph. By default, this value is not considered, but setting it to a high number should change the output. -g, -x and -s: These parameters should apply when you align a sequence against a graph, where in the graph only nodes that satisfy these parameters are considered for alignment (meaning they can be mutated by the alignment). So if, in your case, you are aligning only sequences, these parameters are not considered and therefore won't change the output, which should explain your findings. I recently added the 'stats' subcommand to calculate some statistics on the alignment graph, so maybe that could be a starting point for your identity calculation. Also, I'd suggest making some visualisations with 'reveal plot' or 'reveal gplot' to get an idea of how much of your genomes is actually aligned in the graphs that you generate and whether there are any major structural events going on. If so, these can be addressed with the 'finish --order=chains' subcommand, but this is still quite preliminary for now. Let me know, |
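The -c example above can be worked through in a few lines. This is a sketch of the arithmetic only, not reveal's implementation: the function names are hypothetical, the weights are passed explicitly because the thread gives 3 and 1 as example values without stating whether they are defaults, and treating a score exactly equal to the threshold as passing is an assumption.

```python
def match_score(match_length, indel_penalty, ws, wp):
    """Score of a match: length weighted by ws minus penalty weighted by wp."""
    return match_length * ws - indel_penalty * wp


def is_incorporated(score, min_score=None):
    """Decide whether a match would enter the graph.

    min_score=None stands for "threshold not considered", as described
    for the default behaviour. A score equal to the threshold is assumed
    to pass; the thread only says that a higher threshold rejects it.
    """
    if min_score is None:
        return True
    return score >= min_score


# The 1000bp match with an indel penalty of 200, using weights 3 and 1.
example_score = match_score(1000, 200, ws=3, wp=1)
for threshold in (None, 2000, 3000):
    print(threshold, is_incorporated(example_score, threshold))
```

With these numbers the match is kept when no threshold is set or when the threshold is 2000, and dropped when the threshold is 3000.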
Hi Jasper,
I'm interested in aligning bacterial genomes (~5 Mbp, so ~5.2 MB each), but in order to compare different tools, I wonder if there is a way to actually extract the alignment from the graph in a text/fasta... format?
Otherwise, how did you manage to count the number of inversions or the percentage of identity (as you did for your pre-print)?
I look forward to your answer.
Regards,
Sam