This repository has been archived by the owner on Jul 17, 2023. It is now read-only.

Merge branch 'develop' into feature-MANTA-1398
x-chen committed Nov 7, 2018
2 parents 296ad36 + 8df4f99 commit 261e6a6
Showing 57 changed files with 991 additions and 556 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -1,7 +1,7 @@

#
# Using sudo-false/container-based tests for greater (linux) test responsiveness. This doesn't seem
# to effect the queing time for OSX tests.
# to effect the queueing time for OSX tests.
#

dist: trusty
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,27 @@
- Add a configurable option to allow overlapping pairs to be used as evidence (MANTA-1398)
- The option is available in the configure file configureManta.py.ini

### Changed
- Change SV candidate contig aligners to improve precision (MANTA-1396)
- Change contig aligners such that variant occurrences are more heavily penalized.
- Fix multi-junction nomination (MANTA-1430)
- Complex events with more than two junctions are no longer nominated as a group
- Fix the problem of duplicate detection of the same SV candidate
- Add index to ensure uniqueness of evidence bam filenames (MANTA-1431)
- It solves the potential problem of name conflicts for evidence bams if the input bam files have the same name while located in different directories.
- Change filters for easy interpretation of multi-sample germline variant vcf (MANTA-1343)
- Add record-level filter 'SampleFT' when no sample passes all sample level filters
- Add sample-level filter 'HomRef' for homozygous reference calls
- No more sample-level filter will be applied at the record level even if it applies to all samples
- Change representation of inversions in the VCF output (MANTA-1385)
- Intrachromosomal translocations with inverted breakpoints are now reported as two breakend (BND) records.
- Previously they were reported in the VCF using the inversion (INV) allele type.

### Fixed
- Fix the bug of stats generation with short reference sequences (MANTA-1459/[#143])
- Fix the evidence significance test in the multi-sample calling mode (MANTA-1294)
- This issue previously caused spurious false negatives during the multi-sample calling mode. The incidence rate of the problem tended to increase with sample count.

## v1.4.0 - 2018-04-25

This is a major bugfix update from v1.3.2, featuring improved precision and vcf representation, in addition to minor user friendly improvements.
4 changes: 2 additions & 2 deletions README.md
@@ -31,8 +31,8 @@ indels for germline and cancer sequencing applications. *Bioinformatics*,

...and the corresponding [open-access pre-print][preprint].

[bpaper]:https://dx.doi.org/10.1093/bioinformatics/btv710
[preprint]:http://dx.doi.org/10.1101/024232
[bpaper]:https://doi.org/10.1093/bioinformatics/btv710
[preprint]:https://doi.org/10.1101/024232


License
4 changes: 2 additions & 2 deletions docs/developerGuide/README.md
@@ -24,7 +24,7 @@ Manta Developer Guide
* [Commit messages](#commit-messages)
* [Commit consolidation](#commit-consolidation)
* [Changelog conventions](#changelog-conventions)
* [Branching and release tagging guidelines](#branching-and-release-tagging-guidelines)
* [Branching and release tagging guidelines](#branching-and-release-tagging-guidelines)
* [Error handling](#error-handling)
* [General Policies](#general-policies)
* [Exception Details](#exception-details)
@@ -240,7 +240,7 @@ prior to merging the branch.
longer, for instance by starting all major bullet points with an imperative verb.


## Branching and release tagging guidelines
### Branching and release tagging guidelines

All features and bugfixes are developed on separate branches. Branch names should contain the corresponding JIRA ticket
id or contain the key "github${issueNumber}" to refer to the corresponding issue on github.com. After code
10 changes: 5 additions & 5 deletions docs/methods/primary/methods.tex
@@ -350,7 +350,7 @@ \subsubsection{Contig assembly}

Finally, a greedy procedure is applied to select the constructed contigs in order of the number of effective supporting reads and contig length. An effective supporting read cannot be a pseudo read, nor can it support any contig that has been selected previously. The selection process is repeated until no remaining contig has the minimum number of effective supporting reads (defaults to 2), or the maximum number of assembled contigs (defaults to 10) is reached.
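The greedy selection described above can be sketched in Python as follows. This is a minimal illustration, not Manta's implementation: the function name `select_contigs`, the dictionary layout, and the integer read ids are hypothetical, and the sketch assumes that contigs are ranked first by effective supporting read count and then by contig length as a tie-breaker.

```python
from typing import Dict, List, Set, Tuple


def select_contigs(
    contigs: Dict[str, Tuple[str, Set[int]]],
    pseudo_reads: Set[int],
    min_support: int = 2,
    max_contigs: int = 10,
) -> List[str]:
    """Greedily pick contigs by effective supporting reads, then by length.

    `contigs` maps a (hypothetical) contig name to its sequence and the ids
    of the reads supporting it. Pseudo reads never count as effective, and a
    read stops counting once it has supported a selected contig.
    """
    used: Set[int] = set(pseudo_reads)  # reads that can no longer be effective
    remaining = dict(contigs)
    selected: List[str] = []

    while remaining and len(selected) < max_contigs:
        # Rank the remaining contigs by (effective read count, contig length).
        best = max(
            remaining,
            key=lambda name: (len(remaining[name][1] - used), len(remaining[name][0])),
        )
        _sequence, reads = remaining.pop(best)
        effective = reads - used
        if len(effective) < min_support:
            break  # the best remaining contig lacks support, so stop
        selected.append(best)
        used |= effective

    return selected
```

Recomputing the effective read count on every round is what makes the procedure greedy: a contig that shares most of its reads with an already selected contig drops in rank or falls below the minimum support.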

\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}
\subsubsection{Contig alignment for large SVs} For large SV candidates spanning two distinct regions of the genome, the reference sequences are extracted from the two expected breakend regions, and the order and/or orientation of the references is adjusted such that if the candidate SV exists, the left-most segment of the SV contig should align to the first transformed reference region and the right-most contig segment should align to the second reference region. The contig is aligned across the two reference regions using a variant of Smith-Waterman-Gotoh alignment (\cite{smith1981,gotoh1982}) where a `jump' state is included which can only be entered from the match state for the first reference segment and only exits to the match or insert states of the second reference segment. The state transitions of this alignment scheme are shown in Figure \ref{fig:jumpstate}.

\begin{figure}[!tpb]
\centerline{
@@ -362,15 +362,15 @@ \subsubsection{Contig alignment for large SVs} For large SV candidates spanning
\label{fig:jumpstate}
\end{figure}

The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -24 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
The alignment scores used for each reference segment are (2,-8,-12,-1) for match, mismatch, gap open and gap extend. Switching between insertion and deletion states is allowed at no cost. Scores to transition into and extend the 'jump' state are -100 and 0, respectively. The jump state is entered from any point in reference segment 1 and exits to any point in reference segment 2. The alignments resulting from this method are only used when a transition through the jump state occurs. In addition, each of the two alignment segments flanking the jump state are required to extend at least 30 bases with an alignment score no less than 75\% of the perfect match score for the flanking alignment segment. If more than one contig meets all quality criteria, the contig with the highest alignment score is selected. When a contig and alignment meet all quality criteria, the reference orientation and ordering transformations applied before alignment are reversed to express the refined basepair-resolution structural variant candidate in standard reference genome coordinates.
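The flanking quality criteria in this paragraph can be sketched in Python as below. This is an illustration under stated assumptions rather than Manta's code: the names are hypothetical, the flank length is taken to be the number of aligned contig bases, and the perfect match score is taken to be the match score (2) multiplied by that length.

```python
MATCH_SCORE = 2                  # match score quoted in the text
MIN_FLANK_LENGTH = 30            # minimum flank extension, in bases
MIN_FLANK_SCORE_FRACTION = 0.75  # minimum fraction of the perfect match score


def flank_passes(flank_length: int, flank_score: int) -> bool:
    """Check one flanking alignment segment against the length and score criteria."""
    if flank_length < MIN_FLANK_LENGTH:
        return False
    # Assumption: a perfect flank scores one match per aligned base.
    perfect_score = MATCH_SCORE * flank_length
    return flank_score >= MIN_FLANK_SCORE_FRACTION * perfect_score


def jump_alignment_passes(left_flank: tuple, right_flank: tuple) -> bool:
    """Both flanks of the jump state must pass; each flank is (length, score)."""
    return flank_passes(*left_flank) and flank_passes(*right_flank)
```

For example, under these assumptions a 40-base flank with a single mismatch scores 39 * 2 - 8 = 70, which clears the threshold of 0.75 * 80 = 60.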


\subsubsection{Contig alignment for complex region candidates}
Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the alignment procedure for complex region contigs, which are checked against two aligners optimized for large and small indels respectively.
Complex regions are segments of the genome targeted for assembly without a specific variant hypothesis. For this reason the problem of aligning contigs for these regions is somewhat more difficult than for specific large SV candidates, because a wide range of variant sizes are possible. This is reflected in the indel aligner that handles both small and large indels.

A contig is first aligned with the large indel aligner and only checked for small indels if no large indels are found. The structure of the large indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -18, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -24 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states. Variants are only reported from the large indel aligner if an insertion of at least 80 bases or a deletion of at least 200 bases is found. The flanking alignment quality criteria described above for large SVs is also applied to filter out noisy alignments. To reduce false positive calls in repetitive regions an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region.
The indel aligner is a variant on a standard affine-gap scheme, in which a second pair of delete and insert states are added for large indels. Alignment scores for standard alignment states are (2, -8, -24, -1) for match, mismatch, gap open, and gap extend. Open and extend scores for 'large' gaps are -100 and 0. Transitions are allowed between standard insertions and deletions but disallowed between the large indel states.

If the large indel aligner fails to identify a candidate meeting the size and quality criteria above, the contig is used to search for smaller indels, this time using a conventional affine gap aligner with parameters: (2,-8,-12,0) for match, mismatch, gap open, gap extend. All indels larger than the minimum indel size are identified. For each indel, the flanking contig alignment quality and uniqueness checks described above are applied to filter likely false positives, and any remaining cases become small indel candidates.
All indels larger than the minimum indel size are identified by the indel aligner. For each indel, the flanking alignment quality criteria described above for large SVs are also applied to filter out noisy alignments. To further reduce false positive calls in repetitive regions, an additional filter is applied to complex region candidates: the left and right segments of the contig flanking a candidate indel are checked for uniqueness in the local reference context. Contig alignments are filtered out if either of the two flanking contig segments can be aligned equally well to multiple locations within 500bp of the target reference region. Among contigs meeting all quality criteria, those with 'large' gaps are prioritized during contig selection. If more than one contig has 'large' gaps, or if no contig has a 'large' gap, the contig with the highest alignment score is selected.
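The contig prioritization rule at the end of this paragraph can be sketched in Python as follows. The `ContigAlignment` record and the `select_contig` function are hypothetical names introduced for illustration, and the candidates are assumed to have already passed the flanking quality and uniqueness filters.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContigAlignment:
    """Hypothetical summary of one contig alignment that passed all filters."""
    name: str
    score: int           # total alignment score
    has_large_gap: bool  # True if the alignment uses a 'large' indel state


def select_contig(candidates: Sequence[ContigAlignment]) -> Optional[ContigAlignment]:
    """Prefer contigs with 'large' gaps, then the highest alignment score."""
    if not candidates:
        return None
    # Tuples compare element by element and True sorts above False, so the
    # large-gap flag dominates and the score breaks ties within each group.
    return max(candidates, key=lambda c: (c.has_large_gap, c.score))
```

Sorting on a `(has_large_gap, score)` key covers both cases in the text with a single comparison: when several contigs have 'large' gaps, and when none do, the highest-scoring contig wins.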

\subsubsection{Large Insertions}
