@@ -77,47 +77,68 @@ as a developer.
77
77
78
78
### Python bindings
79
79
80
- Experimental and incomplete Python bindings can be installed with
80
+ Experimental Python bindings can be installed with
81
81
`pip install .`. The only documentation at the moment is the tests in
82
82
`tests/*.py`.
83
83
84
84
## Usage
85
85
86
+ To align paired-end reads against a reference FASTA and produce a sorted BAM file,
87
+ using eight threads:
86
88
```
87
- strobealign ref.fa reads.fq > output.sam # Single-end reads
88
- strobealign ref.fa reads1.fq reads2.fq > output.sam # Paired-end reads
89
-
90
- strobealign -x ref.fa reads.fq > output.paf # Single-end reads mapping only (PAF)
91
- strobealign -x ref.fa reads1.fq reads2.fq > output.paf # Paired-end reads mapping only (PAF)
89
+ strobealign -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
92
90
```
93
91
94
- To use interleaved files, use the `--interleaved` flag:
95
-
92
+ For single-end reads:
96
93
```
97
- strobealign ref.fa reads.fq --interleaved > output.sam # Single and/or paired-end reads
94
+ strobealign -t 8 ref.fa reads.fastq.gz | samtools sort -o sorted.bam
98
95
```
99
96
100
- To report secondary alignments, set parameter `-N [INT]` for a maximum of `[INT]` secondary alignments.
101
-
102
- The above commands are suitable for interactive use and test runs.
103
- For normal use, avoid creating SAM files on disk as they get very large compared
104
- to their compressed BAM counterparts. Instead, either pipe strobealign’s output
105
- into `samtools view` to create unsorted BAM files:
97
+ For mixed reads (the input file can contain both single and paired-end reads):
106
98
```
107
- strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools view -o mapped.bam
99
+ strobealign -t 8 ref.fa --interleaved reads.fastq.gz | samtools sort -o sorted.bam
108
100
```
109
- Or use `samtools sort` to create a sorted BAM file:
101
+
102
+ In alignment mode, strobealign produces SAM output. By piping the output
103
+ directly into `samtools`, the above commands avoid creating potentially large
104
+ intermediate SAM files and also reduce disk I/O.
105
+
106
+ To produce unsorted BAM, use `samtools view` instead of `samtools sort`.
107
+
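When the alignment step is driven from a script rather than a shell, the same streaming idea can be kept by connecting the two processes with a pipe. The following is a minimal sketch, not part of strobealign: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes `strobealign` and `samtools` are on `PATH`. It uses only the command-line options shown above.

```python
import subprocess


def build_commands(ref, reads, out_bam, threads=8):
    """Return argument lists for `strobealign ... | samtools sort -o out_bam`.

    Hypothetical helper; only options shown in this README are used.
    """
    align = ["strobealign", "-t", str(threads), ref, *reads]
    sort = ["samtools", "sort", "-o", out_bam]
    return align, sort


def run_pipeline(ref, reads, out_bam, threads=8):
    """Stream SAM from strobealign into samtools without an intermediate file."""
    align_cmd, sort_cmd = build_commands(ref, reads, out_bam, threads)
    with subprocess.Popen(align_cmd, stdout=subprocess.PIPE) as align:
        # samtools reads the SAM records directly from strobealign's stdout.
        sort = subprocess.run(sort_cmd, stdin=align.stdout)
        # Close our copy of the pipe so strobealign is not left blocked
        # if samtools exits early.
        align.stdout.close()
    if align.returncode != 0 or sort.returncode != 0:
        raise RuntimeError("alignment pipeline failed")
```

Calling `run_pipeline("ref.fa", ["reads.1.fastq.gz", "reads.2.fastq.gz"], "sorted.bam")` would correspond to the paired-end command above.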
108
+
109
+ ### Mapping-only mode
110
+
111
+ The command-line option `-x` switches strobealign into mapping-only mode,
112
+ in which it will output [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md)
113
+ files instead of SAM. For example:
110
114
```
111
- strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
115
+ strobealign -x -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | igzip > output.paf.gz
112
116
```
113
- This is usually faster than doing the two steps separately because fewer
114
- intermediate files are created.
117
+ `igzip` is a faster version of gzip that is part of
118
+ [ISA-L](https://github.com/intel/isa-l/).
119
+ If it is not available, replace it with `pigz` or regular `gzip` in the
120
+ command.
121
+
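To illustrate how the compressed PAF output could be consumed downstream, here is a small reader sketch. It relies on the 12 mandatory columns defined in the PAF specification linked above. The names `PafRecord`, `parse_paf_line` and `read_paf` are hypothetical, and the sketch assumes every record carries all 12 columns and ignores any optional tags that follow them.

```python
import gzip
from typing import Iterator, NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in specification order."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one tab-separated PAF line; optional tag columns are dropped."""
    fields = line.rstrip("\n").split("\t")
    # Columns 1, 5 and 6 (names and strand) are strings; the rest are integers.
    values = [
        field if i in (0, 4, 5) else int(field)
        for i, field in enumerate(fields[:12])
    ]
    return PafRecord(*values)


def read_paf(path: str) -> Iterator[PafRecord]:
    """Yield records from a gzip-compressed PAF file such as output.paf.gz."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_paf_line(line)
```

For example, `sum(1 for rec in read_paf("output.paf.gz") if rec.mapq >= 30)` would count records at or above a chosen mapping quality.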
122
+
123
+ ### Abundance estimation mode (for metagenomic binning)
124
+
125
+ The command-line option `--aemb` switches strobealign into abundance estimation
126
+ mode, intended for metagenomic binning.
127
+ In this mode, strobealign outputs a single table with abundance values in
128
+ tab-separated value format instead of SAM or PAF.
115
129
116
- To output the estimated abundance of every contig, the format of output file is: contig_id \t abundance_value:
130
+ Paired-end example:
117
131
```
118
- strobealign ref.fa reads.fq --aemb > abundance.txt # Single-end reads
119
- strobealign ref.fa reads1.fq reads2.fq --aemb > abundance.txt # Paired-end reads
132
+ strobealign -t 8 --aemb ref.fa reads.1.fastq.gz reads.2.fastq.gz > abundances.tsv
120
133
```
134
+ The output table contains one row for each contig of the reference.
135
+ The first column is the reference/contig id and the second its abundance.
136
+
137
+ The abundance is the number of bases mapped to a contig, divided by the length
138
+ of the contig. Reads mapping to *n* different locations are weighted 1/*n*.
139
+
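The definition above can be made concrete with a small worked sketch. This is one reading of the sentence, not strobealign's actual implementation. The function name `estimate_abundances` and the input layout (one list of `(contig, mapped_bases)` tuples per read) are assumptions made for illustration.

```python
from collections import defaultdict


def estimate_abundances(contig_lengths, read_mappings):
    """Weighted mapped bases per contig, divided by contig length.

    contig_lengths: dict mapping contig id to its length in bases.
    read_mappings: one entry per read; each entry is a list of
        (contig, mapped_bases) tuples, one per mapping location
        (hypothetical input layout).
    """
    weighted_bases = defaultdict(float)
    for locations in read_mappings:
        n = len(locations)
        for contig, mapped_bases in locations:
            # A read with n mapping locations contributes 1/n at each of them.
            weighted_bases[contig] += mapped_bases / n
    return {
        contig: weighted_bases[contig] / length
        for contig, length in contig_lengths.items()
    }
```

With contigs `c1` (1000 bases) and `c2` (500 bases), one 100-base read mapping only to `c1` and a second 100-base read mapping to both, `c1` receives 100 + 50 weighted bases (abundance 0.15) and `c2` receives 50 (abundance 0.1).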
140
+ Further columns may be added to this table in future versions of strobealign.
141
+
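Because further columns may be added later, a reader of this table is safer if it uses only the first two columns and ignores the rest. The sketch below shows one way to do that. The function name `read_abundances` is hypothetical, and it assumes the table has no header row, which the description above does not state either way.

```python
import csv


def read_abundances(lines):
    """Map contig id to abundance from tab-separated lines.

    Only the first two columns are used, so extra columns added by
    future versions are ignored. Assumes there is no header row.
    """
    abundances = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        abundances[row[0]] = float(row[1])
    return abundances
```

A typical call would be `with open("abundances.tsv", newline="") as f: table = read_abundances(f)`.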
121
142
122
143
## Command-line options
123
144
@@ -127,12 +148,11 @@ options. Some important ones are:
127
148
* `-r`: Mean read length. If given, this overrides the read length estimated
128
149
from the input file(s). This is usually only required in combination with
129
150
`--create-index`, see [index files](#index-files).
130
- * `-t N`, `--threads=N`: Use N threads. This mainly applies to the mapping step
131
- as the indexing step is only partially parallelized.
151
+ * `-t N`, `--threads=N`: Use N threads (both for mapping and indexing).
132
152
* `--eqx`: Emit `=` and `X` CIGAR operations instead of `M`.
133
153
* `-x`: Only map reads, do not do base-level alignment. This switches the
134
154
output format from SAM to [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md).
135
- * `--aemb`: Output the estimated abundance value of every contig, the format of output file is: contig_id \t abundance_value.
155
+ * `--aemb`: Output the estimated abundance value of each contig; see the abundance estimation section above.
136
156
* `--rg-id=ID`: Add RG tag to each SAM record.
137
157
* `--rg=TAG:VALUE`: Add read group metadata to the SAM header. This can be
138
158
specified multiple times. Example: `--rg-id=1 --rg=SM:mysample --rg=LB:mylibrary`.
@@ -172,19 +192,19 @@ To create an index, use the `--create-index` option.
172
192
Since strobealign needs to know the read length, either provide it with
173
193
read file(s) as if you wanted to map them:
174
194
175
- strobealign --create-index ref.fa reads.1.fastq.gz reads.2.fastq.gz
195
+ strobealign --create-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz
176
196
177
197
Or set the read length explicitly with `-r`:
178
198
179
- strobealign --create-index ref.fa -r 150
199
+ strobealign --create-index -t 8 ref.fa -r 150
180
200
181
201
This creates a file named `ref.fa.rX.sti` containing the strobemer index,
182
202
where `X` is the canonical read length that the index is optimized for (see
183
203
above).
184
204
To use the index, provide option `--use-index` when doing the
185
205
actual mapping:
186
206
187
- strobealign --use-index ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...
207
+ strobealign --use-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...
188
208
189
209
- Note that the `.sti` files are usually tied to a specific strobealign version.
190
210
That is, when upgrading strobealign, the `.sti` files need to be regenerated.