Commit 11aaa5c

Merge pull request #402 from ksahlin/release
Version 0.13.0
2 parents a333a29 + f956c98 commit 11aaa5c

5 files changed: +57 -33 lines changed

CHANGES.md

+5 -1
@@ -1,7 +1,11 @@
 # Strobealign Changelog

-## development version
+## v0.13.0 (2024-03-04)

+* #394: Added option `--aemb` (abundance estimation for metagenomic binning),
+which makes strobealign output a table with estimated abundance values for
+each contig (instead of SAM or PAF). This was contributed by Shaojun Pan
+(@psj1997).
 * #386: Parallelize indexing even more by using @alugowski’s
 [poolSTL](https://github.com/alugowski/) `pluggable_sort`.
 Indexing a human reference (measured on CHM13) now takes only ~45 s on a

CMakeLists.txt

+1 -1
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.16)

-project(strobealign VERSION 0.12.0)
+project(strobealign VERSION 0.13.0)
 include(FetchContent)

 option(ENABLE_AVX "Enable AVX2 support" OFF)

CONTRIBUTING.md

+1 -1
@@ -24,7 +24,7 @@ If needed, run `make` with `VERBOSE=1` to get more logging output.
 After CMake has been run, you can use this one-liner to compile strobealign and
 run the tests:
 ```
-make -j -C build && tests/run.sh
+make -s -j -C build && tests/run.sh
 ```

 Whenever you make changes that could potentially affect mapping results, you can

README.md

+49 -29
@@ -77,47 +77,68 @@ as a developer.

 ### Python bindings

-Experimental and incomplete Python bindings can be installed with
+Experimental Python bindings can be installed with
 `pip install .`. The only documentation for the moment are the tests in
 `tests/*.py`.

 ## Usage

+To align paired-reads against a reference FASTA and produce a sorted BAM file,
+using eight threads:
 ```
-strobealign ref.fa reads.fq > output.sam # Single-end reads
-strobealign ref.fa reads1.fq reads2.fq > output.sam # Paired-end reads
-
-strobealign -x ref.fa reads.fq > output.paf # Single-end reads mapping only (PAF)
-strobealign -x ref.fa reads1.fq reads2.fq > output.paf # Paired-end reads mapping only (PAF)
+strobealign -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
 ```

-To use interleaved files, use the `--interleaved` flag:
-
+For single-end reads:
 ```
-strobealign ref.fa reads.fq --interleaved > output.sam # Single and/or paired-end reads
+strobealign -t 8 ref.fa reads.fastq.gz | samtools sort -o sorted.bam
 ```

-To report secondary alignments, set parameter `-N [INT]` for a maximum of `[INT]` secondary alignments.
-
-The above commands are suitable for interactive use and test runs.
-For normal use, avoid creating SAM files on disk as they get very large compared
-to their compressed BAM counterparts. Instead, either pipe strobealign’s output
-into `samtools view` to create unsorted BAM files:
+For mixed reads (the input file can contain both single and paired-end reads):
 ```
-strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools view -o mapped.bam
+strobealign -t 8 ref.fa --interleaved reads.fastq.gz | samtools sort -o sorted.bam
 ```
-Or use `samtools sort` to create a sorted BAM file:
+
+In alignment mode, strobealign produces SAM output. By piping the output
+directly into `samtools`, the above commands avoid creating potentially large
+intermediate SAM files and also reduce disk I/O.
+
+To produce unsorted BAM, use `samtools view` instead of `samtools sort`.
+
+
+### Mapping-only mode
+
+The command-line option `-x` switches strobealign into mapping-only mode,
+in which it will output [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md)
+files instead of SAM. For example:
 ```
-strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
+strobealign -x -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | igzip > output.paf.gz
 ```
-This is usually faster than doing the two steps separately because fewer
-intermediate files are created.
+`igzip` is a faster version of gzip that is part of
+[ISA-L](https://github.com/intel/isa-l/).
+If it is not available, replace it with `pigz` or regular `gzip` in the
+command.
+
+
+## Abundance estimation mode (for metagenomic binning)
+
+The command-line option `--aemb` switches strobealign into abundance estimation
+mode, intended for metagenomic binning.
+In this mode, strobealign outputs a single table with abundance values in
+tab-separated value format instead of SAM or PAF.

-To output the estimated abundance of every contig, the format of output file is: contig_id \t abundance_value:
+Paired-end example:
 ```
-strobealign ref.fa reads.fq --aemb > abundance.txt # Single-end reads
-strobealign ref.fa reads1.fq reads2.fq --aemb > abundance.txt # Paired-end reads
+strobealign -t 8 --aemb ref.fa reads.1.fastq.gz reads.2.fastq.gz > abundances.tsv
 ```
+The output table contains one row for each contig of the reference.
+The first column is the reference/contig id and the second its abundance.
+
+The abundance is the number of bases mapped to a contig, divided by the length
+of the contig. Reads mapping to *n* different locations are weighted 1/*n*.
+
+Further columns may be added to this table in future versions of strobealign.
+

 ## Command-line options

@@ -127,12 +148,11 @@ options. Some important ones are:
 * `-r`: Mean read length. If given, this overrides the read length estimated
 from the input file(s). This is usually only required in combination with
 `--create-index`, see [index files](#index-files).
-* `-t N`, `--threads=N`: Use N threads. This mainly applies to the mapping step
-as the indexing step is only partially parallelized.
+* `-t N`, `--threads=N`: Use N threads (both for mapping and indexing).
 * `--eqx`: Emit `=` and `X` CIGAR operations instead of `M`.
 * `-x`: Only map reads, do not do no base-level alignment. This switches the
 output format from SAM to [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md).
-* `--aemb`: Output the estimated abundance value of every contig, the format of output file is: contig_id \t abundance_value.
+* `--aemb`: Output estimated abundance value of each contig, see section above.
 * `--rg-id=ID`: Add RG tag to each SAM record.
 * `--rg=TAG:VALUE`: Add read group metadata to the SAM header. This can be
 specified multiple times. Example: `--rg-id=1 --rg=SM:mysamle --rg=LB:mylibrary`.
@@ -172,19 +192,19 @@ To create an index, use the `--create-index` option.
 Since strobealign needs to know the read length, either provide it with
 read file(s) as if you wanted to map them:

-strobealign --create-index ref.fa reads.1.fastq.gz reads.2.fastq.gz
+strobealign --create-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz

 Or set the read length explicitly with `-r`:

-strobealign --create-index ref.fa -r 150
+strobealign --create-index -t 8 ref.fa -r 150

 This creates a file named `ref.fa.rX.sti` containing the strobemer index,
 where `X` is the canonical read length that the index is optimized for (see
 above).
 To use the index when mapping, provide option `--use-index` when doing the
 actual mapping:

-strobealign --use-index ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...
+strobealign --use-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...

 - Note that the `.sti` files are usually tied to a specific strobealign version.
 That is, when upgrading strobealign, the `.sti` files need to be regenerated.
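
The `--aemb` text added to README.md in this commit defines abundance as the number of bases mapped to a contig divided by the contig length, with reads mapping to *n* locations weighted 1/*n*. The following Python sketch illustrates that definition only; it is not strobealign's implementation. The function names and input shapes are hypothetical, and the reader assumes the table has no header row, which the README does not state.

```python
import csv
import io


def estimate_abundances(contig_lengths, reads):
    """Illustrative abundance calculation (hypothetical helper).

    contig_lengths: dict mapping contig id -> contig length in bases.
    reads: iterable with one entry per read; each entry is a list of
           (contig_id, mapped_bases) tuples, one per mapping location.
    """
    weighted_bases = {contig: 0.0 for contig in contig_lengths}
    for locations in reads:
        n = len(locations)
        for contig, mapped_bases in locations:
            # A read mapping to n locations contributes 1/n of its bases to each.
            weighted_bases[contig] += mapped_bases / n
    return {
        contig: weighted_bases[contig] / length
        for contig, length in contig_lengths.items()
    }


def read_abundances(tsv_text):
    """Parse a two-column table (contig id, abundance) from TSV text.

    Only the first two columns are used, so extra columns are ignored.
    Assumes there is no header row.
    """
    table = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        table[row[0]] = float(row[1])
    return table
```

Reading only the first two columns keeps a parser like this working if later versions add columns, which the README says may happen.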

setup.py

+1 -1
@@ -5,7 +5,7 @@
 name="strobealign",
 description="Python bindings for strobealign",
 license="MIT",
-version="0.12.0",
+version="0.13.0",
 packages=["strobealign"],
 package_dir={"": "src/python"},
 cmake_install_dir="src/python",
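
This release bumps the version string in two places, `CMakeLists.txt` and `setup.py`. A small check along the following lines could catch the two drifting apart. It is a sketch and not part of the repository: the helper names are hypothetical, and the regular expressions assume the `project(... VERSION x.y.z)` and `version="x.y.z"` layouts shown in the diffs above.

```python
import re


def cmake_version(cmake_text):
    """Return the version from a `project(<name> VERSION <x.y.z>)` call, or None."""
    match = re.search(r"project\(\s*\w+\s+VERSION\s+([0-9.]+)", cmake_text)
    return match.group(1) if match else None


def setup_py_version(setup_text):
    """Return the version from a `version="<x.y.z>"` keyword argument, or None."""
    match = re.search(r'version\s*=\s*"([^"]+)"', setup_text)
    return match.group(1) if match else None


def versions_agree(cmake_text, setup_text):
    """True only if both files declare a version and the two are equal."""
    cmake = cmake_version(cmake_text)
    setup = setup_py_version(setup_text)
    return cmake is not None and cmake == setup
```

To use it, read both files into strings and pass them to `versions_agree`, for example from a test or a pre-release script.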
