Commit

bump to 1.0.0
lmdu committed Jun 5, 2023
1 parent 37d903d commit 3af7137
Showing 20 changed files with 836 additions and 1,422 deletions.
17 changes: 0 additions & 17 deletions benchmark.py

This file was deleted.

96 changes: 44 additions & 52 deletions docs/source/api_reference.rst
@@ -1,31 +1,31 @@
API Reference
=============

stria.version
pytrf.version
-------------

.. py:function:: stria.version()
.. py:function:: pytrf.version()
Get current version of stria
Get current version of pytrf

:return: version

:rtype: str

stria.SSRMiner
--------------
pytrf.STRFinder
---------------

.. py:class:: stria.SSRMiner(name, seq, min_repeats=[12,7,5,4,4,4])
.. py:class:: pytrf.STRFinder(chrom, seq, min_repeats=[12,7,5,4,4,4])
Find all microsatellites or SSRs that meet the minimum repeats on the input sequence
Find all exact or perfect short tandem repeats (STRs), simple sequence repeats (SSRs) or microsatellites that meet the minimum repeats on the input sequence

:param str name: the sequence name
:param str chrom: the sequence name

:param str seq: the input sequence
:param str seq: the input DNA sequence

:param list min_repeats: minimum number of repeats for mono, di, tri, tetra, penta, hexa, default (12,7,5,4,4,4), corresponding to 12 for mono, 7 for di, 5 for tri and 4 for tetra, penta, hexa
:param list min_repeats: minimum number of repeats for mono, di, tri, tetra, penta, hexa, default (12,7,5,4,4,4), corresponding to 12 for mono, 7 for di, 5 for tri and 4 for tetra, penta and hexa

:return: SSRMiner object
:return: STRFinder object

.. py:method:: as_list()
@@ -35,76 +35,68 @@ stria.SSRMiner

:rtype: list

stria.VNTRMiner
pytrf.GTRFinder
---------------

.. py:class:: stria.VNTRMiner(name, seq, min_motif_size=7, max_motif_size=30, min_repeat=2)
.. py:class:: pytrf.GTRFinder(chrom, seq, max_motif=30, min_repeat=3, min_length=10)
Find all minisatellites or VNTRs that meet the minimum repeat on the input sequence
Find all exact or perfect generic tandem repeats (GTRs) that meet the minimum repeat and minimum length on the input sequence

:param str name: the sequence name
:param str chrom: the sequence name

:param str seq: the input sequence
:param str seq: the input DNA sequence

:param int min_motif_size: minimum length of motif
:param int max_motif: maximum length of motif sequence

:param int max_motif_size: maximum length of motif
:param int min_repeat: minimum number of tandem repeats

:param int min_repeat: minimum number of repeats
:param int min_length: minimum length of tandem repeats

:return: VNTRMiner object
:return: GTRFinder object

.. py:method:: as_list()
Put all VNTRs in a list and return, each VNTR in list has 7 columns including [sequence name, start position, end position, motif sequence, motif length, repeats, VNTR length]
Put all GTRs in a list and return, each GTR in list has 7 columns including [sequence name, start position, end position, motif sequence, motif length, repeats, GTR length]

:return: all VNTRs found
:return: all GTRs found

:rtype: list
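
To show how the three ``GTRFinder`` thresholds interact, here is a minimal pure-Python sketch of an exact repeat search with a single ``min_repeat`` and ``min_length`` for all motif sizes up to ``max_motif``. It is an illustration only, not the library's implementation. The function name ``find_exact_gtrs`` is hypothetical, and the 1-based inclusive coordinates are an assumption.

```python
def find_exact_gtrs(chrom, seq, max_motif=30, min_repeat=3, min_length=10):
    """Return rows of [chrom, start, end, motif, motif length, repeats, length]."""
    seq = seq.upper()
    n = len(seq)
    found = []
    i = 0
    while i < n:
        hit = None
        # Shortest motif first, so the smallest repeat unit is reported.
        for size in range(1, max_motif + 1):
            if i + size > n:
                break
            k = i + size
            while k < n and seq[k] == seq[k - size]:
                k += 1
            repeats = (k - i) // size
            length = repeats * size
            # A hit must satisfy both the repeat count and the total length.
            if repeats >= min_repeat and length >= min_length:
                hit = (size, repeats, length)
                break
        if hit is None:
            i += 1
            continue
        size, repeats, length = hit
        # Assumed convention: 1-based, inclusive start and end.
        found.append([chrom, i + 1, i + length, seq[i:i + size],
                      size, repeats, length])
        i += length
    return found


gtrs = find_exact_gtrs("chr1", "GG" + "ACGTA" * 4 + "TT")
print(gtrs)
```

In this sketch a run such as ``ACACAC`` has enough repeats (3) but is rejected because its 6 bp length is below ``min_length``.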

stria.ITRMiner
--------------

.. py:class:: stria.ITRMiner(name, seq, min_motif_size=1, max_motif_size=6, seed_min_repeat=3, seed_min_length=10, max_continuous_errors=2, substitution_penalty=0.5, insertion_penalty=1.0, deletion_penalty=1.0, min_match_ratio=0.7, max_extend_length=2000)
pytrf.ATRFinder
---------------

Find all imperfect tandem repeats from the input sequence
.. py:class:: pytrf.ATRFinder(chrom, seq, max_motif_size=6, seed_min_repeat=3, seed_min_length=10, max_continuous_error=3, min_identity=70, max_extend_length=2000)
:param str name: the sequence name
Find all approximate or imperfect tandem repeats (ATRs) from the input sequence

:param str seq: the input sequence
:param str chrom: the sequence name

:param int min_motif_size: minimum length of motif
:param str seq: the input DNA sequence

:param int max_motif_size: maximum length of motif

:param int seed_min_repeat: minimum number of repeat for seed

:param int seed_min_length: minimum length of seed

:param int max_continuous_errors: maximum number of continuous aligned errors allowed

:param float substitution_penalty: penaly for substitution

:param float insertion_penalty: penaly for insertion

:param float deletion_penalty: penalty for deletion
:param int max_continuous_error: maximum number of allowed continuous aligned errors

:param float min_match_ratio: minimum match ratio for extending alignment
:param float min_identity: minimum identity between an ATR and its perfect counterpart (0 to 100)

:param int max_extend_length: maximum length allowed to extend

:return: ITRMiner object
:return: ATRFinder object

.. py:method:: as_list()
Put all ITRs in a list and return, each ITR in list has 11 columns including [sequence name, start position, end position, motif sequence, motif length, ITR length, matches, substitutions, insertions, deletions, identity]
Put all ATRs in a list and return, each ATR in list has 11 columns including [sequence name, start position, end position, motif sequence, motif length, ATR length, matches, substitutions, insertions, deletions, identity]

stria.ETR
pytrf.ETR
---------

.. py:class:: stria.ETR
.. py:class:: pytrf.ETR
Readonly exact tandem repeat (ETR) object generated by iterating over SSRMiner or VNTRMiner object
Readonly exact tandem repeat (ETR) object generated by iterating over STRFinder or GTRFinder object

.. py:attribute:: chrom
@@ -162,16 +154,16 @@ stria.ETR

:rtype: str

stria.ITR
pytrf.ATR
---------

.. py:class:: stria.ITR
.. py:class:: pytrf.ATR
Readonly imperfect tandem repeat (ITR) object generated by iterating over ITRMiner object
Readonly imperfect or approximate tandem repeat (ATR) object generated by iterating over ATRFinder object

.. py:attribute:: chrom
chromosome or sequence name where ITR located on
chromosome or sequence name where the ATR is located

.. py:attribute:: start
@@ -215,23 +207,23 @@ stria.ITR

.. py:attribute:: seq
get the sequence of ITR
get the sequence of ATR

.. py:method:: as_list()
convert ITR object to a list
convert ATR object to a list

.. py:method:: as_dict()
convert ITR object to a dict
convert ATR object to a dict

.. py:method:: as_gff()
convert ITR object to a gff formatted string
convert ATR object to a gff formatted string

.. py:method:: as_string(separator='\t', terminator='')
convert ITR object to a TSV or CSV string by using separator and terminator
convert ATR object to a TSV or CSV string by using separator and terminator

:param str separator: a separator between columns

9 changes: 9 additions & 0 deletions docs/source/changelog.rst
@@ -1,11 +1,20 @@
Changelog
=========

Version 1.0.0 (2023-06-05)
--------------------------

- Changed the name from stria to pytrf
- Optimized the command line interface
- Used wraparound dynamic programming to identify approximate repeats
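
Wraparound dynamic programming, named in the last entry above, aligns a sequence against an endlessly repeated motif by letting the last motif column feed back into the first. The sketch below illustrates the general technique with unit edit costs. It is not pytrf's C implementation, and the function name ``wraparound_edit_distance`` is hypothetical.

```python
def wraparound_edit_distance(seq, motif):
    """Minimum edit distance between seq and a run of motif copies.

    The run starts at motif[0] and may stop in the middle of a copy.
    """
    m = len(motif)
    # prev[j]: best cost so far when the last motif base used is motif[j].
    # Before reading any base we sit just after a full copy (j = m - 1) at
    # cost 0; reaching j < m - 1 means skipping j + 1 motif bases.
    prev = [j + 1 for j in range(m)]
    prev[m - 1] = 0
    for base in seq:
        cur = []
        for j in range(m):
            # Match or substitution; prev[-1] wraps to the last motif base.
            diagonal = prev[j - 1] + (base != motif[j])
            # Extra base in seq that has no partner in the motif.
            vertical = prev[j] + 1
            cur.append(min(diagonal, vertical))
        # Skipping motif bases can also wrap around, so sweep the row twice.
        for _ in range(2):
            for j in range(m):
                cur[j] = min(cur[j], cur[j - 1] + 1)
        prev = cur
    return min(prev)


print(wraparound_edit_distance("ACGACGACG", "ACG"))   # perfect repeat
print(wraparound_edit_distance("ACGACTACG", "ACG"))   # one substitution
print(wraparound_edit_distance("ACGAGACG", "ACG"))    # one missing base
print(wraparound_edit_distance("ACGACCGACG", "ACG"))  # one extra base
```

Two sweeps per row are enough because every skip has a positive cost, so the cheapest cell in a row is never improved by going around the motif again.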

Version 0.1.5 (2023-05-05)
--------------------------

- Fixed ci wheel build

Version 0.1.4 (2023-05-04)
--------------------------

- Added support for Python 3.9-3.11
- Updated the structure of objects
29 changes: 15 additions & 14 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -1,19 +1,20 @@
.. stripy documentation master file, created by
sphinx-quickstart on Wed Apr 21 21:18:10 2021.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to stria's documentation!
Welcome to pytrf's documentation!
==================================

A Tandem repeat (TR) in genomic sequence is a set of adjacent short DNA sequence repeated consecutively.
The core sequence or repeat unit is generally called motif. According to the motif length, tandem repeats
can be classified as microsatellites and minisatellites. Microsatellites are also known as simple sequence
repeats (SSRs) or short tandem repeats (STRs) with motif length of 1-6 bp. Minisatellites are also sometimes
referred to as variable number of tandem repeats (VNTRs) has longer motif length than micorsatellites.

The ``stria`` is a lightweight Python C extension for identification and analysis of short tandem repeats.
The stria enables to fastly identify both exact and imperfect SSRs and VNTRs from large numbers of DNA sequences.
A tandem repeat (TR) in a genomic sequence is a short DNA sequence
repeated consecutively. The core sequence or repeat unit is generally
called the motif. According to motif length, tandem repeats can be classified
as microsatellites and minisatellites. Microsatellites, also known as simple
sequence repeats (SSRs) or short tandem repeats (STRs), have a motif length of 1-6 bp.
Minisatellites, sometimes referred to as variable number tandem repeats
(VNTRs), have longer motifs than microsatellites.

pytrf is a lightweight Python C extension for identifying tandem repeats.
It quickly identifies exact (perfect) SSRs. It can also find generic
tandem repeats with any motif size, for example with a maximum motif length of 100 bp,
and it can find approximate (imperfect) tandem repeats.
pytrf can be used as a Python package and also provides a command
line interface for identifying tandem repeats.

.. toctree::
:maxdepth: 2
19 changes: 10 additions & 9 deletions docs/source/installation.rst
@@ -1,35 +1,36 @@
Installation
============

You can install stria via the Python Package Index (PyPI) (recommended) or from source.
Make sure you have installed both pip and Python before starting.
Currently, pyfastx supports Python 3.5, 3.6, 3.7, 3.8, 3.9 and can work on Windows, Linux, MacOS.
You can install pytrf from the Python Package Index (PyPI) (recommended)
or from source. Make sure both pip and Python are installed before starting.
Currently, pytrf supports Python 3.6, 3.7, 3.8, 3.9, 3.10 and 3.11 and works on
Windows, Linux and macOS. The command line tool depends on `pyfastx <https://github.com/lmdu/pyfastx>`_.

Install from PyPI
-----------------

::

pip install stria
pip install pytrf

Update to or install the latest version

::

pip install -U stria
pip install -U pytrf

Install from source
-------------------

Clone stria using ``git`` or download latest `release <https://github.com/lmdu/stria/releases>`_.
Clone pytrf using ``git`` or download latest `release <https://github.com/lmdu/pytrf/releases>`_.

::

git clone https://github.com/lmdu/stria.git
git clone https://github.com/lmdu/pytrf.git

Then ``cd`` to the stria folder and run install command:
Then ``cd`` to the pytrf folder and run install command:

::

cd stria
cd pytrf
python setup.py install