Skip to content

Latest commit

 

History

History
163 lines (118 loc) · 6.78 KB

README.md

File metadata and controls

163 lines (118 loc) · 6.78 KB

pyBEAST

PyPi tests cov

PyBEAST helps with running BEAST2 with best practices. Configure a beast run in a reproducible manner can be time consuming. PyBEAST is designed to make configuring beast as simple as possible.

Install

Install pybeast with pip (requires python -V >= 3.7).

pip install pybeast

Command line interface

Basic usage

pybeast beast.xml
  1. Create output folder and run command
  2. Ensures the run is self-contained and reproducible.
pybeast --run bash beast.xml

The --run flag tells pybeast how to run the run.sh file.

SLURM example

This example using the SLURM template in the examples folder to submit the beast run as a job.

pybeast --run sbatch --template examples/slurm.template examples/beast.xml

At a minimum the template must contain {{BEAST}} key. This will be replaced with the beast2 run command.

Here we use the -v (--template-variable) option to request 4 cpus.

pybeast --run sbatch --template examples/slurm.template -v cpus-per-task=4 exmaples/beast.xml

Default template variables can be specified in the template in the format {{<key>=<value>}} e.g. {{cpus-per-task=4}}.

dynamic variables

PyBEAST uses dynamic-beast to create dynamic xml files that can be modified at runtime.

Here we use the -d (--dynamic-variable) option to set the chain length to 1000000.

pybeast -d mcmc.chainLength=1000000 examples/beast.xml

The dynamic variables are saved to a .json file in the run directory. This file can be further edited before runtime. At run time the values in the JSON file will be used in the analysis.

Example

pybeast + feast to run BETS.

This is an example of how pybeast can be used in combination with feast to easily perform a Bayesian Evaluation of Temporal Signal (BETS) analysis.

BETS constitutes a formal test of the strength of temporal signal in a data set, which is an important prerequisite for obtaining reliable inferences in phylodynamic analyses. BETS is essentially model selection between four models. In the examples/BETS-templates folders there are four reuseable feast templates for performing BETS. The templates can be used with any alignment (beast dynamic variable $(alignment)) and parse dates from the descriptors (heterochronous).

Here is the alignment tag. By parsing the fileName to beast (-d alignment=fileName) the template can be reused with any dataset.

...
<alignment id="alignment" spec='feast.fileio.AlignmentFromFasta' fileName="$(alignment)"/>
...

Here is the date trait tag. Date parsing can be configured to work with any format using dynamic variables e.g. -d Date.dateFormat=yyyy/M/d.

<trait id="Date" spec="feast.fileio.TraitSetFromTaxonSet"
    delimiter="_"
    takeGroup="0"
    everythingAfterLast="true"
    dateFormat="Y"
    traitname="date">
    <taxa id="taxonSet" spec="TaxonSet" alignment="@alignment"/>
</trait>

The script below (examples/run_BETS.sh) takes and fasta file and runs BETS on it using pyBEAST to setup the analysis. Here we analyse the ice_viruses_cleaned dataset from the TempEst tutorial.

ALIGNMENT=${1?Must provide an ALIGNMENT.fasta file}
for XML_FILE in $(ls examples/BETS-templates/*)
do  
    GROUP_NAME="examples/$(basename "${ALIGNMENT}" .fasta)-BETS/$(basename "${XML_FILE}" .xml)"
    pybeast \
        --run sbatch \
        --group $GROUP_NAME \
        --duplicates 3 \
        --template examples/slurm.template \
        -v cpus-per-task=2 \
        --ns \
        -d "alignment=$ALIGNMENT" \
        -d "mcmc.particleCount=32" \
        $XML_FILE
done

The script process_BETS.py will process and plot the BETS output (marginal likelihood mean estimates).

python examples/process_BETS.py examples/ice_viruses_cleaned

We observe strongly positive (log) Bayes factors when including the sampling dates compared to when these dates are not included. Hence, these data demonstrate clear temporal signal, formally confirming the result from the TempEst tutorial. Our (log) marginal likelihood results point to a preference for the relaxed clock model, with a mean (log) Bayes factor of 14.65 in favor over the strict clock model.

Group Marginal likelihood Standard deviation logBF
UCLN_het -4152.573951 2.246429 0.000000
UCLN_iso -4245.626864 2.717245 -93.052913
strict_het -4167.228784 2.061483 -14.654834
strict_iso -4365.063450 2.475754 -212.489499

Note: SD from NS analysis can be used to determine if individual ML estimates are precise and and different subChain lengths should be used to determine if ML estimates are accurate.

Help

❯ pybeast --help
Usage: pybeast [OPTIONS] BEAST_XML_PATH

Arguments:
  BEAST_XML_PATH  [required]

Options:
  --run TEXT                    Run the run.sh file using this command.
  --resume / --no-resume        Resume the specified run.  [default: no-resume]
  --group TEXT                  Group runs in this folder.
  --description TEXT            Text to prepend to output folder name.
  --overwrite / --no-overwrite  Overwrite run folder if exists.  [default: no-overwrite]
  --seed INTEGER                Seed to use in beast analysis.
  --duplicates INTEGER          Number for duplicate runs to create.  [default: 1]
  -d, --dynamic-variable TEXT   Dynamic variable in the format <key>=<value>.
  --template PATH               Template for run.sh. Beast command is append to end of file.
  -v, --template-variable TEXT  Template variable in the format <key>=<value>.
  --chain-length INTEGER        Number of step in MCMC chain.
  --samples INTEGER             Number of samples to collect.
  --threads INTEGER             Number of threads and beagle instances to use (one beagle per thread).  [default: 1]
  --mc3 / --no-mc3              Use dynamic-beast to set default options for running MCMCMC.  [default: no-mc3]
  --ps / --no-ps                Use dynamic-beast to set default options for running PathSampler.  [default: no-ps]
  --ns / --no-ns                Use dynamic-beast to set default options for running nested sampling. [default: no-ns]
  --install-completion          Install completion for the current shell.
  --show-completion             Show completion for the current shell, to copy it or customize the installation.
  --help                        Show this message and exit.