diff --git a/README.md b/README.md index e462951..339302d 100644 --- a/README.md +++ b/README.md @@ -2,22 +2,25 @@ [![License](https://img.shields.io/apm/l/vim-mode.svg)](LICENSE) # flower_map -A pipeline for generating maps of the species of flowers around a bee colony using drone imagery. +A pipeline for generating maps of the species of flowers around a bee colony using drone imagery. The pipeline uses Agisoft Metashape to stitch the drone images together into an orthomosaic, various computer vision algorithms to segment each plant from its background, and a pre-trained random forest classifier to label each plant by its species. # download Execute the following commands or download the [latest release](https://github.com/beelabhmc/flower_map/releases/latest) manually. ``` -wget -O- -q https://github.com/beelabhmc/flower_map/tarball/master | tar mxvzf - -mv beelabhmc-* flower_map +git clone https://github.com/beelabhmc/flower_map.git ``` # setup -The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). We recommend using at least version 5.18.0: +## dependencies +The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). We recommend using at least version 5.20.1: ``` -conda create -n snakemake -c bioconda -c conda-forge 'snakemake>=5.18.0' +conda create -n snakemake -c bioconda -c conda-forge --no-channel-priority 'snakemake>=5.20.1' ``` We highly recommend you install [Snakemake via conda](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html#installation-via-conda) like this so that you can use the `--use-conda` flag when calling `snakemake` to let it [automatically handle all dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) of the pipeline. Otherwise, you must manually install the dependencies listed in the [env files](envs). 
+## Agisoft Metashape +Create an Agisoft Metashape license file named `metashape.lic` in the same directory as the `run.bash` script. Without this file, the pipeline will attempt to run Metashape unlicensed, which usually fails on import. + # execution 1. Activate snakemake via `conda`: ``` @@ -31,14 +34,14 @@ We highly recommend you install [Snakemake via conda](https://snakemake.readthed ``` __or__ on an SGE cluster: ``` - ./run.bash --sge-cluster & + qsub run.bash ``` #### Executing the pipeline on your own data -You must modify [the config.yaml file](config.yml) to specify paths to your data. +You must modify [the config.yml file](config.yml) to specify paths to your data. See [our wiki](wiki) for more information. ### If this is your first time using Snakemake -We recommend that you run `snakemake --help` to learn about Snakemake's options. For example, to check that the pipeline will be executed correctly before you run it, you can call Snakemake with the `-n -p -r` flags. This is also a good way to familiarize yourself with the steps of the pipeline and their inputs and outputs (the latter of which are inputs to the first rule in the pipeline -- ie the `all` rule). +We recommend that you run `snakemake --help` to read about Snakemake's options. For example, to check that the pipeline will be executed correctly before you run it, you can call Snakemake with the `-n -p -r` flags. This is also a good way to familiarize yourself with the steps of the pipeline and their inputs and outputs (the latter of which are inputs to the first rule in the pipeline -- i.e., the `all` rule). Note that Snakemake will not recreate output that it has already generated, unless you request it. If a job fails or is interrupted, subsequent executions of Snakemake will just pick up where it left off. This can also apply to files that *you* create and provide in place of the files it would have generated. 
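The Snakefile changes that follow lean on a small `check_config` helper to fill in defaults, e.g. the new `config['out'] = check_config('out', default='out')` line. A minimal standalone sketch of that pattern, with a toy `config` dict standing in for the real object Snakemake builds from `config.yml`:

```python
# toy stand-in for the config dict that Snakemake builds from config.yml;
# a key set to null in the YAML shows up as None here
config = {'out': None, 'parallel': True}

def check_config(value, default=False, place=config):
    """ return the config value if it exists and isn't None, else the default """
    return place[value] if (value in place and place[value] is not None) else default

# fill in the output directory default, as the new Snakefile line does
config['out'] = check_config('out', default='out')

print(config['out'])             # -> out
print(check_config('parallel'))  # -> True
print(check_config('missing'))   # -> False
```

Note that `place=config` binds the helper to the module-level dict at definition time, which is why the Snakefile can call it without passing the config explicitly; the `place=` keyword is only supplied when digging into nested sections, as in `check_config(wildcards.sample, place=config['truth'])`.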
diff --git a/Snakefile b/Snakefile index b0ce0d8..1f47022 100644 --- a/Snakefile +++ b/Snakefile @@ -14,6 +14,9 @@ def check_config(value, default=False, place=config): """ return true if config value exists and is true """ return place[value] if (value in place and place[value] is not None) else default +# set the output directory if it isn't set already +config['out'] = check_config('out', default='out') + def exp_str(): """ return the prefix str for the experimental strategy """ return "-exp" if check_config('parallel') else "" @@ -112,7 +115,7 @@ rule export_ortho: "scripts/export_ortho.py {input} {output}" rule segment: - """ segment plants from an image into high and low confidence regions""" + """ segment plants from an image into high and low confidence regions """ input: lambda wildcards: SAMP[wildcards.sample]+"/"+wildcards.image+SAMP_EXT[wildcards.sample][0] if check_config('parallel') else rules.export_ortho.output params: @@ -248,6 +251,7 @@ checkpoint create_split_truth_data: "scripts/create_truth_data.py {params.features} {input.truth} {output}" def train_input(wildcards): + """ return the input to the training step """ if check_config('truth') and check_config(wildcards.sample, place=config['truth']): if check_config('train_all', place=config['truth'][wildcards.sample]): return rules.create_truth_data.output @@ -266,6 +270,7 @@ rule train: "Rscript scripts/classify_train.R {input} {output}" def classify_input(wildcards, return_int=False): + """ return the input to the classify step """ if check_config('truth') and check_config(wildcards.sample, place=config['truth']): image_ending = "/{image}.tsv" if check_config('parallel') else '' if check_config('train_all', place=config['truth'][wildcards.sample]): @@ -363,6 +368,7 @@ rule resolve_conflicts: "scripts/resolve_conflicts.py {input.img} {input.labels} {params.predicts} {output}" def predictions(wildcards): + """ return the current predictions """ if check_config('parallel'): return 
expand(rules.resolve_conflicts.output[0], sample=wildcards.sample) else: diff --git a/config.yml b/config.yml index fb6c7ef..2995a10 100644 --- a/config.yml +++ b/config.yml @@ -23,7 +23,7 @@ sample_file: data/samples.tsv # which samples should we execute the pipeline on? # Comment out this line if you want to run all samples in the sample file -SAMP_NAMES: [test, test2, 62217West1, 62217West2, 62317East2, 62617East3, 62217East1, 62217West2, 62317East2] +#SAMP_NAMES: [region1, region2] # Whether to perform the default strategy or the experimental one # The default strategy performs segmentation on the stitched orthomosaic, while @@ -53,19 +53,19 @@ low_qual_ortho: null # of the data for training and half for testing. If this line is commented out # or set to a falsey value, the truth set will be split in half. Otherwise, # the testing steps will be skipped. -truth: - test: - path: out/test/truth.tsv - train_all: false +#truth: +# region1: +# path: data/region1/truth.tsv +# train_all: false # If you already have a trained model, provide it here. Otherwise, comment out # this line or set it to a falsey value. # If you already have a trained model, any truth sets you provide (see "truth" # above) will be used only for testing. # required! (unless truth sets are provided above) -# model: data/models/test2.rda +model: data/models/test-all-exp.rda # The path to the directory in which to place all of the output files # defined relative to whatever directory you execute the snakemake command in -# required! 
+# Defaults to 'out' if not provided out: out diff --git a/run.bash b/run.bash index 3adf975..4c45231 100755 --- a/run.bash +++ b/run.bash @@ -8,16 +8,17 @@ #$ -e /dev/null -# first, handle some weird behavior where sge passes the noclobber argument to the script -# this only applies if the script is being executed from qsub on our cluster (like: qsub run.bash) -test "$1" = "noclobber" && shift - # An example bash script demonstrating how to run the entire snakemake pipeline # on an SGE cluster -# This script creates two separate log files: +# This script creates two separate log files in the output dir: # 1) log - the basic snakemake log of completed rules # 2) qlog - a more detailed log of the progress of each rule and any errors +# Before running this snakemake pipeline, remember to complete the config file +# with the required input info. In particular, make sure that you have created +# a samples.tsv file specifying paths to your drone imagery. +# Also, make sure that this script is executed from the directory that it lives in! + # you can specify a directory for all output here: out_path="out" mkdir -p "$out_path" @@ -30,20 +31,22 @@ if [ -f "${out_path}/qlog" ]; then echo ""> "${out_path}/qlog"; fi -# make sure that this script is executed from the directory that it lives in! - -# # also, make sure this script is being executed in the correct snakemake environment! -# if [ "$CONDA_DEFAULT_ENV" != "snakemake" ] && conda info --envs | grep "$CONDA_ROOT/snakemake" &>/dev/null; then -# conda activate snakemake -# echo "Switched to snakemake environment." > "${out_path}/log" -# fi +# handle some weird behavior where sge passes the noclobber argument to the script +# this only applies if the script is being executed from qsub on our cluster (like: qsub run.bash) +test "$1" = "noclobber" && shift -# Before running this snakemake pipeline, remember to complete the config file -# with the required input info. 
In particular, make sure that you have created -# a samples.tsv file specifying paths to your drone imagery. +# try to find and activate the snakemake conda env if we need it +if ! command -v 'snakemake' &>/dev/null && \ + command -v 'conda' &>/dev/null && \ + [ "$CONDA_DEFAULT_ENV" != "snakemake" ] && \ + conda info --envs | grep "$CONDA_ROOT/snakemake" &>/dev/null; then + echo "Snakemake not detected. Attempting to switch to snakemake environment." >> "$out_path/log" + eval "$(conda shell.bash hook)" + conda activate snakemake +fi -# check: should we execute via qsub? -if [[ $* == *--sge-cluster* ]]; then +# check: are we being executed from within qsub? +if [ "$ENVIRONMENT" = "BATCH" ]; then snakemake \ --cluster "qsub -t 1 -V -S /bin/bash -j y -cwd -o $out_path/qlog" \ --config out="$out_path" \ @@ -51,13 +54,24 @@ if [[ $* == *--sge-cluster* ]]; then --use-conda \ -k \ -j 12 \ - ${@//--sge-cluster/} &>"$out_path/log" + "$@" &>"$out_path/log" else snakemake \ --config out="$out_path" \ --latency-wait 60 \ --use-conda \ -k \ - -j \ - "$@" 2>"$out_path/log" >"$out_path/qlog" + -j 12 \ + "$@" 2>>"$out_path/log" >>"$out_path/qlog" +fi + +# message the user on slack if possible +exit_code="$?" +if command -v 'slack' &>/dev/null; then + if [ "$exit_code" -eq 0 ]; then + slack "flower-mapping pipeline finished successfully" &>/dev/null + else + slack "flower-mapping pipeline exited with error code $exit_code" &>/dev/null + fi fi +exit "$exit_code" diff --git a/scripts/README.md index 0a69c63..af323a6 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -4,5 +4,72 @@ However, you can use most of these scripts on their own, too. Some may even be h All python scripts implement the `--help` argument. For R scripts, you can run `head