Merge pull request #8 from beelabhmc/strat2_docs
incorporate new documentation into the repository
aryarm authored Sep 4, 2020
2 parents 897f545 + 044a033 commit ba98999
Showing 10 changed files with 130 additions and 498 deletions.
19 changes: 11 additions & 8 deletions README.md
@@ -2,22 +2,25 @@
[![License](https://img.shields.io/apm/l/vim-mode.svg)](LICENSE)

# flower_map
A pipeline for generating maps of the species of flowers around a bee colony using drone imagery.
A pipeline for generating maps of the species of flowers around a bee colony using drone imagery. The pipeline uses Agisoft Metashape to stitch the drone images together into an orthomosaic, various computer vision algorithms to segment each plant from its background, and a pre-trained random forest classifier to label each plant by its species.

# download
Execute the following commands or download the [latest release](https://github.com/beelabhmc/flower_map/releases/latest) manually.
```
wget -O- -q https://github.com/beelabhmc/flower_map/tarball/master | tar mxvzf -
mv beelabhmc-* flower_map
git clone https://github.com/beelabhmc/flower_map.git
```

# setup
The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). We recommend using at least version 5.18.0:
## dependencies
The pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). We recommend using at least version 5.20.1:
```
conda create -n snakemake -c bioconda -c conda-forge 'snakemake>=5.18.0'
conda create -n snakemake -c bioconda -c conda-forge --no-channel-priority 'snakemake>=5.20.1'
```
We highly recommend you install [Snakemake via conda](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html#installation-via-conda) like this so that you can use the `--use-conda` flag when calling `snakemake` to let it [automatically handle all dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) of the pipeline. Otherwise, you must manually install the dependencies listed in the [env files](envs).
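
For example, a minimal invocation that lets Snakemake build and manage the pipeline's conda environments itself (the core count here is just an illustration):
```
conda activate snakemake
snakemake --use-conda -j 4
```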

## Agisoft Metashape
Create an Agisoft Metashape license file named `metashape.lic` in the same directory as the `run.bash` script. Without this file, the pipeline will attempt to run Metashape unlicensed, which usually fails on import.
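
For instance, a sketch (the source path is hypothetical):
```
# copy your license file next to run.bash
cp /path/to/your/metashape.lic ./metashape.lic
```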

# execution
1. Activate snakemake via `conda`:
```
@@ -31,14 +34,14 @@
```
__or__ on an SGE cluster:
```
./run.bash --sge-cluster &
qsub run.bash
```
#### Executing the pipeline on your own data
You must modify [the config.yaml file](config.yml) to specify paths to your data.
You must modify [the config.yaml file](config.yml) to specify paths to your data. See [our wiki](wiki) for more information.
### If this is your first time using Snakemake
We recommend that you run `snakemake --help` to learn about Snakemake's options. For example, to check that the pipeline will be executed correctly before you run it, you can call Snakemake with the `-n -p -r` flags. This is also a good way to familiarize yourself with the steps of the pipeline and their inputs and outputs (the latter of which are inputs to the first rule in the pipeline -- ie the `all` rule).
We recommend that you run `snakemake --help` to read about Snakemake's options. For example, to check that the pipeline will be executed correctly before you run it, you can call Snakemake with the `-n -p -r` flags. This is also a good way to familiarize yourself with the steps of the pipeline and their inputs and outputs (the latter of which are inputs to the first rule in the pipeline -- ie the `all` rule).
Note that Snakemake will not recreate output that it has already generated, unless you request it. If a job fails or is interrupted, subsequent executions of Snakemake will just pick up where it left off. This can also apply to files that *you* create and provide in place of the files it would have generated.
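
For example, a dry run that prints each rule's shell command (`-p`) and the reason it was scheduled (`-r`) without executing anything:
```
snakemake -n -p -r
```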
8 changes: 7 additions & 1 deletion Snakefile
@@ -14,6 +14,9 @@ def check_config(value, default=False, place=config):
""" return true if config value exists and is true """
return place[value] if (value in place and place[value] is not None) else default

# set the output directory if it isn't set already
config['out'] = check_config('out', default='out')
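# for example (a sketch, not in the Snakefile itself): with config = {'out': 'results', 'low_qual_ortho': None}
#   check_config('out', default='out')    -> 'results'
#   check_config('low_qual_ortho')        -> False  (None values fall back to the default)
#   check_config('missing', default='x')  -> 'x'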

def exp_str():
""" return the prefix str for the experimental strategy """
return "-exp" if check_config('parallel') else ""
@@ -112,7 +115,7 @@ rule export_ortho:
"scripts/export_ortho.py {input} {output}"

rule segment:
""" segment plants from an image into high and low confidence regions"""
""" segment plants from an image into high and low confidence regions """
input:
lambda wildcards: SAMP[wildcards.sample]+"/"+wildcards.image+SAMP_EXT[wildcards.sample][0] if check_config('parallel') else rules.export_ortho.output
params:
@@ -248,6 +251,7 @@ checkpoint create_split_truth_data:
"scripts/create_truth_data.py {params.features} {input.truth} {output}"

def train_input(wildcards):
""" return the input to the training step """
if check_config('truth') and check_config(wildcards.sample, place=config['truth']):
if check_config('train_all', place=config['truth'][wildcards.sample]):
return rules.create_truth_data.output
@@ -266,6 +270,7 @@ rule train:
"Rscript scripts/classify_train.R {input} {output}"

def classify_input(wildcards, return_int=False):
""" return the input to the classify step """
if check_config('truth') and check_config(wildcards.sample, place=config['truth']):
image_ending = "/{image}.tsv" if check_config('parallel') else ''
if check_config('train_all', place=config['truth'][wildcards.sample]):
@@ -363,6 +368,7 @@ rule resolve_conflicts:
"scripts/resolve_conflicts.py {input.img} {input.labels} {params.predicts} {output}"

def predictions(wildcards):
""" return the current predictions """
if check_config('parallel'):
return expand(rules.resolve_conflicts.output[0], sample=wildcards.sample)
else:
14 changes: 7 additions & 7 deletions config.yml
@@ -23,7 +23,7 @@ sample_file: data/samples.tsv

# which samples should we execute the pipeline on?
# Comment out this line if you want to run all samples in the sample file
SAMP_NAMES: [test, test2, 62217West1, 62217West2, 62317East2, 62617East3, 62217East1, 62217West2, 62317East2]
#SAMP_NAMES: [region1, region2]

# Whether to perform the default strategy or the experimental one
# The default strategy performs segmentation on the stitched orthomosaic, while
@@ -53,19 +53,19 @@ low_qual_ortho: null
# of the data for training and half for testing. If this line is commented out
# or set to a falsey value, the truth set will be split in half. Otherwise,
# the testing steps will be skipped.
truth:
test:
path: out/test/truth.tsv
train_all: false
#truth:
# region1:
# path: data/region1/truth.tsv
# train_all: false

# If you already have a trained model, provide it here. Otherwise, comment out
# this line or set it to a falsey value.
# If you already have a trained model, any truth sets you provide (see "truth"
# above) will be used only for testing.
# required! (unless truth sets are provided above)
# model: data/models/test2.rda
model: data/models/test-all-exp.rda

# The path to the directory in which to place all of the output files
# defined relative to whatever directory you execute the snakemake command in
# required!
# Defaults to 'out' if not provided
out: out
54 changes: 34 additions & 20 deletions run.bash
@@ -8,16 +8,17 @@
#$ -e /dev/null


# first, handle some weird behavior where sge passes the noclobber argument to the script
# this only applies if the script is being executed from qsub on our cluster (like: qsub run.bash)
test "$1" = "noclobber" && shift

# An example bash script demonstrating how to run the entire snakemake pipeline
# on an SGE cluster
# This script creates two separate log files:
# This script creates two separate log files in the output dir:
# 1) log - the basic snakemake log of completed rules
# 2) qlog - a more detailed log of the progress of each rule and any errors

# Before running this snakemake pipeline, remember to complete the config file
# with the required input info. In particular, make sure that you have created
# a samples.tsv file specifying paths to your drone imagery.
# Also, make sure that this script is executed from the directory that it lives in!

# you can specify a directory for all output here:
out_path="out"
mkdir -p "$out_path"
@@ -30,34 +31,47 @@ if [ -f "${out_path}/qlog" ]; then
echo ""> "${out_path}/qlog";
fi

# make sure that this script is executed from the directory that it lives in!

# # also, make sure this script is being executed in the correct snakemake environment!
# if [ "$CONDA_DEFAULT_ENV" != "snakemake" ] && conda info --envs | grep "$CONDA_ROOT/snakemake" &>/dev/null; then
# conda activate snakemake
# echo "Switched to snakemake environment." > "${out_path}/log"
# fi
# handle some weird behavior where sge passes the noclobber argument to the script
# this only applies if the script is being executed from qsub on our cluster (like: qsub run.bash)
test "$1" = "noclobber" && shift

# Before running this snakemake pipeline, remember to complete the config file
# with the required input info. In particular, make sure that you have created
# a samples.tsv file specifying paths to your drone imagery.
# try to find and activate the snakemake conda env if we need it
if ! command -v 'snakemake' &>/dev/null && \
command -v 'conda' &>/dev/null && \
[ "$CONDA_DEFAULT_ENV" != "snakemake" ] && \
conda info --envs | grep "$CONDA_ROOT/snakemake" &>/dev/null; then
echo "Snakemake not detected. Attempting to switch to snakemake environment." >> "$out_path/log"
eval "$(conda shell.bash hook)"
conda activate snakemake
fi

# check: should we execute via qsub?
if [[ $* == *--sge-cluster* ]]; then
# check: are we being executed from within qsub?
if [ "$ENVIRONMENT" = "BATCH" ]; then
snakemake \
--cluster "qsub -t 1 -V -S /bin/bash -j y -cwd -o $out_path/qlog" \
--config out="$out_path" \
--latency-wait 60 \
--use-conda \
-k \
-j 12 \
${@//--sge-cluster/} &>"$out_path/log"
"$@" &>"$out_path/log"
else
snakemake \
--config out="$out_path" \
--latency-wait 60 \
--use-conda \
-k \
-j \
"$@" 2>"$out_path/log" >"$out_path/qlog"
-j 12 \
"$@" 2>>"$out_path/log" >>"$out_path/qlog"
fi

# message the user on slack if possible
exit_code="$?"
if command -v 'slack' &>/dev/null; then
if [ "$exit_code" -eq 0 ]; then
slack "flower-mapping pipeline finished successfully" &>/dev/null
else
slack "flower-mapping pipeline exited with error code $exit_code"
fi
fi
exit "$exit_code"
67 changes: 67 additions & 0 deletions scripts/README.md
@@ -4,5 +4,72 @@ However, you can use most of these scripts on their own, too.

All python scripts implement the `--help` argument. For R scripts, you can run `head <script>` to read about their usage.
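
For example:
```
python scripts/extract_features.py --help
head scripts/classify_train.R
```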

### [analyze_map.py](analyze_map.py)
A python script to analyze the output of the pipeline and calculate metrics that might be useful for downstream biological applications of our software. This script is __not__, in fact, part of the pipeline.

### [benchmark.py](benchmark.py)
A python script for summarizing the runtime and memory usage of the pipeline based on its benchmark files. This script is __not__, in fact, part of the pipeline.

### [classify_test.R](classify_test.R)
An R script for predicting species labels using a trained classifier. It takes as input a model generated by `classify_train.R`.

### [classify_train.R](classify_train.R)
An R script for creating a trained classifier. It takes as input a set of plant segments that have already been labeled by their species.

### [create_truth_data.py](create_truth_data.py)
A python script that splits a set of pre-labeled segments into truth and training sets, for use by `classify_test.R` and `classify_train.R`.

### [export_dem.py](export_dem.py)
A python script that extracts the elevation of each point in an orthomosaic from a Metashape project file. Elevation values are calculated by Metashape's digital elevation model. This script is __not__, in fact, part of the pipeline.

### [export_ortho.py](export_ortho.py)
A python script that exports the orthomosaic from a Metashape project file to a standard image file.

### [extract_coordinates.py](extract_coordinates.py)
A python script that converts orthomosaic pixel coordinates to geographic coordinates using the Metashape project file. It can also be used for extracting the center of each segment in a json segments file (also in geographic coordinates). This script is __not__, in fact, part of the pipeline.

### [extract_features.py](extract_features.py)
A python script for extracting machine learning features for each segmented region in a json segments file.

### [features.py](features.py)
A suite of python functions for calculating features. These functions are used primarily by `extract_features.py`.

### [import_labelme.py](import_labelme.py)
A python module for importing segments from a json labelme file. The functions in this module are used by many other scripts.

### [importance_plot.py](importance_plot.py)
A python script for visualizing the random forest importance of each machine learning feature. This script uses the output of `classify_train.R`. This script is __not__, in fact, part of the pipeline.

### [map.py](map.py)
A python script for visualizing the output of the pipeline via a map.

### [metrics.py](metrics.py)
A python script to calculate scoring metrics to evaluate the performance of the classifier. This script uses the output of `classify_test.R`.

### [prc.py](prc.py)
A python script for creating a precision-recall curve for the classified segments from `classify_test.R`. It uses the output of `statistics.py`.

### [resolve_conflicts.py](resolve_conflicts.py)
A python script for resolving conflicting species labels assigned to the same segments.

### [rev_transform.py](rev_transform.py)
A python script that transforms orthomosaic pixel coordinates to their coordinates in the original drone images.

### [segment.py](segment.py)
A python script that uses computer vision algorithms to identify the location of plants in an image. The script outputs both regions that it is highly confident contain plants and regions that it is less confident about.

### [statistics.py](statistics.py)
A python script that creates the points of a precision-recall curve. This script's output is used by `prc.py`.

### [stitch.py](stitch.py)
A python script that uses Agisoft Metashape to create an orthomosaic from a collection of overlapping drone images. The output of this script is a special Metashape project file, not the orthomosaic as a standard image file.

### [test_util.py](test_util.py)
A python script that can be useful for debugging the segmentation scripts: `segment.py` and `watershed.py`. This script is __not__, in fact, part of the pipeline.

### [transform.py](transform.py)
A python script that transforms pixel coordinates in the original drone images to their coordinates in the orthomosaic.

### [watershed.py](watershed.py)
A python script that uses the high and low confidence regions from `segment.py` in the watershed algorithm. It outputs its best guess for the location of each plant as a segments file.

4 changes: 3 additions & 1 deletion scripts/analyze_map.py
@@ -2,10 +2,12 @@
import argparse

parser = argparse.ArgumentParser(
description="Calculate various propoerties of a map (ie the output of the pipeline)."
description="Calculate various properties of a map (ie the output of the pipeline)."
)

parser.add_argument(
"out", help="the path to a file in which to write the metrics"
)
args = parser.parse_args()

# THIS SCRIPT IS STILL IN DEVELOPMENT
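# usage sketch (based on the parser above; the output filename is illustrative):
#   python scripts/analyze_map.py out/map_metrics.tsv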