cunningham-lab/neurocaas_cli

Bare bones CLI for interacting with NeuroCAAS.

Designed to be as simple as possible, this CLI does the bare minimum necessary to interact with NeuroCAAS. In particular, it has two functions:

  1. Register user info necessary to interact with NeuroCAAS via AWS.
  2. Analyze data located on a local computer with NeuroCAAS.

Note that this is a work in progress. If you run into issues, please open an issue on the issues tab of this page.

Instructions:

  1. Sign up for a NeuroCAAS account, and locate relevant credential information.

If you do not have a NeuroCAAS account, please sign up for one at this site first. Once you have been approved, log in to your account and go to the "profile" page by clicking your name in the upper right-hand corner.

On this page, note down the following pieces of identifying information. Be careful, as some of this information is sensitive: do not save it in insecure locations such as a remote server. Treat it like an SSH key.

  • AWS Access Key
  • AWS Secret Access Key (take special care with this parameter)
  • S3 Bucket For Datasets and config files

  2. Then, install and configure the AWS CLI and this tool.

Now that you have the relevant credential information, let's install some software. First, download the version of the AWS CLI for your operating system (choose from the options on the right-hand side of the linked page). Once it's installed, run the command

aws configure

and enter your AWS Access Key and Secret Access Key from the previous step. When prompted for a default region name, enter us-east-1; when prompted for a default output format, enter json.
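
To double-check the region afterwards, a small helper along the following lines may be useful. This is only a sketch: check_region is a name made up for this example and is not part of this repo. The region itself can be read back with aws configure get region, which is a standard AWS CLI subcommand.

```shell
# Region that the instructions above ask you to configure.
expected_region="us-east-1"

# check_region is a hypothetical helper, not part of neurocaas_cli.
# $1: the region string to check (e.g. output of `aws configure get region`).
check_region() {
    if [ "$1" = "$expected_region" ]; then
        echo "region ok"
    else
        echo "region is '$1', expected '$expected_region'"
        return 1
    fi
}

# Usage (requires the AWS CLI to be installed and configured):
#   check_region "$(aws configure get region)"
```

The function takes the region as an argument rather than calling the AWS CLI itself, so it can be tried out before the AWS CLI is installed.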

Next, let's install this tool. First, if you do not have conda, install it here. Make a virtual environment with Python 3.8 (that's what this repo is tested against):

conda create -n neurocaas_cli python=3.8

Activate the environment with conda activate neurocaas_cli, then clone this repo to your local machine. Navigate to it with

cd /path/to/this/repo

And run the following commands:

pip install -e src/
pip install -r requirements.txt
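
Before moving on, it can help to confirm that both command-line tools are on your PATH. The sketch below uses only the standard command -v shell builtin; require_cmd is a helper name chosen for this example, not something this repo provides.

```shell
# require_cmd is a hypothetical helper, not part of neurocaas_cli.
# $1: name of an executable that should be available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage, with the conda environment activated:
#   require_cmd aws
#   require_cmd neurocaas-cli
```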

If all went smoothly, you should be able to get documentation for the command line tool by running:

neurocaas-cli

from the command line. Finally, locate the name of the bucket associated with the analysis you would like to use. The bucket name is the value of the PipelineName parameter in that analysis's blueprint, here: find the analysis's folder and look for PipelineName in the stack_config_template.json file inside it.

We can set up the CLI to interface with a specific analysis by initializing it:
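
If you have a local copy of a blueprint, the lookup can be scripted. The following is a rough sketch that assumes PipelineName appears in the file as a plain "PipelineName": "value" pair on a single line; the actual layout of stack_config_template.json may differ, and pipeline_name is a helper name invented for this example.

```shell
# pipeline_name is a hypothetical helper, not part of neurocaas_cli.
# $1: path to a stack_config_template.json file.
# Prints the value of the first "PipelineName" key found in the file.
pipeline_name() {
    sed -n 's/.*"PipelineName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage:
#   pipeline_name /path/to/analysis/stack_config_template.json
```

This is a text match rather than a real JSON parse, so treat the result as a convenience and confirm it against the file itself.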

neurocaas-cli init -b {bucketname} -g {groupprefix}

Here, {bucketname} should be replaced by the name of the bucket you found, and {groupprefix} by the value listed under S3 Bucket For Datasets and config files in step 1.

  3. Locate data and config files you would like to use.

Once you have installed the necessary tools, you can upload data and configuration files to your NeuroCAAS account. Configuration files should be in YAML format, and datasets should be in the format dictated by the analysis developer. See the website landing page for analysis-specific information.

  4. Upload data and config files to NeuroCAAS.

With data located on your computer, you can upload it to NeuroCAAS with:

neurocaas-cli analyze upload-data -d "path/to/your/datafile"

You can upload multiple datasets by passing multiple arguments with the -d parameter.
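
As an illustration, the wrapper below builds such a command from a list of files. It assumes that -d is repeated once per file, in the same way as in the submit-job example further down; upload_datasets and the RUN variable are names made up for this sketch. By default it only prints the command it would run.

```shell
# Dry run by default: print the command instead of executing it.
# Start the script with RUN="" in the environment to really execute it.
RUN="${RUN-echo}"

# upload_datasets is a hypothetical helper, not part of neurocaas_cli.
# Arguments: one or more paths to data files.
upload_datasets() {
    # Rewrite the argument list from "a b" into "-d a -d b", keeping order.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" -d "$1"
        shift
        n=$((n - 1))
    done
    $RUN neurocaas-cli analyze upload-data "$@"
}

# Usage:
#   upload_datasets session1.tif session2.tif
```

Rewriting the positional parameters with set -- keeps paths containing spaces intact without relying on shell-specific array syntax.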

Likewise, upload configuration files with:

neurocaas-cli analyze upload-data -d "path/to/your/config"

  5. Analyze your data.

You can see what data and configuration files are available for analysis by running the command:

neurocaas-cli analyze list-inputs

Then, from the list of datasets and configuration files available, select which ones you want to analyze. You can analyze multiple datasets with a single config file. Once you know what data you would like to analyze, run the following command, referencing your data and config file:

neurocaas-cli analyze submit-job -d "dataset1" -d "dataset2" -d "dataset3" -c "config file" -r "timestamp"

The final timestamp parameter (-r) is optional; it will be autogenerated if you do not provide one.

You can see a list of all jobs you have ever run by running the command:

neurocaas-cli analyze list-results

This list includes ongoing jobs. To retrieve the results of a job (finished or ongoing), you can poll that job for its logs and outputs:

neurocaas-cli analyze setup-polling -l localpath -rp resultpath -i interval -t timeout

localpath is the location you want to write the results to. resultpath references one of the results given by list-results above. interval and timeout set how often the job is polled and how long polling should continue.
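
To make the roles of an interval and a timeout concrete, here is a generic polling loop. It is a conceptual sketch only and not how setup-polling is implemented: the poll function is invented for this example, the units are assumed to be whole seconds, and stopping early on success is a choice made here for illustration.

```shell
# poll is a hypothetical helper, not part of neurocaas_cli.
# $1: interval in whole seconds between checks (assumed unit)
# $2: timeout in whole seconds after which polling gives up (assumed unit)
# remaining arguments: the command to run as the check
poll() {
    interval=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        "$@" && return 0              # stop as soon as the check succeeds
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1                          # timed out without a successful check
}

# Usage: check every 10 seconds, for at most 60 seconds, whether a file exists.
#   poll 10 60 test -e /path/to/localpath/some_output
```

With an interval of 10 and a timeout of 60, this loop runs the check at most 6 times before giving up.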

Finally, job submission and polling can be combined:

neurocaas-cli analyze submit-and-poll -d "dataset1" -d "dataset2" -d "dataset3" -c "config file" -r "timestamp" -l localpath -rp resultpath -i interval -t timeout

You can run any command with the --help flag for more information.

Ongoing todos:

  • Incorporate Joao's automatic credentialing system.
  • Make this repo a template for others to use.
