Designed to be as simple as possible, this CLI does the bare minimum necessary to interact with NeuroCAAS. In particular, it has two functions:
- Register user info necessary to interact with NeuroCAAS via AWS.
- Analyze data located on a local computer with NeuroCAAS.
Note that this is a work in progress. If you run into issues, please open an issue on the issues tab of this page.
- Sign up for a NeuroCAAS account, and locate relevant credential information.
If you do not have a NeuroCAAS account, please sign up for one at this site first. Once you have been approved, log in to your account and go to the "profile" page by clicking your name on the upper right-hand side.
On this page, note down the following pieces of identifying information. Be careful, as some of this information is sensitive. It's not a good idea to save it in insecure locations like a remote server. Treat it like an SSH key.
- AWS Access Key
- AWS Secret Access Key (take special care with this parameter)
- S3 Bucket For Datasets and config files
- Then, install and configure the AWS CLI and this tool.
Now that you have the relevant credential information, let's install some software. First, download the version of the AWS CLI for your operating system (choose from the options on the right-hand side of the linked page). Once it's installed, run the command
aws configure
Then enter your AWS Access Key and Secret Access Key from the previous step. When prompted for a default region name, enter us-east-1, and when prompted for a default output format, enter json.
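The region and output format can also be set and checked without the interactive prompts. The following is a minimal sketch, assuming the AWS CLI is on your PATH and that you are using the default profile; it uses the AWS CLI's own `configure set` and `configure get` subcommands and does not touch your access keys.

```shell
# Assumes the AWS CLI is installed; otherwise only a message is printed.
if command -v aws >/dev/null 2>&1; then
  # Write the two non-secret settings to the default profile.
  aws configure set region us-east-1
  aws configure set output json
  # Read them back to confirm what was stored.
  aws configure get region
  aws configure get output
else
  echo "aws CLI not found; install it first" >&2
fi
```

If the two values printed are us-east-1 and json, the settings from this step are in place.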
Next, let's install this tool. First, if you do not have conda, install it here. Make a virtual environment with Python 3.8 (that's what this repo is tested against):
conda create -n neurocaas_cli python=3.8
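Activate the environment with conda activate neurocaas_cli so that the packages below are installed into it. Then, clone this repo to your local machine. Navigate to it with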
cd /path/to/this/repo
And run the following commands:
pip install -e src/
pip install -r requirements.txt
If all went smoothly, you should be able to get documentation for the command line tool by running:
neurocaas-cli
from the command line. Finally, locate the name of the bucket associated with the analysis you would like to use. If you know which analysis you'd like to use, you can find the bucket name in the PipelineName parameter of its blueprint, here. Locate the analysis, and check for the PipelineName parameter in the stack_config_template.json file located within that analysis's folder.
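If you have a copy of stack_config_template.json on your machine, the PipelineName value can be read from the command line. The following is a rough sketch, assuming PipelineName appears in the file as a quoted string on a single line; the sample file and the name example-analysis-bucket are hypothetical and exist only so the sketch runs on its own.

```shell
# Hypothetical stand-in for a blueprint file, so the sketch is self-contained.
# A real stack_config_template.json contains more fields than this.
blueprint="$(mktemp)"
cat > "$blueprint" <<'EOF'
{
    "PipelineName": "example-analysis-bucket"
}
EOF

# Pull out the quoted value that follows "PipelineName".
# This is a plain text match, not a full JSON parser.
bucketname="$(sed -n 's/.*"PipelineName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$blueprint")"
echo "$bucketname"

# Remove the temporary sample file.
rm -f "$blueprint"
```

With a real blueprint, point the sed command at that file instead of the temporary one.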
We can set up the CLI to interface with a specific analysis by initializing it:
neurocaas-cli init -b {bucketname} -g {groupprefix}
Where {bucketname} should be replaced by the name of the bucket you found, and {groupprefix} by the value you found under S3 Bucket For Datasets and config files in step 1.
- Locate data and config files you would like to use.
Once you have installed the necessary tools, you can upload data and configuration files to your NeuroCAAS account. Configuration files should be in YAML format, and datasets should be in the formats dictated by the developer of each analysis. See the website landing page for analysis-specific information.
- Upload data and config files to NeuroCAAS.
With data located on your computer, you can upload it to NeuroCAAS with:
neurocaas-cli analyze upload-data -d "path/to/your/datafile"
You can upload multiple datasets by passing multiple arguments with the -d parameter.
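When there are many files, the list of -d arguments can be built in a loop. The following is a sketch only, assuming -d is repeated once per file (as in the submit-job example in this README); the file names are hypothetical, and the command is printed rather than run so you can check it first.

```shell
# Hypothetical file names, used only to show the shape of the command.
set -- session1.tif session2.tif session3.tif

# Turn each path into its own quoted "-d <path>" pair.
data_args=""
for f in "$@"; do
  data_args="$data_args -d \"$f\""
done

# Print the full command instead of executing it.
upload_cmd="neurocaas-cli analyze upload-data$data_args"
echo "$upload_cmd"
```

Once the printed command looks right, copy it into your terminal to perform the upload.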
Likewise, upload configuration files with:
neurocaas-cli analyze upload-data -d "path/to/your/config"
- Analyze your data.
You can see what data and configuration files are available for analysis by running the command:
neurocaas-cli analyze list-inputs
Then, from the list of datasets and configuration files available, select which ones you want to analyze. You can analyze multiple datasets with a single config file. Once you know what data you would like to analyze, run the following command, referencing your data and config file:
neurocaas-cli analyze submit-job -d "dataset1" -d "dataset2" -d "dataset3" -c "config file" -r "timestamp"
The final timestamp parameter (-r) is optional; it will be autogenerated if you do not provide one.
You can see a list of all jobs you have ever run by running the command:
neurocaas-cli analyze list-results
This list includes ongoing jobs. If you want to retrieve the results of a job (finished or ongoing), you can poll that job for its logs and outputs:
neurocaas-cli analyze setup-polling -l localpath -rp resultpath -i interval -t timeout
localpath is the location you want to write the results to. resultpath references one of the results given by list-results above. interval and timeout describe the rate of polling and how long it should continue.
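To make the relationship between interval and timeout concrete, here is a generic polling loop. It is not the CLI's implementation, and this README does not state the units of either parameter; the sketch assumes both are in seconds and uses hypothetical values. The wait itself is commented out so the example finishes immediately.

```shell
# Hypothetical values, assumed here to be in seconds.
interval=2
timeout=6

elapsed=0
polls=0
# Keep checking until the elapsed time reaches the timeout.
while [ "$elapsed" -lt "$timeout" ]; do
  polls=$((polls + 1))
  echo "check $polls at ${elapsed}s"
  # A real loop would fetch logs and outputs here, then wait:
  # sleep "$interval"
  elapsed=$((elapsed + interval))
done
echo "stopped after $polls checks"
```

Under these assumptions, a shorter interval means more frequent checks, and a longer timeout means polling continues for longer before giving up.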
Finally, job submission and polling can be combined:
neurocaas-cli analyze submit-and-poll -d "dataset1" -d "dataset2" -d "dataset3" -c "config file" -r "timestamp" -l localpath -rp resultpath -i interval -t timeout
You can run any command with the --help flag for more information.
- Incorporate Joao's automatic credentialing system.
- Make this repo a template for others to use.