Config files handling #22

Closed
wants to merge 4 commits into from
47 changes: 47 additions & 0 deletions SINGE_Example.m
@@ -0,0 +1,47 @@
%% Simple example that runs SCINGE for two replicates of two hyperparameter settings
Member

How much of this file changed? If you delete the old SCINGE_Example.m in the same commit where you add this file, git may recognize that you renamed the file. That would make comparing the new and old version easier.

Please convert the name here as well

clear all;
close all;
clc;
if ~isdeployed
    addpath(genpath('.'));
end

%% Import list of parameter combinations
fid = fopen('SINGE_params.cfg');
Member

We should support specifying the config file names as command line arguments. The current names could be the defaults, or we could require both types of config files to be provided before running. That way we can compile this file once and let users run on different datasets without needing to recompile.
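One way this could look, sketched under assumptions: treat the file names as optional positional arguments and fall back to the names used in this PR. The `args` cell array is hypothetical; in a compiled entry-point function it would be `varargin`, whose elements arrive as character vectors.

```matlab
% Hypothetical stand-in for the command-line arguments (varargin in a
% compiled entry-point function). Here only the first name is supplied.
args = {'my_params.cfg'};

% Default names, taken from the fopen calls in this PR.
files = {'SINGE_params.cfg', 'SINGE_IO.cfg'};

% Supplied names override the defaults positionally; extras are ignored.
n = min(numel(args), numel(files));
files(1:n) = args(1:n);

params_file = files{1};   % file with the hyperparameter combinations
io_file     = files{2};   % file with data paths and replicate count

fprintf('params: %s\nio: %s\n', params_file, io_file);
```

With this approach the two `fopen` calls would take `params_file` and `io_file` instead of hard-coded names, so one compiled binary could serve several datasets.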

temp = fgetl(fid);
while any(temp~=-1)||isempty(temp)
Member

How much error handling do we need here? I believe that parseParams.m will provide default values for most of the parameters. Are the data files the only required configuration file values?
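A minimal sketch of the kind of checks in question, not a settled design. `fopen` returns -1 when a file cannot be opened, and the `SINGE` call at the end of this script reads `IO.gene_list`, `IO.Data`, `IO.outdir`, and `IO.num_replicates`, so those four fields are treated as required here. The sample `IO` struct and the file name `no_such_file.cfg` are illustrative assumptions.

```matlab
% 1) Opening the file: fopen returns -1 on failure.
cfg_name = 'no_such_file.cfg';            % illustrative name, assumed absent
fid = fopen(cfg_name);
file_ok = (fid ~= -1);
if file_ok
    fclose(fid);
else
    fprintf('Could not open %s\n', cfg_name);
end

% 2) Required values: the fields that the SINGE call reads from IO.
IO = struct('Data', 'data1/X_SCODE_data.mat', 'outdir', 'Output');  % sample, incomplete
required = {'gene_list', 'Data', 'outdir', 'num_replicates'};
missing = required(~isfield(IO, required));
if ~isempty(missing)
    % A real script would likely call error() here instead of printing.
    fprintf('Missing required config values: %s\n', strjoin(missing, ', '));
end
```

Whether the hyperparameter file needs similar checks depends on which defaults parseParams.m supplies, which this sketch does not assume.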

    if ~isempty(temp)
        temp = strsplit(temp);
        pid = str2num(temp{1});
        pname = temp{2};
        pval = temp{3};
        if isnumeric(str2num(pval))&&~isempty(str2num(pval))
            pval = str2num(pval);
        else
            ind = find(pval=='''');
            pval(ind) = [];
        end
        param_list{pid}.(pname) = pval;
    end
    temp = fgetl(fid);
end
%% Specify Path to Input data and path to Output folder, gene_list and number of subsampled replicates
fid = fopen('SINGE_IO.cfg');
temp = fgetl(fid);
while any(temp~=-1)||isempty(temp)
Member

Same question as above regarding error handling and required config file values.

    if ~isempty(temp)
        temp = strsplit(temp);
        pname = temp{1};
        pval = temp{2};
        if isnumeric(str2num(pval))&&~isempty(str2num(pval))
            pval = str2num(pval);
        else
            ind = find(pval=='''');
            pval(ind) = [];
        end
        IO.(pname) = pval;
    end
    temp = fgetl(fid);
end
%% Run SINGE
[ranked_edges, gene_influence] = SINGE(IO.gene_list,IO.Data,IO.outdir,IO.num_replicates,param_list);
4 changes: 4 additions & 0 deletions SINGE_IO.cfg
@@ -0,0 +1,4 @@
Data data1/X_SCODE_data.mat
outdir Output
num_replicates 2
Member

@agitter, May 26, 2019

Should we call this file SINGE_config.cfg, SINGE_options.cfg, or SINGE_settings.cfg? Most of it is I/O related, except the number of replicates.

Before merging, we'll need to update the readme to describe this file and how to run SINGE with it. Is this space-separated or whitespace-separated? Tab-separated may be safest so we can support Windows file paths.
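To illustrate the delimiter question: `strsplit` with no delimiter argument splits on any whitespace, so a path containing a space breaks into extra tokens, while splitting on tabs alone keeps it whole. The sample line and its Windows path are made up for illustration.

```matlab
% Made-up config line: name and value separated by a single tab.
cfg_line = sprintf('Data\tC:\\My Data\\X_SCODE_data.mat');

% Default strsplit splits on any whitespace, so the space inside the
% path yields a third token.
ws_parts = strsplit(cfg_line);

% Splitting on the tab character only keeps the path intact.
tab_parts = strsplit(cfg_line, sprintf('\t'));
pname = tab_parts{1};
pval  = tab_parts{2};

fprintf('%d whitespace tokens, %d tab tokens\n', numel(ws_parts), numel(tab_parts));
```

The parsing loops in SINGE_Example.m call `strsplit(temp)` with no delimiter, so they currently behave like the first case.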

gene_list data1/tf.mat
19 changes: 19 additions & 0 deletions SINGE_config.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
1 ID 541
atuldeshpande marked this conversation as resolved.
1 lambda 0.01
1 dT 10
1 num_lags 5
1 kernel_width 2
1 prob_zero_removal 0
1 prob_remove_samples 0.2
1 family 'gaussian'
1 date '01/31/2019'

2 ID 542
2 lambda 0.01
2 dT 5
2 num_lags 9
2 kernel_width 4
2 prob_zero_removal 0.2
2 prob_remove_samples 0.1
2 family 'gaussian'
2 date '31-Jan-2019'
19 changes: 19 additions & 0 deletions SINGE_params.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
1 ID 541
Member

Should we make this file tab-separated as well? I see multiple spaces instead of tabs.

We will also document this file in the readme.

1 lambda 0.01
1 dT 10
1 num_lags 5
1 kernel_width 2
1 prob_zero_removal 0
1 prob_remove_samples 0.2
1 family 'gaussian'
1 date '01/31/2019'

2 ID 542
2 lambda 0.01
2 dT 5
2 num_lags 9
2 kernel_width 4
2 prob_zero_removal 0.2
2 prob_remove_samples 0.1
2 family 'gaussian'
2 date '31-Jan-2019'
42 changes: 42 additions & 0 deletions code/SINGE.m
@@ -0,0 +1,42 @@
function [ranked_edges, gene_influence] = SCINGE(gene_list,Data,outdir,num_replicates,param_list)
Member

Let's remove the old SCINGE.m if it is no longer needed. How much of this file changed? Just the name?

% [ranked_edges, gene_influence] = SCINGE(gene_list,Data,outdir,num_replicates,param_list)
% Standalone SCINGE implementation.
Member

Rename SCINGE here and below.

% Inputs:
% gene_list = N x 1 cell array with list of relevant genes in the data set
% Data = string representing the path of mat file containing the expression
% data corresponding to above gene_list in the form of cell array X.
% outdir = directory path to store individual GLG test results before Borda
% aggregation
% num_replicates = number of subsampled replicates (global SCINGE parameter)
% param_list = list of hyperparameter combinations for individual GLG tests
% Outputs:
% ranked_edges = ranked list of gene interactions with corresponding SCINGE scores
% gene_influence = ranked lists of regulators (genes) with corresponding SCINGE influence
SINGE_version = '0.1.0';
display(SINGE_version);
for rep = 1:num_replicates
    for ii = 1:length(param_list)
        GLG_Instance(Data,'lambda',param_list{ii}.lambda,'dT',param_list{ii}.dT,'num_lags',param_list{ii}.num_lags,'kernel_width',param_list{ii}.kernel_width,'prob_zero_removal',param_list{ii}.prob_zero_removal,'replicate',rep,'ID',param_list{ii}.ID,'outdir',outdir,'family',param_list{ii}.family,'prob_remove_samples',param_list{ii}.prob_remove_samples,'date',param_list{ii}.date);
    end
end
Str = Data;
Str(Str=='.') = 'p';

lind = max(max(strfind(Str,filesep)),0);
mind = length(Str);
if isempty(mind)||(mind<lind)
    mind = length(Str);
end
Str = Str(lind+1:mind);
Agg = Modified_Borda_Aggregation(Str,outdir);
load(gene_list);
ranked_edges = adjmatrix2edgelist(Agg,gene_list);
[influence,ind] = sort(sum(Agg,2),'descend');
gene_influence = [cell2table(gene_list(ind)) array2table(influence)];
gene_influence.Properties.VariableNames{1} = 'Gene_Name';
ranked_edgesw = ranked_edges;
ranked_edgesw.SCINGE_Score = floor(ranked_edgesw.SCINGE_Score*10^5)/10^5;
gene_influencew = gene_influence;
gene_influencew.influence = floor(gene_influencew.influence*10^5)/10^5;
writetable(ranked_edgesw,fullfile(outdir,'SCINGE_Ranked_Edge_List.txt'),'WriteVariableNames',true,'WriteRowNames',false,'Delimiter','\t');
writetable(gene_influencew,fullfile(outdir,'SCINGE_Gene_Influence.txt'),'WriteVariableNames',true,'WriteRowNames',false,'Delimiter','\t');
Binary file modified data1/tf.mat
Binary file not shown.