This repository has been archived by the owner on Aug 25, 2019. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
2017.12.12
DMTR edited this page Dec 12, 2017
·
6 revisions
Good morning! Today I will try to finish the script for Piano.
Progress (written as it goes, so bear with me):
- Started writing the script for Piano (using my limited R skills - though I feel like I'm getting better, which is good).
- There are some complications though:
- I found out that there are duplicates regarding the gene symbols after the processed DESeq2 files had been translated from Ensembl gene ID to gene symbols.
- This is problematic because then it is not simple to load them as a dataframe in R, because it cannot have more than one row with the same row names.
- I have checked their (most importantly) p-values and they seem to be the same, and since Piano requires only the p-value for each gene, I discarded all the duplicates and just retain one copy of them.
- Another error is that there are rows with 'NA' values, as these cannot be proccesed anyway, they were also discarded.
- Additionally, for some unknown reason, I keep getting the error message which I suspect because some gene symbols have extra characters that are not in the gene set files. I will try to remove these: from xx6jas.12 to xx6jas.
gsaRes <- runGSA(HS_stat, gsc=go, verbose=TRUE)
Running gene set analysis:
Checking arguments...done!
Final gene/gene-set association: 0 genes and 0 gene sets
Details:
Calculating gene set statistics from 0 out of 16076 gene-level statistics
Using all 16076 gene-level statistics for significance estimation
Removed 15578 genes from GSC due to lack of matching gene statistics
Removed 4436 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 0 gene sets
Calculating gene set statistics...
Show Traceback
**Error in gsc[[iGeneSet]] : subscript out of bounds**
- Ok, now it finally works. Lesson of the day: do not blindly follow the instruction manual. They are already the gene symbols, it was superfluous to insert the string "gene" preceding the actual gene symbol. Another thing that I found was unnecessary was removing the extra bits in the gene name.
names(HS_stat) <- paste("gene", HS$gene_symbol, sep="") # Nope!
names(HS_stat) <- paste(HS$gene_symbol) # Ja!
- Now that the script has been tested to work on one DESeq2 output, I need to modify it to make it more 'modular', i.e. just requires the input of the processed DESeq2 file path instead of hardcoding everything. I'm not sure if I can do this with my limited knowledge in R, but since time is of the essence (the group is meeting tomorrow to discuss all the results!), I will hardcode it anyway. If I have the time, then I will try to make it work (maybe by making it into a function?).
- In the meantime, I will convert this to an Rscript and send it to Uppmax for faster processing before taking my lunch break.
- I got the results, but they lack the directionality of the pathway changes. After asking Arif, he asked me to check whether I have put the log2FoldChange and some other parameters in the function - which I had not. So here we go again.
gsaRes <- runGSA(HS_stat, fc, gsc=kegg, geneSetStat="reporter",
signifMethod="nullDist", nPerm=1000, verbose=TRUE)
- Got the results. Now going to fix my script for Limma. It was weird to get a adjusted p-val in the magnitude of 10-68 (otherwise would be happy), but in this case I realised that I forgot to take into account the biological replicates within the cohort.
This makes me hungry. (C)2017-2018 • DMTR13
- Description
- Scripts
- Sample Results
-
Poster^
^Requires Canvas access