---
title: Peaklist Annotator and Browser
output:
  github_document:
    toc: true
    toc_depth: 3
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE, message=FALSE}
library(knitr)
knitr::opts_chunk$set(
collapse = TRUE
)
```
The purpose of this package is to help identification in metabolomics by:
1. annotating an XCMS (or other) peaklist with the compounds from the compound
databases
2. displaying an interactive browseable peaktable with the annotated compounds
in nested tables
This package is in an early alpha stage. It needs to integrate better with
[CompoundDb](https://github.com/EuracBiomedicalResearch) and require less
"manual" data wrangling.
It will also improve once the database files can be made available directly.
<br>
<br>
# Get compound tables from databases
We use the new package [CompoundDb](https://github.com/EuracBiomedicalResearch).
```{r, cache=FALSE, warning=FALSE, message=FALSE}
library(dplyr)
library(PeakABro)
library(CompoundDb)
library(shinyBS) # for some reason explicit loading is required.
```
```{r, cache=TRUE, warning=FALSE}
# Read the HMDB SDF file.
# You need to download this from the HMDB website.
hmdb_tab <- compound_tbl_sdf("inst/extdata/HMDB.sdf")
hmdb_meta <- make_metadata(source = "HMDB",
url = "http://www.hmdb.ca",
source_version = "4.0",
source_date = "2017-09-10",
organism = "Hsapiens"
)
hmdb_db <- createCompDb(hmdb_tab, metadata = hmdb_meta, path = tempdir())
```
Now we can create our own package with this data and install it.
You only need to do this once.
```{r, cache=TRUE, warning=FALSE, message=FALSE}
createCompDbPackage(
hmdb_db,
version = "0.0.1",
author = "Jan Stanstrup",
path = tempdir(),
maintainer = "Jan Stanstrup <stanstrup@gmail.com>"
)
library(devtools)
install_local(paste0(tempdir(),"/CompDb.Hsapiens.HMDB.4.0"))
```
Now we can load this package and get the data.
```{r, cache=TRUE, warning=FALSE}
library(CompDb.Hsapiens.HMDB.4.0)
HMDB_tbl <- compounds(CompDb.Hsapiens.HMDB.4.0, return.type = "tibble")
HMDB_tbl %>%
slice(1:3) %>%
kable
```
This takes rather long because the databases are quite large,
so I try to supply pre-parsed data.
So far only LipidBlast is available.
It will be moved to a separate package shortly
(there also seems to be a bug in LipidBlast at the moment that causes wrong info).
```{r, cache=TRUE}
lipidblast_tbl <- readRDS(system.file("extdata", "lipidblast_tbl.rds", package="PeakABro"))
```
```{r, cache=TRUE}
lipidblast_tbl %>% slice(1:3) %>% kable
```
<br>
<br>
# Expand adducts
Let's use only HMDB for now.
```{r, cache=FALSE, warning=FALSE}
cmp_tbl_exp_pos <- expand_adducts(HMDB_tbl, mode = "pos", adducts = c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M+K]+", "[M+H-H2O]+"))
cmp_tbl_exp_pos %>% slice(1:3) %>% kable
```
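For intuition, the m/z of each adduct can be derived from the neutral monoisotopic mass with a small lookup table. The sketch below uses base R only. The helper `adduct_mz()` and the table `adduct_rules` are hypothetical names for illustration and are not part of PeakABro; `expand_adducts()` may compute or round values differently.

```r
# Illustrative only: not the implementation used by expand_adducts().
# Each adduct is described by how many copies of M it contains and the
# mass shift (in Da) of the added/removed species, electron mass included.
adduct_rules <- data.frame(
  adduct = c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M+K]+", "[M+H-H2O]+"),
  nmol   = c(1, 1, 2, 1, 1),
  shift  = c(1.007276, 22.989221, 1.007276, 38.963158, -17.003288),
  stringsAsFactors = FALSE
)

# Hypothetical helper: m/z of singly charged adducts for one neutral mass.
adduct_mz <- function(mass, rules = adduct_rules) {
  data.frame(
    adduct = rules$adduct,
    mz     = rules$nmol * mass + rules$shift,
    stringsAsFactors = FALSE
  )
}

# Glucose, monoisotopic mass 180.06339 Da
adduct_mz(180.06339)
```

All adducts here are singly charged, so the m/z equals the ion mass; multiply charged adducts would additionally need a division by the charge.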
<br>
<br>
# Annotate peaklist
We ultimately want to use the compound table in an interactive browser, so let's remove some redundant info and keep only one mode.
```{r, warning=FALSE}
cmp_tbl_exp_pos <- cmp_tbl_exp_pos %>%
filter(mode=="pos") %>% # only needed if you also generated negative mode above
select(-charge, -nmol, -mode)
```
Next we load a sample peaklist. I have removed the data columns in this sample.
```{r, warning=FALSE}
library(readr)
peaklist <- read_tsv(system.file("extdata", "peaklist_pos.tsv", package="PeakABro"))
peaklist %>% slice(1:3) %>% kable
```
Now we can annotate the table.
The idea here is that each row will have a nested table with annotations from the compound table.
```{r, warning=FALSE}
library(purrr)
peaklist_anno <- peaklist %>% mutate(anno = map(mz, cmp_mz_filter, cmp_tbl_exp_pos, ppm=30))
```
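The matching step boils down to a relative mass error check. Below is a minimal base R sketch, assuming the compound table has an `mz` column. `ppm_filter()` is a hypothetical stand-in; the real `cmp_mz_filter()` may name, order, or compute its columns differently.

```r
# Illustrative only: a stand-in for the kind of filter cmp_mz_filter() applies.
# Keeps the rows of `cmp_tbl` whose m/z lies within `ppm` parts per million
# of the observed peak m/z and reports the signed error.
ppm_filter <- function(peak_mz, cmp_tbl, ppm = 30) {
  err  <- (peak_mz - cmp_tbl$mz) / cmp_tbl$mz * 1e6
  keep <- abs(err) <= ppm
  hits <- cmp_tbl[keep, , drop = FALSE]
  hits$ppm <- err[keep]
  hits
}

# Toy compound table (made-up names and values)
toy_tbl <- data.frame(
  name = c("A", "B", "C"),
  mz   = c(181.0707, 181.0800, 203.0526),
  stringsAsFactors = FALSE
)

# Only compound "A" lies within 30 ppm of the observed m/z
ppm_filter(181.0712, toy_tbl, ppm = 30)
```

Because the tolerance is relative, the absolute m/z window widens with increasing m/z: 30 ppm is about 0.005 at m/z 181 but about 0.03 at m/z 1000.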
Now the peaktable looks like this:
```{r}
peaklist_anno %>% select(-mzmin, -mzmax, -rtmin, -rtmax, -npeaks)
```
And one of the nested tables looks like this:
```{r}
peaklist_anno$anno[[1]] %>% slice(1:3) %>% kable
```
<br>
<br>
# Interactive Browser
## Prepare the table for interactive browser
Before we are ready to explore the peaklist interactively there are a few things we need to do and some optional things to fix:
```{r}
library(tidyr)
peaklist_anno_nest <- peaklist_anno %>%
  mutate(rt = round(rt/60, 2), mz = round(mz, 4)) %>% # peaklist rt in minutes and round
  select(mz, rt, isotopes, adduct, anno, pcgroup) %>% # get only relevant info
  mutate(anno = map(anno, ~ mutate(..1, mz = round(mz, 4), mass = round(mass, 4), ppm = round(ppm, 0)))) %>% # round values in annotation tables
  nest(-pcgroup, .key = "features") %>% # this is required! We nest the tables by the pcgroup
  mutate(avg_rt = map_dbl(features, ~ round(mean(..1$rt), 2))) %>% # extract average retention time for each pcgroup
  select(pcgroup, avg_rt, features) # putting the nested table last. ATM needed for the browser to work
```
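To see what the nesting step produces without the tidyverse, the same grouping can be mimicked in base R on a toy table. The values below are made up, and a named list stands in for the nested `features` column.

```r
# Toy peaklist with made-up values; rt already in minutes.
toy_peaks <- data.frame(
  mz      = c(181.0707, 203.0526, 365.1054),
  rt      = c(1.20, 1.22, 5.40),
  pcgroup = c(1, 1, 2)
)

# One data frame of features per pcgroup (the "nested" tables).
features <- split(toy_peaks[, c("mz", "rt")], toy_peaks$pcgroup)

# Average retention time per pcgroup, rounded to two decimals.
avg_rt <- vapply(features, function(f) round(mean(f$rt), 2), numeric(1))
avg_rt
```

Each element of `features` corresponds to one row of the nested table: the browser expands a pcgroup row to show exactly these features.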
We are almost there, but first we want to add some magic to the table for the interaction (this adds the + button and the View button).
```{r}
peaklist_anno_nest_ready <- peaklist_browser_prep(peaklist_anno_nest, collapse_col = "features", modal_col = "anno")
```
<br>
<br>
## Interactively browse peaklist
Now we can start the browser!
You need to use a proper browser such as Chrome, not the RStudio browser.
```{r, eval=FALSE}
peaklist_browser(peaklist_anno_nest_ready, collapse_col = "features", modal_col = "anno")
```
![Peaklist Browser](inst/peaklist_browser.gif)
<br>
<br>
# Sources and licenses
* **LipidBlast**: Downloaded from MassBank of North America (MoNA) http://mona.fiehnlab.ucdavis.edu/downloads under the CC BY 4 license.
* **LipidMaps**: Downloaded from http://www.lipidmaps.org/resources/downloads
No data is included in this package due to licensing issues.
<br>
<br>
# Journal References
* Kind T, Liu K-H, Lee DY, DeFelice B, Meissen JK, Fiehn O. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat. Methods. 2013 Aug;10(8):755–8. http://dx.doi.org/10.1038/nmeth.2551
* Dennis EA, Brown HA, Deems RA, Glass CK, Merrill AH, Murphy RC, et al. The LIPID MAPS Approach to Lipidomics. Funct. Lipidomics. CRC Press; 2005. p. 1–15. http://dx.doi.org/10.1201/9781420027655.ch1