From edbc9f3a44b7beaae93c830a73ee73d1e01fe3a2 Mon Sep 17 00:00:00 2001
From: Gene233
Date: Fri, 29 Mar 2024 18:08:19 +1100
Subject: [PATCH] Update vignette.

---
 vignettes/smartid_Demo.Rmd | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/vignettes/smartid_Demo.Rmd b/vignettes/smartid_Demo.Rmd
index c499504..a3e96ca 100644
--- a/vignettes/smartid_Demo.Rmd
+++ b/vignettes/smartid_Demo.Rmd
@@ -114,20 +114,20 @@ idf_iae_methods()
 The basic version of TF, IDF and IAE can be termed as:
 $\mathbf{TF_{i,j}}=\frac{N_{i,j}}{\sum_j{N_{i,j}}},$
-$\mathbf{IDF_i} = log(1+\frac{n}{n_i+1}),$
-$\mathbf{IAE_i} = log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$
+$\mathbf{IDF_i} = \log(1+\frac{n}{n_i+1}),$
+$\mathbf{IAE_i} = \log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$
 Where $N_{i,j}$ is the counts of feature $i$ in cell $j$; $\hat N_{i,j}$ is $max(0,N_{i,j}-threshold)$; $n$ is the total number of documents(cells); $n_i$ is $\sum_{j = 1}^{n} sign(N_{i,j} > threshold)$.
 Here for labeled data, we can choose logTF * IDF_prob * IAE_prob for marker identification:
-$$\mathbf{score}=logTF*IDF_{prob}*IAE_{prob}$$
+$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{prob}*\mathbf{IAE}_{prob}$$
 The probability version of IDF can be termed as:
-$$\mathbf{IDF_{i,j}} = log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$
+$$\mathbf{IDF_{i,j}} = \log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$
 And the probability version of IAE can be termed as:
-$$\mathbf{IAE_{i,j}} = log(1+\frac{mean(\hat N_{i,j\in D})}{max(mean(\hat N_{i,j\in \hat D}))+ e^{-8}}*mean(\hat N_{i,j\in D}))$$
+$$\mathbf{IAE_{i,j}} = \log(1+\frac{mean(\hat N_{i,j\in D})}{max(mean(\hat N_{i,j\in \hat D}))+ e^{-8}}*mean(\hat N_{i,j\in D}))$$
 Where $D$ is the category of cell $j$; $\hat D$ is the category other than $D$.
@@ -260,11 +260,11 @@ upset(fromList(c(marker_ls, marker_ls_new)), nsets = 6)
 While for the unlabeled data, `smartid` also provides the score methods with no need for label information. Here we choose logTF * IDF_sd * IAE_sd for for gene-set scoring as a use case:
-$$\mathbf{score}=logTF*IDF_{sd}*IAE_{sd}$$
+$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{sd}*\mathbf{IAE}_{sd}$$
 Where IDF and IAE can be termed as:
-$$\mathbf{IDF_i} = log(1+SD(TF_{i})*\frac{n}{n_i+1})$$
-$$\mathbf{IAE_i} = log(1+SD(TF_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$
+$$\mathbf{IDF_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{n_i+1})$$
+$$\mathbf{IAE_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$
 ## Score Samples
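
Review note: the patch only changes how the formulas are typeset, so a quick numeric check of the basic IDF and IAE definitions from the first hunk may help when reading the rendered vignette. The following is a minimal Python sketch, not code from `smartid`. It assumes a natural logarithm, treats $sign(N_{i,j} > threshold)$ as a 0/1 indicator, and uses a hypothetical helper name `basic_idf_iae` with a plain list-of-lists count matrix (rows are features, columns are cells).

```python
import math


def basic_idf_iae(counts, threshold=0.0):
    """Return one (IDF_i, IAE_i) pair per feature row.

    counts[i][j] plays the role of N_{i,j}: the count of feature i in cell j.
    This helper is hypothetical and only mirrors the basic formulas
    quoted in the patch; it is not the package's implementation.
    """
    results = []
    for row in counts:
        n = len(row)  # total number of cells (documents)
        # n_i: number of cells where the feature exceeds the threshold
        n_i = sum(1 for x in row if x > threshold)
        # sum over cells of N-hat_{i,j} = max(0, N_{i,j} - threshold)
        n_hat_sum = sum(max(0.0, x - threshold) for x in row)
        idf = math.log(1 + n / (n_i + 1))
        iae = math.log(1 + n / (n_hat_sum + 1))
        results.append((idf, iae))
    return results


counts = [
    [0, 0, 0, 5],  # feature expressed in a single cell
    [2, 3, 1, 4],  # feature expressed in every cell
]
scores = basic_idf_iae(counts)
for i, (idf, iae) in enumerate(scores):
    print(f"feature {i}: IDF={idf:.4f}, IAE={iae:.4f}")
```

Under these assumptions the feature seen in only one cell gets the larger IDF and IAE, which matches the intent of down-weighting features that are present, or highly expressed, across all cells.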