Commit

Update vignette.
Gene233 committed Mar 29, 2024
1 parent 14e5f63 commit edbc9f3
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions vignettes/smartid_Demo.Rmd
@@ -114,20 +114,20 @@ idf_iae_methods()
The basic versions of TF, IDF and IAE can be written as:

$\mathbf{TF_{i,j}}=\frac{N_{i,j}}{\sum_j{N_{i,j}}},$
-$\mathbf{IDF_i} = log(1+\frac{n}{n_i+1}),$
-$\mathbf{IAE_i} = log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$
+$\mathbf{IDF_i} = \log(1+\frac{n}{n_i+1}),$
+$\mathbf{IAE_i} = \log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$

Where $N_{i,j}$ is the count of feature $i$ in cell $j$; $\hat N_{i,j}$ is $max(0,N_{i,j}-threshold)$;
$n$ is the total number of documents (cells); $n_i$ is $\sum_{j = 1}^{n} sign(N_{i,j} > threshold)$.
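
As a rough illustration of the basic IDF and IAE definitions above, here is a minimal base-R sketch. It does not call `smartid`; the toy matrix `counts`, the `threshold` value and every variable name are assumptions made up for this example, and TF is left out on purpose.

```r
# Toy count matrix: rows are features, columns are cells (hypothetical data).
counts <- matrix(
  c(0, 2, 5,
    1, 0, 0,
    3, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:3))
)
threshold <- 0       # assumed expression threshold
n <- ncol(counts)    # total number of documents (cells)

# n_i: number of cells in which feature i exceeds the threshold
n_i <- rowSums(counts > threshold)

# N-hat: counts above the threshold, floored at zero
n_hat <- pmax(counts - threshold, 0)

# Basic IDF and IAE, one value per feature
idf <- log(1 + n / (n_i + 1))
iae <- log(1 + n / (rowSums(n_hat) + 1))
```

In this toy case `g2`, which is detected in a single cell at low abundance, gets the largest IDF and IAE, while the uniformly expressed `g3` gets the smallest.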

Here for labeled data, we can choose logTF * IDF_prob * IAE_prob for marker identification:
-$$\mathbf{score}=logTF*IDF_{prob}*IAE_{prob}$$
+$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{prob}*\mathbf{IAE}_{prob}$$

The probability version of IDF can be written as:
-$$\mathbf{IDF_{i,j}} = log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$
+$$\mathbf{IDF_{i,j}} = \log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$

And the probability version of IAE can be written as:
-$$\mathbf{IAE_{i,j}} = log(1+\frac{mean(\hat N_{i,j\in D})}{max(mean(\hat N_{i,j\in \hat D}))+ e^{-8}}*mean(\hat N_{i,j\in D}))$$
+$$\mathbf{IAE_{i,j}} = \log(1+\frac{mean(\hat N_{i,j\in D})}{max(mean(\hat N_{i,j\in \hat D}))+ e^{-8}}*mean(\hat N_{i,j\in D}))$$

Where $D$ is the category of cell $j$; $\hat D$ is any category other than $D$.
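
The probability versions can be sketched in base R along the following lines. This is only an illustration of the two formulas above, not the package's implementation: the matrix `counts`, the `label` vector, the helper `prob_score()` and the literal reading of $e^{-8}$ as `exp(-8)` are all assumptions of this example.

```r
# Toy data: rows are features, columns are cells (hypothetical values).
counts <- matrix(
  c(4, 5, 0, 1,
    0, 0, 3, 2,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:4))
)
label <- c("A", "A", "B", "B")   # hypothetical category of each cell
threshold <- 0                   # assumed expression threshold
eps <- exp(-8)                   # e^{-8} taken literally; an assumption

n_hat <- pmax(counts - threshold, 0)
groups <- unique(label)

# Per-category statistics, arranged as features x categories:
# fraction of cells above the threshold, and mean thresholded count
frac <- sapply(groups, function(g) {
  rowMeans(counts[, label == g, drop = FALSE] > threshold)
})
avg <- sapply(groups, function(g) {
  rowMeans(n_hat[, label == g, drop = FALSE])
})

# Hypothetical helper: for each category D, divide the in-category statistic
# by the maximum over all other categories, then multiply by it again.
prob_score <- function(stat) {
  sapply(groups, function(g) {
    other_max <- apply(stat[, groups != g, drop = FALSE], 1, max)
    log(1 + stat[, g] / (other_max + eps) * stat[, g])
  })
}

idf_prob <- prob_score(frac)   # features x categories
iae_prob <- prob_score(avg)
```

Under these toy values, `g1` scores higher in category `A` than in `B`, and `g2`, which is absent from `A`, scores exactly zero there.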

@@ -260,11 +260,11 @@ upset(fromList(c(marker_ls, marker_ls_new)), nsets = 6)
For unlabeled data, `smartid` also provides scoring methods that need no label information.

Here we choose logTF * IDF_sd * IAE_sd for gene-set scoring as a use case:
-$$\mathbf{score}=logTF*IDF_{sd}*IAE_{sd}$$
+$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{sd}*\mathbf{IAE}_{sd}$$

Where IDF and IAE can be written as:
-$$\mathbf{IDF_i} = log(1+SD(TF_{i})*\frac{n}{n_i+1})$$
-$$\mathbf{IAE_i} = log(1+SD(TF_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$
+$$\mathbf{IDF_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{n_i+1})$$
+$$\mathbf{IAE_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$
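
A minimal base-R sketch of the SD-weighted IDF and IAE above might look as follows. The toy matrix `counts` and all variable names are hypothetical, and computing TF by dividing each cell's counts by that cell's total is an assumption of this example rather than something these hunks confirm.

```r
# Toy count matrix: rows are features, columns are cells (hypothetical data).
counts <- matrix(
  c(0, 1, 4,
    2, 2, 2,
    4, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:3))
)
threshold <- 0       # assumed expression threshold
n <- ncol(counts)    # total number of cells

# Assumption: TF is each count divided by its cell's total count
tf <- sweep(counts, 2, colSums(counts), "/")

# SD(TF_i): spread of feature i's TF across cells
sd_tf <- apply(tf, 1, sd)

n_i <- rowSums(counts > threshold)      # cells above threshold, per feature
n_hat <- pmax(counts - threshold, 0)    # thresholded counts

idf_sd <- log(1 + sd_tf * n / (n_i + 1))
iae_sd <- log(1 + sd_tf * n / (rowSums(n_hat) + 1))
```

Because `g2` has the same TF in every cell, its SD is zero and both of its scores collapse to `log(1) = 0`, which shows how the SD term down-weights features that do not vary across cells.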

## Score Samples
