Commit

Update vignette.
Gene233 committed Mar 29, 2024
1 parent 27f0ddc commit 0b611b6
Showing 1 changed file with 7 additions and 7 deletions.
vignettes/smartid_Demo.Rmd: 14 changes (7 additions, 7 deletions)
@@ -104,7 +104,7 @@ data_sim

## Score Samples

-The first step is to score all samples/cells by using specified approach. The score can be composed of 3 terms: TF (term/feature frequency), IDF (inverse document/cell frequency) and IAE (inverse average expression of features). Each term has a couple of available choices with different formats to suit labeled or un-labeled data. Users can use function `idf_iae_methods()` to see available methods for IDF/IAE term. More details of each term can be seen in help page of each function, e.g. `?smartid:::idf`.
+The first step is to score all samples/cells by using specified approach. The score can be composed of 3 terms: TF (term/feature frequency), IDF (inverse document/cell frequency) and IAE (inverse average expression of features). Each term has a couple of available choices with different formats to suit labeled or un-labeled data. Users can use function `idf_iae_methods()` to see available methods for IDF/IAE term. More details of each term can be seen in help page of each function, e.g. `?idf`.

```{r}
## show available methods
@@ -117,17 +117,17 @@ $\mathbf{TF_{i,j}}=\frac{N_{i,j}}{\sum_j{N_{i,j}}},$
$\mathbf{IDF_i} = \log(1+\frac{n}{n_i+1}),$
$\mathbf{IAE_i} = \log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$

-Where $N_{i,j}$ is the counts of feature $i$ in cell $j$; $\hat N_{i,j}$ is $max(0,N_{i,j}-threshold)$;
-$n$ is the total number of documents(cells); $n_i$ is $\sum_{j = 1}^{n} sign(N_{i,j} > threshold)$.
+Where $N_{i,j}$ is the counts of feature $i$ in cell $j$; $\hat N_{i,j}$ is $\max(0,N_{i,j}-threshold)$;
+$n$ is the total number of documents(cells); $n_i$ is $\sum_{j = 1}^{n} \mathrm{sign}(N_{i,j} > \mathrm{threshold})$.

Here for labeled data, we can choose logTF * IDF_prob * IAE_prob for marker identification:
$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{prob}*\mathbf{IAE}_{prob}$$

The probability version of IDF can be termed as:
-$$\mathbf{IDF_{i,j}} = \log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$
+$$\mathbf{IDF_{i,j}} = \log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{\max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$$

And the probability version of IAE can be termed as:
-$$\mathbf{IAE_{i,j}} = \log(1+\frac{mean(\hat N_{i,j\in D})}{max(mean(\hat N_{i,j\in \hat D}))+ e^{-8}}*mean(\hat N_{i,j\in D}))$$
+$$\mathbf{IAE_{i,j}} = \log(1+\frac{\mathrm{mean}(\hat N_{i,j\in D})}{\max(\mathrm{mean}(\hat N_{i,j\in \hat D}))+ e^{-8}}*\mathrm{mean}(\hat N_{i,j\in D}))$$

Where $D$ is the category of cell $j$; $\hat D$ is the category other than $D$.

@@ -263,8 +263,8 @@ Here we choose logTF * IDF_sd * IAE_sd for for gene-set scoring as a use case:
$$\mathbf{score}=\log \mathbf{TF}*\mathbf{IDF}_{sd}*\mathbf{IAE}_{sd}$$

Where IDF and IAE can be termed as:
-$$\mathbf{IDF_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{n_i+1})$$
-$$\mathbf{IAE_i} = \log(1+SD(\mathbf{TF}_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$
+$$\mathbf{IDF_i} = \log(1+\mathrm{SD}(\mathbf{TF}_{i})*\frac{n}{n_i+1})$$
+$$\mathbf{IAE_i} = \log(1+\mathrm{SD}(\mathbf{TF}_{i})*\frac{n}{\sum_{j=1}^{n}\hat N_{i,j}+1})$$

## Score Samples

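The second hunk's context lines define TF, IDF and IAE for unlabeled data. As an illustration only, here is a minimal base-R sketch of those three formulas on a made-up count matrix. It is not smartid's implementation. Assumptions: `threshold = 0`; the TF denominator, written $\sum_j N_{i,j}$ in the vignette, is read here as the total counts of cell $j$ (the usual term-frequency definition); and the plain product `tf * idf * iae` is just one illustrative way to combine the terms.

```r
## Made-up counts: rows are features i, columns are cells j
N <- matrix(c(5, 0, 3,
              0, 2, 0,
              1, 1, 4,
              0, 0, 6),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
threshold <- 0  # assumed value; the vignette leaves it as a parameter

## TF: counts of feature i in cell j divided by the total counts of cell j (assumed reading)
tf <- sweep(N, 2, colSums(N), "/")

## IDF_i = log(1 + n / (n_i + 1)), n_i = number of cells with N_ij > threshold
n <- ncol(N)
n_i <- rowSums(N > threshold)
idf <- log(1 + n / (n_i + 1))

## IAE_i = log(1 + n / (sum_j N_hat_ij + 1)), N_hat_ij = max(0, N_ij - threshold)
N_hat <- pmax(N - threshold, 0)
iae <- log(1 + n / (rowSums(N_hat) + 1))

## Illustrative combination: scale row i of TF by idf[i] and iae[i]
## (a length-nrow vector recycles down each column, so it lines up with rows)
score <- tf * idf * iae
round(score, 3)
```

Features detected in fewer cells (gene2 and gene4 in this toy matrix) receive the larger IDF, which is the down-weighting of ubiquitous features the IDF term is meant to provide.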

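For labeled data, the probability versions in the second hunk compare a feature's detection rate (IDF) or mean thresholded expression (IAE) in a cell's own category $D$ against the largest value over the other categories $\hat D$. Below is a base-R sketch of those two formulas under stated assumptions. It illustrates the formulas and is not smartid's code. The names `label` and `prob_term()` are hypothetical. $n_{i,j\in D}$ is taken as the number of cells in $D$ with counts above `threshold` (mirroring $n_i$). The constant $e^{-8}$ is taken literally as `exp(-8)`.

```r
## Made-up counts (features x cells) and one category label per cell
N <- matrix(c(5, 4, 0, 0,
              0, 1, 3, 2,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
label <- c("A", "A", "B", "B")  # hypothetical labels
threshold <- 0                  # assumed value
eps <- exp(-8)                  # e^{-8} as written in the formula

groups <- unique(label)
## Fraction of cells in each category with counts above threshold (features x categories)
frac <- sapply(groups, function(g) rowMeans(N[, label == g, drop = FALSE] > threshold))
## Mean of N_hat = max(0, N - threshold) within each category
N_hat <- pmax(N - threshold, 0)
avg <- sapply(groups, function(g) rowMeans(N_hat[, label == g, drop = FALSE]))

## Hypothetical helper: log(1 + x_D / (max over other categories + eps) * x_D)
prob_term <- function(x) {
  out <- x
  for (g in colnames(x)) {
    other_max <- apply(x[, colnames(x) != g, drop = FALSE], 1, max)
    out[, g] <- log(1 + x[, g] / (other_max + eps) * x[, g])
  }
  out
}

## Expand the per-category values to one column per cell
idf_prob <- prob_term(frac)[, label, drop = FALSE]
iae_prob <- prob_term(avg)[, label, drop = FALSE]
colnames(idf_prob) <- colnames(N)
colnames(iae_prob) <- colnames(N)
```

In this toy data gene1 is detected only in category A, so its IDF term is large (about 8) in A cells and exactly 0 in B cells. Dividing by the maximum over the other categories is what rewards category-specific features.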

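The third hunk's SD versions scale the unlabeled IDF and IAE by the spread of a feature's TF across cells. Here is a base-R sketch of those two formulas. It assumes `threshold = 0` and computes TF as counts divided by each cell's total counts. It also uses R's `sd()` (sample SD, n - 1 denominator) for $\mathrm{SD}(\mathbf{TF}_i)$, which the vignette does not specify. It illustrates the formulas and is not smartid's code.

```r
## Made-up counts with equal library sizes (10 per cell) so TF is easy to read
N <- matrix(c(2, 2, 2,
              4, 8, 0,
              4, 0, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))
threshold <- 0  # assumed value
n <- ncol(N)

tf <- sweep(N, 2, colSums(N), "/")  # TF per cell
tf_sd <- apply(tf, 1, sd)           # SD(TF_i) across cells; sample SD is an assumption

n_i <- rowSums(N > threshold)       # cells with N_ij > threshold
N_hat <- pmax(N - threshold, 0)     # max(0, N_ij - threshold)

## SD-weighted IDF and IAE, one value per feature
idf_sd <- log(1 + tf_sd * n / (n_i + 1))
iae_sd <- log(1 + tf_sd * n / (rowSums(N_hat) + 1))
```

gene1 has the same TF in every cell, so both SD-weighted terms are 0, and any product score built from them is 0 as well. A feature that does not vary across cells therefore contributes nothing.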