Commit

Update vignette equations.
Gene233 committed Mar 28, 2024
1 parent ff2cfe9 commit fcba431
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions vignettes/smartid_Demo.Rmd
@@ -115,19 +115,20 @@ The basic version of TF, IDF and IAE can be termed as:

$\mathbf{TF_{i,j}}=\frac{N_{i,j}}{\sum_j{N_{i,j}}},$
$\mathbf{IDF_i} = log(1+\frac{n}{n_i+1}),$
-$\mathbf{IAE_i} = log(1+\frac{n}{\hat N_{i,j}+1})$
+$\mathbf{IAE_i} = log(1+\frac{n}{\sum_j^n\hat N_{i,j}+1})$

Where $N_{i,j}$ is the counts of feature $i$ in cell $j$; $\hat N_{i,j}$ is $max(0,N_{i,j}-threshold)$;
$n$ is the total number of documents(cells); $n_i$ is $\sum_{j = 1}^{n} sign(N_{i,j} > threshold)$.

Here for labeled data, we can choose logTF * IDF_prob * IAE_prob for marker identification:
-$\mathbf{score}=logTF*IDF_{prob}*IAE_{prob}$
+$$\mathbf{score}=logTF*IDF_{prob}*IAE_{prob}$$

The probability version of IDF can be termed as:
$\mathbf{IDF_{i,j}} = log(1+\frac{\frac{n_{i,j\in D}}{n_{j\in D}}}{max(\frac{n_{i,j\in \hat D}}{n_{j\in \hat D}})+ e^{-8}}\frac{n_{i,j\in D}}{n_{j\in D}})$

And the probability version of IAE can be termed as:
$\mathbf{IAE_{i,j}} = log(1+\frac{mean(N_{i,j\in D})}{max(mean(N_{i,j\in \hat D}))+ e^{-8}}*mean(N_{i,j\in D}))$

Where $D$ is the category of cell $j$; $\hat D$ is the category other than $D$.

TF here stands for gene frequency, which is similar to CPM, while IDF represents the inverse cell/sample frequency for scRNA-seq data, and IAE is the inverse average expression of each gene across all cells or cells in each labeled group.
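
For readers checking the updated IAE equation, the basic TF, IDF and IAE definitions in this hunk can be sketched in base R on a toy count matrix. This is a minimal illustration, not the `smartid` implementation: the matrix, the `threshold` value and all variable names are made up for the example. The sketch also assumes TF divides each count by the total counts of its cell, following the "similar to CPM" description; the summation index in the TF formula could be read differently.

```r
# Toy count matrix (hypothetical data): rows are features (genes), columns are cells.
counts <- matrix(
  c(4, 0, 2,
    0, 0, 1,
    6, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)
threshold <- 0      # hypothetical expression threshold
n <- ncol(counts)   # total number of documents (cells)

# TF: assumed here to be each count divided by its cell's total count (CPM-like).
tf <- sweep(counts, 2, colSums(counts), "/")

# n_i: number of cells in which feature i exceeds the threshold.
n_i <- rowSums(counts > threshold)

# IDF_i = log(1 + n / (n_i + 1))
idf <- log(1 + n / (n_i + 1))

# N-hat_{i,j} = max(0, N_{i,j} - threshold)
n_hat <- pmax(counts - threshold, 0)

# IAE_i = log(1 + n / (sum_j N-hat_{i,j} + 1)), summing over all cells as in the added line.
iae <- log(1 + n / (rowSums(n_hat) + 1))
```

Summing `n_hat` over cells gives one IAE value per feature, which matches the subscript of $\mathbf{IAE_i}$; the removed line left the cell index $j$ free.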
@@ -255,7 +256,7 @@ upset(fromList(c(marker_ls, marker_ls_new)), nsets = 6)
While for the unlabeled data, `smartid` also provides the score methods with no need for label information.

Here we choose logTF * IDF_sd * IAE_sd for gene-set scoring as a use case:
-$\mathbf{score}=logTF*IDF_{sd}*IAE_{sd}$
+$$\mathbf{score}=logTF*IDF_{sd}*IAE_{sd}$$

Where $\mathbf{IDF_i} = log(1+SD(TF_{i})*\frac{n}{n_i+1})$,
$\mathbf{IAE_i} = log(1+SD(TF_{i})*\frac{n}{\sum_{j=1}^{n}N_{i,j}+1})$
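
The SD-weighted score in this hunk can be sketched the same way. Again, this is an illustration under stated assumptions rather than the package's code: the data and names are hypothetical, TF is assumed to be normalised per cell, `SD` is taken to be the sample standard deviation of a feature's TF across cells, and `logTF` (not defined in the lines shown) is assumed to be `log(1 + TF)`.

```r
# Toy count matrix (hypothetical data): rows are features (genes), columns are cells.
counts <- matrix(
  c(4, 0, 2,
    0, 0, 1,
    6, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)
threshold <- 0      # hypothetical expression threshold
n <- ncol(counts)   # total number of cells

# TF: assumed to be each count divided by its cell's total count.
tf <- sweep(counts, 2, colSums(counts), "/")

# SD(TF_i): assumed to be the sample standard deviation of feature i's TF across cells.
sd_tf <- apply(tf, 1, sd)

# n_i: number of cells in which feature i exceeds the threshold.
n_i <- rowSums(counts > threshold)

# IDF_i = log(1 + SD(TF_i) * n / (n_i + 1))
idf_sd <- log(1 + sd_tf * n / (n_i + 1))

# IAE_i = log(1 + SD(TF_i) * n / (sum_j N_{i,j} + 1))
iae_sd <- log(1 + sd_tf * n / (rowSums(counts) + 1))

# logTF is an assumption here: log(1 + TF).
log_tf <- log1p(tf)

# score = logTF * IDF_sd * IAE_sd; the per-feature vectors recycle down the rows,
# so row i of the matrix is scaled by idf_sd[i] and iae_sd[i].
score <- log_tf * idf_sd * iae_sd
```

The result is a feature-by-cell matrix, so no label information is needed to compute it.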