
ad_map data: var ('cell_type') added after the training in the merfish data frame, but it's 'NA' #127

Open
KunHHE opened this issue Dec 31, 2024 · 4 comments


@KunHHE

KunHHE commented Dec 31, 2024

Hi, I tried to run Tangram for cell type annotation, but I don't understand how Tangram projects cell types from the reference data.
I checked `ad_map`: after training there is a var column ('cell_type') from the MERFISH data frame, but every value in this 'cell_type' column is 'NA'. How can I get cell type annotations for each cluster in my MERFISH data? Thanks very much!

Here `comb_adata` is my MERFISH data and `adata_sc` is the reference:

```python
import pandas as pd
import tangram as tg

# Select shared training genes (stored in .uns of both objects)
tg.pp_adatas(adata_sc, comb_adata, genes=None);

assert "training_genes" in adata_sc.uns
assert "training_genes" in comb_adata.uns
print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");

# Map reference cells onto the MERFISH cells
ad_map = tg.map_cells_to_space(
    adata_sc,
    comb_adata,
    mode="cells",
    cluster_label='leiden',
    density_prior='rna_count_based',
    num_epochs=100,
    device='cpu',
);

# Project reference annotations onto the MERFISH cells and plot them
tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);
```

Then I checked `ad_map`:

```
AnnData object with n_obs × n_vars = 79667 × 4515
    obs: 'age', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'n_genes', 'n_counts', 'clust_annot', 'organism_ontology_term_id', 'sex_ontology_term_id', 'suspension_type', 'cell_type_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'donor_id', 'is_primary_data', 'cell_type_annot', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid', 'leiden'
    var: 'region', 'slide', 'cell_id', 'area', 'sample_id', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_10_genes', 'pct_counts_in_top_20_genes', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_150_genes', 'n_counts', 'leiden', 'uniform_density', 'rna_count_based_density', 'cell_type'
    uns: 'train_genes_df', 'training_history'
```

@wakelin-g

Your cell type annotations need to be in a column of `adata_sc.obs` (the single-cell data). Based on your code, I assume they are in the `adata_sc.obs['leiden']` column.

You should not pass `cluster_label='leiden'` to `map_cells_to_space`. It is only used with `mode="clusters"`, not `mode="cells"`. Instead, pass `annotation="leiden"` to `project_cell_annotations`. This stores the projected annotation scores in `comb_adata.obsm['tangram_ct_pred']`.
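As a rough illustration of what `project_cell_annotations` does with the `annotation` column, here is a minimal NumPy/pandas sketch. It assumes (from a reading of the Tangram source, not a documented contract) that the function one-hot encodes `adata_sc.obs[annotation]` and multiplies it by the transposed mapping matrix `ad_map.X`, which has one row per reference cell and one column per spatial cell. The matrix, labels and spot names below are made up.

```python
import numpy as np
import pandas as pd

# Toy mapping matrix M (stand-in for ad_map.X):
# rows = reference (single-cell) cells, columns = spatial cells.
M = np.array([
    [0.7, 0.2, 0.1],   # sc cell 0
    [0.1, 0.8, 0.1],   # sc cell 1
    [0.2, 0.0, 0.8],   # sc cell 2
    [0.0, 0.5, 0.5],   # sc cell 3
])
# Stand-in for adata_sc.obs[annotation]: one label per reference cell.
sc_labels = pd.Series(["T cell", "B cell", "T cell", "Myeloid"], name="cell_type")
spatial_ids = ["spot_a", "spot_b", "spot_c"]

# One-hot encode the reference labels: one column per distinct label.
one_hot = pd.get_dummies(sc_labels).astype(float)

# (n_spatial x n_sc) @ (n_sc x n_labels) -> one score per spatial cell and label
ct_pred = pd.DataFrame(M.T @ one_hot.to_numpy(), index=spatial_ids, columns=one_hot.columns)
print(ct_pred)
```

Under this assumption, the columns of `tangram_ct_pred` are exactly the distinct values of the reference column passed as `annotation`, so `annotation="leiden"` yields cluster IDs while `annotation="cell_type"` yields cell type names.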

Also, if you are analysing MERFISH data, you should be using `density_prior='uniform'` instead of `density_prior='rna_count_based'`.
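For context, here is a sketch of how the two priors differ, assuming they correspond to the `uniform_density` and `rna_count_based_density` columns that `tg.pp_adatas` adds to the spatial `obs` (both appear in the `ad_map.var` listing above). The uniform prior gives every spatial element the same weight; the RNA-count-based prior weights each element by its share of total counts. The count matrix is a toy example.

```python
import numpy as np

# Toy spatial count matrix: rows = spatial elements (cells or spots), columns = genes.
X_sp = np.array([
    [5, 0, 3],
    [1, 1, 0],
    [10, 4, 6],
    [0, 2, 2],
], dtype=float)
n_spots = X_sp.shape[0]

# Uniform prior: every spatial element expects the same share of mapped cells.
uniform_density = np.ones(n_spots) / n_spots

# RNA-count-based prior: share proportional to total counts per element,
# used as a proxy for how many cells a multi-cell spot contains.
counts_per_spot = X_sp.sum(axis=1)
rna_count_based_density = counts_per_spot / counts_per_spot.sum()

print(uniform_density)
print(rna_count_based_density)
```

With single-cell-resolution data each element is one cell, so weighting by counts would mainly push more reference cells onto transcript-rich cells rather than reflect cell number.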

You should revise your code to be something closer to this (lines starting with `-` are removed, lines starting with `+` are added):

```diff
 tg.pp_adatas(adata_sc, comb_adata, genes=None);

 assert "training_genes" in adata_sc.uns
 assert "training_genes" in comb_adata.uns
 print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");

 ad_map = tg.map_cells_to_space(
     adata_sc,
     comb_adata,
     mode="cells",
-    cluster_label='leiden',
-    density_prior='rna_count_based',
+    density_prior='uniform',
     num_epochs=100,
     device='cpu',
 );

-tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
+tg.project_cell_annotations(ad_map, comb_adata, annotation="leiden")
-annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
+annotation_list = list(pd.unique(adata_sc.obs['leiden']))
 tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);
```

See the Tangram Jupyter notebooks for more detailed information. FYI, 100 epochs will likely not be sufficient for convergence (probably try ~500 or so).

@KunHHE
Author

KunHHE commented Jan 3, 2025

Thanks @wakelin-g. Regarding "if you are analysing MERFISH data, you should be using density_prior = uniform instead of density_prior = rna_count_based": could you clarify? If MERFISH/Xenium datasets should use `density_prior='uniform'`, what about sequencing-based spatial datasets such as Visium (HD) or Slide-seq? Thanks very much! I will revise the script, re-run it, and keep you updated.

@wakelin-g

For technologies with single-cell resolution (MERFISH, Xenium, Visium HD, etc.), use `uniform`. For technologies where a single spatial element likely contains multiple cells (e.g., standard, non-HD Visium), use `rna_count_based`.
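The rule of thumb above can be written down as a small lookup. `choose_density_prior` and the technology sets are hypothetical and only encode this guidance; the returned string is what you would pass as `density_prior` to `tg.map_cells_to_space`. Technologies the rule does not cover (e.g. Slide-seq) deliberately raise an error instead of guessing.

```python
# Hypothetical helper encoding the rule of thumb above; not part of Tangram.
SINGLE_CELL_RESOLUTION = {"merfish", "xenium", "visium hd"}
MULTI_CELL_SPOTS = {"visium"}


def choose_density_prior(technology: str) -> str:
    """Return a density_prior value to pass to tg.map_cells_to_space."""
    tech = technology.strip().lower()
    if tech in SINGLE_CELL_RESOLUTION:
        return "uniform"          # one spatial element ~ one cell
    if tech in MULTI_CELL_SPOTS:
        return "rna_count_based"  # several cells per spot
    raise ValueError(f"No rule of thumb for {technology!r}; see the Tangram notebooks")


print(choose_density_prior("MERFISH"))  # uniform
print(choose_density_prior("Visium"))   # rna_count_based
```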

Again, you should take a look at the notebook which explains what density_prior actually does and what the different options mean.

@KunHHE
Author

KunHHE commented Jan 4, 2025

Dear @wakelin-g, I ran a new test using the code from your guidance. If I use 'leiden', `plot_cell_annotation_sc` only shows the leiden cluster numbers (see the figure below); shouldn't it show cell type labels directly? I think I should still use `annotation="cell_type"` in `tg.project_cell_annotations`? Thanks so much!

[image: `plot_cell_annotation_sc` output with `annotation="leiden"`]

If I change back to 'cell_type', does it look correct?
[image: `plot_cell_annotation_sc` output with `annotation="cell_type"`]

The predicted cell types are then stored in `obsm['tangram_ct_pred']`:
[image: contents of `obsm['tangram_ct_pred']`]

My original data in adata1 has 24 leiden clusters, but `tangram_ct_pred` has only 12 clusters. Can I double-check with you: is this because the reference only has those 12 annotated cell types, so it can't match the original leiden clusters?
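A sketch of turning the projected scores into one label per MERFISH cell and per spatial cluster, assuming `tangram_ct_pred` is a pandas DataFrame with one row per spatial cell and one column per distinct label in the reference column passed as `annotation`. Under that assumption its column count depends only on the reference labels, not on the spatial leiden clustering. The scores and clusters are toy values, and `tangram_cell_type` is a hypothetical column name.

```python
import pandas as pd

# Toy stand-in for comb_adata.obsm["tangram_ct_pred"]:
# rows = MERFISH cells, columns = reference cell types.
ct_pred = pd.DataFrame(
    {
        "B cell": [0.1, 0.8, 0.1, 0.7],
        "Myeloid": [0.0, 0.5, 0.5, 0.1],
        "T cell": [0.9, 0.2, 0.9, 0.3],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)
# Toy stand-in for comb_adata.obs["leiden"] (clusters computed on the MERFISH data).
leiden = pd.Series(["0", "1", "0", "1"], index=ct_pred.index, name="leiden")

# One label per MERFISH cell: the reference cell type with the highest score.
# On real data (assumption): comb_adata.obs["tangram_cell_type"] = ct_pred.idxmax(axis=1)
tangram_cell_type = ct_pred.idxmax(axis=1)

# Per spatial leiden cluster: counts of each predicted label, then the most frequent one.
counts = pd.crosstab(leiden, tangram_cell_type)
majority = counts.idxmax(axis=1)
print(counts)
print(majority)
```

Taking the per-cell maximum is only one choice; keeping the full score table avoids hiding cells whose scores are split between several types.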
