Integration with single cell, spatial and in situ analysis
If using an external R installation (may not be necessary on Linux systems).
import os
os.environ['R_HOME'] = r'C:\Program Files\R\R-4.4.1'
import the packages
import os
import cytobulk as ct
import scanpy as sc
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
Here we use the single-cell, spatial, and in situ data from Sarah E. B. Taylor et al. and utilize CytoBulk for integrating the datasets. [Download sc data] [Download st data]
Load ST and sc-RNA seq data
sc_adata = sc.read_h5ad("C:/Users/wangxueying/project/CytoBulk/case/10x/input/sc_adata.h5ad")
sc_ori_adata = sc_adata.copy()
st_adata = sc.read_h5ad("C:/Users/wangxueying/project/CytoBulk/case/10x/input/st_adata_sub_3.h5ad")
The cell type information is stored in sc_adata.obs['cell_type]
sc_adata.obs['cell_type'].value_counts()
Deconvolute ST data with sc-RNA seq as the reference.
If you want to use the pretrained model, please download the folder, extract it, and set the path of the extracted folder as the out_dir parameter. [Download] This will help you skip the training steps.
deconv_result,deconv_adata = ct.tl.st_deconv(st_adata = st_adata,
sc_adata = sc_adata,
annotation_key ="cell_type",
out_dir="C:/Users/wangxueying/project/CytoBulk/case/10x/sub3/demo",
dataset_name="BRCA_10X",
different_source=True,
n_cell=10)
The scaled fraction values (where the sum of each row equals 1) are stored in deconv_result and deconv_h5ad.uns['deconv']. The raw data is stored in the out_dir/out_put folder with the file name project_name_prediction_frac.csv.
deconv_result.iloc[:5, :7]
deconv_adata.uns['deconv'].iloc[:5, :7]
Spatial Visualization of Deconvolution Results
import matplotlib as mpl
i='Stromal'
plot_adata = deconv_adata.copy()
plot_adata.obs=plot_adata.uns['deconv']
mpl.rcdefaults()
fig=sc.pl.spatial(
plot_adata,
color=i,
img_key=None,
alpha=1,
color_map='mako',
size=1.3,
title=f'BRCA_10X predicted {i}',
spot_size=150,
return_fig=True
)
Mapping scRNA-seq to ST data
Read the cell count data for each spot(optional).
If the cell count information is not known beforehand, you can use ct.tl.predict_cell_num to segment HE images using Cellpose, or let the program perform an internal estimation. Here we use the cell count data collected from paper of Sarah E. B. Taylor et al.
cell_num = pd.read_csv(r"C:\Users\wangxueying\project\CytoBulk\case\10x\sub3\cell_num_3.csv",index_col=0)
deconv_adata=deconv_adata[cell_num.index,:]
cell_num=cell_num.loc[deconv_adata.obs_names,:]
deconv_adata.obsm['cell_num'] = cell_num
deconv_adata.uns['deconv'] = deconv_adata.uns['deconv'].loc[deconv_adata.obs_names,:]
deconv_adata.obsm['cell_num'].head(5)
Mapping
reconstructed_cell, reconstructed_adata = ct.tl.st_mapping(st_adata = deconv_adata,
sc_adata = sc_ori_adata,
out_dir="C:/Users/wangxueying/project/CytoBulk/case/10x/sub3/demo",
project="BRCA_10X",
annotation_key='cell_type')
The matching relationship between cells and spots is stored in reconstructed_cell. For single cells mapped to the same spot, we aggregated their gene expression, and the aggregated expression is stored in reconstructed_adata. The original ST expression data is stored in reconstructed_adata.layers['original_st'].
reconstructed_cell.head(5)
Check the pearson correlation between original expression data and reconstructed expression data.
from scipy.stats import pearsonr
import numpy as np
correlations = []
for spot in reconstructed_adata.obs_names:
if spot in reconstructed_adata.obs_names:
expr1 = reconstructed_adata[spot, :].X[0]
expr2 = reconstructed_adata.layers['original_st'][reconstructed_adata.obs_names.get_loc(spot), :]
#r, p = pearsonr(expr1.X[0], expr2.X[0])
r, p = pearsonr(expr1, expr2)
correlations.append((spot, r, p))
correlations_df = pd.DataFrame(correlations, columns=['spot', 'Pearson R', 'pvalue'])
correlations_df['X']=reconstructed_adata.obsm['spatial'][:,0]
correlations_df['Y']=reconstructed_adata.obsm['spatial'][:,1]
correlations_df.set_index('spot',inplace=True)
reconstructed_adata.obs=correlations_df
np.mean(correlations_df['Pearson R'])
Visualization of correlation result
import matplotlib.pyplot as plt
fig=sc.pl.spatial(
reconstructed_adata,
color='Pearson R',
img_key=None,
alpha=0.8,
size=2,
title=f'Reconstructed BRCA 10X \nmean Pearson correlation = 0.692\ngene number = 18,082',
return_fig=True,
add_outline=False,
frameon=False,
spot_size=.8,
library_id='CytAssist_FFPE_Human_Breast_Cancer',
color_map="mako_r"
)