RNA velocity in spermatogenesis#

RNA velocity analysis with the EM model using data preprocessed with velocyto. The required data can be downloaded here from FigShare and is expected to be in data/spermatogenesis/raw/, i.e.,

mkdir -p ../../data/spermatogenesis/raw/
wget https://figshare.com/ndownloader/files/53395004 -O ../../data/spermatogenesis/raw/adata.h5ad

Library imports#

import anndata as ad
import scanpy as sc
import scvelo as scv

from crp.core import DATA_DIR
sc.logging.print_version_and_date()
Running Scanpy 1.10.4, on 2025-04-03 08:26.

General settings#

# set verbosity levels
sc.settings.verbosity = 2
scv.settings.verbosity = 3
scv.settings.set_figure_params("scvelo", dpi_save=400, dpi=80, transparent=True, fontsize=20, color_map="viridis")
scv.settings.plot_prefix = ""
SAVE_DATA = True
if SAVE_DATA:
    (DATA_DIR / "spermatogenesis" / "processed").mkdir(exist_ok=True)

Data loading#

adata = ad.io.read_h5ad(DATA_DIR / "spermatogenesis" / "raw" / "velocyto.h5ad")
adata
AnnData object with n_obs × n_vars = 1829 × 54144
    obs: 'cell_index', 'clusters_coarse', 'clusters'
    layers: 'spliced', 'unspliced'

Data pre-processing#

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000, log=False)
sc.pp.log1p(adata)

adata
Filtered out 46556 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
AnnData object with n_obs × n_vars = 1829 × 2000
    obs: 'cell_index', 'clusters_coarse', 'clusters', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
    var: 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'log1p'
    layers: 'spliced', 'unspliced'
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
computing PCA
    with n_comps=50
    finished (0:00:01)
computing neighbors
    using 'X_pca' with n_pcs = 30
    finished (0:00:02)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)

Model fitting#

scv.tl.recover_dynamics(adata, var_names="all", n_jobs=10)
scv.tl.velocity(adata, mode="dynamical")
recovering dynamics (using 10/14 cores)
    finished (0:00:24) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
computing velocities
    finished (0:00:00) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)

Save results#

if SAVE_DATA:
    adata.write_h5ad(DATA_DIR / "spermatogenesis" / "processed" / "adata.h5ad")