Simulate synthetic datasets with dyngen


This notebook shows how we simulate single-cell datasets with dyngen in R. The datasets span four levels of lineage complexity, set by the number of backbone modifications (1, 2, 4, or 6). For each level, we simulate 30 replicate datasets.

Library imports

library(tidyverse)
library(dyngen)
library(anndata)
library(dyno)

Warning message:
“package ‘readr’ was built under R version 4.3.3”
Warning message:
“package ‘forcats’ was built under R version 4.3.3”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘anndata’


The following object is masked from ‘package:readr’:

    read_csv


Loading required package: dynfeature

Loading required package: dynguidelines

Loading required package: dynmethods

Loading required package: dynplot

Loading required package: dynwrap

Constants

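# Output directory for each complexity level, keyed by the number of
# backbone modifications passed to backbone_branching()
scale <- c("complexity_1", "complexity_2", "complexity_3", "complexity_4")
names(scale) <- c(1, 2, 4, 6)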
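
The mapping above is looked up by name, not by position, which is why the simulation loop converts `num_mod` with `as.character()`. The sketch below illustrates the difference and also creates the `<level>/raw` output directories in advance. Creating the directories is our addition, under the assumption that `write_h5ad()` does not create missing parent directories itself; `out_dirs` is a name introduced here for illustration.

```r
# Same mapping as in the constants above
scale <- c("complexity_1", "complexity_2", "complexity_3", "complexity_4")
names(scale) <- c(1, 2, 4, 6)  # names are stored as the strings "1", "2", "4", "6"

# Lookup by name: 4 modifications map to the third complexity level
by_name <- scale[as.character(4)]

# Lookup by position: a numeric index selects the fourth element instead
by_position <- scale[4]

# Assumption: output directories must exist before the files are written
out_dirs <- file.path(scale, "raw")
for (d in out_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
```

Indexing with the numeric value `4` would silently select `complexity_4`, so the character conversion matters whenever the modification count and the level index differ.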

Simulation

set.seed(0)
num_repeats <- 30  # number of synthetic datasets simulated per complexity level

for (num_mod in c(1, 2, 4, 6)) {
  print(num_mod)
  for (run in 1:num_repeats) {
    print(run)
    backbone <- backbone_branching(
      num_modifications = num_mod,
      min_degree = 3,
      max_degree = 5
    )
    
    config <-
      initialise_model(
        backbone = backbone,
        num_cells = 1000,
        num_tfs = round(runif(1, min = 50, max = 80)),  # rounded so the TF count is a whole number
        num_targets = 800,
        num_hks = 50,
        download_cache_dir = "~/.cache/dyngen",
        simulation_params = simulation_default(
          census_interval = 2,
          compute_cellwise_grn = TRUE,
          compute_rna_velocity = TRUE
        )
      )
    
    out <- generate_dataset(
      config,
      format = "anndata",
      make_plots = FALSE
    )
    
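    dataset <- out$dataset  # simulated dataset in the requested format ("anndata")
    model <- out$model      # full dyngen model
    
    # Convert the model to AnnData and write it to the directory for this complexity level
    ad <- as_anndata(model)
    ad$write_h5ad(paste0(scale[as.character(num_mod)], "/raw/dataset_sim", run, ".h5ad"))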
  }
}
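
After the loop finishes, it can be useful to confirm that every complexity level has its full set of replicates. The following is a minimal sketch of such a check, not part of the original workflow. It assumes the working directory is the one the loop wrote to, and `counts` is a name introduced here for illustration.

```r
# Directory names as defined in the constants section
scale <- c("complexity_1", "complexity_2", "complexity_3", "complexity_4")

# Count the .h5ad files written for each complexity level.
# list.files() returns an empty vector for a missing directory,
# so a level that was never simulated is reported as 0.
counts <- vapply(scale, function(s) {
  length(list.files(file.path(s, "raw"), pattern = "^dataset_sim[0-9]+\\.h5ad$"))
}, integer(1))

print(counts)
```

With `num_repeats <- 30`, a complete run should report 30 files for each of the four levels.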