# Simulate synthetic datasets with dyngen

This notebook demonstrates how we simulate synthetic single-cell datasets with dyngen in R. Lineage complexity is varied across four levels by changing the number of modifications (1, 2, 4 or 6) applied to a branching backbone, and 30 replicate datasets are simulated for each level, giving 120 datasets in total.

## Library imports

In [1]:
library(tidyverse)
library(dyngen)
library(anndata)
library(dyno)

“package ‘readr’ was built under R version 4.3.3”
“package ‘forcats’ was built under R version 4.3.3”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

## Constants

In [2]:
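# Output directory for each complexity level, keyed by the number of
# backbone modifications used to simulate it (names become "1", "2", "4", "6")
scale <- c("complexity_1", "complexity_2", "complexity_3", "complexity_4")
names(scale) <- c(1, 2, 4, 6)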
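Because the names of `scale` are numbers stored as character strings, the simulation loop looks up a directory with `as.character(num_mod)`; indexing with the bare number would select by position instead and return the wrong level. The snippet below, which uses only base R, illustrates the difference.

```r
# Same mapping as above: names(scale) are stored as character strings
scale <- c("complexity_1", "complexity_2", "complexity_3", "complexity_4")
names(scale) <- c(1, 2, 4, 6)

scale[[as.character(4)]]  # "complexity_3": lookup by name, as in the loop
scale[[4]]                # "complexity_4": lookup by position, wrong level
scale[["6"]]              # "complexity_4"
```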

## Simulation

In [3]:
set.seed(0)         # fix the RNG seed so the whole batch of simulations is reproducible
num_repeats <- 30   # number of replicate datasets simulated per complexity level
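Calling `set.seed(0)` once before the loop makes the full batch reproducible, but every dataset then depends on the random numbers consumed by all datasets simulated before it, so regenerating a single replicate on its own requires rerunning the earlier simulations. One alternative, not used in this notebook, is to derive a seed for each (level, replicate) pair and call `set.seed()` at the start of each iteration. The helper `dataset_seed()` and its `1000 * num_mod + run` formula are hypothetical, chosen only so that all 120 seeds are distinct.

```r
# Hypothetical helper (not part of dyngen or of the original notebook):
# map a (complexity level, replicate) pair to its own reproducible seed.
dataset_seed <- function(num_mod, run) {
  as.integer(1000 * num_mod + run)
}

# One seed per dataset: 4 complexity levels x 30 replicates
seeds <- outer(c(1, 2, 4, 6), 1:30, dataset_seed)

# as.vector() so duplicates are checked element-wise, not row-wise
stopifnot(anyDuplicated(as.vector(seeds)) == 0)

# Inside the simulation loop, this call would replace the single set.seed(0):
# set.seed(dataset_seed(num_mod, run))
dataset_seed(4, 7)  # 4007
```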

In [None]:
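for (num_mod in c(1, 2, 4, 6)) {
  print(num_mod)

  # Create the output directory for this complexity level if it does not exist yet
  out_dir <- file.path(scale[[as.character(num_mod)]], "raw")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  for (run in 1:num_repeats) {
    print(run)

    # Branching backbone; num_modifications sets the lineage complexity
    backbone <- backbone_branching(
      num_modifications = num_mod,
      min_degree = 3,
      max_degree = 5
    )

    # 1000 cells, 50-80 TFs (drawn at random), 800 target genes, 50 housekeeping genes
    config <- initialise_model(
      backbone = backbone,
      num_cells = 1000,
      num_tfs = round(runif(1, min = 50, max = 80)),  # round so the TF count is a whole number
      num_targets = 800,
      num_hks = 50,
      download_cache_dir = "~/.cache/dyngen",
      simulation_params = simulation_default(
        census_interval = 2,
        compute_cellwise_grn = TRUE,
        compute_rna_velocity = TRUE
      )
    )

    # Run all dyngen simulation steps for this model
    out <- generate_dataset(
      config,
      format = "anndata",
      make_plots = FALSE
    )

    # Convert the simulated model to AnnData and save it as .h5ad
    ad <- as_anndata(out$model)
    ad$write_h5ad(file.path(out_dir, paste0("dataset_sim", run, ".h5ad")))
  }
}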
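Since the full run takes a long time, a quick check that every expected file was written can catch simulations that failed partway. The sketch below rebuilds the 120 paths from the same naming scheme as the loop (`<level>/raw/dataset_sim<run>.h5ad`) and reports how many exist. If all files are present and the R `anndata` package is installed, it reads the first dataset back with `anndata::read_h5ad()` and prints its dimensions; the cell count should match `num_cells = 1000`, while the gene count is not checked because the number of TFs varies per dataset.

```r
# Same level -> directory mapping as in the Constants cell
scale <- c("1" = "complexity_1", "2" = "complexity_2",
           "4" = "complexity_3", "6" = "complexity_4")
num_repeats <- 30

# All paths the simulation loop writes, in the order it writes them
expected <- unlist(
  lapply(scale, function(dir) {
    file.path(dir, "raw", paste0("dataset_sim", seq_len(num_repeats), ".h5ad"))
  }),
  use.names = FALSE
)

missing <- expected[!file.exists(expected)]
message(length(expected) - length(missing), " of ", length(expected),
        " datasets found")

# Optional spot check: read the first dataset back
if (length(missing) == 0 && requireNamespace("anndata", quietly = TRUE)) {
  ad <- anndata::read_h5ad(expected[1])
  print(c(cells = ad$n_obs, genes = ad$n_vars))  # cells should equal num_cells
}
```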