Out-of-core Analysis#

With the ever-growing size of datasets, being able to operate on larger-than-memory data is increasingly important. Both scanpy (CPU) and rapids_singlecell (GPU) support fully out-of-core, parallelizable analysis steps. ad.experimental.read_elem_as_dask lets users lazily load numeric matrix elements of an AnnData, such as X or obsm["X_pca"]. In this notebook, we’ll compute a PCA on 100,000,000 cells from various cell lines under different treatments before visualizing a subset of the data. Downstream tasks could include anything from deep learning to a linear classifier.

For individual out-of-core tutorials, see the rapids-singlecell and scanpy documentation.
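As a minimal sketch of the lazy-loading pattern used throughout this notebook (the file path here is hypothetical), X is read as a dask array whose chunks are only pulled from disk when a computation actually needs them:

```python
import anndata as ad
import h5py

with h5py.File("example.h5ad", "r") as f:  # hypothetical small file
    adata = ad.AnnData(
        obs=ad.io.read_elem(f["obs"]),
        var=ad.io.read_elem(f["var"]),
    )
    # X becomes a lazy dask array backed by the HDF5 dataset; nothing is loaded yet.
    adata.X = ad.experimental.read_elem_as_dask(f["X"], chunks=(10_000, adata.shape[1]))
    print(adata.X)  # shows shape, dtype, and chunk structure only, no data is read
```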

from collections import Counter
from pathlib import Path
import time

import anndata as ad
import dask
import dask.array  # used below for register_chunk_type and from_zarr
import dask.distributed as dd
import h5py
import numpy as np
import pandas as pd
import scanpy as sc
from dask.distributed import Client, LocalCluster
from tqdm import tqdm
sc.logging.print_header()
/home/icb/weixu.wang/miniconda3/envs/rapids_singlecell/lib/python3.12/site-packages/session_info2/__init__.py:124: UserWarning: The '__version__' attribute is deprecated and will be removed in MarkupSafe 3.1. Use feature detection, or `importlib.metadata.version("markupsafe")`, instead.
  and (v := getattr(pkg, "__version__", None))
/tmp/ipykernel_632503/582836007.py:19: RuntimeWarning: Failed to import dependencies for application/vnd.jupyter.widget-view+json representation. (ModuleNotFoundError: No module named 'ipywidgets')
  sc.logging.print_header()
| Package     | Version   |
| ----------- | --------- |
| numpy       | 1.26.4    |
| dask        | 2024.11.2 |
| scanpy      | 1.11.0rc2 |
| anndata     | 0.11.0rc3 |
| h5py        | 3.12.1    |
| pandas      | 2.2.3     |
| tqdm        | 4.67.1    |
| distributed | 2024.11.2 |

| Dependency             | Version                          |
| ---------------------- | -------------------------------- |
| scikit-learn           | 1.5.1                            |
| scipy                  | 1.14.1                           |
| zstandard              | 0.23.0                           |
| Pygments               | 2.19.1                           |
| sparse                 | 0.15.5                           |
| prompt_toolkit         | 3.0.50                           |
| wcwidth                | 0.2.13                           |
| torch                  | 2.6.0 (2.6.0+cu124)              |
| setproctitle           | 1.3.5                            |
| typing_extensions      | 4.12.2                           |
| igraph                 | 0.11.8                           |
| comm                   | 0.2.2                            |
| crc32c                 | 2.7.1                            |
| asciitree              | 0.3.3                            |
| louvain                | 0.8.2                            |
| ipykernel              | 6.29.5                           |
| zarr                   | 2.18.3                           |
| asttokens              | 3.0.0                            |
| tblib                  | 3.0.0                            |
| cycler                 | 0.12.1                           |
| numba                  | 0.60.0                           |
| numcodecs              | 0.15.1                           |
| msgpack                | 1.1.0                            |
| pyarrow                | 17.0.0                           |
| lz4                    | 4.3.3                            |
| jedi                   | 0.19.2                           |
| zict                   | 3.0.0                            |
| toolz                  | 1.0.0 (toolz: 1.0.0, tlz: 1.0.1) |
| jaraco.collections     | 5.1.0                            |
| jaraco.text            | 3.12.1                           |
| stack_data             | 0.6.3                            |
| rapids-dask-dependency | 24.12.0a0                        |
| pickleshare            | 0.7.5                            |
| tornado                | 6.4.2                            |
| texttable              | 1.7.0                            |
| pytz                   | 2024.1                           |
| pyparsing              | 3.2.1                            |
| joblib                 | 1.4.2                            |
| jupyter_client         | 8.6.3                            |
| pynvml                 | 11.4.1                           |
| omegaconf              | 2.3.0                            |
| leidenalg              | 0.10.2                           |
| cupy                   | 13.3.0                           |
| cuda-python            | 12.6.0                           |
| traitlets              | 5.14.3                           |
| platformdirs           | 4.3.6                            |
| rmm                    | 24.12.1 (24.12.01)               |
| debugpy                | 1.8.12                           |
| psutil                 | 6.1.1                            |
| more-itertools         | 10.6.0                           |
| natsort                | 8.4.0                            |
| legacy-api-wrap        | 1.4.1                            |
| pure_eval              | 0.2.3                            |
| Deprecated             | 1.2.18                           |
| threadpoolctl          | 3.5.0                            |
| pyzmq                  | 26.2.1                           |
| executing              | 2.1.0                            |
| PyYAML                 | 6.0.2                            |
| colorama               | 0.4.6                            |
| session-info2          | 0.1.2                            |
| llvmlite               | 0.43.0                           |
| parso                  | 0.8.4                            |
| locket                 | 1.0.0                            |
| packaging              | 24.2                             |
| fastrlock              | 0.8.3                            |
| wrapt                  | 1.17.2                           |
| jupyter_core           | 5.7.2                            |
| jaraco.functools       | 4.0.1                            |
| sortedcontainers       | 2.4.0                            |
| setuptools             | 75.8.0                           |
| click                  | 8.1.8                            |
| cytoolz                | 1.0.1                            |
| ipython                | 8.32.0                           |
| jaraco.context         | 5.3.0                            |
| Jinja2                 | 3.1.5                            |
| cloudpickle            | 3.1.1                            |
| decorator              | 5.1.1                            |
| defusedxml             | 0.7.1                            |
| MarkupSafe             | 3.0.2                            |
| kiwisolver             | 1.4.8                            |
| pillow                 | 11.1.0                           |
| six                    | 1.17.0                           |
| matplotlib             | 3.10.0                           |
| python-dateutil        | 2.9.0.post0                      |

| Component | Info                                                                          |
| --------- | ----------------------------------------------------------------------------- |
| Python    | 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:24:40) [GCC 13.3.0] |
| OS        | Linux-5.14.0-427.37.1.el9_4.x86_64-x86_64-with-glibc2.34                      |
| CPU       | 128 logical CPU cores, x86_64                                                 |
| GPU       | ID: 0, NVIDIA A100 80GB PCIe, Driver: 560.35.03, Memory: 81920 MiB            |
| Updated   | 2025-03-05 16:10                                                              |

Technical Notes#

To switch between GPU and CPU, use the use_gpu flag below. If you are using a GPU, set the gpus variable according to the number of GPUs you have available; for the CPU implementation, you can similarly configure the number of workers. We ran this notebook on CPU with a SLURM allocation like srun ... -c 32 --mem=300gb, using 16 dask workers and the sparse chunk size below, while we were only able to get one GPU; the beauty of dask, though, is that one GPU is enough to complete the analysis thanks to its out-of-core capabilities. You can of course use fewer CPU cores and this notebook will still run: such is the beauty of truly lazy, chunked computation. Nothing is loaded into memory until it is needed, and only the amount needed to perform the local computation on a chunk of the data.

The interplay between memory and workers is delicate: too many workers with too little memory will cause your computation to crash, so it is often wise to use a few fewer workers than you think you need. For example, a single modern GPU has on the order of 32 GB or more of memory, which is more than each of the 16 CPU workers gets when splitting 300 GB (under 19 GB each), so the GPU can handle a larger chunk size. Furthermore, with RAPIDS memory management (rmm), memory concerns are less pressing on the GPU.
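To make this concrete, here is a back-of-the-envelope estimate of how much memory one sparse CSR chunk occupies. The ~2,000 nonzeros per cell is an assumed, illustrative value, not a measurement from this dataset:

```python
# Rough memory estimate for one CSR chunk (illustrative numbers, not measured).
chunk_rows = 100_000      # SPARSE_CHUNK_SIZE used for the CPU run below
nnz_per_cell = 2_000      # assumed average nonzeros per cell
bytes_per_nnz = 4 + 4     # float32 data + int32 column index
indptr_bytes = (chunk_rows + 1) * 8

chunk_bytes = chunk_rows * nnz_per_cell * bytes_per_nnz + indptr_bytes
print(f"~{chunk_bytes / 1e9:.1f} GB per chunk")  # ~1.6 GB, comfortably below ~19 GB per worker
```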

Caution: Out-of-core support is still quite new in both packages, so we are learning more about performance optimizations every day; the information here represents our best current advice. On a single GPU, preprocessing can be quite lengthy, so it is advisable to use RAPIDS only with more than one GPU (be sure to update the gpus variable to reflect this; it should look like '0,1,2,3' for four GPUs, for example). That said, we have observed that single-GPU PCA is faster than multi-core CPU PCA (~12.5 min vs. ~18 min), but the time needed to move the data from the GPU back to the CPU for writing makes the two comparable, i.e., PCA plus writing the results takes about the same time on one GPU as on a multi-core CPU with the above specs.

use_gpu = False
if use_gpu:
    import cupy as cp
    import rapids_singlecell as rsc
    import rmm
    from cupyx.scipy import sparse as spx  # GPU sparse matrices
    from dask_cuda import LocalCUDACluster
    from rmm.allocators.cupy import rmm_cupy_allocator

    SPARSE_CHUNK_SIZE = 1_000_000

    def set_mem():
        # Use RAPIDS managed memory so chunks can spill between GPU and host memory.
        rmm.reinitialize(managed_memory=True)
        cp.cuda.set_allocator(rmm_cupy_allocator)

    gpus = "0"  # comma separated, e.g. "0,1,2" for 3 GPUs
    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=gpus)
    # Let dask treat cupyx CSR matrices as valid chunk types.
    dask.array.register_chunk_type(spx.csr_matrix)
else:
    SPARSE_CHUNK_SIZE = 100_000
    cluster = LocalCluster(n_workers=16)
client = Client(cluster)
if use_gpu:
    client.run(set_mem)
    mod = rsc
else:
    mod = sc
client

Client: Client-7912c966-f9dc-11ef-a6b7-00001049fe80
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Preprocessing#

First, let’s do some standard preprocessing on the datasets. Note that we use ad.experimental.read_elem_as_dask to lazily load the big data, the cell × gene matrix. The preprocessing, i.e., normalization, log1p, and highly variable gene selection, is then also done lazily. We do this per plate, then merge all the plates into one large AnnData for the PCA in the next step. Writing the preprocessed plates out and reading them back in is faster than rechunking a virtually concatenated (i.e., anndata.concat) dataset, since dask requires uniform chunks, which would not be possible without rechunking.
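If you want to convince yourself that the preprocessing really is lazy, a quick check like the following (a sketch, not part of the original analysis) can be added inside the loop below, right after the log1p call:

```python
import dask.array as da

# After sc.pp.normalize_total / sc.pp.log1p, adata.X is still a lazy dask array;
# only its task graph has grown. Inspecting it triggers no I/O or compute.
assert isinstance(adata.X, da.Array)
print(adata.X.chunks[0][:3])  # row chunk sizes, e.g. (100000, 100000, 100000)
print(adata.X)                # prints the chunk structure, not the data
```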

%%time
adatas = []
all_highly_variable_genes = []
for i in tqdm(range(14)):
    id = str(i + 1)
    PATH = f"/lustre/groups/ml01/workspace/weixu.wang/100m/plate{id}_filtered.h5ad"

    with h5py.File(PATH, "r") as f:
        adata = ad.AnnData(
            obs=ad.io.read_elem(f["obs"]),
            var=ad.io.read_elem(f["var"]),
        )
        adata.X = ad.experimental.read_elem_as_dask(
            f["X"], chunks=(SPARSE_CHUNK_SIZE, adata.shape[1])
        )

    if use_gpu:
        rsc.get.anndata_to_GPU(adata)

    # 100m filtering
    pass_filter_mask = adata.obs["pass_filter"] == "full"
    adata = adata[pass_filter_mask, :].copy()

    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)

    highly_variable_genes = set(adata.var_names[adata.var["highly_variable"]])
    all_highly_variable_genes.append(highly_variable_genes)
    adatas.append(adata)

# select genes that are highly variable in more than two plates
gene_counts = Counter(gene for genes in all_highly_variable_genes for gene in genes)
selected_genes = {gene for gene, count in gene_counts.items() if count > 2}

for i in tqdm(range(14)):
    id = str(i + 1)

    common_genes = list(set(adatas[i].var_names) & selected_genes)
    adata = adatas[i][:, common_genes].copy()

    output_path = f"/lustre/groups/ml01/workspace/ilan.gold/100m/plate{id}_filtered_preprocessed_{'gpu' if use_gpu else 'cpu'}.h5ad"
    adata.write_h5ad(output_path)

Merge all data into giant file#

We’ll now merge the data into one large file for the PCA. This is faster than rechunking a virtually concatenated (i.e., anndata.concat) dataset, since dask requires uniform chunks, which would not be possible without rechunking. Furthermore, rechunking can cause a memory blow-up.
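To see why rechunking would be needed at all, here is a small, self-contained illustration (not part of the original analysis) of how concatenating dask arrays whose lengths are not multiples of the chunk size yields irregular row chunks:

```python
import dask.array as da

# Two "plates" whose row counts are not multiples of the 100,000-row chunk size.
a = da.zeros((250_000, 2_000), chunks=(100_000, 2_000))
b = da.zeros((130_000, 2_000), chunks=(100_000, 2_000))

merged = da.concatenate([a, b], axis=0)
print(merged.chunks[0])
# (100000, 100000, 50000, 100000, 30000) -- irregular row chunks that would need an
# expensive rechunk, whereas concat_on_disk writes one file we can re-read with
# uniform chunks.
```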

%%time
data_dict = {}
for i in range(14):
    data_dict.update({f"plate_{i+1}": f"/lustre/groups/ml01/workspace/ilan.gold/100m/plate{i+1}_filtered_preprocessed_{'gpu' if use_gpu else 'cpu'}.h5ad"})
ad.experimental.concat_on_disk(
    data_dict,
    f'/lustre/groups/ml01/workspace/ilan.gold/100m/plate_merged_{"gpu" if use_gpu else "cpu"}.h5ad',
    label='plate',
)

Calculate cell embeddings#

To do this, we will use PCA. Both scanpy and rapids_singlecell provide out-of-core implementations of PCA for sparse datasets.
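For intuition only (this is an illustration, not scanpy’s or rapids_singlecell’s actual implementation), PCA on a tall-skinny matrix reduces to small statistics that can be accumulated chunk by chunk, which is why it streams well over out-of-core data:

```python
import dask.array as da
import numpy as np

rng = np.random.default_rng(0)
X = da.from_array(rng.standard_normal((50_000, 200)), chunks=(10_000, 200))

mean = X.mean(axis=0)              # per-gene mean, a chunked reduction
gram = (X.T @ X) / X.shape[0]      # small (n_vars x n_vars) matrix, also a chunked reduction
cov = gram - da.outer(mean, mean)  # covariance without centering the full matrix

# Only the tiny covariance matrix is ever materialized in memory.
eigvals, eigvecs = np.linalg.eigh(cov.compute())
components = eigvecs[:, ::-1][:, :50]  # top 50 principal axes
```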

%%time
with h5py.File(f"/lustre/groups/ml01/workspace/ilan.gold/100m/plate_merged_{'gpu' if use_gpu else 'cpu'}.h5ad", "r") as f:
    adata = ad.AnnData(
        obs=ad.io.read_elem(f["obs"]),
        var=ad.io.read_elem(f["var"]),
    )
    adata.X = ad.experimental.read_elem_as_dask(
        f["X"], chunks=(SPARSE_CHUNK_SIZE, adata.shape[1])
    )
CPU times: user 2min 13s, sys: 59.2 s, total: 3min 12s
Wall time: 3min 40s
adata
AnnData object with n_obs × n_vars = 95624334 × 2304
    obs: 'sample', 'species', 'gene_count', 'tscp_count', 'mread_count', 'bc1_wind', 'bc2_wind', 'bc3_wind', 'bc1_well', 'bc2_well', 'bc3_well', 'id', 'drugname_drugconc', 'drug', 'INT_ID', 'NUM.SNPS', 'NUM.READS', 'demuxlet_call', 'BEST.GUESS', 'BEST.LLK', 'NEXT.GUESS', 'NEXT.LLK', 'DIFF.LLK.BEST.NEXT', 'BEST.POSTERIOR', 'SNG.POSTERIOR', 'cell_line', 'SNG.BEST.LLK', 'SNG.NEXT.GUESS', 'SNG.NEXT.LLK', 'SNG.ONLY.POSTERIOR', 'DBL.BEST.GUESS', 'DBL.BEST.LLK', 'DIFF.LLK.SNG.DBL', 'sublibrary', 'BARCODE', 'pcnt_mito', 'S_score', 'G2M_score', 'phase', 'cell_line_orig', 'pass_filter', 'cell_name', 'plate'
    obsm: 'X_pca'
%%time
if use_gpu:
    rsc.get.anndata_to_GPU(adata)
CPU times: user 3 μs, sys: 1e+03 ns, total: 4 μs
Wall time: 29.3 μs
%%time
mod.pp.pca(adata, n_comps=300, mask_var=None)
if use_gpu:
    adata.obsm["X_pca"] = adata.obsm["X_pca"].map_blocks(
        lambda x: x.get(), meta=np.array([]), dtype=adata.obsm["X_pca"].dtype
    ) # bring back to CPU
CPU times: user 4min, sys: 26.1 s, total: 4min 26s
Wall time: 17min 11s
%%time
adata.obsm["X_pca"].to_zarr(f"/lustre/groups/ml01/workspace/ilan.gold/100m/plate_merged_pca_{'gpu' if use_gpu else 'cpu'}.zarr")
CPU times: user 2min 48s, sys: 19.3 s, total: 3min 7s
Wall time: 15min 14s

Now the data can be read back in with dask if needed and put into the AnnData object. A slightly more complex, but perhaps more AnnData-idiomatic, way to write the data would have been to use write_elem: https://anndata.readthedocs.io/en/latest/generated/anndata.io.write_elem.html (a sketch of this alternative follows the next cell).

%%time
adata.obsm["X_pca"] = dask.array.from_zarr(f"/lustre/groups/ml01/workspace/ilan.gold/100m/plate_merged_pca_{'gpu' if use_gpu else 'cpu'}.zarr")
CPU times: user 196 ms, sys: 70.1 ms, total: 266 ms
Wall time: 1.23 s
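For completeness, here is a minimal sketch of that write_elem alternative. The output path is hypothetical, and writing a dask array directly assumes an anndata version with dask write support (otherwise call .compute() first):

```python
import zarr
import anndata as ad

# Hypothetical output location; adjust to your own storage.
g = zarr.open_group("plate_merged_pca_elem.zarr", mode="w")

# write_elem stores the array together with AnnData's encoding metadata, so it can
# later be read back with ad.io.read_elem or ad.experimental.read_elem_as_dask.
ad.io.write_elem(g, "X_pca", adata.obsm["X_pca"])
```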

Visualization#

Briefly, we visualize a subset of the results to avoid meaningless overplotting. However, the full PCA results are still available on disk and are ready for use downstream, whether in deep-learning applications or for simple linear classifiers.

%%time
subset_adata = adata[np.random.randint(0, adata.shape[0], (1_000_000,))]
subset_adata
CPU times: user 1.56 s, sys: 676 ms, total: 2.24 s
Wall time: 3.35 s
View of AnnData object with n_obs × n_vars = 1000000 × 2304
    obs: 'sample', 'species', 'gene_count', 'tscp_count', 'mread_count', 'bc1_wind', 'bc2_wind', 'bc3_wind', 'bc1_well', 'bc2_well', 'bc3_well', 'id', 'drugname_drugconc', 'drug', 'INT_ID', 'NUM.SNPS', 'NUM.READS', 'demuxlet_call', 'BEST.GUESS', 'BEST.LLK', 'NEXT.GUESS', 'NEXT.LLK', 'DIFF.LLK.BEST.NEXT', 'BEST.POSTERIOR', 'SNG.POSTERIOR', 'cell_line', 'SNG.BEST.LLK', 'SNG.NEXT.GUESS', 'SNG.NEXT.LLK', 'SNG.ONLY.POSTERIOR', 'DBL.BEST.GUESS', 'DBL.BEST.LLK', 'DIFF.LLK.SNG.DBL', 'sublibrary', 'BARCODE', 'pcnt_mito', 'S_score', 'G2M_score', 'phase', 'cell_line_orig', 'pass_filter', 'cell_name', 'plate'
    obsm: 'X_pca'
%%time
if use_gpu:
    subset_adata.obsm["X_pca"] = subset_adata.obsm["X_pca"].map_blocks(
        spx.csr_matrix, meta=spx.csr_matrix(np.array([])), dtype=subset_adata.obsm["X_pca"].dtype
    )  # bring back to GPU
else:
    subset_adata.obsm["X_pca"] = subset_adata.obsm["X_pca"].compute()
/ictstr01/home/icb/ilan.gold/test_ken/venv/lib64/python3.12/site-packages/anndata/_core/index.py:173: PerformanceWarning: Slicing with an out-of-order index is generating 1044 times more chunks
  return a[subset_idx]
/ictstr01/home/icb/ilan.gold/test_ken/venv/lib64/python3.12/site-packages/distributed/client.py:3357: UserWarning: Sending large graph of size 49.22 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
<timed exec>:1: ImplicitModificationWarning: Setting element `.obsm['X_pca']` of view, initializing view as actual.
/ictstr01/home/icb/ilan.gold/test_ken/venv/lib64/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
CPU times: user 9min 20s, sys: 35.3 s, total: 9min 56s
Wall time: 10min 35s
/ictstr01/home/icb/ilan.gold/test_ken/venv/lib64/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
%%time
mod.pp.neighbors(subset_adata)
/ictstr01/home/icb/ilan.gold/test_ken/venv/lib64/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
CPU times: user 10min 5s, sys: 24.9 s, total: 10min 30s
Wall time: 9min 40s
%%time
mod.tl.umap(subset_adata)
if use_gpu:
    subset_adata.obsm["X_umap"] = subset_adata.obsm["X_umap"].get()
CPU times: user 2h 35min 37s, sys: 25.3 s, total: 2h 36min 3s
Wall time: 23min 32s
%%time
sc.pl.umap(
    subset_adata,
    color=["cell_line", "plate"],
    ncols=1
)
[Figure: UMAP embedding of the 1,000,000-cell subset, colored by cell_line and plate.]
CPU times: user 11 s, sys: 514 ms, total: 11.5 s
Wall time: 14.7 s