Why not Seurat?

This page lists the workflow choices that led us to switch from the Seurat SCTransform workflow to the Bioconductor SingleCellExperiment ecosystem.

💡

Seurat and SingleCellExperiment objects can be imported and exported.

💡

We still use Seurat for reference-based analysis.

Reference-based analysis is available during dataset creation, subsetting, and integration.

Pseudobulk differential expression

The SingleCellExperiment ecosystem provides utilities for running pseudobulk differential expression analyses per cluster when there are multiple control and test samples. The OSCA handbook gives the following justifications for pseudobulking:

Larger counts are more amenable to standard DE analysis pipelines designed for bulk RNA-seq data. Normalization is more straightforward and certain statistical approximations are more accurate […]
Collapsing cells into samples reflects the fact that our biological replication occurs at the sample level. Each sample is represented no more than once for each condition, avoiding problems from unmodelled correlations between samples. Supplying the per-cell counts directly to a DE analysis pipeline would imply that each cell is an independent biological replicate, which is not true from an experimental perspective.
Variance between cells within each sample is masked, provided it does not affect variance across (replicate) samples. This avoids penalizing DEGs that are not uniformly up- or down-regulated for all cells in all samples of one condition. Masking is generally desirable as DEGs - unlike marker genes - do not need to have low within-sample variance to be interesting, e.g., if the treatment effect is consistent across replicate populations but heterogeneous on a per-cell basis. (Of course, high per-cell variability will still result in weaker DE if it affects the variability across populations, while homogeneous per-cell responses will result in stronger DE due to a larger population-level log-fold change. These effects are also largely desirable.)
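The aggregation step behind these points can be illustrated with a minimal sketch. It is written in plain Python purely to show the idea and is not the Bioconductor implementation; the function name `pseudobulk`, the cluster and sample labels, and the toy counts are all hypothetical. Counts from every cell sharing the same (cluster, sample) pair are summed, so each sample contributes exactly one profile per cluster to the downstream bulk-style DE test.

```python
from collections import defaultdict


def pseudobulk(counts, cell_labels):
    """Sum per-cell counts into one profile per (cluster, sample) pair.

    counts: list of per-cell count vectors (one list of ints per cell).
    cell_labels: list of (cluster, sample) tuples, one per cell.
    """
    if len(counts) != len(cell_labels):
        raise ValueError("counts and cell_labels must have the same length")
    n_genes = len(counts[0]) if counts else 0
    profiles = defaultdict(lambda: [0] * n_genes)
    for cell, label in zip(counts, cell_labels):
        profile = profiles[label]
        for gene_index, value in enumerate(cell):
            profile[gene_index] += value
    return dict(profiles)


# Hypothetical toy data: 5 cells x 3 genes.
toy_counts = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 0, 4],
    [5, 2, 1],
    [1, 1, 1],
]
toy_labels = [
    ("T cell", "ctrl_1"),
    ("T cell", "ctrl_1"),
    ("T cell", "treat_1"),
    ("B cell", "ctrl_1"),
    ("B cell", "ctrl_1"),
]

toy_pseudobulk = pseudobulk(toy_counts, toy_labels)
for label, profile in sorted(toy_pseudobulk.items()):
    print(label, profile)
```

Five cells collapse into three pseudobulk profiles here. The unit of replication becomes the sample rather than the cell, which is the second justification quoted above.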

💡

Pseudobulk methods outperform non-pseudobulk methods when benchmarked.

Batch correction for pseudobulk

In both workflows, integration is used only to align cells for clustering; the corrected values are not used in downstream analyses. However, the OSCA workflow also adjusts for library size differences between samples with multiBatchNorm. Seurat does not have comparable functionality.

💡

multiBatchNorm protects against situations where, for example, a particular transcript is detected in one sample but not another due to systematically smaller library sizes.
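The idea of scaling every sample down to the lowest-coverage one can be sketched as follows. This is a deliberately simplified illustration in plain Python, not the multiBatchNorm algorithm itself: it assumes that mean library size is an adequate stand-in for a sample's coverage, and the function name `rescaled_size_factors` and the toy library sizes are hypothetical.

```python
import math


def rescaled_size_factors(library_sizes_by_batch):
    """Return per-cell size factors scaled to the lowest-coverage batch.

    library_sizes_by_batch: dict mapping batch name -> list of per-cell
    total counts. Assumption: mean library size approximates coverage.
    """
    batch_means = {
        batch: sum(sizes) / len(sizes)
        for batch, sizes in library_sizes_by_batch.items()
    }
    lowest = min(batch_means.values())
    factors = {}
    for batch, sizes in library_sizes_by_batch.items():
        # Step 1: within-batch size factors, centered at 1.
        within = [size / batch_means[batch] for size in sizes]
        # Step 2: inflate factors of deeper batches so that their
        # normalized counts are scaled down to the shallowest batch.
        scale = batch_means[batch] / lowest
        factors[batch] = [factor * scale for factor in within]
    return factors


def log_normalize(count, size_factor):
    """Log2-transform a size-factor-adjusted count with a pseudocount of 1."""
    return math.log2(count / size_factor + 1)


# Hypothetical library sizes: sample B is sequenced 4x deeper than sample A.
toy_factors = rescaled_size_factors({"A": [1000, 3000], "B": [4000, 12000]})
print(toy_factors)
```

In this toy case, a raw count of 8 in the second cell of sample B is divided by 6 rather than by 1.5, so it lands on the same scale as sample A before the log transform.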

Ambient expression

The OSCA handbook provides methods for dealing with ambient expression in multi-sample differential expression analyses.

Ambient expression arises when cells lyse and release their transcripts into the cell suspension; the extent of lysis can differ between samples. Seurat does not provide any recommendations for handling ambient expression.

💡

Example of problematic ambient expression:

RBCs lyse → high hemoglobin transcript levels in all droplets → hemoglobin appears differentially expressed relative to samples without this issue.

SCTransform downsides

SCTransform has potential and real downsides. In particular:

The transformed values from sctransform exhibit no relation to the original scale of the (log-)counts. This is not a problem for exploratory analyses but makes it difficult to interpret differential expression analyses […]

Time and memory

The Seurat SCTransform workflow generates a dense (non-sparse) expression matrix. As a result, integrating multiple samples is very slow and requires large amounts of RAM.
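A back-of-the-envelope estimate shows why density matters. The sketch below assumes 8-byte double-precision values and a compressed sparse column layout with 4-byte integer indices; the matrix dimensions and the number of non-zero entries are hypothetical, and the function names are for illustration only.

```python
def dense_bytes(n_genes, n_cells, bytes_per_value=8):
    """Memory for a dense matrix: every entry is stored."""
    return n_genes * n_cells * bytes_per_value


def sparse_csc_bytes(n_cells, nonzero_entries, bytes_per_value=8, bytes_per_index=4):
    """Memory for compressed sparse column storage.

    Each non-zero entry stores a value and a row index, and one column
    pointer is stored per cell plus one extra.
    """
    per_entry = bytes_per_value + bytes_per_index
    return nonzero_entries * per_entry + (n_cells + 1) * bytes_per_index


# Hypothetical dataset: 20,000 genes x 100,000 cells, 5% of entries non-zero.
n_genes, n_cells = 20_000, 100_000
nonzero_entries = 100_000_000

dense_total = dense_bytes(n_genes, n_cells)
sparse_total = sparse_csc_bytes(n_cells, nonzero_entries)
print(f"dense:  {dense_total / 1e9:.1f} GB")
print(f"sparse: {sparse_total / 1e9:.1f} GB")
```

Under these assumptions the dense matrix needs roughly 13 times more memory than the sparse one, and the gap widens as more samples are combined.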
