Why not Seurat?
Workflow choices that influenced our switch from Seurat SCTransform to the Bioconductor SingleCellExperiment ecosystem.
Pseudobulk differential expression
The SingleCellExperiment ecosystem provides utilities to run pseudo-bulk differential expression analyses per cluster when there are multiple control and test samples. The OSCA handbook provides the following justifications for pseudo-bulking:
- Larger counts are more amenable to standard DE analysis pipelines designed for bulk RNA-seq data. Normalization is more straightforward and certain statistical approximations are more accurate […]
- Collapsing cells into samples reflects the fact that our biological replication occurs at the sample level. Each sample is represented no more than once for each condition, avoiding problems from unmodelled correlations between samples. Supplying the per-cell counts directly to a DE analysis pipeline would imply that each cell is an independent biological replicate, which is not true from an experimental perspective.
- Variance between cells within each sample is masked, provided it does not affect variance across (replicate) samples. This avoids penalizing DEGs that are not uniformly up- or down-regulated for all cells in all samples of one condition. Masking is generally desirable as DEGs - unlike marker genes - do not need to have low within-sample variance to be interesting, e.g., if the treatment effect is consistent across replicate populations but heterogeneous on a per-cell basis. (Of course, high per-cell variability will still result in weaker DE if it affects the variability across populations, while homogeneous per-cell responses will result in stronger DE due to a larger population-level log-fold change. These effects are also largely desirable.)
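The aggregation step behind these arguments is simple: raw counts are summed over all cells that share a sample and a cluster label, giving one bulk-like profile per sample per cluster. The sketch below shows only that step, in Python, with a hypothetical `pseudobulk()` helper. In R, the OSCA workflow does this with Bioconductor functions such as scuttle's `aggregateAcrossCells()`, and this sketch is not a reimplementation of them. It assumes a dense genes x cells count array.

```python
import numpy as np


def pseudobulk(counts, sample, cluster):
    """Sum raw counts over cells sharing a (sample, cluster) label.

    counts:  genes x cells array of raw UMI counts
    sample:  length-n_cells sequence of sample IDs
    cluster: length-n_cells sequence of cluster IDs
    Returns (keys, summed): keys is a sorted list of (sample, cluster)
    pairs and summed is a genes x len(keys) matrix of summed counts.
    """
    counts = np.asarray(counts)
    sample = np.asarray(sample)
    cluster = np.asarray(cluster)
    # One output column per observed (sample, cluster) combination.
    keys = sorted(set(zip(sample.tolist(), cluster.tolist())))
    summed = np.zeros((counts.shape[0], len(keys)), dtype=counts.dtype)
    for j, (s, c) in enumerate(keys):
        mask = (sample == s) & (cluster == c)
        # Summing (not averaging) keeps the values on the count scale
        # expected by bulk RNA-seq DE tools.
        summed[:, j] = counts[:, mask].sum(axis=1)
    return keys, summed
```

Each column of the result would then be treated as one biological replicate in a bulk-style DE model fitted separately for each cluster.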
Batch correction for pseudobulk
In both workflows, integration is used only to align cells for clustering; the corrected values are not used in downstream analyses. However, the OSCA workflow also adjusts for differences in sequencing depth between samples (batches) with multiBatchNorm from the batchelor package. Seurat does not have comparable functionality.
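The idea behind cross-batch depth adjustment can be sketched in a few lines. Size factors are first centered at 1 within each batch. Each batch's factors are then rescaled so that its normalized counts match the average coverage of the least-sequenced batch. This is a simplified, hypothetical illustration. batchelor's `multiBatchNorm()` estimates the between-batch rescaling from the batches' average gene counts rather than from plain library sizes, and it returns log-transformed values. The helper name and the use of library-size factors are assumptions here.

```python
import numpy as np


def rescale_to_lowest_coverage(counts, batch):
    """Hypothetical, simplified cross-batch scaling normalization.

    counts: genes x cells array of raw counts
    batch:  length-n_cells sequence of batch (sample) labels
    Returns normalized counts in which every batch has the average
    coverage of the least-sequenced batch (before any log-transform).
    """
    counts = np.asarray(counts, dtype=float)
    batch = np.asarray(batch)
    lib_size = counts.sum(axis=0)
    size_factors = np.empty_like(lib_size)
    batch_mean = {}
    for b in np.unique(batch):
        in_b = batch == b
        batch_mean[b] = lib_size[in_b].mean()
        # Library-size factors centered at 1 within this batch only,
        # which on their own would hide coverage differences between batches.
        size_factors[in_b] = lib_size[in_b] / batch_mean[b]
    lowest = min(batch_mean.values())
    for b, mean_b in batch_mean.items():
        # Inflate the size factors of deeper batches so that their
        # normalized counts are scaled down to the lowest coverage.
        size_factors[batch == b] *= mean_b / lowest
    return counts / size_factors
```

Scaling down to the lowest-coverage batch, rather than up to the deepest one, avoids inflating the counts of shallowly sequenced samples.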
The OSCA handbook provides methods for dealing with ambient expression in multi-sample differential expression analyses.
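Ambient expression arises when lysed cells release RNA into the cell suspension. Because the amount of lysis can differ between samples, ambient RNA can create spurious differential expression. Seurat does not provide any recommendations for handling ambient expression.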
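One way to reason about ambient contamination in a pseudo-bulk profile is to ask how much of each gene's counts could, at most, come from the ambient pool. The sketch below is a crude, deterministic version of that idea, not a reimplementation of the OSCA handbook's methods. It estimates an ambient profile from empty droplets, scales that profile up as far as possible without exceeding any gene's observed count, and reports the resulting upper bound on the ambient fraction per gene. The helper names, and the assumption that empty-droplet counts are available as a genes x droplets array, are illustrative only. The OSCA approaches also account for sampling noise, which this sketch ignores.

```python
import numpy as np


def ambient_profile(empty_counts):
    """Ambient expression proportions from empty droplets (genes x droplets)."""
    totals = np.asarray(empty_counts, dtype=float).sum(axis=1)
    return totals / totals.sum()


def max_ambient_fraction(pseudobulk, profile):
    """Upper bound on the ambient-derived fraction of each gene's counts.

    The ambient profile is scaled by the largest factor that does not
    exceed the observed count of any gene with non-zero ambient
    expression; the scaled ambient counts are then divided by the
    observed counts.
    """
    pseudobulk = np.asarray(pseudobulk, dtype=float)
    profile = np.asarray(profile, dtype=float)
    present = profile > 0
    scale = np.min(pseudobulk[present] / profile[present])
    ambient = scale * profile
    with np.errstate(divide="ignore", invalid="ignore"):
        frac = np.where(pseudobulk > 0, ambient / pseudobulk, 0.0)
    return frac
```

Genes whose bound is close to 1 could be explained entirely by ambient RNA, so apparent DE in those genes deserves extra scrutiny.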
SCTransform also has drawbacks, both potential and actual. In particular:
- The transformed values from sctransform exhibit no relation to the original scale of the (log-)counts. This is not a problem for exploratory analyses but makes it difficult to interpret differential expression analyses […]
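A small numeric example makes the scale problem concrete. The sketch uses analytic Pearson residuals under a negative-binomial null model, with expected count mu = (gene total x cell total) / grand total and a fixed overdispersion theta = 100. This is a simplified stand-in for sctransform's regularized negative-binomial regression, not its actual algorithm, and the value of theta is an assumption. Two genes are each exactly 2-fold higher in the last two cells, and all cells have the same library size, so both genes have a log2 fold change of 1. Their between-group differences in residuals nevertheless differ several-fold, because residuals scale with expression level.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals under a negative-binomial null model.

    counts: genes x cells array. The expected count is
    mu = gene_total * cell_total / grand_total; theta is an assumed,
    fixed overdispersion parameter.
    """
    counts = np.asarray(counts, dtype=float)
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    return (counts - mu) / np.sqrt(mu + mu**2 / theta)


# Rows: low-expression gene, high-expression gene, filler gene.
# Columns: two control cells, then two treated cells; every cell totals 1202.
counts = np.array([
    [2, 2, 4, 4],
    [200, 200, 400, 400],
    [1000, 1000, 798, 798],
])
res = pearson_residuals(counts)
# Difference in mean residual between treated and control cells.
diff_low = res[0, 2:].mean() - res[0, :2].mean()
diff_high = res[1, 2:].mean() - res[1, :2].mean()
```

Here `diff_low` is roughly 1.1 and `diff_high` roughly 5.8, even though both genes change by exactly the same factor. A residual-scale "effect size" therefore cannot be read as a log-fold change.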
Time and memory
The Seurat SCTransform workflow produces a dense (non-sparse) matrix of Pearson residuals, because the residual for a zero count is generally non-zero. This makes integrating multiple samples slow and memory-intensive.
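Some back-of-the-envelope arithmetic shows why density matters. The sketch compares the storage needed for a dense float64 matrix with a compressed sparse row (CSR) layout: float64 values plus int32 column indices and row pointers. It also shows that centering a sparse count matrix, as any residual-style transform effectively does, turns zeros into non-zeros. The matrix dimensions and the 10% non-zero density in the usage lines are illustrative assumptions, not measurements from our data.

```python
import numpy as np


def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Bytes for a dense matrix (float64 by default)."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, n_cols, density, value_bytes=8, index_bytes=4):
    """Approximate bytes for CSR storage: values + column indices + row pointers."""
    nnz = int(round(n_rows * n_cols * density))
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


# Illustrative sizes: 3,000 genes x 50,000 cells, 10% non-zero counts.
dense = dense_bytes(3000, 50000)       # 1.2 GB
sparse = csr_bytes(3000, 50000, 0.1)   # about 0.18 GB

# Centering destroys sparsity: every zero becomes a non-zero value.
counts = np.array([[0, 0, 3], [1, 0, 0]], dtype=float)
centered = counts - counts.mean(axis=1, keepdims=True)
```

In this example the dense layout needs roughly 6.7 times the memory of the sparse one, and the gap grows as the counts get sparser or more cells are added.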