Preprocessing
Preprocessing steps applied to every dataset that is added.
General
The OSCA handbook was used to guide preprocessing choices.
Pseudoalignment
kallisto v0.46.0 is used for pseudo-quantification with an index built using GRCh38 release 94.
kallisto bustools is up to 51 times faster than Cell Ranger and runs in constant memory.
Droplet processing
Empty droplets are detected using emptyDrops, which retains distinct cell types that a simple knee-point threshold on total counts would discard. Cell Ranger v3 also uses an adaptation of emptyDrops.
Feature selection
Highly variable genes are defined as the genes in the top 10% when ranked by the biological component of variance.
A priori genes of interest can also be specified by subsetting.
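As a minimal sketch of the selection rule described above, the following Python snippet keeps the top 10% of genes by biological variance and, separately, subsets to a list of genes of interest. It is an illustration only, not the pipeline's code: the function names, the `bio_var` mapping, and the toy values are all hypothetical, and it assumes a per-gene biological variance estimate has already been computed.

```python
import math


def top_variable_genes(bio_var, fraction=0.10):
    """Return the genes in the top `fraction` by biological variance."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Most variable first; ties are broken by gene name for determinism.
    ranked = sorted(bio_var, key=lambda g: (-bio_var[g], g))
    n_keep = max(1, math.ceil(fraction * len(ranked))) if ranked else 0
    return ranked[:n_keep]


def subset_genes(available, genes_of_interest):
    """Keep the requested genes that are present, in the order requested."""
    present = set(available)
    return [g for g in genes_of_interest if g in present]


# Hypothetical biological variance estimates for 20 genes.
bio_var = {f"gene{i}": float(i) for i in range(1, 21)}

hvgs = top_variable_genes(bio_var)  # 10% of 20 genes, so 2 genes
chosen = subset_genes(bio_var, ["gene3", "CD3E", "gene7"])  # CD3E is absent
```

Requested genes that are missing from the dataset are dropped silently here; a real workflow might prefer to warn about them.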
Dimensionality reduction and clustering
The top 30 principal components are used to detect clusters and generate UMAP plots.
Clustering uses the Leiden algorithm, and its resolution parameter can be adjusted.
Marker genes
Wilcoxon rank-sum tests, computed with presto, are used to rank up-regulated marker genes for each cluster.
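To illustrate the idea behind the rank-sum comparison, here is a minimal pure-Python sketch for a single gene: it pools the expression values of cells inside and outside a cluster, assigns midranks to ties, and derives the U statistic and the equivalent AUC. This is not presto's implementation; the function name, the toy expression values, and the choice to order genes by AUC are assumptions made for the example.

```python
def rank_sum_auc(in_cluster, rest):
    """Return (U, AUC) comparing in-cluster cells with all other cells."""
    n1, n2 = len(in_cluster), len(rest)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one cell")
    # Label 0 marks in-cluster cells, label 1 marks the remaining cells.
    pooled = sorted([(v, 0) for v in in_cluster] + [(v, 1) for v in rest])
    rank_sum = 0.0  # sum of the ranks held by in-cluster cells
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the average of ranks i+1..j.
        midrank = (i + 1 + j) / 2
        rank_sum += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum - n1 * (n1 + 1) / 2
    return u, u / (n1 * n2)


# Hypothetical expression values: (cells in the cluster, all other cells).
expr = {
    "geneA": ([3.0, 2.0, 2.5], [0.0, 0.0, 1.0, 2.0]),
    "geneB": ([1.0, 0.0, 1.0], [1.0, 2.0, 0.0, 1.0]),
}

u_a, auc_a = rank_sum_auc(*expr["geneA"])
# Genes with higher AUC are more strongly up-regulated in the cluster.
ranked = sorted(expr, key=lambda g: rank_sum_auc(*expr[g])[1], reverse=True)
```

An AUC near 1 means the gene is almost always higher in the cluster than elsewhere, 0.5 means no separation, and values below 0.5 indicate down-regulation.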