Testing core functionalities of the ACTIONet package
- Preparing the environment
- Preparation
- Running ACTIONet
- Running multi-level ACTION decomposition
- Prune nonspecific and/or unreliable archetypes
- Building ACTIONet graph
- Layout ACTIONet
- Identify equivalent classes of archetypes and group them together
- Use graph core of global and induced subgraphs to infer centrality/quality of each cell
- Re-normalize input (~gene expression) matrix and compute feature (~gene) specificity scores
- Compute gene specificity for each archetype
- Creating a trace of all that we have done
- Visualization
Here we provide an in-depth, step-by-step guide to familiarize you with the building blocks of ACTIONet. For compatibility, this tutorial uses the same PBMC 3k dataset that is used in the Seurat and Scanpy tutorials.
Preparing the environment
To start, you need to load the ACTIONet package.
For ACTIONet installation instructions, see ACTIONet.
Preparation
Importing data from a SingleCellExperiment object
We have preprocessed and stored the PBMC 3k dataset as an SCE object, which is the main data type used in the ACTIONet framework. For more information on how to convert other data types to the SingleCellExperiment format, please consult our tutorial on Bring your data along – Import/export options in ACTIONet.
class: SingleCellExperiment
dim: 13714 2638
metadata(0):
assays(1): logcounts
rownames(13714): AL627309.1 AP006222.2 ... PNRC2-1 SRSF10-1
rowData names(1): n.cells
colnames(2638): AAACATACAACCAC-1 AAACATTGAGCTAC-1 ... TTTGCATGAGAGGC-1
TTTGCATGCCTCAC-1
colData names(5): nFeatures_RNA percent.mito nCount_RNA louvain ident
reducedDimNames(3): PCA TSNE UMAP
spikeNames(0):
Reducing the SCE object
Next, we use the reduce.sce() function to both normalize the count matrix (library-size normalization followed by log-transformation, by default) and compute a factorized (reduced) form of the kernel matrix (with a reduced dimension of 50 by default). By default, no batch correction is performed, and the simplest depth-normalization method is used. We then save the output for future use.
[1] "Running main reduction"
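As a rough illustration of the default normalization described above, the following base-R sketch applies library-size normalization followed by a log-transformation to a toy count matrix. It does not call reduce.sce(). The toy counts, the variable names, and the choice of the median library size as the scaling factor are assumptions made for this example, and the scaling used internally by ACTIONet may differ.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); the values are made up.
counts <- matrix(c(10, 0, 5, 5,
                   2, 2, 0, 4,
                   0, 30, 10, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Library size = total counts per cell (column sums).
lib.size <- colSums(counts)

# Assumed scaling factor for this sketch: the median library size.
scale.factor <- median(lib.size)

# Divide every column by its library size, then rescale to a common depth.
normalized <- sweep(counts, 2, lib.size, "/") * scale.factor

# log(1 + x) keeps zeros at zero and compresses large values.
logcounts <- log1p(normalized)
```

After this step every cell has the same total normalized count, so differences in sequencing depth no longer dominate comparisons between cells.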
PS: ACTIONet includes 7 additional normalization methods (scran, linnorm, scone, SCnorm, DESeq2, TMM, and logCPM). However, the default method is the fastest and performs very well for ACTIONet construction.
PS: ACTIONet also integrates seamlessly with the Harmony batch-correction method. For more information on how to use batch correction, please consult To batch correct or not to batch correct, that is the question!.
Creating an ACTIONetExperiment (ACE) object to hold results
We have extended the SingleCellExperiment class to incorporate additional slots, similar to the AnnData format, to hold multi-dimensional and structural metadata for rows and columns. To construct an ACE object from an SCE object, we run
class: ACTIONetExperiment
dim: 13714 2638
metadata(1): reduction.time
assays(1): logcounts
rownames(13714): AL627309.1 AP006222.2 ... PNRC2-1 SRSF10-1
rowData names(1): n.cells
colnames(2638): AAACATACAACCAC-1 AAACATTGAGCTAC-1 ... TTTGCATGAGAGGC-1
TTTGCATGCCTCAC-1
colData names(5): nFeatures_RNA percent.mito nCount_RNA louvain ident
reducedDimNames(4): PCA TSNE UMAP S_r
spikeNames(0):
rowNets(0):
colNets(0):
rowFactors(0):
colFactors(0):
As can be seen, the new slots are rowNets, colNets, rowFactors, and colFactors. Each slot has its own getter/setter functions, which we will use throughout the rest of the tutorial to store results.
Running ACTIONet
Running multi-level ACTION decomposition
This runs the ACTION decomposition repeatedly with an increasing number of archetypes:
Prune nonspecific and/or unreliable archetypes
To remove unreliable archetypes as a first-pass filter, we can use the prune_archetypes() function:
pruning.out = prune_archetypes(ACTION.out$C, ACTION.out$H)
C_stacked = pruning.out$C_stacked
H_stacked = pruning.out$H_stacked
C_stacked and H_stacked are the concatenated C and H matrices across different levels, respectively, after pruning noisy archetypes.
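To make the idea of concatenating matrices across levels concrete, here is a minimal base-R sketch on random toy data. It assumes that each level's C matrix is cells x archetypes and each H matrix is archetypes x cells. That orientation, the variable names, and the toy sizes are assumptions for illustration, and the pruning performed by prune_archetypes() is not reproduced here.

```r
# Toy decomposition results for two levels (k = 2 and k = 3) on 4 cells.
# Assumption: each C is cells x archetypes and each H is archetypes x cells.
set.seed(1)
n.cells <- 4
C.levels <- list(matrix(runif(n.cells * 2), nrow = n.cells),
                 matrix(runif(n.cells * 3), nrow = n.cells))
H.levels <- list(matrix(runif(2 * n.cells), ncol = n.cells),
                 matrix(runif(3 * n.cells), ncol = n.cells))

# Concatenate archetypes from all levels: columns of C, rows of H.
C.toy.stacked <- do.call(cbind, C.levels)
H.toy.stacked <- do.call(rbind, H.levels)

dim(C.toy.stacked)  # 4 cells x 5 archetypes
dim(H.toy.stacked)  # 5 archetypes x 4 cells
```

In this sketch every archetype from every level is kept. In the actual workflow, archetypes judged unreliable are dropped before stacking, so the stacked matrices contain fewer archetypes than the plain concatenation.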
Building ACTIONet graph
To store the computed graph as a network associated with cells (columns) in the ACE object, we can use:
Layout ACTIONet
We have adopted and modified the SGD-based layout algorithm used in UMAP for our visualization. We use S_r for initialization:
initial.coordinates = t(scale(reducedDims(sce)[["S_r"]]))
vis.out = layoutNetwork(G, S_r = initial.coordinates, n_epochs = 500)
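The initialization line combines two base-R operations, which the following sketch illustrates on a toy matrix. Reduced dimensions in a SingleCellExperiment are stored with cells as rows, so scale() standardizes each reduced dimension across cells and t() then places cells in columns. The toy values and variable names are assumptions for illustration only.

```r
# Toy reduced representation: 5 cells (rows) x 2 reduced dimensions (columns).
S_r.toy <- matrix(c(1, 2, 3, 4, 5,
                    10, 20, 30, 40, 50),
                  ncol = 2,
                  dimnames = list(paste0("cell", 1:5), c("dim1", "dim2")))

# scale() centers each column to mean 0 and rescales it to unit standard deviation.
scaled <- scale(S_r.toy)

# t() transposes the matrix so that cells become columns.
initial.toy <- t(scaled)

dim(initial.toy)           # 2 dimensions x 5 cells
rowMeans(initial.toy)      # approximately 0 for every dimension
apply(initial.toy, 1, sd)  # 1 for every dimension
```

Standardizing first puts all reduced dimensions on a comparable scale, so no single dimension dominates the starting coordinates.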
We can then store the coordinates in reducedDims() and the de novo computed colors as attributes in colData():
Identify equivalent classes of archetypes and group them together
There is a large amount of redundancy among the archetypes. unify_archetypes() tries to coalesce these archetypes and partition them into equivalent classes.
unification.out = unify_archetypes(G, S_r, C_stacked, H_stacked, minPoints = 10,
minClusterSize = 10, outlier_threshold = 0.5)
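To give an intuition for what grouping redundant archetypes means, the sketch below clusters near-duplicate toy archetypes and averages each group. It deliberately uses correlation distance with average-linkage hierarchical clustering as a stand-in; this is not the algorithm implemented by unify_archetypes(), and the toy matrix, the 0.5 cut height, and all variable names are assumptions for illustration.

```r
# Toy archetype-by-cell loadings: archetypes 1-2 and 3-4 are near duplicates.
H.toy <- rbind(A1 = c(1.0, 1.0, 1.0, 0.0, 0.0, 0.0),
               A2 = c(0.9, 1.0, 0.8, 0.0, 0.1, 0.0),
               A3 = c(0.0, 0.0, 0.0, 1.0, 1.0, 1.0),
               A4 = c(0.0, 0.1, 0.0, 1.0, 0.9, 1.0))

# Distance between archetypes: 1 - Pearson correlation of their loadings.
arch.dist <- as.dist(1 - cor(t(H.toy)))

# Stand-in grouping step (NOT the method used by unify_archetypes()):
# average-linkage hierarchical clustering, cut at a distance of 0.5.
groups <- cutree(hclust(arch.dist, method = "average"), h = 0.5)

# Average the loadings within each group to obtain one profile per class.
H.toy.unified <- rowsum(H.toy, groups) / as.vector(table(groups))
```

Here the four toy archetypes collapse into two classes, each represented by the mean of its members.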
We then store the results back in the ACE object:
Use graph core of global and induced subgraphs to infer centrality/quality of each cell
We have developed a novel technique to estimate the quality of cells based on their strategic placement in the ACTIONet with respect to each archetype:
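For readers unfamiliar with graph cores, the following base-R sketch computes the classical unweighted k-core number (coreness) of each node in a small toy graph by repeatedly peeling off the node with the lowest remaining degree. Reading "graph core" as k-core coreness is an assumption on our part, the coreness() helper is hypothetical, and the ACTIONet routine may differ, for example by using edge weights and archetype-induced subgraphs.

```r
# Toy undirected graph: a triangle (1-2-3), a pendant node 4 attached to 3,
# and an isolated node 5.
adj <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # add the reverse direction to make the matrix symmetric

# Hypothetical helper: unweighted k-core decomposition by iterative peeling.
coreness <- function(adj) {
  n <- nrow(adj)
  deg <- rowSums(adj > 0)
  core <- numeric(n)
  remaining <- rep(TRUE, n)
  k <- 0
  while (any(remaining)) {
    # Remove the remaining node with the smallest current degree.
    idx <- which(remaining)
    v <- idx[which.min(deg[idx])]
    # The core number never decreases along the peeling order.
    k <- max(k, deg[v])
    core[v] <- k
    remaining[v] <- FALSE
    # Removing v lowers the degree of its remaining neighbors by one.
    nb <- which(adj[v, ] > 0 & remaining)
    deg[nb] <- deg[nb] - 1
  }
  core
}

coreness(adj)
```

Nodes embedded in densely connected regions receive higher core numbers than peripheral or isolated nodes, which is the intuition behind using graph cores as a centrality or quality signal.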
Re-normalize input (~gene expression) matrix and compute feature (~gene) specificity scores
Compute gene specificity for each archetype
This computes the “markerness” of each gene for the different archetypes:
# Compute specificity scores of every feature (gene) for every unified archetype
specificity.out = compute_archetype_feature_specificity(norm.out$S_norm, unification.out$H_unified)
# Label each returned matrix: rows are genes, columns are archetypes
specificity.out = lapply(specificity.out, function(specificity.scores) {
    rownames(specificity.scores) = rownames(ace)
    colnames(specificity.scores) = paste("A", 1:ncol(specificity.scores))
    return(specificity.scores)
})
# Store the archetype profiles and specificity scores as row (gene) factors
rowFactors(ace)[["archetype_gene_profile"]] = specificity.out[["archetypes"]]
rowFactors(ace)[["archetype_gene_specificity"]] = specificity.out[["upper_significance"]]
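Once a gene-by-archetype specificity matrix is available, a common next step is to list the top-scoring genes per archetype. The sketch below does this on a made-up matrix using base R only. The top.markers() helper, the toy scores, and the assumption that a larger score means a more specific gene are illustrative and not part of the ACTIONet API.

```r
# Toy specificity matrix: 5 genes (rows) x 2 archetypes (columns); scores are made up.
toy.specificity <- matrix(c(0.1, 5.0, 2.0, 0.0, 3.5,
                            4.0, 0.2, 0.1, 6.0, 0.3),
                          ncol = 2,
                          dimnames = list(paste0("gene", 1:5), c("A1", "A2")))

# Hypothetical helper: names of the n highest-scoring genes in every column.
top.markers <- function(scores, n = 2) {
  apply(scores, 2, function(x) names(sort(x, decreasing = TRUE))[seq_len(n)])
}

markers <- top.markers(toy.specificity)
markers
```

The result has one column per archetype and one row per rank, so markers[, "A1"] lists the two highest-scoring toy genes for the first archetype.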