Title: | Cluster Independent Algorithm for Rare Cell Types Identification |
---|---|
Description: | Identification of markers of rare cell types by looking at genes whose expression is confined in small regions of the expression space <https://github.com/ScialdoneLab>. |
Authors: | Gabriele Lubatti |
Maintainer: | Gabriele Lubatti <[email protected]> |
License: | Artistic-2.0 |
Version: | 0.1.0 |
Built: | 2024-10-29 04:34:12 UTC |
Source: | https://github.com/cran/CIARA |
It selects highly localized genes as specified in CIARA_gene, starting from genes in background
CIARA( norm_matrix, knn_matrix, background, cores_number = 1, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE )
CIARA( norm_matrix, knn_matrix, background, cores_number = 1, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE )
norm_matrix |
Norm count matrix (n_genes X n_cells). |
knn_matrix |
K-nearest neighbors matrix (n_cells X n_cells). |
background |
Vector of genes for which the function CIARA_gene is run. |
cores_number |
Integer.Number of cores to use. |
p_value |
p value returned by the function fisher.test with parameter alternative = "g" |
odds_ratio |
odds_ratio returned by the function fisher.test with parameter alternative = "g" |
local_region |
Integer. Minimum number of local regions (cell with its knn neighbours) where the binarized gene expression is enriched in 1. |
approximation |
Logical.For a given gene, the fisher test is run in the local regions of only the cells where the binarized gene expression is 1. |
Dataframe with n_rows equal to the length of background . Each row is the output from CIARA_gene.
Gabriele Lubatti [email protected]
The gene expression is binarized (1/0) if the value in a given cell is above/below the median. Each of cell with its first K nearest neighbors defined a local region. If there are at least local_region enriched in 1 according to fisher.test, then the gene is defined as highly localized and a final p value is assigned to it. The final p value is the minimum of the p values from all the enriched local regions. If there are no enriched local regions, then the p value by default is set to 1
CIARA_gene( norm_matrix, knn_matrix, gene_expression, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE )
CIARA_gene( norm_matrix, knn_matrix, gene_expression, p_value = 0.001, odds_ratio = 2, local_region = 1, approximation = FALSE )
norm_matrix |
Norm count matrix (n_genes X n_cells). |
knn_matrix |
K-nearest neighbors matrix (n_cells X n_cells). |
gene_expression |
numeric vector with the gene expression (length equal to n_cells). The gene expression is binarized (equal to 0/1 in the cells where the value is below/above the median) |
p_value |
p value returned by the function fisher.test with parameter alternative = "g" |
odds_ratio |
odds_ratio returned by the function fisher.test with parameter alternative = "g" |
local_region |
Integer. Minimum number of local regions (cell with its knn neighbours) where the binarized gene expression is enriched in 1. |
approximation |
Logical.For a given gene, the fisher test is run in the local regions of only the cells where the binarized gene expression is 1. |
List with one element corresponding to the p value of the gene.
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/fisher.test
cluster_analysis_integrate_rare
cluster_analysis_integrate_rare( raw_counts, project_name, resolution, neighbors, max_dimension, feature_genes = NULL )
cluster_analysis_integrate_rare( raw_counts, project_name, resolution, neighbors, max_dimension, feature_genes = NULL )
raw_counts |
Raw count matrix (n_genes X n_cells). |
project_name |
Character name of the Seurat project. |
resolution |
Numeric value specifying the parameter resolution used in the Seurat function FindClusters. |
neighbors |
Numeric value specifying the parameter k.param in the Seurat function FindNeighbors |
max_dimension |
Numeric value specifying the maximum number of the PCA dimensions used in the parameter dims for the Seurat function FindNeighbors |
feature_genes |
vector of features specifying the argument features in the Seurat function RunPCA. |
Seurat object including raw and normalized counts matrices, UMAP coordinates and cluster result.
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindClusters https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindNeighbors https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/RunPCA
cluster_analysis_sub
cluster_analysis_sub( raw_counts, resolution, neighbors, max_dimension, name_cluster )
cluster_analysis_sub( raw_counts, resolution, neighbors, max_dimension, name_cluster )
raw_counts |
Raw count matrix (n_genes X n_cells). |
resolution |
Numeric value specifying the parameter resolution used in the Seurat function FindClusters. |
neighbors |
Numeric value specifying the parameter k.param in the Seurat function FindNeighbors |
max_dimension |
Numeric value specifying the maximum number of the PCA dimensions used in the parameter dims for the Seurat function FindNeighbors |
name_cluster |
Character.Name of the original cluster for which the sub clustering is done. |
Seurat object including raw and normalized counts matrices and cluster result.
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/RunPCA https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindVariableFeatures
find_resolution
find_resolution(seurat_object, resolution_vector)
find_resolution(seurat_object, resolution_vector)
seurat_object |
Seurat object as returned by cluster_analysis_integrate_rare |
resolution_vector |
vector with all values of resolution for which the Seurat function FindClusters is run |
Clustree object showing the connection between clusters obtained at different level of resolution as specified in resolution_vector.
Gabriele Lubatti [email protected]
https://CRAN.R-project.org/package=clustree
get_background_full
get_background_full( norm_matrix, threshold = 1, n_cells_low = 3, n_cells_high = 20 )
get_background_full( norm_matrix, threshold = 1, n_cells_low = 3, n_cells_high = 20 )
norm_matrix |
Norm count matrix (n_genes X n_cells). |
threshold |
threshold in expression for a given gene |
n_cells_low |
minimum number of cells where a gene is expressed at a level above threshold |
n_cells_high |
maximum number of cells where a gene is expressed at a level above threshold |
Character vector with all genes expressed at a level higher than threshold in a number of cells between n_cells and n_cells_high.
Gabriele Lubatti [email protected]
The Seurat function FindMarkers is used to identify general marker for each cluster (specific cluster vs all other cluster). This list of markers is then filtered keeping only the genes that appear as markers in a unique cluster.
markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)
markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)
seurat_object |
Seurat object as returned by cluster_analysis_sub or by cluster_analysis_integrate_rare. |
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
cell_names |
Vector of length equal to the number of cells, with cell names. |
number_top |
Integer. Number of top marker genes to keep for each cluster. |
List of three elements. The first is a vector with number_top marker genes for each cluster. The second is a vector with number_top marker genes and corresponding cluster. The third element is a vector with all marker genes for each cluster.
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/Seurat/versions/4.0.1/topics/FindMarkers
merge_cluster
merge_cluster(old_cluster, new_cluster, max_number = NULL)
merge_cluster(old_cluster, new_cluster, max_number = NULL)
old_cluster |
original cluster assignment that need to be updated |
new_cluster |
new cluster assignment that need to be integrated with old_cluster. |
max_number |
Threshold in size for clusters in new_cluster. Only cluster with number of cells smaller than max_number will be integrated in old cluster. If max_number is NULL, then all the clusters in new_cluster are integrated in old cluster. |
Numeric vector of length equal to old_cluster showing the merged cluster assignment between old cluster and new_cluster.
Gabriele Lubatti [email protected]
plot_balloon_marker
plot_balloon_marker( norm_counts, cluster, marker_complete, max_number, max_size = 5, text_size = 7 )
plot_balloon_marker( norm_counts, cluster, marker_complete, max_number, max_size = 5, text_size = 7 )
norm_counts |
Norm count matrix (genes X cells). |
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
marker_complete |
Third element of the output list as returned by the function markers_cluster_seurat |
max_number |
Integer. Maximum number of markers for each cluster for which we want to plot the expression. |
max_size |
Integer. Size of the dots to be plotted. |
text_size |
Size of the text in the heatmap plot. |
ggplot2 object showing balloon plot.
Gabriele Lubatti [email protected]
Cells are coloured according to the expression of gene_id and plotted according to coordinate_umap.
plot_gene(norm_counts, coordinate_umap, gene_id, title_name)
plot_gene(norm_counts, coordinate_umap, gene_id, title_name)
norm_counts |
Norm count matrix (genes X cells). |
coordinate_umap |
Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells |
gene_id |
Character name of the gene. |
title_name |
Character name. |
ggplot2 object.
Gabriele Lubatti [email protected]
https://CRAN.R-project.org/package=ggplot2
The sum of each gene in genes_relevant across all cells is first normalized to 1. Then for each cell, the sum from the (normalized) genes expression is computed and shown in the output plot.
plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)
plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)
coordinate_umap |
Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells |
norm_counts |
Norm count matrix (genes X cells). |
genes_relevant |
Vector with gene names for which we want to visualize the sum in each cell. |
name_title |
Character value. |
ggplot2 object.
Gabriele Lubatti [email protected]
https://CRAN.R-project.org/package=ggplot2
plot_heatmap_marker
plot_heatmap_marker( marker_top, marker_all_cluster, cluster, condition, norm_counts, text_size )
plot_heatmap_marker( marker_top, marker_all_cluster, cluster, condition, norm_counts, text_size )
marker_top |
First element returned by markers_cluster_seurat |
marker_all_cluster |
Second element returned by markers_cluster_seurat |
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
condition |
Vector or length equal to the number of cells, specifying the condition of the cells (i.e. batch, dataset of origin..) |
norm_counts |
Norm count matrix (genes X cells). |
text_size |
Size of the text in the heatmap plot. |
Heatmap class object.
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/ComplexHeatmap/versions/1.10.2/topics/Heatmap
It shows in an interactive plot which are the highly localized genes in each cell. It is based on plotly library
plot_interactive( coordinate_umap, color, text, min_x = NULL, max_x = NULL, min_y = NULL, max_y = NULL )
plot_interactive( coordinate_umap, color, text, min_x = NULL, max_x = NULL, min_y = NULL, max_y = NULL )
coordinate_umap |
Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells |
color |
vector of length equal to n_rows in coordinate_umap.Each cell will be coloured following a gradient according to the corresponding value of this vector. |
text |
Character vector specifying the highly localized genes in each cell. It is the output from selection_localized_genes. |
min_x |
Set the min limit on the x axis. |
max_x |
Set the max limit on the x axis. |
min_y |
Set the min limit on the y axis. |
max_y |
Set the min limit on the y axis. |
plotly object given by plot_ly function (from library plotly).
Gabriele Lubatti [email protected]
plot_umap
plot_umap(coordinate_umap, cluster)
plot_umap(coordinate_umap, cluster)
coordinate_umap |
Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells |
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
ggplot2 object.
Gabriele Lubatti [email protected]
https://CRAN.R-project.org/package=ggplot2
selection_localized_genes
selection_localized_genes( norm_counts, localized_genes, min_number_cells = 4, max_number_genes = 10 )
selection_localized_genes( norm_counts, localized_genes, min_number_cells = 4, max_number_genes = 10 )
norm_counts |
Norm count matrix (genes X cells). |
localized_genes |
vector of highly localized genes as provided by the last element of the list given as output from CIARA_mixing_final. |
min_number_cells |
Minimum number of cells where a genes must be expressed (> 0). |
max_number_genes |
Maximum number of genes to show for each cell in the interactive plot from plot_interactive. |
Character vector where each entry contains the name of the top max_number_genes for the corresponding cell.
Gabriele Lubatti [email protected]
For each cluster in cluster, HVGs are defined with Seurat function FindVariableFeatures. A Fisher test is performed to see if there is a statistically significant enrichment between the top number_hvg and the localized_genes
test_hvg( raw_counts, cluster, localized_genes, background, number_hvg, min_p_value )
test_hvg( raw_counts, cluster, localized_genes, background, number_hvg, min_p_value )
raw_counts |
Raw count matrix (n_genes X n_cells). |
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
localized_genes |
Character vector with localized genes detected by CIARA. |
background |
Character vector with all the genes names to use as background for the Fisher test. |
number_hvg |
Integer value. Number of top HVGs provided by the Seurat function FindVariableFeatures. |
min_p_value |
Threshold on p values provided by Fisher test. |
A list with two elements.
first element |
The first one is a list with length equal to the number of clusters. Each entry is list of three elements. The first two elements contain the p value and the odds ration given by the Fisher test The third is a vector with genes names that are present both in localized_genes and in top number_hvg HVGs . |
second element |
a character vector with the name of the cluster that have a p value smaller than min_p_value. |
Gabriele Lubatti [email protected]
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/fisher.test
A white-marker is a gene whose median expression across cells belong to single_cluster is greater than threshold and in all the other clusters is equal to zero.
white_black_markers( cluster, single_cluster, norm_counts, marker_list, threshold = 0 )
white_black_markers( cluster, single_cluster, norm_counts, marker_list, threshold = 0 )
cluster |
Vector of length equal to the number of cells, with cluster assignment. |
single_cluster |
Character. Label of one specify cluster |
norm_counts |
Norm count matrix (genes X cells). |
marker_list |
Third element of the output list as returned by the function markers_cluster_seurat |
threshold |
Numeric. The median of the genes across cells belong to single_cluster has to be greater than threshold in order to be consider as a white-black marker for single_cluster |
Logical vector of length equal to marker_list, with TRUE/FALSE if the gene is/is not a white-black marker for single_cluster.
Gabriele Lubatti [email protected]