Package 'CIARA' reference manual

Title:	Cluster Independent Algorithm for Rare Cell Types Identification
Description:	Identification of markers of rare cell types by looking at genes whose expression is confined in small regions of the expression space <https://github.com/ScialdoneLab>.
Authors:	Gabriele Lubatti
Maintainer:	Gabriele Lubatti <[email protected]>
License:	Artistic-2.0
Version:	0.1.0
Built:	2025-01-27 04:27:41 UTC
Source:	https://github.com/cran/CIARA

CIARA

Description

It selects highly localized genes as specified in CIARA_gene, starting from genes in background

Usage

CIARA(
  norm_matrix,
  knn_matrix,
  background,
  cores_number = 1,
  p_value = 0.001,
  odds_ratio = 2,
  local_region = 1,
  approximation = FALSE
)
CIARA(
  norm_matrix,
  knn_matrix,
  background,
  cores_number = 1,
  p_value = 0.001,
  odds_ratio = 2,
  local_region = 1,
  approximation = FALSE
)

Arguments

`norm_matrix`	Norm count matrix (n_genes X n_cells).
`knn_matrix`	K-nearest neighbors matrix (n_cells X n_cells).
`background`	Vector of genes for which the function CIARA_gene is run.
`cores_number`	Integer.Number of cores to use.
`p_value`	p value returned by the function fisher.test with parameter alternative = "g"
`odds_ratio`	odds_ratio returned by the function fisher.test with parameter alternative = "g"
`local_region`	Integer. Minimum number of local regions (cell with its knn neighbours) where the binarized gene expression is enriched in 1.
`approximation`	Logical.For a given gene, the fisher test is run in the local regions of only the cells where the binarized gene expression is 1.

Value

Dataframe with n_rows equal to the length of background . Each row is the output from CIARA_gene.

Author(s)

Gabriele Lubatti [email protected]

The gene expression is binarized (1/0) if the value in a given cell is above/below the median. Each of cell with its first K nearest neighbors defined a local region. If there are at least local_region enriched in 1 according to fisher.test, then the gene is defined as highly localized and a final p value is assigned to it. The final p value is the minimum of the p values from all the enriched local regions. If there are no enriched local regions, then the p value by default is set to 1

Usage

CIARA_gene(
  norm_matrix,
  knn_matrix,
  gene_expression,
  p_value = 0.001,
  odds_ratio = 2,
  local_region = 1,
  approximation = FALSE
)
CIARA_gene(
  norm_matrix,
  knn_matrix,
  gene_expression,
  p_value = 0.001,
  odds_ratio = 2,
  local_region = 1,
  approximation = FALSE
)

Arguments

`norm_matrix`	Norm count matrix (n_genes X n_cells).
`knn_matrix`	K-nearest neighbors matrix (n_cells X n_cells).
`gene_expression`	numeric vector with the gene expression (length equal to n_cells). The gene expression is binarized (equal to 0/1 in the cells where the value is below/above the median)
`p_value`	p value returned by the function fisher.test with parameter alternative = "g"
`odds_ratio`	odds_ratio returned by the function fisher.test with parameter alternative = "g"
`local_region`	Integer. Minimum number of local regions (cell with its knn neighbours) where the binarized gene expression is enriched in 1.
`approximation`	Logical.For a given gene, the fisher test is run in the local regions of only the cells where the binarized gene expression is 1.

Value

List with one element corresponding to the p value of the gene.

Author(s)

Gabriele Lubatti [email protected]

cluster_analysis_integrate_rare

Description

cluster_analysis_integrate_rare

Usage

cluster_analysis_integrate_rare(
  raw_counts,
  project_name,
  resolution,
  neighbors,
  max_dimension,
  feature_genes = NULL
)
cluster_analysis_integrate_rare(
  raw_counts,
  project_name,
  resolution,
  neighbors,
  max_dimension,
  feature_genes = NULL
)

Arguments

`raw_counts`	Raw count matrix (n_genes X n_cells).
`project_name`	Character name of the Seurat project.
`resolution`	Numeric value specifying the parameter resolution used in the Seurat function FindClusters.
`neighbors`	Numeric value specifying the parameter k.param in the Seurat function FindNeighbors
`max_dimension`	Numeric value specifying the maximum number of the PCA dimensions used in the parameter dims for the Seurat function FindNeighbors
`feature_genes`	vector of features specifying the argument features in the Seurat function RunPCA.

Value

Seurat object including raw and normalized counts matrices, UMAP coordinates and cluster result.

Author(s)

Gabriele Lubatti [email protected]

cluster_analysis_sub

Description

cluster_analysis_sub

Usage

cluster_analysis_sub(
  raw_counts,
  resolution,
  neighbors,
  max_dimension,
  name_cluster
)
cluster_analysis_sub(
  raw_counts,
  resolution,
  neighbors,
  max_dimension,
  name_cluster
)

Arguments

`raw_counts`	Raw count matrix (n_genes X n_cells).
`resolution`	Numeric value specifying the parameter resolution used in the Seurat function FindClusters.
`neighbors`	Numeric value specifying the parameter k.param in the Seurat function FindNeighbors
`max_dimension`	Numeric value specifying the maximum number of the PCA dimensions used in the parameter dims for the Seurat function FindNeighbors
`name_cluster`	Character.Name of the original cluster for which the sub clustering is done.

Value

Seurat object including raw and normalized counts matrices and cluster result.

Author(s)

Gabriele Lubatti [email protected]

find_resolution

Description

find_resolution

Usage

find_resolution(seurat_object, resolution_vector)
find_resolution(seurat_object, resolution_vector)

Arguments

`seurat_object`	Seurat object as returned by cluster_analysis_integrate_rare
`resolution_vector`	vector with all values of resolution for which the Seurat function FindClusters is run

Value

Clustree object showing the connection between clusters obtained at different level of resolution as specified in resolution_vector.

Author(s)

Gabriele Lubatti [email protected]

get_background_full

Description

get_background_full

Usage

get_background_full(
  norm_matrix,
  threshold = 1,
  n_cells_low = 3,
  n_cells_high = 20
)
get_background_full(
  norm_matrix,
  threshold = 1,
  n_cells_low = 3,
  n_cells_high = 20
)

Arguments

`norm_matrix`	Norm count matrix (n_genes X n_cells).
`threshold`	threshold in expression for a given gene
`n_cells_low`	minimum number of cells where a gene is expressed at a level above threshold
`n_cells_high`	maximum number of cells where a gene is expressed at a level above threshold

Value

Character vector with all genes expressed at a level higher than threshold in a number of cells between n_cells and n_cells_high.

Author(s)

Gabriele Lubatti [email protected]

markers_cluster_seurat

Description

The Seurat function FindMarkers is used to identify general marker for each cluster (specific cluster vs all other cluster). This list of markers is then filtered keeping only the genes that appear as markers in a unique cluster.

Usage

markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)
markers_cluster_seurat(seurat_object, cluster, cell_names, number_top)

Arguments

`seurat_object`	Seurat object as returned by cluster_analysis_sub or by cluster_analysis_integrate_rare.
`cluster`	Vector of length equal to the number of cells, with cluster assignment.
`cell_names`	Vector of length equal to the number of cells, with cell names.
`number_top`	Integer. Number of top marker genes to keep for each cluster.

Value

List of three elements. The first is a vector with number_top marker genes for each cluster. The second is a vector with number_top marker genes and corresponding cluster. The third element is a vector with all marker genes for each cluster.

Author(s)

Gabriele Lubatti [email protected]

merge_cluster

Description

merge_cluster

Usage

merge_cluster(old_cluster, new_cluster, max_number = NULL)
merge_cluster(old_cluster, new_cluster, max_number = NULL)

Arguments

`old_cluster`	original cluster assignment that need to be updated
`new_cluster`	new cluster assignment that need to be integrated with old_cluster.
`max_number`	Threshold in size for clusters in new_cluster. Only cluster with number of cells smaller than max_number will be integrated in old cluster. If max_number is NULL, then all the clusters in new_cluster are integrated in old cluster.

Value

Numeric vector of length equal to old_cluster showing the merged cluster assignment between old cluster and new_cluster.

Author(s)

Gabriele Lubatti [email protected]

plot_balloon_marker

Description

plot_balloon_marker

Usage

plot_balloon_marker(
  norm_counts,
  cluster,
  marker_complete,
  max_number,
  max_size = 5,
  text_size = 7
)
plot_balloon_marker(
  norm_counts,
  cluster,
  marker_complete,
  max_number,
  max_size = 5,
  text_size = 7
)

Arguments

`norm_counts`	Norm count matrix (genes X cells).
`cluster`	Vector of length equal to the number of cells, with cluster assignment.
`marker_complete`	Third element of the output list as returned by the function markers_cluster_seurat
`max_number`	Integer. Maximum number of markers for each cluster for which we want to plot the expression.
`max_size`	Integer. Size of the dots to be plotted.
`text_size`	Size of the text in the heatmap plot.

Value

ggplot2 object showing balloon plot.

Author(s)

Gabriele Lubatti [email protected]

plot_gene

Description

Cells are coloured according to the expression of gene_id and plotted according to coordinate_umap.

Usage

plot_gene(norm_counts, coordinate_umap, gene_id, title_name)
plot_gene(norm_counts, coordinate_umap, gene_id, title_name)

Arguments

`norm_counts`	Norm count matrix (genes X cells).
`coordinate_umap`	Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells
`gene_id`	Character name of the gene.
`title_name`	Character name.

Value

ggplot2 object.

Author(s)

Gabriele Lubatti [email protected]

plot_genes_sum

Description

The sum of each gene in genes_relevant across all cells is first normalized to 1. Then for each cell, the sum from the (normalized) genes expression is computed and shown in the output plot.

Usage

plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)
plot_genes_sum(coordinate_umap, norm_counts, genes_relevant, name_title)

Arguments

`coordinate_umap`	Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells
`norm_counts`	Norm count matrix (genes X cells).
`genes_relevant`	Vector with gene names for which we want to visualize the sum in each cell.
`name_title`	Character value.

Value

ggplot2 object.

Author(s)

Gabriele Lubatti [email protected]

plot_heatmap_marker

Description

plot_heatmap_marker

Usage

plot_heatmap_marker(
  marker_top,
  marker_all_cluster,
  cluster,
  condition,
  norm_counts,
  text_size
)
plot_heatmap_marker(
  marker_top,
  marker_all_cluster,
  cluster,
  condition,
  norm_counts,
  text_size
)

Arguments

`marker_top`	First element returned by markers_cluster_seurat
`marker_all_cluster`	Second element returned by markers_cluster_seurat
`cluster`	Vector of length equal to the number of cells, with cluster assignment.
`condition`	Vector or length equal to the number of cells, specifying the condition of the cells (i.e. batch, dataset of origin..)
`norm_counts`	Norm count matrix (genes X cells).
`text_size`	Size of the text in the heatmap plot.

Value

Heatmap class object.

Author(s)

Gabriele Lubatti [email protected]

plot_interactive

Description

It shows in an interactive plot which are the highly localized genes in each cell. It is based on plotly library

Usage

plot_interactive(
  coordinate_umap,
  color,
  text,
  min_x = NULL,
  max_x = NULL,
  min_y = NULL,
  max_y = NULL
)
plot_interactive(
  coordinate_umap,
  color,
  text,
  min_x = NULL,
  max_x = NULL,
  min_y = NULL,
  max_y = NULL
)

Arguments

`coordinate_umap`	Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells
`color`	vector of length equal to n_rows in coordinate_umap.Each cell will be coloured following a gradient according to the corresponding value of this vector.
`text`	Character vector specifying the highly localized genes in each cell. It is the output from selection_localized_genes.
`min_x`	Set the min limit on the x axis.
`max_x`	Set the max limit on the x axis.
`min_y`	Set the min limit on the y axis.
`max_y`	Set the min limit on the y axis.

Value

plotly object given by plot_ly function (from library plotly).

Author(s)

Gabriele Lubatti [email protected]

plot_umap

Description

plot_umap

Usage

plot_umap(coordinate_umap, cluster)
plot_umap(coordinate_umap, cluster)

Arguments

`coordinate_umap`	Data frame with dimensionality reduction coordinates. Number of rows must be equal to the number of cells
`cluster`	Vector of length equal to the number of cells, with cluster assignment.

Value

ggplot2 object.

Author(s)

Gabriele Lubatti [email protected]

selection_localized_genes

Description

selection_localized_genes

Usage

selection_localized_genes(
  norm_counts,
  localized_genes,
  min_number_cells = 4,
  max_number_genes = 10
)
selection_localized_genes(
  norm_counts,
  localized_genes,
  min_number_cells = 4,
  max_number_genes = 10
)

Arguments

`norm_counts`	Norm count matrix (genes X cells).
`localized_genes`	vector of highly localized genes as provided by the last element of the list given as output from CIARA_mixing_final.
`min_number_cells`	Minimum number of cells where a genes must be expressed (> 0).
`max_number_genes`	Maximum number of genes to show for each cell in the interactive plot from plot_interactive.

Value

Character vector where each entry contains the name of the top max_number_genes for the corresponding cell.

Author(s)

Gabriele Lubatti [email protected]

test_hvg

Description

For each cluster in cluster, HVGs are defined with Seurat function FindVariableFeatures. A Fisher test is performed to see if there is a statistically significant enrichment between the top number_hvg and the localized_genes

Usage

test_hvg(
  raw_counts,
  cluster,
  localized_genes,
  background,
  number_hvg,
  min_p_value
)
test_hvg(
  raw_counts,
  cluster,
  localized_genes,
  background,
  number_hvg,
  min_p_value
)

Arguments

`raw_counts`	Raw count matrix (n_genes X n_cells).
`cluster`	Vector of length equal to the number of cells, with cluster assignment.
`localized_genes`	Character vector with localized genes detected by CIARA.
`background`	Character vector with all the genes names to use as background for the Fisher test.
`number_hvg`	Integer value. Number of top HVGs provided by the Seurat function FindVariableFeatures.
`min_p_value`	Threshold on p values provided by Fisher test.

Value

A list with two elements.

`first element`	The first one is a list with length equal to the number of clusters. Each entry is list of three elements. The first two elements contain the p value and the odds ration given by the Fisher test The third is a vector with genes names that are present both in localized_genes and in top number_hvg HVGs .
`second element`	a character vector with the name of the cluster that have a p value smaller than min_p_value.

Author(s)

Gabriele Lubatti [email protected]

white_black_markers

Description

A white-marker is a gene whose median expression across cells belong to single_cluster is greater than threshold and in all the other clusters is equal to zero.

Usage

white_black_markers(
  cluster,
  single_cluster,
  norm_counts,
  marker_list,
  threshold = 0
)
white_black_markers(
  cluster,
  single_cluster,
  norm_counts,
  marker_list,
  threshold = 0
)

Arguments

`cluster`	Vector of length equal to the number of cells, with cluster assignment.
`single_cluster`	Character. Label of one specify cluster
`norm_counts`	Norm count matrix (genes X cells).
`marker_list`	Third element of the output list as returned by the function markers_cluster_seurat
`threshold`	Numeric. The median of the genes across cells belong to single_cluster has to be greater than threshold in order to be consider as a white-black marker for single_cluster

Value

Logical vector of length equal to marker_list, with TRUE/FALSE if the gene is/is not a white-black marker for single_cluster.

Author(s)

Gabriele Lubatti [email protected]

Package 'CIARA'

Help Index

CIARA

Description

Usage

Arguments

Value

Author(s)

CIARA_gene

Description

Usage

Arguments

Value

Author(s)

See Also

cluster_analysis_integrate_rare

Description

Usage

Arguments

Value

Author(s)

See Also

cluster_analysis_sub

Description

Usage

Arguments

Value

Author(s)

See Also

find_resolution

Description

Usage

Arguments

Value

Author(s)

See Also

get_background_full

Description

Usage

Arguments

Value

Author(s)

markers_cluster_seurat

Description

Usage

Arguments

Value

Author(s)

See Also

merge_cluster

Description

Usage

Arguments

Value

Author(s)

plot_balloon_marker

Description

Usage

Arguments

Value

Author(s)

plot_gene

Description

Usage

Arguments

Value

Author(s)

See Also

plot_genes_sum

Description

Usage

Arguments

Value

Author(s)

See Also

plot_heatmap_marker

Description

Usage

Arguments

Value