---
title: "SCOPRO"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{SCOPRO}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
% \VignetteDepends{CIARA}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=7,
fig.height=5
)
```
The vignette depends on CIARA packages.
```{r setup}
library(SCOPRO)
required <- c("CIARA")
if (!all(unlist(lapply(required, function(pkg) requireNamespace(pkg, quietly = TRUE)))))
knitr::opts_chunk$set(eval = FALSE)
```
In this vignette it is shown the projection performed between single cell RNA seq mouse data from [Iturbe et al., 2021](https://www.nature.com/articles/s41594-021-00590-w) and in vivo mouse datasets from [Deng et al. , 2014](https://pubmed.ncbi.nlm.nih.gov/24408435/) and [Mohammed et al. , 2017](https://www.sciencedirect.com/science/article/pii/S2211124717309610).
The single cell RNA seq dataset includes 1285 mouse embryonic stem cells, including a small cluster of 2-cell-like cells (2CLC) (cluster 2, 31 cells).
The in vivo mouse dataset from [Deng et al. , 2014](https://pubmed.ncbi.nlm.nih.gov/24408435/) includes stages from from early 2 cells-stage to late blastocyst while the in vivo mouse dataset from [Mohammed et al. , 2017](https://www.sciencedirect.com/science/article/pii/S2211124717309610) includes stages from from E4.5 to E6.5.
## Load mouse ESCs raw count matrix
We load the raw count matrix provided in the original paper and create norm counts and run cluster analysis with CIARA function **cluster_analysis_integrate_rare **
Raw count matrix can be downloaded [here](https://hmgubox2.helmholtz-muenchen.de/index.php/s/EHQSnjMJxkR7QYT)
```{r, eval = FALSE }
current_wd <- getwd()
url = "https://hmgubox2.helmholtz-muenchen.de/index.php/s/EHQSnjMJxkR7QYT/download/SCOPRO.zip"
destfile <- paste0(current_wd,"/SCOPRO.zip")
download.file(url, destfile, quiet = FALSE)
unzip(destfile, exdir=current_wd)
```
```{r, eval = FALSE }
setwd(paste0(current_wd,"/SCOPRO"))
load(file='mayra_dati_raw_0.Rda')
mayra_seurat_0=cluster_analysis_integrate_rare(mayra_dati_raw_0,"Mayra_data_0",0.1,5,30)
norm_es_vitro=as.matrix(GetAssayData(mayra_seurat_0, slot = "data",assay="RNA"))
cluster_es_vitro=as.vector(mayra_seurat_0$RNA_snn_res.0.1)
```
## Load in vivo mouse datasets
The seurat object **seurat_genes_published_mouse.Rda** already includes the raw and normalized count matrix obtained combining the two in vivo datasets ( [Deng et al., 014](https://pubmed.ncbi.nlm.nih.gov/24408435/) and [Mohammed et al. , 2017](https://www.sciencedirect.com/science/article/pii/S2211124717309610) ).
Normalization was done with Seurat function **NormalizeData** (default parameters).
Seurat object can be downloaded [here](https://hmgubox2.helmholtz-muenchen.de/index.php/s/EHQSnjMJxkR7QYT)
```{r , eval = FALSE}
setwd(paste0(current_wd,"/SCOPRO"))
load(file="seurat_genes_published_mouse.Rda")
norm_vivo <- as.matrix(GetAssayData(seurat_genes_published_mouse, slot = "data",assay="RNA"))
```
## Compute markers for selected in vivo stages
```{r, eval = FALSE}
DefaultAssay(seurat_genes_published_mouse) <- "RNA"
cluster_mouse_published <- as.vector(seurat_genes_published_mouse$stim)
relevant_stages <- c("Late_2_cell", "epiblast_4.5", "epiblast_5.5", "epiblast_6.5")
DefaultAssay(seurat_genes_published_mouse) <- "RNA"
markers_first_ESC_small <- CIARA::markers_cluster_seurat(seurat_genes_published_mouse[,cluster_mouse_published%in%relevant_stages],cluster_mouse_published[cluster_mouse_published%in%relevant_stages],names(seurat_genes_published_mouse$RNA_snn_res.0.2)[cluster_mouse_published%in%relevant_stages],10)
```
```{r,eval = FALSE }
markers_mouse <- as.vector(markers_first_ESC_small[[3]])
stages_markers <- names(markers_first_ESC_small[[3]])
## Keeping only the genes in common between in vitro and in vivo datasets
stages_markers <- stages_markers[markers_mouse %in% row.names(norm_es_vitro)]
markers_small <- markers_mouse[markers_mouse %in% row.names(norm_es_vitro)]
names(markers_small) <- stages_markers
```
## Select only black/white markers for in vivo stages
For each in vivo stage, we select only the markers for which the median is above 0.1 and is below 0.1 in all the other stages.
```{r ,eval = FALSE}
marker_result <- select_top_markers(relevant_stages, cluster_mouse_published, norm_vivo, markers_small, max_number = 100, threshold = 0.1)
marker_all <- marker_result[[1]]
marker_stages <- marker_result[[2]]
```
## Run SCOPRO
We run SCOPRO between the cluster of the mouse ESCs dataset and the in vivo stage "Late 2-cells".
The function SCOPRO first computes the mean expression profile of \emph{marker_stages_filter} genes for each cluster in the in vivo and in vitro dataset.
For a given cluster, a connectivity matrix is computed with number of rows and number of columns equal to the length of \emph{marker_stages_filter}. Each entry (i,j) in the matrix can be 1 if the fold_change between gene i and gene j is above \emph{fold_change}. Otherwise is 0.
Finally the connectivity matrix of Late 2-cells stage and all the clusters in the in vitro dataset are compared.
A gene i is considered to be conserved between Late 2-cells stage and an in vitro cluster if the jaccard index of the links of gene i is above \emph{threshold}.
There are 25 markers of the Late 2-cells stage that are also expressed in the mouse ESC datasets.
More than 75% of these 25 markers are conserved in the cluster number 2.
This result is expected since cluster 2 is made up by 2CLC, a rare population of cells known to be transcriptionally similar to the late 2 cells-stage in the mouse embryo development (typical markers of 2CLC are the Zscan4 genes, also highly expressed in the late 2 cells-stage).
```{r ,eval = FALSE}
marker_stages_filter <- filter_in_vitro(norm_es_vitro,cluster_es_vitro ,marker_all, fraction = 0.10, threshold = 0)
analysis_2cell <- SCOPRO(norm_es_vitro,norm_vivo,cluster_es_vitro,cluster_mouse_published,"Late_2_cell",marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1 ,marker_stages, relevant_stages)
#png("/Users/gabriele.lubatti/Downloads/SCOPRO_1.png")
plot_score(analysis_2cell, marker_stages, marker_stages_filter, relevant_stages, "Late_2_cell", "Final score", "Cluster", "Late_2_cell")
#dev.off()
```
## Visualization of conserved/ not conserved genes between late 2 cells stage and in vitro clusters
We can visualize which are the markers of the late 2 cells stage that are conserved/ not conserved in cluster 2.
As expected the Zscan4 family genes are conserved.
```{r,eval = FALSE}
common_genes <- select_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
no_common_genes <- select_no_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
all_genes <- c(no_common_genes[1:4], common_genes[1:10])
all_genes_label <- c(paste0(no_common_genes[1:4], "-no_conserved"), paste0(common_genes[1:10], "-conserved"))
rabbit_plot <- plot_score_genes(all_genes, "Mouse ESC", "Mouse vitro", norm_es_vitro,norm_vivo[ , cluster_mouse_published=="Late_2_cell"],cluster_es_vitro, cluster_mouse_published[cluster_mouse_published == "Late_2_cell"], all_genes_label, 7, 10, "Late_2_cell")
#png("/Users/gabriele.lubatti/Downloads/SCOPRO_2.png")
rabbit_plot
#dev.off()
```
```{r}
utils::sessionInfo()
```