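Preface
Hello everyone, I'm "我才不吃蛋黄", an apprentice at 生信技能树. This is part seven of our series reproducing the gastric cancer single-cell dataset GSE163558. In part six, we used the genes differentially expressed between gastric cancer and normal gastric tissue in the TCGA database to define a malignant and a non-malignant score for every epithelial cell. In this part, we analyze the marker genes of the malignant epithelial subgroups G0-G4 and draw heatmaps and violin plots. We also use the AddModuleScore_UCell function to compute proliferation and migration scores for the cells.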
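1. Background
Cell proliferation is a defining feature of living organisms. Cells proliferate by dividing, and this underlies growth, development, reproduction, and inheritance. In the early stages of multistep tumor progression, tumor cells proliferate without restraint at the primary site, and a strong proliferative capacity is one of the key hallmarks of a highly malignant tumor cell. Besides proliferating at the primary site, a tumor can also metastasize, that is, grow in organs distant from its site of origin. Metastasis is the final and most lethal manifestation of most tumors.

This dataset contains several gastric cancer metastasis samples: 10 fresh human tissue samples from 6 patients, comprising 3 primary tumor samples (PT), 1 adjacent non-tumor sample (NT), and 6 metastasis samples (M). The metastasis samples consist of 2 liver metastases (Li), 2 lymph node metastases (LN), 1 peritoneal metastasis (P), and 1 ovarian metastasis (O).

Distant metastasis proceeds roughly as follows: the tumor cells become more invasive; they break through blood or lymphatic vessels at the primary site and enter the circulation; they leave the vessels again to colonize a distant organ; and finally they proliferate there. Along the way, tumor cells take on different phenotypes and interact with the surrounding immune and stromal cells in the tumor microenvironment, which supports tumor growth and helps the tumor cells evade immune surveillance.

To break through tissue and vascular endothelium more effectively, tumor cells undergo epithelial-mesenchymal transition (EMT), in which epithelial cells shift toward a mesenchymal phenotype. During EMT both morphology and function change. Morphologically, cells go from a polygonal or cobblestone shape to an elongated spindle shape. Functionally, they lose cell polarity, remodel the cytoskeleton, lose cell-cell adhesion, and gain invasive motility. Together these changes reduce adhesion between cells and enhance migration.

Computing proliferation and migration scores with AddModuleScore_UCell can help us assess the proliferative capacity, metastatic potential, and degree of malignancy of the different malignant epithelial subgroups.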
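2. Data analysis
2.1 Enrichment analysis
First clear the R workspace, set the working directory, load the R packages, and read in the Seurat object of malignant epithelial cells:
rm(list=ls())  # removes all objects from the workspace
getwd()
setwd('6-TCGA_STAD/')
library(tidyverse)
library(tinyarray)        # double_enrich()
library(data.table)
library(Seurat)
library(clusterProfiler)  # bitr(), used in the enrichment loop
library(org.Hs.eg.db)     # human gene annotation for bitr()
library(pheatmap)         # heatmaps in section 2.2
scRNA = readRDS('malignant.rds')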
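Use Seurat's built-in FindMarkers function, with G1 as the control, to find the marker genes of each of the other malignant epithelial subgroups (G0, G2, G3, G4):
head(scRNA@meta.data)
Idents(scRNA) = scRNA$celltype
ct = levels(scRNA@active.ident)   # all subgroups present in the object
ct1 = c("G3", "G2", "G4", "G0")   # subgroups to compare against G1
all_markers = lapply(ct1, function(x){
  print(x)
  # The original code passed ident.2 = ct[5], which is G1 only if G1 happens to be
  # the fifth factor level; naming the control explicitly avoids that dependency.
  markers <- FindMarkers(scRNA, group.by = "celltype",
                         logfc.threshold = 0.1,
                         ident.1 = x, ident.2 = "G1")
  return(markers)
})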
Name the elements of the all_markers list, then keep the differentially expressed genes that satisfy p_val_adj < 0.01:
length(all_markers)
names(all_markers) = ct1
lapply(all_markers, nrow)   # number of genes tested per comparison
all_markers_sig = lapply(all_markers, function(x){
  subset(x, p_val_adj < 0.01)
})
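To make the filtering step concrete without needing the Seurat object, here is a minimal sketch on toy data. The data frames, gene names, and the helper filter_sig are made up for illustration; they only mimic the shape of FindMarkers output (genes as row names, a p_val_adj column).

```r
# Toy stand-ins for FindMarkers() output: one data frame per comparison,
# genes as row names, with an adjusted p-value column. All values are invented.
toy_markers <- list(
  G3 = data.frame(avg_log2FC = c(2.5, -0.3, 1.1),
                  p_val_adj  = c(0.001, 0.20, 0.009),
                  row.names  = c("GENE_A", "GENE_B", "GENE_C")),
  G2 = data.frame(avg_log2FC = c(-2.2, 0.4),
                  p_val_adj  = c(0.0005, 0.5),
                  row.names  = c("GENE_A", "GENE_D"))
)

# Hypothetical helper: keep rows below the adjusted p-value cutoff.
# drop = FALSE keeps the result a data frame even if one row remains.
filter_sig <- function(markers, cutoff = 0.01) {
  markers[markers$p_val_adj < cutoff, , drop = FALSE]
}

toy_sig <- lapply(toy_markers, filter_sig)
n_sig <- sapply(toy_sig, nrow)   # significant genes per comparison
print(n_sig)
```

Because the list is named, the names carry through lapply, so each filtered table stays labeled with its comparison.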
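Run enrichment analysis on the differentially expressed genes. For each comparison, genes with avg_log2FC > 2 are labeled as up-regulated and genes with avg_log2FC < -2 as down-regulated, and the loop draws KEGG and GO enrichment bar plots for both directions:
plot = list()
for (i in seq_along(all_markers_sig)){
  deg = all_markers_sig[[i]]
  # label the direction of change; genes with |avg_log2FC| <= 2 stay 'unknown'
  deg$change = 'unknown'
  deg$change[deg$avg_log2FC > 2] = 'up'
  deg$change[deg$avg_log2FC < -2] = 'down'
  print(table(deg$change))
  # map gene symbols to Entrez IDs; symbols without a mapping are dropped
  entrezIDs = bitr(rownames(deg), fromType = "SYMBOL", toType = "ENTREZID",
                   OrgDb = "org.Hs.eg.db", drop = TRUE)
  marker_new = deg[rownames(deg) %in% entrezIDs$SYMBOL,]
  # align the ID table with the row order of marker_new (first match per symbol)
  entrezIDs = entrezIDs[match(rownames(marker_new), entrezIDs$SYMBOL),]
  marker_new$ENTREZID = entrezIDs$ENTREZID
  # double_enrich() from tinyarray expects the 'change' and 'ENTREZID' columns
  plot[[i]] <- double_enrich(marker_new, n = 5)
}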
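The up/down labeling inside the loop is easy to check on its own. The sketch below uses an invented data frame and a hypothetical helper named label_change; only the avg_log2FC column name and the cutoff of 2 come from the code above.

```r
# Hypothetical helper: label genes by direction of change.
# Genes whose |avg_log2FC| does not exceed the cutoff stay "unknown".
label_change <- function(deg, lfc_cutoff = 2) {
  deg$change <- ifelse(deg$avg_log2FC >  lfc_cutoff, "up",
                ifelse(deg$avg_log2FC < -lfc_cutoff, "down", "unknown"))
  deg
}

# Invented fold changes for four toy genes
toy_deg <- data.frame(avg_log2FC = c(3.1, -2.4, 0.5, 1.9),
                      row.names  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"))

labelled <- label_change(toy_deg)
print(labelled)
print(table(labelled$change))
```

Printing the table of labels before running the enrichment is a cheap sanity check: if almost everything is "unknown", the fold-change cutoff is probably too strict for that comparison.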
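plot is a list holding the enrichment bar plots for all comparisons, and each element can be pulled out and inspected separately. For example, to view the GO enrichment bar plot (G3_G1$gp) for the genes differentially expressed in G3 relative to G1 (plot[[1]]):
G3_G1 = plot[[1]]
G3_G1$gp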
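2.2 Gene expression heatmap
Here we downsample to at most 300 cells per subgroup and draw a heatmap of the average expression of 'CD63', 'CLDN4', 'EGR1', and other genes:
Idents(scRNA)
scRNA1 = scRNA
scRNA1 <- ScaleData(scRNA1, features = rownames(scRNA1))
# downsample keeps at most 300 cells per identity class, which keeps the plot readable
cells = subset(scRNA1, downsample = 300)
gene_order <- c('CD63','CLDN4','EGR1','SRGN','VIM','LAPTM5','AGR2','MT1C','S100A6','PLCG2','SAT1','TSPYL2')
cells2 <- cells[rownames(cells) %in% gene_order,]
# average expression per subgroup; as.matrix() guards against a sparse-matrix return value
df <- as.data.frame(as.matrix(AverageExpression(object = cells2)$RNA))
df <- df[gene_order, ]
df = na.omit(df)   # drop genes that are absent from the object
# pheatmap does not return a ggplot object, so ggsave() would not capture it;
# the PDF is written through pheatmap's own filename argument instead
pheatmap(df, cluster_rows = FALSE, cluster_cols = FALSE,
         show_colnames = TRUE, scale = "row",
         gaps_row = seq(3, 11, 3), gaps_col = 1:4,
         filename = 'gene_heatmap.pdf', width = 12, height = 8)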
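The scale = "row" option is what makes genes with very different expression levels comparable in one heatmap: each row is centered on its mean and divided by its standard deviation. A minimal base R sketch of that row-wise z-score, on an invented matrix of average expression values:

```r
# Invented average-expression matrix: genes in rows, subgroups in columns
toy_avg <- matrix(c( 1,  2,  3,
                    10, 10, 40),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("GENE_A", "GENE_B"),
                                  c("G0", "G1", "G2")))

# scale() works column-wise, so transpose, scale, and transpose back
row_z <- t(scale(t(toy_avg)))
print(round(row_z, 2))
```

After scaling, every row has mean 0 and standard deviation 1, so the colors show in which subgroups a gene is relatively high or low, not its absolute expression level.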
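2.3 CD44 expression violin plot
Data preparation:
First extract the expression matrix raw.data from scRNA:
# as.matrix() makes a dense copy of the counts; this can use a lot of memory on large objects
raw.data = as.matrix(scRNA@assays$RNA$counts)
raw.data[1:6,1:6]
length(colnames(raw.data))   # number of cells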
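Extract the CD44 expression values into a data.frame:
a = scRNA@assays$RNA$data          # normalized expression
b = as.data.frame(a['CD44',])
colnames(b) = 'CD44'
identical(rownames(b), colnames(a))  # should be TRUE: one row per cell, in the same order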
Replace the column names of raw.data with the celltype labels, then assign raw.data to data:
colnames(raw.data) = scRNA$celltype
library(ggcorrplot)
library(ggthemes)   # theme_few()
data = raw.data
table(colnames(data))   # number of cells per subgroup
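Create the data frame dat, with one row per cell holding its CD44 expression and its subgroup:
# the expression column is named CD44 explicitly, because the plotting code refers to it by that name
dat = data.frame(CD44 = b$CD44, group = colnames(data))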
Convert dat$group to a factor with its levels ordered G0-G4, inspect the columns with str, and drop NA values with na.omit. With the data ready, draw the violin plot with ggplot:
dat$group = factor(dat$group, levels = c("G0","G1","G2","G3","G4"))
str(dat$group)
str(dat$CD44)
dat = na.omit(dat)
# mycolors is not defined in the original post; this five-color palette is a
# placeholder with one color per subgroup, so substitute any palette you like
mycolors = c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F")
p = ggplot(data = dat, mapping = aes(x = group, y = CD44)) +
  geom_violin(scale = "width", adjust = 1, trim = TRUE, mapping = aes(fill = group)) +
  theme_few() +
  scale_fill_manual(values = mycolors) +
  geom_jitter(width = 0.35, size = 1.1, color = "black") +  # overlay cells as points; tune width and size as needed
  theme(axis.text.x = element_text(size = 20),
        axis.text.y = element_text(size = 20)) +
  labs(x = "", y = "Expression Level", title = "CD44") +
  theme(plot.title = element_text(hjust = 0.5, size = 25)) +
  NoLegend()
p
Set a reference group and add significance marks (asterisks):
library(ggpubr)
p + stat_compare_means(method = "anova", label.y = 3.5) +   # global p-value
  stat_compare_means(label = "p.signif", method = "t.test",
                     ref.group = "G0")                       # each group compared against the reference G0
Choose the comparisons yourself and add significance marks (p-values):
compaired <- list(c("G0", "G1"), c("G1", "G2"), c("G2", "G3"), c("G3", "G4"))
p + stat_compare_means(comparisons = compaired, method = "t.test")
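Choose the comparisons yourself and add significance marks (asterisks):
library(ggsignif)   # provides geom_signif()
p + geom_signif(comparisons = compaired, step_increase = 0.1,
                map_signif_level = TRUE, test = "t.test")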
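To see what the pairwise marks on the plots correspond to, the comparisons can be reproduced by hand with base R's t.test. This is a sketch on simulated data, not the CD44 values; the group means, the sample sizes, and the star cutoffs (0.0001, 0.001, 0.01, 0.05) are assumptions chosen for illustration, and the p-values are not corrected for multiple testing.

```r
set.seed(1)
# Simulated expression values for three toy groups (30 cells each)
toy <- data.frame(
  group = factor(rep(c("G0", "G1", "G2"), each = 30),
                 levels = c("G0", "G1", "G2")),
  value = c(rnorm(30, mean = 1), rnorm(30, mean = 1.2), rnorm(30, mean = 3))
)

pairs <- list(c("G0", "G1"), c("G1", "G2"))

# One Welch t-test per pair, restricted to the cells of the two groups
p_values <- sapply(pairs, function(pr) {
  sub <- droplevels(toy[toy$group %in% pr, ])
  t.test(value ~ group, data = sub)$p.value
})
names(p_values) <- sapply(pairs, paste, collapse = "_vs_")

# Map p-values to star labels using the assumed cutoffs
stars <- as.character(cut(p_values,
                          breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
                          labels = c("****", "***", "**", "*", "ns"),
                          include.lowest = TRUE))
print(data.frame(comparison = names(p_values), p = p_values, label = stars))
```

With thousands of cells per subgroup, even tiny differences reach very small p-values, so it is worth reading the violin shapes alongside the stars.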
Heatmap of CD44-related genes (same approach as in section 2.2, reusing the downsampled object cells):
gene_order <- c('CD44','PROM1','LGR5','SOX2','TFRC','CXCR4','JAG1')
cells2 <- cells[rownames(cells) %in% gene_order,]
# as.matrix() guards against a sparse-matrix return value
df <- as.data.frame(as.matrix(AverageExpression(object = cells2)$RNA))
df <- df[gene_order, ]
df = na.omit(df)   # drop genes that are absent from the object
pheatmap(df, cluster_rows = FALSE, cluster_cols = FALSE,
         show_colnames = TRUE, scale = "row")
2.4 Proliferation and migration scores
First, following the original paper, define the proliferation and migration gene sets: proliferation = c('MKI67','IGF1','ITGB2','PDGFC','JAG1','PHGDH') and migration = c('VIM','SNAI1','MMP9','AREG','ARID5B','FAT1'). Then compute the scores with the AddModuleScore_UCell function:
library(UCell)
proliferation = c('MKI67','IGF1','ITGB2','PDGFC','JAG1','PHGDH')
migration = c('VIM','SNAI1','MMP9','AREG','ARID5B','FAT1')
# gather the gene sets into a named list; the names become the prefixes of the
# score columns (proliferation_UCell, migration_UCell) added to the metadata
marker <- list(proliferation = proliferation, migration = migration)
score <- AddModuleScore_UCell(scRNA, features = marker)
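UCell scores are rank-based: within each cell, genes are ranked by expression, and a signature scores high when its genes sit near the top of that ranking. The sketch below is a simplified re-implementation of that idea as I understand it from the UCell description (a Mann-Whitney U statistic with ranks capped at a maximum rank, rescaled so that higher is better). It is not the package's code; the function name ucell_like_score, the toy matrix, and the gene names are all made up, and max_rank is set to 5 only because the toy matrix has 6 genes (the package default is much larger).

```r
# Simplified, hypothetical rank-based score in the spirit of UCell.
# expr: genes x cells matrix; signature: character vector of gene names.
ucell_like_score <- function(expr, signature, max_rank = 5) {
  genes <- intersect(signature, rownames(expr))
  n <- length(genes)
  apply(expr, 2, function(cell) {
    r <- rank(-cell, ties.method = "average")  # rank 1 = highest expression
    r <- pmin(r, max_rank + 1)                 # cap ranks beyond max_rank
    u <- sum(r[genes]) - n * (n + 1) / 2       # Mann-Whitney U for the signature
    1 - u / (n * max_rank)                     # rescale: 1 = signature at the very top
  })
}

# Invented expression values: 6 genes x 3 cells, no ties within a cell
toy_expr <- matrix(c(9, 5, 1,
                     8, 4, 2,
                     3, 9, 7,
                     2, 8, 8,
                     1, 1, 9,
                     0, 0, 6),
                   nrow = 6, byrow = TRUE,
                   dimnames = list(c("SIG1", "SIG2", "OTHER1",
                                     "OTHER2", "OTHER3", "OTHER4"),
                                   c("cell1", "cell2", "cell3")))

toy_scores <- ucell_like_score(toy_expr, signature = c("SIG1", "SIG2"))
print(toy_scores)
```

In cell1 the two signature genes are the two most highly expressed genes, so the score is at its maximum; in cell3 they are the two lowest, so the score is close to the bottom of the range. Because only ranks within a cell are used, the score does not depend on how the other cells in the dataset look.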
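Prepare the data frame data holding the score for each cell (same approach as for CD44), then draw the violin plot. The code below plots the proliferation score; to plot the migration score, use score$migration_UCell and change the axis label:
raw.data = as.matrix(score@assays$RNA$counts)
raw.data[1:6,1:6]
length(colnames(raw.data))
a = score$proliferation_UCell   # per-cell proliferation score
colnames(raw.data) = score$celltype
library(ggcorrplot)
library(ggthemes)
data = raw.data
table(colnames(data))
b = data.frame(expression = a, group = colnames(data))
data = as.data.frame(b)
# fixed: the original set dat$group here, leaving data$group unordered
data$group = factor(data$group, levels = c("G0","G1","G2","G3","G4"))
str(data$group)
str(data$expression)
data = na.omit(data)
# palette placeholder (mycolors is not defined in the original post): one color per subgroup
mycolors = c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F")
p = ggplot(data = data, mapping = aes(x = group, y = expression)) +
  geom_violin(scale = "width", adjust = 1, trim = TRUE, mapping = aes(fill = group)) +
  theme_few() +
  scale_fill_manual(values = mycolors) +
  geom_jitter(width = 0.35, size = 1.1, color = "black") +  # overlay cells as points; tune width and size as needed
  theme(axis.text.x = element_text(size = 20),
        axis.text.y = element_text(size = 20)) +
  labs(x = "", y = "Proliferation score") +
  theme(plot.title = element_text(hjust = 0.5, size = 25)) +
  NoLegend()
p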
Choose the comparisons yourself, then add significance marks (p-values):
compaired <- list(c("G0", "G1"), c("G1", "G2"), c("G2", "G3"), c("G3", "G4"))
p + stat_compare_means(comparisons = compaired, method = "t.test")
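Closing remarks
In this part, we analyzed the marker genes of the malignant epithelial subgroups G0-G4, drew heatmaps and violin plots, and used the AddModuleScore_UCell function to compute proliferation and migration scores for the cells. In the next part, we move on to advanced single-cell analysis and use monocle2 for pseudo-time analysis. There is plenty more to come, so stay tuned, and thank you for reading!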