Gene Set Enrichment Analysis
1. 介绍
请务必熟悉关于差异表达分析和分子变量的教程。要更深入地了解差异表达分析(DEA)的结果,可以查找在所提出的聚类中富集或基于组织学创建的基因集。SPATA2实现了hypeR包,该包使用超几何检验来检测富集的基因集。
# load required packages library(SPATA2) library(tidyverse)# load SPATA2 inbuilt example data #object_t269 <- loadExampleObject("UKF269T", process = TRUE, meta = TRUE) object_t269<- readRDS("object_t269.rds") object_t269 <- updateSpataObject(object_t269)# plot histology plotSurface(object_t269, color_by = "histology", pt_clrp = "npg")# plot bayes space cluster plotSurface(object_t269, color_by = "bayes_space", pt_clrp = "uc")2. 计算GSEA
函数 runGSEA() 执行计算。它需要 runDEA() 产生的结果,该函数用于执行差异表达分析。默认情况下,该函数使用 SPATA2 对象中保存的所有基因集。如果您不希望对某些基因集进行测试,可以提供一个子集化的基因集列表。默认示例对象 object_t269 是通过 initiateSpataObjectVisium() 创建的,其中包含多种类别的多个基因集(如 Hallmark、Biocarta 等),这些基因集均带有相应的前缀(如 HM 或 BC)。
gs_list <- getGeneSetList(object_t269) head(gs_list)## $BC_41BB_PATHWAY ## [1] "IFNG" "TNFSF9" "IL4" "NFKBIA" "TNFRSF9" "IL2" "MAP4K5" ## [8] "MAPK8" "MAP3K5" "CHUK" "MAPK14" "RELA" "TRAF2" "ATF2" ## [15] "IKBKB" "NFKB1" "MAP3K1" "JUN" ## ## $BC_ACE2_PATHWAY ## [1] "REN" "COL4A1" "AGT" "CMA1" "COL4A6" "ACE2" "ACE" "COL4A5" ## [9] "AGTR2" "AGTR1" "COL4A2" "COL4A3" "COL4A4" ## ## $BC_ACETAMINOPHEN_PATHWAY ## [1] "CYP1A2" "CYP2E1" "PTGS2" "PTGS1" "NR1I3" ## ## $BC_ACH_PATHWAY ## [1] "FASLG" "PIK3CG" "RAPSN" "PIK3R1" "FOXO3" "YWHAH" "CHRNG" "CHRNB1" ## [9] "BAD" "PTK2B" "TERT" "MUSK" "AKT1" "PIK3CA" ## ## $BC_ACTINY_PATHWAY ## [1] "ACTA1" "WASF1" "PSMA7" "NTRK1" "RAC1" "WASF2" "ABI2" "NCK1" ## [9] "PIR" "NCKAP1" "WASF3" "WASL" ## ## $BC_AGPCR_PATHWAY ## [1] "GRK4" "GNAS" "PRKACG" "PRKAR1A" "PRKAR2B" "PRKCB" "PRKACB" ## [8] "ARRB1" "PRKAR1B" "PRKCA" "PRKAR2A"length(gs_list)## [1] 11654为了清晰起见,本示例仅使用Biocarta(BC)和Hallmark(HM)基因集。
# subset gene sets that start with HM or BC gs_list_sub <- getGeneSetList(object_t269, class = c("BC", "HM")) length(gs_list_sub)## [1] 339如果只想在基因集的子集上运行GSEA,请使用参数签名。如果要针对SPATA2对象_t269中存储的所有基因集进行测试,请将其留空。
set.seed(1) sample(gs_list_sub, size = 3)## $HM_NOTCH_SIGNALING ## [1] "JAG1" "NOTCH3" "NOTCH2" "APH1A" "HES1" "CCND1" "FZD1" ## [8] "PSEN2" "FZD7" "DTX1" "DLL1" "FZD5" "MAML2" "NOTCH1" ## [15] "PSENEN" "WNT5A" "CUL1" "WNT2" "DTX4" "SAP30" "PPARD" ## [22] "KAT2A" "HEYL" "SKP1" "RBX1" "TCF7L2" "ARRB1" "LFNG" ## [29] "PRKCA" "DTX2" "ST3GAL6" "FBXW11" ## ## $BC_MEF2D_PATHWAY ## [1] "HDAC2" "HDAC1" "CALM1" "MEF2D" "CAPN2" "NFATC2" "PPP3CA" "PRKCB" ## [9] "CALM2" "PPP3CB" "CAPNS1" "CALM3" "CABIN1" "EP300" "PRKCA" "NFATC1" ## [17] "CAPNS2" "PPP3CC" ## ## $BC_HSWI_SNF_PATHWAY ## [1] "SMARCD1" "NF1" "NR3C1" "SMARCC2" "SMARCA4" "ARID1A" "SMARCC1" ## [8] "ACTB" "SMARCB1" "TBP" "GTF2A1" "SMARCE1"object_t269 <- runGSEA( object = object_t269, across = "histology", background = 21563, signatures = gs_list_sub )3. 提取结果
可以通过以下函数手动提取结果。getGseaResults() 会提取一个 hypeR 对象列表(每个组对应一个对象)。getGseaResultsDf() 会提取一个数据框,该数据框是将所有 hypeR 对象的数据合并后的结果。每个基因集所属的组别信息保存在根据分组变量命名的变量/列中——此处为 histology(组织学类型)。
getGseaDf( object = object_t269, across = "histology", method_de = "wilcox", n_gsets = 3# extract top 20 most significant gene sets )## # A tibble: 9 × 10 ## # Groups: histology [3] ## histology label pval fdr signature geneset overlap background hits ## <fct> <fct> <dbl> <dbl> <int> <int> <int> <dbl> <chr> ## 1 tumor HM_E… 6.80e-47 7.90e-43 5082 200 143 21563 AURK… ## 2 tumor HM_I… 7.70e-39 4.50e-35 5082 196 132 21563 STAT… ## 3 tumor HM_G… 5.20e-34 2 e-30 5082 196 126 21563 AURK… ## 4 transition BP.G… 1.3 e-10 1.20e- 6 1192 856 94 21563 TANK… ## 5 transition CC.G… 2 e-10 1.20e- 6 1192 914 98 21563 PARP… ## 6 transition MF.G… 3.20e-10 1.20e- 6 1192 200 36 21563 CHD1… ## 7 infiltrated CC.G… 1.90e-61 2.3 e-57 1950 1332 315 21563 CDH8… ## 8 infiltrated BP.G… 2.90e-47 1.70e-43 1950 733 198 21563 CDH8… ## 9 infiltrated CC.G… 4.6 e-38 1.80e-34 1950 1317 266 21563 HDAC… ## # ℹ 1 more variable: overlap_perc <dbl>4. Plotting results
基因集富集结果可以通过点图进行可视化。这可以通过设置 by_group = TRUE 按组进行,也可以合并所有组的结果进行展示。图2展示了聚类0和聚类5的富集情况,突出了该区域与缺氧相关的活性。
plotGseaDotPlot( object = object_t269, across = "histology", across_subset = c("tumor", "infiltrated"), n_gsets = 10, by_group = TRUE, transform_with = list(fdr = log10), nrow = 2 )使用 by_group = FALSE 时,结果会合并到一个图表中。
plotGseaDotPlot( object = object_t269, across = "histology", color_by = "histology", n_gsets = 7, pt_alpha = 0.8, transform_with = list(fdr = log10), by_group = FALSE# merge in one plot )