GEPIA 3

Help

Introduction to GEPIA

The GEPIA series has provided robust and widely used tools for analyzing gene expression data derived from TCGA, GTEx and ICGC, as well as cell line drug screen data. GEPIA enabled comprehensive studies of differential expression, survival analysis, and gene correlation.

Quick Start

Enter a gene/isoform symbol or gene/isoform ID (Ensembl ID) in the "Enter gene name" field, and click the "GoPIA!" button to search for the gene of interest.

Expression Calculation in GEPIA3

All functions were based on the human reference genome hg38. GEPIA3 used RNA-seq gene or transcript expression in TCGA tumor, TCGA peritumor and GTEx normal tissues. The expression values were TPM (Transcripts Per Million) and expected counts calculated by RSEM from the UCSC Toil RNA-seq Recompute. Cell line expression was calculated from CCLE RNA-seq samples: we downloaded RSEM-calculated TPM (log2 transformed) from DepMap 22Q1. The gene and isoform biotype annotations were based on GENCODE v46. Our analysis framework lets users optionally apply normalization (log2 transformation) to the input, with two exceptions: (a) DESeq2 only accepts expected counts as input; (b) statistical methods requiring normality assumptions for expression analysis. We highly recommend using log2 transformed TPM as input in most common cases. Boxplots and scatter plots with expression values used log2 transformed TPM, as marked in the axis labels.
Geneset: all expression-based functions in GEPIA3 support the use of multiple genes as input, and then compute the signature score using the first principal component value derived from Principal Component Analysis (PCA) of the expression levels of the input genes.
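As an illustration of the geneset signature score described above, the sketch below computes first-principal-component scores with power iteration in plain Python. It is not GEPIA3's code: the function name first_pc_scores, the sample-by-gene input layout, and the sign convention (loadings oriented so that they sum to a positive value) are assumptions made for this example.

```python
import math

def first_pc_scores(expr, n_iter=200):
    """Return one signature score per sample: the projection onto the first PC.

    expr: list of samples, each a list of expression values (for example
    log2(TPM+1)) for the input genes, in the same gene order.
    """
    n, p = len(expr), len(expr[0])
    # Center each gene so that PCA works on deviations from the gene mean.
    means = [sum(row[j] for row in expr) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in expr]
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration from a fixed, non-uniform start vector (assumption:
    # the start is not orthogonal to the leading eigenvector).
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all genes constant: no direction of variance
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; fix it for readability.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]

# Three samples, two genes that rise together.
scores = first_pc_scores([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

In this toy input both genes increase from the first to the last sample, so the scores increase in the same order.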

The update from GEPIA2 to GEPIA3

GEPIA3 introduces substantial updates, including modules for drug sensitivity analysis that link gene expression profiles with therapeutic outcomes. Additionally, it incorporates advanced network analysis frameworks, leveraging eQTL, co-mutation, and protein interaction data to map regulatory landscapes. Comprehensive tools for studying multiple types of RNA alterations have also been added, enabling the investigation of diverse transcriptomic modifications. These innovations establish GEPIA3 as an indispensable resource for integrative and translational cancer genomics research.

Drug Analysis

TCGA Drug Response Analysis

This function enables users to compare gene/geneset expression level between patients with different drug responses annotated in TCGA.

Parameters

Gene: Input one or a group of genes of interest.
Methods: Select ANOVA or Wilcoxon for differential analysis.
log2 transformed: Select whether to use log2(TPM+1) as expression abundance.
Datasets Selection: Select one or multiple cancer types of interest in the "Dataset Selection" field and click "add" to build dataset list in the "Datasets" field.
Groups: Choose which groups to compare.

Results

The table shows the results of differential expression analysis for gene expression levels between patients with different drug responses.

TCGA Drug Treatment vs Survival

This function allows users to compare the relationship between gene expression and survival in patients with and without a specific drug treatment.

Parameters

Gene: Input a gene/isoform or geneset of interest.
Methods: Select the OS or PFS survival response.
Axis Units: Select Month or Day unit for plotting.
Datasets Selection: Select one or multiple cancer types of interest in the "Dataset Selection" field and click "add" to build dataset list in the "Datasets" field.
Colors: Choose color to plot KM curves
Group Cutoff: Select a suitable expression threshold for splitting the high-expression and low-expression cohorts.
Cutoff-High(%): Samples with expression level higher than this threshold are considered as the high-expression cohort.
Cutoff-Low(%): Samples with expression level lower than this threshold are considered the low-expression cohort.
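To make the Cutoff-High(%) and Cutoff-Low(%) parameters concrete, here is a minimal sketch of a percentile-based cohort split. The function names, the linear-interpolation percentile, and the strict comparisons are assumptions for illustration; the document does not state how GEPIA3 computes the thresholds or handles ties.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    pos = (q / 100) * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def split_cohorts(expression, cutoff_high=50, cutoff_low=50):
    """Split samples into high- and low-expression cohorts.

    expression: dict mapping a sample ID to its expression value.
    Samples above the cutoff_high percentile form the high cohort and
    samples below the cutoff_low percentile form the low cohort; samples
    in between (or exactly on a threshold) are left out of both.
    """
    values = sorted(expression.values())
    high_thr = percentile(values, cutoff_high)
    low_thr = percentile(values, cutoff_low)
    high = [s for s, v in expression.items() if v > high_thr]
    low = [s for s, v in expression.items() if v < low_thr]
    return high, low

# Hypothetical sample IDs and values.
expr = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}
median_split = split_cohorts(expr)            # 50% / 50%
quartile_split = split_cohorts(expr, 75, 25)  # top vs bottom quartile
```

With cutoffs of 75 and 25, the two middle samples fall between the thresholds and are excluded from the comparison.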

Results

The result table shows the results of Cox regression analysis for gene expression levels and patient survival in the drug and non-drug groups. Click the button beside a drug name to calculate the survival differences between the four patient groups: high expression with the drug, low expression with the drug, high expression without the drug, and low expression without the drug. Kaplan-Meier (KM) survival curves will be generated for each pairwise comparison.

Cell Line Drug Screen

This function allows users to calculate the correlation between drug response and gene expression/copy number in the CTRP, GDSC and CREAMMIST databases.

Parameters

Gene: Input a gene/isoform or geneset of interest.
Drug Screen Dataset: Select the dataset to use.
Drug Sensitivity Metric: Select the score for drug sensitivity measurement.
Cell Line Types: Select one or multiple cell types and click "+" to build the dataset.

Results

Cell CRISPR Screens

This function allows users to compare the cell responses to CRISPR gene perturbation between drug-treated and untreated conditions.

Parameters

Gene: Input a gene for analysis. Gene IDs are not supported; please use the official gene symbol. A total of 17,382 genes were included in the screen, so not all genes are available.
Treatment Datasets: Click "+" to select datasets for drug treatment. Data with more than five entries and data for nonexistent selected genes will be filtered out.

Results

DNA damage

original paper

Network Analysis

Comprehensive Analysis

This feature allows users to explore multiple interaction relationships of genes based on selected datasets and interaction types. STRING database annotations can also be optionally included.

Parameters

Gene Input: Input a list of gene symbols or Ensembl IDs of interest.
Interaction Types: Choose interaction types, including:
- Positive Correlation
- Anti-correlation
- Co-alteration
- Mutually Exclusive Alteration
- Synthetic Lethality (SL) / Synthetic Viability (SV)
- OncoPPI: Oncogenic Protein-Protein Interactions
Nodes Displayed: Choose the number of nodes you want to include in the network graph.
Datasets: Select cancer datasets of interest (e.g., TCGA Tumor). Multiple datasets can be chosen, or manual input separated by commas is supported.
STRING Annotation: Enable this option to include gene relationships from the STRING database.

Notes

Only gene pairs with an absolute Pearson Correlation Coefficient (PCC) greater than 0.5 are considered in the analysis of expression patterns.
When searching for a gene, the displayed network highlights the top k nodes (k is set by the "Nodes Displayed" input) with the highest degrees connected to the target gene within the selected datasets and interaction patterns.
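The |PCC| > 0.5 rule in the notes above can be sketched as follows. The function names, the gene labels, and the edge tuple layout are hypothetical, and treating a constant expression vector as uncorrelated is an assumption of this example rather than documented GEPIA3 behavior.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(target, expr, threshold=0.5):
    """Keep gene pairs whose absolute PCC with the target exceeds the threshold.

    expr: dict mapping a gene symbol to its expression values across the
    same ordered set of samples.
    """
    edges = []
    for gene, values in expr.items():
        if gene == target:
            continue
        r = pearson(expr[target], values)
        if abs(r) > threshold:
            kind = "positive" if r > 0 else "anti"
            edges.append((target, gene, kind, r))
    return edges

# Hypothetical genes A-D measured in four samples.
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],    # rises with A
    "C": [4, 3, 2, 1],    # falls as A rises
    "D": [1, -1, -1, 1],  # unrelated to A
}
edges = coexpression_edges("A", expr)
```

Here B is kept as a positive correlation, C as an anti-correlation, and D is dropped because its coefficient with A is zero.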

Results

Edges: A table summarizing gene pairs, interaction types, confidence measures (such as score for STRING and SL/SV, PCC for co-expression, and p-value/q-value for co/me-alteration and oncoPPI), and relevant cancer datasets.
Network Visualization: An interactive graph displaying gene relationships. Nodes represent genes, and edges represent interactions. Different colors are used to indicate interaction types.

Alteration Network

This feature allows users to identify and visualize gene alteration relationships, including co-alteration and mutually exclusive alteration patterns, within selected datasets and genes.

Parameters

Gene: Input a gene/isoform or multiple genes of interest.
Alteration Pattern: Select alteration pattern of interest.
ICGC Tumor/Selected Datasets: Select cancer types of interest in the "ICGC Tumor" field and click "add" to build a dataset list in the "Selected Datasets" field. Manual input of cancer types separated by commas (e.g. Kidney-RCC,Liver-HCC) is also acceptable. The interaction analysis is based on the dataset list.

Notes

We considered six types of alterations: alternative promoter usage, expression outliers, genetic variants, alternative splicing events, allele-specific expression, and copy number variations.
When searching for a gene, the displayed network highlights the top 10 nodes with the highest degrees connected to the target gene within the selected datasets and interaction patterns.

Results

Table: Genes that are co-altered or mutually exclusively altered (me-altered) with the target gene, along with selected features of the former.
Heatmap Visualization: Visualizes the alteration status of selected genes across samples from the user-selected cohort. Each bar represents an individual sample, with red indicating alteration and gray indicating no alteration. The percentage on the right reflects the proportion of altered samples for each gene.
Network Visualization: An interactive graph displaying gene relationships. Nodes represent genes, and edges represent interactions. Different colors are used to indicate interaction types.

Expression Network

This feature allows users to search for genes that are most correlated with a target gene or geneset within selected datasets.

Parameters

Gene: Input a gene/isoform or geneset of interest.
Top # Genes: Set the number of top correlated genes to display. (Maximum: 1000)
Interaction Pattern: Select interaction pattern of interest.
Log Transformation: Apply log2 transformation to normalize the gene expression data.
TCGA Tumor/TCGA Normal/GTEx/Used Expression Datasets: Select cancer types of interest in the "TCGA Tumor", "TCGA Normal" or "GTEx" field and click "add" to build a dataset list in the "Used Expression Datasets" field. Manual input of cancer types separated by commas (e.g. COAD Tumor,READ Tumor) is also acceptable. The correlation analysis is based on the dataset list.

Results

Table: Displays the top correlated genes, their correlation coefficients, and the number of tumor, normal, and GTEx samples analyzed.
Network Visualization: For single gene input, the top correlated genes are displayed in a network, with annotation from the STRING database.

Protein Interaction

A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations

Nature Biotechnology

Supplementary Table 3

33 TCGA cancer types

PIONEER

The Protein Data Bank (PDB): Co-crystal structures of PPIs were downloaded from the PDB FTP site (PDB FTP).
Interactome3D: Homologous structures for PPIs without co-crystal data were collected from Interactome3D.


Parameters

Gene List: Input one or multiple genes to query their oncogenic protein interactions.
Datasets: Select TCGA cancer types (e.g., "CHOL Tumor", "BRCA Tumor") to filter relevant oncogenic interactions.

Results

Table: Summarizes oncogenic protein interaction details:
- Gene A: Input gene involved in the interaction.
- Gene B: Interacting partner gene.
- P-value A / P-value B: Statistical significance of the somatic mutation enrichment for each gene.
- FDR: Adjusted False Discovery Rate for multiple testing correction.
- Cancer Type: Specific cancer type where the interaction is identified.
- Source: Indicates the method used to define protein-protein interface regions.
Network Visualization: Visualizes oncogenic protein interactions, annotated with STRING database.

SLSV Analysis

"Harnessing genetic interactions for prediction of immune checkpoint inhibitors response signatures in cancer cells."

Parameters

Gene: Input a gene/isoform or multiple genes of interest.
Interaction Pattern: Select interaction pattern of interest.
Tissue/Selected Datasets: Select tissue of interest in the "Tissue" field and click "add" to build a dataset list in the "Selected Datasets" field. Manual input of tissues separated by commas (e.g. pancancer,blood) is also acceptable.

Results

Table: Displays SL and SV interaction results:
- Gene A: The queried gene symbol.
- Gene B: The interacting partner gene.
- Interaction Type: Indicates whether the interaction is SL or SV.
- Tissue: Specifies the tissue or cancer type where the interaction occurs.
Network Visualization: Visualizes genetic interactions as an interactive network, allowing for exploration of SL and SV relationships.

eQTL

PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types.

Parameters

Gene: Input a gene/isoform or multiple genes of interest.
eQTL Type: Choose between "Cis-eQTLs" for local gene expression regulation or "Trans-eQTLs" for distant gene expression regulation.
TCGA Tumor/Selected Datasets: Select cancer types of interest in the "TCGA Tumor" field and click "add" to build a dataset list in the "Selected Datasets" field. Manual input of cancer types separated by commas (e.g. COAD Tumor,READ Tumor) is also acceptable. The interaction analysis is based on the dataset list.

Results

Table: Displays eQTL results:
- Gene: The queried gene symbol.
- SNP ID: The identifier of the single nucleotide polymorphism (SNP) associated with gene expression (eQTL).
- Alleles: The observed allelic variants at the SNP locus (e.g., A/G, T/C), representing the different nucleotides in the population.
- Cancer Type: The specific cancer type in which the eQTL association was identified.
- Beta: The effect size indicating how much the SNP alters gene expression; a positive value suggests upregulation, while a negative value suggests downregulation.
- P-value: The statistical significance of the eQTL association; lower values indicate stronger evidence against the null hypothesis of no association.
- FDR: The false discovery rate–adjusted p-value accounting for multiple testing; controls the expected proportion of false positives.

RNA Alterations

Allele-Specific Expression

This function enables users to identify allelic expression imbalances in one or more genes, along with the factors driving these imbalances.

Parameters

Gene: Input one or a group of genes of interest.

Results

Table: Displays the allele-specific expression levels of the gene in cancer, as well as whether the gene is an imprinted gene, a cancer gene, or a cancer-testis gene.
Plot: Heatmaps that show the effect sizes of influencing factors and the input somatic burdens.
Effect size here refers to the magnitude of the relationship between the genomic factors and allelic expression imbalance, as quantified by a generalized linear model. Zoom-in plots for somatic mutation factors are provided to ensure that finer details are not masked by factors with larger effect sizes.

Alternative Promoter

This function provides users with information on relative promoter activity and enables comparisons between tumor samples and their corresponding normal tissues.

Parameters

Gene: Input one or a group of genes of interest.
ICGC Tumor: Select cancer types of interest.

Results

Table: Shows the basic information of promoters and their average relative activities.
Barplot: Displays average activities in tumors and their corresponding normal tissues. Note that only p-values < 0.10 are annotated on the plot.

Gene Fusion

This function allows users to search for gene fusion events in specific cancer types.

Parameters

Gene: Input one or a group of genes of interest, or enter "all" to search for all fusion events in the given cancer types.
ICGC Tumor: Select cancer types of interest.
show top n frequent genes: Set the number of genes to be displayed in the bar plot for gene frequency.

Results

Table: Displays the basic information of fusion events and the frequency of these events in the given cancer types.
Circos Plot: Visualization for the gene fusion events. Note that when searching for all events in a given cancer type, only the top n (set by users) most frequent genes will be annotated on the plot.
Barplot: Shows the top n (set by users) most frequent genes in the search result.

Other Updates

Differential Expression with Hotspot Mutation

This function enables users to compare gene/geneset expression level between patients with and without key hotspot mutations.

Parameters

Gene: Input a gene/isoform or geneset of interest.
Methods: Select ANOVA or Wilcoxon for differential analysis.
log2 transformed: Select whether to use log2(TPM+1) as expression abundance.
Hotspot Mutation Type: Select the type of mutation. Gene level means any nonsilent mutation in the gene. Amino acid level means a change at a specific amino acid site (HGVS protein-level).
Datasets Selection: Select one or multiple cancer types of interest in the "Dataset Selection" field and click "add" to build dataset list in the "Datasets" field.
Plot Color: Choose group color to plot differential expression boxplot and correlation scatter plot.

Results

The table shows the significant results of differential expression between hotspot mutant and wildtype cases. Click the button beside a mutation to plot the gene/geneset expression distribution of the mutant and wildtype groups, as well as the expression correlation with the hotspot gene.

Functions

Gene Card

Enter a gene symbol or gene ID (Ensembl ID), and click the "GoPIA" button to quickly search for your gene of interest.

GEPIA provides an interactive bodymap to visualize the expression in both tumor and normal samples, as well as an expression profile that displays the gene's expression level across all tumor samples and normal tissues.

Profile Summary

Differential Genes

This function allows users to list the tumor/normal tissue differentially expressed genes or isoforms in a cancer type, and plot the chromosomal distribution of these genes.

Users can apply custom statistical methods and thresholds on a given dataset to dynamically obtain differentially expressed genes. For the DESeq2 and limma options, genes with higher |log2FC| values and lower q values than pre-set thresholds are considered differentially expressed genes.

Parameters

Dataset Selection: Select a cancer type of interest.
BioType: Type of genes or isoforms.
Normal samples: Select "TCGA normal + GTEx normal" or "TCGA normal" for matched normal data.
|log2FC| Cutoff: Set custom fold-change threshold.
q-value: Set custom q-value threshold.
Gene/Isoform: Query differentially expressed genes or isoforms.
Differential Methods: Select a method for differential analysis.

Results

Click the "List" button: GEPIA will generate a list of differentially expressed genes.

Prognostic Genes

This function enables users to search for genes most associated with patient survival based on a cancer type.

GEPIA uses the Log-rank test for hypothesis testing across all genes, with q-values calculated for multiple testing corrections.

Parameters

Datasets Selection: Select a cancer dataset of interest.
Biotype: Type of genes or isoforms.
Group Cutoff: Select a suitable expression threshold for splitting the high-expression and low-expression groups.
TimeResponse: Select the OS or PFI survival method.
Gene/Isoform: Query most prognosis-associated genes or isoforms.

Results

Click the "List" button: GEPIA will generate a list of prognostic genes.

Find Drugs

This function enables users to perform survival analysis based on TCGA patients using specific drug treatment.

GEPIA performs differential analysis of Overall Survival (OS) or Progression-free Interval (PFI) between patients treated with or without specific drugs. GEPIA uses the Log-rank test, also known as the Mantel-Cox test, for hypothesis testing.

Parameters

Drugs: Select a drug of interest. Both generic name and trade name are provided.
TCGA Cancer Types/Selected Datasets: Select one or multiple cancer types and click "+" to build the dataset.
TimeResponse: Select the OS or PFS survival response.
TimeUnit: Select Month or Day unit for plotting.
Colors: Choose color to plot.

Results

Click the "Run" button: GEPIA will generate survival curves comparing two groups (with or without drug treatment).

Analysis

Expression Analysis

Expression DIY

GEPIA plots expression profiles of a given gene according to selected datasets and statistical methods by cancer types or pathological stages. The details are shown below:

Profile DIY

GEPIA generates dot plots profiling gene expression across multiple cancer types and paired normal samples, with each dot representing a distinct tumor or normal sample.

Parameters

Gene: Input a gene/isoform of interest.
TCGA Tumor/Selected Datasets: Select cancer types of interest in the "TCGA Tumor" field and click "+" to build a dataset list in the "Selected Datasets" field. Manual input of cancer types split by comma (e.g. ACC,BRCA,BLCA) is also acceptable. The x-axis of the plot will follow the order of datasets.
Normal Samples: Select "TCGA normal + GTEx normal" or "TCGA normal" for matched normal data in plotting. [The matched normal samples for differential analysis are "TCGA normal + GTEx normal"]
Differential Methods: Statistical methods used for differential gene expression analysis.
|log2FC| Cutoff: Set custom fold-change threshold.
q-value: Set custom q-value threshold.

Results

Click the “Run” button: GEPIA will generate a gene expression profile based on users' custom input parameters.

Boxplot

GEPIA generates box plots with jitter for comparing expression in several cancer types (For best visual quality, we recommend 1-4 cancer types).

Parameters

Gene: Input a gene or multiple genes of interest.
TCGA Tumor/Selected Datasets: Select cancer types of interest in the "TCGA Tumor" field and click "+" to build a dataset list in the "Selected Datasets" field. Manual input of cancer types split by comma (e.g. ACC,BRCA,BLCA) is also acceptable. The x-axis of the plot will follow the order of datasets.
Normal Samples: Select "TCGA normal + GTEx normal" or "TCGA normal" for matched normal data in plotting. [The matched normal samples for differential analysis are "TCGA normal + GTEx normal"]
|log2FC| Cutoff: Set custom fold-change threshold.
q-value: Set custom q-value threshold.
High Group: Set the box color of tumor dataset and the font color for cancer types with elevated expression.
Low Group: Set the box color of normal dataset and the font color for cancer types with reduced expression.

Results

Click the “Run” button: GEPIA will present a gene expression box plot based on users' custom input parameters.

Correlation Analysis

This function performs pair-wise gene expression correlation analysis for given sets of TCGA and/or GTEx expression data, using methods including Pearson, Spearman and Kendall. One gene can be normalized by another gene.

GEPIA3 uses the log-scale for calculation and visualization.

Parameters

Gene A: Input a gene or multiple genes of interest. [For x-axis]
Gene B: Input a gene or multiple genes of interest. [For y-axis]
TCGA Tumor/TCGA Normal/GTEx/Selected Expression Datasets: Select cancer types of interest in the “TCGA Tumor”, “TCGA Normal” or “GTEx” field and click “add” to build a dataset list in the “Used Expression Datasets” field. Manual input of cancer types separated by commas (e.g. COAD Tumor,READ Tumor) is also acceptable. The correlation analysis is based on the dataset list.
Correlation Coefficient: The method for calculating the correlation coefficient.
use log2 (TPM+1) scale: Prior to computation, data are log2(TPM+1) transformed to improve robustness.

Results

Click the “Run” button: GEPIA will generate a scatter plot of the correlation analysis result.

Dimensionality Reduction

GEPIA performs Principal Component Analysis (PCA) for a given gene list using custom TCGA and/or GTEx expression data. First, GEPIA presents a 3D plot of the top three principal components (PCs) and generates a bar plot of the variance explained by each PC. Second, GEPIA presents a 2D or 3D plot based on user-specified PCs.

Parameters

Gene Set: Input a gene list of interest. Manual input of comma-separated genes (e.g. ERBB2,EGFR) is acceptable.
TCGA Tumor/TCGA Normal/GTEx/Selected Expression Datasets: Select cancer types of interest in the “TCGA Tumor”, “TCGA Normal” or “GTEx” field and click “add” to build a dataset list in the “Used Expression Datasets” field. Manual input of cancer types separated by commas (e.g. COAD Tumor,READ Tumor) is also acceptable. The analysis is based on the dataset list.
use log2 (TPM+1) scale: Prior to computation, data are log2(TPM+1) transformed to improve robustness.

Results

Click the “Run” button: GEPIA will generate a scatter plot of the dimensionality reduction result.

Hotspot Mutation

GEPIA Update - Other Updates

Survival Analysis

This function enables users to perform survival analysis based on the expression status of one gene or a multi-gene signature and plot a Kaplan-Meier curve. Some genesets are provided.

GEPIA performs Overall Survival (OS) or Progression-free Interval (PFI) analysis based on gene expression. GEPIA uses the Log-rank test, also known as the Mantel-Cox test, for hypothesis testing. Cohort thresholds can be adjusted. The Cox proportional hazard ratio and the 95% confidence interval can also be included in the survival plot.
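For readers unfamiliar with the Log-rank (Mantel-Cox) test mentioned above, the following is a minimal two-group sketch in plain Python. It is a textbook formulation, not GEPIA3's implementation; the function name and the input layout (parallel lists of follow-up times and event flags) are assumptions for illustration.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*: follow-up time per patient; events_*: 1 if the event was
    observed at that time, 0 if the patient was censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical cohorts: group A has much shorter survival than group B.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
```

In practice an established library routine (for example the log-rank test in the lifelines package) is preferable to hand-written code; the sketch only shows what the statistic compares.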

Parameters

Gene: Input a gene/isoform or geneset of interest.
TimeResponse: Select the OS or PFS survival response.
TimeUnit: Select Month or Day unit for plotting.
TCGA Tumor/Selected Datasets: Select one or multiple cancer types of interest in the "Dataset Selection" field and click "add" to build dataset list in the "Datasets" field.
Group Cutoff: Select a suitable expression threshold for splitting the high-expression and low-expression cohorts.
Cutoff-High(%): Samples with expression level higher than this threshold are considered as the high-expression cohort.
Cutoff-Low(%): Samples with expression level lower than this threshold are considered the low-expression cohort.
Colors: Choose color to plot.

Results

Univariable

Multivariable

Differential Analysis

We use three methods for differential expression analysis using TCGA tumor samples with paired adjacent TCGA normal samples and GTEx normal samples.

Why do we use both TCGA normal and GTEx normal samples for differential analysis? TCGA produced 11,257 tumor samples across 33 cancer types, while this project only provides 1,475 normal samples. The imbalance between the tumor and normal data can cause inefficiency in various differential analyses. Fortunately, the GTEx project produced RNA sequencing data for ~8,000 normal samples. Meanwhile, the UCSC Xena project recomputed the TCGA and GTEx raw RNA-Seq data using a standard pipeline, which makes the two datasets compatible. As a result, the TCGA and GTEx data can be integrated for comprehensive expression analysis.

We consulted with multiple medical experts to determine the most appropriate tumor-normal comparisons. The comparisons and data we used are presented in the Dataset Sources - Tissues.

Differential methods

ANOVA

Considering the different stratifications of sex, age, ethnicity in tumor and normal samples, we applied four-way analysis of variance (ANOVA), using sex, age, ethnicity and disease state (Tumor or Normal) as variables for calculating differential expression:

Gene expression ~ sex + age + ethnicity + disease state

The expression data are first log2(TPM+1) transformed and the log2FC is defined as median(Tumor) - median(Normal).

The Benjamini and Hochberg false discovery rate (FDR) method was used to adjust the p-value in each factor to obtain the multiple testing adjusted q-value.
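The two numeric steps described above, the log2FC definition and the Benjamini-Hochberg adjustment, can be sketched as follows. The function names are hypothetical, the four-way ANOVA itself is not reproduced, and the p-values in the usage example are made-up inputs rather than results.

```python
import math
from statistics import median

def log2fc(tumor_tpm, normal_tpm):
    """log2FC as median(Tumor) - median(Normal) on the log2(TPM+1) scale."""
    tumor = [math.log2(x + 1) for x in tumor_tpm]
    normal = [math.log2(x + 1) for x in normal_tpm]
    return median(tumor) - median(normal)

def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Made-up TPM values for one gene and made-up p-values for four genes.
fc = log2fc([3, 3, 3], [1, 1, 1])
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene would then be compared against the |log2FC| and q-value cutoffs chosen by the user.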

LIMMA

For an alternative method, we use the linear model and the empirical Bayes method implemented by the R package limma, with adjusted p-value (Benjamini and Hochberg FDR). The limma method leverages the highly parallel nature of genomic data, borrowing information between the gene-wise models.

Similarly, the expression data are first log2(TPM+1) transformed and the log2FC is defined as median(Tumor) - median(Normal).

Top 10

Cancer drug targets such as ERBB2 and VEGF are over-expressed in a subset (10-20%) of tumors and lowly expressed in all normal tissues. When discovering cancer drug targets, genes like ERBB2 and VEGF are ideal candidates as therapeutic targets. For this purpose, we implemented a method for detecting the genes that are over-expressed in only a subset of tumors for a given cancer type.

For each cancer type, we choose tumor samples that have the top 10% expression level for a given gene. For comparison, we choose the same number of normal samples that have the highest expression level for the same gene. We rank the tumor and normal samples by expression level and calculate the percentage of tumor samples in top 50% ranked list as the percentage value. The expression data are first log2(TPM+1) scaled and the log2FC is defined as median(Tumor) - median(Normal).

By default, we report the over-expressed genes as those passing following thresholds:

log2FC > 1, percentage > 0.9.
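One possible reading of the Top 10 percentage value is sketched below. Several details are assumptions, because the description leaves them open: the number of selected tumor samples is rounded to the nearest integer, the pooled list contains the selected tumor samples plus the same number of top normal samples, and ties are resolved in favor of tumor samples. The function name is hypothetical.

```python
import math

def top10_percentage(tumor_tpm, normal_tpm, top_fraction=0.10):
    """Fraction of tumor samples in the top half of the pooled ranked list."""
    tumor = sorted((math.log2(x + 1) for x in tumor_tpm), reverse=True)
    normal = sorted((math.log2(x + 1) for x in normal_tpm), reverse=True)
    # Number of tumor samples in the top 10% (assumption: nearest integer,
    # at least one, and no more than the available normal samples).
    n = max(1, round(len(tumor) * top_fraction))
    n = min(n, len(normal))
    # Pool the top tumor samples with the same number of top normal samples.
    pooled = [(v, "T") for v in tumor[:n]] + [(v, "N") for v in normal[:n]]
    # Stable sort: on ties, tumor samples stay ahead of normal samples.
    pooled.sort(key=lambda item: item[0], reverse=True)
    top_half = pooled[:n]  # top 50% of the 2n pooled samples
    return sum(1 for _, label in top_half if label == "T") / n

# Made-up TPM values: two of twenty tumors strongly over-express the gene.
tumor = [100] * 2 + [1] * 18
normal = [2] * 20
percentage = top10_percentage(tumor, normal)
```

Under the default thresholds quoted above, a gene would be reported only if this value exceeds 0.9 and its log2FC exceeds 1.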

For example, CLEC3A is over-expressed in breast cancer, as shown in the profile below:

Definition of differentially expressed genes

For the ANOVA and LIMMA methods, genes with higher |log2FC| values and lower q values than pre-set thresholds are considered differentially expressed genes.

For the Top 10 option, genes with higher log2FC values and higher percentage value than the thresholds are considered over-expressed genes.

Data Pre-processing

Data Collection

For resource selection, GEPIA3 collected the datasets with the largest data size or the most comprehensive compilation of existing resources.

Functionalities	Resources	Selection criteria
Expression analysis	TCGA	Integrative pan-cancer multi-omic resources
Expression analysis	GTEx	Largest public expression datasource for normal tissues
Drug cell line responses	GDSC	Well-established systematic screening datasets for drug response in over 900 cancer cell lines
	CTRP
	CREAMMIST	Standardized and aggregated drug response in different projects, providing an integrative dose-response curve across datasets
Drug CRISPR screen	Olivieri et al. Cell. 2020	Collected from BioGRID ORCs database with following criteria: a) published, b) high throughput screens with gene size above 10000, c) human cells exposure to cancer drugs, d) systematic screens for more than 3 different drugs.
	Lau et al. Genome Biol. 2020
	Wang et al. Nucleic Acids Res. 2021
	Ramaker et al. BMC Cancer. 2021
Alteration network	PCAWG	Most comprehensive publicly available resource for gene alterations (both DNA and RNA) beyond expression abundance, uniformly processed across cancer types
Alteration network	PCAWG Transcriptome Core Group et al. Nature. 2020
Protein Interaction	Xiong et al. Nat Biotechnol. 2024	Integrates TCGA-wide mutation data with multi-source structure-defined interfaces (PDB, Interactome3D, PIONEER) to reveal enriched, functionally disrupted PPIs across cancers
eQTL	PancanQTL	The only dataset comprehensively providing both cis- and trans-eQTLs in multiple cancer types using TCGA data
SL/SV	CGIdb2.0	Systematically integrated database of SL data from 33 published studies and SV data from 11 studies, currently one of the most comprehensive SL/SV datasets available
Allele-specific Expression	PCAWG Transcriptome Core Group et al. Nature. 2020	Most comprehensive publicly available resource for RNA alterations beyond expression abundance, uniformly processed across cancer types
Alternative Promoter	PCAWG
Gene Fusion	PCAWG

RNA Alteration Data Quality Control

For RNA Alterations, to ensure consistency and reliability, GEPIA3 uses data from the PCAWG project, which provides high-quality and uniformly processed data across a wide range of cancer types.

For the ASE and Gene Fusion modules, GEPIA3 directly adopts the processed results from PCAWG without further modification. The ASE data quantify allelic expression imbalances and summarize the effect sizes of influential factors as well as input mutational burden categories. The gene fusion data contain curated fusion events across different cancer types.

The Alternative Promoter module is based on promoter activity matrices from PCAWG. To enable fast queries and visualization, we applied several preprocessing steps. Sample metadata from UCSC Xena are used to classify samples by cancer type and tissue origin (tumor vs. normal). For each promoter, we compute the mean and standard deviation of relative activity within each group, count the number of non-missing observations, and perform two-sample t-tests to assess differential activity between tumor and normal tissues.
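The per-promoter summary described above can be sketched as follows. The function names and the use of None for missing values are assumptions, and the sketch uses Welch's unequal-variance form of the t statistic because the text does not say which two-sample t-test variant is applied. Only the statistic is computed; a p-value would need a t-distribution, which a statistics library provides.

```python
import math
from statistics import mean, stdev

def summarize(values):
    """Mean, standard deviation and count of the non-missing observations."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    return {
        "n": n,
        "mean": mean(observed) if n else None,
        "sd": stdev(observed) if n > 1 else None,
    }

def welch_t(tumor, normal):
    """Welch's two-sample t statistic for tumor vs. normal activity."""
    a, b = summarize(tumor), summarize(normal)
    if a["sd"] is None or b["sd"] is None:
        return None  # not enough observations in one group
    standard_error = math.sqrt(a["sd"] ** 2 / a["n"] + b["sd"] ** 2 / b["n"])
    if standard_error == 0:
        return None  # no variation in either group
    return (a["mean"] - b["mean"]) / standard_error

# Made-up relative promoter activities; None marks a missing observation.
tumor_activity = [1.0, 2.0, 3.0, None]
normal_activity = [2.0, 3.0, 4.0]
tumor_summary = summarize(tumor_activity)
t_stat = welch_t(tumor_activity, normal_activity)
```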

Network Data Quality Control

We used the gene-centric binary table from the ICGC PCAWG3 group as the input alterations. First, we filtered out alterations that occurred in fewer than 5 samples. We also ruled out co-occurring candidate gene pairs located on the same chromosome to eliminate the bypass co-occurrence. Second, we used the DISCOVER tool to test the significance of co-occurrence or mutual exclusivity. Third, we used the BH method for multiple testing correction. In the downstream analysis, unless otherwise specified, we only include the 8 cancer types with more than 50 tumor samples (Kidney-RCC, Lymph-BNHL, Liver-HCC, Ovary-AdenoCA, Breast-AdenoCA, Panc-AdenoCA, Lymph-CLL and ColoRect-AdenoCA) and only 6 types of alterations (alternative promoter, expression outlier, variants, alternative splicing, allele-specific expression and copy number variations), because fusion and RNA editing involve only a small number of alterations.
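The first filtering step can be sketched as below for co-occurrence candidates. The data layout (a gene-to-binary-row dictionary and a gene-to-chromosome dictionary), the function name, and the gene labels are hypothetical. The DISCOVER significance test and the BH correction are not reproduced here.

```python
from itertools import combinations

def cooccurrence_candidates(alterations, chromosome, min_samples=5):
    """Candidate gene pairs for co-occurrence testing.

    alterations: dict mapping a gene to a list of 0/1 flags, one per sample.
    chromosome: dict mapping a gene to its chromosome name.
    """
    # Drop alterations seen in fewer than min_samples samples.
    kept = [g for g, row in alterations.items() if sum(row) >= min_samples]
    # Drop pairs located on the same chromosome.
    return [(a, b) for a, b in combinations(kept, 2)
            if chromosome[a] != chromosome[b]]

# Hypothetical genes G1-G4 across ten samples.
alterations = {
    "G1": [1] * 5 + [0] * 5,
    "G2": [1] * 6 + [0] * 4,
    "G3": [1] * 4 + [0] * 6,  # altered in only 4 samples: filtered out
    "G4": [1] * 7 + [0] * 3,
}
chromosome = {"G1": "chr1", "G2": "chr1", "G3": "chr2", "G4": "chr3"}
pairs = cooccurrence_candidates(alterations, chromosome)
```

In this toy input the G1/G2 pair is removed because both genes sit on chr1, leaving G1/G4 and G2/G4 for significance testing.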

Results Download

PDF and SVG downloads are available by clicking the button near the results.

License Statement: All content on this website is freely available to all users, including for commercial use.
