
Q & A for GEPIA3

Q.

How can I get the raw data from GEPIA?

Unfortunately, we cannot provide the raw data for download because we do not own these data. Users can download the expression data from the UCSC Xena project (UCSC Toil RNA-seq Recompute) and the clinical data from the GDC portal. All expression values were calculated with RSEM.

Q.

What is the input when the "Gene/Signature" button is switched on?

The “Gene/Signature” button lets users choose between single-gene expression and multi-gene signatures. For a multi-gene signature, GEPIA3 uses the first principal component from Principal Component Analysis (PCA) as the signature value. You can freely enter any genes in the signature text box. The pre-compiled signatures provided are standardized markers of immune cell types, curated from Zhang et al. Cell (2017), Zhang et al. Nature (2018) and Guo et al. Nature Medicine (2018).
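As a rough illustration of how several genes can be collapsed into one value per sample, the sketch below computes first-principal-component scores for a small made-up expression matrix using power iteration. The function name, the gene-wise centering, and the toy values are assumptions for illustration only; GEPIA3's own preprocessing and PCA implementation may differ.

```python
import math

def first_pc_scores(matrix, n_iter=200):
    """Return the first principal component score of each sample.

    matrix: list of samples, each a list of expression values (one per gene).
    Hypothetical helper, not part of GEPIA3.
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    # Center each gene (column) at zero.
    means = [sum(row[j] for row in matrix) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centered) / (n_samples - 1)
            for b in range(n_genes)] for a in range(n_genes)]
    # Power iteration converges to the leading eigenvector (PC1 loadings).
    vec = [1.0] * n_genes
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_genes))
               for a in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project every centered sample onto the loadings.
    return [sum(r[j] * vec[j] for j in range(n_genes)) for r in centered]

# Toy data: 4 samples x 3 signature genes (made-up values).
toy = [[1.0, 2.0, 1.5], [2.0, 4.1, 3.0], [3.0, 6.0, 4.4], [4.0, 8.2, 6.1]]
scores = first_pc_scores(toy)
```

In this toy matrix all three genes rise together, so the scores increase from the first sample to the last. In general the sign of a principal component is arbitrary, so only the relative ordering and spacing of scores is meaningful.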

Q.

How was the TCGA drug treatment information curated? Does the procedure influence my results?

GEPIA3 adopted a multi-step quality control and processing procedure for TCGA drug treatment information to ensure the reliability of the results.

Drug name standardization
GEPIA3 used the drug names in the "pharmaceutical_therapy_drug_name" column. We manually corrected typos and standardized synonymous names according to the following rules:
- For entries using brand names: the original brand name was retained and the corresponding generic name was added (e.g. "Zelboraf" was annotated as "Vemurafenib" and "Zelboraf").
- For entries using target genes as names (e.g. "BRAF inhibitor"): we retained the annotation and did not add any specific drug names to it.
- For drugs with a specific dosage: we annotated the entry with both the drug name and the drug name with dosage (e.g. "high dose Cytarabine" was annotated as "Cytarabine" and "Cytarabine-high dose").
- For combination therapy: if the therapy used specific drugs, we added these drug names to the annotation. If the drug usage was not precise enough, only the therapy name was included (e.g. "FOLFIRI" was annotated as "FOLFIRI, 5-Fluorouracil, Leucovorin, Irinotecan", while "PLEK" was annotated as "PLEK").
- All entries with invalid drug names, such as "NOS", "Clinical trial" and "[Not Available]", were excluded.
This curation retains the original therapy information, enabling users to analyze drug responses by sub-type, such as a specific brand or dosage. You can download the curated drug names for each entry here.
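The rules above can be pictured as a lookup from raw entries to one or more annotations. The sketch below is a minimal illustration that contains only the examples quoted in the text. The table, the `annotate` function, and the choice to pass unlisted names through unchanged are all assumptions, not the actual GEPIA3 curation, which was done manually.

```python
# Hypothetical lookup built only from the examples quoted in the text;
# the real curated table is much larger and was compiled by hand.
ANNOTATIONS = {
    "zelboraf": ["Vemurafenib", "Zelboraf"],                 # brand name
    "braf inhibitor": ["BRAF inhibitor"],                    # target-gene name
    "high dose cytarabine": ["Cytarabine", "Cytarabine-high dose"],  # dosage
    "folfiri": ["FOLFIRI", "5-Fluorouracil", "Leucovorin", "Irinotecan"],
    "plek": ["PLEK"],                                        # imprecise regimen
}
INVALID = {"nos", "clinical trial", "[not available]"}

def annotate(raw_name):
    """Return the annotations for one raw entry, or [] if the name is invalid."""
    key = raw_name.strip().lower()
    if key in INVALID:
        return []
    # Assumption: names missing from the table are already standard.
    return ANNOTATIONS.get(key, [raw_name.strip()])
```

For example, `annotate("Zelboraf")` yields both the generic and the brand name, so the entry can be found under either one, while `annotate("NOS")` yields an empty list and the entry is dropped.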

Quality control for patients receiving multiple treatments
TCGA records patient responses for each treatment, so a patient who was treated multiple times may have different responses on record. For patients with multiple therapy records, we applied the following standardized approach to reduce ambiguity:
- Records without a valid drug response (Complete Response/Partial Response/Stable Disease/Progressive Disease) were excluded, e.g. "[UNKNOWN]", "[NA]", or "[Not Available]".
- For patients treated with different drugs across multiple administrations, all corresponding entries were retained.
- For multiple treatments with the same agent, all entries were excluded if different responses were recorded. Conversely, when the responses were identical, only a single record of the patient's response to the drug was retained.
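The three rules can be sketched as a small filter over (patient, drug, response) records. The record layout, the function name, and the patient IDs below are made up for illustration. The sketch also assumes that invalid responses are removed before conflicting responses are checked, which follows the order of the list but is not stated explicitly.

```python
VALID = {"Complete Response", "Partial Response",
         "Stable Disease", "Progressive Disease"}

def filter_records(records):
    """records: list of (patient_id, drug, response) tuples.

    Returns {(patient_id, drug): response} after applying the QC rules.
    Hypothetical helper, not GEPIA3's actual code.
    """
    seen = {}
    for patient, drug, response in records:
        if response not in VALID:
            continue  # rule 1: drop records without a valid response
        # rule 2: different drugs get different keys, so all are kept
        seen.setdefault((patient, drug), set()).add(response)
    # rule 3: keep a patient-drug pair only if all its responses agree
    return {key: next(iter(resp))
            for key, resp in seen.items() if len(resp) == 1}

records = [
    ("P1", "Cisplatin", "Complete Response"),
    ("P1", "Cisplatin", "Complete Response"),    # identical, collapsed to one
    ("P1", "Paclitaxel", "Stable Disease"),      # different drug, kept
    ("P2", "Cisplatin", "Partial Response"),
    ("P2", "Cisplatin", "Progressive Disease"),  # conflicting, both dropped
    ("P3", "Cisplatin", "[Not Available]"),      # invalid, dropped
]
kept = filter_records(records)
```

Here only P1's two patient-drug pairs survive, each with exactly one response.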

Processing for combination therapy (patients receiving multiple drugs in a single treatment)
To evaluate therapeutic outcomes in patients undergoing combination therapy, we documented all standardized treatment regimens (including the CHOP, FOLFIRI, and CAV protocols) during initial data curation. To ensure statistical power, we excluded any drug or regimen with a total sample size below 20. These combination groups fell below that threshold (n < 20) and were therefore excluded. Consequently, GEPIA3 cannot be used for multi-drug analysis.

Issues You Should Be Aware Of
All individual drug components of combination therapies were preserved in our curated dataset. During analysis, each drug was evaluated independently, which means:
- Each GEPIA3 output entry for a single drug represents univariable testing results for the patients who used that drug. It does not mean that those patients used only that drug.
- In a result table showing multiple drugs, the same patients may be counted under different drugs, e.g. patients receiving combination therapies.
- For each drug in the results, every patient is counted only once. In particular, a patient never appears in different response groups for one drug, nor in both the with-drug and without-drug groups.
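The counting behaviour described above can be shown with a toy example: a patient on a combination regimen contributes to every component drug, but only once per drug. The patient IDs and drug assignments below are made up, and `per_drug_counts` is a hypothetical helper.

```python
from collections import Counter

# Made-up example: each patient maps to the set of drugs they received.
patient_drugs = {
    "P1": {"5-Fluorouracil", "Leucovorin", "Irinotecan"},  # combination therapy
    "P2": {"5-Fluorouracil"},
    "P3": {"Irinotecan"},
}

def per_drug_counts(patient_drugs):
    """Count patients per drug; each patient counts once for each drug."""
    counts = Counter()
    for drugs in patient_drugs.values():
        counts.update(drugs)  # drugs is a set, so no double counting per drug
    return counts

counts = per_drug_counts(patient_drugs)
```

The per-drug counts add up to 5 even though there are only 3 patients, because P1 appears under three drugs. Sample sizes in a multi-drug result table therefore should not be summed across rows.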

Following this processing strategy, 76 drugs were included in the TCGA patient drug response analysis. You can download the drug table, with sample sizes and the corresponding drug names in the cell line screen, here.

Q.

Which cell lines are used for the drug screen?

The cell line type classification is annotated by CCLE. To see the particular cell lines used for each primary site or cancer type, refer to the corresponding tables: [GDSC/CREAMMIST] [CTRP]

Q.

What is the definition of RNA alteration?

RNA alterations involve changes to RNA molecules' sequence, structure or expression levels, affecting gene expression and protein synthesis. In GEPIA3, the term “RNA alterations” refers to transcript-level regulatory or structural variations that change expression patterns, including allele-specific expression (ASE), alternative promoter usage, and gene fusions. The RNA alterations reveal regulatory variations that are not detectable through expression abundance alone.

The datasets in this module are derived from the PCAWG (Pan-Cancer Analysis of Whole Genomes) project, which is recognized as the most comprehensive and well-curated source for such transcriptomic events. Specifically, the results for ASE and gene fusions are directly obtained from PCAWG annotations. For alternative promoter usage, the PCAWG group provided the promoter activity matrices which we have processed to compute group-level statistics and tumor-versus-normal t-test p-values.

Q.

What is an alternative promoter? How do I apply this function to my research?

Gene expression is often regulated by alternative promoters, and dysregulation of transcriptional programmes can promote cancer onset and progression. Tumor-specific changes in promoter activity can act as a regulator of aberrant gene expression. You can therefore compare the activity of multiple promoters between tumor and peritumor tissues across cancer types.

Here we provide an example of using "Alternative Promoter": the alternative promoter activities of the BCAS3 gene in Bladder-TCC. BCAS3, a gene implicated in the development of breast cancer, glioblastoma, and head and neck squamous cell carcinoma, shows no significant expression change in the TCGA-BLCA dataset. However, it demonstrates a marked shift in promoter usage in the ICGC Bladder-TCC dataset in GEPIA3, highlighting how promoter-level regulation can operate independently of total gene expression.

Q.

What is allele-specific expression (ASE)? What do the effect sizes in the heatmap represent?

Allelic expression imbalance (AEI) is the difference in transcript levels between the two alleles within the same sample. Measuring it allows a more direct detection of cis-regulatory effects while minimizing the influence of extrinsic factors such as environmental influences or gene interactions. This provides valuable information that extends beyond total gene expression levels. In this function, we present effect sizes that quantify the magnitude of the relationship between genomic factors and allelic expression imbalance, as determined by a generalized linear model. This measure reflects the strength of each factor's contribution to AEI. Such information is particularly beneficial for exploratory analyses. Heatmaps compactly summarize multivariate effects across genes and regulatory factors, giving users a rapid screen for patterns of interest.

Q.

How do I apply the "eQTL" function in the "Network Analysis" module in my research?

An eQTL is a locus that accounts for a portion of the genetic variance in gene expression phenotypes, typically involving SNPs associated with expression levels. By integrating eQTLs, GEPIA3 extends the expression-based analysis capabilities of GEPIA1/2, allowing users to explore not only how gene expression changes in cancer, but also the underlying mechanisms driving these changes. This function provides valuable insights into tumor-specific transcriptional patterns.

The eQTL dataset used in GEPIA3 is derived from TCGA-based analyses, which link somatic variants to gene regulation in cancer. For instance, GEPIA3 reports rs71275967 as an eQTL for BRCA1 in lung squamous cell carcinoma (LUSC), indicating that this SNP is associated with the expression of BRCA1. The beta value of 0.163 signifies that each copy of the effect allele corresponds to a 0.163-unit increase in BRCA1 expression, while the FDR of 0.046 indicates that this result is statistically significant after adjusting for multiple testing.
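To make the beta interpretation concrete, the sketch below computes the expected expression shift for 0, 1, and 2 copies of the effect allele. It assumes a simple additive model and a diploid genotype. The function name is hypothetical, and the expression unit is whatever scale the underlying eQTL analysis used, which is not stated here.

```python
BETA = 0.163  # effect size quoted in the text (rs71275967, BRCA1, LUSC)

def expected_shift(effect_allele_copies, beta=BETA):
    """Expected expression change relative to zero copies of the effect allele.

    Assumes a simple additive model; hypothetical helper for illustration.
    """
    if effect_allele_copies not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1, or 2 copies")
    return beta * effect_allele_copies

shifts = {copies: expected_shift(copies) for copies in (0, 1, 2)}
```

Under these assumptions, a sample carrying two copies of the effect allele is expected to show about twice the shift of a sample carrying one copy.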

License Statement: All content on this website is freely available to all users, including for commercial use.