SPEED: (S)ignaling (P)athway (E)nrichment using (E)xperimental (D)atasets


Contents:

Running SPEED Pathway Enrichment
Downloading SPEED Signature Genes
Viewing SPEED Database Statistics
Python Scripting
Citing SPEED
Changelog


Running SPEED Pathway Enrichment:

SPEED uses Fisher's exact test with FDR correction to identify significant overlap between an input gene list and each of the SPEED pathway signature gene sets. The Run SPEED page includes a form for submitting gene lists in any of the following formats: Entrez Gene, gene symbol, UniProt, GI number, RefSeq, Ensembl, and IPI. We recommend the UniProt ID Mapping service for converting your gene list to an accepted format. Currently only human genes are supported.
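
A minimal sketch of this enrichment step follows. It assumes a one-sided (enrichment) Fisher's exact test and Benjamini-Hochberg FDR correction. This page does not say which tail or which FDR procedure SPEED uses, and the function names `enrichment` and `benjamini_hochberg` are hypothetical. The sketch uses SciPy, which SPEED also depends on, and is written for Python 3.

```python
from scipy.stats import fisher_exact


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for a {pathway: p_value} dict."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        pathway, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[pathway] = running_min
    return adjusted


def enrichment(gene_list, signatures, background):
    """One-sided Fisher's exact test of gene_list against each signature set.

    signatures: {pathway: iterable of genes}; background: iterable of genes.
    Returns ({pathway: p_value}, {pathway: fdr}).
    """
    universe = set(background)
    hits = set(gene_list) & universe
    p_values = {}
    for pathway, genes in signatures.items():
        sig = set(genes) & universe
        a = len(hits & sig)            # input genes that are signature genes
        b = len(hits - sig)            # input genes that are not
        c = len(sig - hits)            # signature genes absent from the input
        d = len(universe) - a - b - c  # background genes in neither set
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        p_values[pathway] = p
    return p_values, benjamini_hochberg(p_values)
```

Restricting both the input list and the signature sets to the background first keeps the 2x2 table consistent. This mirrors the role of the background list described in field 8 below.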

The Run SPEED form includes 6 required fields:
1. Gene input - Enter your input gene list, one gene per line, in the text box labeled "Please enter your gene list below". Clicking any of the seven "Example" buttons pastes an example gene list into the box. The example gene lists are taken from the following publications:
TGFB: Classen et al. (2007)
MAPK_PI3K: Tullai et al. (2004)
TLR: Malcolm et al. (2003)
TNFa: von Bernuth et al. (2008)
IL-1: Misior et al. (2009)
JAK-STAT: Indraccolo et al. (2007)
Wnt: Zirn et al. (2006)


2. Pathways to Include - Check the pathways to be searched against your input list. Recommended pathways are selected by default.

SPEED signature genes are genes that are differentially expressed across many experiments. The following 2 fields set the required level of differential expression and the required overlap across experiments. Suggested default values are pre-filled in the input text boxes.
3. Max. absolute z-score percentile - Absolute z-scores in SPEED measure how many standard deviations a gene expression ratio lies above or below the mean of the experiment. Within each experiment, all ratios are ranked by |z-score| and expressed as a percentile. To qualify, a SPEED signature gene must rank better (in a lower percentile) than the maximum absolute z-score percentile in at least the fraction of experiments set by field 4.

4. Min. percent overlap across experiments - The minimum overlap parameter sets the number of experiments (arrays) in which a SPEED signature gene must be differentially expressed. It is evaluated as a fraction of experiments. For example, a minimum overlap of 33% means that a gene must be differentially expressed in at least one-third of the experiments to be considered a SPEED signature gene.

Additionally, to avoid noise, SPEED signature genes must not be derived from low expression values.
5. Max. expression level percentile - Within each experiment, all genes are ranked by average expression level and expressed as a percentile. To qualify, a SPEED signature gene must rank better (in a lower percentile) than the maximum expression level percentile. For example, a maximum expression level percentile of 40% means that each gene must be more highly expressed than at least 60% of the other genes in the experiment.

Finally, SPEED signature genes can either be unique for each pathway or overlap across pathways.
6. Unique signature genes - If the "Signature genes must be unique" check box is selected, each signature gene must belong to only one pathway, which models the pathways as acting in parallel. Leaving the box unchecked models pathway crosstalk by allowing pathways to share signature genes.
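
The selection rules in fields 3 to 6 can be sketched as follows. This is a hypothetical reimplementation for illustration only, not SPEED's own code. The input layout (per-experiment dicts of gene to (z-score, mean expression)), the function names, and the choice to apply the z-score and expression filters within the same experiment are all assumptions. The parameter `expr_percent` plays the role of field 5, but how it relates to `discard_percent` in the Python scripting example is not documented here.

```python
def percentile_ranks(values):
    """Map each key to a percentile rank in (0, 100]; the largest value ranks best (lowest)."""
    ordered = sorted(values, key=values.get, reverse=True)  # ties broken arbitrarily
    n = len(ordered)
    return {key: 100.0 * (i + 1) / n for i, key in enumerate(ordered)}


def signature_genes(pathways, zscore_percent, overlap, expr_percent, unique):
    """Select signature genes per pathway (hypothetical sketch).

    pathways: {pathway: [experiment, ...]}, where each experiment is a dict
    {gene: (z_score, mean_expression)}. Returns {pathway: set of genes}.
    """
    result = {}
    for pathway, experiments in pathways.items():
        passes = {}  # gene -> number of experiments in which it passes both filters
        for exp in experiments:
            z_rank = percentile_ranks({g: abs(z) for g, (z, _) in exp.items()})
            e_rank = percentile_ranks({g: e for g, (_, e) in exp.items()})
            for gene in exp:
                if z_rank[gene] <= zscore_percent and e_rank[gene] <= expr_percent:
                    passes[gene] = passes.get(gene, 0) + 1
        # Minimum overlap as an experiment count; multiply first to limit rounding error.
        needed = overlap * len(experiments) / 100.0
        result[pathway] = {g for g, n in passes.items() if n >= needed}
    if unique:
        # Field 6: drop every gene that qualifies for more than one pathway.
        counts = {}
        for genes in result.values():
            for g in genes:
                counts[g] = counts.get(g, 0) + 1
        result = {p: {g for g in genes if counts[g] == 1} for p, genes in result.items()}
    return result
```

With `unique=True`, a gene that qualifies for two pathways is removed from both. That is one way to read the uniqueness constraint. It shows why the checked box yields smaller, non-overlapping signatures.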

The form also provides 2 optional fields:
7. Download results - Select the "Download results as tab delimited text" check box to write the results to a text file. The file begins with the Fisher's exact test results in comment lines marked with a pound (#) sign. These are followed by a tab-delimited table, with a commented header row, listing the SPEED pathway annotations for each identified input gene.

8. Incl. Background List - By default, SPEED uses all genes with expression values in the SPEED database as the background. Optionally, you can provide a gene list in a text file to use as the background set instead. Store each gene on a separate line, in one of the accepted formats listed above, with no comments or headers in the file. Each gene must be followed by a space, tab, or new line.
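
The file conventions in fields 7 and 8 can be handled with plain Python string methods. This is a sketch with hypothetical function names. `parse_gene_list` splits on any whitespace, so it accepts spaces, tabs, and LF, CRLF, or lone-CR line endings, including the lone-CR case fixed in the 7.8.2011 changelog entry. `split_results` assumes that comment lines start with `#` at the beginning of the line and that data rows are tab-separated.

```python
def parse_gene_list(text):
    """Split a background or input gene list into gene identifiers."""
    # str.split() with no argument splits on runs of any whitespace,
    # including spaces, tabs, '\n', '\r\n' and a lone '\r'.
    return text.split()


def split_results(text):
    """Separate '#' comment lines from tab-delimited data rows of a results file."""
    comments, rows = [], []
    for line in text.splitlines():  # splitlines() also accepts any line ending
        if line.startswith("#"):
            comments.append(line.lstrip("#").strip())
        elif line.strip():
            rows.append(line.split("\t"))
    return comments, rows
```

The commented header row of the results table ends up among `comments`. Which comment line holds it depends on the downloaded file, so check it before using it as column names.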

Parameter Selection:

The notion of a signature gene is that it is regulated by a given pathway and that its differential expression can be observed repeatedly across multiple experiments. The third parameter (Max. absolute z-score percentile) controls the level of differential expression. A larger percentile (more lenient, e.g. top 10%) corresponds to a lower z-score cutoff and allows more genes to be considered regulated. A smaller percentile (more stringent, e.g. top 1%) requires greater levels of differential expression.
The fourth parameter (Min. percent overlap across experiments) controls how consistently a gene must be differentially expressed to be considered a signature gene. Lower values (e.g. 20%) require less consistency, while higher values (e.g. 80%) require a gene to be differentially regulated in almost all experiments. We find that combining a stringent z-score percentile with a lenient percent overlap (or vice versa) gives a better balance between the number of signature genes and the confidence in their assignment.
Note also that differential expression ratios derived from microarrays are noisy, and low expression measurements can produce incorrect ratios. The fifth parameter (Max. expression level percentile) should therefore reflect your trust in low expression measurements, which are averaged across replicates and probes for a given gene. A higher (more lenient) maximum expression level percentile (e.g. 80%) allows more genes to be considered as signature genes, while a lower (more stringent) percentile (e.g. 20%) allows fewer.
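
To get a feel for the first two cutoffs, the sketch below converts an absolute z-score percentile into a |z| threshold and an overlap percentage into a minimum number of experiments. It assumes that z-scores are standard normal and that the overlap count is rounded up. Real expression ratios are often heavier-tailed, so treat the numbers as rough guides. Both helper names are hypothetical.

```python
import math

from scipy.stats import norm


def abs_z_cutoff(percentile):
    """|z| threshold for the top `percentile` % of |z|, assuming z ~ N(0, 1)."""
    # The top p% of |z| is split evenly between both tails: p/2 % in each.
    return norm.ppf(1.0 - percentile / 200.0)


def min_experiments(overlap, n_experiments):
    """Smallest number of experiments that is at least `overlap` % of the total."""
    # Multiply before dividing to limit floating-point error for whole percentages.
    return int(math.ceil(overlap * n_experiments / 100.0))


print("top 1%%: |z| > %.2f, top 10%%: |z| > %.2f" % (abs_z_cutoff(1.0), abs_z_cutoff(10.0)))
# prints: top 1%: |z| > 2.58, top 10%: |z| > 1.64
print(min_experiments(20, 12), min_experiments(80, 12))
# prints: 3 10
```

Under this normal approximation, tightening the z-score percentile from top 10% to top 1% raises the |z| threshold from about 1.6 to about 2.6. For a pathway with 12 experiments, moving the overlap from 20% to 80% raises the requirement from 3 to 10 experiments.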

Downloading SPEED Signature Genes:

All SPEED signature gene annotations can be downloaded from the Download Annotations page in tab-delimited text format, filtered by the parameters described above. The entire database (compressed .zip) can also be downloaded from the same page, either as an SQLite dump or as tab-delimited text.
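
If you download the SQLite dump, Python's built-in sqlite3 module can load it. This sketch assumes the dump is an SQL text script, such as the output of the sqlite3 shell's `.dump` command. If the archive instead contains a ready-made database file, pass its path to `sqlite3.connect` directly. The sketch makes no assumptions about SPEED's table names and simply lists whatever tables the dump defines. The function names are hypothetical.

```python
import sqlite3


def load_dump(sql_text):
    """Load an SQLite SQL-text dump into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)  # runs every statement in the script
    return conn


def list_tables(conn):
    """Return the names of all tables, read from the sqlite_master catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]
```

For example, `conn = load_dump(open("speed_dump.sql").read())` followed by `list_tables(conn)` shows the schema to explore. The file name here is hypothetical.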

Viewing SPEED Database Statistics:

SPEED data is collected from the Gene Expression Omnibus (GEO) at NCBI. The Database Statistics page lists the GEO accessions used to collect gene expression data for each SPEED pathway. Clicking a pathway name displays further details about each experiment for that pathway.

Python Scripting:

The SPEED source code is included with the SQLite database download on the Download Annotations page. SPEED is written in Python 2.5.1 and depends on SciPy, which must also be installed. Scripts that import the SPEED source (speed_functions.py) should be placed in a child directory called "other_scripts" inside the directory that contains the database. This is the default configuration and can be changed by editing speed_functions.py. The following code runs SPEED from a Python script:

import speed_functions as speed
# inputs: gene_list and background_genes are lists of integer Entrez Gene IDs
# if background_genes is an empty list, all genes in the SPEED database are considered
gene_list = [102,240,393,677,1263,2323,2771,2956,3066,3303]
background_genes = []

# parameters to extract signature genes controlling differential expression, overlap across experiments, and minimum expression level
zscore_percent = 1.0
overlap = 20.0
discard_percent = 50.0

# 1 if unique constraint, 0 otherwise
unique = 0

# use speed.default_pathways for the 4 default pathways, or list specific pathways to restrict the search
incl_pathways = ['TGFB', 'TLR', 'MAPK_PI3K', 'JAK-STAT']

# run SPEED algorithm
fishers, p_values, fdr, signature_genes_in_list = speed.run_algorithm(gene_list, background_genes, zscore_percent, overlap, discard_percent, unique, incl_pathways)

# returns four dictionaries keyed by pathway, holding (1) Fisher's exact tests, (2) p-values, (3) FDR values, and (4) the signature genes found in the input list
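
Once `run_algorithm` has returned, the dictionaries can be filtered and ranked. The sketch below is hypothetical. It assumes that the `p_values` and `fdr` values are plain numbers and that `signature_genes_in_list` may omit pathways with no hits. The 0.05 cutoff is an illustrative choice, not a SPEED default. The code sticks to basic syntax, so it should also run under the Python 2 interpreter the SPEED source targets.

```python
def significant_pathways(p_values, fdr, signature_genes_in_list, alpha=0.05):
    """Return (pathway, p-value, FDR, genes) tuples with FDR <= alpha, most significant first.

    Inputs are assumed to be the dictionaries returned by speed.run_algorithm.
    """
    hits = [(pathway, p_values[pathway], fdr[pathway],
             signature_genes_in_list.get(pathway, []))
            for pathway in fdr if fdr[pathway] <= alpha]
    # Sort by FDR, breaking ties with the raw p-value.
    hits.sort(key=lambda row: (row[2], row[1]))
    return hits
```

For example, `for pathway, p, q, genes in significant_pathways(p_values, fdr, signature_genes_in_list): ...` iterates over the significant pathways together with the input genes that drove each result.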

Citing SPEED:

If you use SPEED for your own research, please provide a reference to "http://speed.sys-bio.net"
and cite the SPEED paper:
Parikh JR, Klinger B, Xia Y, Marto JA, Blüthgen N. Discovering causal signaling pathways through gene-expression patterns. Nucleic Acids Res. 2010 Jul 1;38 Suppl:W109-17.

Changelog:

7.8.2011 Fixed bug: Background lists with line ends containing only CR were not processed - NB

The SPEED project is hosted by the Group of Systems Biology of Molecular Networks at the Laboratory of Molecular Tumor Pathology (Charité) and the Institute of Theoretical Biology (HU Berlin).

Please contact Nils Blüthgen at nils.bluethgen@charite.de for more information.