SPEED uses Fisher's exact test with FDR correction to identify significant overlap between an
input gene list and each of the SPEED pathway signature gene sets. The
Run SPEED page includes a form for submitting
gene lists in the following formats: Entrez Gene, gene symbol, UniProt, GI number, RefSeq, Ensembl, and IPI. We recommend the UniProt ID Mapping service
for converting your gene list to an accepted format. Currently, only human genes are supported.
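The overlap test described above can be sketched in a few lines of Python 3 (3.8 or later, for math.comb). This is an illustrative sketch, not SPEED's own implementation. It assumes a one-sided (over-representation) Fisher's exact test and the Benjamini-Hochberg procedure for the FDR correction, neither of which is specified on this page. The function names and the toy numbers are hypothetical.

```python
from math import comb


def fisher_overlap_pvalue(n_overlap, n_list, n_signature, n_background):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of finding at least n_overlap signature genes in a list of
    n_list genes drawn from n_background genes, of which n_signature are
    signature genes. Assumption: over-representation only.
    """
    max_overlap = min(n_list, n_signature)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_signature, k) * comb(n_background - n_signature, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with made-up numbers: 10 input genes, background of 1000 genes.
# pathway name -> (signature set size, overlap with the input list)
toy = {"pathway_A": (50, 5), "pathway_B": (40, 1), "pathway_C": (60, 0)}
names = sorted(toy)
p_vals = [fisher_overlap_pvalue(toy[n][1], 10, toy[n][0], 1000) for n in names]
for name, p, q in zip(names, p_vals, benjamini_hochberg(p_vals)):
    print(name, "p = %.3g" % p, "FDR = %.3g" % q)
```

One test is run per pathway signature set, so the FDR adjustment is applied across pathways rather than across genes.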
The Run SPEED form includes 6 required fields:
1. Gene input - Your input gene list, one gene per line, should be entered in the
text box labeled "Please enter your gene list below".
Example gene lists can be
pasted into the box by clicking any of the seven "Example" buttons. The example gene lists are taken from the following publications:
TGFB:      Classen et al. (2007)
MAPK_PI3K: Tullai et al. (2004)
TLR:       Malcolm et al. (2003)
TNFa:      von Bernuth et al. (2008)
IL-1:      Misior et al. (2009)
JAK-STAT:  Indraccolo et al. (2007)
Wnt:       Zirn et al. (2006)
A signature gene is a gene that is regulated by a given pathway and whose differential expression is observed repeatedly across multiple experiments. The third parameter (Max. absolute z-score percentile) controls the required level of differential expression. A lower z-score cutoff (more lenient, e.g. top 10%) allows more regulated genes to be considered, while a higher cutoff (more stringent, e.g. top 1%) requires greater levels of differential expression.
The fourth parameter (Min. percent overlap across experiments) controls the degree of consistency required for a gene to be considered a signature gene. Lower values (e.g. 20%) require less consistency, while higher values (e.g. 80%) require a gene to be differentially regulated in almost all experiments.
We find that combining a stringent z-score percentile with a lenient percent overlap (or vice versa) provides a good balance between the number of genes considered to be signature genes and the level of confidence in the assignment.
It is also important to note that differential expression ratios derived from microarrays are noisy, and low expression measurements can lead to incorrect differential expression ratios. Therefore, the fifth parameter (Max. expression level percentile) should reflect your trust in low expression measurements, which are averaged across replicates and probes for a given gene. A higher (more lenient) max. expression level percentile (e.g. 80%) allows more genes to be considered as signature genes, while a lower (more stringent) percentile (e.g. 20%) allows fewer genes to be considered.
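How the three parameters above interact can be illustrated with a small Python 3 sketch. It is not SPEED's code. The data layout (one dictionary per experiment mapping a gene ID to a z-score and an averaged expression level), the function names, and the parameter name expr_percent are all hypothetical. The sketch also assumes that the expression percentile is counted from the most highly expressed gene downwards, so that a larger value keeps more genes, as the text describes.

```python
def top_fraction(value_by_gene, percent):
    """Return the genes whose value ranks within the top `percent` percent."""
    ranked = sorted(value_by_gene, key=value_by_gene.get, reverse=True)
    keep = int(len(ranked) * percent / 100.0)
    return set(ranked[:keep])


def select_signature_genes(experiments, zscore_percent, overlap, expr_percent):
    """Pick genes that pass the z-score and expression filters often enough.

    experiments: list of dicts, gene ID -> (z_score, expression_level)
                 (hypothetical layout, expression already averaged).
    """
    counts = {}
    for experiment in experiments:
        abs_z = {g: abs(z) for g, (z, _) in experiment.items()}
        expr = {g: e for g, (_, e) in experiment.items()}
        # A gene counts as a hit if it is strongly regulated AND well expressed.
        hits = top_fraction(abs_z, zscore_percent) & top_fraction(expr, expr_percent)
        for gene in hits:
            counts[gene] = counts.get(gene, 0) + 1
    # Minimum number of experiments in which a gene must be a hit.
    needed = overlap / 100.0 * len(experiments)
    return {g for g, c in counts.items() if c >= needed}


# Made-up data: three experiments, four genes each.
experiments = [
    {1: (3.0, 10.0), 2: (-2.5, 8.0), 3: (0.2, 9.0), 4: (0.1, 1.0)},
    {1: (-2.8, 7.0), 2: (0.3, 9.0), 3: (0.1, 8.0), 4: (2.0, 0.5)},
    {1: (2.2, 6.0), 2: (-3.1, 7.5), 3: (0.4, 5.0), 4: (0.0, 9.0)},
]
lenient = select_signature_genes(experiments, 50.0, 60.0, 75.0)
strict = select_signature_genes(experiments, 50.0, 100.0, 75.0)
print(sorted(lenient), sorted(strict))
```

In the made-up data, gene 4 has a large z-score in the second experiment but is dropped there because its expression level is too low to be trusted, which is the situation the fifth parameter is meant to handle.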
All SPEED signature gene annotations can be downloaded from the Download Annotations page in tab-delimited text format,
based on the parameters described above.
The entire database (compressed .zip) can also be downloaded as an SQLite dump or as tab-delimited text from the same page.
SPEED data is collected from the Gene Expression Omnibus (GEO) at NCBI.
The Database Statistics page displays the GEO accessions used to collect gene
expression data for each SPEED pathway. Clicking a pathway name displays further details about each experiment for
the corresponding pathway.
The SPEED source code can be downloaded together with the SQLite database from the Download Annotations page.
SPEED is written in Python 2.5.1 and depends on SciPy, which must also be installed.
Scripts that import the SPEED source (speed_functions.py) should be placed in the same directory with the database in a child
directory called "other_scripts". This is the default configuration and can be altered by editing speed_functions.py.
The following code will run SPEED from a python script:
import speed_functions as speed

# inputs gene_list and background_genes are lists of integers in Entrez Gene ID format
# if background_genes is an empty list, all genes in the SPEED database are considered
gene_list = [102, 240, 393, 677, 1263, 2323, 2771, 2956, 3066, 3303]
background_genes = []

# parameters to extract signature genes controlling differential expression,
# overlap across experiments, and minimum expression level
zscore_percent = 1.0
overlap = 20.0
discard_percent = 50.0

# 1 if unique constraint, 0 otherwise
unique = 0

# speed.default_pathways lists the 4 default pathways, or specify pathways to restrict search
incl_pathways = ['TGFB', 'TLR', 'MAPK_PI3K', 'JAK-STAT']

# run SPEED algorithm
fishers, p_values, fdr, signature_genes_in_list = speed.run_algorithm(
    gene_list, background_genes, zscore_percent, overlap, discard_percent,
    unique, incl_pathways)

# code returns dictionaries with pathways as keys and (1) fishers tests, (2) p-values,
# (3) fdr, and (4) signature genes in input list as values.
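Once the dictionaries are returned, a short helper can list the pathways that pass an FDR cutoff. The sketch below (Python 3) is only an illustration: the helper name and the 0.05 cutoff are hypothetical, the dictionaries are filled with made-up values rather than real SPEED output, and it assumes that the p-value and FDR dictionaries hold one float per pathway and that the signature gene dictionary holds a collection of Entrez Gene IDs per pathway.

```python
def significant_pathways(p_values, fdr, signature_genes_in_list, fdr_cutoff=0.05):
    """Return (pathway, p-value, FDR, genes) tuples passing the cutoff, best first."""
    rows = []
    for pathway, q in fdr.items():
        if q <= fdr_cutoff:
            genes = sorted(signature_genes_in_list[pathway])
            rows.append((pathway, p_values[pathway], q, genes))
    rows.sort(key=lambda row: row[2])  # smallest FDR first
    return rows


# Made-up stand-ins for the dictionaries returned by speed.run_algorithm.
p_values = {"TGFB": 0.0004, "TLR": 0.2, "MAPK_PI3K": 0.01, "JAK-STAT": 0.6}
fdr = {"TGFB": 0.0016, "TLR": 0.27, "MAPK_PI3K": 0.02, "JAK-STAT": 0.6}
signature_genes_in_list = {
    "TGFB": [3066, 102, 677],
    "TLR": [240],
    "MAPK_PI3K": [2323, 1263],
    "JAK-STAT": [],
}

for pathway, p, q, genes in significant_pathways(p_values, fdr, signature_genes_in_list):
    print("%s\tp=%.3g\tFDR=%.3g\tgenes=%s" % (pathway, p, q, genes))
```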
If you use SPEED for your own research, please provide a reference to "http://speed.sys-bio.net"
and cite the SPEED paper:
Parikh JR, Klinger B, Xia Y, Marto JA, Blüthgen N. Discovering causal signaling pathways through gene-expression patterns. Nucleic Acids Res. 2010 Jul 1;38 Suppl:W109-17.
The SPEED project is hosted by the Group of Systems Biology of Molecular Networks at
the Laboratory of Molecular Tumor Pathology (Charite) and the Institute of Theoretical Biology (HU Berlin).
Please contact Nils Blüthgen at nils.bluethgen@charite.de for more information.