SGD Webinars – Exploring Expression Datasets & Coexpressed Genes with SPELL

This video is a recording of the fifth webinar
in the SGD Webinar Series, titled: Exploring Expression Datasets and Coexpressed Genes
with SPELL. Please enjoy. Hello and welcome everyone to the fifth webinar
in the SGD Webinar Series. My name is Kevin MacPherson, and I’m an assistant
biocurator at the Saccharomyces Genome Database (SGD). And today, I’m going to demonstrate how to
use the expression dataset search engine, SPELL. Before we begin, please feel free to ask questions
anytime throughout the webinar. Just click on the Q&A button on the right
hand side of the BlueJeans page, as shown by the red arrow, and the SGD team will answer
your questions as they come in. We’ll also be having a Q&A session at the
end, so, please be sure to stick around for that if you have any further questions. What is SPELL? SPELL is a web-based search engine for large-scale
gene expression microarray compendia that was developed by the Troyanskaya lab at Princeton
University. Given a set of query genes, SPELL identifies
expression datasets from published microarray experiments and ranks them according to their
relevance to the query. SPELL also identifies genes from these datasets
that have similar expression profiles to the query set. In essence, SPELL is a discovery tool that
connects your genes of interest to informative expression datasets and coexpressed genes. Results from SPELL queries are ordered into
a matrix, like the one shown here, which can be clicked on and explored for further analysis. To access SPELL, go to the SGD homepage, open the Function tab, and click on Expression (as shown by the red arrow). Or, if you’re already on a locus summary page
at SGD, just open the Expression tab. This will take you to the gene expression
page on SGD, which has a link to the SPELL tool for that gene, as well as a histogram
that is created from the same data we use to populate SPELL. Lastly, you could just type the SPELL URL directly into your browser. So let’s start the tutorial by going over how
to run a multigene query. Here’s the SPELL entry page again. To run a query with multiple genes, enter
or paste your genes of interest in the search box and click on Search. Note that SPELL will accept either systematic
or standard gene names, and that separating gene names with commas is optional. Once the search is completed, SPELL displays
the results as a heat map. The query genes we entered are shown here, while datasets relevant to these genes are shown at the top. SPELL determines the relevance of these datasets
based on the coexpression levels of your query genes. In other words, datasets where the query genes
are highly coexpressed are deemed more relevant than datasets where they are less coexpressed.
By default, only the top 10 most relevant
datasets are shown. To display lower-ranking datasets instead,
you can select them in intervals of 10 in the “Datasets to view” pulldown menu. As a side note, it’s worth mentioning that
if you run a SPELL query with only one gene, then all the datasets will be considered equally
relevant, because there are no coexpression levels to compare. Click on any of the dataset titles to be taken
to a page that provides more information on the publication behind it. Note that each dataset has one or more tags,
shown here, that indicate the general biological topic investigated by its associated publication. In this example, we see that cell cycle-related datasets are identified as being the most informative. If you’d like to filter for datasets that
have a particular tag, expand the “Dataset Tags” list here. Select the tags that you wish to filter for, and then click on Update. Note that SPELL does not recalculate the results
when filters are applied; it only changes which datasets are presented to you. As mentioned before, query genes and their
expression profiles are displayed here. Following them are other genes with similar
expression profiles, ranked by their level of coexpression across all the datasets in
SPELL. To examine a gene’s expression profile for
a particular dataset in more detail, just click on the appropriate patch in the heat
map. This will open a window that displays the
numerical values associated with the expression data, and the publication they belong to. If you’d like to add any of the enriched genes
to your actual query, or remove a gene from your query set, just check or uncheck the
genes as appropriate. And then click Update. SPELL also uses all of the genes in your results
to produce an enrichment table of Gene Ontology terms, or GO terms for short, at the bottom
of the page. Briefly, Gene Ontology terms are standardized
phrases in a hierarchical structure that describe the biology and function of gene products. The enrichment produced by SPELL indicates
which Gene Ontology terms are highly enriched among the results of your query. In this case, terms such as “chromosome” and
“DNA replication” are listed near the top. SPELL also lists the calculated p-value and the
percentage of genes within the query and across the genome that share this Gene Ontology term. Click on any of the Gene Ontology terms if
you’d like to find out more information about that term on the Gene Ontology website. Lastly, if we return to the top of the page,
there is an Additional Display options menu that has two more important settings: mapping
method, to change how SPELL displays single and dual-channel arrays, and color scheme,
for switching the color of the heat map from red-green to blue-yellow. That pretty much covers how to run multigene
queries in SPELL. To wrap up this webinar, let me demonstrate
one of the practical ways that you can use this tool: predicting members of a biological
pathway, such as glycolysis, based on coexpression levels. Recall that SPELL identifies genes with similar
expression profiles to your query set. Because of this, running a multigene query
in SPELL where your query genes are all related to a particular biological pathway might identify
novel members of that pathway based on coexpression. Let’s use glycolysis to test this idea. If we enter an incomplete set of glycolysis
genes into a SPELL query, will SPELL identify the missing members? The first thing to do is to obtain a set of
genes related to glycolysis. This can be done easily on the SGD website. I’ll just look up “glycolysis” in the search
box, click on the Genes category, since I’m looking for genes, and scroll down to the
bottom of the page, to expand the Biological Process options. Then, I’ll click on “glycolytic process”, and now I’ve got 19 genes that are annotated to “glycolytic process” and also have the
word “glycolysis” somewhere in their description. To download these genes so that I can import
them into SPELL, I will click on “Wrapped”, and then on “Download”. Returning to SPELL, paste the downloaded genes
into the search box. In this example, I’m going to delete the first
gene in this list, FBA1, to see if SPELL will identify it as an informative gene for the
entire glycolysis query. And there we have it: SPELL identified FBA1
as a highly relevant gene, based on its coexpression level with the glycolysis query. Other genes enriched at the top are also relevant
to glycolysis or its byproducts. ADH1 is alcohol dehydrogenase, which catalyzes
the last step in glycolytic fermentation to ethanol, while PDC1 is pyruvate decarboxylase,
which decarboxylates pyruvate from glycolysis into acetaldehyde. The final thing I’d like to mention is that
if you’d like to find out more about the algorithm behind SPELL, the version of SPELL hosted
by SGD, and other implementation details, just see “About the Website” here on the left
menu. This concludes our webinar. Thank you for coming, and please be sure to
keep an eye out for our follow-up email for more information about SPELL, our next webinar,
and a link to our YouTube channel where today’s webinar will be uploaded. So thank you again, and let’s go ahead and
start our Q&A session. Thank you for watching the SGD webinar series. If you have any questions or comments, please
contact us at [email protected]
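
Note on the ranking idea described in the webinar: the following is a minimal Python sketch of how datasets could be ranked by how strongly the query genes are coexpressed, and how other genes could then be ranked by their weighted coexpression with the query. It is a simplified illustration and not SPELL's actual algorithm. It assumes plain Pearson correlation, and the function names, gene names (geneA to geneD), dataset names, and expression values are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sx * sy)


def dataset_relevance(dataset, query):
    """Mean pairwise correlation among the query genes in one dataset."""
    present = [g for g in query if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        # With a single query gene there is nothing to compare,
        # so every dataset gets the same relevance.
        return 1.0
    return sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)


def rank_datasets(datasets, query):
    """Dataset names, most relevant first."""
    scores = {name: dataset_relevance(d, query) for name, d in datasets.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_genes(datasets, query):
    """Non-query genes ranked by relevance-weighted correlation to the query."""
    totals = {}
    for dataset in datasets.values():
        # Negative relevance is clipped to zero (an assumption of this sketch).
        weight = max(dataset_relevance(dataset, query), 0.0)
        present = [g for g in query if g in dataset]
        if not present:
            continue
        for gene, profile in dataset.items():
            if gene in query:
                continue
            mean_corr = sum(pearson(profile, dataset[q]) for q in present) / len(present)
            totals[gene] = totals.get(gene, 0.0) + weight * mean_corr
    return sorted(totals, key=totals.get, reverse=True)


# Made-up toy data: dataset name -> gene -> expression values per condition.
toy_datasets = {
    "cell_cycle_toy": {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [1, 2, 3, 5],
        "geneD": [4, 3, 2, 1],
    },
    "stress_toy": {
        "geneA": [1, 2, 3, 4],
        "geneB": [4, 3, 2, 1],
        "geneC": [1, 3, 2, 4],
        "geneD": [1, 2, 3, 4],
    },
}

toy_query = ["geneA", "geneB"]
print(rank_datasets(toy_datasets, toy_query))
print(rank_genes(toy_datasets, toy_query))
```

In this toy example the query genes rise together in the first dataset and move in opposite directions in the second, so the first dataset is ranked as more relevant and dominates the gene ranking. The same logic underlies the glycolysis demonstration: a pathway member left out of the query can still score highly if it is coexpressed with the rest of the pathway.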

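Note on the Gene Ontology enrichment table: the webinar mentions a p-value and the percentage of genes in the query and in the genome that share each term, but does not say which statistical test is used. One common choice for this kind of enrichment is the hypergeometric test, sketched below in Python. The choice of test, the function names, and the numbers in the example are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(genome_size, genome_with_term, list_size, list_with_term):
    """Probability of seeing at least `list_with_term` annotated genes when
    drawing `list_size` genes at random from the genome (hypergeometric tail)."""
    upper = min(genome_with_term, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(genome_with_term, k) * comb(genome_size - genome_with_term, list_size - k)
        for k in range(list_with_term, upper + 1)
    )
    return tail / total


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Made-up numbers: a 10-gene "genome" where 3 genes carry the term,
# and a 4-gene result list in which 2 genes carry it.
p = enrichment_p_value(genome_size=10, genome_with_term=3, list_size=4, list_with_term=2)
print(f"p-value: {p:.4f}")
print(f"in list: {percent(2, 4):.0f}%  in genome: {percent(3, 10):.0f}%")
```

A small p-value means the term appears in the result list more often than random sampling from the genome would be expected to produce. Real tools usually also correct for testing many terms at once, which this sketch leaves out.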