RNA-Seq Part II: Using RNA-Seq Profile Search

Navigate to the RNA-Seq Expression Profile Search page by clicking on the RNA-Seq icon on the front page. The RNA-Seq Profile Search allows users to search for genes of interest based upon their relative expression within different developmental stages, tissues, treatments, or cell lines.

Before demonstrating the usefulness of this search, it is important to explain the approach used by FlyBase to calculate expression values and expression value bins. FlyBase RNA-Seq values for each gene are displayed as RPKM, that is, reads per kilobase of exon model per million mapped reads. These values are pre-calculated by FlyBase. RPKM values are also grouped into bins of different expression levels for each dataset, ranging from extremely low expression to extremely high expression. For additional information, please see the
link in the description below the video.

Now, let’s try out the RNA-Seq Profile Search function. As a first example, we want to retrieve the genes that are very highly expressed in embryonic stages and have low expression in every other stage, except adult female. This search can be visualised with a Venn diagram in which the intersection contains the genes we are looking for. We’ll begin by selecting only the “stage”
checkbox at the top of the page. The search table is split between restricting searches based on “Expression On” and “Expression Off”. Let’s restrict our search on the “Expression On” side to look for genes expressed during embryonic stages. We’ll do this by clicking the outermost checkbox located to the right of the embryonic stages. Next, we’ll select the dropdown menu and
choose “very high” to restrict our search to genes with very high RPKM values. You may notice that the nested checkboxes do not populate with checkmarks. This is because the outer checkboxes work with an “or” logic: the resulting search will be positive if our “very high” expression criterion is matched in any of the embryonic stages. If we wanted to search with a requirement for very high expression in several stages, we could manually check a few of the nested checkboxes. For additional information, please click on
“Documentation”. Next, let’s restrict our search further
by requiring gene expression to be low in every other stage except adult female. After selecting the stages of interest, we’ll click “search genes by stage expression only” to view the results. The resulting gene list is sorted alphabetically and contains genes whose RPKM values are very high in at least one embryonic stage and low in all other stages, except adult female. We can click on a result to investigate
further, for example “giant nuclei”. You can view the RPKM values on any Gene Report page by scrolling down and clicking “Expression Data” followed by “High-Throughput Expression Data”. Expanding the modENCODE Development RNA-Seq tab displays the RPKM values of this gene throughout development. Importantly, these RPKM values are relative to the RPKM values of every other gene within a specific dataset. For example, “194” is relative to all other genes’ RPKM values in embryos at 0 to 2 hours. These are not absolute values; they are not comparable across different datasets (in this case, different developmental stages). Note that this particular example shows high RPKM values in one embryonic stage as well as two adult female stages.

Let’s return to the search page to further refine our search. The RNA-Seq Profile Search page also permits imposing search criteria across different modENCODE datasets such as tissue, treatment, and cell line. Let’s modify our previous search to include a “tissue” restriction. Selecting the second dataset checkbox expands the page to include an additional search table. We’ll leave the previous search criteria
in place and add a restriction to exclude genes with greater than low RPKM values in ovaries. Again, we can represent our search with a Venn diagram in which the intersection contains the genes we are looking for. Importantly, when submitting this combined search, be sure to click the “submit combined search” button at the top of the page. You can now see that our resulting list has been reduced to 45 genes. Interestingly, the gene “giant nuclei” is now missing. If you recall, giant nuclei displayed a high
RPKM value in adult females. Since we’ve now excluded genes with greater than low expression in ovaries, we can infer that most of its adult female expression was probably due to expression in the ovaries.

For additional information regarding RNA-Seq Profile Search, please see the link in the description below the video.
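
For viewers who find it easier to follow the search logic in code, here is a minimal Python sketch of the ideas walked through above: the RPKM formula, the “or” logic on the “Expression On” side, the requirement that every selected “Expression Off” stage stays at or below the chosen level, and the combined stage plus tissue search. This is not FlyBase’s implementation. The gene names, sample names, intermediate bin labels, and the reading of “On” as “at or above” and “Off” as “at or below” are all assumptions made for illustration.

```python
# Illustrative sketch only: toy data, hypothetical names, not FlyBase's code.

def rpkm(read_count, exon_model_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    kilobases = exon_model_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)

# Ordered expression bins, lowest to highest (labels are assumed for the demo).
BINS = ["extremely low", "very low", "low", "moderate",
        "moderately high", "high", "very high", "extremely high"]
RANK = {name: i for i, name in enumerate(BINS)}

def matches(profile, on_samples, on_level, off_samples, off_level):
    """Return True if a gene's binned profile satisfies the search.

    "Expression On": at least ONE selected sample is at or above on_level.
    "Expression Off": EVERY selected sample is at or below off_level.
    """
    on_ok = (not on_samples) or any(
        RANK[profile[s]] >= RANK[on_level] for s in on_samples)
    off_ok = all(RANK[profile[s]] <= RANK[off_level] for s in off_samples)
    return on_ok and off_ok

# Hypothetical sample names; adult_female is deliberately left unconstrained.
EMBRYO = ["embryo_0_2h", "embryo_2_4h"]
OTHER = ["larva", "pupa", "adult_male"]

stage_profiles = {
    "gene_a": {"embryo_0_2h": "very high", "embryo_2_4h": "low",
               "larva": "low", "pupa": "very low",
               "adult_male": "low", "adult_female": "high"},
    "gene_b": {"embryo_0_2h": "very high", "embryo_2_4h": "very high",
               "larva": "moderate", "pupa": "low",
               "adult_male": "low", "adult_female": "low"},
    "gene_c": {"embryo_0_2h": "low", "embryo_2_4h": "low",
               "larva": "low", "pupa": "low",
               "adult_male": "low", "adult_female": "low"},
    "gene_d": {"embryo_0_2h": "low", "embryo_2_4h": "extremely high",
               "larva": "very low", "pupa": "very low",
               "adult_male": "low", "adult_female": "moderate"},
}
tissue_profiles = {
    "gene_a": {"ovary": "high"},
    "gene_b": {"ovary": "low"},
    "gene_c": {"ovary": "low"},
    "gene_d": {"ovary": "very low"},
}

# Stage-only search: very high in any embryonic stage, low everywhere else
# except adult female.
stage_hits = sorted(
    g for g, p in stage_profiles.items()
    if matches(p, EMBRYO, "very high", OTHER, "low"))

# Combined search: additionally require at most low expression in ovaries.
combined_hits = [
    g for g in stage_hits
    if matches(tissue_profiles[g], [], None, ["ovary"], "low")]

print(rpkm(500, 2_000, 10_000_000))
print(stage_hits)
print(combined_hits)
```

In this toy data, gene_a plays the role of “giant nuclei”: it passes the stage-only search but drops out of the combined search because of its ovary expression.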

