IGV | RNA-Seq Data Basics | Splice Junction Track, Downsampling


Here I’ll show the basics of viewing RNA-Seq data in IGV. We’ll use human genome 18 (hg18) as the reference, and liver RNA-Seq data from the IGV-hosted server. Now let’s zoom into a particular gene to get a look at the data. The most striking difference from regular DNA sequencing data is the abrupt, dramatic changes in read coverage.

This brings us to one of the most important concepts when studying RNA-Seq data. Let’s say that we want to understand the expression of a single gene. After being transcribed, the pre-mRNA of this gene is spliced to produce mature mRNA. The points where the spliceosome cuts the pre-mRNA and joins the remaining exons are called splice junctions. Notice that the introns, as well as the third exon, have been spliced out. The mature mRNA is then reverse transcribed to cDNA and sequenced, giving us reads that can span multiple exons.

Our task is to use the sequencing reads to quantify the expression of the exons in this gene, and to do this we’ll align them to the reference genome. The problem is that some reads don’t map contiguously to the reference genome, because they’re missing the sequences that were spliced out. For these reads to map, they need to be split. When many reads spanning this gene are split and aligned, we can clearly see that only the first and second exons are being expressed, which makes sense because the third exon was spliced out of the pre-mRNA.

This is why, when you look closely at RNA-Seq data in IGV, you’ll see many split reads, where part of the read maps to one exon and the other part maps to an adjacent exon. This read, for example, is split across two exons. We can confirm that these are two pieces of the same read by checking the read names in the pop-up windows. To make it easier to tell which reads are split, IGV also draws a thin blue line connecting the pieces of the same read.

One helpful tool for visualizing splice junctions in IGV is the splice junction track, which you can enable by right-clicking the alignment track and selecting ‘Show Splice Junction track’. The splice junction track is a visual representation of the breaks in read coverage caused by splicing. To get a better look at the coverage and splice junction tracks, let’s change the splice junction track height to 150 and the coverage track height to 280.

In the splice junction track, red arcs extending above the center line represent splice junctions on the plus strand, whereas blue arcs extending below the center line represent splice junctions on the minus strand. The thickness and height of each arc are proportional to the number of reads that span that junction. This splice junction, for example, is spanned by more reads than this one.

If you click on the splice junction track, you can see the exact number of reads that span the junctions at the location you clicked. It looks like there are three splice junctions where we clicked, but it’s difficult to see all three because they overlap. If we right-click and select Expanded, however, we can see all three junctions on separate rows. Now we can click on the individual splice junctions to see the number of reads spanning each. Right-clicking and selecting Squished gives a more condensed view. Let’s collapse the splice junction track for now.

When you’re exploring RNA-Seq data, you may come across a region like this one, where the coverage doesn’t appear to match the gene annotation track. For example, in the TMPO gene, about half the reads don’t include this large exon. If we right-click the gene track and select Expanded, we can see why: in two isoforms of TMPO, this large exon is spliced out, which is why so many reads lack it.

You might also have noticed black lines above regions with high coverage. These black lines mark regions where the data has been downsampled: in regions with very high coverage, IGV caps the number of reads shown in order to save memory. You can change this cap by opening the View menu, selecting Preferences, and changing the Max read count in the Alignments tab, or you can disable downsampling altogether. Note that the coverage track always shows the total number of reads, even in downsampled regions.

Thanks for watching, and be sure to check out our other IGV tutorials.
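IGV itself is point-and-click, but it can help to see how a split read is stored. In SAM/BAM files, the gap in a split alignment is recorded with the CIGAR `N` operator (a region skipped on the reference, typically an intron). The Python sketch below is an illustration, not IGV’s code: it assumes a SAM-style 1-based start position and a CIGAR string, and returns the aligned blocks (roughly the pieces shown as separate read segments) and the skipped gaps between them (roughly the stretches bridged by the thin line). `split_read_blocks` is a hypothetical helper name.

```python
import re

# One CIGAR operation: a length followed by an operator character (SAM spec).
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def split_read_blocks(pos, cigar):
    """Return (blocks, gaps) for an alignment starting at 1-based `pos`.

    blocks: (start, end) reference intervals covered by the aligned read pieces.
    gaps:   (start, end) intervals skipped with 'N' (e.g. spliced-out introns).
    All intervals are 1-based and inclusive. Hypothetical helper for illustration.
    """
    blocks, gaps = [], []
    ref = pos            # current position on the reference
    block_start = None   # start of the aligned block currently being built
    for length_str, op in _CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=XD":
            # Matches/mismatches and deletions consume reference bases and
            # stay inside the current block.
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":
            # A skipped region closes the current block and opens a gap.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            gaps.append((ref, ref + length - 1))
            ref += length
        # I, S, H and P do not consume reference bases, so they are skipped.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks, gaps
```

For example, `split_read_blocks(100, "20M500N30M")` returns `([(100, 119), (620, 649)], [(120, 619)])`: a read split into two pieces across a 500-base gap.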
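The numbers shown when you click the splice junction track are read counts per junction. Below is a minimal sketch of that tally, assuming each split read has already been reduced to a hypothetical `(start, end, strand)` tuple for its gap; how strand is assigned in real data (library protocol, aligner tags) is not modeled here. Scaling line width by count follows the tutorial’s statement that arc thickness is proportional to the number of spanning reads, and the side mapping follows its plus-above, minus-below convention. The helper names and the `max_px` pixel scale are hypothetical, not IGV settings.

```python
from collections import Counter

# Plus-strand arcs are drawn above the center line, minus-strand arcs below.
ARC_SIDE = {"+": "above", "-": "below"}


def count_junctions(junctions):
    """Tally how many reads support each junction.

    `junctions` is an iterable of (start, end, strand) tuples, one per
    split-read gap, where strand is '+' or '-'.
    """
    return Counter(junctions)


def arc_thickness(counts, max_px=10):
    """Scale each junction's read count to a line width in pixels,
    relative to the best-supported junction (hypothetical scaling)."""
    top = max(counts.values())
    return {junction: max_px * n / top for junction, n in counts.items()}
```

Keying on the full tuple means two junctions with the same coordinates on opposite strands are counted separately, which matches their being drawn on opposite sides of the center line.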
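This tutorial doesn’t describe exactly how IGV chooses which reads to keep when it downsamples. The sketch below shows one common way to cap reads per region, under loudly stated assumptions: reads are grouped into fixed-size windows by start position (`window_size` is a hypothetical parameter), and each window keeps at most `max_reads` reads chosen by reservoir sampling (Algorithm R). It also keeps the total read count per window, mirroring the point that coverage still reflects every read, and reports which windows were downsampled (the regions that would get a black marker line).

```python
import random
from collections import defaultdict


def downsample(read_starts, max_reads=100, window_size=50, seed=0):
    """Keep at most `max_reads` reads per window of `window_size` bases.

    read_starts: iterable of read start positions, in any order.
    Returns (kept, downsampled_windows, totals):
      kept                -- dict: window index -> list of retained read starts
      downsampled_windows -- sorted window indexes where reads were dropped
      totals              -- dict: window index -> total reads seen, before
                             downsampling
    Illustrative sketch only; not IGV's implementation.
    """
    rng = random.Random(seed)
    kept = defaultdict(list)
    totals = defaultdict(int)
    for start in read_starts:
        w = start // window_size
        totals[w] += 1
        n = totals[w]
        if n <= max_reads:
            kept[w].append(start)
        else:
            # Reservoir sampling (Algorithm R): the n-th read replaces a kept
            # read with probability max_reads / n, so every read in the window
            # ends up equally likely to be retained.
            j = rng.randrange(n)
            if j < max_reads:
                kept[w][j] = start
    downsampled = sorted(w for w, n in totals.items() if n > max_reads)
    return dict(kept), downsampled, dict(totals)
```

Reservoir sampling is used here because it needs a single pass and bounded memory per window, which fits the memory-saving motivation given above; the fixed `seed` makes the sketch’s output reproducible.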
