RNA-Seq Data Analysis Tutorial (08) – Genomic Location Specifically Regulated Genes & Motif Sequences


Open the UCSC Genome Browser site to get the genomic locations of genes. Select the RefSeq genes of the human genome hg38, and set the output format to “all fields from selected table.” Enter a file name and download the file.

Open the Data Manager tab, then open the Edit Platform panel. Use the Look Up tool to import information from the downloaded file. Select the gene symbol columns as the keys that connect the two tables, then select the columns containing chromosomal location information to import. Copy the values in the txStart and txEnd columns and paste them into the chromStart and chromEnd columns, then delete the txStart and txEnd columns. Click the OK button to save the edits.

Select “RefSeq (hg38)” under the Genome menu to load it. The Chromosomes tab now shows where the selected genes are located, so you can examine whether genes with a specific expression pattern are concentrated in particular positions.

If you download the genomic sequence data, keep all the files in one folder. Open the Find Regions from Seq tool to search for locations of the binding motif, and set the folder containing the genomic sequence files. Copy the consensus sequence of ERRalpha from a gene database, paste it into the Find Regions from Seq tool, and run it. Save the result as a Region List. You can visualize the Region List by dragging and dropping it onto the Genome View.

The Genomic Location Filter helps you find genes that have the consensus sequence in their promoter regions. Gene-level RNA-Seq data cannot tell you the exact TSS, so set a somewhat wide search area and execute the filter. You now have a list of genes with the consensus sequence in their promoter regions.

Let’s combine this list with the commonly-up and commonly-down gene lists using the Venn Diagram tool. Only one gene lies at the intersection with the commonly-down list. Save it as a measurement list, then select the gene to find its location in the Chromosomes tab.

Let’s take a close look at the promoter region of the gene in the Genome View. The consensus sequence lies slightly downstream of the TSS, on the complementary strand. Looking at the transcripts of the gene, there are two that could have the consensus sequence upstream of their TSSs. The count for this gene is about 1000 in the control, and the siRNAs can reduce it to several hundred.
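For readers who want to see the idea behind the motif search and the promoter filter outside the GUI, here is a minimal Python sketch. It is not how the software implements these tools; it only illustrates scanning a sequence for a consensus on both strands and then keeping genes whose TSS window contains a hit. The consensus TNAAGGTCA is an assumption (a commonly cited ERRalpha response element; the tutorial does not state the exact sequence it pastes), and the function names, the toy sequence, the gene records, and the window sizes are all hypothetical. Coordinates are treated as 0-based, half-open, as in UCSC tables.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, consensus):
    """Return sorted (start, end, strand) hits on both strands."""
    seq = seq.upper()
    hits = set()
    for strand, motif in (("+", consensus.upper()),
                          ("-", reverse_complement(consensus))):
        # A lookahead is used so that overlapping matches are all reported.
        body = "".join(IUPAC[base] for base in motif)
        for m in re.finditer("(?=(" + body + "))", seq):
            hits.add((m.start(), m.start() + len(motif), strand))
    return sorted(hits)


def promoter_window(tx_start, tx_end, strand, upstream, downstream):
    """Search window around the TSS; the TSS is txEnd for minus-strand genes."""
    if strand == "+":
        return max(0, tx_start - upstream), tx_start + downstream
    return max(0, tx_end - downstream), tx_end + upstream


def genes_with_motif(genes, hits, upstream=2000, downstream=1000):
    """genes: (symbol, tx_start, tx_end, strand) records on one chromosome.

    The default window sizes are arbitrary placeholders, chosen wide because
    gene-level data does not pin down the exact TSS.
    """
    selected = []
    for symbol, tx_start, tx_end, strand in genes:
        lo, hi = promoter_window(tx_start, tx_end, strand, upstream, downstream)
        if any(lo <= start and end <= hi for start, end, _ in hits):
            selected.append(symbol)
    return selected


# Toy data (hypothetical): one forward hit and one reverse-strand hit.
ERRE = "TNAAGGTCA"  # assumed consensus, not taken from the tutorial
toy_seq = "GGGG" + "TCAAGGTCA" + "CCCCC" + "TGACCTTGA" + "AAAA"
toy_genes = [("GENE_A", 20, 31, "+"), ("GENE_B", 0, 2, "-")]

toy_hits = find_motif(toy_seq, ERRE)
toy_selected = genes_with_motif(toy_genes, toy_hits, upstream=10, downstream=10)
print(toy_hits)
print(toy_selected)
```

Including a downstream margin in the window matters for cases like the one above, where the consensus sits slightly downstream of the annotated TSS and on the complementary strand; a strictly upstream, single-strand search would miss it.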
