Abstract
Non-coding RNAs (ncRNAs) are thought to be functional RNA molecules that are transcribed from DNA but not translated into proteins. Several computational approaches for cataloging ncRNAs have been reported, but most were developed to identify specific ncRNAs or particular ncRNA families. A computational pipeline was developed that automatically detects and classifies different categories of differentially expressed ncRNAs and further identifies the “target” protein-coding genes for these ncRNAs. The pipeline is designed for two types of gene expression data: microarray and RNA-sequencing. Differential expression of genes was measured using tools appropriate for each data set. The pipeline was developed primarily in Perl, BioPerl, and shell scripts. Expression values were used in the pipeline to determine differential or cell-specific gene expression. Differentially expressed biotypes/genes were classified into different categories of ncRNAs and protein-coding genes. The next steps of the pipeline focused on detecting putative targets, specifically for long intergenic noncoding RNAs (lincRNAs), pseudogenes, and antisense RNAs (asRNAs). Differentially expressed protein-coding genes located close to lincRNAs were identified and annotated as potential targets. For pseudogenes, the respective parent protein-coding genes were determined based on sequence homology and annotated as potential targets. Finally, targets for differentially expressed asRNAs were determined by identifying the protein-coding genes located on the DNA strand opposite the asRNAs. While the methodology was being developed, it was applied to two different biological models. The first study used a microarray dataset (GSE30165) based on a rat neuropathic pain model involving two neuronal tissues, dorsal root ganglion (DRG) and sciatic nerve (SN).
This study was explored in detail, and a few interesting candidate genes were found as potential “targets” for lincRNAs and pseudogenes. The second study was performed on an RNA-sequencing dataset involving two neuronal sources, cerebellar granular neurons (CGNs) and DRG, based on the axonal regeneration properties of those cells. Targets for lincRNAs, pseudogenes, and asRNAs were identified for CGNs and DRG. Some interesting targets were found, and their associations with the pathogenesis of neurological and neurodegenerative processes are discussed.
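
The two coordinate-based target searches summarized above (protein-coding genes near a lincRNA, and protein-coding genes on the strand opposite an asRNA) can be illustrated with a minimal sketch. This is not the pipeline's actual code, which was written in Perl and BioPerl; it is a Python illustration under stated assumptions. The `Gene` record, the function names, the gene names and coordinates, and the 10 kb `max_distance` cutoff are all hypothetical, since the abstract does not specify how "close" is defined. The sketch also assumes that an asRNA target must overlap the asRNA in genomic coordinates, which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str   # "+" or "-"
    biotype: str  # e.g. "protein_coding", "lincRNA", "antisense"


def gap(a, b):
    """Distance in bp between two intervals; 0 if they overlap or share a base."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def lincrna_targets(linc, genes, max_distance=10_000):
    """Protein-coding genes on the same chromosome within max_distance bp.

    max_distance is an assumed cutoff, not a value taken from the source.
    """
    return [
        g for g in genes
        if g.biotype == "protein_coding"
        and g.chrom == linc.chrom
        and gap(linc, g) <= max_distance
    ]


def antisense_targets(asrna, genes):
    """Protein-coding genes on the opposite strand that overlap the asRNA.

    Requiring overlap is an assumption made for this sketch.
    """
    return [
        g for g in genes
        if g.biotype == "protein_coding"
        and g.chrom == asrna.chrom
        and g.strand != asrna.strand
        and gap(asrna, g) == 0
    ]


# Made-up example annotations (not from either dataset in the study).
genes = [
    Gene("pcA", "chr1", 1000, 2000, "+", "protein_coding"),
    Gene("pcB", "chr1", 50000, 51000, "-", "protein_coding"),
    Gene("pcC", "chr2", 1000, 2000, "+", "protein_coding"),
]
linc = Gene("linc1", "chr1", 5000, 6000, "+", "lincRNA")
asr = Gene("as1", "chr1", 50500, 52000, "+", "antisense")

linc_hits = [g.name for g in lincrna_targets(linc, genes)]
as_hits = [g.name for g in antisense_targets(asr, genes)]
print(linc_hits, as_hits)
```

In a real analysis, the input list would presumably be restricted to differentially expressed genes before either search is run, as the abstract describes. The pseudogene step is omitted here because it relies on sequence homology rather than coordinates.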