Deep Learning Approaches for Inferring Protein Model Qualities, Predicting Protein Functions, and Understanding the Relationships between 3D Genome and Protein Functions

Chenguang Zhao

Dissertation

Doctor of Philosophy (PhD), University of Miami

2023-12

Abstract

Protein model accuracy

Quality assessment

Deep learning

Protein structure prediction

Protein function prediction

3D Genome

The estimation of protein model accuracy (EMA), also called model quality assessment (QA), is important for protein structure prediction. We developed two novel methods, MASS2 and LAW, for predicting the residue-specific (local) quality of individual models; they incorporate residual neural networks and graph neural networks, respectively. High-throughput sequencing technologies have generated massive numbers of protein sequences, but annotating those sequences still relies heavily on low-throughput, expensive biological experiments. We developed a deep learning system named PANDA2 to predict protein functions. It uses a graph neural network to model the topology of the gene ontology (GO) directed acyclic graph (DAG) and integrates features generated by transformer protein language models. AlphaFold recently achieved promising performance in predicting protein tertiary structures, and the AlphaFold protein structure database (AlphaFold DB) is expanding quickly. We aimed to develop a deep learning tool that is trained specifically on AlphaFold models and predicts GO terms from them. To that end, we developed PANDA-3D, a learning architecture that combines geometric vector perceptron graph neural networks with a variant of transformer decoder layers for multi-label classification. Topologically associating domains (TADs) are structural and functional units of the genome. However, the functions of protein-coding genes located in the same or different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes within the same TAD and between different TADs, and likewise within the same gap region (the region between two consecutive TADs) and between different gap regions. We further created two types of gene–gene spatial interaction networks. A graph auto-encoder was applied to learn the network topology, reconstruct the networks, and predict the functions of the central genes (nodes) from the functions of their neighboring genes (nodes).
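The same-TAD versus different-TAD comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which functional similarity measure the dissertation uses, so the Jaccard index over GO term sets is an assumption here, and the function names, gene names, TAD labels, and GO identifiers are all hypothetical placeholders rather than data or code from the dissertation.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_tad_similarity(gene_to_tad, gene_to_go):
    """Return (mean same-TAD similarity, mean different-TAD similarity).

    ASSUMPTION: similarity is the Jaccard index of GO term sets; the
    dissertation may use a different (e.g. semantic) similarity measure.
    """
    same, different = [], []
    # Visit every unordered pair of genes exactly once.
    for g1, g2 in combinations(sorted(gene_to_tad), 2):
        score = jaccard(gene_to_go[g1], gene_to_go[g2])
        if gene_to_tad[g1] == gene_to_tad[g2]:
            same.append(score)
        else:
            different.append(score)
    return (
        mean(same) if same else 0.0,
        mean(different) if different else 0.0,
    )


# Toy, made-up data (hypothetical genes, TADs, and GO identifiers).
gene_to_tad = {"geneA": "TAD1", "geneB": "TAD1", "geneC": "TAD2", "geneD": "TAD2"}
gene_to_go = {
    "geneA": {"GO:x1", "GO:x2"},
    "geneB": {"GO:x1", "GO:x2", "GO:x3"},
    "geneC": {"GO:x4"},
    "geneD": {"GO:x4", "GO:x1"},
}

same_mean, diff_mean = compare_tad_similarity(gene_to_tad, gene_to_go)
print(f"same TAD: {same_mean:.3f}, different TADs: {diff_mean:.3f}")
```

On this toy input the same-TAD pairs score higher on average than the cross-TAD pairs, but that only reflects how the example data were chosen and says nothing about the dissertation's findings. The same function could be applied to gap regions by passing a gene-to-gap-region mapping instead.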

Files and links (1)

pdf

cxz417F2313.76 MB

Embargoed Access, Embargo ends: 2025-12-06

Details

Title: Deep Learning Approaches for Inferring Protein Model Qualities, Predicting Protein Functions, and Understanding the Relationships between 3D Genome and Protein Functions
Creators: Chenguang Zhao
Contributors: Zheng Wang (Committee Member); Stefan Wuchty (Committee Member); Liang Liang (Committee Member); Athula Wikramanayake (Committee Member)
Theses and Dissertations: Doctor of Philosophy (PhD), University of Miami; Dissertation
Degree in: Computer Science
Date of defense: 2023-10-10
Academic Unit: A&S - Computer Science
Language: English
Resource Type: Dissertation
Record Identifier: 991031965420802976