Abstract
Estimation of protein model accuracy (EMA), also known as model quality assessment (QA), is an important part of protein structure prediction. We developed two novel methods, MASS2 and LAW, that predict the residue-specific (local) quality of individual protein models. MASS2 incorporates residual neural networks, and LAW incorporates graph neural networks.

High-throughput sequencing technologies have generated massive numbers of protein sequences, but annotating these sequences still relies heavily on low-throughput and expensive biological experiments. We developed a deep learning system named PANDA2 to predict protein functions. PANDA2 uses a graph neural network to model the topology of the Gene Ontology (GO) directed acyclic graph (DAG) and integrates features generated by transformer protein language models.

AlphaFold has recently achieved strong performance in predicting protein tertiary structures, and the AlphaFold Protein Structure Database (AlphaFold DB) is expanding rapidly. We aimed to develop a deep learning tool that is trained specifically on AlphaFold models and predicts GO terms from them. To this end, we developed PANDA-3D, a deep learning architecture that combines geometric vector perceptron (GVP) graph neural networks with variant transformer decoder layers for multi-label classification.

Topologically associating domains (TADs) are the structural and functional units of the genome. However, the functions of protein-coding genes located in the same TAD or in different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes within the same TAD versus between different TADs, and within the same gap region (the region between two consecutive TADs) versus between different gap regions. We further constructed two types of gene–gene spatial interaction networks and applied a graph auto-encoder to learn the network topology, reconstruct the networks, and predict the functions of central genes (nodes) from the functions of their neighboring genes (nodes).
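The abstract does not specify the graph auto-encoder architecture or how function predictions are derived from the reconstructed network. The following is therefore only a minimal sketch under stated assumptions. It uses the common formulation of a graph convolutional encoder with an inner-product decoder, simplified here to a single linear layer trained by plain gradient descent on numpy arrays. The neighbor-voting rule in `predict_functions`, the toy six-gene network, and the GO-term labels are all hypothetical illustrations, not the dissertation's method or data.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def bce(p: np.ndarray, target: np.ndarray) -> float:
    """Mean binary cross-entropy between reconstructed and true adjacency."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def train_gae(adj, features, dim=8, lr=0.5, epochs=300, seed=0):
    """Assumed encoder Z = A_hat X W (one linear GCN layer); decoder sigmoid(Z Z^T).

    Returns the reconstructed edge probabilities and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    ax = normalize_adjacency(adj) @ features      # neighborhood-aggregated features
    w = rng.normal(scale=0.1, size=(features.shape[1], dim))
    target = adj + np.eye(n)                       # reconstruct A with self-loops
    losses = []
    for _ in range(epochs):
        z = ax @ w                                 # node (gene) embeddings
        p = sigmoid(z @ z.T)                       # reconstructed adjacency
        losses.append(bce(p, target))
        grad_logits = (p - target) / n**2          # d(mean BCE)/d(logits)
        grad_z = 2.0 * grad_logits @ z             # logits = Z Z^T and grad_logits is symmetric
        w -= lr * ax.T @ grad_z                    # chain rule through Z = A_hat X W
    z = ax @ w
    return sigmoid(z @ z.T), losses


def predict_functions(recon, labels, node):
    """Hypothetical neighbor voting: weight other genes' GO labels by reconstructed edge probability."""
    weights = recon[node].copy()
    weights[node] = 0.0                            # ignore the central gene's own annotation
    return weights @ labels / weights.sum()


# Toy network (made up): two groups of three spatially interacting genes.
adj = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
labels = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)  # made-up GO-term indicators
recon, losses = train_gae(adj, np.eye(6))                     # identity matrix as featureless input
scores = predict_functions(recon, labels, node=0)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}; GO-term scores for gene 0: {scores.round(3)}")
```

In this toy setting, gene 0 should receive a higher score for the GO term shared by its own interaction group, because the trained decoder assigns high edge probabilities within a group and low probabilities across groups. A real pipeline would more likely use a deep learning framework, nonlinear multi-layer encoders, and held-out edges or annotations for evaluation.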