Abstract
Next-generation sequencing techniques give us the opportunity to generate large numbers of protein sequences and to identify the biological families and functions of those proteins. However, uncharacterized proteins still constitute a notable percentage of the predicted proteins in bioinformatics research, compared with characterized ones. Previous clustering-based algorithms rely heavily on large data samples, so they are not accurate enough to assign protein families when only a small number of family-annotated proteins is available. A more accurate and faster protein family prediction method that works with limited family-annotated data is therefore required. In this paper, we apply a multi-layer Graph Convolutional Network (GCN) architecture to a Protein-Protein Interaction (PPI) network with limited characterized proteins and evaluate its performance on protein family classification, taking into account both the network topology and the physicochemical amino acid features of the proteins.
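The idea of combining network topology with per-protein features can be illustrated with a minimal sketch. It assumes the standard Kipf and Welling GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), with a softmax output and a cross-entropy loss computed only over labeled (characterized) proteins. The toy graph, the random feature vectors standing in for physicochemical amino acid features, and all function names are hypothetical. Training the weights, for example by gradient descent on the masked loss, is omitted.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops so each protein keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(a_norm: np.ndarray, features: np.ndarray, weights: list) -> np.ndarray:
    """Multi-layer GCN: each layer aggregates neighbor features, then applies a linear map.

    ReLU is applied between layers; the final layer yields per-family probabilities.
    """
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return softmax(h)


def masked_cross_entropy(probs: np.ndarray, labels: np.ndarray, mask: np.ndarray) -> float:
    """Average cross-entropy over the labeled (family-annotated) proteins only."""
    idx = np.where(mask)[0]
    return float(-np.mean(np.log(probs[idx, labels[idx]] + 1e-12)))


# Hypothetical toy PPI network: 5 proteins, 4 feature dimensions, 3 families.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
features = rng.random((5, 4))
weights = [rng.normal(scale=0.5, size=(4, 8)), rng.normal(scale=0.5, size=(8, 3))]

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, features, weights)
labels = np.array([0, 0, 1, 2, 2])
mask = np.array([True, False, False, True, False])  # only proteins 0 and 3 are annotated
loss = masked_cross_entropy(probs, labels, mask)
predicted = probs.argmax(axis=1)  # predicted family for every protein, labeled or not
```

Because each layer mixes a protein's features with those of its interaction partners, an unlabeled protein can still receive a family prediction from its neighborhood, which is what makes the setting useful when few proteins are annotated.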