Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins
Abstract
Family GH13, also known as the alpha-amylase family, is the largest sequence-based family of glycoside hydrolases and groups together a number of different enzyme activities and substrate specificities acting on alpha-glycosidic bonds. Because of this polyspecificity, membership of the family alone cannot be used to predict gene function from sequence. To establish robust groups that show an improved correlation between sequence and enzymatic specificity, we performed a large-scale analysis of 1691 family GH13 sequences, combining clustering, similarity search and phylogenetic methods. About 80% of the sequences could be reliably classified into 35 subfamilies. Most subfamilies appear monofunctional (i.e. they contain enzymes with the same substrate and the same product). Close examination of the other, apparently polyspecific, subfamilies revealed that they actually group together enzymes with strongly related (or sometimes virtually identical) activities. Overall, our subfamily assignment makes it possible to set the limits for genomic function prediction in this large family of biologically and industrially important enzymes.