Journal articles

Semisupervised Gaussian process for automated enzyme search

Abstract : Synthetic biology is today harnessing the design of novel and greener biosynthesis routes for the production of added-value chemicals and natural products. The design of novel pathways often requires a detailed selection of enzyme sequences to import into the chassis at each of the reaction steps. To address such design requirements in an automated way, we present here a tool for exploring the space of enzymatic reactions. Given a reaction and an enzyme, the tool provides a probability estimate that the enzyme catalyzes the reaction. Our tool first considers the similarity of a reaction to known biochemical reactions with respect to signatures around their reaction centers. Signatures are defined based on chemical transformation rules by using extended connectivity fingerprint descriptors. A semisupervised Gaussian process model associated with the similar known reactions then provides the probability estimate. The Gaussian process model uses information about both the reaction and the enzyme in providing the estimate. These estimates were validated experimentally by applying the Gaussian process model to a newly identified metabolite in Escherichia coli in order to search for the enzymes catalyzing its associated reactions. Furthermore, we show with several pathway design examples how the ability to assign probability estimates to enzymatic reactions can assist in bioengineering applications, providing experimental validation of our proposed approach. To the best of our knowledge, the proposed approach is the first application of Gaussian processes dealing with biological sequences and chemicals; the use of a semisupervised Gaussian process framework is also novel in the context of machine learning applied to bioinformatics. However, the ability of an enzyme to catalyze a reaction depends on the affinity between the substrates of the reaction and the enzyme. This affinity is generally quantified by the Michaelis constant K_M. Therefore, we also demonstrate the use of Gaussian process regression to predict K_M given a substrate-enzyme pair.
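To make the last step of the abstract concrete, the following is a minimal sketch of Gaussian process regression over substrate-enzyme pairs. It is not the authors' implementation: the product of two Tanimoto kernels (one on substrate fingerprint features, one on enzyme 3-mers), the noise level, the helper names (`kmers`, `pair_kernel`, `gp_predict`), and the toy training values are all assumptions chosen for illustration, and the semisupervised classification model described in the abstract is not reproduced here.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def kmers(sequence, k=3):
    """Set of overlapping k-mers of a protein sequence (illustrative enzyme features)."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def pair_kernel(p, q):
    """Assumed kernel on (substrate_features, enzyme_sequence) pairs:
    product of a substrate Tanimoto kernel and an enzyme k-mer Tanimoto kernel."""
    return tanimoto(p[0], q[0]) * tanimoto(kmers(p[1]), kmers(q[1]))

def kernel_matrix(xs, ys):
    return np.array([[pair_kernel(x, y) for y in ys] for x in xs])

def gp_predict(train_x, train_y, test_x, noise=1e-2):
    """Posterior mean and variance of GP regression with a constant (empirical) mean."""
    y = np.asarray(train_y, dtype=float)
    y_mean = y.mean()
    K = kernel_matrix(train_x, train_x) + noise * np.eye(len(train_x))
    K_s = kernel_matrix(test_x, train_x)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mean = y_mean + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    prior_var = np.array([pair_kernel(x, x) for x in test_x])
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var

# Toy data: hypothetical fingerprint feature ids and sequences,
# with made-up log10(K_M) targets (not values from the paper).
train_x = [
    (frozenset({1, 2, 3}), "MKTAYIAK"),
    (frozenset({2, 3, 4}), "MKTAYLAK"),
    (frozenset({7, 8, 9}), "GGSWWPLV"),
]
train_y = [-3.0, -2.8, -1.0]

test_x = [
    (frozenset({1, 2, 3}), "MKTAYIAK"),   # identical to a training pair
    (frozenset({20, 21}), "CCCCCC"),      # shares nothing with the training set
]
mean, var = gp_predict(train_x, train_y, test_x)
for m, s2 in zip(mean, var):
    print(f"predicted log10(K_M) = {m:.3f}, variance = {s2:.3f}")
```

In this sketch a query pair that matches a training pair gets a prediction close to its training value with small variance, while a pair unrelated to all training data falls back to the training mean with the full prior variance, which is the behavior that makes the Gaussian process estimate useful for ranking candidate enzymes.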
Document type :
Journal articles

Contributor : Migration Prodinra
Submitted on : Wednesday, May 27, 2020 - 9:43:56 PM
Last modification on : Wednesday, January 26, 2022 - 2:00:39 PM



Joseph Mellor, Ioana Grigoras, Pablo Carbonell, Jean-Loup Faulon. Semisupervised Gaussian process for automated enzyme search. ACS Synthetic Biology, American Chemical Society, 2016, 5 (6), pp.518-528. ⟨10.1021/acssynbio.5b00294⟩. ⟨hal-02636917⟩


