New computational methods and tools for structure-based protein design
Abstract
Structure-based Computational Protein Design (CPD) has become an increasingly valuable tool for engineering proteins with desired properties and for investigating sequence-structure relationships in ways that were not previously possible. CPD seeks to identify amino acid sequences that fold into a given 3D scaffold and possess a targeted property. Its applications are broad, ranging from medicine, biotechnology, and synthetic biology to nanotechnology.
Herein, we present our most recent methodological advances in the CPD field, which overcome technological bottlenecks and provide innovative computational methods and tools for exploring large sequence-conformation spaces with greater accuracy and robustness than classical approaches. In particular, building on our previous Artificial Intelligence-based protein design methods [1-5], we developed EasyE and JayZ, two methods for predicting changes in protein-protein binding free energy upon mutation that either ignore or include conformational entropic contributions [6]. Assessed on a large benchmark of experimental binding affinity measurements, both methods outperform established approaches.
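To make the distinction between ignoring and including conformational entropy concrete, the following is a minimal sketch and not the EasyE or JayZ implementation. It assumes each state (complex and the two free partners) is described by a small list of conformational energies in kcal/mol, scores a state either by its minimum energy (no entropy) or by a Boltzmann-weighted free energy over the listed conformations (entropy included), and takes the change in binding free energy as the mutant binding score minus the wild-type one. All function names, the temperature factor, and the energy values are hypothetical.

```python
import math

KT = 0.593  # assumed thermal energy in kcal/mol (roughly 298 K)


def min_energy(energies):
    """Score a state by its lowest-energy conformation (entropy ignored)."""
    return min(energies)


def free_energy(energies, kt=KT):
    """Score a state by -kT * ln(sum(exp(-E / kT))) over its conformations.

    Energies are shifted by the minimum before exponentiation for
    numerical stability.
    """
    e0 = min(energies)
    z = sum(math.exp(-(e - e0) / kt) for e in energies)
    return e0 - kt * math.log(z)


def binding_score(complex_e, partner_a_e, partner_b_e, score):
    """Binding score: score(complex) - score(partner A) - score(partner B)."""
    return score(complex_e) - score(partner_a_e) - score(partner_b_e)


def ddg(wild_type, mutant, score):
    """Change in binding score upon mutation (mutant minus wild type).

    Each argument is a (complex, partner_a, partner_b) tuple of energy lists.
    """
    return binding_score(*mutant, score) - binding_score(*wild_type, score)


# Toy, made-up conformational energies for illustration only.
wild_type = ([-10.0, -9.5], [-4.0], [-3.0])
mutant = ([-8.0, -7.9], [-4.0], [-3.5])

ddg_no_entropy = ddg(wild_type, mutant, min_energy)
ddg_with_entropy = ddg(wild_type, mutant, free_energy)
```

In this toy setting a positive value indicates that the mutation weakens binding; the two scores differ only when a state has more than one accessible conformation.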
We also introduce Shades, our recent fully automated, data-driven CPD method that exploits local structural environments in known protein structures, together with energy, to guide sequence design while sampling side-chain and backbone conformations to accommodate mutations [7]. Shades is based on customized libraries of non-contiguous, in-contact amino acid residue motifs. On a benchmark of 40 proteins selected from different protein families, Shades effectively reconstructed sequences by assembling non-contiguous residue sequences drawn from similar in-contact tertiary motifs in unrelated proteins. Moreover, Shades outperforms a flexible-backbone design application from the Rosetta software at rebuilding target sequences.
[1-5] Traoré et al. 2017 Methods Mol Biol. 107-123 – Traoré et al. 2016 J Comput Chem. 1048-58 – Simoncini et al. 2015 J Chem Theory Comput. 5980-9 – Allouche et al. 2014 Artif Intell. 59-79 – Traoré et al. 2013 Bioinformatics. 2129-2136.
[6] Viricel et al. 2018 Bioinformatics. 2581-258.
[7] Simoncini et al. 2018 Bioinformatics, in press.