Preprints, Working Papers, ... Year: 2022

PeTriBERT : Augmenting BERT with tridimensional encoding for inverse protein folding and design

Baldwin Dumortier (1, 2), Antoine Liutkus (1, 2), Clément Carré (3, 4), Gabriel Krouk (3, 4)

Abstract

Proteins are the workhorses of biology. Since the recent breakthrough of novel folding methods, the amount of available structural data is increasing, closing the gap between data-driven sequence-based and structure-based methods. In this work, we focus on the inverse folding problem, which consists in predicting an amino-acid primary sequence from a protein's 3D structure. For this purpose, we introduce a simple Transformer model from Natural Language Processing augmented with 3D structural data. We call the resulting model PeTriBERT: Proteins embedded in tridimensional representation in a BERT model. We train this small 40-million-parameter model on more than 350,000 protein sequences retrieved from the newly available AlphaFoldDB database. Using PeTriBERT, we are able to generate, in silico, entirely new proteins with a GFP-like structure. Nine out of ten of these GFP structural homologues show no resemblance to known proteins when BLASTed against the entire proteome database. This shows that PeTriBERT indeed captures protein folding rules and can become a valuable tool for de novo protein design.
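The abstract does not spell out how the tridimensional encoding enters the model. The following is a minimal sketch, assuming per-residue backbone coordinates are linearly projected and added to the token and 1D positional embeddings of a BERT-style encoder, with a per-residue classification head recovering the amino-acid sequence; all layer sizes, the coordinate representation, and the masking scheme are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (assumption, not the authors' implementation): a BERT-style
# encoder whose token embeddings are augmented with a learned projection of
# per-residue 3D coordinates, trained to recover the amino-acid sequence
# (inverse folding) via a per-residue classification head.
import torch
import torch.nn as nn

NUM_AA = 20      # standard amino acids
D_MODEL = 512    # hypothetical hidden size
MAX_LEN = 1024   # hypothetical maximum sequence length


class TriDimensionalBERT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(NUM_AA + 1, D_MODEL)  # +1 for a [MASK] token
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)      # 1D sequence position
        self.xyz_proj = nn.Linear(3, D_MODEL)               # 3D structural encoding
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(D_MODEL, NUM_AA)              # per-residue AA logits

    def forward(self, tokens, coords):
        # tokens: (batch, length) residue indices (masked during training)
        # coords: (batch, length, 3) backbone coordinates, e.g. C-alpha positions
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(positions) + self.xyz_proj(coords)
        return self.head(self.encoder(x))                   # (batch, length, NUM_AA)


# Usage sketch: predict residue identities for a given structure.
model = TriDimensionalBERT()
tokens = torch.full((1, 128), NUM_AA)      # all positions set to the mask token
coords = torch.randn(1, 128, 3)            # placeholder coordinates
designed_sequence = model(tokens, coords).argmax(dim=-1)  # (1, 128) AA indices
```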
Main file: DumortierB.-et al-bioRxiv-2022.pdf (5.57 MB)
Origin: Publisher files allowed on an open archive
Licence: CC BY - Attribution

Dates and versions

hal-03759515, version 1 (24-08-2022)

Licence

Attribution - CC BY 4.0

Cite

Baldwin Dumortier, Antoine Liutkus, Clément Carré, Gabriel Krouk. PeTriBERT : Augmenting BERT with tridimensional encoding for inverse protein folding and design. 2022. ⟨hal-03759515⟩