KVL Staff on Project
Didier Barradas Bautista
didier.barradasbautista@kaust.edu.sa
Building 1, Level 0, Office 0125
The KVL is happy to announce the conclusion of this project, which resulted in a paper on artificial intelligence techniques applied to computational structural biology.
Scoring is a critical step in docking and has been a separate challenge of the CAPRI (Critical Assessment of PRedicted Interactions) experiment since 2006. Traditionally, scoring functions for protein-protein docking models (DMs) are energy- or knowledge-based. Over the years, however, a wide variety of algorithms have been developed: some combine the above potentials into a hybrid approach or integrate them with evolutionary information, while others rely on alternative approaches, such as the consensus of the inter-residue contacts at the interface of the complex [1,2]. Today, over 100 scoring functions are available from the CCharPPI web server [3], and more potentials can be obtained from other public sources. These are all descriptors of protein-protein complexes that can, in principle, be combined to improve the assessment of the quality of predicted 3D models.
We present the results of a machine learning (ML) approach we developed to exploit all the scoring functions we could collect from public sources [4,5]. To this aim, we generated a set of ≈ 7 × 10^6 DMs for the 230 complexes in the protein-protein interaction benchmark 5 (BM5) [6] with three different docking programs [7-9]. Furthermore, we explored the effect of training data augmentation on the above models.
Availability: The generated DM sets are available at Zenodo and at the KAUST repository. The ML algorithms are available on Colab.
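The idea of combining several scoring functions into one classifier can be illustrated with a minimal sketch. It is not the method from the paper: the three "scoring functions" are hypothetical columns filled with synthetic numbers, and a plain logistic regression written with the Python standard library stands in for whatever classifiers the authors actually trained.

```python
import math
import random

def standardize(rows):
    """Z-score each column so scores on different scales become comparable."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that a docking model is 'correct' given its scores x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Synthetic toy data (an assumption, not real docking results):
# 3 hypothetical scoring functions where lower values are more favorable.
random.seed(0)
raw, labels = [], []
for i in range(200):
    label = 1 if i < 100 else 0          # 1 = correct DM, 0 = incorrect DM
    center = -2.0 if label else 0.0
    raw.append([random.gauss(center, 1.0) for _ in range(3)])
    labels.append(label)

X = standardize(raw)
w, b = train_logistic(X, labels)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

Standardizing first matters because real scoring functions report values in very different units and ranges; without it, the classifier would be dominated by whichever score happens to have the largest magnitude.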
The paper is available for download from: here
This machine learning paper compares different state-of-the-art binary classifiers and semi-weakly supervised deep learning techniques applied to augmented datasets. It provides a complete description of a new way to use the Snorkel framework and discusses the differences in performance between deep learning and classical machine learning algorithms.
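As a rough illustration of the weak-supervision idea behind frameworks such as Snorkel, the sketch below lets several noisy heuristics ("labeling functions") vote on unlabeled docking models. It does not call the Snorkel API and is not the pipeline from the paper: the field names, thresholds, and the plain majority vote are all assumptions made for the example, and Snorkel itself learns a label model rather than counting votes.

```python
# Label values used by the heuristics below.
ABSTAIN, INCORRECT, CORRECT = -1, 0, 1

# Hypothetical labeling functions; thresholds are illustrative only.
def lf_low_energy(model):
    return CORRECT if model["energy"] < -50.0 else ABSTAIN

def lf_few_contacts(model):
    return INCORRECT if model["interface_contacts"] < 20 else ABSTAIN

def lf_consensus(model):
    if model["consensus"] > 0.7:
        return CORRECT
    if model["consensus"] < 0.2:
        return INCORRECT
    return ABSTAIN

LABELING_FUNCTIONS = [lf_low_energy, lf_few_contacts, lf_consensus]

def majority_vote(model):
    """Combine the heuristic votes; ties and all-abstain cases stay unlabeled."""
    votes = [lf(model) for lf in LABELING_FUNCTIONS]
    pos, neg = votes.count(CORRECT), votes.count(INCORRECT)
    if pos == neg:
        return ABSTAIN
    return CORRECT if pos > neg else INCORRECT

# Made-up docking model descriptors (not real data).
models = [
    {"energy": -80.0, "interface_contacts": 45, "consensus": 0.9},
    {"energy": -10.0, "interface_contacts": 8, "consensus": 0.1},
    {"energy": -60.0, "interface_contacts": 12, "consensus": 0.5},
    {"energy": -20.0, "interface_contacts": 30, "consensus": 0.4},
]

weak_labels = [majority_vote(m) for m in models]
print(weak_labels)
```

Models that end up with a CORRECT or INCORRECT weak label could then be added to the training set, which is one way unlabeled data can augment a small labeled collection.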