Improving classification of correct and incorrect protein–protein docking models by augmenting the training set

KVL Staff on Project

Didier Barradas Bautista
didier.barradasbautista@kaust.edu.sa
Building 1, Level 0, Office 0125

Overview

The KVL is happy to announce the conclusion of this project, which resulted in a paper on artificial intelligence techniques applied to computational structural biology.

Work Summary

Scoring is a critical step in docking and has, in fact, been a separate challenge of the CAPRI (Critical Assessment of PRedicted Interactions) experiment since 2006. Traditionally, scoring functions for protein-protein docking models (DMs) are energy- or knowledge-based. Over the years, however, a wide variety of algorithms have been developed: some combine the above potentials into a hybrid approach or integrate them with evolutionary information, while others rely on alternative approaches, such as the consensus of the inter-residue contacts at the interface of the complex. Today, over 100 scoring functions are available from the CCharPPI web server, and more potentials can be obtained from other public sources. All of these are descriptors of protein-protein complexes that can, in principle, be combined to improve the assessment of the quality of predicted 3D models. We present the results of a machine learning (ML) approach we developed to exploit all the scoring functions we could collect from public sources. To this end, we generated a set of ≈ 7 × 10^6 DMs for the 230 complexes in the protein-protein interaction benchmark 5 (BM5) with three different docking programs. Furthermore, we explored the effect of training data augmentation on the above models. Availability: The generated DM sets are available at Zenodo and at the KAUST repository. The ML algorithms are available on Colab.
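The idea of combining many scoring functions into one binary classifier can be illustrated with a minimal sketch. It is not the code from the paper: the data are synthetic, the number of scoring functions (five) is arbitrary, the assumption that correct models tend to receive lower scores is made up for the example, and a plain logistic regression stands in for the classifiers actually evaluated. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_models(n, n_scores, rng):
    """Hypothetical data: one row of scoring-function values per docking model,
    plus a label (1 = correct model, 0 = incorrect model)."""
    rows, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        # Assumption for this sketch only: correct models score lower on average.
        shift = -1.0 if label == 1 else 1.0
        rows.append([rng.gauss(shift, 1.0) for _ in range(n_scores)])
        labels.append(label)
    return rows, labels


def train_logistic(rows, labels, epochs=200, lr=0.1):
    """Fit a logistic regression with full-batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # Classify a docking model as correct (1) or incorrect (0).
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(weights, bias, rows, labels):
    hits = sum(predict(weights, bias, x) == y for x, y in zip(rows, labels))
    return hits / len(rows)


rng = random.Random(0)
train_x, train_y = make_synthetic_models(400, 5, rng)
test_x, test_y = make_synthetic_models(200, 5, rng)
weights, bias = train_logistic(train_x, train_y)
test_accuracy = accuracy(weights, bias, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In a real setting each row would hold the values of the collected scoring functions for one docking model, and the label would come from a CAPRI-style quality assessment against the known complex.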

The paper is available for download from: here

Impact

This machine learning paper compares different state-of-the-art binary classifiers and semi-weak deep learning techniques for augmenting training datasets. It gives a complete description of a novel way of using the Snorkel framework and discusses the differences in performance between deep learning and classical machine learning algorithms.
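As a rough illustration of how weak labels can enlarge a training set, the sketch below applies a few hand-written labeling functions to unlabeled docking models and combines their votes by simple majority. This is plain Python, not Snorkel's API and not the procedure from the paper. The feature names (`energy`, `contacts`, `consensus`), the thresholds, and the example models are all hypothetical.

```python
ABSTAIN, INCORRECT, CORRECT = -1, 0, 1


def lf_low_energy(model):
    # Hypothetical rule: a very favourable energy score suggests a correct model.
    return CORRECT if model["energy"] < -50.0 else ABSTAIN


def lf_few_contacts(model):
    # Hypothetical rule: a tiny interface suggests an incorrect model.
    return INCORRECT if model["contacts"] < 10 else ABSTAIN


def lf_consensus(model):
    # Hypothetical rule based on a contact-consensus score in [0, 1].
    if model["consensus"] >= 0.7:
        return CORRECT
    if model["consensus"] <= 0.2:
        return INCORRECT
    return ABSTAIN


def majority_vote(model, labeling_functions):
    """Combine labeling-function votes; abstain on ties or when nobody votes."""
    votes = [lf(model) for lf in labeling_functions]
    n_correct = votes.count(CORRECT)
    n_incorrect = votes.count(INCORRECT)
    if n_correct == n_incorrect:
        return ABSTAIN
    return CORRECT if n_correct > n_incorrect else INCORRECT


labeling_functions = [lf_low_energy, lf_few_contacts, lf_consensus]

# Made-up unlabeled docking models.
unlabeled = [
    {"energy": -80.0, "contacts": 40, "consensus": 0.9},
    {"energy": -10.0, "contacts": 5, "consensus": 0.1},
    {"energy": -60.0, "contacts": 4, "consensus": 0.5},
    {"energy": -20.0, "contacts": 30, "consensus": 0.5},
]

weak_labels = [majority_vote(m, labeling_functions) for m in unlabeled]

# Only models that received a weak label are added to the training set.
augmented = [(m, y) for m, y in zip(unlabeled, weak_labels) if y != ABSTAIN]
print(weak_labels, len(augmented))
```

Majority voting is the simplest way to combine such votes. Snorkel also provides a learned label model that weights labeling functions by their estimated accuracy.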

