5 A Bayesian Statistical Model for Residue-Residue Contact Prediction

All methods presented so far predict contacts in two steps: they first find the single set of parameters \(\via\) and \(\wijab\) that maximizes a regularized version of the log likelihood of the MSA, and then transform the resulting MAP estimates of the couplings \(\w^*\) into heuristic contact scores (see Introduction 1.2.4.5). The heuristic transformation discards meaningful information contained in the coupling matrices \(\wij\), as discussed in section 2. In addition, using the MAP estimate of the parameters instead of their full posterior distribution has the decisive disadvantage of concealing the uncertainty of the estimates.

The next sections derive a principled Bayesian statistical approach to contact prediction that addresses these deficiencies. The model provides estimates of the posterior probability distributions of the contact states \(\cij\) for all residue pairs \(i\) and \(j\), given the MSA \(\X\). A true contact (contact state \(\cij\eq1\)) is defined as a pair of residues whose \(\Cb\)-\(\Cb\) distance is \(\le 8 \angstrom\), whereas a residue pair with \(\Cb\)-\(\Cb\) distance \(>8 \angstrom\) is considered not to be in physical contact (contact state \(\cij\eq0\)). The parameters \((\v, \w)\) of the MRF model, which describes the probability distribution of the sequences in the MSA, are treated as hidden parameters that can be integrated out using an approximation to the posterior distribution of the couplings \(\w\). This approach also makes it possible to explicitly model the dependence of the coupling coefficients \(\wij\) on contacts and non-contacts as a mixture of Gaussians with contact-state-dependent mixture weights, and can thus even learn correlations between couplings. Furthermore, it provides probability estimates for the predicted contacts, which could simplify the selection of constraints for de novo structure prediction by establishing suitable probability cutoffs.
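The binary contact state defined above can be made concrete with a minimal sketch. It assumes that one \(\Cb\) coordinate per residue is already available as an (x, y, z) tuple in Ångström; the function name `contact_states` and the toy coordinates are hypothetical and serve only as illustration. Glycine has no \(\Cb\) atom, so a substitute coordinate would have to be supplied by the caller; how that is handled is not specified here.

```python
import math

# Distance threshold from the contact definition in the text (in Angstrom).
CONTACT_THRESHOLD = 8.0


def contact_states(cb_coords, threshold=CONTACT_THRESHOLD):
    """Return a dict mapping each residue pair (i, j), i < j, to its contact state.

    The contact state is 1 if the distance between the two coordinates is
    <= threshold, and 0 otherwise. Indices are 0-based list positions.
    """
    states = {}
    n = len(cb_coords)
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(cb_coords[i], cb_coords[j])
            states[(i, j)] = 1 if distance <= threshold else 0
    return states


# Hypothetical toy coordinates, not taken from any real structure.
toy_coords = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 8.1, 0.0)]
states = contact_states(toy_coords)
```

In this toy example the pair (0, 1) lies exactly at the 8 Å threshold and therefore counts as a contact, while the pairs (0, 2) and (1, 2) are further apart and count as non-contacts. The Bayesian model described in this chapter does not observe these states directly; it estimates their posterior probabilities from the MSA.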