PhD thesis: residue-residue contact prediction

5.1 Computing the Posterior Probability of a Contact

The joint probability of the contact states \(\c\) and the MRF model parameters \((\v, \w)\), given the MSA \(\X\) and a set of sequence-derived features \(\phi\) (such as those listed in method section 4.6.1), can be written as a hierarchical Bayesian model of the form:

\[\begin{align} p(\c, \v, \w | \X, \phi) &\propto p(\X | \v, \w) p(\v, \w | \c) \, p(\c | \phi ) \, . \tag{5.1} \end{align}\]

The ultimate goal is to compute the posterior probability of the contact states, \(p(\c | \X, \phi)\), which is obtained by treating the parameters \((\v, \w)\) as hidden variables and marginalizing over them:

\[\begin{align} p(\c | \X , \phi) &\propto p(\X | \c) p(\c | \phi)\\ p(\X | \c) &= \int \int p(\X | \v,\w) \, p(\v, \w | \c) \,d\v\,d\w \; . \tag{5.2} \end{align}\]
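To make eq. (5.2) concrete, here is a minimal sketch under strong simplifying assumptions that are not part of the model above: a single residue pair with one scalar coupling \(w\), a binary contact state \(c \in \{0,1\}\), a hypothetical Gaussian likelihood in which the data enter only through a noisy coupling estimate `w_hat`, and a zero-mean Gaussian prior \(p(w|c)\) whose variance depends on \(c\). The marginal likelihood \(p(\X|c)\) is computed by numerical integration and then combined with the prior \(p(c|\phi)\). All function names and numbers are illustrative.

```python
import math

def gauss(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood(likelihood, prior, lo=-10.0, hi=10.0, n=20001):
    """Approximate p(X | c) = ∫ p(X | w) p(w | c) dw by a midpoint Riemann sum on [lo, hi]."""
    dw = (hi - lo) / n
    total = 0.0
    for k in range(n):
        w = lo + (k + 0.5) * dw
        total += likelihood(w) * prior(w)
    return total * dw

def contact_posterior(w_hat, noise_var, prior_var, p_contact):
    """Toy posterior p(c | X, phi) for c in {0, 1}; every input is hypothetical."""
    # Assumed likelihood: the data inform w only through a noisy estimate w_hat.
    likelihood = lambda w: gauss(w_hat, w, noise_var)
    prior_c = {0: 1.0 - p_contact, 1: p_contact}  # p(c | phi)
    unnormalised = {}
    for c in (0, 1):
        prior_w = lambda w, c=c: gauss(w, 0.0, prior_var[c])  # p(w | c)
        # p(X | c) p(c | phi), cf. eq. (5.2)
        unnormalised[c] = marginal_likelihood(likelihood, prior_w) * prior_c[c]
    z = unnormalised[0] + unnormalised[1]
    return {c: unnormalised[c] / z for c in (0, 1)}

posterior = contact_posterior(w_hat=2.0, noise_var=0.25, prior_var={0: 0.01, 1: 1.0}, p_contact=0.1)
print(posterior)
```

With these made-up numbers, a coupling estimate that is large compared with the narrow non-contact prior moves most of the posterior mass to \(c=1\), even though the prior contact probability is low.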

The single potentials \(\v\) will be fixed at their best estimate \(\v^*\) (see method section 3.7.4) by using a very tight prior \(p(\v) = \Gauss(\v|\v^*,\lambda_v^{-1} \I)\), which approaches the delta function \(\delta(\v-\v^*)\) as \(\lambda_v \rightarrow \infty\). This allows the integral over \(\v\) to be replaced by the value of the integrand at the mode \(\v^*\).
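The delta-function limit can be checked numerically. The sketch below uses a hypothetical one-dimensional integrand `f` in place of \(p(\X|\v,\w)\) and a scalar `v_star` standing in for \(\v^*\), and evaluates \(\int f(v)\,\Gauss(v|v^*,\lambda_v^{-1})\,dv\) for growing \(\lambda_v\). For a smooth integrand the gap to \(f(v^*)\) shrinks roughly in proportion to \(1/\lambda_v\).

```python
import math

def gaussian_average(f, v_star, lam, n=20001, width=12.0):
    """∫ f(v) N(v | v_star, 1/lam) dv by a midpoint Riemann sum over v_star ± width std devs."""
    sd = 1.0 / math.sqrt(lam)
    lo = v_star - width * sd
    dv = 2.0 * width * sd / n
    norm = math.sqrt(lam / (2.0 * math.pi))  # Gaussian normalising constant
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * dv
        total += f(v) * norm * math.exp(-0.5 * lam * (v - v_star) ** 2)
    return total * dv

# Hypothetical smooth integrand and mode, for illustration only.
f = lambda v: math.exp(-(v - 1.0) ** 2) * math.cos(3.0 * v)
v_star = 0.3
for lam in (1.0, 1e2, 1e4, 1e6):
    error = abs(gaussian_average(f, v_star, lam) - f(v_star))
    print(f"lambda_v = {lam:g}: |integral - f(v*)| = {error:.2e}")
```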

The integral over \(\w\) can be computed by factorizing the integrand into one factor per residue pair \((i,j)\) and integrating over the coupling coefficients \(\wij\) of each pair separately.

To this end, the prior over \(\w\) is modelled as a product of independent terms, one for each pair \((i,j)\), in which \(\wij\) depends only on the contact state \(\cij\); this prior is described in detail in section 5.2. The prior over the Potts model parameters then reads:

\[\begin{equation} p(\v,\w|\c) = \Gauss(\v|\v^*,\lambda_v^{-1} \I) \, \prod_{1\le i<j\le L} p(\wij|\cij) \; . \tag{5.3} \end{equation}\]
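Once the integrand splits into one factor per pair, the integral over all couplings equals the product of one-dimensional integrals. The sketch below checks this for two hypothetical pairs with scalar couplings. The per-pair likelihood factor (`W_HAT`, `NOISE_VAR`) and the contact-dependent Gaussian prior variances (`PRIOR_VAR`) are made-up stand-ins, not the prior of section 5.2.

```python
import math

def gauss(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical toy setup: two residue pairs, one scalar coupling w_ij per pair.
PRIOR_VAR = {0: 0.05, 1: 1.0}       # assumed variance of p(w_ij | c_ij) per contact state
W_HAT = {(1, 2): 1.5, (1, 3): 0.1}  # assumed per-pair coupling estimates
NOISE_VAR = 0.2                     # assumed width of each per-pair likelihood factor
DW = 0.05
GRID = [-6.0 + (k + 0.5) * DW for k in range(240)]  # midpoints covering [-6, 6]

def pair_factor(pair, c_ij, w):
    """Per-pair likelihood factor times the prior p(w_ij | c_ij)."""
    return gauss(W_HAT[pair], w, NOISE_VAR) * gauss(w, 0.0, PRIOR_VAR[c_ij])

def joint_integral(c):
    """Brute-force 2-D integral of the full product over (w_12, w_13)."""
    return sum(pair_factor((1, 2), c[(1, 2)], a) * pair_factor((1, 3), c[(1, 3)], b)
               for a in GRID for b in GRID) * DW * DW

def factorised_integral(c):
    """Product of independent 1-D integrals, one per pair (i, j)."""
    result = 1.0
    for pair, c_ij in c.items():
        result *= sum(pair_factor(pair, c_ij, w) for w in GRID) * DW
    return result

c = {(1, 2): 1, (1, 3): 0}
print(joint_integral(c), factorised_integral(c))
```

The factorised version needs one sweep over the grid per pair instead of a sweep over the product grid, which is what makes a per-pair treatment feasible for all \(L(L-1)/2\) pairs.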

Furthermore, method section 5.7.2 approximates the regularised likelihood, \(p(\X | \v,\w) \, p(\v, \w)\), by a Gaussian distribution, which allows the integral in eq. (5.2) to be solved analytically. The derivation of this solution is given in detail in method section 5.7.3.
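A Gaussian approximation helps because of the standard identity \(\int \Gauss(w|\mu,\sigma^2)\,\Gauss(w|m_k,s_k^2)\,dw = \Gauss(\mu|m_k,\sigma^2+s_k^2)\), which carries over term by term to a mixture of Gaussians. The sketch below checks this in one dimension for a hypothetical Gaussian approximation centred at `mu` with variance `var` and a hypothetical two-component Gaussian mixture standing in for \(p(\wij|\cij)\). It illustrates the identity only and is not the derivation of section 5.7.3.

```python
import math

def gauss(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_closed_form(mu, var, weights, means, variances):
    """∫ N(w | mu, var) * sum_k g_k N(w | m_k, s_k) dw = sum_k g_k N(mu | m_k, var + s_k)."""
    return sum(g * gauss(mu, m, var + s) for g, m, s in zip(weights, means, variances))

def evidence_numeric(mu, var, weights, means, variances, lo=-15.0, hi=15.0, n=30000):
    """The same integral by a midpoint Riemann sum, as a check."""
    dw = (hi - lo) / n
    total = 0.0
    for k in range(n):
        w = lo + (k + 0.5) * dw
        prior = sum(g * gauss(w, m, s) for g, m, s in zip(weights, means, variances))
        total += gauss(w, mu, var) * prior
    return total * dw

# Hypothetical numbers: Gaussian approximation N(w | 0.8, 0.1) and a
# two-component mixture prior with weights 0.7 / 0.3 for one coupling.
args = (0.8, 0.1, [0.7, 0.3], [0.0, 1.0], [0.02, 0.5])
print(evidence_closed_form(*args), evidence_numeric(*args))
```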

Finally, the posterior probability distribution of each contact state is obtained from the marginals \(p(\cij | \X, \phi) = \sum_{\c_{\backslash ij}} p(\c | \X, \phi)\), where \(\c_{\backslash ij}\) is the vector containing all components of \(\c\) except \(\cij\) (see method section 5.4).
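For a handful of pairs, the marginals can be computed by brute-force enumeration of all joint contact configurations. The sketch below assumes binary contact states and a user-supplied, hypothetical `log_score(c)` that returns the unnormalised log posterior \(\log p(\X|\c) + \log p(\c|\phi)\). Its cost grows as \(2^n\) in the number of pairs \(n\), so it only serves to make the marginalisation explicit.

```python
import itertools
import math

def contact_marginals(log_score, n_pairs):
    """Return p(c_p = 1 | X, phi) for each pair p by summing the joint over all other pairs."""
    configs = list(itertools.product((0, 1), repeat=n_pairs))
    logs = [log_score(c) for c in configs]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]  # shift by the max for numerical stability
    z = sum(weights)
    marginals = [0.0] * n_pairs
    for c, wgt in zip(configs, weights):
        for p, c_p in enumerate(c):
            if c_p == 1:
                marginals[p] += wgt / z
    return marginals

scores = [2.0, -1.0, 0.5]  # hypothetical per-pair log-odds
independent = lambda c: sum(a * c_p for a, c_p in zip(scores, c))
print(contact_marginals(independent, 3))
```

When `log_score` is a sum of independent per-pair terms \(a_{ij} c_{ij}\), as in this usage example, each marginal reduces to the logistic function \(1/(1+e^{-a_{ij}})\), which is a convenient sanity check.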