3 Optimizing the Full Likelihood
Section 1.2.4 introduced the Potts model for contact prediction, which distinguishes directly from indirectly coupled residue pairs by jointly modelling the probability of a protein sequence over all residues. Maximum-likelihood inference of the model parameters is numerically challenging because the partition function that normalizes the probability distribution sums over a number of sequences that grows exponentially with protein length. Several approximate inference techniques have been developed that sidestep the exact computation of the partition function. Currently, pseudo-likelihood maximization is the most successful approximation with regard to predicting residue-residue contacts (see section 1.2.4.5). It has been shown that the pseudo-likelihood estimator is consistent, that is, it converges to the full-likelihood solution in the limit of large amounts of data. However, it is unclear whether it is a good approximation when data is scarce, in other words for small protein families, which are the most interesting targets for contact prediction (see Figure 1.11).
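For orientation, the source of the exponential cost can be written out explicitly. The notation below (single potentials v_i, pairwise potentials w_ij, q states per position, sequence length L) is an assumption that follows common Potts model conventions and may differ in detail from the notation used in section 1.2.4.

```latex
\begin{align}
p(\mathbf{x} \mid \mathbf{v}, \mathbf{w})
  &= \frac{1}{Z(\mathbf{v}, \mathbf{w})}
     \exp\left( \sum_{i=1}^{L} v_i(x_i)
              + \sum_{1 \le i < j \le L} w_{ij}(x_i, x_j) \right) \\
Z(\mathbf{v}, \mathbf{w})
  &= \sum_{\mathbf{x}' \in \{1, \dots, q\}^{L}}
     \exp\left( \sum_{i=1}^{L} v_i(x'_i)
              + \sum_{1 \le i < j \le L} w_{ij}(x'_i, x'_j) \right)
\end{align}
```

The sum in Z runs over q^L sequences. For q = 20 and L = 100 this amounts to 20^100, roughly 10^130 terms, which rules out exact evaluation. The pseudo-likelihood avoids this sum by replacing the joint distribution with L conditional distributions, each normalized over only q states.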
While the partition function of the full likelihood cannot be computed efficiently, the gradient of the full likelihood can be approximated with an approach called contrastive divergence, which makes use of Markov chain Monte Carlo (MCMC) sampling [192]. This section elaborates on how contrastive divergence can be used to optimize the full likelihood with gradient descent techniques. Furthermore, two aspects of the underlying Potts model, namely gap treatment and the choice of regularization, have been refined, as explained in detail in methods section 3.7.1.
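The idea behind contrastive divergence can be illustrated with a minimal sketch. It is not the implementation developed in this work: all function and variable names are hypothetical, the alignment is assumed to be integer-encoded, and sequence weighting, regularization and gap treatment are left out. The model expectation in the likelihood gradient is estimated from Markov chains that start at the observed sequences and run for only k Gibbs sweeps.

```python
import numpy as np

def gibbs_sweep(x, v, w, rng):
    """Resample every position of one sequence x in place (one Gibbs sweep).

    Assumes v has shape (L, q), w has shape (L, L, q, q) and
    w[i, j, a, b] == w[j, i, b, a].
    """
    L, q = v.shape
    for i in range(L):
        # Unnormalized log-probability of each state at position i,
        # conditioned on the current states at all other positions.
        logits = v[i].copy()
        for j in range(L):
            if j != i:
                logits += w[i, j, :, x[j]]
        p = np.exp(logits - logits.max())  # subtract max for numerical stability
        p /= p.sum()
        x[i] = rng.choice(q, p=p)
    return x

def frequencies(X, q):
    """Single and pairwise state frequencies of an alignment X of shape (N, L)."""
    onehot = np.eye(q)[X]  # shape (N, L, q)
    f1 = onehot.mean(axis=0)
    f2 = np.einsum("nia,njb->ijab", onehot, onehot) / X.shape[0]
    return f1, f2

def cd_gradient(X, v, w, rng, k=1):
    """CD-k estimate of the per-sequence log-likelihood gradient.

    Returns data frequencies minus frequencies of the sampled sequences.
    Entries of the pairwise gradient with i == j are not meaningful
    because the model has no self-couplings.
    """
    q = v.shape[1]
    X_model = X.copy()       # chains are initialized at the observed sequences
    for x in X_model:        # each row is a view, so it is updated in place
        for _ in range(k):
            gibbs_sweep(x, v, w, rng)
    f1_data, f2_data = frequencies(X, q)
    f1_model, f2_model = frequencies(X_model, q)
    return f1_data - f1_model, f2_data - f2_model
```

The returned arrays approximate the gradient of the log-likelihood, so a plain gradient ascent step would be `v += alpha * g_v` and `w += alpha * g_w` with a hypothetical step size `alpha`. For gradient descent on the negative log-likelihood the sign is flipped.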
References
192. Hinton, G.E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Comput. 14, 1771–1800. doi:10.1162/089976602760128018.