3.6 Discussion

It is not feasible to evaluate the full likelihood of the Potts model for proteins of typical length due to the complexity of the normalization constant. The most popular way around this problem in protein contact prediction is to optimize the pseudo-likelihood instead. However, it is unknown how well the pseudo-likelihood solution approximates the full likelihood solution when protein families have only few members. In this chapter I tested an alternative approach to infer the Potts model parameters, called contrastive divergence (CD). It optimizes the full likelihood of the Potts model by approximating the gradient with short Gibbs chains. However, a benchmark on a large test set showed that the predictive performance of CD does not improve over pseudo-likelihood with respect to the precision of top ranked contact predictions (see Figure 3.22). CD achieved minor improvements for small protein families; however, this improvement could be traced back to amplified signals between strongly conserved residue pairs.
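To make the gradient approximation concrete, the following is a minimal sketch of a CD-k gradient estimate for the couplings. It assumes the couplings are stored as a dense (L, L, q, q) NumPy array and omits the sequence weighting, regularization and gap treatment of the actual implementation; all function and variable names are illustrative.

```python
import numpy as np

def gibbs_step(seq, v, w, rng):
    """One full Gibbs sweep: resample every position given the rest.
    seq: (L,) integer states in 0..q-1; v: (L, q) fields; w: (L, L, q, q) couplings."""
    L, q = v.shape
    for i in range(L):
        # conditional log-probability of each state at position i
        logits = v[i].copy()
        for j in range(L):
            if j != i:
                logits += w[i, j, :, seq[j]]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        seq[i] = rng.choice(q, p=p)
    return seq

def cd_gradient(msa, v, w, k=1, seed=0):
    """CD-k estimate of the log-likelihood gradient with respect to the couplings.
    msa: (N, L) integer alignment; returns an array with the same shape as w."""
    rng = np.random.default_rng(seed)
    N, L = msa.shape
    emp = np.zeros_like(w)   # empirical pair counts from the alignment
    mod = np.zeros_like(w)   # pair counts from the k-step Gibbs samples
    for n in range(N):
        x = msa[n]
        sample = msa[n].copy()     # chain initialized at the data sample
        for _ in range(k):         # CD-1 corresponds to a single full sweep
            sample = gibbs_step(sample, v, w, rng)
        for i in range(L):
            for j in range(i + 1, L):
                emp[i, j, x[i], x[j]] += 1
                mod[i, j, sample[i], sample[j]] += 1
    return (emp - mod) / N         # ascent direction for the couplings
```

The returned array is the ascent direction: the empirical pair statistics minus the pair statistics of the samples obtained after k Gibbs sweeps.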

I elaborated in detail on the hyperparameter optimization for the stochastic gradient descent optimizer and for the CD model itself. Even though the adaptive learning rate optimizer ADAM did not improve performance over plain stochastic gradient descent, it is still likely that appropriate modifications to the optimization procedure, e.g. averaging [204], might be beneficial for particular variants of CD. As discussed in section 3.2.1, the convergence criterion is a crucial aspect of the optimization, as it not only affects runtime but can also prevent overfitting. It might be worthwhile to assess the convergence properties with more sophisticated convergence metrics, such as the EB-criterion proposed by Mahsereci et al. [202], instead of using the L2 norm of the coupling parameters, \(||\mathbf{w}||_2\).
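As a point of reference for such a comparison, the following sketch shows how a stopping rule based on the L2 norm of the couplings might look; the window size and tolerance are hypothetical and do not correspond to the values used in section 3.2.1.

```python
import numpy as np

def converged(norm_history, window=10, tol=1e-4):
    """Illustrative stopping rule: stop when the relative change of ||w||_2
    over the last `window` iterations falls below `tol`."""
    if len(norm_history) < window + 1:
        return False
    prev, curr = norm_history[-window - 1], norm_history[-1]
    return abs(curr - prev) / max(prev, 1e-12) < tol

# inside the optimization loop (sketch):
#   norm_history.append(np.linalg.norm(w))
#   if converged(norm_history):
#       break
```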

Against expectations, the best performance with respect to the precision of the top ranked contacts was obtained with the simplest variant of the contrastive divergence algorithm, CD-1. With CD-1, sequence samples are generated according to the current state of the model by evolving Gibbs chains, initialized at the data samples, for only one full step. Interestingly, better gradient estimates were obtained by running more Gibbs chains in parallel (see section 3.3.2), but this did not carry over to better predictive performance. It is possible that the improved gradient merely helps to fine-tune the parameters; such fine-tuning would have only a negligible effect on the contact score, computed as the APC-corrected Frobenius norm of the couplings, and on the overall ranking of residue pairs.
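Since this contact score recurs throughout the discussion, the following sketch spells out its computation under the same assumptions as above (dense (L, L, q, q) couplings). In practice the gap state is typically excluded from the Frobenius norm; this detail is omitted here for brevity.

```python
import numpy as np

def apc_corrected_frobenius(w):
    """Contact score: Frobenius norm of each q x q coupling matrix,
    followed by the average product correction (APC).
    w: (L, L, q, q) coupling tensor, assumed symmetric in i and j."""
    L = w.shape[0]
    # raw score: Frobenius norm of the coupling matrix for each residue pair
    c = np.sqrt(np.sum(w ** 2, axis=(2, 3)))
    np.fill_diagonal(c, 0.0)
    # APC: subtract the product of the row means divided by the overall mean
    row_mean = c.sum(axis=1) / (L - 1)
    total_mean = c.sum() / (L * (L - 1))
    corrected = c - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected
```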

Cocco and colleagues argued that for the purpose of contact prediction, where predictions only need to capture the topology of the network of coevolving positions, approximate methods such as pseudo-likelihood maximization might be sufficient to provide accurate results [96]. They showed that different approaches for Potts model parameter inference yield highly correlated contact scores when using the APC-corrected Frobenius norm. In contrast, more quantitative applications, such as inferring mutation landscapes, where energies or probabilities have to be accurate, require precise approaches for fitting the model parameters that can reproduce the fine statistics of the empirical data.

Therefore, it can be speculated that the heuristic contact score, which has empirically been found to work very well for pseudo-likelihood couplings, might not be an appropriate choice for benchmarking the contrastive divergence approach. Perhaps the CD couplings need to be evaluated in a more sophisticated framework or for purposes other than contact prediction.

References

204. Ma, X., and Wang, X. (2016). Average Contrastive Divergence for Training Restricted Boltzmann Machines. Entropy 18, 35. doi: 10.3390/e18010035.

202. Mahsereci, M., Balles, L., Lassner, C., and Hennig, P. (2017). Early Stopping without a Validation Set. arXiv.

96. Cocco, S., Feinauer, C., Figliuzzi, M., Monasson, R., and Weigt, M. (2017). Inverse Statistical Physics of Protein Sequences: A Key Issues Review. arXiv.