5.2 Modelling the Prior Over Couplings Depending on Contact States

The prior over couplings \(p(\wij|\cij)\) will be modelled as a mixture of \(K\!+\!1\) 400-dimensional Gaussians, with means \(\muk \in \mathbb{R}^{400}\), precision matrices \(\Lk \in \mathbb{R}^{400\times 400}\), and normalised weights \(g_k(\cij)\) that depend on the contact state \(\cij\),

\[\begin{align} p(\wij | \cij) = \sum_{k=0}^K g_k(\cij) \, \Gauss(\wij | \muk, \Lk^{-1}) \,. \tag{5.4} \end{align}\]
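As an illustration of how eq. (5.4) can be evaluated numerically, the following is a minimal sketch rather than the implementation used in this work. It assumes diagonal precision matrices, stored as vectors of their diagonal entries, and strictly positive weights. The function name `log_mixture_prior`, the toy dimension, and the weight table `weights_by_state` are hypothetical and chosen only for this example.

```python
import math


def log_mixture_prior(w, weights, means, precisions_diag):
    """Log of eq. (5.4) for one residue pair, assuming diagonal precisions.

    w:               coupling vector w_ij (length D; D = 400 in the text)
    weights:         K+1 normalised, strictly positive weights g_k(c_ij)
    means:           K+1 mean vectors mu_k, each of length D
    precisions_diag: K+1 vectors holding the diagonal of each Lambda_k
    """
    log_terms = []
    for g_k, mu_k, lam_k in zip(weights, means, precisions_diag):
        # log N(w | mu_k, Lambda_k^{-1}) factorises over dimensions
        # when Lambda_k is diagonal.
        log_gauss = 0.5 * sum(
            math.log(lam) - math.log(2.0 * math.pi) - lam * (x - mu) ** 2
            for x, mu, lam in zip(w, mu_k, lam_k)
        )
        log_terms.append(math.log(g_k) + log_gauss)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Toy usage with D = 2 and K = 1 (hypothetical values, not fitted parameters).
means = [[0.0, 0.0], [1.0, 1.0]]            # mu_0 is the zero vector
precisions_diag = [[1.0, 1.0], [4.0, 4.0]]
weights_by_state = {0: [0.9, 0.1], 1: [0.5, 0.5]}  # g_k(c_ij) per contact state

w_ij = [0.2, -0.1]
log_p_noncontact = log_mixture_prior(w_ij, weights_by_state[0], means, precisions_diag)
log_p_contact = log_mixture_prior(w_ij, weights_by_state[1], means, precisions_diag)
```

Working in log space avoids underflow, which becomes relevant in 400 dimensions, where the individual component densities can be extremely small.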

The assumption that the contact-state dependent coupling prior can be modelled as a mixture of multivariate Gaussians is justified by the analysis of univariate and two-dimensional coupling distributions presented in section 2.2 and in section 2.4. The couplings \(\wijab\) analysed in those sections were filtered such that there is sufficient evidence for \(a\) and \(b\) in the alignment (see method section 2.6.6 for details). Therefore, the presented distributions should resemble the posterior distribution of couplings, \(p(\w | \X , \v^*) \propto \Gauss (\w | \w^*, \H^{-1})\), in the case that the diagonal elements \((\H)_{ijab, ijab}\) have non-negligible values. The analysis showed that the univariate distributions of single couplings \(\wijab\) are characteristic of the physico-chemical properties of the corresponding amino acid pairing \((a,b)\) and vary with inter-residue distance. Moreover, the two-dimensional distributions suggest that there are higher-order dependencies between the 400 couplings \(\wijab\) that reflect amino acid specific preferences of the interaction between the corresponding residues \(i\) and \(j\). By explicitly modelling the prior over couplings, \(p(\wij|\cij)\), as a 400-dimensional Gaussian mixture, it is possible to capture these characteristic interdependencies between the couplings.

The \(K\!+\!1\) 400-dimensional Gaussian mixture components are defined by means \(\muk \in \mathbb{R}^{400}\), precision matrices \(\Lk \in \mathbb{R}^{400\times 400}\), and normalised weights \(g_k(\cij)\) that depend on the contact state \(\cij \! \in \! \{0,1\}\). The zeroth component is expected to capture the majority of coupling parameters, which carry no strong covariation signal, \(\wijab \! \approx \! 0\). Generally, the couplings are expected to vanish for non-contacts (\(\cij \eq 0\)), but couplings will also be close to zero for contacts (\(\cij \eq 1\)) when there is no covariation between residues \(i\) and \(j\) or when there is no evidence in the alignment originating from amino acid pairings \(a\) and \(b\). Therefore, \(\mu_{0} \eq 0\) will be kept fixed. Furthermore, the precision matrices \(\Lk\) will be modelled as diagonal matrices, thereby drastically reducing the computational complexity of the optimization problem. Because a diagonal precision matrix cannot represent dependencies between couplings within a single component, such interdependencies can only be captured by combining several components with different means and variances. The number of components \(K\) is therefore a crucial parameter.
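
To make the role of the number of components concrete, the covariance of a Gaussian mixture follows from the law of total covariance. The symbol \(\bar{\mu}\) for the mixture mean is introduced here only for illustration and is not part of the notation used elsewhere:

\[\begin{align} \operatorname{Cov}(\wij | \cij) = \sum_{k=0}^K g_k(\cij) \left( \Lk^{-1} + \muk \muk^\top \right) - \bar{\mu} \bar{\mu}^\top \,, \qquad \bar{\mu} = \sum_{k=0}^K g_k(\cij) \, \muk \,. \end{align}\]

With diagonal \(\Lk\), the terms \(\Lk^{-1}\) contribute only to the diagonal. All off-diagonal covariance between couplings therefore stems from \(\sum_{k} g_k(\cij) \, \muk \muk^\top - \bar{\mu} \bar{\mu}^\top\), that is, from the spread of the component means, which vanishes when all components share the same mean.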