PhD Thesis Susann Vorberg
Summary
Acknowledgements
1
Background
1.1
Biological Background
1.2
Introduction to Contact Prediction
1.2.1
Local Statistical Models
1.2.2
Global Statistical Models
1.2.3
Machine Learning Methods and Meta-Predictors
1.2.4
Modelling Protein Families with the Potts Model
1.3
Applications
1.4
Evaluating Contact Prediction Methods
1.4.1
Sequence Separation
1.4.2
Interpretation of Evaluation Results
1.5
Challenges for Coevolutionary Inference
1.5.1
Phylogenetic Effects as a Source of Noise
1.5.2
Entropic Effects as a Source of Noise
1.5.3
Finite Sampling Effects
1.5.4
Multiple Sequence Alignments
1.5.5
Alternative Sources of Coevolution
2
Interpretation of Coupling Matrices
2.1
Single Coupling Values Carry Evidence of Contacts
2.2
Coupling Profiles Vary with Distance
2.3
Physico-Chemical Fingerprints in Coupling Matrices
2.4
Higher Order Dependencies Between Couplings
2.5
Discussion
2.6
Methods
2.6.1
Dataset
2.6.2
Computing Pseudo-Likelihood Couplings
2.6.3
Sequence Reweighting
2.6.4
Computing Amino Acid Frequencies
2.6.5
Regularization
2.6.6
Correlation of Couplings with Contact Class
2.6.7
Coupling Distribution Plots
3
Optimizing the Full Likelihood
3.1
Approximating the Gradient of the Full Likelihood with Contrastive Divergence
3.2
Optimizing the Full Likelihood
3.2.1
Convergence Criterion for Stochastic Gradient Descent
3.2.2
Tuning Hyperparameters of Stochastic Gradient Descent Optimizer
3.3
Tuning the Gibbs Sampling Scheme for Contrastive Divergence
3.3.1
Tuning Regularization Coefficients for Contrastive Divergence
3.3.2
Varying the Sample Size
3.3.3
Varying the number of Gibbs Steps
3.3.4
Persistent Contrastive Divergence
3.4
Using ADAM to Optimize Contrastive Divergence
3.4.1
A Potts model specific convergence criterion
3.5
Comparing CD couplings to pLL couplings
3.5.1
Protein 1c75A00
3.5.2
Protein 1ss3A00 and 1c55A00
3.6
Discussion
3.7
Methods
3.7.1
The Potts Model
3.7.2
Treating Gaps as Missing Information
3.7.3
The Regularized Full Log Likelihood and its Gradient With Gap Treatment
3.7.4
The prior on single potentials
3.7.5
Stochastic Gradient Descent
3.7.6
Computing the Gradient with Contrastive Divergence
4
Random Forest Contact Prior
4.1
Random Forest Classifiers
4.2
Hyperparameter Optimization for Random Forest
4.3
Evaluating Random Forest Model as Contact Predictor
4.4
Using Contact Scores as Additional Features
4.5
Discussion
4.6
Methods
4.6.1
Features used to train Random Forest Model
4.6.2
Simple Contact Prior with Respect to Protein Length
4.6.3
Cross-validation for Random Forest Training
4.6.4
Feature Selection
5
A Bayesian Statistical Model for Residue-Residue Contact Prediction
5.1
Computing the Posterior Probability of a Contact
5.2
Modelling the Prior Over Couplings Depending on Contact States
5.3
Training the Hyperparameters in the Likelihood Function of Contact States
5.3.1
Training Hyperparameters for a Gaussian Mixture with Three Components
5.3.2
Training Hyperparameters for a Gaussian Mixture with Five and Ten Components
5.4
Evaluating the Bayesian Models for Contact Prediction
5.5
Analysing Contact Maps Predicted With Bayesian Framework
5.6
Discussion
5.7
Methods
5.7.1
Modelling the Prior Over Couplings Depending on Contact States
5.7.2
Gaussian Approximation to the Posterior of Couplings
5.7.3
Integrating out the Hidden Variables to Obtain the Likelihood Function of the Contact States
5.7.4
The Hessian off-diagonal Elements Carry a Negligible Signal
5.7.5
Efficiently Computing the negative Hessian of the regularized log-likelihood
5.7.6
Efficiently Computing the Inverse of Matrix \(\Lijk\)
5.7.7
The gradient of the log likelihood with respect to \(\muk\)
5.7.8
The gradient of the log likelihood with respect to \(\Lk\)
5.7.9
The gradient of the log likelihood with respect to \(\gamma_k\)
5.7.10
Extending the Bayesian Statistical Model for the Prediction of Protein Residue-Residue Distances
5.7.11
Training the Hyperparameters in the Likelihood Function of Contact States
6
Conclusion and Outlook
Appendix
A
Abbreviations
B
Amino Acid Alphabet
C
Dataset Properties
D
Interpretation of Coupling Matrices
E
Optimizing Full Likelihood with Gradient Descent
F
Training of the Random Forest Contact Prior
G
Bayesian statistical model for contact prediction
References
PhD thesis: residue-residue contact prediction
1
Background