Understanding the structure and dynamics of proteins has long been of great interest to many fields of the life sciences. Small-angle x-ray scattering (SAXS) is among the probing techniques that allow proteins to be observed in their natural environment. In this thesis, we investigate the estimation of mixture coefficients of protein conformations in a heterogeneous solution using SAXS, and we propose a signal processing and machine learning framework for this estimation. We describe a model for maximum likelihood estimation (MLE) of the relative abundances of different conformations of a protein in a heterogeneous mixture from SAXS intensities. To handle cases where the solution includes intermediate or unknown conformations, we develop a subset selection method, based on k-means clustering and the Cramér-Rao bound on the mixture coefficient estimation error, to find a sparse basis set that represents the space spanned by the measured SAXS intensities of the known conformations of a protein. Then, using the selected basis set and the assumptions on the model for the intensity measurements, we show that the MLE model can be expressed as a constrained convex optimization problem.
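As a rough illustration of what such a constrained convex problem can look like, the sketch below assumes independent Gaussian measurement noise with known standard deviations, so that the MLE reduces to weighted least squares over the probability simplex (non-negative coefficients that sum to one). The noise model, the projected gradient solver, and the names `project_to_simplex` and `estimate_mixture_mle` are assumptions made for this example and are not taken from the thesis; the basis profiles are synthetic numbers, not SAXS data.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_k >= 0, sum(a) == 1}."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - 1.0) / j
        # Keep the threshold from the last index that still yields a positive entry.
        if u_j - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_mixture_mle(basis, y, sigma, n_iter=2000):
    """Weighted least squares on the simplex via projected gradient descent.

    basis[k][q]: model intensity of conformation k at scattering-vector bin q
    y[q]:        measured mixture intensity at bin q
    sigma[q]:    assumed (known) noise standard deviation at bin q
    """
    n_conf, n_q = len(basis), len(y)
    w = [1.0 / (s * s) for s in sigma]
    # The trace of the Hessian upper-bounds its largest eigenvalue,
    # so 1 / trace is a conservative but safe step size.
    hessian_trace = 2.0 * sum(
        w[q] * basis[k][q] ** 2 for k in range(n_conf) for q in range(n_q)
    )
    step = 1.0 / hessian_trace
    a = [1.0 / n_conf] * n_conf  # start from the uniform mixture
    for _ in range(n_iter):
        resid = [
            y[q] - sum(a[k] * basis[k][q] for k in range(n_conf))
            for q in range(n_q)
        ]
        grad = [
            -2.0 * sum(w[q] * resid[q] * basis[k][q] for q in range(n_q))
            for k in range(n_conf)
        ]
        a = project_to_simplex([a[k] - step * grad[k] for k in range(n_conf)])
    return a


# Synthetic illustration: three made-up "profiles" and a noise-free mixture of them.
basis = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
true_a = [0.5, 0.3, 0.2]
y = [sum(true_a[k] * basis[k][q] for k in range(3)) for q in range(4)]
estimate = estimate_mixture_mle(basis, y, sigma=[1.0] * 4)
```

Projected gradient descent is used here only because it needs no external solver; a general-purpose convex optimization package would serve equally well for a problem of this form.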
We also describe a method for maximum a posteriori (MAP) estimation of the mixture coefficients of an ensemble of conformations in a protein mixture solution using measured SAXS intensities. The proposed method builds upon a model for the measurements of crystallographically determined conformations. Assuming that a priori information on the protein mixture is available and that this prior information follows a Dirichlet distribution, we develop a method to estimate the relative abundances with a MAP estimator.
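One way to write such a MAP estimator down, as a sketch only, is to assume independent Gaussian noise with known variance in each measurement bin. The symbols are notation introduced for this illustration rather than taken from the thesis: y(q) is the measured intensity, I_k(q) the intensity of the k-th known conformation, a_k its mixture coefficient, sigma_q the noise standard deviation, and alpha_k the Dirichlet concentration parameter for conformation k.

```latex
\hat{\mathbf{a}}_{\mathrm{MAP}}
  = \arg\max_{\mathbf{a} \in \Delta}
    \left[
      -\sum_{q} \frac{\bigl( y(q) - \sum_{k=1}^{K} a_k I_k(q) \bigr)^2}{2\sigma_q^2}
      + \sum_{k=1}^{K} (\alpha_k - 1) \log a_k
    \right],
\qquad
\Delta = \Bigl\{ \mathbf{a} : a_k \ge 0,\ \sum_{k=1}^{K} a_k = 1 \Bigr\}.
```

Under these assumptions the first term is the Gaussian log-likelihood and the second is the Dirichlet log-prior, each up to an additive constant. Setting every alpha_k to 1 makes the prior uniform on the simplex, so the prior term vanishes and the estimator coincides with the ML estimator for the same noise model.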
The Dirichlet distribution depends on concentration parameters, which may not be known in practice and thus need to be estimated. To estimate these unknown concentration parameters, we develop an expectation-maximization (EM) method.
The adenylate kinase (AdK) protein was selected as the test bed due to its known conformations. The known conformations are assumed to form the full set of basis vectors that span the measurement space. In Monte Carlo simulations, we compare the mixture coefficient estimation performance of the MAP and maximum likelihood (ML) estimators, where the ML estimator assumes a uniform prior on the mixture coefficients. MAP estimators using known and unknown concentration parameters are also compared in terms of estimation performance. The results show that prior knowledge improves estimation accuracy, but performance is sensitive to perturbations in the concentration parameters of the Dirichlet distribution. Moreover, the estimation method based on the EM algorithm shows results comparable to those obtained with approximately known prior parameters.
In this thesis, we also present an application of the MLE method to identify the conformations in a heterogeneous mixture solution using x-ray solution scattering data from the E. coli AdK protein with various ligands. We estimate the relative abundances of AdK protein conformations undergoing titration of ligands. The x-ray scattering data for known AdK protein conformations, obtained from Protein Data Bank (PDB) structures, are calculated using a model that employs a coarse-grained representation of proteins and an ensemble average of protein conformations with a hydration layer around the protein surface. Structural rearrangements of AdK have been identified in an open (apo) conformation and a closed conformation in which AMP and ATP, bound at two distinct sites, are brought together to create two molecules of ADP.
Here, although we use 45 crystallographically determined experimental structures and we could generate many more using, for instance, molecular dynamics calculations, the clustering technique indicates that the data cannot support the determination of relative abundances for more than 5 conformations. The estimation of this maximum number of conformations is intrinsic to the methodology we have used here.
Advisor: Professor Deniz Erdogmus
Professor Deniz Erdogmus
Professor Murat Akcakaya
Professor Jaydeep Bardhan
Professor Dana Brooks
Professor Lee Makowski