Predicting Thermodynamic Properties over Combinatorially Large Chemical Spaces

Naden, Levi, Chemical Engineering - School of Engineering and Applied Science, University of Virginia
Green, David, Department of Chemical Engineering, University of Virginia
Shirts, Michael, Summer Session Office, University of Virginia
Ford, Roseanne, Department of Chemical Engineering, University of Virginia
Geise, Geoffrey, Department of Chemical Engineering, University of Virginia
DuBay, Kateri, Department of Chemistry, University of Virginia

Current computational property prediction methods are limited in the number of molecules they can test at once. Predicting properties for thousands or millions of molecules simultaneously requires new techniques whose computational cost scales efficiently with the number of molecules tested. In this dissertation, I develop a general approach for carrying out computational alchemical free energy calculations using variance-minimized linear basis functions. This approach provides a means to collect data for statistical free energy estimates that scales efficiently with the number of thermodynamic states or tested molecules. I achieve efficient scaling by splitting the potential energy function into a sum of pairs of basis functions and alchemical switches, so that energies are computed through matrix multiplication instead of simulation force code. The basis function approach allows optimized, minimum-variance alchemical switches to be constructed from a single simulation, entirely in postprocessing, removing the need to optimize through iterative simulations. This is possible because each set of alchemical switches changes only the distribution of samples over the sampled thermodynamic space. Using this technique, I find that the variance-minimized alchemical pathway is: first coupling Weeks-Chandler-Andersen decomposed Lennard-Jones forces with a capped repulsive nonbonded basis function, then removing the fully coupled cap once the probability of observing atoms within the capped region is zero, and finally coupling electrostatics through linear scaling. I show that this pathway is as statistically efficient as common soft core alchemical pathways for small organic solutes in water.
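
The matrix-multiplication step described above can be illustrated with a minimal sketch. It assumes the potential energy takes the form U(x, λ) = Σ_i h_i(λ) u_i(x), where each basis function u_i is evaluated once per sampled configuration and each switch h_i depends only on the alchemical variable. The function name, the array shapes, the polynomial switch forms, and the random placeholder data are all hypothetical and are not taken from the dissertation.

```python
import numpy as np

def energies_at_states(basis_energies, switches):
    """Evaluate potential energies at many alchemical states at once.

    basis_energies: (n_samples, n_basis) array, u_i(x_n) stored per sample.
    switches:       (n_states, n_basis) array, h_i(lambda_k) per state.
    Returns an (n_states, n_samples) array of U(x_n, lambda_k).
    """
    # One matrix product replaces a call to the force code for every
    # (state, sample) pair.
    return switches @ basis_energies.T

# Placeholder data standing in for basis energies saved from a simulation.
rng = np.random.default_rng(0)
n_samples, n_basis = 5, 3
basis_energies = rng.normal(size=(n_samples, n_basis))

# Hypothetical switches: one column per basis function, one row per state.
lam = np.linspace(0.0, 1.0, 4)
switches = np.stack([lam, lam**2, lam**4], axis=1)

u = energies_at_states(basis_energies, switches)
```

Because the basis energies are stored once, trying a different set of switches only means building a new `switches` matrix and repeating the product, which is what makes optimizing the switches in postprocessing inexpensive.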

I extend the basis function approach to atomic parameter searches in multiple parameter dimensions. Relative solvation free energies are computed for over 130,000 nonbonded parameter combinations of an ion. This system is a simple problem in which only one particle is alchemically modified, which keeps the focus on developing multidimensional sampling techniques. With the basis function approach, the computational effort of generating the energies needed for free energy analysis drops from over a thousand CPU years to tens of CPU seconds. I compute free energies, entropies, enthalpies, and radial distribution functions of arbitrary parameter combinations using only the data from 203 sampled states. This work also introduces an adaptive sampling process to generate mutual phase space overlap. The phase space overlap of sampled states is monitored alongside the mean and maximum uncertainty to determine convergence in the multidimensional space.
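
As a rough illustration of how a grid of nonbonded parameters can be evaluated from stored basis energies, the sketch below makes several assumptions that are not stated in the abstract: geometric mixing rules for both ε and σ (so that the ion-solvent Lennard-Jones energy factors into parameter-dependent coefficients times configuration-dependent sums), reduced units with β = 1, random placeholder data, and single-state exponential averaging (the Zwanzig relation) as a simple stand-in for the statistical free energy estimator used in the dissertation. All function names are hypothetical.

```python
import numpy as np

def coefficients(q, eps, sigma):
    """Parameter-dependent prefactors for three basis sums.

    Assumes geometric mixing, so the ion-solvent energy is
    U = c0 * S12(x) + c1 * S6(x) + c2 * SC(x), where S12, S6 and SC are
    sums over solvent atoms that do not depend on the ion parameters.
    """
    root_eps = np.sqrt(eps)
    return np.stack([4 * root_eps * sigma**6,
                     -4 * root_eps * sigma**3,
                     q], axis=-1)

def zwanzig(basis, ref_params, target_params, beta=1.0):
    """Free energy from the reference state to each target parameter set."""
    c_ref = coefficients(*ref_params)        # shape (3,)
    c_tgt = coefficients(*target_params)     # shape (n_targets, 3)
    du = beta * (c_tgt - c_ref) @ basis.T    # shape (n_targets, n_samples)
    # Numerically stable log of the mean of exp(-du) over samples.
    shift = (-du).max(axis=1, keepdims=True)
    log_mean = shift[:, 0] + np.log(np.mean(np.exp(-du - shift), axis=1))
    return -log_mean / beta

# Placeholder basis sums standing in for values saved from one simulation.
rng = np.random.default_rng(1)
basis = rng.uniform(0.0, 1.0, size=(200, 3))

# A small parameter grid; the real search covers far more combinations.
q, eps, sigma = np.meshgrid(np.linspace(-1.0, 1.0, 3),
                            np.linspace(0.1, 0.5, 2),
                            np.linspace(0.2, 0.4, 2), indexing="ij")
targets = (q.ravel(), eps.ravel(), sigma.ravel())
reference = (targets[0][0], targets[1][0], targets[2][0])

dF = zwanzig(basis, reference, targets)
```

The cost of adding another parameter combination is one more row in the coefficient matrix, which is why the grid can grow very large without returning to the simulation force code. Exponential averaging from a single reference state is reliable only when the target overlaps well with the sampled state, which is the reason monitoring phase space overlap across sampled states matters.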

I develop a method to predict solvation properties of a combinatorial number of molecules simultaneously from a single simulation by combining the computational efficiency of the basis function approach with the multidimensional free energy convergence techniques. I estimate solvation free energies of 103 molecules combinatorially constructed by independently mutating 30 R-groups on a benzene core with separate basis function sets, creating 30 alchemical dimensions to sample. This is a practical system where the multi-atom R-groups are alchemically changed at different rates, creating complex interactions between R-groups and the solvent. I sample the large chemical space through Hybrid Monte Carlo (MC) and λ-dynamics to avoid pre-populating MC moves in 30D space, and to avoid numerical instabilities associated with λ-dynamics. The basis function analysis provides up to 145,000x speed-up over relying on simulation force code to compute energies required for free energy estimation.

PHD (Doctor of Philosophy)
Computational Biology, Free Energy, Chemical Sampling
Issued Date: