DFT-Machine Learning Approach for Accurate Prediction of pKa

  • Robin Lawler
  • Yao Hao Liu
  • Nessa Majaya
  • Omar Allam
  • Hyunchul Ju
  • Jin Young Kim
  • Seung Soon Jang

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

In this study, we propose a novel method for predicting pKa across a diverse set of acids that combines density functional theory (DFT) with machine learning (ML). First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error (MAE) of 1.85 pKa units. These DFT-predicted pKa values are then used as one of 10 molecular descriptors for ML models trained on experimental data. Kernel Ridge Regression (KRR), Gaussian Process Regression, and an Artificial Neural Network are optimized using three pipelines: Pipeline 1, involving only hyperparameter optimization (HPO); Pipeline 2, involving HPO followed by a relative contribution analysis (RCA) and recursive feature elimination (RFE); and Pipeline 3, involving HPO followed by RCA and RFE on an expanded set of composite features. KRR with Pipeline 3 yields the best pKa prediction, with an MAE of 0.60 pKa units. This model was then used to predict the pKa of 37 novel acids. The two most important features were found to be the number of hydrogen atoms in the molecule and the degree of oxidation of the acid. The predicted pKa values are documented for future reference.
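
As a rough illustration of the core idea in the abstract (using a DFT-predicted pKa as one input descriptor of a kernel ridge regression model and scoring the result by MAE), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the two-feature toy data, and the values of the regularization strength `alpha` and kernel width `gamma` are all hypothetical placeholders, and the HPO, RCA, and RFE steps of the pipelines are not shown.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, alpha, gamma):
    """Kernel ridge regression: solve (K + alpha * I) c = y for c."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, y)

def krr_predict(X_train, coef, x, gamma):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(c * rbf_kernel(xt, x, gamma) for c, xt in zip(coef, X_train))

def mean_absolute_error(y_true, y_pred):
    """MAE, the metric quoted in the abstract (in pKa units)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic placeholder data, NOT taken from the paper.
# Each row: [DFT-predicted pKa, number of hydrogen atoms].
X_toy = [[4.0, 4.0], [0.5, 1.0], [9.5, 6.0], [2.0, 3.0]]
y_toy = [4.8, -1.0, 10.0, 2.5]  # made-up "experimental" pKa values

ALPHA, GAMMA = 1e-8, 0.1  # hypothetical values; HPO would tune these
coef = krr_fit(X_toy, y_toy, ALPHA, GAMMA)
train_pred = [krr_predict(X_toy, coef, x, GAMMA) for x in X_toy]
train_mae = mean_absolute_error(y_toy, train_pred)
```

With a near-zero `alpha` the model essentially interpolates the training points, so the training MAE here is not a meaningful accuracy estimate; a figure like the one reported in the abstract would be measured on data held out from fitting.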

Original language: English
Pages (from-to): 8712-8722
Number of pages: 11
Journal: Journal of Physical Chemistry A
Volume: 125
Issue number: 39
State: Published - 7 Oct 2021

Bibliographical note

Publisher Copyright:
© 2021 American Chemical Society
