Abstract:
Overfitting of machine learning models that predict partial atomic charges was investigated on the highly heterogeneous datasets common in medicinal chemistry. Random forest and multilayer perceptron models were trained and validated on a specially clustered dataset of drug-like molecules. Analysis of standard quality metrics for reproducing RESP charges showed no evidence of overfitting in the trained models.
Keywords: overfitting, transferability, machine learning, neural network, multilayer perceptron, random forest, partial atomic charges, drug-like molecules, chemical datasets.