Abstract
A large-scale benchmark for electronic-level molecular understanding.
Existing molecular machine learning force fields (MLFFs) generally focus on atoms, molecules, and simple quantum chemical properties such as energy and force, but overlook the importance of electron density (ED) ρ(r) for accurately understanding molecular force fields. ED describes the probability of finding electrons at specific locations around atoms or molecules, and according to the Hohenberg-Kohn theorem, it uniquely determines all ground-state properties of interacting many-particle systems.
EDBench introduces a large-scale, high-quality ED dataset built upon PCQM4Mv2, covering 3.3 million molecules. It also provides an ED-centric benchmark suite spanning prediction, retrieval, and generation. The results show that learning from EDBench is feasible and accurate, and that it can substantially reduce computational cost compared with traditional density functional theory (DFT) calculations, laying a foundation for ED-driven drug discovery and materials science.