TEST SET for Materials Science and Engineering

In computational materials science, the demand for well-designed test sets with accurate and easily accessible reference data is increasing rapidly, especially amid the ongoing data-driven innovation in the field. On the one hand, effective sharing and reuse of materials data is a prerequisite for unleashing the power of test sets. On the other hand, the representativeness and quality of the data are crucial to the success of subsequent data mining, error analysis, and method development.

In this context, the test set for Materials Science and Engineering (MSE) and its website have been designed by the Fritz Haber Institute (FHI) to capture essential lessons learned from quantum chemistry, to focus on materials properties of interest and computational challenges in materials science, and to facilitate the visualization of, and access to, the reference data in the test set.

  • Data Representativeness

    • A hierarchy of electronic-structure approaches

      Our MSE test set provides numerically accurate reference values for the full breadth of first-principles electronic-structure methods, from the simplest local-density approximation (LDA), through generalized gradient approximations (GGAs), meta-GGAs, and hybrid functionals, all the way to state-of-the-art correlation approaches such as MP2 and RPA.

      Note: In the FHI-aims platform, the periodic implementation of the coupled-cluster approach with singles and doubles (CCSD) is still undergoing code optimization. CCSD results, together with their numerical convergence benchmarks, will be added to the MSE test set later.

    • A list of materials properties

      At present, we focus on ground-state properties of materials: the total energy per atom, cohesive energy, lattice constant, and bulk modulus. Numerically well-converged values of these properties have been obtained at the different theoretical levels of the aforementioned electronic-structure approaches.
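      To make the relation between these properties concrete: the lattice constant and bulk modulus follow from an energy-versus-volume scan fitted to an equation of state, and the cohesive energy follows from the bulk and free-atom total energies. The Python sketch below assumes a third-order Birch-Murnaghan equation of state, energies per atom in eV, and volumes per atom in Å³. The function names are hypothetical, and this is not necessarily the fitting procedure used to produce the MSE reference values.

```python
import math

EV_PER_A3_IN_GPA = 160.21766208  # 1 eV/Å^3 expressed in GPa


def _solve(a, b):
    """Solve the small dense system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_birch_murnaghan(volumes, energies):
    """Fit E(V) per atom (eV, Å^3) with a third-order Birch-Murnaghan EOS.

    The third-order BM energy is a cubic polynomial in x = V^(-2/3), so a linear
    least-squares cubic fit in x is exact for BM-shaped data.
    Returns (E0 in eV, V0 in Å^3, B0 in GPa).
    """
    xs = [v ** (-2.0 / 3.0) for v in volumes]
    # Map x onto t in [-1, 1] to keep the normal equations well conditioned.
    mid, half = (max(xs) + min(xs)) / 2.0, (max(xs) - min(xs)) / 2.0
    ts = [(x - mid) / half for x in xs]
    ata = [[sum(t ** (i + j) for t in ts) for j in range(4)] for i in range(4)]
    aty = [sum(e * t ** i for t, e in zip(ts, energies)) for i in range(4)]
    d0, d1, d2, d3 = _solve(ata, aty)
    # Minimum of d0 + d1 t + d2 t^2 + d3 t^3: numerically stable root of
    # d1 + 2 d2 t + 3 d3 t^2 = 0 with positive curvature.
    disc = 4.0 * d2 * d2 - 12.0 * d1 * d3
    if disc < 0.0 or 2.0 * d2 + math.sqrt(disc) <= 0.0:
        raise ValueError("fitted E(V) has no minimum")
    sq = math.sqrt(disc)
    t0 = -2.0 * d1 / (2.0 * d2 + sq)
    x0 = mid + half * t0
    v0 = x0 ** -1.5
    e0 = d0 + d1 * t0 + d2 * t0 ** 2 + d3 * t0 ** 3
    d2e_dx2 = sq / half ** 2  # d2E/dt2 at t0 equals sq; rescale to d2E/dx2
    # B0 = V d2E/dV2 at the minimum = (4/9) x0^(7/2) d2E/dx2
    b0 = 4.0 / 9.0 * x0 ** 3.5 * d2e_dx2
    return e0, v0, b0 * EV_PER_A3_IN_GPA


def cohesive_energy(e_bulk_per_atom, e_free_atoms):
    """Cohesive energy per atom (positive = bound).

    e_free_atoms lists one free-atom energy per atom of the formula unit,
    e.g. [E_Na, E_Cl] for NaCl.
    """
    return sum(e_free_atoms) / len(e_free_atoms) - e_bulk_per_atom
```

      Because the third-order Birch-Murnaghan energy is a cubic polynomial in V^(-2/3), the fit reduces to linear least squares; mapping that variable onto [-1, 1] keeps the 4×4 normal equations well conditioned. The lattice constant then follows from V0 and the number of atoms in the unit cell.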

    • Well-chosen sets of materials

      We aim to set up representative sets of materials for different benchmarking purposes. The first set comprises 7 elemental solids and 12 binary compounds with cubic structures, covering first- and second-row elements and binaries in the fcc, bcc, diamond, rocksalt, and zincblende structures. Moreover, the set includes metals, semiconductors, and insulators with metallic, covalent, ionic, van der Waals (vdW), and mixed bonding character.
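      For the five cubic structure types in this set, the lattice constant a and the volume per atom (the natural variable of an energy-volume scan) are related through the number of atoms in the conventional cubic cell. A minimal sketch, assuming conventional (not primitive) cells; the dictionary and function names are illustrative and not part of the MSE website:

```python
# Atoms in the conventional cubic cell of each structure type in the first MSE set.
ATOMS_PER_CUBIC_CELL = {
    "fcc": 4,
    "bcc": 2,
    "diamond": 8,
    "rocksalt": 8,    # 4 cations + 4 anions
    "zincblende": 8,  # 4 cations + 4 anions
}


def volume_per_atom(a, structure):
    """Volume per atom (Å^3) from the cubic lattice constant a (Å)."""
    return a ** 3 / ATOMS_PER_CUBIC_CELL[structure]


def lattice_constant(v_atom, structure):
    """Cubic lattice constant (Å) from the volume per atom (Å^3)."""
    return (v_atom * ATOMS_PER_CUBIC_CELL[structure]) ** (1.0 / 3.0)
```

      For example, a diamond-structure crystal with a = 5.431 Å has `volume_per_atom(5.431, "diamond")` of about 20.02 Å³ per atom.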

  • Data Quality

    • Accuracy and Reproducibility

      Our MSE test set and its website feature numerically accurate results for the aforementioned electronic-structure methods, materials properties, and sets of materials. A strong effort has been made to push the limits of numerical accuracy for the total energy per atom. These numerically accurate references allow unbiased benchmarking of the numerical accuracy and reproducibility of these methods across different implementations and in real applications.

      The question of numerical accuracy and reproducibility is rapidly gaining importance, as evidenced by the recent community-wide Delta Project, which highlights the relevance of this topic by comparing 15 solid-state codes, using 40 different potentials or basis-set types, to assess the quality of the GGA equation of state for 71 elemental crystals. The challenge becomes much greater for state-of-the-art correlation methods, for example MP2, RPA, and CCSD.
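      The Delta Project condenses each code-to-code comparison into a single number, the Δ gauge: the root-mean-square energy difference between two fitted equations of state over a volume window around equilibrium. The sketch below follows the commonly cited definition under stated assumptions (third-order Birch-Murnaghan curves aligned at their minima, a window of ±6% around the mean of the two equilibrium volumes, Simpson-rule integration); the function names are hypothetical, and the official Delta scripts remain the authority on exact conventions.

```python
import math


def bm_energy(v, v0, b0, b0p):
    """Third-order Birch-Murnaghan E(V) with its minimum shifted to zero (eV/atom)."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )


def delta_gauge(eos_a, eos_b, n=2000):
    """Delta-style gauge (meV/atom) between two EOS given as (V0 [Å^3], B0 [eV/Å^3], B0').

    RMS energy difference over V in [0.94, 1.06] * mean(V0), with both curves
    aligned at their minima; the integral uses composite Simpson's rule (n even).
    """
    vm = 0.5 * (eos_a[0] + eos_b[0])
    lo, hi = 0.94 * vm, 1.06 * vm
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        v = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * (bm_energy(v, *eos_a) - bm_energy(v, *eos_b)) ** 2
    integral = total * h / 3.0
    return 1000.0 * math.sqrt(integral / (hi - lo))
```

      Averaging this per-crystal value over all 71 elemental crystals gives a single code-versus-code figure; the sketch only covers one crystal.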

    • Convergence Quality

      The MSE test set and its website also distinguish themselves by sharing detailed data from an extensive study of numerical convergence with respect to relativistic effects, the basis set (BS), the k-point grid, the localization of the numerical atom-centered orbital (NAO) functions, and/or the auxiliary basis set (ABS) used in the localized resolution-of-identity approach. On the one hand, it

      Among methods of similar theoretical and numerical complexity, the numerical convergence behavior has been studied extensively for at least one representative. At present, these representatives are LDA; PBE for GGAs; SCAN for meta-GGAs; HSE06 for hybrid GGAs; and MP2 and RPA@PBE for advanced correlation approaches.
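      As an illustration of how such convergence data can be used, the following sketch picks, from a scan ordered by increasing accuracy (for example, denser k-grids or larger basis sets), the first setting beyond which all energies stay within a chosen tolerance of the most accurate value. The tolerance, the use of the last point as a proxy for the converged limit, and the function name are assumptions, not the criteria used for the MSE test set.

```python
def first_converged(settings, energies, tol):
    """Return the first setting from which every later energy stays within tol
    of the most accurate (last) value.

    settings and energies must be ordered by increasing numerical accuracy;
    the last entry serves as the reference for the converged limit.
    """
    ref = energies[-1]
    for i, setting in enumerate(settings):
        # Require all remaining points to stay inside the tolerance window,
        # so a single lucky value does not count as convergence.
        if all(abs(e - ref) <= tol for e in energies[i:]):
            return setting
    return settings[-1]
```

      For example, with k-grids n×n×n for n = 4, 6, 8, 10, 12, total energies per atom of -7.9, -7.95, -7.952, -7.9521, and -7.9521 eV, and a 1 meV tolerance, the function returns 8.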

    • Linear Regression

      Data quality, including accuracy and stability, can be quantified statistically by linear regression analysis. Our MSE website (ANALYSIS) can perform linear regression between two or more methods for any user-defined set of materials and for any available materials property. The resulting linear-regression quantities include

      • Slope and intercept: the optimized parameters of the linear model.

      • R²: the coefficient of determination, which indicates the goodness of fit of the linear model.

      • RMSD: the root-mean-square deviation, either between different methods or of the residuals of the linear regression.

      • Graphical analysis: a visualization of the linear regression.

      These quantities make data quality measurable. They can be used to diagnose the numerical stability of a data set, to locate numerical noise within it, and to judge whether numerical errors due to basis sets, k-grid meshes, or other numerical settings are systematic (linear) or not.
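      A minimal sketch of the regression quantities listed above, for two methods evaluated on the same materials. It assumes ordinary least squares with method X as the independent variable and RMSDs normalized by the number of data points n; the website's exact conventions are not specified here, and the function name is hypothetical.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit y ≈ slope * x + intercept between two methods.

    Returns (slope, intercept, R^2, RMSD between the methods, RMSD of the residuals).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / syy
    # Direct method-to-method deviation versus deviation left after a linear correction.
    rmsd_methods = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n)
    rmsd_residuals = math.sqrt(ss_res / n)
    return slope, intercept, r2, rmsd_methods, rmsd_residuals
```

      If one method yields lattice constants systematically 1% larger than another, the RMSD between the methods is nonzero while the residual RMSD essentially vanishes and the slope is 1.01; this is the signature of a systematic (linear) error rather than numerical noise.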

  • Data Accessibility

    Our MSE website has been designed to make it easy to navigate to any available material and method in the database. 1) Hovering the mouse over a material in the table above opens a floating panel that shows the numerically converged results and, if available, their convergence benchmarks. 2) The advanced search engine, reachable via the SEARCH link in the sidebar, gives access to a group of results for a given material, structure, or method.