A recent Google DeepMind paper, “Scaling deep learning for materials discovery”, describes using graph neural networks (GNNs) trained on a large and diverse set of first-principles calculations to discover inorganic materials efficiently. The work reports 2.2 million new crystal structures, many of which escaped previous human chemical intuition, and presents this as an order-of-magnitude expansion in the stable materials known to humanity. Of the 2.2 million structures, 381,000 are identified as stable crystal discoveries.
Example crystal structures
Stable crystal data
DeepMind has created a GitHub repo for these stable crystal discoveries, containing documentation and scripts to download the data. One of the downloadable files, stable_materials_summary.csv, contains 381,000 records, one row per stable crystal discovered. The repo documentation describes the columns of the CSV file:
- Composition: alphabetically-ordered composition
- MaterialId: a unique id corresponding to the entry
- Reduced Formula: reduced chemical formula
- Elements: chemical system, e.g. [‘Sr’, ‘Ag’, ‘Pr’, ‘Tb’]
- NSites: number of atoms
- Volume: volume in units Å^3
- Density: density in units Å^3 / atom
- Point Group: assigned point group
- Space Group: assigned space group
- Space Group Number: assigned space group number
- Crystal System: assigned crystal system
- Corrected Energy: energy adjusted by MP2020 corrections
- Formation Energy Per Atom: normalized energy corrected by reference elements
- Decomposition Energy Per Atom: decomposition energy relative to the downloaded Materials Project convex hull
- Dimensionality Cheon: dimensionality predicted by Cheon et al. 2017
- Bandgap: calculated bandgap
- Is Train: in training set for associated machine learning models
- Decomposition Energy Per Atom All: distance to convex hull of all entries
- Decomposition Energy Per Atom Relative: distance to convex hull of all entries except for the current
- Decomposition Energy Per Atom MP: distance to convex hull of all entries from Materials Project (including recalculations)
- Decomposition Energy Per Atom MP OQMD: distance to convex hull of all entries from Materials Project + Open Quantum Materials Database (including recalculations)
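As a rough sketch of how these columns can be read, the snippet below uses Python's standard csv module. It assumes the file has been downloaded locally and that its header row uses the column names listed above. The helper name read_summary, the sample row, and the local file path are illustrative assumptions, not part of the repo.

```python
import csv
import io

# A few of the column names from the repo documentation listed above.
EXPECTED_COLUMNS = ["Composition", "MaterialId", "Reduced Formula", "Elements", "NSites"]


def read_summary(stream):
    """Read CSV rows from an open text stream into a list of dicts.

    Hypothetical helper: raises ValueError if an expected column is missing.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample row for illustration only (not a real record from the dataset).
SAMPLE = io.StringIO(
    "Composition,MaterialId,Reduced Formula,Elements,NSites\n"
    "Ag1Pr1Sr1Tb1,example-id,SrTbPrAg,\"['Sr', 'Ag', 'Pr', 'Tb']\",4\n"
)
rows = read_summary(SAMPLE)
print(len(rows), rows[0]["Elements"])

# With the real file (the path is an assumption):
# with open("stable_materials_summary.csv", newline="", encoding="utf-8") as f:
#     rows = read_summary(f)
```

Note that csv.DictReader returns every value as a string, so numeric columns such as NSites or Volume would still need converting before plotting.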
I created the following visualizations from these columns to help understand the structure of the stable_materials_summary.csv data.
Distributions of numerical columns
Bar charts of categorical columns
The bottom-right subplot uses a count of constituent elements that wasn’t in the original CSV file. It is simply the number of elements in each crystal, e.g. if the Elements column value is [‘Sr’, ‘Ag’, ‘Pr’, ‘Tb’], the count is 4.
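The derived count described above can be sketched in a few lines. This assumes each Elements value is stored in the CSV as the text of a Python list, which may not match the actual file exactly; count_elements is a hypothetical helper name.

```python
import ast


def count_elements(elements_field):
    """Return the number of constituent elements in one Elements value.

    Assumes the value is the text of a Python list, e.g. "['Sr', 'Ag']".
    """
    elements = ast.literal_eval(elements_field)
    return len(elements)


print(count_elements("['Sr', 'Ag', 'Pr', 'Tb']"))  # prints 4
```

ast.literal_eval is used rather than eval because it only parses literals, which is safer for text read from a file.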
Element occurrences
The visualization counts the occurrences of each of the 84 elements across all 381,000 crystals. For readability, the same values are also presented in tabular format below. Si (silicon) is the most frequent element and Pm (promethium) the least frequent. This feels a bit like a shopping list of ingredients if you wanted to try to assemble these crystals.
Element | # Crystals |
Si | 54,540 |
Ir | 46,863 |
Ho | 43,809 |
Tb | 42,975 |
Rh | 42,367 |
Dy | 42,088 |
Y | 41,551 |
Er | 40,166 |
Co | 39,040 |
Ru | 37,413 |
Pt | 37,147 |
O | 35,986 |
Tm | 34,653 |
Al | 33,851 |
P | 33,479 |
Ga | 30,200 |
Ni | 29,696 |
Pd | 29,403 |
Nd | 28,541 |
B | 27,544 |
Sn | 27,514 |
Lu | 27,088 |
In | 25,881 |
Pr | 25,431 |
Sc | 23,478 |
Os | 23,209 |
La | 22,376 |
Zn | 20,446 |
Cd | 20,380 |
Fe | 18,741 |
Zr | 18,679 |
Th | 18,628 |
Bi | 17,738 |
Hf | 17,566 |
Li | 17,234 |
Tc | 17,232 |
Yb | 16,509 |
Cu | 16,357 |
Sb | 16,252 |
Au | 15,765 |
As | 15,606 |
Sm | 15,186 |
Ti | 14,392 |
Ca | 14,034 |
Nb | 14,008 |
Ta | 14,008 |
Ge | 13,887 |
Mn | 13,785 |
Mg | 13,511 |
Ce | 13,010 |
Pb | 12,820 |
Re | 12,818 |
Hg | 12,612 |
Ag | 12,566 |
C | 11,925 |
Mo | 11,000 |
Se | 10,970 |
Sr | 10,638 |
Pa | 10,514 |
N | 9,570 |
W | 9,456 |
Br | 9,329 |
S | 9,006 |
V | 8,981 |
Ba | 8,874 |
I | 8,689 |
Rb | 8,254 |
F | 8,108 |
Cs | 7,899 |
Na | 7,634 |
K | 7,046 |
U | 6,902 |
Cr | 6,831 |
Ac | 6,754 |
Eu | 6,373 |
Be | 6,295 |
Te | 6,214 |
Tl | 5,981 |
Cl | 5,970 |
H | 5,540 |
Gd | 5,283 |
Np | 3,675 |
Pu | 2,622 |
Pm | 870 |
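A tally like the one in the table can be sketched with collections.Counter. This is a minimal illustration under the same assumption as before (each Elements value is the text of a Python list); element_occurrences is a hypothetical helper, and the sample values are made up rather than taken from the dataset.

```python
import ast
from collections import Counter


def element_occurrences(elements_fields):
    """Count how many crystals contain each element."""
    counts = Counter()
    for field in elements_fields:
        # set() guards against an element being listed twice in one row,
        # so each crystal contributes at most 1 to an element's count.
        counts.update(set(ast.literal_eval(field)))
    return counts


# Made-up Elements values for illustration only.
sample = ["['Sr', 'Ag', 'Pr', 'Tb']", "['Si', 'Ag']", "['Si']"]
counts = element_occurrences(sample)

# Print in the same "Element | # Crystals |" layout as the table above.
for element, n in counts.most_common():
    print(f"{element} | {n:,} |")
```

Applied to the Elements column of the full file, the same function should yield one count per element, sorted from most to least frequent by most_common().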