DeepMind Materials new crystal structure data analysis

A recent Google DeepMind paper, Scaling deep learning for materials discovery, describes how graph neural networks (GNNs) trained on a large and diverse set of first-principles calculations enabled the efficient discovery of inorganic materials, yielding 2.2 million new crystal structures, many of which escaped previous human chemical intuition. This represents an order-of-magnitude expansion in the stable materials known to humanity. Of the 2.2 million structures, 381,000 are identified as stable crystal discoveries.


Example crystal structures

Stable crystal data

DeepMind has created a GitHub repo for these stable crystal discoveries, containing documentation and scripts to download the data. One of the downloadable files, stable_materials_summary.csv, contains 381,000 records, one row per stable crystal discovered. The repo documentation describes the columns of the CSV file as follows:

  • Composition: alphabetically-ordered composition
  • MaterialId: a unique id corresponding to the entry
  • Reduced Formula: reduced chemical formula
  • Elements: chemical system, e.g. ['Sr', 'Ag', 'Pr', 'Tb']
  • NSites: number of atoms
  • Volume: volume in units Å^3
  • Density: density in units Å^3 / atom
  • Point Group: assigned point group
  • Space Group: assigned space group
  • Space Group Number: assigned space group number
  • Crystal System: assigned crystal system
  • Corrected Energy: energy adjusted by MP2020 corrections
  • Formation Energy Per Atom: normalized energy corrected by reference elements
  • Decomposition Energy Per Atom: decomposition energy relative to the downloaded Materials Project convex hull
  • Dimensionality Cheon: dimensionality predicted by Cheon et al. 2017
  • Bandgap: calculated bandgap
  • Is Train: in training set for associated machine learning models
  • Decomposition Energy Per Atom All: distance to convex hull of all entries
  • Decomposition Energy Per Atom Relative: distance to convex hull of all entries except for the current
  • Decomposition Energy Per Atom MP: distance to convex hull of all entries from Materials Project (including recalculations)
  • Decomposition Energy Per Atom MP OQMD: distance to convex hull of all entries from Materials Project + Open Quantum Materials Database (including recalculations)
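
To make the column descriptions above concrete, here is a minimal sketch of reading a few of them into typed Python values using only the standard library. The sample row is a made-up placeholder, not real data from the file, and the sketch assumes the Elements cell is stored as a Python-style list literal; check the actual file before relying on either assumption.

```python
import ast
import csv
import io


def parse_row(row):
    """Convert selected string fields of one CSV row to Python types."""
    return {
        "MaterialId": row["MaterialId"],
        # Assumption: Elements is a list literal like "['Sr', 'Ag']"
        "Elements": ast.literal_eval(row["Elements"]),
        "NSites": int(row["NSites"]),
        "Volume": float(row["Volume"]),
        "Crystal System": row["Crystal System"],
    }


# Placeholder data standing in for stable_materials_summary.csv;
# the id and values below are invented for illustration only.
SAMPLE_CSV = """MaterialId,Elements,NSites,Volume,Crystal System
demo-1,"['Sr', 'Ag', 'Pr', 'Tb']",4,100.5,cubic
"""

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0]["Elements"], records[0]["NSites"])
```

For the real file, the same parse_row function could be applied to csv.DictReader(open("stable_materials_summary.csv", newline="")), assuming the header names match the list above.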


I created the following visualizations from these columns to help understand the structure of the stable_materials_summary.csv data.


Distributions of numerical columns


Bar charts of categorical columns

The bottom-right subplot uses a count of constituent elements, a derived value that isn't in the original CSV file. It is simply the number of elements in each crystal: for example, if the Elements column value is ['Sr', 'Ag', 'Pr', 'Tb'], the count is 4.
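
The derived count can be sketched in a few lines. This is an illustration rather than the exact code behind the chart, and it assumes the Elements cell is a string holding a Python-style list literal; count_elements is a hypothetical helper name.

```python
import ast


def count_elements(elements_field: str) -> int:
    """Return the number of constituent elements in one Elements cell."""
    # Assumption: the cell looks like "['Sr', 'Ag', 'Pr', 'Tb']"
    return len(ast.literal_eval(elements_field))


print(count_elements("['Sr', 'Ag', 'Pr', 'Tb']"))  # 4
```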


Element occurrences

The visualization counts how many of the 381,000 crystals contain each of the 84 elements. For readability, the same values are also presented in tabular form below. Si (silicon) is the most frequent element and Pm (promethium) the least frequent. This feels a bit like a shopping list of ingredients if you wanted to try to assemble these crystals.


Element # Crystals
Si 54,540
Ir 46,863
Ho 43,809
Tb 42,975
Rh 42,367
Dy 42,088
Y 41,551
Er 40,166
Co 39,040
Ru 37,413
Pt 37,147
O 35,986
Tm 34,653
Al 33,851
P 33,479
Ga 30,200
Ni 29,696
Pd 29,403
Nd 28,541
B 27,544
Sn 27,514
Lu 27,088
In 25,881
Pr 25,431
Sc 23,478
Os 23,209
La 22,376
Zn 20,446
Cd 20,380
Fe 18,741
Zr 18,679
Th 18,628
Bi 17,738
Hf 17,566
Li 17,234
Tc 17,232
Yb 16,509
Cu 16,357
Sb 16,252
Au 15,765
As 15,606
Sm 15,186
Ti 14,392
Ca 14,034
Nb 14,008
Ta 14,008
Ge 13,887
Mn 13,785
Mg 13,511
Ce 13,010
Pb 12,820
Re 12,818
Hg 12,612
Ag 12,566
C 11,925
Mo 11,000
Se 10,970
Sr 10,638
Pa 10,514
N 9,570
W 9,456
Br 9,329
S 9,006
V 8,981
Ba 8,874
I 8,689
Rb 8,254
F 8,108
Cs 7,899
Na 7,634
K 7,046
U 6,902
Cr 6,831
Ac 6,754
Eu 6,373
Be 6,295
Te 6,214
Tl 5,981
Cl 5,970
H 5,540
Gd 5,283
Np 3,675
Pu 2,622
Pm 870
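
A table like the one above could be reproduced along these lines. This is a hedged sketch, not the exact code used for the visualization: the three sample rows are invented placeholders, element_occurrences is a hypothetical helper, and the Elements cell is again assumed to be a Python-style list literal.

```python
import ast
import csv
import io
from collections import Counter


def element_occurrences(rows):
    """Count, for each element, how many crystals contain it."""
    counts = Counter()
    for row in rows:
        # set() ensures a crystal is counted once per element
        counts.update(set(ast.literal_eval(row["Elements"])))
    return counts


# Placeholder rows standing in for stable_materials_summary.csv
SAMPLE_CSV = """MaterialId,Elements
demo-1,"['Sr', 'Ag', 'Pr', 'Tb']"
demo-2,"['Si', 'Ir']"
demo-3,"['Si', 'Tb', 'O']"
"""

counts = element_occurrences(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Sort by descending count, then alphabetically, for a stable listing
for element, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{element} {n:,}")
```

Sorting on the count and then the symbol keeps the listing deterministic when two elements tie, as Nb and Ta do in the table.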
