The Rule of Four: Insights into the Structural Anomalies and Distribution of Inorganic Compounds


One fundamental question in materials science is why certain materials with specific characteristics are more abundant than others. This inquiry, while simple in phrasing, is complex in practice due to the vastness of the compositional and configurational space materials can occupy. This complexity is further compounded by the challenges of systematically exploring and categorizing the properties of these materials, especially as new materials are continuously being discovered and characterized. A particularly intriguing and previously unreported observation is the anomalous abundance of inorganic compounds whose primitive unit cell contains a number of atoms that is a multiple of four. This phenomenon, termed the “rule of four” (RoF), poses a unique challenge to materials scientists. It raises questions about the underlying principles that govern material formation and stability, and why this specific structural characteristic appears more frequently than others.
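As a minimal sketch of the definition above, the following Python snippet flags structures whose primitive unit cell holds a multiple of four atoms and compares the observed share with the 25% one would naively expect if atom counts were spread evenly across the four residues modulo four. The atom counts are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
from collections import Counter


def follows_rule_of_four(n_atoms: int) -> bool:
    """Return True if the primitive-cell atom count is a positive multiple of four."""
    if n_atoms <= 0:
        raise ValueError("a primitive cell must contain at least one atom")
    return n_atoms % 4 == 0


def rof_fraction(atom_counts):
    """Fraction of structures whose primitive cell obeys the rule of four."""
    counts = list(atom_counts)
    if not counts:
        raise ValueError("no structures given")
    return sum(follows_rule_of_four(n) for n in counts) / len(counts)


def residue_histogram(atom_counts):
    """Count structures by (number of atoms mod 4); residue 0 is the RoF class."""
    return Counter(n % 4 for n in atom_counts)


# Illustrative, made-up primitive-cell sizes (NOT data from the study).
example_counts = [4, 8, 12, 20, 6, 10, 5, 16, 24, 3]
NAIVE_BASELINE = 0.25  # share expected if all residues mod 4 were equally likely

fraction = rof_fraction(example_counts)
print(f"RoF fraction: {fraction:.2f} (naive baseline {NAIVE_BASELINE:.2f})")
print(dict(residue_histogram(example_counts)))
```

An observed fraction well above the naive baseline is what the term "anomalous abundance" refers to. In a real analysis the atom counts would come from primitive cells of database structures rather than from a hand-written list.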

Understanding the RoF and its implications presents several challenges. The sheer number of possible elemental combinations and configurations makes it difficult to identify patterns and correlations without advanced computational tools. Additionally, high-quality, diverse datasets are essential for accurate materials characterization and discovery, yet gathering and curating such datasets is a resource-intensive process. Moreover, materials’ properties are influenced by both global structural descriptors (such as symmetries and packing configurations) and local structural descriptors (such as the smooth overlap of atomic positions). Analyzing these descriptors to uncover meaningful insights requires sophisticated computational techniques. Furthermore, determining whether structural anomalies like the RoF correlate with physical properties such as formation energy and stability involves extensive computational and experimental validation.

To this end, a new study published in npj Computational Materials, conducted by Elena Gazzarrini, Dr. Rose Cersonsky, Dr. Marnik Bercx, and Dr. Carl Adorf and led by Professor Nicola Marzari from the Theory and Simulation of Materials (THEOS) laboratory and the National Center for Computational Design and Discovery of Novel Materials (MARVEL) at École Polytechnique Fédérale de Lausanne in Switzerland, quantified the occurrence of the RoF in two extensive databases of inorganic crystal structures. The researchers began by curating structural data from two primary databases: the Materials Project (MP) and the Materials Cloud 3-dimensional crystal structures ‘source’ database (MC3D-source). The MP database includes crystal structures relaxed with first-principles calculations, while the MC3D-source combines experimental structures from multiple sources such as the Crystallography Open Database (COD) and the Inorganic Crystal Structure Database (ICSD). To ensure consistency, all structures were reduced to their primitive unit cells using the spglib software. This step confirmed that the observed RoF was not an artifact of the mathematical description or database processing. Changing spglib’s symmetry tolerance parameter, symprec, had minimal impact on the RoF distribution, which reinforced the legitimacy of this anomaly.

Next, the authors analyzed the formation energies of the compounds to test whether the RoF was correlated with stability. Formation energy per atom, a measure of a compound’s stability with respect to its elemental phases, was calculated using data from the MP database. The normalized distribution of formation energies for 83,989 compounds revealed no significant correlation between RoF structures and lower formation energies. However, RoF structures exhibited a longer positive tail of large formation energies, suggesting a broader range of stability within these compounds. This finding indicated that the RoF was not directly associated with a tendency towards lower energy states.
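The formation energy per atom mentioned above can be sketched as follows. This is a minimal illustration under the standard definition (the total energy of the compound minus the summed reference energies of its constituent elements, divided by the number of atoms). The function name and all numerical values are hypothetical and are not taken from the MP database.

```python
def formation_energy_per_atom(total_energy, composition, reference_energies):
    """
    Formation energy per atom (eV/atom) relative to elemental reference phases.

    total_energy       -- total energy of the cell in eV
    composition        -- dict mapping element symbol -> number of atoms in the cell
    reference_energies -- dict mapping element symbol -> energy per atom (eV)
                          of the elemental reference phase
    """
    n_atoms = sum(composition.values())
    if n_atoms == 0:
        raise ValueError("composition is empty")
    # Energy the same atoms would have in their elemental reference phases.
    elemental_sum = sum(n * reference_energies[el] for el, n in composition.items())
    return (total_energy - elemental_sum) / n_atoms


# Hypothetical numbers for a made-up binary compound A2B2 (4 atoms per cell).
refs = {"A": -1.50, "B": -4.00}
e_form = formation_energy_per_atom(
    total_energy=-14.0,
    composition={"A": 2, "B": 2},
    reference_energies=refs,
)
print(f"{e_form:.3f} eV/atom")
```

A negative value indicates that the compound is lower in energy than its separated elemental phases. Comparing the distributions of this quantity for RoF and non-RoF structures is the kind of test the paragraph above describes.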

The researchers then examined the relationship between the RoF and crystal symmetries, specifically space groups and point groups, which define the set of symmetry operations that leave the structure unchanged. Histograms of inherited symmetries showed a relative abundance of non-RoF structures in high-symmetry point groups, while RoF structures were predominantly found in low-symmetry groups (2, m, 2/m, mm2, 222, and mmm). This indicated that RoF structures are characterized by low symmetries and loosely packed arrangements, maximizing free volume. Further analysis of geometric properties, such as the number of atomic species (Nspecies), revealed that RoF structures generally contained more elements and atoms with smaller radii than non-RoF structures.

To investigate the local environments of RoF structures further, the researchers employed the Smooth Overlap of Atomic Positions (SOAP) method, which provides a statistical framework for analyzing local atomic symmetries. SOAP vectors were used to represent each compound’s average three-body local environment. The study used Principal Covariates Regression (PCovR) to explore the correlation between local symmetries and formation energies. Despite extensive analysis, the PCovR results showed no significant difference in energy between structurally similar RoF and non-RoF structures, indicating that local symmetries did not strongly correlate with energetic descriptors. The researchers also applied a Random Forest (RF) classification algorithm to distinguish RoF structures based on local structural descriptors. Using species-invariant SOAP vectors, the RF classifier achieved an accuracy of 87% in predicting RoF structures. The accuracy plateaued at a local environment cutoff of 4.0 Å, suggesting that the relevant local features occurred within the first two neighbor shells. This finding highlighted the importance of local structural symmetries in differentiating RoF from non-RoF structures.
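The comparison of point-group symmetries between RoF and non-RoF structures can be sketched in a few lines of Python. The low-symmetry set is the one listed in the text; the records are made-up placeholders rather than database entries, and the helper name is hypothetical.

```python
from collections import defaultdict

# Low-symmetry point groups named in the text.
LOW_SYMMETRY_POINT_GROUPS = {"2", "m", "2/m", "mm2", "222", "mmm"}


def low_symmetry_share(records):
    """
    For RoF and non-RoF structures separately, return the fraction that falls
    in one of the low-symmetry point groups.

    records -- iterable of (n_atoms_in_primitive_cell, point_group_symbol)
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for n_atoms, point_group in records:
        label = "RoF" if n_atoms % 4 == 0 else "non-RoF"
        totals[label] += 1
        if point_group in LOW_SYMMETRY_POINT_GROUPS:
            low[label] += 1
    return {label: low[label] / totals[label] for label in totals}


# Made-up placeholder records (NOT data from the study).
records = [
    (8, "mmm"), (12, "2/m"), (16, "mm2"), (4, "m-3m"),
    (2, "m-3m"), (5, "4/mmm"), (6, "2/m"), (10, "6/mmm"),
]
print(low_symmetry_share(records))
```

With real data, the point-group symbol of each structure would come from a symmetry analysis of its primitive cell. The placeholder records are chosen only to show the shape of the comparison, not to reproduce the reported histograms.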

The authors’ findings hold significant implications for materials science, particularly for computational materials discovery and materials informatics. By identifying and quantifying the RoF, the researchers provided new insight into the structural characteristics and distribution of materials. The discovery of the RoF can enable more refined classification schemes for inorganic compounds, aiding the identification of materials with unique structural properties. This can streamline the materials discovery process, allowing researchers to focus on compounds that conform to or deviate from the RoF for targeted applications. The study also showcases the integration of machine learning techniques, such as the Random Forest classifier and SOAP vectors, demonstrating the power of local structural descriptors in predicting material characteristics. These models can be further developed to predict other material properties, enhancing the efficiency and accuracy of computational materials screening. Indeed, understanding the structural peculiarities of materials, such as those governed by the RoF, is important in many applications, from electronics and energy storage to catalysis. Materials with specific structural features may exhibit unique electronic, thermal, or mechanical properties, making them suitable for specialized applications.

A bar chart illustrating the rule of four (RoF): the relative abundance of inorganic compounds whose primitive unit cells contain a multiple of four atoms, compared with those whose cells do not. The x-axis gives the number of atoms in the primitive unit cell and the y-axis gives the relative abundance of these structures in percent. Blue bars mark structures that follow the RoF (multiples of four atoms); orange bars mark structures that do not (non-multiples of four atoms). The chart shows that structures with a multiple of four atoms in the primitive unit cell are disproportionately abundant. (The graph is made by Advances in Engineering Graphics & Science Team).

About the author

Elena Gazzarrini

Computing Fellow at CERN

Former physics student at King’s College London, UC Berkeley and EPFL with experience in computer simulations of complex bio-physics systems and in nano-materials simulations for catalytic reactions. Following her interest in the deployment of large data systems and of computing architectures, Elena is currently a Computing Fellow in the CERN IT Department, working in close collaboration with scientists from High Energy Physics (HEP) and Astrophysics on the implementation of a reproducible research analysis platform (Reana) and its integration with a common data storage solution (Data Lake). The aim is to demonstrate how such an ecosystem can serve the interdisciplinary needs of Dark Matter Search Science Projects as part of the European Open Science Cloud (EOSC) Future project, funded to integrate, consolidate, and connect e-infrastructures, research communities, and initiatives in Open Science. She also works in close collaboration with physics postdocs.

Experience in

  • 3D animations, video editing, content research on technology evolution
  • website and digital content development
  • deployment of computational cloud infrastructures for collaborative research
  • software development for research preservation and reproducibility
  • database management and data transfer systems
  • nano-materials simulations for catalytic reactions
  • statistical and clustering algorithms to better understand emerging energetic and topological properties of inorganic materials datasets
  • computer simulation and analysis of complex data systems

About the author

Professor Nicola Marzari

Nicola Marzari holds the chair of Theory and Simulation of Materials at the École Polytechnique Fédérale de Lausanne (Switzerland), where he is also the director of the National Centre on Computational Design and Discovery of Novel Materials of the Swiss National Science Foundation. He heads the Laboratory for Materials Simulations at the Paul Scherrer Institut (Switzerland) and holds an Excellence Chair at the University of Bremen (Germany).

Previous tenured appointments include the Toyota Chair for Materials Processing at the Massachusetts Institute of Technology, and the first Statutory Chair of Materials Modelling at the University of Oxford (UK), where he was also the director of the Materials Modelling Laboratory. He is a past Chairperson of the Psi-k Charity. He holds a PhD in Physics from the University of Cambridge (UK), and a Laurea in Physics from the University of Trieste (Italy).

His research is dedicated to the development and application of quantum-mechanical simulations to understand, predict, and design the properties and performance of novel materials and devices. More than 30 members of his group have moved to faculty positions worldwide, including at MIT, Harvard, Imperial College, EPFL, and Seoul National University; 5 have been awarded the US NSF CAREER Award. The open-access software and data infrastructure developed by the group sustains more than 4,000 scientific publications per year.


Elena Gazzarrini, Rose K. Cersonsky, Marnik Bercx, Carl S. Adorf & Nicola Marzari. The rule of four: anomalous distributions in the stoichiometries of inorganic compounds. npj Comput Mater 10, 73 (2024).
