Material Discovery with PhaseSelect – The AI Pathfinder in Uncharted Chemical Spaces


The search for novel materials begins with the strategic selection of chemical elements from the periodic table. This choice shapes the synthetic route and influences the functional properties of the resulting materials. The process involves identifying a phase field, that is, a set of chemical elements (e.g., {Cu, O, B}) that can combine into various compositions (e.g., Cu2BO4), and then exploring its potential. Because the choice of phase field sets the course for subsequent research and experimentation, this high-level decision is an important one in materials science.

Recent advances in machine learning (ML) have transformed materials property prediction, drawing on large materials databases to forecast properties from composition and structure. Despite this progress, challenges remain, especially in predicting materials whose composition or structure is unknown. Conventional ML methods rely on extensive screening and, because they are driven by existing databases, may overlook genuinely new materials. Small variations in composition, which are often critical for material properties, can also cause discrepancies between the materials actually synthesized and their computational models. This underscores the need for tools that assess entire phase fields, covering both known and potential compositions, and that counteract the historical bias inherent in composition-based ML models.

Examining materials at the level of their constituent elements helps address these challenges. Grouping materials into phase fields gives a more balanced representation in datasets, which improves ML model accuracy and the ability to predict unexplored chemistry. PhaseSelect, an integrated ML model, embodies this strategy by prioritizing phase fields according to functional performance and chemical similarity. It first uses a semi-supervised learning algorithm to learn representations of chemical elements from their co-occurrence in known materials, and then uses supervised learning to assess the functional performance of materials, focusing on properties such as superconducting transition temperature, Curie temperature, and bandgap energy. The architecture integrates several artificial neural networks, each responsible for one stage of the learning process: aggregation of compositions into phase fields, elemental representation, phase field representation, and supervised assessment of properties.
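The co-occurrence idea can be illustrated with a minimal Python sketch, assuming NumPy and a hypothetical toy list of element sets. It counts how often each pair of elements appears together in a known material and compresses each element's row of counts into a short vector. The compression step swaps in truncated SVD for PhaseSelect's shallow autoencoder: SVD yields the codes an optimal linear autoencoder with squared-error loss would learn, up to an invertible transformation, whereas the published model may use nonlinear layers and a semi-supervised objective that this sketch omits. The names `KNOWN_PHASE_FIELDS`, `cooccurrence_matrix`, and `embed_elements` are illustrative, not taken from the PhaseSelect code.

```python
import itertools

import numpy as np

# Hypothetical toy data: each known material reduced to its set of elements.
KNOWN_PHASE_FIELDS = [
    {"Cu", "B", "O"},
    {"Cu", "La", "O"},
    {"Ba", "Cu", "O"},
    {"Fe", "As", "La"},
    {"Mg", "B"},
    {"Fe", "Se"},
]

def cooccurrence_matrix(phase_fields):
    """Count how often each pair of elements appears together in one material."""
    elements = sorted(set().union(*phase_fields))
    index = {el: i for i, el in enumerate(elements)}
    counts = np.zeros((len(elements), len(elements)))
    for field in phase_fields:
        for a, b in itertools.combinations(sorted(field), 2):
            counts[index[a], index[b]] += 1
            counts[index[b], index[a]] += 1
    return elements, counts

def embed_elements(counts, dim=2):
    """Compress each element's co-occurrence row into a `dim`-dimensional vector.

    Stand-in for the shallow autoencoder (an assumption, not PhaseSelect's method):
    the rank-`dim` truncated SVD gives the codes an optimal linear autoencoder
    (squared-error loss, no biases) would learn, up to an invertible linear map.
    """
    u, s, _ = np.linalg.svd(counts)
    return u[:, :dim] * s[:dim]

elements, counts = cooccurrence_matrix(KNOWN_PHASE_FIELDS)
vectors = embed_elements(counts)
for el, vec in zip(elements, vectors):
    print(el, np.round(vec, 3))
```

The result is one 2-dimensional vector per element; with a realistic database the matrix would span far more of the periodic table and a larger latent dimension would be used.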

In a new study published in npj Computational Materials, researchers from the University of Liverpool, led by Professor Matthew Rosseinsky and including Dr. Andrij Vasylenko, Dr. Dmytro Antypov, Dr. Vladimir Gusev, Dr. Michael Gaultois, and Matthew Dyer, developed a machine learning model named PhaseSelect. Their aim was to improve materials discovery by focusing on selection at the level of the periodic table, that is, choosing sets of chemical elements (phase fields) that could potentially form various compositions. The team began by conceptualizing a framework for assessing unexplored candidate inorganic functional materials. The framework rests on identifying the phase fields likely to contain such candidates, which simplifies the complex task of evaluating every possible composition of the chosen elements. PhaseSelect begins with semi-supervised learning of representations of chemical elements from their co-occurrence in known materials, a step that is crucial for capturing the complex relationships between elements. The model also incorporates an ‘attention’ mechanism, inspired by techniques from natural language processing and adapted to material compositions, to assess the contributions of individual chemical elements to functional performance. PhaseSelect predicts material properties (e.g., superconducting transition temperature, Curie temperature, and bandgap energy) through both regression and binary classification.
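The attention step can be sketched as scaled dot-product self-attention among the element vectors of a single phase field, with several heads and mean pooling into one phase-field vector. This is a minimal NumPy illustration under stated assumptions: the weights are random stand-ins for trained parameters, mean pooling is one choice among several, and `multi_head_attention_pool` is a hypothetical name rather than part of PhaseSelect. The returned attention weights are the quantities one would inspect to see which elements the model attends to.

```python
import numpy as np

def multi_head_attention_pool(x, w_q, w_k, w_v):
    """Self-attention among the elements of one phase field, pooled to one vector.

    x:             (n_elements, d) element vectors for a single phase field.
    w_q, w_k, w_v: (n_heads, d, d_head) query/key/value projections.
    Returns the pooled phase-field vector, shape (n_heads * d_head,), and the
    attention weights, shape (n_heads, n_elements, n_elements).
    """
    q = np.einsum("nd,hde->hne", x, w_q)
    k = np.einsum("nd,hde->hne", x, w_k)
    v = np.einsum("nd,hde->hne", x, w_v)
    # Scaled dot-product scores between every pair of elements, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # softmax numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    attended = weights @ v          # (n_heads, n_elements, d_head)
    pooled = attended.mean(axis=1)  # average over elements
    return pooled.reshape(-1), weights

# Random stand-ins for trained weights and for three element vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))  # 3 elements, 4-dimensional vectors
w_q, w_k, w_v = (rng.normal(size=(2, 4, 3)) for _ in range(3))  # 2 heads
pooled, weights = multi_head_attention_pool(x, w_q, w_k, w_v)
print(pooled.shape)  # (6,)
```

Because no positional information is used, reordering the elements of a phase field leaves the pooled vector unchanged, which suits inputs that are sets of elements rather than sequences of words.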

The researchers gathered data from existing materials databases, focusing on compositions with experimentally verified properties, and aggregated all compositions sharing the same constituent elements into a single phase field. PhaseSelect was trained on these aggregated data. The model learned to predict the maximum value of a property achievable within a phase field and to classify phase fields into performance categories based on that property. The team ran regression tasks (predicting the maximum property value within each phase field) and classification tasks (dividing phase fields into high- and low-performing groups using property thresholds). Another key experiment used an unsupervised learning approach to rank phase fields by chemical novelty, assessing how similar each phase field is to known, synthetically stable materials. The researchers then analyzed the model’s predictions for various phase fields to assess its accuracy and reliability in forecasting material properties.
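The two kinds of supervised targets can be made concrete with a short Python sketch. Assuming each phase field has already been mapped to the maximum reported value of a property, the regression target is that maximum and the classification label marks whether it reaches a chosen threshold. The phase fields, values, and threshold below are placeholders, not data from the study, and `make_targets` is a hypothetical helper.

```python
def make_targets(phase_field_max, threshold):
    """Build regression targets and binary labels from phase-field maxima.

    phase_field_max: dict mapping frozenset of element symbols -> max property value.
    threshold: value at or above which a phase field counts as high-performing.
    """
    # Sort for a reproducible order (each frozenset compared via its sorted elements).
    fields = sorted(phase_field_max, key=sorted)
    y_reg = [phase_field_max[f] for f in fields]  # regression: predict the maximum
    y_cls = [int(v >= threshold) for v in y_reg]  # classification: 1 = high-performing
    return fields, y_reg, y_cls

# Placeholder phase fields with made-up maxima (not measured values).
example = {
    frozenset({"X", "Y"}): 12.0,
    frozenset({"X", "Z"}): 3.0,
    frozenset({"W", "Y", "Z"}): 25.0,
}
fields, y_reg, y_cls = make_targets(example, threshold=10.0)
print(y_reg, y_cls)  # [25.0, 12.0, 3.0] [1, 1, 0]
```

The same aggregated data thus serve both tasks; which threshold is appropriate depends on the property and is left as a parameter here.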

A significant part of the analysis focused on identifying unexplored phase fields likely to yield new materials with desirable properties. To validate PhaseSelect, the team compared it against baseline models such as default random forests with Magpie descriptors; PhaseSelect outperformed these baselines both in predicting material properties and in identifying promising phase fields. In summary, the researchers developed PhaseSelect, a machine learning model that prioritizes phase fields for materials discovery, and evaluated it through experiments on data aggregation, model training, property prediction, and novelty ranking. The work offers a more efficient, data-driven approach to discovering new functional materials.

PhaseSelect processes materials databases by aggregating compositions into phase fields. Each phase field is labeled with the maximum reported value of a specific property among all compositions within that field. Because an intensively studied phase field then contributes a single entry rather than many, this aggregation counteracts potential biases in the datasets and gives a more uniform representation of materials.
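A minimal Python sketch of this aggregation, assuming plain formula strings paired with property values: each formula is reduced to the set of elements it contains, and each set keeps the largest value seen. The regex-based parser is a simplification (it ignores stoichiometry and does not validate symbols), `phase_field` and `aggregate` are hypothetical names, and the records are placeholders rather than database entries.

```python
import re

# One capital letter optionally followed by one lowercase letter matches an element symbol.
ELEMENT_SYMBOL = re.compile(r"[A-Z][a-z]?")

def phase_field(formula):
    """Reduce a formula to its phase field, e.g. 'Cu2BO4' -> {'Cu', 'B', 'O'}."""
    return frozenset(ELEMENT_SYMBOL.findall(formula))

def aggregate(records):
    """Map each phase field to the maximum property value among its compositions."""
    best = {}
    for formula, value in records:
        field = phase_field(formula)
        best[field] = max(value, best.get(field, float("-inf")))
    return best

# Placeholder records: two compositions share the {Cu, B, O} phase field.
records = [("Cu2BO4", 1.0), ("CuB2O4", 3.0), ("Fe2O3", 2.0)]
for field, value in aggregate(records).items():
    print(sorted(field), value)
```

Both Cu2BO4 and CuB2O4 collapse into one entry, so a phase field contributes a single data point however many of its compositions appear in the database.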

PhaseSelect builds a matrix representing the co-occurrence of chemical elements in known materials. A shallow autoencoder neural network compresses this information into a latent space in which elemental vectors are grouped by similarity. The model then applies multi-head local attention to weigh the relevance of each element within a phase field, improving the accuracy of its property predictions. Supervised assessment of properties is carried out by separate neural networks for regression and classification, which predict the maximum achievable property value within a phase field and assign the phase field to a performance category. In addition, PhaseSelect ranks phase fields by their chemical similarity to experimentally verified materials, using an unsupervised deep autoencoder neural network.

Together, these components allow PhaseSelect to predict both the properties and the chemical accessibility of phase fields. The model outperforms the baseline models, indicating that it captures complex relationships between elemental combinations and material properties. PhaseSelect also identifies unexplored ternary phase fields with high probabilities of exhibiting the desired properties: classifying them with respect to superconductivity, magnetism, and energy bandgap, it highlights phase fields likely to yield stable compositions with superior functional properties. The model’s predictions align with expert chemical knowledge, suggesting that it can effectively guide materials research toward promising directions.
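The ranking idea can be sketched by scoring each candidate phase field with the reconstruction error of an autoencoder fitted to known phase fields: a low error means the candidate resembles known chemistry, a high error means it is novel. As a deliberate simplification, this sketch swaps in PCA, which is a linear autoencoder, for PhaseSelect's deep autoencoder, and encodes each phase field as a binary element-presence vector rather than with learned element representations. It assumes NumPy; the element list, phase fields, and function names are all hypothetical.

```python
import numpy as np

def encode(fields, elements):
    """Encode each phase field as a 0/1 vector marking which elements it contains."""
    index = {el: i for i, el in enumerate(elements)}
    x = np.zeros((len(fields), len(elements)))
    for row, field in enumerate(fields):
        for el in field:
            x[row, index[el]] = 1.0
    return x

def fit_linear_autoencoder(x_known, dim):
    """PCA as a linear autoencoder: return the data mean and the top `dim` components."""
    mean = x_known.mean(axis=0)
    _, _, vt = np.linalg.svd(x_known - mean, full_matrices=False)
    return mean, vt[:dim]

def reconstruction_error(x, mean, components):
    """Distance between each centered vector and its encode/decode reconstruction."""
    centered = x - mean
    recon = centered @ components.T @ components  # encode, then decode
    return np.linalg.norm(centered - recon, axis=1)

def rank_by_similarity(candidates, x_candidates, mean, components):
    """Sort candidates from most similar to known chemistry (lowest error) to most novel."""
    errors = reconstruction_error(x_candidates, mean, components)
    order = np.argsort(errors)
    return [(candidates[i], float(errors[i])) for i in order]

# Hypothetical toy data.
ELEMENTS = ["Cu", "B", "O", "Fe", "Se", "La", "Mg"]
known = [{"Cu", "B", "O"}, {"Cu", "La", "O"}, {"Fe", "Se"}, {"Fe", "La", "O"}]
mean, components = fit_linear_autoencoder(encode(known, ELEMENTS), dim=3)

candidates = [frozenset({"Mg", "B"}), frozenset({"Cu", "B", "O"})]
for field, err in rank_by_similarity(candidates, encode(candidates, ELEMENTS), mean, components):
    print(sorted(field), round(err, 3))
```

Reversing the order gives a novelty ranking. A deep autoencoder would replace the linear encode and decode steps with nonlinear networks, while the scoring logic stays the same.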

PhaseSelect offers a novel approach to materials discovery, focusing on the high-level assessment of unexplored candidate inorganic functional materials. By evaluating candidates at the phase field level, the model avoids exhaustively assessing every possible composition one by one. This aids decision-making in experimental solid-state inorganic chemistry, guiding researchers toward the combinations of elements most likely to yield new stable compounds with superior functional properties. Combining PhaseSelect’s predictions with expert knowledge lets researchers prioritize promising phase fields and reduces the risks associated with materials discovery. The model’s attention mechanism also makes its reasoning more interpretable for materials scientists and helps extrapolate knowledge from materials databases to unexplored phase fields. PhaseSelect thus represents a significant advance in the conceptualization and discovery of novel functional materials.

Image Credit: npj Computational Materials 9, 164 (2023).

About the author

Professor Matthew Rosseinsky

Department of Chemistry
University of Liverpool

I was elected to the Royal Society in 2008. I was awarded the Hughes Medal of the Royal Society in 2011 “for his influential discoveries in the synthetic chemistry of solid state electronic materials and novel microporous structures.” In 2017, I was awarded the Davy Medal of the Royal Society “for his advances in the design and discovery of functional materials, integrating the development of new experimental and computational techniques.” I have been a Royal Society Research Professor since 2013.

I work on the synthetic chemistry, design and discovery of solid state materials, which have applications ranging from catalysis to superconductivity. A current focus is the development of new methods of identifying functional materials, emphasising the integration of experiment with computational methods.


Vasylenko, A., Antypov, D., Gusev, V.V. et al. Element selection for functional materials discovery by integrated machine learning of elemental contributions to properties. npj Computational Materials 9, 164 (2023).

