CrystalFp: the crystal fingerprinting project

[CrystalFp end user application]

CrystalFp started as a way to solve a problem with the USPEX crystal structure predictor. Every USPEX run produces hundreds or thousands of crystal structures, some of which may be identical. To ease the extraction of unique and potentially interesting structures, we needed a method to find and remove duplicate structures.

The approach adopted was to apply usual high-dimensional classification concepts to the unusual field of crystallography.

We adopted a visual design and validation method to develop a classifier library (CrystalFp) and an end-user application to select and validate method choices, to gain users' acceptance and to tap into their domain expertise.

Using the end-user application with real datasets, we experimented with various crystal structure descriptors, distance measures and clustering methods to identify groups of similar structures. These methods are already applied in combinatorial chemistry to organic molecules, for a different goal and in somewhat different forms, but are not widely used for crystal structure classification.
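The descriptor, distance and clustering pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CrystalFp API: it assumes each structure has already been reduced to a numeric fingerprint vector, picks cosine distance as one possible distance measure, and groups structures by linking every pair closer than a threshold. The names `cosine_distance` and `group_duplicates`, the threshold value and the sample vectors are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("fingerprint vector has zero length")
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def group_duplicates(fingerprints, threshold):
    """Group indices of fingerprints linked by distance <= threshold.

    Assumption: single-linkage grouping via union-find. This is one
    of many possible clustering choices, used here for illustration.
    """
    parent = list(range(len(fingerprints)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(fingerprints)), 2):
        if cosine_distance(fingerprints[i], fingerprints[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smallest index as the root of the merged group.
                parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(fingerprints)):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]


# Hypothetical fingerprints: structures 0 and 1 differ only by scale,
# structures 2 and 3 differ by a small perturbation.
sample = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.001]]
unique_groups = group_duplicates(sample, threshold=0.01)
```

Keeping one representative per group (for example the first index) yields the list of unique structures. The threshold controls how aggressively near-identical structures are merged, and in practice it would have to be tuned on real data.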

The classifier has already accelerated the analysis of USPEX output by at least one order of magnitude and has contributed to new crystallographic insights and discoveries. Furthermore, the visual display of key algorithm indicators has led to diverse, unexpected discoveries that will improve the USPEX algorithms.

Unexpected discoveries

Looking at the data we found unexpected correlations and patterns that deserve further investigation. So the research continues.

This wiki is intended to collect public data and results of this research, and to be a repository of ideas and current research directions (this part is restricted to project participants at the moment).

Resources

You can get an idea about the project in the documentation area or access the code in the library area.

Here, the fingerprinting method section collects general information about the project, its philosophy and related publications. The library contains the source code and API documentation of the CrystalFp library. And, finally, the current research section acts as a (restricted) repository of ideas, things to do and so on.

On this wiki there are also pages about the tools we are using to analyze the data, with a particular emphasis on HPC methods that leverage CSCS computational resources and speed up the analysis work.

Last, but not least, CrystalFp inside STM4 has its own Getting started guide page.

Enjoy! (and be patient if something is still missing...)

Contacts

Contact us if you want more information on the project or want to collaborate and contribute new ideas.

  • Ing. Mario Valle — Swiss National Supercomputing Centre (CSCS) — Switzerland
  • Prof. Artem R. Oganov — Dept. of Geosciences and New York Center for Computational Science, State University of New York at Stony Brook.