How to analyze thousands of complex crystal structures in a minute

Learn how to use STM4 to analyze USPEX outputs

	Welcome to the STM4 for USPEX tutorial! This tutorial has been run in Poitiers (France) at the DN-NSM 2011 workshop on June 27, 2011, and in Xi'an (China) during the USPEX workshop on August 4, 2011.
	This tutorial will cover: the parts of STM4; a few important ideas from AVS/Express, the core of STM4; a tour inside STM4, looking inside the modules provided; the CrystalFp functionalities inside STM4 and how to use them to analyze USPEX outputs; the batch CrystalFp implementation. You will try the various topics yourself on STM4 installed on the tutorial machines.
	Now we look at the STM4 main sections.
	In 4 lines: STM4 is a framework for the development of unusual and enhanced techniques for chemistry visualization. This means that it is not a point-and-click chemistry visualization tool, but a much more flexible environment to build chemistry visualization applications. Note also that STM4's growth is driven by user requests, so order and organization are not its “forte”.
	Your turn. Try to launch STM4.
	Looking at the STM4 windows, the following areas are visible: the Library; the Build area; the Viewer; the area containing the control panels of the various modules; a few commands that operate on the viewer itself.
	The STM4 library is one library inside the original AVS/Express visualization libraries (more in the next part).
	Look at the various STM4 sub-libraries. More details later.
	There are prebuilt applications that can be used as generic visualizers. There is MolDisplayApp for generic chemistry structure files, CrystalDisplayApp that adds symmetry computation and unit cell replication, and VolumeDisplayApp to display structure and volumetric data like electronic density. Try to instantiate MolDisplayApp and look at one of the data files provided.
	Anatomy of an application. The example is the MolDisplayApp pre-built application distributed inside STM4.
	Another, more complex, application reads a structure plus the corresponding volumetric data (like electronic density). It replicates the unit cell (both structure and volumetric data) and computes an isosurface on the volumetric data. The application also displays a legend of the atom sizes and colors (even if the atoms are almost all hidden inside the isosurface). By the way, the example is the VolumeDisplayApp prebuilt application to which the Color Legend module has been added. You see, it is really simple to customize the provided applications.
	The right mouse button on a module reveals functionalities like Help (see next slide) and Info (the name and data type of the module's input and output ports). The connector colors reveal the data type. The most colorful of all is the STM4 Molecule data type.
	You can have help for each module or, from the STM4 web pages, access the list of all modules.
	A few STM4 resources:
	- STM4 main pages
	- The STM4 modules documentation
	- STM4 download
	- The original STM3 paper (please cite it if you use STM4): M. Valle, STM3: a chemistry visualization platform, Zeitschrift für Kristallographie, vol. 220, no. 5-6, pp. 585-588, 2005. (Note that STM3 was the first version of what is now STM4.)
	- A user level AVS/Express course
	- A (marketing) brochure on AVS/Express
	Now we look at the AVS/Express core ideas that are inherited by STM4.
	STM4 is built on top of AVS/Express.
	AVS/Express is a commercial visualization environment. I know, a commercial tool can be a problem, but it is fast to develop an application using it, and I have 8 years of experience with it…
	Here are some buzz-phrases related to AVS/Express. Besides the marketing flavor, they accurately describe the tool's main characteristics:
	- AVS/Express is a multiplatform development environment. It is not an end-user, point-and-click visualization tool. That means more power, but also more complexity.
	- 3D visualization. Charting and 2D are supported, but not at the same level.
	- Object oriented. What you see is really a bunch of classes. You derive, instantiate, call methods and so on.
	- The programming is mainly done visually, combining and connecting modules in a LEGO-like fashion.
	To know more about AVS/Express and the projects that use it, take a look at the AVS/Express links list above.
	When you start AVS/Express, a dialog lets you choose the type of application you want to start with, or lets you open a previously saved application. For now choose the default (Single-window DataViewer and 3D). The two windows that open are the Network Editor, with the module library and the working area, and the Data Viewer, whose parts are explained in the next slide.
	The Data Viewer is made of four parts:
	- The Viewer itself.
	- A Toolbar to access commonly used functions, like setting the mouse interaction behavior or the “magic” RNC (Reset/Normalize/Center) button that brings the scene content back inside the viewer limits.
	- An area in which the module user interface panels appear. The various Editors open in the same area.
	- A status panel with the interrupt button (bottom left).
	The example image is the result of the LEA (Lagrangian-Eulerian-Advection) module for the visualization of time-dependent 2D vector fields. This module is available in the International AVS Centre (IAC) repository. The Editors menu is the entry point for various viewer-related control functions. For example, here you can change the background color of the viewer or its dimensions.
	When you start AVS/Express, the default library appears (called Start). It contains a few demos and examples and a sub-library explaining “What’s New” in the current version of AVS/Express.
	Besides the Start library, the most important libraries are:
	- Main: contains all the important modules
	- Standard Objects: collects the low-level objects (like int, float, string)
	- Examples: very useful examples of all the visualization techniques offered
	- User Interface: widgets to build user interfaces
	- Annotation Graphing: 2D charting
	- Library Workspaces: parking place for user-developed modules
	The Main library contains more than 500 modules. Each module implements a different visualization technique. These techniques are distributed into two sublibraries: Filters (that alter only the data, not the underlying geometry) and Mappers (that change data and geometry).
	The data-flow networks are built in the working area called the Network Editor (NE):
	- Drag the needed modules from the libraries to the working area.
	- Connect modules by “drawing” the connection between them. Drawing the connection again disconnects the modules.
	- To access the interior structure of a module, double click on it.
	- The contextual menu has various functions. On modules, “Help” and “Info” are useful. On the NE: “Arrange Icons”. On a connection: “Insert Link” helps you “see” what passes on the connection.
	- On the NE: Shift-M2 (on PC Shift-ButtonLeft) pans; Ctrl-M2 (on PC Ctrl-ButtonLeft) zooms.
	Now try an example (to widen your horizons toward scientific visualization…)
	The most interesting feature of AVS/Express is the manner in which execution of the various modules is managed. This architecture is called: data-flow. In this architecture you do not manage execution. You simply describe the data passing architecture between processing modules.
	At the beginning everything is quiescent.
	Then you enter a filename for the Read Field module.
	The Read Field module starts executing and outputs valid data on its output port. Then the surf plot module has all its input ports connected to valid data, so it starts execution.
	Surf plot then produces valid data on its output port. The texture mesh module, which has two input ports, instead remains idle because only one of its input ports has valid data.
	Then you enter a valid filename on the Read Image control panel.
	Read Image executes and produces data on its output port. Then the texture mesh module senses valid data on all its input ports, so it starts executing.
	The texture mesh red output port carries valid data when the module finishes execution, so the viewer module can start execution and shows the resulting visualization on screen.
	And here is the final result. Funny, isn't it? The main goal of the data-flow architecture is to simplify the programmer's work, letting him (or, better, her) concentrate on the data manipulation tasks and not on the control infrastructure.
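	The firing rule walked through in the last slides can be imitated in a few lines of Python. This is only a toy sketch of the data-flow idea, not how AVS/Express is implemented: the `Module` class, its methods and the two file names are all invented for the illustration. A module runs as soon as every one of its input ports holds valid data, and pushes its result downstream.

```python
class Module:
    """Toy data-flow module: it fires once every input port holds valid data."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.inputs = [None] * n_inputs  # None means "no valid data yet"
        self.listeners = []              # (downstream module, input port index)

    def connect(self, downstream, port):
        """Connect this module's output port to an input port downstream."""
        self.listeners.append((downstream, port))

    def receive(self, port, value, log):
        """Store valid data on one input port; fire if all ports are valid."""
        self.inputs[port] = value
        if all(v is not None for v in self.inputs):
            self.fire(log)

    def fire(self, log):
        """Execute the module and propagate the result to connected modules."""
        log.append(self.name)
        result = self.func(*self.inputs)
        for downstream, port in self.listeners:
            downstream.receive(port, result, log)


# The network from the slides (module bodies are placeholders).
read_field = Module("Read Field", lambda fname: f"field({fname})", 1)
read_image = Module("Read Image", lambda fname: f"image({fname})", 1)
surf_plot = Module("surf plot", lambda f: f"surface({f})", 1)
texture_mesh = Module("texture mesh", lambda s, i: f"textured({s}, {i})", 2)
viewer = Module("viewer", lambda mesh: mesh, 1)

read_field.connect(surf_plot, 0)
surf_plot.connect(texture_mesh, 0)
read_image.connect(texture_mesh, 1)
texture_mesh.connect(viewer, 0)

log = []                                       # records the execution order
read_field.receive(0, "terrain.fld", log)      # hypothetical file name
after_field = list(log)                        # texture mesh is still idle here
read_image.receive(0, "photo.x", log)          # hypothetical file name
print(log)
```

	Note that nothing in the script says "now run surf plot": the order of execution falls out of the connections, which is the whole point of the architecture.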
	The Object Manager manages the module execution order, the data movement between modules, etc. It is the heart of AVS/Express. The Object Manager has three interfaces:
	- The graphical one, the Network Editor
	- An API for C, C++ and Fortran
	- The V language
	An example of the V language is in the file created when you save your application. Today we do not cover the V language, but remember that it is the V language that makes AVS/Express so powerful. V language statements and Object Manager commands can be entered directly at the VCP prompt, the one appearing in the window where AVS/Express has been launched, which looks like: `OM(Root) ->`
	AVS/Express is not only visualization, it is a programming environment. So a set of modules to build user interfaces is also available.
	You can mix standard AVS/Express modules with STM4 modules. Imagine the possibilities…
	You can simply add AVS/Express modules to an STM4 application. For example, here I added the orthoslice, gradient and glyph modules to VolumeDisplayApp to show the gradient of the scalar value on a volume slice.
	After you create your marvelous application, save it using the File menu. The file is a text file and should have the extension `.v`.
	There are two AVS/Express editions. The difference is mostly the price. STM4 works with both editions.
	A few AVS/Express resources:
	- AVS homepage (here are the AVS Offices' addresses and a form to request a temporary license)
	- Official documentation
	- AVS forum
	- AVS/Express built-in examples
	- Visualization techniques book (in the AVS/Express manuals)
	- International AVS Center (IAC)
	- IAC training material
	- Patches, documentation and examples
	- Other resources
	- A user level AVS/Express course
	A more in-depth view of the STM4 modules.
	The Experimental sub-library will be covered later (it contains the modules useful to analyze USPEX outputs). Its content is revealed by double clicking.
	A few pre-built applications solve common chemistry visualization tasks.
	Start with the generic viewer MolDisplayApp.
	Delete it and instantiate the crystal structure viewer CrystalDisplayApp (compared to the previous one it adds symmetries).
	Then add tetrahedra using the Draw Simple Polyhedra module.
	Add some ornaments: Logo Background Fade Text Title Color Legend to identify atom types
	Measure distances and angles using the pre-built application PickDisplayApp or by adding the low level Measure Structure module to your application.
	Readers and writers for various static and dynamic chemical file formats, plus movie and image production.
	STM4 supports various formats. These are the ones someone told me to support. So more could be added.
	More than one structure can be visualized, or the same structure can be rendered with two different rendering modes.
	Various modules output graphical objects that help in understanding the data.
	The display structure module has controls to render atoms and bonds, colors and color schemes.
	Bond-related modules.
	Add bonds for each atom individually plus coordination tetrahedra using the Compute Selected Bonds module.
	There are modules that compute useful derived structures.
	Interactivity. Measures and more, like picking a unit cell, which redefines the unit cell of the displayed structure by picking four atoms (pre-built application: PickUnitCellApp).
	Measure a structure by picking one or more (groups of) atoms using the pre-built application PickDisplayApp.
	Measure atoms closer to a given atom (inside a given radius) using the Measure Closer Atoms module. Note that this is the idea behind the fingerprinting method.
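	The "atoms inside a given radius" idea is easy to write down. The sketch below is not the Measure Closer Atoms module and not the CrystalFp code: the function name is invented, the coordinates are made up, and periodic images of the unit cell are ignored for brevity (a real crystal needs them).

```python
import math


def closer_atoms(positions, center_index, radius):
    """Return (atom index, distance) pairs for the atoms lying within
    `radius` of the atom `center_index`, nearest first.

    Assumption: Cartesian coordinates, no periodic boundary conditions.
    """
    center = positions[center_index]
    found = []
    for i, pos in enumerate(positions):
        if i == center_index:
            continue  # the central atom is not its own neighbor
        d = math.dist(center, pos)
        if d <= radius:
            found.append((i, d))
    return sorted(found, key=lambda pair: pair[1])


# Made-up coordinates, in arbitrary length units.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 3.0)]
neighbors = closer_atoms(positions, center_index=0, radius=2.5)
print(neighbors)
```

	A fingerprint then summarizes, for every atom, how these distances are distributed, which is why two structures with the same local environments end up close to each other.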
	You can select a subset of the atoms by atom characteristics (Select Atoms) or by atom's associated data (Threshold Data). To select atoms inside a given geometrical shape see the Crop Structure module.
	Try data thresholding to select atoms. Here the unselected atoms are rendered as gray lines only.
	Trajectory support. Most useful is the Accumulate Traces module. Results are visualized as clouds or as lines. The pre-built TracesDisplayApp application is also available.
	Here is an example.
	There are data formats that carry not only atomic positions but also volumetric data.
	With the VolumeDisplayApp pre-built application an isosurface on Gaussian cube data can be visualized.
	Volume rendering needs parameter tuning to achieve the best results.
	At last, crystallography support!
	Crop structure using a sphere (to reveal internal structure for example).
	The Find Symmetries module calls the KPLOT executable and packs the results for visualization inside STM4.
	Fermi surfaces example (it reads EIGENVAL files) visualized inside the FermiSurfacesApp pre-built application. Another application (FermiBandsApp) displays Fermi bands.
	Find enthalpy transitions (needs an enthalpy file) using the Enthalpy Transitions module.
	To support the convex-hull method there is the MultiComponentApp prebuilt application. It reads the energy file and the composition file from USPEX, tries to find two structures that act as a basis, then plots every other structure as a function of the combination percentage of the two basis elements and of the energy difference.
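	To make "combination percentage" and "energy difference" concrete, here is the usual tie-line arithmetic for a two-component system. It is a sketch under assumptions, not the MultiComponentApp code: energies are taken per atom, the composition is the fraction of the second basis structure, and all the numbers are invented.

```python
import math


def energy_difference(fraction_b, energy, e_basis_a, e_basis_b):
    """Energy of a structure relative to the straight line joining the two
    basis structures at the same composition.

    fraction_b: 0.0 is pure basis A, 1.0 is pure basis B.
    All energies are assumed to be per atom. A negative result means the
    structure lies below the line joining the two basis structures.
    """
    reference = (1.0 - fraction_b) * e_basis_a + fraction_b * e_basis_b
    return energy - reference


# Invented numbers: basis A at -4.0 and basis B at -2.0 (energy per atom).
e_a, e_b = -4.0, -2.0
structures = [(0.50, -3.5), (0.25, -3.4)]   # (fraction of B, energy per atom)
points = [(x, energy_difference(x, e, e_a, e_b)) for x, e in structures]
print(points)
```

	Each pair in `points` is one dot of the plot: composition on the horizontal axis, energy difference on the vertical one.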
	Now the modules that support USPEX results analysis. The code behind them is called CrystalFp and you can download and use it independently from STM4 (I will cover this, if there is enough time, at the end of the tutorial).
	There are three prebuilt applications:
	- FpCompareApp: if you don’t have energies
	- FpEnergyCompareApp: the most commonly used application
	- FpEnergyLandscapeApp: same as FpEnergyCompareApp, but it can also visualize energy landscapes
	The other sub-library is still not finished but will contain the new version of these applications. Unfortunately none of the CrystalFp modules has help (yet).
	Load FpEnergyLandscapeApp. A lot of windows open. Go to the Read Structure control panel and load the structure file (don’t start the Run yet).
	Go to the Read Energy control panel and load the energy file (select whether the energy is per atom or per structure). The file format is one floating point ASCII number per line, in the same order as the POSCAR structures.
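	If you need to prepare or check such an energy file outside STM4, the format is simple enough to parse by hand. The sketch below is mine, not part of STM4: the function names are invented, skipping blank lines is an assumption, and the sample values and atom counts are made up.

```python
def parse_energies(lines):
    """Parse one floating point number per line, keeping the file order
    (which must match the order of the POSCAR structures)."""
    energies = []
    for line in lines:
        text = line.strip()
        if text:  # assumption: blank lines are ignored
            energies.append(float(text))
    return energies


def per_atom(energies, atom_counts):
    """Convert per-structure energies to per-atom energies."""
    if len(energies) != len(atom_counts):
        raise ValueError("one atom count is needed for each energy")
    return [e / n for e, n in zip(energies, atom_counts)]


# Made-up content; with a real file use: parse_energies(open(path))
sample = ["-12.5\n", "-11.75\n", "\n", "-13.0\n"]
energies = parse_energies(sample)
energies_per_atom = per_atom(energies, [4, 4, 8])
print(energies_per_atom)
```

	The length check in `per_atom` is worth keeping: an energy list that is one entry short silently shifts every energy onto the wrong structure.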
	To initialize:
	- Go to the Structure Similarity control panel and push Reset.
	- If needed, check the “Clusters” toggle (if the structures are nanoclusters and not crystals).
	- Return to the Read Structure panel and push Reset, then push Run to load all structures.
	- If you need to append more than one run: uncheck Enable load, check Append, enter the run identifier (normally it increases by 10000 each run), read the structure and energy files, then check Enable load and push Run on the Read Structure module.
	If you ask why these complications are needed, the answer is “Data Flow Architecture”. The modules start as soon as anything is available, instead of waiting until all the data is ready.
	When the structures have finished loading, go there and push “Compute Fingerprints”, then “Compute Distances”.
	Adjust clustering threshold if needed.
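	To get a feeling for what the threshold does, here is a generic way of turning a distance matrix into groups: two structures end up in the same group when a chain of pairs closer than the threshold links them (single linkage). This is assumed here purely for illustration and is not taken from the CrystalFp code; the function name and the distances are invented.

```python
def group_by_threshold(dist, threshold):
    """Group indices whose pairwise distance is below `threshold`,
    following chains of close pairs (single-linkage grouping).

    dist: square symmetric matrix given as a list of lists.
    Returns a sorted list of groups, each a sorted list of indices.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest: each index starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented distances between four structures.
dist = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.05],
    [0.80, 0.90, 0.05, 0.00],
]
print(group_by_threshold(dist, 0.2))
```

	With a tiny threshold every structure stays alone, with a large one everything merges into a single group; the useful values sit in between.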
	Now you can analyze results. For the charts there is a cursor to read values and a Gaussian smoothing line.
	Go to the scatterplot. Adjust the timestep (reduce it if you have many points). Push Go: it takes the best of “Num. retries” runs. Change colors.
	Show the mapping efficiency chart (the more points near the diagonal, the better).
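	Why the diagonal? A chart of this kind compares, for every pair of structures, the original fingerprint distance with the distance between the two corresponding points in the 2D scatterplot. If the 2D layout preserved all distances, every pair would sit exactly on the diagonal. The sketch below illustrates that reading; the function names, the summary number and the data are my assumptions, not what STM4 computes internally.

```python
import math


def mapping_pairs(dist, points_2d):
    """For every pair (i, j), return (original distance, distance in 2D)."""
    pairs = []
    n = len(points_2d)
    for i in range(n):
        for j in range(i + 1, n):
            mapped = math.dist(points_2d[i], points_2d[j])
            pairs.append((dist[i][j], mapped))
    return pairs


def mean_deviation(pairs):
    """Average vertical distance of the pairs from the diagonal y = x."""
    return sum(abs(original - mapped) for original, mapped in pairs) / len(pairs)


# Invented original distances between three structures.
dist = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
faithful = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]    # reproduces all distances
distorted = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]   # squeezes one structure

print(mean_deviation(mapping_pairs(dist, faithful)))
print(mean_deviation(mapping_pairs(dist, distorted)))
```

	The first layout gives a deviation of zero (all points on the diagonal), the second a clearly positive one.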
	Display energy in the scatterplot then go to landscape and check “Enable”.
	If you need to align structures, you can select them with the right mouse button on the scatterplot, or by selecting them from the lists.
	Your turn. Try it on your data.
	Or better, in Andriy O. Lyakhov's tutorial you will analyze real data freshly computed by USPEX.
	If we have time (and you are interested in it) I want to present the batch interface to CrystalFp. The code is the same you have seen, but it is accessed from a command line application.
	There are a few differences in the scripts between Windows and Linux/Mac.
	If you run cfp without parameters you see its command line switches.
	This is the usual call to generate data. The output files are:
	- `fp.fld + fp.dat`: the fingerprints
	- `dist.fld + dist.dat`: the distance matrix
	- `summary.dat`: a few (machine readable) summary data
	- `map.dat`: maps results to the original structures (needed if structures are removed due to similarity)
	- `sorted.dat`: the distances sorted in increasing order
	- `analysis.csv`: a comma separated list of all the analysis values for all structures
	If you use the statistical language R you can read and visualize the results. Obviously the two `.fld` files can be visualized also with AVS/Express using the Read Field module.
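	If you prefer Python to R, a comma separated file such as `analysis.csv` can be read with the standard library alone. The sketch makes no claim about the columns cfp writes: it converts whatever is numeric to float and leaves the rest as text. The function names and the sample content (including the column names) are invented.

```python
import csv
import io


def to_number(text):
    """Return the field as a float when possible, else as stripped text."""
    try:
        return float(text)
    except ValueError:
        return text.strip()


def read_csv_rows(handle):
    """Read comma separated rows from an open text file, skipping empty rows."""
    rows = []
    for record in csv.reader(handle):
        if record:
            rows.append([to_number(field) for field in record])
    return rows


# Invented sample standing in for a real file; with a real file use:
#   with open("analysis.csv", newline="") as handle: rows = read_csv_rows(handle)
sample = io.StringIO("id,energy\n1,-3.5\n2,-3.25\n")
rows = read_csv_rows(sample)
print(rows)
```

	From here the rows can be passed to whatever plotting tool you like.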
	The cfp driver program also produces scatterplots that can be visualized with R.
	Thanks for your attention. And don’t hesitate to contact me if you have any doubt.