Data representation in chemistry
- Chemists usually deal with a narrow range of data types. On this page I try to understand them.
- Chemistry visualization uses a limited, traditional set of representations for those data.
- I am convinced that more complex data types and unusual representations could help bring new insights to chemistry research.
- Otherwise the risk is this: “A stagnant set of representations limits the way scientists think about their models and thereby limits potential insights”.
So I collected this material as one of the inputs for ongoing research on representation in chemistry and on the support or constraints provided by current chemistry visualization tools.
My hope is that this work could help me in creating and proposing new visualization tools and techniques. And, maybe, catalyze new ideas for the chemists I collaborate with.
What you find here
Here you find a collection of images from various chemistry visualization tools together with open questions about their usefulness and their specific goals.
I am not a chemist, but I work with chemists. I eagerly try to learn new things in this field, so any mistakes and misunderstandings are mine alone.
Chemistry visualization deals with a narrow range of data types. The most important are:
- Structural information
- E.g. atoms and bonds plus associated attributes like charge. The structures could be time dependent to show reactions or evolution of phenomena.
- Volumetric data
- Usually scalar variables like electron density, ELF, orbitals. A less usual, but similar data type is volumetric vector data found, for example, in electronic currents or magnetic lines.
- 1D scalar
- E.g. spectra, energy vs. simulation step.
- 2D scalar
- E.g. COSY and NOESY spectra, two-parameter energy landscapes.
After a brief discussion about why a chemist should be interested or not interested in 3D representations, the various images are grouped as:
- Pure structural – various forms of ball-and-stick
- A digression on bond representation
- Structural with associated data
- Time dependent structural
- Summarizing methods
- An important tool is the computation of molecular surfaces
- Volumetric data
- Non-volumetric data
In chemistry 2D structural representations are still king. Here are some relative merits and problems of 2D vs. 3D structural representations.
Ball-and-Stick is the most fundamental and common representation. The atoms are normally colored according to atom type and the bonds simply echo this information.
Atom and bond coloring can be differentiated to carry additional information that is fused with the structural one. For example, bonds could be colored by bond strain or bond dipole strength.
Bond coloring can help or hinder structure perception. The two-color form (middle) increases the number of sharp edges in the image, and thus the distraction, compared to the smooth one (left). On the other hand, this form helps distinguish atoms. The neutral form (right) avoids distractions, leaving the focus on the atoms.
There are various proposals of color schemes for the ball-and-stick representation.
The visualization of bonds can lead to high visual clutter in overview renderings, but enables the user to judge the exact alignment and chemical makeup of chemical structures in close-up views. Remember also that this representation is the most effective for teaching: it is well tuned to the mental model students usually already have.
The original function of physical ball-and-stick models was to support measurements of structural angles and bond lengths, leaving the real structure representation to space filling models (see Francoeur E.: The Forgotten Tool: The Design and Use of Molecular Models, Social Studies of Science, Vol. 27, No. 1 (Feb 1997), 7-40).
- Benefits from differentiating the atoms rendering method from the bonds one
- Benefits from fusing different info using atoms and bonds
- Black&White rendering?
- Equally sized spheres? Using something different from spheres? Semi-transparent atoms or bonds?
This representation, called licorice or Dreiding model, contains fewer distractors than the normal ball-and-stick.
The shaded coloring (left) makes it difficult to find where the atoms are. But it could be an interesting method to show attributes, like charge, that vary smoothly over the whole structure (see below).
This representation is also well suited to fusing a hypothesized molecular structure with the corresponding experimental electronic density maps. An example can be found on one of the various Crystallography & NMR System pages.
Here is an example of smooth mapping, on a licorice model, of an atom attribute that varies smoothly over the whole structure.
The line representation is less performance heavy than the previous one, but makes it really difficult to see the three-dimensional structure.
Atoms rendered as spheres could be added, but the result is awful. All the three-dimensional structural information and depth cues are hardly perceivable.
In this simulation of ice melting the line representation provides the context showing the structure of solid ice. The parts that start melting are rendered as usual ball-and-stick colored by the coordination number.
Data from Davide Donadio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.
The CPK representation, also called space filling, renders the atoms as spheres whose radius is the van der Waals radius of the atom.
The goal of this representation is to give an idea of the external surface of the molecule. But the multitude of borders makes the surface difficult to perceive. The big spheres also lead to a high degree of occlusion, which can make it difficult to understand details of the chemical structure or phenomena like surface adsorption.
Originally the space filling representation was particularly suited for the study of “steric hindrance”, that is the way in which the volume of atoms imposes constraints both within and between molecules. But this original goal of the physical realization of the model is lost on the computer: the spheres can intersect and interpenetrate without any problem.
Another observation can be made on the move of the model from physical to graphical. The computer model preserves the appearance and problems of the physical one, without imitating the real virtue of the physical realization of this representation: the physical “feel” of the surface bumps, constraints and degrees of freedom of the structure (see Francoeur 1997 op. cit.).
Strange coloring of a ball-and-stick model. In my opinion its author tried to replicate a model built with physical balls and physical sticks. Instead on the originating page there is a different explanation: “The rings are the circles on the atoms that highlight the bond connections. Two colors are used to define the rings, an inner and an outer color. The rings help to give a three dimensional appearance to the figure”. I’m still not entirely convinced.
The images (from the Chem-Ray site) show: Zinc Sulfide (left) and Molybdenum Acetate (right).
This is another creative rendering mode for bonds. It has been created with Garlic. I do not understand the usefulness of this rendering mode compared to plain cylinders. But with an appropriate selection of parameters it can simulate the molecule external surface as on the right image.
Here the 3D representation simply reproduces an old 2D rendering trick from the ORTEP program, the Oak Ridge Thermal Ellipsoid Program by Carroll Johnson (1976) (see image at right).
Images from Crystallography Centre, NIU at Galway.
- What else could enhance standard ball-and-stick without being gratuitous decoration?
A nice example of the combination of two different representations (space filling and licorice) to represent different actors in a chemical reaction: the space filling representation is a Cu(100) surface, while the licorice one is a hexylbenzene molecule adsorbed on the copper surface.
Image from the Nano@PolyMTL gallery (used with permission).
Double and triple bonds representation.
Normally bonds, and especially multiple ones, are computed by the visualization program from inter-atomic distances, or set manually by the user, because file formats often do not describe them. For example the PDB format describes only three types of bonds in its CONECT record:
- Normal bonds
- Hydrogen bonds
- Salt bridges
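The distance-based bond guessing mentioned above can be sketched as follows. This is only a minimal illustration, not the method of any particular program: the covalent radii are approximate values, and the function name `guess_bonds` and the 0.4 Å `tolerance` are assumptions made for this example.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstrom (assumed, for illustration only).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def guess_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms closer than the sum of their radii plus a tolerance.

    atoms: list of (element, (x, y, z)) tuples, coordinates in angstrom.
    """
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[el_i] + COVALENT_RADIUS[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds

# Water: O-H distance about 0.96 angstrom, H-H distance about 1.51 angstrom.
water = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.957, 0.000, 0.000)),
    ("H", (-0.240, 0.927, 0.000)),
]
print(guess_bonds(water))  # the two O-H pairs, but not the H-H pair
```

Note that a distance test of this kind yields connectivity only; it says nothing about bond order, which is why multiple bonds often have to be set by hand.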
Some tools interpret CONECT records that define a bond more than once as specifying the bond order of that bond, i.e. a bond specified twice is a double bond and a bond specified three (or more) times is a triple bond. Anyway this is not a standard PDB feature.
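As a sketch of the non-standard convention just described, the following code counts how many times each atom pair is listed in CONECT records and reads the count as a bond order. It assumes the usual fixed-column layout (the source atom serial followed by up to four bonded serials, each five characters wide); `conect_record` and `conect_bond_orders` are hypothetical names introduced for this example.

```python
from collections import Counter

def conect_record(source, *bonded):
    """Hypothetical helper: build a CONECT line with 5-character serial fields."""
    return "CONECT" + "".join(f"{serial:5d}" for serial in (source, *bonded))

def conect_bond_orders(pdb_lines):
    """Read repeated CONECT entries as bond orders (capped at 3)."""
    listed = Counter()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        source = int(line[6:11])
        # Bonded atom serials sit in four 5-character columns after the source atom.
        for start in range(11, 31, 5):
            field = line[start:start + 5].strip()
            if field:
                listed[(source, int(field))] += 1
    orders = {}
    for (a, b), count in listed.items():
        key = (min(a, b), max(a, b))
        # A bond is usually listed from both of its atoms; keep the larger count.
        orders[key] = min(3, max(orders.get(key, 0), count))
    return orders

lines = [
    conect_record(1, 2, 2, 3),  # atom 1 lists atom 2 twice and atom 3 once
    conect_record(2, 1, 1),
    conect_record(3, 1),
]
print(conect_bond_orders(lines))
```

Here the pair (1, 2) is read as a double bond and the pair (1, 3) as a single bond.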
- When is this added complexity useful?
- Is there any format that describes double or triple bonds? Yes: Tripos MOL2
Another example of unusual bonding that is difficult to render with standard visualization tools: these are ruthenium compounds from Dirk Deubel – ETH Zürich.
Other examples of strangely bonded molecules can be found on the Molecules with Silly or Unusual Names page, for example a molecule that resembles a 3-legged piano stool.
Another example of aromatic bonds representation. This image has been created with BALLview on an ovalene structure.
A scalar value, like electrical charge, can be associated with atoms. The values are mapped to colors using a colormap (center and right).
Besides the usual rainbow colormap (center), another useful one is the blue-white-red colormap (right), which highlights the sign of the scalar value and the zones where this value is near zero.
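A blue-white-red colormap of this kind can be sketched as a simple piecewise linear mapping. The function name `blue_white_red` and the symmetric `limit` parameter are assumptions made for this illustration; real tools may use different color ramps.

```python
def blue_white_red(value, limit):
    """Map a signed scalar to an (r, g, b) tuple with components in [0, 1].

    -limit maps to pure blue, 0 to white and +limit to pure red;
    values outside [-limit, limit] are clamped.
    """
    t = max(-1.0, min(1.0, value / limit))
    if t < 0:
        # Negative values: fade from white toward blue.
        return (1.0 + t, 1.0 + t, 1.0)
    # Zero or positive values: fade from white toward red.
    return (1.0, 1.0 - t, 1.0 - t)

# Hypothetical partial charges, with the color scale saturating at +/-1.
for charge in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(charge, blue_white_red(charge, limit=1.0))
```

Using a range symmetric around zero keeps white exactly at zero, so the sign of the value can be read directly from the hue.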
Classical molecular dynamics simulation of argon atoms freezing. Here a scalar value associated with each atom represents its status. Atoms approaching the freezing threshold for this parameter turn less and less transparent. Frozen atoms are rendered as yellow spheres.
The clouds provide context for the frozen atoms visualization and highlight the transition between fluid and frozen.
Federica Trudu – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.
Another example of a quantity associated with atoms is a vector, like the magnetic spin, here represented by arrows.
Image created using FpStudio.
From E. A. Wiley, G. Deslongchamps: PostDock: A novel visualization tool for the analysis of molecular docking, Computing and Visualization in Science (online first). This is a nice representation of a set of docking candidates.
The representation combines color and transparency to show docking energy and pose RMSD. Here a less transparent ligand means lower docking energy and yellow means lower docking error. This way the less promising docking solutions are darker and more transparent, and so less visually obstructing.
The image legend says: "PostDock analysis of reverse-docking results with (a) naphthyl-linked receptor on left, and (b) fluorenyl-linked receptor on right. 9-EtA is shown in green ball-and-stick".
A trajectory can be represented by a sequence of static structures, one for each step, like this animated image from Crystallography Centre, NIU at Galway.
This representation, based on a set of disjointed structures, could evolve along three distinct paths:
- Compute global quantities that summarize or simplify the whole trajectory (see below)
- Use the atom positions as a set of timeseries and use standard timeseries analysis tools
- Consider the trajectory as a move in a multidimensional space and use multidimensional analysis tools (e.g. 3 atoms that evolve for 10 steps can be represented as a point that moves in a 9D space or a fixed point in a 90D space)
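The dimension counting in the last item can be made concrete with a small sketch. The trajectory below is made-up data and the function names are hypothetical; the point is only how 3 atoms and 10 steps become either 10 points in 9D or one point in 90D.

```python
def step_to_point(positions):
    """Flatten one step, a list of (x, y, z) per atom, into a single 3N-dimensional point."""
    return [coord for atom in positions for coord in atom]

def trajectory_to_point(trajectory):
    """Flatten a whole trajectory into one fixed point of dimension steps * 3N."""
    return [coord for step in trajectory for coord in step_to_point(step)]

# Hypothetical data: 3 atoms followed for 10 steps, each drifting along x.
trajectory = [
    [(atom + 0.1 * step, 0.0, 0.0) for atom in range(3)]
    for step in range(10)
]

moving_point = [step_to_point(step) for step in trajectory]  # 10 points in 9D
fixed_point = trajectory_to_point(trajectory)                # 1 point in 90D
print(len(moving_point), len(moving_point[0]), len(fixed_point))  # prints: 10 9 90
```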
A summary representation of a whole trajectory could associate with each atom an ellipsoid that displays the volume in which the atom moves during the entire trajectory.
Image on the right from Amira.
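One possible way to obtain such an ellipsoid, sketched here under my own assumptions and not necessarily what the tool in the image does, is to compute the covariance matrix of the positions visited by an atom. The principal axes of the ellipsoid are the eigenvectors of this matrix; to stay within the standard library, the sketch below stops at a simplified axis-aligned ellipsoid that uses only the diagonal terms. All function names and the `scale` factor are hypothetical.

```python
import math

def displacement_covariance(positions):
    """Return the mean position and the 3x3 covariance matrix of one atom's positions."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    cov = [
        [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in positions) / n
         for b in range(3)]
        for a in range(3)
    ]
    return mean, cov

def axis_aligned_semi_axes(cov, scale=1.0):
    """Simplification: ignore off-diagonal terms, use scale * standard deviation per axis."""
    return [scale * math.sqrt(cov[k][k]) for k in range(3)]

# Hypothetical positions of a single atom along a short trajectory.
positions = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -2.0, 0.0), (0.0, 2.0, 0.0)]
mean, cov = displacement_covariance(positions)
print(mean, axis_aligned_semi_axes(cov))
```

The ellipsoid is centered on the mean position; the `scale` factor controls how much of the motion it is meant to enclose.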
Here all the structures from a trajectory are combined together to summarize the whole dynamic data set (left). Older positions are rendered as fading structures.
This is an enhancement compared to motion representation as an overlapped series of snapshots only (right). Here colors only distinguish the timestep value.
Image on the left from Amira. The related publication is: Johannes Schmidt-Ehrenberg, Daniel Baum, Hans-Christian Hege: Visualizing Dynamic Molecular Conformations. Proceedings of IEEE Visualization 2002, p. 235–242, 2002.
Image on the right created instead with STM4 on alanine data.
The movement of an atom is traced by a line that could be color-coded by age of the corresponding positions (left) or perceptually enhanced to reveal its three-dimensional structure (right). The following movie is an example.
The problem with this representation is that it is too detailed and contains a lot of unnecessary information, like the harmonic oscillation of the atom around a long-term path.
Simulation of a topaz crystal by Sergey Churakov – PSI.
Another method to show spatial occupancy, like the ellipsoid method, uses an accumulator grid. Here the positions of atoms in a trajectory are sampled by a regular grid of cells. Each cell accumulates the count of times an atom passes through it. The result is then visualized with a volume rendering technique.
Here the resulting shapes are more realistic compared to the ellipsoidal crude approximation.
Simulation of a topaz crystal by Sergey Churakov – PSI.
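The accumulator grid idea can be sketched in a few lines. This is a minimal illustration with assumed names (`accumulate_occupancy`, `origin`, `cell_size`): it counts the sampled positions that fall in each cell, which approximates the number of times an atom passes through it, and leaves the volume rendering step to a visualization tool.

```python
import math
from collections import Counter

def accumulate_occupancy(trajectory, origin, cell_size):
    """Count, for every grid cell, how many sampled atom positions fall inside it.

    trajectory: list of steps, each a list of (x, y, z) atom positions.
    Returns a Counter keyed by integer cell indices (i, j, k).
    """
    grid = Counter()
    for step in trajectory:
        for position in step:
            cell = tuple(
                math.floor((position[k] - origin[k]) / cell_size) for k in range(3)
            )
            grid[cell] += 1
    return grid

# Hypothetical trajectory of a single atom, three steps long.
trajectory = [
    [(0.1, 0.1, 0.1)],
    [(0.4, 0.2, 0.1)],
    [(1.2, 0.1, 0.1)],
]
grid = accumulate_occupancy(trajectory, origin=(0.0, 0.0, 0.0), cell_size=0.5)
print(dict(grid))
```

A sparse counter is used instead of a dense 3D array because most cells of the grid are never visited.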
A set of structures could derive not only from a trajectory, but also from a set of conformers of a given structure. They can be combined and superimposed to highlight the differences between them or visualized using any of the previous techniques.
Here are the first 5 lowest energy conformers of the Thyroliberin TSH/PRL Releasing Hormone from the Ramon Soriano page.
This image is taken from an ab initio molecular dynamics simulation of the self-interstitial defect in silicon. This research is done at the Integrated Systems Laboratory of ETH – Zürich by Beat Sahli.
This visualization shows two interesting representations used together: a static reference lattice (gray) and the atoms of the trajectory colored by distance from the reference position. The technique is better viewed in the accompanying movie.
The usual problem with structural visualization is that there is “too much” visual information and all that information is perceived as having the same importance.
Instead the attention should be directed to the really important parts of the structure. In the example the important things are the water molecules inside the nanotube, not the rest of them or even the nanotube itself.
Data from P. Koumoutsakos - Institute of Computational Science, ETH Zürich.
- How to identify, extract and isolate important structures?
This is a first example of simplified representations for visual complexity reduction.
Only the backbone of the protein is visualized here with alpha helices represented by cylinders. The color represents the corresponding residue (left).
On the right this kind of representation shows alpha-helices and beta-sheets, colored by chain.
The ribbon rendering is similar, it shows the protein backbone, but it uses additional information (the O of the protein backbone or some of the phosphate oxygen for nucleic acids) to find a normal for drawing the oriented ribbon.
- When is this representation useful?
Here is the combination of a simplifying backbone-only representation with data about deviation from the mean structure.
The data is from an NMR ensemble of human MIA shown as a spline with variable radius, representing the RMS deviation from the mean structure. Image from a Ruhr-Universität Bochum project.
There is also a page with examples and notes on this sausage view of an NMR ensemble structure.
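The quantity that drives the spline radius can be sketched as a per-atom RMS deviation from the mean position over the ensemble. This is a minimal illustration that assumes the models are already superimposed and list the atoms in the same order; the function name is hypothetical and the actual project may compute the deviation differently.

```python
import math

def per_atom_rms_deviation(ensemble):
    """For each atom, return the RMS distance of its positions from its mean position.

    ensemble: list of models, each a list of (x, y, z) positions in the same atom order.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    rms = []
    for a in range(n_atoms):
        mean = [sum(model[a][k] for model in ensemble) / n_models for k in range(3)]
        mean_sq = sum(math.dist(model[a], mean) ** 2 for model in ensemble) / n_models
        rms.append(math.sqrt(mean_sq))
    return rms

# Hypothetical ensemble: two models, two backbone atoms.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(per_atom_rms_deviation(ensemble))  # first atom does not move, second deviates by 1
```

Each value can then be used, suitably scaled, as the local radius of the spline at the corresponding backbone atom.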
The important rings in this structure are highlighted using red pentagons.
Simulation by Marcella Iannuzzi – ETH Zürich.
- How to extract rings.
- Other global structures highlighting methods: least-squares mean planes, etc.
- Other summarizing methods for groups of atoms or sets of surface vertices: for example they may be represented by ellipsoids that enclose them while having minimal volume.
Polyhedral representations are common in structural chemistry as they serve to elucidate the coordination around (metal) atoms and to highlight important global structural features.
In this example some (left) or all (right) diamond carbon atoms are replaced by tetrahedra.
Another example of polyhedral usage to clarify a zeolite structure. The first two images are very artistic, but hide the orientation of the bonds of Si and Al atoms. Instead the last image reveals their orientation by using blue tetrahedra for Si atoms and red octahedra for Al atoms.
Again a faujasite structure made visible using polyhedra as provided by DRAWxtl.
The Solvent Excluded Surface (SES) simplifies and summarizes a complex molecule.
Here the molecule is the 1KPL from the Protein Data Bank colored by nearest chain. Unfortunately for complex molecules there are still too many details in this representation. Also the resulting pebbly surface reduces the effectiveness of the three-dimensional cues provided by object illumination techniques.
Molecule Solvent Excluded Surface coloring. On the left the charge from the nearest atom is mapped on the surface. On the right, instead, the values come from the electronic density sampled on top of the same surface.
Another important variable usually mapped on top of the molecular surface is hydrophobicity. It is worth noting that this is not a volumetric variable sampled on an embedded surface; it is a variable related to the nearest atoms.
A simple colored wireframe can be used in place of the transparent surface. But it is difficult to perceive the spatial structure of one-dimensional elements.
It is better to use neutral color cages with a low triangle density just to communicate the overall volume occupied by the molecule.
A first representation of volumetric scalar data is obtained using an isosurface computed at some interesting value (for example, a value of 0.002 a.u. is generally used for electronic density). One problem is that it is difficult to find the correct iso value that produces information-rich isosurfaces.
Data from Prof. Artem Oganov – ETH Zürich.
A volumetric scalar value could be sampled by the points of another surface. The sampled values are then mapped and color-coded on top of this surface.
A generic surface is used on the left; instead on the right the sampling surface is the Solvent Excluded Surface of the molecule itself.
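Sampling a volumetric scalar at the vertices of a surface needs an interpolation scheme, since vertices rarely coincide with grid points. A common choice, assumed here and not necessarily the one used for these images, is trilinear interpolation. The function name, `origin` and `spacing` are hypothetical parameters of this sketch.

```python
import math

def sample_trilinear(volume, point, origin=(0.0, 0.0, 0.0), spacing=1.0):
    """Trilinearly interpolate volume[i][j][k] at a point given in world coordinates.

    The point must lie inside the grid; voxel (i, j, k) sits at origin + spacing * (i, j, k).
    """
    # Fractional grid coordinates of the point.
    g = [(point[a] - origin[a]) / spacing for a in range(3)]
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    # Lower corner of the enclosing cell, clamped so the upper corner stays in range.
    base = [min(math.floor(g[a]), dims[a] - 2) for a in range(3)]
    frac = [g[a] - base[a] for a in range(3)]
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = (
                    (frac[0] if di else 1.0 - frac[0])
                    * (frac[1] if dj else 1.0 - frac[1])
                    * (frac[2] if dk else 1.0 - frac[2])
                )
                value += weight * volume[base[0] + di][base[1] + dj][base[2] + dk]
    return value

# Hypothetical 2x2x2 volume holding the linear field f = i + j + k.
volume = [[[float(i + j + k) for k in range(2)] for j in range(2)] for i in range(2)]
surface_vertices = [(0.5, 0.5, 0.5), (0.25, 0.0, 1.0)]
print([sample_trilinear(volume, v) for v in surface_vertices])
```

The sampled values are then passed through a colormap and assigned to the vertices of the sampling surface.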
Visualization of the structure of the π–complex of an ethylene molecule with a Ti catalytic center in the chemical reaction leading to the formation of polyethylene. A finite difference gradient estimator acts as a volume opacity multiplier to render homogeneous regions almost transparently.
The parallel volume rendering of the electronic density around the molecule is performed with the VTK toolkit.
Model and simulation by Mauro Boero, Dept. of Physics, University of Tsukuba and Michele Parrinello, Swiss National Supercomputing Centre.
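The idea of using a finite difference gradient estimate as an opacity multiplier can be sketched as follows. This is a plain-Python illustration and not the VTK code used for the image; the function names and the normalization by an assumed maximum gradient `grad_max` are choices made for this example.

```python
import math

def gradient_magnitude(volume, i, j, k, spacing=1.0):
    """Central finite-difference estimate of |grad f| at an interior voxel of volume[i][j][k]."""
    gx = (volume[i + 1][j][k] - volume[i - 1][j][k]) / (2.0 * spacing)
    gy = (volume[i][j + 1][k] - volume[i][j - 1][k]) / (2.0 * spacing)
    gz = (volume[i][j][k + 1] - volume[i][j][k - 1]) / (2.0 * spacing)
    return math.sqrt(gx * gx + gy * gy + gz * gz)

def modulated_opacity(base_opacity, grad_mag, grad_max):
    """Scale the opacity from the transfer function by the normalized gradient magnitude."""
    if grad_max <= 0.0:
        return 0.0
    return base_opacity * min(1.0, grad_mag / grad_max)

# Hypothetical 3x3x3 volumes: a linear ramp along the first axis and a constant field.
ramp = [[[float(i) for k in range(3)] for j in range(3)] for i in range(3)]
flat = [[[1.0 for k in range(3)] for j in range(3)] for i in range(3)]

for name, vol in (("ramp", ramp), ("flat", flat)):
    g = gradient_magnitude(vol, 1, 1, 1)
    print(name, g, modulated_opacity(0.8, g, grad_max=2.0))
```

The constant field gets zero opacity, which is how homogeneous regions end up almost transparent while regions where the value changes quickly remain visible.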
Nested isosurfaces could help understanding the volumetric distribution of values and are computationally less demanding compared to volume rendering. But the result is difficult to perceive (left). It is better to use no more than three nested surfaces and to make them semi-transparent (right). And, if possible, use iso values that have some special meaning.
An orbital is visualized as an isosurface of the wavefunction for some scalar value (usually 0.07 a.u.) or for two values symmetric around zero. See for example these slides for more information. The two colors mark positive and negative values of the wavefunction.
It is interesting to note that orbitals visualization without showing the corresponding structural model does not make much sense. In other words this is an example of “data fusion”. Data fusion means visualizing together different data to show correlations, causal relationships or to provide interpretative context.
From the cover of ChemPhysChem 8/2002: the visualization of ring currents in the presence of an external magnetic field for a high density supercritical system.
The vector field is represented around the molecules using the illuminated streamline technique.
Electronic ring currents in benzene represented on a slice of the volume around the molecule by vector arrows and LIC textures.
Data simulation by Daniel Sebastiani, Max Planck Institut.
Representation of a tensor value, which characterizes some liquid crystal property, using streamlines and isosurfaces.
From the paper presented at IEEE Visualization 2004: Slavin, Laidlaw, Pelcovits, Zhang, Loriot, Callan-Jones: Visualization of Topological Defects in Liquid Crystals.
One parameter associated to a trajectory. The image shows how the structural modifications are related to the parameter change along the trajectory. The movie shows this evolution clearly.
Data from Davide Donadio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.
Trajectory on a two-parameter landscape. The chart shows how the two parameters change along the trajectory. The movie shows the evolution better.
Data from Matteo Ceccarelli – ETH Zürich.
Here the evolution involves three parameters.
Data from Francesco Gervasio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich. Reference: Gervasio, F. L. et al., Chem. Eur. J. 10, 4846 (2004).
- Color coding by group/function.
- Primitive/conventional cells.
- Techniques to show differences between structures.
- Techniques to show intermolecular interactions (like contacts).
- Techniques to show uncertainty and errors.
There are various galleries of chemistry visualization methods around. Here are the ones I have found:
- The Balloon Molecules site shows how an entirely unrelated tool (party balloons) could be turned into an effective visualization tool. Clever, nice and funny!
- Examples from IBM Data Explorer
- The Scientific and Artistic Uses of Molecular Surfaces
- The Representation of Molecular Models site
- Molecular Representations in VMD
The following galleries provide nice examples of various chemistry visualizations:
- Gallery of Biomolecular Simulations
- Sample figures generated using Raster3D
- Making Matter: the atomic structure of materials collects representations used for crystallography
The Online Macromolecular Museum is not, strictly speaking, a gallery of representation methods, but it is a truly amazing example of chemistry visualization usage for communication and explanation.
Another example is the Virtual Cell Animation Collection. It combines animation and computer graphics to explain and clarify biological processes.