Data representation in chemistry

Chemists usually deal with a narrow range of data types. On this page I try to understand them.
Chemistry visualization uses a limited, traditional set of representations for those data.
I am convinced that more complex data types and unusual representations could bring new insights to chemistry research.
Otherwise the risk is this: “A stagnant set of representations limits the way scientists think about their models and thereby limits potential insights”.

So I collected this material as one of the inputs for ongoing research on representation in chemistry and on the support or constraints provided by current chemistry visualization tools.

My hope is that this work could help me in creating and proposing new visualization tools and techniques. And, maybe, catalyze new ideas for the chemists I collaborate with.

What you find here

Here you find a collection of images from various chemistry visualization tools together with open questions about their usefulness and their specific goals.

At the end there is a list of missing things that could be added, a collection of new and unusual representations found here and there, and a list of representation ideas.

Usual disclaimer

I am not a chemist, but I work with chemists. I eagerly try to learn new things in this field, so any mistakes and misunderstandings are mine alone.

To make this page more useful to you, I need your input, ideas and pointers. Do not be shy, send them to me. Thanks!

Chemistry data types

Chemistry visualization deals with a narrow range of data types. The most important are:

Structural information: E.g. atoms and bonds plus associated attributes like charge. The structures could be time dependent to show reactions or evolution of phenomena.
Volumetric data: Usually scalar variables like electron density, ELF, orbitals. A less usual, but similar data type is volumetric vector data found, for example, in electronic currents or magnetic lines.
1D scalar: E.g. spectra, energy vs. simulation step.
2D scalar: E.g. COSY and NOESY spectra, two-parameter energy landscapes.
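
As a rough illustration, the four data types above could be held in containers like the following. This is only a sketch: every class and field name is hypothetical and not taken from any particular tool, and the water coordinates and charges are illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str                                     # chemical symbol, e.g. "O"
    position: tuple                                  # (x, y, z) in angstrom
    attributes: dict = field(default_factory=dict)   # e.g. {"charge": -0.8}


@dataclass
class Structure:
    atoms: list                # list of Atom
    bonds: list                # list of (i, j, order) atom index tuples


@dataclass
class Trajectory:
    frames: list               # list of Structure, one per time step


@dataclass
class VolumeGrid:
    origin: tuple              # position of the first grid point
    spacing: tuple             # cell size along x, y, z
    shape: tuple               # number of points along x, y, z
    values: list               # flat list of scalars, or of (vx, vy, vz) vectors


@dataclass
class Series1D:
    x: list                    # e.g. frequency or simulation step
    y: list                    # e.g. intensity or energy


@dataclass
class Map2D:
    x: list                    # first axis
    y: list                    # second axis
    z: list                    # z[i][j] is the value at (x[i], y[j])


# Illustrative water molecule with a charge attribute on each atom.
water = Structure(
    atoms=[
        Atom("O", (0.000, 0.000, 0.117), {"charge": -0.8}),
        Atom("H", (0.000, 0.757, -0.469), {"charge": 0.4}),
        Atom("H", (0.000, -0.757, -0.469), {"charge": 0.4}),
    ],
    bonds=[(0, 1, 1), (0, 2, 1)],
)
```

A time dependent structure is then just a Trajectory holding one Structure per step.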

Chemistry data representations

After a brief discussion about why a chemist should be interested or not interested in 3D representations, the various images are grouped as:

Pure structural – various forms of ball-and-stick
- A digression on bonds representation
Structural with associated data
Time dependent structural
Summarizing methods
- An important tool is the computation of molecular surfaces
Volumetric data
- Scalar data
- Vector data
Non volumetric data
- 1D scalar
- 2D scalar

2D or 3D?

In chemistry 2D structural representations are still king. Here are some relative merits and problems of 2D vs. 3D structural representations.

2D

Pros:

Shows the complete structure
Easy recognition of patterns
Chemists know what good structures look like

Cons:

Removes too much information from the real structure
Makes spatial matching of structures impossible

3D

Pros:

All available structural information is present
Helps understand shapes
Shows what would be hidden in a 2D view

Cons:

Limited to viewing part of the structure
Unsuited for quick comparisons
Needs interaction to avoid ambiguities

Pure structural

Ball-and-Stick is the most fundamental and common representation. The atoms are normally colored according to atom type and the bonds simply echo this information.

Atom and bond coloring can be differentiated to carry additional information fused with the structural information. For example, bonds could be colored by bond strain or bond dipole strength.

Bond coloring can help or hinder structure perception. The two-color form (middle) increases the number of sharp edges in the image, and so the distraction, compared to the smooth one (left). On the other hand this form helps distinguish atoms. The neutral form (right) avoids distractions, leaving the focus on the atoms.

There are various proposals of color schemes for ball-and-stick representation.

The visualization of bonds can lead to high visual clutter in overview renderings, but enables the user to judge the exact alignment and chemical makeup of chemical structures in close-up views. Remember also that this representation is the most effective for teaching: it is well tuned to the mental model students usually already have.

The original function of physical ball-and-stick models was to support the measurement of structure angles and bond lengths, leaving the representation of the real structure to space filling models (see Francoeur E.: The Forgotten Tool: The Design and Use of Molecular Models, Social Studies of Science, Vol. 27, No. 1 (Feb 1997), 7-40).

Investigate:

Benefits from differentiating the atoms rendering method from the bonds one
Benefits from fusing different info using atoms and bonds
Black&White rendering?
Equally sized spheres? Using something different from spheres? Semi-transparent atoms or bonds?

This representation, called licorice or Dreiding model, contains fewer distractors than the normal ball-and-stick.

The shaded coloring (left) makes it difficult to find where the atoms are. But it could be an interesting method to show attributes, like charge, that vary smoothly over the whole structure (see below).

This representation is also well suited to fusing a hypothesized molecular structure with the corresponding experimental electronic density maps. An example can be found on one of the various Crystallography & NMR System pages.

Here is an example of smooth mapping on a licorice model of an atom attribute that varies smoothly over the whole structure.

[line representation with split bonds color]

The line representation is less performance heavy than the previous one, but makes it really difficult to see the three-dimensional structure.

Atoms rendered as spheres could be added, but the result is awful. All the three-dimensional structural information and depth cues are hardly perceivable.

In this simulation of ice melting the line representation provides the context showing the structure of solid ice. The parts that start melting are rendered as usual ball-and-stick colored by the coordination number.

Data from Davide Donadio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.

The CPK representation, also called space filling, renders the atoms as spheres whose radius is the van der Waals radius of the atom.

The goal of this representation is to give an idea of the molecule's external surface. But the multitude of borders makes the surface difficult to perceive. The big spheres also lead to a high degree of occlusion, which can make it difficult to understand details of the chemical structure or phenomena like surface adsorption.

Originally the space filling representation was particularly suited for the study of “steric hindrance”, that is the way in which the volume of atoms imposes constraints both within and between molecules. But this original goal for the material realization of this model is lost on computer: the spheres could interfere and interpenetrate without any problem.
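
Since nothing stops computer spheres from interpenetrating, a steric clash has to be checked explicitly. The following is a minimal sketch, assuming Bondi van der Waals radii for a few elements; the function name is hypothetical and not taken from any tool mentioned here.

```python
import math

# Bondi van der Waals radii in angstrom for a few common elements.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def steric_overlap(elem_a, pos_a, elem_b, pos_b):
    """Return how deeply two van der Waals spheres interpenetrate (0.0 if apart)."""
    distance = math.dist(pos_a, pos_b)
    contact = VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b]
    return max(0.0, contact - distance)


# Two non-bonded carbons 3.0 angstrom apart: their spheres overlap by about 0.4.
print(round(steric_overlap("C", (0.0, 0.0, 0.0), "C", (3.0, 0.0, 0.0)), 2))
```

Covalently bonded atoms always overlap in a space filling model, so a real check would only consider non-bonded pairs.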

Another observation can be made about the move of the model from physical to graphical. The computer model preserves the appearance and problems of the physical one without imitating the real virtue of the physical realization of this representation: the physical “feel” of the surface bumps, constraints and degrees of freedom of the structure (see Francoeur 1997 op. cit.).

For comparison, here is the same molecule represented as CPK and with its Solvent Excluded Surface.

Strange coloring of a ball-and-stick model. In my opinion its author tried to replicate a model built with physical balls and physical sticks. The tool page (no longer available) gave a different explanation: “The rings are the circles on the atoms that highlight the bond connections. Two colors are used to define the rings, an inner and an outer color. The rings help to give a three dimensional appearance to the figure”. I’m still not entirely convinced.

The images (from the dead Chem-Ray site) show: Zinc Sulfide (left) and Molybdenum Acetate (right).

This is another creative rendering mode for bonds. It has been created with Garlic. I do not understand the usefulness of this rendering mode compared to plain cylinders. But with an appropriate selection of parameters it can simulate the molecule external surface as on the right image.

Here the 3D representation simply reproduces an old 2D rendering trick from the ORTEP program, the Oak Ridge Thermal Ellipsoid Plot program by Carroll Johnson (1976) (see image at right).

Images from Crystallography Centre, NIU at Galway.

Investigate:

What else could enhance standard ball-and-stick without being gratuitous decoration?

A nice example of combining two different representations (space filling and licorice) to represent different actors in a chemical reaction: the space filling representation is a Cu(100) surface, while the licorice one is a hexylbenzene molecule adsorbed on the copper surface.

Image from the Nano@PolyMTL gallery (used with permission).

Bonds

Double and triple bonds representation.

Normally bonds, and especially multiple ones, are computed by the visualization program from inter-atomic distances or set manually by the user, because file formats often do not describe them. For example, the PDB format describes only three types of bonds in its CONECT record:

Normal bonds
Hydrogen bonds
Salt bridges

Some tools interpret CONECT records that define a bond more than once as specifying the order of that bond, i.e. a bond specified twice is a double bond and a bond specified three (or more) times is a triple bond. However, this is not a standard PDB feature.
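
The repeated-bond convention can be sketched as follows. This is a minimal example, not the parser of any specific tool: it assumes the fixed-column CONECT layout (atom serial in columns 7-11, bonded atom serials in columns 12-31) and a hypothetical function name.

```python
from collections import Counter


def bond_orders_from_conect(pdb_lines):
    """Infer bond orders by counting repeated partners in CONECT records.

    Assumes the non-standard convention that a partner listed twice means
    a double bond and three or more times a triple bond.
    """
    counts = Counter()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        atom = int(line[6:11])
        # Bonded atom serials occupy four 5-character fields (columns 12-31).
        for start in range(11, 31, 5):
            serial = line[start:start + 5].strip()
            if serial:
                counts[(atom, int(serial))] += 1
    orders = {}
    for (a, b), n in counts.items():
        pair = (min(a, b), max(a, b))
        # A bond is normally listed from both of its atoms; keep the larger count.
        orders[pair] = min(3, max(orders.get(pair, 0), n))
    return orders


example = [
    "CONECT    1    2    2    3",
    "CONECT    2    1    1",
    "CONECT    3    1",
]
print(bond_orders_from_conect(example))  # {(1, 2): 2, (1, 3): 1}
```

Here atoms 1 and 2 are read as double bonded and atoms 1 and 3 as single bonded.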

Investigate:

When is this added complexity useful?
Is there any format that describes double or triple bonds? Yes: Tripos MOL2

The ferrocene molecule has an entirely different kind of bond.

The standard representation (left) is from the Molecule of the month at the University of Oxford. The more correct representation (right) is from the Crystallography Centre, NIU at Galway.

Another example of unusual bonding that is difficult to render with standard visualization tools: ruthenium compounds from Dirk Deubel – ETH Zürich.

Other examples of strangely bonded molecules can be found on the Molecules with Silly or Unusual Names page, for example a molecule that resembles a 3-legged piano stool.

Rendering of hydrogen bonds.

On the left, a xylose network bound together by H-bonds. Here the H-bonds are represented with thin neutrally colored tubes. This image is from the dead Chem-Ray site. On the right the H-bonds are rendered as dashed lines. Image created with STM4.

Rendering of other special bond types.

The image on the left is from the Crystallography Centre, NIU at Galway. The one on the right is from the JOELib library site.

Investigate:

Are there any rendering examples of other more exotic bonds, like sulphur bridges?
Are they needed and useful?

Another example of aromatic bonds representation. This image was created with BALLView on an ovalene structure.

Structural with associated data

[atoms with associated charge (other colormap)]

A scalar value, like electrical charge, can be associated with atoms. The values are mapped to colors using a colormap (center and right).

Besides the usual rainbow colormap (center), another useful one is the blue-white-red colormap (right) to highlight the sign of the scalar value and the zones where this value is near zero.
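
A blue-white-red colormap is simple to write down. The sketch below is one possible version; assigning blue to negative values and red to positive ones is an assumption, and the function name is hypothetical.

```python
def blue_white_red(value, vmax):
    """Map a signed scalar to an (r, g, b) color with components in 0..1.

    Negative values go to blue, zero to white, positive values to red.
    The range is symmetric: values beyond +/- vmax are clamped.
    """
    t = max(-1.0, min(1.0, value / vmax))
    if t < 0:
        return (1.0 + t, 1.0 + t, 1.0)   # fade from white towards blue
    return (1.0, 1.0 - t, 1.0 - t)       # fade from white towards red


for charge in (-0.8, 0.0, 0.4):
    print(charge, blue_white_red(charge, vmax=0.8))
```

Keeping the range symmetric around zero is what makes white coincide with the near-zero zones.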

Classical molecular dynamics simulation of argon atoms freezing. Here a scalar value associated to each atom represents its status. Atoms approaching the freezing threshold for this parameter turn less and less transparent. Frozen atoms are rendered as yellow spheres.

The clouds provide context for the frozen atoms visualization and highlight the transition between fluid and frozen.

Federica Trudu – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.

Some computational codes (Gaussian, for example) produce vibration vectors for each atom in the structure, grouped by vibration mode and frequency. The result can be animated to better communicate the kind of movement involved.

Image produced using Molekel.

Another example of a vector quantity associated with atoms is the magnetic spin, here represented by arrows.

Image created using FpStudio.

From E. A. Wiley, G. Deslongchamps: PostDock: A novel visualization tool for the analysis of molecular docking, Computing and Visualization in Science (online first). This is a nice representation of a set of docking candidates.

The representation combines color and transparency to show docking energy and pose RMSD. Here a less transparent ligand means lower docking energy and yellow means lower docking error. This way the less promising docking solutions are darker and more transparent, and so less visually obstructing.

The image legend says: "PostDock analysis of reverse-docking results with (a) naphthyl-linked receptor on left, and (b) fluorenyl-linked receptor on right. 9-EtA is shown in green ball-and-stick".

Time dependent structural

A trajectory can be represented by a sequence of static structures, one for each step, like this animated image from Crystallography Centre, NIU at Galway.

This representation, based on a set of disjointed structures, could evolve along three distinct paths:

Compute global quantities that summarize or simplify the whole trajectory (see below)
Use the atom positions as a set of timeseries and use standard timeseries analysis tools
Consider the trajectory as a move in a multidimensional space and use multidimensional analysis tools (e.g. 3 atoms that evolve for 10 steps can be represented as a point that moves in a 9D space or a fixed point in a 90D space)
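
The dimension counting in the last item can be made concrete with a small sketch. The trajectory data and function names below are hypothetical.

```python
def frame_to_point(frame):
    """Flatten one frame [(x, y, z), ...] into a single 3N-dimensional point."""
    return [coord for atom in frame for coord in atom]


def trajectory_to_point(trajectory):
    """Flatten a whole trajectory into one fixed point of dimension 3N * steps."""
    return [coord for frame in trajectory for coord in frame_to_point(frame)]


# Hypothetical trajectory: 3 atoms, 10 steps, two atoms drifting along x.
trajectory = [
    [(0.1 * step, 0.0, 0.0), (1.0 + 0.1 * step, 0.0, 0.0), (2.0, 0.0, 0.0)]
    for step in range(10)
]

moving_point = [frame_to_point(frame) for frame in trajectory]   # 10 points in 9D
fixed_point = trajectory_to_point(trajectory)                    # 1 point in 90D
print(len(moving_point), len(moving_point[0]), len(fixed_point))  # 10 9 90
```

The first form suits tools that follow a path through a space; the second suits tools that compare whole trajectories with each other.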

Sometimes a static structure is animated just to increase the 3D scene perception. This is called rocking. This technique can be used to create an illusion of 3D without any special device.

A summary representation of a whole trajectory could associate with each atom an ellipsoid that displays the volume where the atom moves during the entire trajectory.

Image on the right from Amira.

[time dependent structure by fading superposition]

Here all the structures from a trajectory are combined together to summarize the whole dynamic data set (left). Older positions are rendered as fading structures.

This is an enhancement compared to motion representation as an overlapped series of snapshots only (right). Here colors only distinguish the timestep value.

Image on the left from Amira. The related publication is: Johannes Schmidt-Ehrenberg, Daniel Baum, Hans-Christian Hege: Visualizing Dynamic Molecular Conformations. Proceedings of IEEE Visualization 2002, p. 235–242, 2002.

Image on the right created instead with STM4 on alanine data.

The movement of an atom is traced by a line that could be color-coded by age of the corresponding positions (left) or perceptually enhanced to reveal its three-dimensional structure (right). The following movie is an example.

The problem with this representation is that it is too detailed and contains a lot of unnecessary information, like the harmonic oscillation of the atom around a long term path.

Simulation of a topaz crystal by Sergey Churakov – PSI.

Another method to show spatial occupancy, like the ellipsoid method, uses an accumulator grid. Here the positions of atoms in a trajectory are sampled by a regular grid of cells. Each cell accumulates the count of times an atom passes through it. The result is then visualized with a volume rendering technique.

Here the resulting shapes are more realistic compared to the ellipsoidal crude approximation.
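
The accumulator grid can be sketched in a few lines. This is a minimal version with hypothetical names: it counts only the sampled positions of each step, not the path between two steps, and stores the grid sparsely.

```python
from collections import Counter
import math


def accumulate_occupancy(trajectory, cell_size):
    """Count, for each grid cell, how many sampled positions fall inside it.

    trajectory is a list of frames; each frame is a list of (x, y, z) positions.
    Returns a Counter keyed by integer cell indices (i, j, k).
    """
    grid = Counter()
    for frame in trajectory:
        for x, y, z in frame:
            cell = (
                math.floor(x / cell_size),
                math.floor(y / cell_size),
                math.floor(z / cell_size),
            )
            grid[cell] += 1
    return grid


# One atom moving near the origin, sampled at four time steps.
trajectory = [
    [(0.1, 0.1, 0.1)],
    [(0.4, 0.2, 0.1)],
    [(0.6, 0.1, 0.1)],
    [(0.2, 0.3, 0.1)],
]
grid = accumulate_occupancy(trajectory, cell_size=0.5)
print(grid[(0, 0, 0)], grid[(1, 0, 0)])  # 3 1
```

For volume rendering the sparse counts would then be copied into a dense regular array, with the cell size chosen as a trade-off between noise and resolution.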

Simulation of a topaz crystal by Sergey Churakov – PSI.

A set of structures could derive not only from a trajectory, but also from a set of conformers of a given structure. They can be combined and superimposed to highlight the differences between them or visualized using any of the previous techniques.

Here are the first 5 lowest energy conformers of the Thyroliberin TSH/PRL Releasing Hormone from the Ramon Soriano page.

This image is taken from an ab initio molecular dynamics simulation of the self-interstitial defect in silicon. This research is done at the Integrated Systems Laboratory of ETH – Zürich by Beat Sahli.

This visualization shows two interesting representations used together: a static reference lattice (gray) and the atoms of the trajectory colored by distance from the reference position. The technique is better viewed in the accompanying movie.
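
Coloring by distance from the reference position could be sketched as below. The gray-to-red color ramp and all names are assumptions made for illustration; the colormap actually used in the image is not known to me.

```python
import math


def displacement_colors(reference, frame, max_displacement):
    """Return one (r, g, b) color per atom from its distance to the reference site."""
    colors = []
    for ref_pos, pos in zip(reference, frame):
        d = math.dist(ref_pos, pos)
        t = min(1.0, d / max_displacement)   # 0 on the lattice site, 1 far away
        # Blend from neutral gray (0.5, 0.5, 0.5) to red (1, 0, 0).
        colors.append((0.5 + 0.5 * t, 0.5 - 0.5 * t, 0.5 - 0.5 * t))
    return colors


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
print(displacement_colors(reference, frame, max_displacement=1.0))
```

Atoms sitting on their lattice site blend into the gray reference, so only the displaced ones attract attention.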

Summarization and simplification methods

The usual problem with structural visualization is that there is “too much” visual information and all that information is perceived as having the same importance.

Instead, attention should be directed to the really important parts of the structure. In this example the important things are the water molecules inside the nanotube, not the other water molecules or even the nanotube itself.

Data from P. Koumoutsakos - Institute of Computational Science, ETH Zürich.

Investigate:

How to identify, extract and isolate important structures?

This is a first example of simplified representations for visual complexity reduction.

Only the backbone of the protein is visualized here with alpha helices represented by cylinders. The color represents the corresponding residue (left).

On the right this kind of representation shows alpha-helices and beta-sheets, colored by chain.

Backbone only (represented by an N-Cα-C tube) plus the usual licorice representation (left, done with STM4) or using the trace style colored by chain (right, done with VMD).

The ribbon rendering is similar: it shows the protein backbone, but it uses additional information (the O of the protein backbone, or some of the phosphate oxygens for nucleic acids) to find a normal for drawing the oriented ribbon.

Investigate:

When is this representation useful?

Here is the combination of a simplifying backbone only representation with data about deviation from the mean structure.

The data is from a NMR ensemble of human MIA shown as a spline with variable radius, representing the RMS of the deviation from the mean structure. Image from a Ruhr-Universität Bochum project.
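
The quantity that drives the variable radius can be computed as in this sketch. It assumes the models of the ensemble are already superimposed and share the same atom order; the function name is hypothetical.

```python
import math


def rms_deviation_from_mean(ensemble):
    """Per-atom RMS deviation from the mean position over an ensemble.

    ensemble is a list of models; each model is a list of (x, y, z) positions.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    rms = []
    for i in range(n_atoms):
        mean = tuple(
            sum(model[i][k] for model in ensemble) / n_models for k in range(3)
        )
        mean_sq = sum(math.dist(model[i], mean) ** 2 for model in ensemble) / n_models
        rms.append(math.sqrt(mean_sq))
    return rms


# Two models, two backbone atoms: the first is rigid, the second moves by 2 along x.
ensemble = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
print(rms_deviation_from_mean(ensemble))  # [0.0, 1.0]
```

The spline radius at each backbone atom can then be set proportional to these values, plus a small minimum so that rigid parts stay visible.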

There is also a page with examples and notes on this sausage view of an NMR ensemble structure.

The important rings in this structure are highlighted using red pentagons.

Simulation by Marcella Iannuzzi – ETH Zürich.

Investigate:

How to extract rings.
Other global structures highlighting methods: least-squares mean planes, etc.
Other summarizing methods for groups of atoms or sets of surface vertices: for example they may be represented by ellipsoids that enclose them while having minimal volume.

Polyhedral representations are common in structural chemistry as they serve to elucidate the coordination around (metal) atoms and to highlight important global structural features.

In this example some (left) or all (right) diamond carbon atoms are replaced by tetrahedra.

Another example of polyhedral usage to clarify a zeolite structure. The first two images are very artistic, but hide the orientation of the bonds of Si and Al atoms. Instead the last image reveals their orientation by using blue tetrahedra for Si atoms and red octahedra for Al atoms.

Images done with STM4 on Pyrope structure from ICSD database.

Again a faujasite structure made visible using polyhedra as provided by DRAWxtl.

Molecular surfaces

The Solvent Excluded Surface (SES) simplifies and summarizes a complex molecule.

Here the molecule is the 1KPL from the Protein Data Bank colored by nearest chain. Unfortunately for complex molecules there are still too many details in this representation. Also the resulting pebbly surface reduces the effectiveness of the three-dimensional cues provided by object illumination techniques.

Here the Solvent Excluded Surface is used to simplify the DNA molecule representation and highlight the molecule bound to its groove.

[Electronic density mapped on top of the SES]

Coloring of the molecule's Solvent Excluded Surface. On the left the charge of the nearest atom is mapped onto the surface. On the right the values come from the electronic density sampled on the same surface.

Another important variable usually mapped on top of the molecular surface is hydrophobicity. It is worth noting that this is not a volumetric variable sampled on an embedded surface; it is a variable related to the nearest atoms.

The molecular surface can be made semi-transparent to show the generating structure. But it is difficult to grasp the three dimensional shape of a semi-transparent surface. The addition of a texture on top of it helps surface structure perception.

The use of points to mark the surface has historical roots (see Connolly surfaces). But it is also an unobtrusive method to show the surface interior without obscuring it.

A simple colored wireframe can be used in place of the transparent surface. But it is difficult to perceive the spatial structure of one-dimensional elements.

It is better to use neutral color cages with a low triangle density just to communicate the overall volume occupied by the molecule.

Volumetric scalar data

A first representation of volumetric scalar data is an isosurface computed at some interesting value (for electronic density, for example, a value of 0.002 a.u. is generally used). One problem is that it is difficult to find the iso value that produces information-rich isosurfaces.

Data from Prof. Artem Oganov – ETH Zürich.

[isosurface with second variable mapped]

On top of an isosurface a second value could be mapped to represent a different scalar value. With this method the correlation between the two variables could be seen.

The volume can be cut along interesting planes to show the scalar variable on top of them.

A volumetric scalar value could be sampled by the points of another surface. The sampled values are then mapped and color-coded on top of this surface.

A generic surface is used on the left; instead on the right the sampling surface is the Solvent Excluded Surface of the molecule itself.

Volume rendering of the scalar variable. In this technique the values are mapped not only to colors, but also to transparency.

Visualization of the structure of the π–complex of an ethylene molecule with a Ti catalytic center in the chemical reaction leading to the formation of polyethylene. A finite difference gradient estimator acts as a volume opacity multiplier to render homogeneous regions almost transparently.
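
The idea of a gradient acting as an opacity multiplier can be sketched as follows. This is not the code used for the image: it assumes central differences on a regular grid with unit spacing, and the function names are hypothetical.

```python
import math


def gradient_magnitude(volume, i, j, k):
    """Central finite difference gradient magnitude at an interior grid point.

    volume[i][j][k] is a scalar on a regular grid with unit spacing.
    """
    gx = (volume[i + 1][j][k] - volume[i - 1][j][k]) / 2.0
    gy = (volume[i][j + 1][k] - volume[i][j - 1][k]) / 2.0
    gz = (volume[i][j][k + 1] - volume[i][j][k - 1]) / 2.0
    return math.sqrt(gx * gx + gy * gy + gz * gz)


def modulated_opacity(base_opacity, gradient, max_gradient):
    """Scale the transfer function opacity by the normalized gradient magnitude."""
    return base_opacity * min(1.0, gradient / max_gradient)


# 3x3x3 test volume whose value grows linearly along the first axis only.
volume = [[[float(i) for _ in range(3)] for _ in range(3)] for i in range(3)]
g = gradient_magnitude(volume, 1, 1, 1)
print(g, modulated_opacity(0.8, g, max_gradient=2.0))
```

In a homogeneous region the gradient is zero, so the voxel becomes fully transparent whatever opacity the transfer function assigns to its value.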

The parallel volume rendering of the electronic density around the molecule is performed with the VTK toolkit.

Model and simulation by Mauro Boero, Dept. of Physics, University of Tsukuba and Michele Parrinello, Swiss National Supercomputing Centre.

Nested isosurfaces could help understanding the volumetric distribution of values and are computationally less demanding compared to volume rendering. But the result is difficult to perceive (left). It is better to use no more than three nested surfaces and to make them semi-transparent (right). And, if possible, use iso values that have some special meaning.

Image on the left from the Ruhr-Universität Bochum, Germany VMD tutorial. Image on the right done with STM4 on Prof. Artem Oganov data.

An orbital is visualized as an isosurface of the wavefunction for some scalar value (usually 0.07 a.u.) or for two values symmetric around zero. See for example these slides for more information. The two colors mark positive and negative values of the wavefunction.

It is interesting to note that orbitals visualization without showing the corresponding structural model does not make much sense. In other words this is an example of “data fusion”. Data fusion means visualizing together different data to show correlations, causal relationships or to provide interpretative context.

Volumetric vector data

From the cover of ChemPhysChem 8/2002 the visualization of ring currents in presence of an external magnetic field for a high density supercritical system.

The vector field is represented around the molecules using the illuminated streamline technique.

Electronic ring currents in benzene represented on a slice of the volume around the molecule by vector arrows and LIC textures.

Data simulation by Daniel Sebastiani, Max Planck Institut.

Representation of a tensor value that characterizes some liquid crystal property, using streamlines and isosurfaces.

From the paper presented at IEEE Visualization 2004: Slavin, Laidlaw, Pelcovits, Zhang, Loriot, Callan-Jones: Visualization of Topological Defects in Liquid Crystals.

1D scalar

One of the various 1D data types used in chemistry: NMR spectra.

One parameter associated to a trajectory. The image shows how the structural modifications are related to the parameter change along the trajectory. The movie shows this evolution clearly.

Data from Davide Donadio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich.

2D scalar

Trajectory on a two parameters landscape. The chart shows how the two parameters change along the trajectory. The movie shows the evolution better.

Data from Matteo Ceccarelli – ETH Zürich.

Here the evolution involves three parameters.

Data from Francesco Gervasio – Computational Science, Department of Chemistry and Applied Biosciences, ETH Zürich. Reference: Gervasio, F. L. et al., Chem. Eur. J. 10, 4846 (2004)

Two examples of 2D chemistry data: NOESY NMR spectra (left, from HomoSpoil Experiments in AVANCE) and a 2D-IR difference spectrum (right, from Infrared Analogues of NMR).

Another example of 2D data comes from genomics: the comparison of two sequences using a DotPlot, which here shows the auto-correlation of residues in a protein. This image shows the DotPlot for a protein with internal repeats.
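
In its simplest form a dot plot is easy to compute. The sketch below marks exact residue matches only, which is a simplification (practical tools usually score a sliding window); the sequence is a made-up example.

```python
def dot_plot(seq_a, seq_b):
    """Return a matrix with True where residue seq_a[i] equals residue seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Hypothetical sequence with an internal repeat of "GAV".
sequence = "GAVKGAV"
print(render(dot_plot(sequence, sequence)))
```

Comparing a sequence with itself always gives the main diagonal; an internal repeat shows up as additional shorter diagonals parallel to it.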

Things missing?

Color coding by group/function.
Primitive/conventional cells.
Techniques to show differences between structures.
Techniques to show intermolecular interactions (like contacts).
Techniques to show uncertainty and errors.

Other galleries

There are various galleries of chemistry visualization methods around. Here are the ones I have found:

The Balloon Molecules site shows how an entirely unrelated tool (party balloons) could be turned into an effective visualization tool. Clever, nice and funny!
Examples from IBM Data Explorer
The Scientific and Artistic Uses of Molecular Surfaces
The Representation of Molecular Models site
Molecular Representations in VMD

The following galleries provide nice examples of various chemistry visualizations:

Gallery of Biomolecular Simulations
Sample figures generated using Raster3D
Making Matter: the atomic structure of materials collects representations used for crystallography

The Online Macromolecular Museum is not, strictly speaking, a gallery of representation methods, but it is a truly amazing example of chemistry visualization usage for communication and explanation.

Another example is the Virtual Cell Animation Collection. It combines animation and computer graphics to explain and clarify biological processes.

The last entry is Molecules in Motion, which contains a lot of Jmol-supported animations and a comprehensive guide to the history of atomic structures.