Mario Valle Web

Mistakes, mysteries and errors in chemistry visualization

Experimenting with chemistry visualization tools and preparing lectures gives me a unique opportunity to find errors in data and in visualization tools. Sometimes it may be better to call them “unexpected outputs for non-experts”. My aim here is to underline the need to look critically at visualization output, not to blame specific tools or researchers.

Note that I’m not a chemist, so I may have mistaken some perfectly reasonable output for an error. Please let me know, so I can learn something and correct the mistake.

The first section is devoted to errors produced by visualization tools on (seemingly) valid data files. The second section collects erroneous data files I found around.

Visualization tool errors

A five-bonded Carbon?

I’m not a chemist, so when I looked at this image produced by ArgusLab, which shows the C2 carbon with two bonds that looked strange to me, I immediately thought I had found a five-bonded Carbon. Well, it turned out that this is the common idiom for representing “resonant bonds”, that is, bonds with an order between 1 and 2. Still, nowhere did I find documentation explaining how visualization tools render bonds graphically. I repeat, I’m not even a beginner, but a student using visualization programs at school could make the same mistake.

The data is the MMFF94.mol2 dataset distributed with the commercial tool VIDA (from OpenEye Scientific Software). The same result appears with compound ZINC00000036 from the ZINC database.

…or maybe is it 4½ bonded?

OK, I have learned something new, and what “resonant bonds” are is now clear to me. So I loaded another test file in ArgusLab (left) and Molekel (right). The results are similar, but now if I count the bonds in the left image I arrive at 4½ on the leftmost Carbon. Is it time to learn more, or are the tools playing with me?
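One plausible way to arrive at a figure like 4½ is to sum bond orders per atom while counting every resonant bond as 1.5. The sketch below only illustrates that arithmetic; it is not what ArgusLab or Molekel actually do. It assumes a bond list in the style of a mol2 file, where the label "ar" marks an aromatic (resonant) bond. The atom names and the bond list are made up.

```python
from collections import defaultdict

# Bond-type labels in the style of mol2 files; "ar" (aromatic) counts as 1.5.
BOND_ORDER = {"1": 1.0, "2": 2.0, "3": 3.0, "ar": 1.5}

def bond_order_sums(bonds):
    """Return {atom: sum of bond orders} for a list of (atom_a, atom_b, type)."""
    totals = defaultdict(float)
    for a, b, kind in bonds:
        order = BOND_ORDER[kind]
        totals[a] += order
        totals[b] += order
    return dict(totals)

# Hypothetical fragment: C1 is shared by two fused rings, so it has
# three resonant bonds and no other bonds.
bonds = [
    ("C1", "C2", "ar"),
    ("C1", "C3", "ar"),
    ("C1", "C4", "ar"),
    ("C2", "H2", "1"),
]

totals = bond_order_sums(bonds)
# Carbons whose naive bond-order sum exceeds the usual valence of 4.
suspicious = [atom for atom, total in totals.items()
              if atom.startswith("C") and total > 4.0]
print(totals["C1"], suspicious)
```

With this made-up input the shared Carbon sums to 4.5, so a naive count flags it even though nothing is wrong with the molecule.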

Symmetry above all!

Molekel seems to prefer symmetric rings, ignoring the fact that they cannot exist. One example is the PDB file with ID 3POR.

Two different cartoon representations for the same molecule?

The same data (PDB: 1CRN.pdb) has been rendered in Chimera using two different forms of the cartoon representation (left). Small deviations are not significant, but the terminus on the right is simply displaced too much.

VMD (right), by contrast, at least has self-consistent representations (Cartoon and New Cartoon). But these representations, again at the right terminus, differ from the previous ones.

A multiple-bonds orgy.

The attached PDB (or MOL) file has been converted from a Gaussian01 log file (whose provenance I have forgotten, sorry). ArgusLab (left) and Molegro Virtual Docker (right) have different ideas about what to bond. Also note the usual 5-bonded carbon.

Multiple bonds, the sequel.

The PDB file from the previous episode gives two interesting outputs in STM4.

On the left is what is read from the CONECT records in the PDB file. On the right is what STM4 recomputed after the search range for bond formation was increased by 20%. Now the Molybdenum atoms make “metallocene-like” bonds with the two rings, and the Phosphorus atom forms its standard 4 bonds.
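How a wider search range can create new bonds is easy to sketch. The snippet below uses a common distance criterion: two atoms are bonded when their distance does not exceed the sum of their covalent radii times a tolerance factor. This is an assumption for illustration, not STM4's actual algorithm; the radii are approximate literature values and the coordinates are made up.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstrom (rounded literature values).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "P": 1.07, "Mo": 1.54}

def find_bonds(atoms, tolerance=1.0):
    """atoms: list of (label, element, (x, y, z)); returns a set of label pairs."""
    bonds = set()
    for (la, ea, pa), (lb, eb, pb) in combinations(atoms, 2):
        limit = tolerance * (COVALENT_RADIUS[ea] + COVALENT_RADIUS[eb])
        if math.dist(pa, pb) <= limit:
            bonds.add((la, lb))
    return bonds

# Hypothetical geometry: two ring Carbons, each about 2.40 angstrom from a Mo atom.
atoms = [
    ("Mo1", "Mo", (0.0, 0.0, 0.0)),
    ("C1", "C", (2.3, 0.7, 0.0)),
    ("C2", "C", (2.3, -0.7, 0.0)),
]

strict = find_bonds(atoms, tolerance=1.0)   # only the C1-C2 bond
relaxed = find_bonds(atoms, tolerance=1.2)  # search range raised by 20%
print(sorted(strict))
print(sorted(relaxed))
```

With the default tolerance the Mo-C limit is 2.30 angstrom, so the metal stays unbonded. Raising the range by 20% moves the limit to 2.76 angstrom and both Mo-C bonds appear.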

The molecular exterminator strikes again!

It seems the structure above keeps uncovering problems in visualization tools.

Here is the result from Accelrys Discovery Studio Visualizer (left) compared to STM4 (right). Note the half-broken ring and some other missing Carbon-Carbon bonds on the right.

The fun starts when you push “Add Hydrogen”…

Crowded Hydrogen.

…and watch the Hydrogen atoms stepping on each other.

Well, this function does its job honestly; the data it receives is simply inconsistent because of the missing bonds.

Triangular Hydrogen.

Look at the output of DeepView when visualizing our old friend. It tries to warn you about HETATM and so on, but the visible result is again a strange bonding of Hydrogen atoms (the triangles on the right).

Lone stranger.

Various visualization programs have problems with this URIDINE-2',3'-VANADATE residue extracted from PDB entry 6RSA. They fail to recognize D and V atoms, and they cannot bond the V atom (see the left image, made with VMD). The 2D structure on the right is here for reference.

Errors in public data files

Chameleonic Tryptophan.

In the 7GPB.pdb file the TRP residues seem to have too much shape variability. See, for example, residue 67 in chain D (left) and another one, residue 244 in chain D (right). Shouldn’t the two rings be coplanar?

This example is taken from the excellent Practical Model Validation – EMBO Bioinformatics Course

Escaping Hydrogen?

In PDB entry 1EW1 model 4, the Z coordinate of atom 60 has inexplicably changed sign leaving this poor Hydrogen atom alone in the dark.
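A lone atom like this can be spotted without looking at a picture. The sketch below flags every atom that has no neighbor within a cutoff distance; the 2.0 angstrom cutoff and the coordinates are assumptions for illustration and are not taken from 1EW1.

```python
import math

def isolated_atoms(coords, cutoff=2.0):
    """Return indices of atoms with no other atom within `cutoff` angstrom."""
    lonely = []
    for i, p in enumerate(coords):
        nearest = min(
            (math.dist(p, q) for j, q in enumerate(coords) if j != i),
            default=math.inf,
        )
        if nearest > cutoff:
            lonely.append(i)
    return lonely

# Made-up fragment: the third atom is a Hydrogen whose Z should be +1.6.
coords = [
    (0.0, 0.0, 1.0),
    (1.0, 0.0, 1.5),
    (0.0, 1.0, -1.6),
]
print(isolated_atoms(coords))

# Restoring the sign of Z brings the Hydrogen back to its neighbor.
fixed = coords[:2] + [(0.0, 1.0, 1.6)]
print(isolated_atoms(fixed))
```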

Left or right camphor?

Roald Hoffmann told us another interesting story about blindly accepted structures in his paper: Representation in Chemistry (by Roald Hoffmann and Pierre Laszlo, Angew. Chem. Int. Ed. Engl. 30 (1991) 1-16):

“A further story needs to be told of camphor. We picked this molecule as one recognizable to the public, but carrying within it minimal complexities of representation. One of us (R. H.), having forgotten its structure, checked it in a textbook, then specified to some friends the geometry needed to produce the beautiful drawings of camphor in this paper. Every one of them was the mirror image of what you see, which is the naturally occurring, dextrorotatory material (1R,4R in configuration)! That we had the wrong absolute configuration was pointed out to us by a careful reader, Ryoji Noyori, the 1990 Baker Lecturer at Cornell. A literature search then revealed the wrong configuration disported by many, if not most textbooks, the Merck Index, and numerous literature papers, such as the important one by G. M. Whitesides and D. W. Lewis (J. Am. Chem. Soc. 92 (1970) 6979) on the use of an NMR shift reagent to determine enantiomeric purity. The structure is correctly given in the Sigma, Aldrich, and Fluka catalogues and the references to the assignment may be found in the following handbook: W. Klyne, J. Buckingham: Atlas of Stereochemistry Vol. 1, Chapman and Hall, London 1978, p. 85”

Here the naturally occurring structure is on the left, and the wrong representation, from “The Merck Index – 6”, is on the right. Fortunately, data files with the correct structure are available on 3Dchem.com and Chemie-Index (FU Berlin).

Chirality, where are you?

PDB entry 5RXN contains three threonine residues, named 5, 7 and 28. Inspect these three residues with your favorite graphics program. Does anything strike you as odd?

Yes. Thr 7 (left) and 28 have proper chirality for CB, but residue 5 (right) has incorrect CB chirality.

Here are residues THR 7 and THR 5.
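Handedness at a center like CB can also be compared numerically. A minimal sketch: take three substituents in a fixed order (here CA, OG1, CG2) and compute the signed volume they span around CB. Mirror-image arrangements give opposite signs. Which sign is the "proper" one depends on the chosen atom order, so the sketch only shows that a residue with the opposite sign stands out. The coordinates are made up and are not taken from 5RXN.

```python
def signed_volume(center, a, b, c):
    """Triple product (a-center) . ((b-center) x (c-center)).

    The sign flips when the arrangement is replaced by its mirror image.
    """
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]

# Idealized, made-up geometry around CB at the origin.
cb = (0.0, 0.0, 0.0)
ca, og1, cg2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

reference = signed_volume(cb, ca, og1, cg2)
# Swapping two substituents produces the mirror-image arrangement.
mirrored = signed_volume(cb, ca, cg2, og1)
print(reference, mirrored)
```

Applied to the three threonines, two residues would share one sign and the odd one out would show the other.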

This example is taken from the excellent Practical Model Validation – EMBO Bioinformatics Course

Creating atoms!

Look up Gly 126 in the PDB file (i.e., not the structure) of entry 1VNS. Do you notice anything peculiar?

Yes… this Glycine residue has somehow managed to sprout a CB atom!

Here on the left a generic Glycine, on the right GLY 126 from 1VNS.
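This kind of error can be found by scanning the file itself. The sketch below reads the fixed-column ATOM records of the PDB format (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26) and reports every GLY that carries a CB atom. The helper make_atom_line and the sample records are hypothetical; they only build test input and are not data from 1VNS.

```python
def glycines_with_cb(pdb_lines):
    """Return (chain, residue number) for every GLY that has a CB atom record."""
    hits = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        if res_name == "GLY" and atom_name == "CB":
            hits.append((line[21], int(line[22:26])))
    return hits

def make_atom_line(serial, name, res, chain, num, x=0.0, y=0.0, z=0.0):
    """Hypothetical helper: format one ATOM record in PDB column layout."""
    return (f"ATOM  {serial:5d} {name:^4} {res:>3} {chain}{num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up records: GLY 10 is normal, GLY 126 has sprouted a CB.
sample = [
    make_atom_line(1, "N", "GLY", "A", 10),
    make_atom_line(2, "CA", "GLY", "A", 10),
    make_atom_line(3, "N", "GLY", "A", 126),
    make_atom_line(4, "CA", "GLY", "A", 126),
    make_atom_line(5, "CB", "GLY", "A", 126),
    make_atom_line(6, "CB", "ALA", "A", 127),
]
print(glycines_with_cb(sample))
```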

This example is taken from the excellent Practical Model Validation – EMBO Bioinformatics Course

I want to be a leucine!

Using your favorite graphics program, compare aspartates C168 and C169 in PDB entry 1DLP. Does anything strike you as odd?

Yes. Aspartate C168 is obviously wrong: the CG is sp2-hybridised and should therefore be in one plane with its three neighbors (CB, OD1, OD2). Instead it has been given a Leucine-like conformation.
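The planarity argument can be turned into a number: the distance of CG from the plane through CB, OD1 and OD2 should be close to zero for an sp2 center. The sketch below computes that distance. The coordinates are made up for illustration and are not taken from 1DLP, and what counts as "too far" is left to the reader.

```python
import math

def out_of_plane_distance(center, n1, n2, n3):
    """Distance of `center` from the plane through its three neighbors."""
    v = [n2[i] - n1[i] for i in range(3)]
    w = [n3[i] - n1[i] for i in range(3)]
    # Plane normal from the cross product of two in-plane vectors.
    normal = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    length = math.sqrt(sum(c * c for c in normal))
    d = [center[i] - n1[i] for i in range(3)]
    return abs(sum(d[i] * normal[i] for i in range(3))) / length

# Made-up neighbors of CG, all lying in the z = 0 plane.
cb, od1, od2 = (-1.5, 0.0, 0.0), (0.6, 1.1, 0.0), (0.6, -1.1, 0.0)

planar_cg = (0.0, 0.0, 0.0)     # sp2-like: sits in the plane
pyramidal_cg = (0.0, 0.0, 0.5)  # pushed out of the plane, Leucine-like

print(out_of_plane_distance(planar_cg, cb, od1, od2))
print(out_of_plane_distance(pyramidal_cg, cb, od1, od2))
```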

This example is taken from the excellent Practical Model Validation – EMBO Bioinformatics Course

It is not an error! It is a warning sign

The two PDB files 1R69 and 1R63 represent the same structure (phage 434 repressor domain) determined with two different methods: NMR (1R63, blue in the image) and X-Ray (1R69, green in the image).

Obviously there are differences! This image is simply meant as a warning sign: take visualized structures cum grano salis. That is: think critically before accepting any visualization output.