STM3 development ideas

Here are, in no particular order, some ideas I have tried and collected during STM3 development. For now, this page serves only as a repository.

The user environment was Red Hat Linux 7.2 (RH7) with gcc 2.96 and AVS/Express 6.2.

Optimization

Use the profiler to understand where to focus the effort

Simply add -pg to the compilation line (and remove -fomit-frame-pointer).
Report with gprof -p (flat profile). This is the starting point for the analysis.
There is no way to get annotated source profiling: the suggested -a option makes AVS/Express crash at startup.

Tune gcc options

Now I use:

-fPIC -O3 -funroll-loops -fomit-frame-pointer -ffast-math -march=pentium -msse -mcpu=pentiumpro

I have not benchmarked my code to measure the effect of each individual switch.

The default compiler flags generated by AVS/Express are:
```
-DEXPRESS -D_XOPEN_SOURCE -D_BSD_SOURCE -D_REENTRANT -fPIC -march=pentium -mcpu=pentiumpro -O
```
So I hope to do better than this default.

Builtin function usage

A critical piece of code uses sin()/cos(). How can I check whether they are expanded as builtins or called as regular functions (with gcc 2.96)?
I found a GNU extension called sincos() that computes the sine and cosine of the same angle together. The only way I have found to enable it is to define __USE_GNU before #include <math.h>. Is this the correct way to use it?
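A minimal sketch of sincos() in use, assuming glibc. For reference, glibc's documented way to expose GNU extensions is to define _GNU_SOURCE before the first system header (or pass -D_GNU_SOURCE on the compile line); <features.h> then sets the internal __USE_GNU macro itself, so defining __USE_GNU by hand relies on a glibc implementation detail. The helper circle_points() is hypothetical (e.g. the ring of vertices around a bond cylinder), not code from STM3.

```cpp
// Must come before any system header: exposes GNU extensions such as sincos().
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <math.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: place n points on a circle of radius r, computing the
// sine and cosine of each angle with one sincos() call instead of sin() + cos().
void circle_points(std::size_t n, double r, double *x, double *y)
{
    assert(n > 0);
    const double step = 2.0 * M_PI / static_cast<double>(n);
    for (std::size_t i = 0; i < n; ++i) {
        double s, c;
        sincos(step * static_cast<double>(i), &s, &c);  // s = sin(angle), c = cos(angle)
        x[i] = r * c;
        y[i] = r * s;
    }
}
```

For example, circle_points(8, r, x, y) fills x and y with the vertices of an octagon of radius r.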
Another area for improvement is replacing for loops that copy or fill memory with memcpy() and memset() calls. Judging from the <string.h> include file, it seems that a "turbo" (inline) version is used if you define __USE_STRING_INLINES before including <string.h>.
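A before/after sketch of the loop-to-library change, on hypothetical float buffers (the function names are illustrative only). It does not depend on __USE_STRING_INLINES; it just shows which loops map onto which call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Zero n floats. memset() writes bytes, which works here because an
// all-zero bit pattern is 0.0f for IEEE 754 floats.
void clear_loop(float *buf, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = 0.0f;
}

void clear_memset(float *buf, std::size_t n)
{
    std::memset(buf, 0, n * sizeof *buf);
}

// Copy n floats. memcpy() requires non-overlapping buffers
// (use memmove() when they may overlap).
void copy_loop(float *dst, const float *src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

void copy_memcpy(float *dst, const float *src, std::size_t n)
{
    assert(dst != src);  // catches only the most obvious overlap (same start address)
    std::memcpy(dst, src, n * sizeof *dst);
}
```

Note that memset() only suits fills whose value is a repeated byte, such as zero; filling with, say, 1.0f still needs a loop or std::fill.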

The real optimizations

In any case, the best optimization was changing the DrawBonds algorithm! The brute-force approach was fine for small molecules, but its running time grows as O(n²) with the number of atoms. The real lesson: once the code works, test it with varying load levels.
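This page does not say which algorithm replaced the brute-force search, so the sketch below assumes one common choice, a uniform grid (cell list), and compares it with the O(n²) version. Everything here is hypothetical: the Atom record, the function names, and the single distance cutoff max_len (real bond detection usually uses per-element covalent radii, in which case the cell side must be at least the largest cutoff).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct Atom { double x, y, z; };                   // hypothetical atom record
using Bond = std::pair<std::size_t, std::size_t>;  // indices of the two atoms

static double dist2(const Atom &a, const Atom &b)
{
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Brute force: every pair is tested, n*(n-1)/2 distance checks.
std::vector<Bond> bonds_brute_force(const std::vector<Atom> &atoms, double max_len)
{
    std::vector<Bond> bonds;
    for (std::size_t i = 0; i < atoms.size(); ++i)
        for (std::size_t j = i + 1; j < atoms.size(); ++j)
            if (dist2(atoms[i], atoms[j]) <= max_len * max_len)
                bonds.emplace_back(i, j);
    return bonds;
}

// Cell list: bin atoms into cubes of side max_len, so any partner within
// max_len lies in the same cube or one of its 26 neighbours. With roughly
// uniform density each atom meets a bounded number of candidates; std::map
// adds a log factor to each lookup.
std::vector<Bond> bonds_cell_list(const std::vector<Atom> &atoms, double max_len)
{
    assert(max_len > 0.0);
    using Cell = std::tuple<long, long, long>;
    auto cell_of = [max_len](const Atom &a) {
        return Cell(static_cast<long>(std::floor(a.x / max_len)),
                    static_cast<long>(std::floor(a.y / max_len)),
                    static_cast<long>(std::floor(a.z / max_len)));
    };

    std::map<Cell, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < atoms.size(); ++i)
        grid[cell_of(atoms[i])].push_back(i);

    std::vector<Bond> bonds;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        const Cell c = cell_of(atoms[i]);
        for (long dx = -1; dx <= 1; ++dx)
            for (long dy = -1; dy <= 1; ++dy)
                for (long dz = -1; dz <= 1; ++dz) {
                    const auto it = grid.find(Cell(std::get<0>(c) + dx,
                                                   std::get<1>(c) + dy,
                                                   std::get<2>(c) + dz));
                    if (it == grid.end())
                        continue;
                    for (std::size_t j : it->second)
                        if (j > i && dist2(atoms[i], atoms[j]) <= max_len * max_len)
                            bonds.emplace_back(i, j);  // j > i: each pair reported once
                }
    }
    std::sort(bonds.begin(), bonds.end());  // same order as the brute-force version
    return bonds;
}
```

Both functions return the same bond list; only the number of distance checks differs, which is exactly what shows up when the load is increased.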

I have also provided various rendering options to balance visual quality against speed.

Another optimization is to skip unneeded operations (for example, computing H bonds when the input file contains no H atoms).

Another AVS/Express performance killer

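To grow a dynamic list, I was doing something like this inside a loop:

```
// Inside the loop: grow bond_type_lst by one element per iteration
num_bonds = num_bonds + 1;   // bond_type_lst is dimensioned by num_bonds
int *bond_type_lst_arr = (int *)bond_type_lst.ret_array_ptr(OM_GET_ARRAY_RW);
bond_type_lst_arr[num_bonds-1] = H_BOND;   // store the new element in the last slot
ARRfree(bond_type_lst_arr);   // release the array pointer again
```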

This was a real performance killer, so I changed it to:

```
// Inside the loop: append to a std::vector<int>
bond_type_lst_vect.push_back(int(H_BOND));
```

followed by a single copy into bond_type_lst_arr outside the loop.
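A self-contained sketch of the same two-phase pattern: accumulate in a std::vector inside the loop, then copy once. The AVS/Express array is replaced by a plain int buffer, and collect_bond_types, copy_out, SINGLE_BOND and the enum values are hypothetical. The gain plausibly comes from no longer resizing the array and fetching/releasing its pointer on every iteration, while std::vector::push_back has amortized constant cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bond type codes; H_BOND appears in the original snippet,
// but these numeric values are made up.
enum BondType { SINGLE_BOND = 1, H_BOND = 2 };

// Phase 1 (inside the loop): append to a std::vector. push_back has amortized
// O(1) cost because the vector grows its capacity geometrically.
std::vector<int> collect_bond_types(const std::vector<bool> &is_h_bond)
{
    std::vector<int> types;
    types.reserve(is_h_bond.size());  // optional: the final size is known here
    for (bool h : is_h_bond)
        types.push_back(int(h ? H_BOND : SINGLE_BOND));
    return types;
}

// Phase 2 (after the loop): one bulk copy into the destination array.
// In AVS/Express this would presumably be the array obtained once via
// ret_array_ptr() after setting num_bonds; here dst is a plain buffer.
void copy_out(const std::vector<int> &types, int *dst, std::size_t dst_len)
{
    assert(dst_len >= types.size());  // caller must size the array first
    std::copy(types.begin(), types.end(), dst);
}
```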

Together with the search algorithm change, this brought my timing down from 48 s to 0.7 s. Amazing!

Esoteric options

--file-ordering map_file

The --file-ordering option causes gprof to print a suggested .o link line ordering for the program based on profiling data. This option suggests an ordering which may improve paging, TLB and cache behavior for the program on systems which do not support arbitrary ordering of functions in an executable.

Use of the -a argument is highly recommended with this option.

The map_file argument is a pathname to a file which provides function name to object file mappings. The format of the file is similar to the output of the program nm.

To create a map_file with GNU nm, type a command like:

```
nm --extern-only --defined-only -v --print-file-name program-name
```

Not tried yet.

Code cleaning

Compiling with:

-Wall -pedantic

is really too noisy. It is better to use the following (-O is needed for gcc to detect uses of uninitialized variables):

-Wall -O

Another useful exercise is to compile the code on both Linux and Windows.

Use lint and splint to catch dubious code, and indent to normalize the code format.