Here, in no particular order, are some ideas I have tried and collected during STM3 development. For now, this page serves only as a repository.
The user environment was Linux 7.2 with gcc 2.96 and AVS/Express 6.2 – RH7
-fPIC -O3 -funroll-loops -fomit-frame-pointer -ffast-math -march=pentium -msse -mcpu=pentiumpro
I have not benchmarked my code to see the changes originating from each switch.
-DEXPRESS -D_XOPEN_SOURCE -D_BSD_SOURCE -D_REENTRANT -fPIC -march=pentium -mcpu=pentiumpro -O
So I hope to do better than this default.
sin()/cos(): how can I check whether they are builtins or regular functions (using gcc 2.96)?
sincos() computes the sin and cos of the same angle together. The only method I have found to enable it is to define __USE_GNU before #include <math.h>. Is this the correct way to use it?
for loops that copy/fill memory can be replaced with the memcpy()/memset() routines. From the string.h include file it seems that a "turbo" version will be used if you define __USE_STRING_INLINES before including <string.h>.
But anyway, the best optimization has been to change the DrawBonds algorithm! The brute-force approach was OK for small molecules, but its running time is O(n^2). The real lesson is to test with varying load levels once the code is working.
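The text does not say which algorithm replaced the brute-force search in DrawBonds. One common way to avoid comparing every pair of atoms is a uniform grid (cell list): bin the atoms into cubic cells whose edge equals the bond cutoff, then compare each atom only with atoms in its own and the 26 adjacent cells. The sketch below assumes a bond is simply "two atoms closer than one cutoff distance"; Atom, cell_key, find_bonds and cutoff are hypothetical names, not STM3's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical atom record; STM3's real data layout is not shown in the text.
struct Atom { double x, y, z; };

// Pack three cell indices into one hash key (assumes each index fits in 21 bits).
static std::uint64_t cell_key(long ix, long iy, long iz) {
    const std::uint64_t mask = (std::uint64_t(1) << 21) - 1;
    return ((static_cast<std::uint64_t>(ix) & mask) << 42) |
           ((static_cast<std::uint64_t>(iy) & mask) << 21) |
            (static_cast<std::uint64_t>(iz) & mask);
}

// Return sorted index pairs (i < j) of atoms closer than `cutoff`.
std::vector<std::pair<int, int> > find_bonds(const std::vector<Atom>& atoms,
                                             double cutoff) {
    assert(cutoff > 0.0);
    auto cell = [cutoff](double v) {
        return static_cast<long>(std::floor(v / cutoff));
    };

    // Pass 1: bin every atom into a cubic cell whose edge equals the cutoff.
    std::unordered_map<std::uint64_t, std::vector<int> > grid;
    const int n = static_cast<int>(atoms.size());
    for (int i = 0; i < n; ++i) {
        grid[cell_key(cell(atoms[i].x), cell(atoms[i].y), cell(atoms[i].z))]
            .push_back(i);
    }

    // Pass 2: compare each atom only with atoms in its own and adjacent cells.
    std::vector<std::pair<int, int> > bonds;
    const double cutoff2 = cutoff * cutoff;
    for (int i = 0; i < n; ++i) {
        const long cx = cell(atoms[i].x);
        const long cy = cell(atoms[i].y);
        const long cz = cell(atoms[i].z);
        for (long dx = -1; dx <= 1; ++dx)
        for (long dy = -1; dy <= 1; ++dy)
        for (long dz = -1; dz <= 1; ++dz) {
            auto it = grid.find(cell_key(cx + dx, cy + dy, cz + dz));
            if (it == grid.end()) continue;
            for (int j : it->second) {
                if (j <= i) continue;  // report each pair only once
                const double ex = atoms[j].x - atoms[i].x;
                const double ey = atoms[j].y - atoms[i].y;
                const double ez = atoms[j].z - atoms[i].z;
                if (ex * ex + ey * ey + ez * ez < cutoff2)
                    bonds.emplace_back(i, j);
            }
        }
    }
    std::sort(bonds.begin(), bonds.end());  // deterministic output order
    return bonds;
}
```

When atoms are spread at roughly constant density, each cell holds a bounded number of atoms, so the work grows about linearly with the number of atoms instead of quadratically. A real bond search would probably use per-element cutoffs, which this sketch leaves out.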
Also I have provided various rendering options to balance visual quality with speed.
Another optimization is to skip unneeded operations (like computing H bonds when no H atoms are present in the input file).
To allocate a dynamic list I was doing something like this inside a loop:
num_bonds = num_bonds + 1;
int *bond_type_lst_arr = (int *)bond_type_lst.ret_array_ptr(OM_GET_ARRAY_RW);
bond_type_lst_arr[num_bonds-1] = H_BOND;
ARRfree(bond_type_lst_arr);
This was a real performance killer. Then I changed it to:
bond_type_lst_vect.push_back(int(H_BOND)); followed by a copy to bond_type_lst_arr outside the loop
And, together with the search algorithm change, my timing went from 48 sec to 0.7 sec. Amazing!
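The "collect in a vector, copy once" pattern can be sketched without AVS/Express. In this minimal sketch the plain int buffer dest stands in for the pointer that ret_array_ptr() would return. The value of H_BOND and the function names collect_h_bonds and copy_to_array are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical value; the real H_BOND constant is defined elsewhere in STM3.
const int H_BOND = 1;

// Inside the loop: only append to a std::vector, which grows cheaply.
std::vector<int> collect_h_bonds(int n_found) {
    std::vector<int> bond_type_lst_vect;
    for (int i = 0; i < n_found; ++i) {
        bond_type_lst_vect.push_back(int(H_BOND));
    }
    return bond_type_lst_vect;
}

// Outside the loop: one bulk copy into the destination array.
// `dest` stands in for the array obtained from the framework.
void copy_to_array(const std::vector<int>& src, int* dest, std::size_t dest_len) {
    assert(dest_len >= src.size());
    if (!src.empty()) {
        std::memcpy(dest, &src[0], src.size() * sizeof(int));
    }
}
```

With this layout the array is requested and released once per run instead of once per bond, and the final copy is a single memcpy() rather than an element-by-element loop.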
--file-ordering map_file
The --file-ordering option causes "gprof" to print a suggested .o link line ordering for the program based on profiling data. This option suggests an ordering which may improve paging, tlb and cache behavior for the program on systems which do not support arbitrary ordering of functions in an executable.
Use of the -a argument is highly recommended with this option.
The map_file argument is a pathname to a file which provides function name to object file mappings. The format of the file is similar to the output of the program "nm".
To create a map_file with GNU "nm", type a command like:
"nm --extern-only --defined-only -v --print-file-name program-name"
Not tried yet.
Compiling with:
-Wall -pedantic
is really too much. So it is better to use the following (-O is needed to catch uses of uninitialized variables):
-Wall -O
Another exercise is to compile both on Linux and Windows.
Use lint, splint, or indent to catch dubious code and formatting.