Programming environment

Finding the right approach to resolve the potential conflict between design and programming is one of the hardest tasks to tackle. Several aspects need to be weighed against each other when developing algorithms for a sequence assembler:

execution speed
memory requirements
reliability
portability
implementation ease

In fact, even nowadays no existing combination of language and compiler provides optimum coverage of the points mentioned above; they all have positive and negative aspects. The final decision to choose C++ as the implementation language was taken in 1997 because it offered a reasonable mix of the stipulated requirements.

The common base of C and C++ compilers and the backing of C++ as a high-profile language led very early on to good optimising capabilities³⁸ as well as to reliable compilers with very few errors in the code they generated. The possibility to fall back to plain C if needed was also a positive aspect, although in the end this feature was not required.

Portability proved to be somewhat harder to attain. The C++ specification was finalised and adopted on November 14, 1997 by the ISO (International Organization for Standardization) as well as by several national standards organisations such as ANSI (the American National Standards Institute), BSI (the British Standards Institution), DIN (Deutsches Institut für Normung, the German national standards body) and others. That was a good six months after the beginning of the project, when the first algorithms had to be implemented for feasibility studies. Fortunately, the SGI MIPS compiler and the open source EGCS³⁹ were stable enough to support the decision for C++. The EGCS compiler had the undeniable advantage of being available on a multitude of different platforms, which helped to ensure portability. However, tests showed that the SGI compiler produced code at least twice as fast as that of EGCS on the then primary development platform, an SGI Origin 2000 with R10000 processors. This resulted in some workarounds within the code, as the SGI cc C++ compiler did not, e.g., support the standard C++ string class even in 1999. Then again, neither did EGCS. Keeping the code compatible with both compilers was not too hard, but portability to other platforms was still not easy: the first compilation on HP machines in 1999 revealed errors in the HP STL implementation that were hard to track down.

Speaking of bugs ... one of the nastiest kinds to encounter is a compiler bug with three properties. It appears only on specific platform architectures (e.g. SUN). It appears only when the compiler produces optimised code. It is triggered only in rare cases where internal error-checking mechanisms catch unexpected results and throw an error ... and in the course of doing so trigger the bug. When, after considerable investigation, the culprit turns out to be a bug in the compiler's optimiser (and not in one's own code), relief and anger battle each other. In the end, relief won.

In the beginning, the build environment consisted of a few simple, hand-written makefiles. Over time they grew larger and larger as they borrowed tricks from other authors' makefiles. Eventually, all of these were replaced by GNU autoconf and GNU automake for easier cross-platform portability. This does not mean that the learning curve for these tools was shallow ... quite the contrary, especially when it came to getting the build environment running on different platforms. But it all paid off in the end, as the package now builds out of the box on quite a number of different UNIX platforms.

Before this project I had virtually never worked with version control, and I had gone through some obligatory code losses by typing silly things like "rm * .o"⁴⁰. Using RCS almost from the very beginning therefore felt like a big step in the right direction. But when Thomas Pfisterer joined me at the DKFZ to write his automated editor, we also started to develop some libraries needed by both the assembler and the automated editor. Some time passed before the pain of dealing with file locks under RCS became great enough that a switch to CVS was envisioned ... and finally made.

Last but not least, even the most careful programming approaches (see the next section) could not prevent some hard-to-find bugs from sneaking into the code. One day, after a week of endless debugging sessions by Thomas and myself, one particularly nasty specimen turned out to be a "simple" pointer arithmetic problem. At that point we caved in and started to use "debugging tools for sissies": Purify and, later on, Valgrind. These tools provide runtime checks for almost every problem that can occur in typical C/C++ programs, ranging from stack errors and buffer overflows to accesses to uninitialised memory. We had always thought that we were careful enough when programming. We were therefore a bit shocked by the number of potential problems uncovered the first time we ran our program under such a tool. Needless to say, regular use of these tools has been part of our standard operating procedure ever since. This practice, together with the use of ready-made algorithm libraries such as the STL (see the next section), has improved the stability of the algorithms developed by an order of magnitude.

Bastien Chevreux 2006-05-11