Due to our interest in hardware-software codesign and embedded systems development, we concentrate on the evaluation of modern retargetable, ANSI C compatible compilers. Initially we chose four compilers for evaluation:
  • GCC - GNU C Compiler.
  • lcc - a retargetable ANSI C compiler developed at Princeton University.
  • VPO - Very Portable Optimizer (part of the Zephyr project).
  • SUIF/MachSUIF - Stanford University Intermediate Format.

    This choice was motivated by the free availability of all four compilers.

    We evaluate the compilers to assess their usability in real projects, so the following criteria are investigated:

  • completeness and stability of implementation.
  • code quality.
  • ease of retargetability.

    To characterize completeness and stability we use the notion of soundness. Ideally, a compiler is sound if it implements the standard of the language (in our case, ANSI C). Unfortunately, there is no safe method to check soundness exhaustively, so we use approximations: we measure soundness as the percentage of successfully passed tests from a broad test suite.
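
    For illustration, below is a minimal sketch of the kind of test case such a suite might contain (a hypothetical example, not taken from any actual suite). The harness compiles and runs each case and counts the PASS/FAIL outcome toward the soundness percentage; for example, a compiler passing 947 of 1000 such tests scores 94.7%.

        /* Hypothetical conformance test: ANSI C integral promotion.
           In the expression c + 1 the unsigned char operand is
           promoted to int, so the sum is 256, not 0. */
        #include <stdio.h>

        int main(void)
        {
            unsigned char c = 255;
            int result = c + 1;

            if (result == 256) {
                printf("PASS\n");
                return 0;       /* counted as a passed test */
            }
            printf("FAIL: got %d\n", result);
            return 1;
        }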

    To assess code quality we estimate the running time of generated programs. Note that we have to distinguish peak performance from general performance. The reason is that an unsound compiler (in practice, almost every compiler) can (erroneously) make optimistic assumptions about program properties; these assumptions yield more efficient code when they actually hold, and incorrect code otherwise. So, in general, we can expect soundness to decrease as optimization power increases. We treat peak performance as the running time of a narrower benchmark set, composed of those benchmarks the compiler is able to compile correctly with maximum optimizations turned on. For a completely sound compiler, peak and general performance do not differ. Additionally, where possible, we compare against a native optimizing non-retargetable compiler (the reference compiler) to estimate the performance penalty incurred by retargetability.
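
    As a concrete (hypothetical) illustration of such an optimistic assumption, consider pointer aliasing. If the optimizer assumes that two pointer parameters never refer to the same object, it may cache a value in a register and skip a reload; the code is faster when the assumption holds and wrong when it does not:

        /* Hypothetical example of an optimistic aliasing assumption.
           If the optimizer assumes *a and *b never overlap, it may
           keep *a in a register across the store to *b and return a
           stale 2; the correct result when a == b is 3. */
        #include <stdio.h>

        int scale(int *a, int *b)
        {
            *a = 2;
            *b = 3;            /* overwrites *a when a == b */
            return *a;         /* a sound compiler reloads: 3 */
        }

        int main(void)
        {
            int x;
            printf("%d\n", scale(&x, &x));   /* must print 3 */
            return 0;
        }

    A compiler that prints 2 here fails the soundness suite, although it may look faster on programs where the assumption happens to hold.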

    The hardest property to measure is ease of retargetability. We found no way to assess it other than examining the retargeting mechanism and trying to retarget the compiler to a "toy" machine architecture, so this criterion is the most subjective.

    As the basis for the test suite we chose SPECINT2000 by the Standard Performance Evaluation Corporation. We also use some additional benchmarks:

  • bzip2 -  BWT-based data compression utility, by Julian Seward.
  • gzip - LZ77-based data compressor, by Jean-Loup Gailly.
  • ranking -  Implementation of Symbol Ranking text compression algorithm, by Dmitri Lomov.

    Disclaimer: The source code above is the original source code of these programs, modified in order to run the benchmarks. The OOPS team advises you NOT TO USE programs compiled from the source code provided above in any way other than benchmarking (and especially not in production or any other critical environment). For the latest, stable and tested versions of these programs, look elsewhere.
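
    For illustration, here is a minimal sketch of a wall-clock timing wrapper of the kind that can be used to collect benchmark running times (hypothetical code, not part of SPEC; the wrapper name timeit and the sample invocation are our own examples; assumes a POSIX system, which covers both platforms below):

        /* Hypothetical timing wrapper: measures the wall-clock running
           time of a compiled benchmark on a POSIX system.
           Usage: ./timeit "./bzip2 input.ref" */
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/time.h>

        int main(int argc, char **argv)
        {
            struct timeval start, end;
            double elapsed;

            if (argc < 2) {
                fprintf(stderr, "usage: %s \"<benchmark command>\"\n", argv[0]);
                return 1;
            }

            gettimeofday(&start, NULL);
            if (system(argv[1]) != 0)          /* run the benchmark */
                fprintf(stderr, "warning: benchmark exited abnormally\n");
            gettimeofday(&end, NULL);

            elapsed = (end.tv_sec - start.tv_sec)
                    + (end.tv_usec - start.tv_usec) / 1e6;
            printf("%.3f seconds\n", elapsed);
            return 0;
        }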

    For now, we use our technique to evaluate the compilers on two platforms:

  • Linux 7.x on an Intel Pentium III, 800 MHz, 256 MB RAM, with the Intel C++ Compiler for Linux as the reference compiler.
  • Solaris 8 on a Sun UltraSPARC-IIe, 500 MHz, 256 MB RAM, with the Sun Forte Compiler Collection as the reference compiler.


    Copyright © OOPS Team
    webmaster@oops.math.spbu.ru