I am experiencing some unusual behavior in my HPC application on Linux 2.6, on two machines: Red Hat (kernel 2.6.9-42.0.10.ELsmp) on x86_64 (2x dual-core AMD 280) and Ubuntu (kernel 2.6.20-13-generic SMP) on a Core 2 Duo (dual-core Woodcrest).

My HPC application performs 1000-10000 iterations of an identical loop. The loop doesn't change between iterations, and there is no memory allocation after the program reaches the iteration loop.

The unusual behavior is this: the iterations run at about 3.5 GFLOPS per core for the first 20-30 seconds, then performance slowly drops to about 0.7 GFLOPS over 10 minutes, and after about 30-50 minutes it picks back up to 4 GFLOPS per core. Every iteration of the loop is identical, and there is no I/O (and no calls to malloc or new). My jobs run 20 to 120 minutes depending on parameters, so the performance drop is a big problem.

I have tried a bunch of things:
1) Changed the nice value to -15,
2) Set SCHED_RR and SCHED_FIFO priorities,
3) Ran the job at Linux runlevel 3,
4) Ran the job with 1 thread and with 4 threads, on both the AMD and the Intel machine.
In all cases I get the same big drop in efficiency after 20-30 seconds.

The job is fairly memory-intensive, but I'm not allocating memory after startup; it uses about 2 GB of the 8 GB available.

Any ideas on what could cause performance to vary so much across identical loops, and how to fix it? Thanks!