There will be a development board called CARMA, launching in Q2 2012. Dunno about the price, or whether they will sell it to everybody or only to special friends of NVIDIA. http://www.nvidia.com/object/carma-devkit.html
However, the current use of Cachebench in PTS is of little use. It only tests how badly the compiler optimizes when compiling with -O. The results differ a lot when better optimization is used.
These are the results I get from the Cachebench read test on an Athlon 64 X2 computer running Debian testing with gcc-4.6:
PTS result, using the default -O: 1308 MB/s
PTS result, using -Ofast: 5487 MB/s
PTS result, using -Ofast -fprefetch-loop-arrays: 8440 MB/s
Cachebench is very sensitive to compiler optimizations. Building with plain -O mostly tests which optimizations that compiler version enables at -O, and which CPU it tunes for by default. Different versions of gcc include different optimizations, and different Linux distributions set different default tuning options.
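To make the point concrete, here is a minimal sketch of the kind of read loop such a test boils down to. This is not Cachebench's actual code; the names read_pass and read_bandwidth_mb_s are made up for illustration, and the compiler barrier is GCC/Clang-specific. How well the inner loop gets unrolled, vectorized or prefetched depends entirely on the flags, which is why -O, -Ofast and -Ofast -fprefetch-loop-arrays can report such different MB/s on the same machine.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not taken from Cachebench: read every word of buf
 * `passes` times and return the running sum, so the compiler cannot throw
 * the reads away as dead code. */
uint64_t read_pass(const uint64_t *buf, size_t n, unsigned passes)
{
    uint64_t sum = 0;
    for (unsigned p = 0; p < passes; p++) {
        /* This inner loop is what the optimizer unrolls/vectorizes. */
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
        /* GCC/Clang-specific barrier: forces buf to be re-read on the
         * next pass instead of reusing the previous pass's result. */
        __asm__ volatile("" ::: "memory");
    }
    return sum;
}

/* Time read_pass and return the observed read rate in MB/s
 * (1 MB = 10^6 bytes here). The sum is handed back through sum_out. */
double read_bandwidth_mb_s(const uint64_t *buf, size_t n, unsigned passes,
                           uint64_t *sum_out)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_out = read_pass(buf, n, passes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    double bytes = (double)n * sizeof(uint64_t) * (double)passes;

    /* Guard against a zero interval on very small buffers. */
    return secs > 0.0 ? bytes / secs / 1e6 : 0.0;
}
```

Calling read_bandwidth_mb_s from a small driver and building that same source once per flag set is enough to see the spread; the hardware is identical in every run, only the generated loop changes.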
From a hardware test point of view, you should either run the same binary everywhere, or find the flags that give the best result on each machine.
Instead we now see results from different compilers with suboptimal optimization and tuning flags and, even worse, you don't state which compiler was used or its default tuning options.
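One cheap way to get at least part of that information into the results is to have the benchmark binary report how it was built. A minimal sketch, assuming GCC or Clang: build_info is a hypothetical helper, while __VERSION__, __OPTIMIZE__ and __FAST_MATH__ are macros those compilers predefine. As far as I know the exact -O level and the -march/-mtune defaults are not exposed this way, so the full command line still has to be recorded separately.

```c
#include <stdio.h>

/* Hypothetical helper: describe the compiler and optimization state this
 * binary was built with. Writes at most `size` bytes into `out` and
 * returns what snprintf returns (the length the full string needs). */
int build_info(char *out, size_t size)
{
#if defined(__VERSION__)
    const char *compiler = __VERSION__;   /* predefined by GCC and Clang */
#else
    const char *compiler = "unknown";
#endif

#if defined(__OPTIMIZE__)
    const char *optimized = "yes";        /* set when any optimization is on */
#else
    const char *optimized = "no";
#endif

#if defined(__FAST_MATH__)
    const char *fast_math = "yes";        /* set by -ffast-math, which -Ofast implies */
#else
    const char *fast_math = "no";
#endif

    return snprintf(out, size, "compiler: %s, optimized: %s, fast-math: %s",
                    compiler, optimized, fast_math);
}
```

Printing this string next to each MB/s figure would at least show when two results came from different compilers or different optimization settings.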
It is quite obvious that 1308 MB/s is a useless result when better optimization gives you 8440 MB/s, about 6.5 times as much, on the same hardware.
So the Tegra only wins a couple of these comparisons... we probably need to widen the search space. It is interesting how much the Panda performance varied. We couldn't run Crafty because of a dependency failure (libnuma-dev)...