Guys, all this shows is that the benchmark isn't valid as it's being run.
Post #31 at least tried multiple runs on the same rom/hardware, and various roms on the same hardware.
The only way to make the benchmark meaningful is to control for every variable (i.e., every reg setting that changes the score).
Once we have a set of reg settings and a test procedure that we all agree on, we can generate meaningful numbers (looks like +/- 1% across trials for normal variance) and compare them. That will tell you which rom OS is performing better or worse. Most likely the results will differ by only a bit more than the noise (maybe a 5% spread).
Me personally, I don't care.

But seriously, if you want to do this, that's the way, and VreebieZ has generated the most useful data so far. Note how all his "good" results were in the ballpark of 2500 (+/- 50, or 2%) - you'd be REALLY hard pressed to notice one device being 2% faster than another as a user.
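
If anyone wants to sanity-check their own numbers, here's a rough Python sketch of the comparison I'm describing. The scores are made up for illustration, and the function names and the "half the min-to-max range" noise measure are my own assumptions, not part of any benchmark tool. The idea is just that two roms only count as different if the gap between their averages is bigger than the run-to-run noise.

```python
from statistics import mean

def summarize(scores):
    """Return (average, noise as +/- percent) for repeated runs of one rom on one device."""
    avg = mean(scores)
    half_spread = (max(scores) - min(scores)) / 2  # the "+/- 50" style figure
    return avg, 100 * half_spread / avg

def differs_beyond_noise(scores_a, scores_b):
    """True if the gap between the averages exceeds the combined run-to-run noise."""
    avg_a, pct_a = summarize(scores_a)
    avg_b, pct_b = summarize(scores_b)
    noise = avg_a * pct_a / 100 + avg_b * pct_b / 100
    return abs(avg_a - avg_b) > noise

# Hypothetical scores, same device and same reg settings for every run
rom_a = [2450, 2500, 2550]
rom_b = [2480, 2510, 2540]
rom_c = [2700, 2750, 2800]

avg, pct = summarize(rom_a)
print(f"rom_a: {avg} +/- {pct:.1f}%")
print("a vs b differ:", differs_beyond_noise(rom_a, rom_b))
print("a vs c differ:", differs_beyond_noise(rom_a, rom_c))
```

With numbers like rom_a and rom_b, the difference is buried in the noise, which is the situation most of the results in this thread are in.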