Repost from the "Friends of jClarity" mailing list, so that non-subscribers can read it...
I'm coming slightly late to this thread (out of town on family business).
Just a couple of observations:
1) The distinction between a benchmark and a microbenchmark (MBM) needs to be kept clear. The scale is one of the critical factors which makes MBMs such tricky things. I don't think anyone would argue that *benchmarks* at a macro scale aren't useful. It's MBMs specifically that cause problems.
2) The point about Caliper is well taken, and is what I was alluding to in my "9 Fallacies" InfoQ article (http://www.infoq.com/articles/9_Fallacies_Java_Performance) last week. The Caliper tool was developed by a team who had a specific need for it inside Google.
To tease this out a little more, I always remind myself that application programmers have different needs from library developers, who have different needs from platform developers (who in turn have different needs from kernel developers, but that's another story).
If you are writing library code, which will be used by a wide range of unknown client code, you must make compromises and test both the most representative average case you can and any known extremal cases. This is very hard to do, even with a code corpus of the size that the internal folks at Google seem to have. In this case, an MBM is a useful substitute for some of this general-case reasoning - provided its limitations are understood.
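To make that concrete, here's a deliberately crude sketch of timing an average case against a known extremal case (all names are mine, and it ignores warmup control, dead-code elimination and run-to-run variance - exactly the things a proper harness exists to handle):

    import java.util.Arrays;
    import java.util.Random;

    // Crude illustrative harness: times Arrays.sort() on a "typical"
    // random input and on an already-sorted extremal input. Not a
    // rigorous MBM - just a sketch of the average-vs-extremal idea.
    public class SortProbe {

        // Returns the best observed time over 'runs' repetitions.
        static long timeSortNanos(int[] template, int runs) {
            long best = Long.MAX_VALUE;
            for (int i = 0; i < runs; i++) {
                int[] copy = template.clone();   // fresh input every run
                long t0 = System.nanoTime();
                Arrays.sort(copy);
                best = Math.min(best, System.nanoTime() - t0);
            }
            return best;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            int[] average = new int[100_000];
            for (int i = 0; i < average.length; i++) {
                average[i] = rnd.nextInt();      // "average" case: random data
            }
            int[] extremal = average.clone();
            Arrays.sort(extremal);               // extremal case: already sorted

            timeSortNanos(average, 20);          // very crude warmup
            System.out.println("random input (ns): " + timeSortNanos(average, 20));
            System.out.println("sorted input (ns): " + timeSortNanos(extremal, 20));
        }
    }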
The Caliper team seem to have done a decent job & to understand what they're doing. However, I have not yet had time to fully review the source code & make sure I understand its choice of compromises - and there is a general rule here:
"Never trust anything that can think for itself if you can't see where it keeps its brain" - Arthur Weasley
3) Do the maths. The statistical aspects of MBMs matter a lot. If you can't remember your descriptive statistics courses, take a course or buy a book (one which works with real data).
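As a minimal example of what I mean by "do the maths", the sketch below (sample values invented for illustration) computes the handful of descriptive statistics that any MBM write-up should report before drawing conclusions:

    import java.util.Arrays;

    // Bare-minimum descriptive statistics over a set of MBM timings.
    // The sample data is invented; substitute your own raw results.
    public class Stats {
        public static void main(String[] args) {
            double[] samples = {
                1015, 998, 1020, 2400, 1003, 1011, 990, 1007, 1050, 1002
            };
            Arrays.sort(samples);

            double mean = Arrays.stream(samples).average().orElse(Double.NaN);
            double var = Arrays.stream(samples)
                               .map(x -> (x - mean) * (x - mean))
                               .sum() / (samples.length - 1);  // sample variance
            double median = samples[samples.length / 2];
            double p90 = samples[(int) Math.ceil(0.9 * samples.length) - 1];

            System.out.printf("mean=%.1f sd=%.1f median=%.1f p90=%.1f%n",
                    mean, Math.sqrt(var), median, p90);
            // A mean pulled well away from the median (as here, by the
            // 2400 outlier) is the first hint that the distribution is
            // not what you assumed.
        }
    }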
To get me to take an MBM really seriously, I want to see a zip file containing source code, test harness, full description of environment & all results, preferably in a form which can be easily loaded into the tool of my choice. I can then independently rederive your conclusions to my own satisfaction.
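Something as simple as the following would go a long way - raw CSV results plus a dump of the environment, loadable into R, a spreadsheet, or whatever you fancy (the file names and layout here are just my assumptions):

    import java.io.IOException;
    import java.io.PrintWriter;

    // Sketch: dump raw results and environment details in plain formats
    // so a reader can independently re-derive the conclusions.
    public class ResultDump {
        public static void main(String[] args) throws IOException {
            long[] timingsNs = { 1015, 998, 1020, 1003 }; // from your harness

            try (PrintWriter csv = new PrintWriter("results.csv")) {
                csv.println("run,elapsed_ns");
                for (int i = 0; i < timingsNs.length; i++) {
                    csv.println(i + "," + timingsNs[i]);
                }
            }
            try (PrintWriter env = new PrintWriter("environment.txt")) {
                env.println("java.version=" + System.getProperty("java.version"));
                env.println("java.vm.name=" + System.getProperty("java.vm.name"));
                env.println("os.name=" + System.getProperty("os.name"));
                env.println("os.arch=" + System.getProperty("os.arch"));
                env.println("processors=" + Runtime.getRuntime().availableProcessors());
            }
        }
    }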
The point behind this is simply to guard against the Feynman Rule:
"The first principle is that you must not fool yourself - and you are the easiest person to fool" - Richard Feynman
I might coin a rule of my own (although doubtless someone else got there first...):
"A good microbenchmark is 3% clever programming and 97% data analysis"
(As a complete aside, for other great examples of the (ab)use of statistics, see Ben Goldacre on clinical medical trials & the data that comes out of them.)
Thanks,
Ben