Tuesday 30 April 2013

Reposted for Visibility

Repost from the "Friends of jClarity" mailing list, so non-subscribers could read it....


I'm coming slightly late to this thread (out of town on family business). 

Just a couple of observations:

1) The distinction between a benchmark and a microbenchmark needs to be kept clear. The scale is one of the critical factors which makes MBMs such tricky things. I don't think anyone would argue that *benchmarks* at a macro scale aren't useful. It's MBMs specifically that cause problems.

2) The point about Caliper is well taken, and is what I was alluding to in my "9 Fallacies" InfoQ article (http://www.infoq.com/articles/9_Fallacies_Java_Performance) last week. The Caliper tool was developed by a team who had a specific need for it inside Google. 

To tease this out a little more, I always remind myself that application programmers have different needs than library developers, who have different needs to platform developers (who have different needs to kernel developers, but that's another story). 

If you are writing library code, which will be used by a wide range of unknown client code, you must make compromises and test for both the most average case you can as well as any known extremal cases. This is very hard to do, even with a code corpus of the size that the internal folks at Google seem to have. In this case, an MBM is a useful substitute for some of this general-case reasoning - providing its limitations are understood. 

The Caliper team seem to have done a decent job & to understand what they're doing. However, I have not yet had time to fully review the source code & make sure I understand its choice of compromises - and there is a general rule here:

"Never trust anything that can think for itself if you can't see where it keeps its brain" - Arthur Weasley

3) Do the maths. The statistical aspects of MBMs matter a lot. If you can't remember your descriptive statistics courses (which must use real data), do a course or buy a book.

To get me to take an MBM really seriously, I want to see a zip file containing source code, test harness, full description of environment & all results, preferably in a form which can be easily loaded into the tool of my choice. I can then independently rederive your conclusions to my own satisfaction. 

The point behind this is simply to guard against the Feynman Rule:

"The first principle is that you must not fool yourself - and you are the easiest person to fool" - Richard Feynman

I might coin a rule of my own (although doubtless someone else already got this one...)

"A good microbenchmark is 3% clever programming and 97% data analysis" 

(As a complete aside, to see other great examples of the (ab)use of statistics, see Ben Goldacre on clinical medical trials & data from those.)

Thanks,

Ben

On Copyrightable APIs

Recently, Cloudbees wrote a blog post talking about the copyrightability of APIs, and particularly as it relates to the second round of litigation between Oracle & Google.

As the CEO of a startup, I am extremely concerned about ensuring that playing fields remain level, and that new, potentially lucrative markets remain with low barriers to entry for startups.

However, as I discussed last year, I remain conflicted about this issue. I'm concerned that if APIs are not copyrightable, then there is no legal recourse to prevent a company producing a partial implementation of a standard. This then could be used to weaken exactly the freedom to move between vendors that Sacha & Steve were discussing.

I'm also reminded of the "Embrace, Extend & Extinguish" strategy famously used by Microsoft.

I know that Cloudbees & I share a lot of goals & motivations - to promote an ecosystem which is open to new entrants, which is supportive of developers, and which provides protection for end-users (who may be making substantial investments in the technology).

I understand the risk of barriers to entry, but for end-users the fragmentation of standards could conceivably be worse.

I wonder what else we can do to move this discussion on - in particular, is there any quantitative evidence (or failing that, past examples) that can help us here? Sacha & Steve - any thoughts?

Standard Disclaimer: I don't have any real dog in the Oracle / Google fight. I represent the London Java Community on the Java Community Process Executive Committee, and our position is to promote open standards, which can be implemented by open-source projects on a royalty-free (RF) basis. I have many friends at both Oracle & Google, have spoken at events organised by both companies many times, and hold stock in neither firm.

Thursday 18 April 2013

Nashorn Thoughts

Recently, we ran a HackDay on Nashorn as part of Devoxx UK.

Nashorn (pronounced NAS-horn, not NASH-horn) is a new implementation of Javascript that will ship with Java 8. It's very cool - all implemented with invokedynamic & passes the ECMAScript spec 100% (no other impl can do this).

It's already landed on OpenJDK 8 mainline & is in most cases, pretty solid. We found a couple of surprising things & minor bugs with it during the HackDay, but nothing major.

On the whole, it's pretty awesome - the Java interop is great, and it's a way of writing Javascript which doesn't feel too clunky.

In fact, I'd really like to start using Nashorn as a teaching aid to do outreach to web programmers who don't know the Java platform at all.

One slight problem is that jjs (the Nashorm shell) isn't actually a very good REPL. In particular, it doesn't handle multi-line cut & paste & has no history.

So, I decided that the simplest way to try to fix that was to use rlwrap (thanks to Richard Warburton for discussion of this point). This is a program which takes a command name (plus args) as arguments, and runs them through the readline library, thereby adding support for history, etc even to programs which don't support them out of the box.

rlwrap is, for example, often used by Oracle DBAs to add history support to sqlplus.

Here, then, is where this tale of woe truly begins.

Mac OS X does not ship rlwrap. In fact, it doesn't even ship the GNU Readline library on which rlwrap strongly depends. Instead, it ships a library which has the same name, but is not fully compatible.

All of this means that rlwrap will not build cleanly from source on OS X.

To fix this, I downloaded GNU Readline & compiled it as a static library (actually 2 static libs - libhistory.a & libreadline.a).

Then I set the include directory of rlwrap to ignore the bogus readline.h in /usr/include & instead use the correct one (contained in the source directory of GNU Readline).

Rerunning make from the rlwrap build directory now correctly compiled all of rlwrap, but the final link still failed (because it was attempting to use the bogus libreadline.dylib which OS X ships).

However, at this point, I could now take the final link command, and modify it - removing the -lreadline switch & instead adding in libreadline.a and libhistory.a as additional object files that just need to be included in the final binary. This results in a semi-statically linked rlwrap which actually works.

With all of this yak-shaving done (& yes, I do know that MacPorts ships an rlwrap, but I have been bitten by MacPorts way too many tines to trust it. I'd rather actually understand what was happening on my system - there are package managers I fully understand. MacPorts is not one of them), we were finally read to go:

rlwrap jjs

Aaaaaand - it fails. Well, it works correctly with history & so forth - but one of the critical features for me (multiline c&p) just doesn't work, even under rlwrap.

So, that's kind of back to the drawing board.
  • I'm sure we need a better REPL for interacting with Nashorn. 
  • I'm prepared to spend some cycles on it.
  • I think I have a decent starter set of use cases for it
  • I need some help
  • Especially from people who are approaching this from the POV of not knowing the Java platform or the JVM
  • Errr, that's it...
So - who wants to help? I think this could be quite a cool little project...

Saturday 23 March 2013

Emacs Redux

Recently, I've discovered myself spending more & more time in Emacs.

This post is really about why that's the case, and how I have Emacs configured, in case it's of interest or use to someone else.

I used to use Emacs a long time ago - back when Perl & C were my primary languages, and dinosaurs still walked the Earth. I largely stopped around 2000, when I started writing a certain amount of Java, and using Eclipse as a primary environment.

These days, I find myself doing a fair amount of work with the OpenJDK codebase. This is a particular pain to write in an IDE. This is because to work properly, the IDE needs to hook a working version of the platform & fully support all of its features.

Now, if I'm working on the platform itself, then that self-consistency and support is not present. So I'm going to get, at best, a bunch of failures around the new features.

So a lot of the typical reasons why IDEs are useful are absent - so basic syntax highlighting is pretty much all we can hope for, and Emacs has pretty good support for that.

In terms of application code, whenever possible I like to write Clojure. These days, there is excellent support in Emacs for Clojure, including the NREPL for a very nice interactive environment.

To get clojure-mode installed, follow the instructions from here. However, the sections about inferior-lisp-mode and slime are obsolete - you should use NREPL instead.

I'm also tending to write articles & other documents in Markdown, which doesn't really need much in the way of WYSIWYG support (although I spent a tiny amount of money on Marked.app for the times when I do want to see what my .md files look like, or export to PDF).

My primary machine is a MacBook Pro, so I use the Emacs from here. I tend not to use package managers for OS X, so I don't use brew or macports.

Finally, I've been pleasantly surprised by many aspects of the modern Emacs experience. The Clojure integration has come a long way & it's now a joy to use. For OpenJDK development, it's almost as good as a full IDE.

However, there are still some rough edges. For example, Emacs doesn't ship with a spellchecker by default, and I have so far been unable to get one working - which is a medium-sized pain when writing Markdown. Instructions on how to get it working in comments would be very welcome.


Thursday 21 March 2013

A Starting Point

So, to ease myself back into writing, I'm going to cheat, and start by repeating some very good advice I heard recently.

This is by Rich Hickey, the inventor of Clojure, and I think it's a near-perfect statement of how I would like mailing list interactions to work (& how I aspire to behave myself).

Thanks Rich - over to you:

"This is just a reminder. While in general our communication here is very good, occasionally it goes astray.

These mailing lists are run by, and for, people who make things. Most messages should have one of these forms:

I made something - here is my contribution
I am trying to use the thing someone made and am having trouble, please help.
I can help you with that thing someone made.
I am trying to make something and am having trouble, please help.
I can help you make something.

They are not the place for opinion pieces and diatribes.

They are not the place for advocacy about what 'ought' to be made. If you think something ought to be made, then make it. Otherwise, respect others peoples' right to choose what they do with their time.

Occasionally, there may be disagreements about how something has been, or will be, made. These disagreements should take the form of technical arguments. To make a technical argument that gets (and gives!) respect:

Keep it short
Stick to the facts
Use logic
Leave people out of it
Avoid rhetorical devices:
        Superfluous or opinion-laden adjectives
        Claims to speak for the community, or that everyone agrees with you.
        Threats of what will happen unless things go your way
        Any flavor of 'the sky is falling'

If you are not the one making something, you should restrict your input to very short technical arguments supporting your position. If someone has already made your point, just +1 it.

Please keep your posts short.

Ignoring these guidelines fails to respect the time and effort of people who make things, which you should care about if you intend to be one.

Thanks,

Rich"

If you've read down this far, please leave me a comment with a link to other good advice you've heard recently.

Still Not Dead

Can't believe how long it's been since I last posted here.

A number of times I have wanted to start blogging again but for one reason or another, haven't managed it.

But let's give it a go - I am still crazily busy, but I do have things to say.