This is important both for general optimisation and for implementing languages like Scheme on the JVM. The Scheme case matters particularly, because the Scheme standard requires implementations to optimise tail calls (tail recursion is the standard idiom for implementing iteration in Scheme).
The basic form of a tail call that we would want to optimise looks like this in C:
void foo() {
    // stuff
    // ....
    bar();
    return;
}
The call to bar() is followed immediately by a return, so if we don't optimise, we're growing the stack unnecessarily. The idea of TCO is to rewrite the current stack frame (and so avoid growing the stack) and then issue an unconditional jump to start executing the callee.
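To make the "rewrite the frame and jump" idea concrete, here is a minimal sketch in Java. The class and method names (TailCallSketch, sumTo, sumToLoop) are made up for illustration and are not taken from the patches. The first method is written in tail-recursive style; the second is roughly what the optimisation would amount to if done by hand: overwrite the arguments in place and jump back to the start of the method.

```java
public class TailCallSketch {
    // Tail-recursive form: the recursive call is the last thing the method
    // does, so nothing in the current frame is needed after it returns.
    static long sumTo(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumTo(n - 1, acc + n); // tail call
    }

    // Hand-written equivalent of what TCO would do: reuse the same frame by
    // overwriting the arguments, then jump back to the top of the method.
    static long sumToLoop(long n, long acc) {
        while (n != 0) {
            acc += n;
            n -= 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Kept shallow on purpose: without TCO every call adds a frame.
        System.out.println(sumTo(1_000, 0));
        // The loop form uses one frame regardless of n.
        System.out.println(sumToLoop(10_000_000, 0));
    }
}
```

On an unpatched JVM the recursive form adds one frame per call, so a large enough n will eventually end in a StackOverflowError, whereas the loop form runs in constant stack space.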
Now, in Java we can't modify the current stack frame directly, so we instead need some bytecode that tells the JVM "this call must be implemented as a hard tail call". Supporting this requires changes to both the compiler and the interpreter, and it is currently implemented as follows.
The bytecode for an optimised tail call would look something like this in the patched HotSpot:
0xc4 wide
0xb6 invokevirtual
0x00 // high byte of index of method to call
0x03 // low byte of index of method to call
0xb0 areturn
The wide bytecode indicates that the opcode immediately following it (the embedded instruction) is operating in a variant mode. A correct interpreter will verify that the wide instruction is well formed, parse it and then evaluate the embedded instruction.
That is, the embedded instruction is considered an operand to wide - not all forms which follow it are legal bytecode as they stand - so in particular, there should never be a jump directly to an embedded instruction. See this link for more details.
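As a rough illustration of "the embedded instruction is an operand to wide", here is a minimal sketch of a decoder for the five-byte sequence shown earlier. The class and method names (WideDecodeSketch, decodeAt) and the output strings are hypothetical, and the invokevirtual case reflects the experimental form described in this post, not anything in the standard JVM specification. The sketch does not decode the standard wide forms or perform any verification.

```java
public class WideDecodeSketch {
    static final int WIDE = 0xc4;
    static final int INVOKEVIRTUAL = 0xb6;

    // Describes the instruction that starts at pc. The embedded opcode and
    // its index bytes are read as operands of wide, never as an instruction
    // that could be reached on its own.
    static String decodeAt(byte[] code, int pc) {
        int opcode = code[pc] & 0xff;
        if (opcode != WIDE) {
            return String.format("opcode 0x%02x (no wide prefix)", opcode);
        }
        int embedded = code[pc + 1] & 0xff;
        if (embedded != INVOKEVIRTUAL) {
            // Standard wide forms are out of scope for this sketch.
            return String.format("wide with embedded opcode 0x%02x (not decoded here)", embedded);
        }
        // Two-byte constant pool index, high byte first.
        int index = ((code[pc + 2] & 0xff) << 8) | (code[pc + 3] & 0xff);
        return "tail-call invokevirtual #" + index;
    }

    public static void main(String[] args) {
        // wide, invokevirtual, index high byte, index low byte, areturn
        byte[] code = {(byte) 0xc4, (byte) 0xb6, 0x00, 0x03, (byte) 0xb0};
        System.out.println(decodeAt(code, 0)); // prints "tail-call invokevirtual #3"
        System.out.println(decodeAt(code, 4)); // the areturn that follows
    }
}
```

Note that the decoder is only ever asked about offsets 0 and 4: offset 1 (the embedded invokevirtual) is not a valid place to start decoding, which is the same reason a jump must never target it.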
The form of wide above is a new, experimental form for supporting TCO. If you look at the current documentation for wide, you'll see that the embedded opcodes it currently supports are all loads, stores, iinc or ret, so a form which supported invoke* would have to be a new one. To become standardised, the TCO change needs to pass the JCP, which first requires evidence that it will be of sufficient benefit to justify the work vendors would have to do to support it. So it might be a while - this is a fairly hot-off-the-presses change.
Next up, I'll discuss how to get the TCO patches switched on (and how to confirm that they are), along with test results for some preliminary cases.