Sunday 16 August 2009

Thanks For All The Fish

One of the major themes of the blog so far has been my attempts to write a version of (perhaps a subset of) Perl 5 that will run on the JVM, and/or find out what makes this a difficult exercise. And, to have fun while doing so.

For reasons which I outline in this post, I think it's time to give up, and I would strongly encourage anyone reading who is tempted to have a go at carrying on where I'm leaving off to think again. The code we wrote is out there, but I would suggest that anyone who's really interested contact me first.

So, here's the lowdown on the problems we faced.

There are really two separate sets of issues - first of all there are issues related to nullability.

Because Perl does not require brackets around function parameters, and has functions which do not need to specify their prototypes, then this line of code:

my $a = dunno + 4;

can be parsed in one of two ways:
  • $a gets 4 + the result of calling the function dunno(), which takes no arguments. That is, + is treated as a binary operator
  • $a gets the result of calling dunno(4), where dunno takes at least one argument. That is, the + is treated as a unary operator on 4.
This line of argument is expanded upon significantly by Jeffrey Kegler at http://www.perlmonks.org/?node_id=663393 - where he links it to Halting. I have not fully satisfied myself of the full implications yet, but the initial points amount to a hugely significant parsing problem, with no real good solution.

The second source of problems is that Perl 5 is old (1994) - and dates from a time when automated language tools were rather more lacking than they are today. When Larry Wall was working on the first versions of p5, lex and yacc were pretty much the state of the art in terms of what was practical for autogeneration, and a skilled practitioner could outperform them by modifying the output, or writing from scratch.

Perl wasn't written with formal parser theory in mind, and has now reached the stage where the implementation is really all we have. It does not fit well into a rigorous model of language, and during its development flexible language features were considered to be more important than linguistic concerns (such as static analysis).

Simply put, there's no grammar, and attempting to write something which matches the only existing implementation is a major undertaking - no existing automated language tools will help much, it's largely a matter of needing to completely reimplement the existing C code in Java (or bytecode). This is a huge amount of work, if it's possible at all, and is not going to be fun - and will be likely to be very frustrating for a large chunk of the time spent on it.

So, here we are. I've had a lot of fun working on this (and particular thanks to James Laver and Shevek, both of whom provided insight, help and encouragement - and to the many other people in the Perl, Java and Ruby worlds with whom I had interesting and sometimes amazing conversations) and I'd like to close with a short summary of what I've learned from this project:
  • Too much backwards-compatibility is a huge millstone
  • Always ensure that the people you're talking to have the same definitions you do
  • If you're going to use formal language, you must have proofs available. Declaring a problem to be in a particular class by fiat does not help anyone.
  • Perl's block / closure / subroutine definitions are too overlapping and unclear. This is a major problem
  • Indirect object syntax in Perl 5 was a misstep
So, that's it for now.

I'll be moving on to other problems on my stack now, so my next posts will be about broader topics than just language design / implementation but I'm sure I'll return to language design in due course - after all, I just can't seem to stay away from it.

2 comments:

Curt said...

Sorry for coming to the party late. Please expand a bit on what your goals were, so I can understand your words of caution. Since Perl does it, it must be possible. What about a multi-phase approach like Hot-spot? In other words, interpret everything, profile as you go, and compile based on feedback from the profiler. Sorry I'm not in the field, so I don't know the proper term for this.

Ben said...

Hi Curt,

The goals were (are? - see forthcoming post) to produce an alternative implementation for "as much of a subset of perl5 as practicable" - this is deliberately vague, as we know there are issues with the language structure that cause problems.

Ultimately we would like to have a parser and either an interpreter or compiler written in the target platform.

There is new news, however - please read the post I'm about to put up for more details.