Monday 20 April 2009

State of Play for Perlish PoC

First off, the slides from my talk are here.

A tarball of a Mac MLVM (OpenJDK7 with invokedynamic) dating from 2009-04-10 is here.

So, where are we with the code?
The current codebase uses SableCC, an automated parser generator, to transform an EBNF grammar into a parser, lexer, etc.

I'm not saying that's the most powerful way to do it, or that it will ultimately suffice for a reasonable stab at getting Perl onto the JVM. It was simply what was to hand: I had a modicum of experience with it, and it let me get up to speed with some necessary chunks of parser theory in order to get off the ground.

My current grammar has a number of deficiencies, mostly because I cobbled it together by cribbing from some pre-existing grammars which parse Java and PHP.

Longer term we may need either or both of: a more subtle grammar, and a parser written in Java (basically, a port of the existing C-based parser to Java, but one which tries to stay out of the later phases of compilation).

In terms of code generation, I have two branches - one which uses a homegrown ASM codegen, and one which translates the AST to JRuby's AST, and then uses JRuby's codegen (which is also based on ASM).

Going forward, the semantic differences between Perl and Ruby (notably Everything-Is-An-Object and string handling) probably make AST translation not viable as a long-term strategy.

However, one thing which did occur to me is that if parts of the JRuby codegen libs could be refactored out into a separate package that we could use for Perl / other langs, that would be helpful.

In addition, when designing the AST for use with a homegrown ASM-based codegen, a good hard look at the JRuby AST seems like a good plan - the scoping constructs that they use are directly relevant to Perl, for example (although I'm aware that for performance reasons, they may need to optimise how they handle scope constructs).
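To make the scoping point a bit more concrete, here is a minimal sketch of a chained lexical scope of the sort an AST walker or codegen might carry around. The names (LexicalScope, declare, lookup) are made up for illustration and are not JRuby's actual classes; a real implementation would almost certainly resolve variables to slots at compile time rather than do map lookups at runtime.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: not JRuby's scope classes, just the general shape.
final class LexicalScope {
    private final LexicalScope parent;   // enclosing scope, null at file level
    private final Map<String, Object> vars = new HashMap<>();

    LexicalScope(LexicalScope parent) {
        this.parent = parent;
    }

    // Roughly what `my $x = ...` does: bind in the innermost scope only.
    void declare(String name, Object value) {
        vars.put(name, value);
    }

    // Walk outwards through enclosing scopes; null if the name is undeclared.
    Object lookup(String name) {
        for (LexicalScope s = this; s != null; s = s.parent) {
            if (s.vars.containsKey(name)) {
                return s.vars.get(name);
            }
        }
        return null;
    }
}
```

An inner block that declares its own $x shadows the outer one without disturbing it, which is the behaviour Perl's `my` needs.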

Places to get started

  • A better grammar. One quick way to improve the current situation is to improve the quality of the EBNF grammar. The current grammar is here. It may not be the long term plan, but making progress with what we've got in advance of a parser port should help to figure out issues and will help momentum.

  • Design session for how to handle Perlish dispatch in full generality. This is probably the most fundamental design issue which needs to get nailed right now. I have some ideas about how to do it, but they need validating and the input of others. If there's interest, I suggest we get together in the back room of a pub or cafe in London one afternoon and thrash this out.

  • Test cases. Having a set of test cases (grouped by complexity - ie low-hanging fruit first) would be very useful. Ultimately, we want to run as much of the test suite as possible, but little acorns are the first step...

  • Starting up a wiki or similar to track known syntax and semantic issues (eg the semantic issues around 'new', GC, etc)

  • Help from a guru who understands the OP representations. This would be really useful in starting to think about the ultimate form of the parser.



If any of these are appealing, especially the dispatch design task, please get in touch.
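Purely as a strawman to argue with on the dispatch task, here is a rough sketch of the simplest case: resolving a method name by looking in a package and then walking its @ISA list depth-first, and handing back a method handle of the kind an invokedynamic call site could be linked to. Everything here is an assumption for illustration: the PerlPackage and DispatchDemo names are invented, the java.lang.invoke package name is assumed (MLVM builds may expose the API under a different package), and AUTOLOAD, caching, invalidation when @ISA changes, and cycle detection are all left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a Perl package: a symbol table of subs plus @ISA.
final class PerlPackage {
    final String name;
    final List<PerlPackage> isa = new ArrayList<>();
    final Map<String, MethodHandle> subs = new HashMap<>();

    PerlPackage(String name) {
        this.name = name;
    }

    // Depth-first, left-to-right search through @ISA.
    // Returns null if nothing is found (a real version would try AUTOLOAD).
    MethodHandle resolve(String method) {
        MethodHandle mh = subs.get(method);
        if (mh != null) {
            return mh;
        }
        for (PerlPackage parent : isa) {
            mh = parent.resolve(method);
            if (mh != null) {
                return mh;
            }
        }
        return null;
    }
}

final class DispatchDemo {
    // Stand-in for a compiled Perl sub; the invocant is the first argument.
    static String speak(Object self) {
        return "generic noise";
    }

    static String run() {
        try {
            MethodHandle speak = MethodHandles.lookup().findStatic(
                    DispatchDemo.class, "speak",
                    MethodType.methodType(String.class, Object.class));

            PerlPackage animal = new PerlPackage("Animal");
            animal.subs.put("speak", speak);

            PerlPackage dog = new PerlPackage("Dog");
            dog.isa.add(animal);   // our @ISA = ('Animal');

            // Dog has no speak of its own, so resolution falls through to Animal.
            return (String) dog.resolve("speak").invoke((Object) "rex");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

The interesting questions start where this sketch stops: what happens when the symbol table or @ISA is modified at runtime, and how much of the lookup can be cached at the call site.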
