Tales From Boxcat Junction

Sunday, 28 February 2010

Thoughts on Free / Open Source Software

This post is intended to provide a basic introduction to some of the concepts and motivations of Free / Open Source Software (F/OSS). The intended audience is students who are on the path to becoming professional developers - either final year undergrads or recent graduates beginning their first job.

A Definition of Open Source
Open Source software is software which meets these criteria:

Everyone can download and use the software as-is for any purpose they like, without any royalty or license payments.
Everyone can study the source code and make changes if they like.
Everyone can give the software to anyone they like, with or without changes (but you can't take credit for things you didn't write, and copyleft licences such as the GPL also require you to provide source code and pass on the same rights as you received).

Notice that the above definition leads to a situation where, for practical purposes, Open Source software is made available without charge - because even if the original developer asked for money (eg to cover the cost of bandwidth), anyone who downloaded it could redistribute it free of charge.

Free Software and Open Source Software
There are two major points of view regarding what is the most important aspect of this movement in the software world. One viewpoint is that the access to the source code and the availability of the software without charge are the most important aspects. This is usually the view that businesses are most interested in when they consider Open Source.

However, the other viewpoint chooses to emphasise the user’s intellectual freedoms to use and modify the software as they see fit. Many people in this part of the movement prefer the term ‘Free Software’ to ‘Open Source’ for this reason.

"When we call software 'free', we mean that it respects the users' essential freedoms: the freedom to run it, to study and change it, and to redistribute copies with or without changes. This is a matter of freedom, not price, so think of 'free speech', not 'free beer'." - Richard Stallman

The "Free Software" view holds that it's not simply the access to the source code and the lack of a price tag which matter - the freedoms of the user of the software are seen as a major point of principle and central to the entire development practice.

Free Software and Open Source Software are two schools of thought which share many common goals, but which have different philosophies and emphases. Despite these differences in approach, however, virtually all software which is Open Source is also Free, and for practical purposes, all Free Software is Open Source.

More importantly, most of the time this distinction does not matter to the majority of people who use and develop Open Source or Free Software. The different philosophical approaches that individual developers take do not usually matter in terms of their ability and willingness to work together - people with very different views on the underlying philosophy can and do work very effectively on the same project, to the same goals.

How does this fit into the modern software industry?
These days virtually all companies will use F/OSS for at least some of their software needs, and F/OSS is contained in a very large number of consumer devices, such as wireless routers and HD televisions from major manufacturers.

F/OSS has become a major presence in the software world and is now widely used in all sectors of the industry, particularly to provide infrastructure solutions or libraries to build upon.

What are some examples of Open Source / Free Software?
The Linux operating system. The Apache web server. Java (eg OpenJDK or Apache Harmony). The PHP web programming language.

The important thing to note here is that when F/OSS provides a platform on which to run other code (the business applications), the source code for the business applications does not usually need to be released. For example, the fact that PHP is Free Software does not mean that the source code to Facebook (and every other web application which uses PHP) needs to be available.

How does Open Source / Free Development work?
The source code is usually made available at all stages of development after the initial announcement - quite often through allowing public (read-only) access to the source repository.

In addition to the source code, projects will usually produce official releases, on whatever timescale they deem appropriate.

Individual developers can then join the project, by joining the project forums (eg mailing lists, bug trackers, etc) and getting up to speed and then starting to participate. This participation can take a number of forms - not just coding tasks. For example, developers who can write good tests or lucid project documents are in demand in virtually all large Open Source projects. Discussions about design and direction of the project will take place on the project mailing list, and people are welcome to contribute - although as with most projects a developer’s experience and standing in the community will be a factor in how seriously their views are taken.

Developers will usually tackle the tasks which interest them the most, although this can lead to duplication of effort, as several people may choose to start attacking the same interesting-seeming problem. Sometimes developers who are new to the project will ask experienced devs what would be good starter tasks - and this can be a good way to get into a new project.

These open and decentralised approaches to development make it a very different environment from that found in many commercial workplaces. This is to be expected as the typical Open Source developer is not directly compensated (in material terms) for the work they do.

The reasons that developers have for being involved in Free or Open Source Software are very varied - but common motivations include:

Recognition by one’s peers
Satisfaction of scratching a personal “development itch”
Learning a new language / technology area
Contributing to one’s community

Wednesday, 30 December 2009

New Talk

Here is my recent talk on JVM Performance Tuning at the Skills Matter Open Source In Finance Exchange.

Thanks to Alan Hardy and the Skills Matter people for organising, and for all the great feedback from the attendees.

Monday, 30 November 2009

Round-Up

It's been a very busy few weeks round at Boxcat Junction.

I was lucky enough to get to Devoxx 2009 in Antwerp a couple of weeks ago - and was very impressed by the presentations and people I met there.

Particular props are due to Brian Goetz and Alex Buckley from Sun; James Strachan for his OSS ESB talk; and Holly Cummins (superb performance talk), Zoe Slattery and Robin Fernandes from IBM.

This was followed last weekend by the London Java Community's first Unconference, graciously hosted by IBM at their Southbank location - organised by Barry Cranford and Zoe Slattery. The format went off really well, although I think next time I'd like to see more sessions using less traditional teaching methods (eg group discussions, fishbowl, etc). I hope to make my slides for my Performance talk available soon.

Next week sees me in St Petersburg, meeting with Russian colleagues, and then I'm attending (and speaking at) the Skills Matter Open Source in Finance Exchange on December 15th.

Saturday, 12 September 2009

Knew I Couldn't Stay Away

In an earlier post, I said that I was done with language hacking for a while.

Well, I guess I can't stay away from it. Mattia Barbon has been running the Language::P project for a while now, over on GitHub, and I've agreed to give him a hand with the Java backend pieces of that project.

The current state is that he has a Perl5 parser, which covers a reasonable (and improving) subset of P5. The parser is written in Perl5, and currently outputs a packed bytecode format (consisting of P5ish opcodes).

He has a branch containing a .NET backend which uses DLR expression trees, and I've started work on a Java backend.

My initial target is an interpreter, but I have several pieces which I would hope to use in an eventual compiler.

The current aim is to support enough of P5 for the Language::P parser to parse itself, and from there to bootstrap parsers in the target backends. For an interpreted backend, such as my current Java prototype, this will mean shipping a packed bytecode parser for the interpreter to fire up. For compiled backends, other avenues are possible.

Both the .NET and Java parts are currently on private branches, but I hope to make the code available soon, and merge to HEAD.

If anyone wants to help, please leave a comment / email me.

Tuesday, 25 August 2009

Spring 2.5 and Eclipse 3.5 JUnit Problems

Spring 2.5 test classes (eg those that use @RunWith(SpringJUnit4ClassRunner.class) ) implicitly rely on a class which was removed in JUnit 4.5.

See this Jira: http://jira.springframework.org/browse/SPR-5145

Unfortunately, the latest Eclipse (3.5) ships JUnit 4.5 by default, meaning that the newly released production Eclipse version is broken for running tests written for the current production Spring version.

No workaround is currently available (and "use Spring 3.0" as a resolution isn't very helpful for production stuff - it's still in beta). I will investigate with the Eclipse people as to whether anything (eg a downgrade of the JUnit plugin to 4.4) would be feasible, but I don't hold out much hope.

For now, looks like it's back to running Spring tests from the command-line with a local JUnit 4.4 jar on your classpath.
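
If you're not sure which JUnit your IDE or build is actually picking up, a quick reflection check can help narrow it down. This is only a sketch under stated assumptions: JUnitClasspathCheck and isPresent are made-up names, and the class name being probed is an assumption taken from memory of the stack trace in the Jira above (a nested class believed to exist in JUnit 4.4 but not in 4.5). Verify it against the Jira before relying on it.

```java
final class JUnitClasspathCheck {

    /** Returns true if the named class can be located, without initialising it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JUnitClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but unloadable
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: this nested class is the one Spring 2.5 expects to find.
        // The name is illustrative and should be checked against SPR-5145.
        String probed = "org.junit.Assume$AssumptionViolatedException";

        if (isPresent(probed)) {
            System.out.println("Found " + probed + " - looks like an older JUnit");
        } else {
            System.out.println("Did not find " + probed + " - newer JUnit, or no JUnit at all");
        }
    }
}
```

Run it with the same classpath your tests use. With no JUnit jar present it simply reports that the class was not found, so a negative result on its own doesn't prove you have 4.5.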

(Btw, if anyone does have a good workaround, please let me know.)

Sunday, 16 August 2009

Thanks For All The Fish

One of the major themes of the blog so far has been my attempts to write a version of (perhaps a subset of) Perl 5 that will run on the JVM, and/or find out what makes this a difficult exercise. And, to have fun while doing so.

For reasons which I outline in this post, I think it's time to give up, and I would strongly encourage anyone reading who is tempted to have a go at carrying on where I'm leaving off to think again. The code we wrote is out there, but I would suggest that anyone who's really interested contact me first.

So, here's the lowdown on the problems we faced.

There are really two separate sets of issues - first of all there are issues related to nullability.

Because Perl does not require brackets around function parameters, and has functions which do not need to specify their prototypes, this line of code:

my $a = dunno + 4;

can be parsed in one of two ways:

$a gets 4 + the result of calling the function dunno(), which takes no arguments. That is, + is treated as a binary operator.
$a gets the result of calling dunno(4), where dunno takes at least one argument. That is, the + is treated as a unary operator on 4.

This line of argument is expanded upon significantly by Jeffrey Kegler at http://www.perlmonks.org/?node_id=663393 - where he links it to the Halting Problem. I have not fully satisfied myself of the full implications yet, but the initial points amount to a hugely significant parsing problem, with no really good solution.
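
To make the ambiguity concrete, here is a toy sketch in Java (not Perl, and nothing like how perl's real parser works). The class DunnoAmbiguity, the parse method and the arity table are all hypothetical names made up for illustration; the only thing taken from the example above is the token sequence dunno + 4. What it tries to show is that the same three tokens give two different trees, depending on what the parser believes about dunno.

```java
import java.util.List;
import java.util.Map;

final class DunnoAmbiguity {

    /**
     * Renders a parse tree for the tokens `name + number`.
     * arityOf is a hypothetical table of what the parser already knows
     * about each function: 0 = takes no arguments, otherwise takes some.
     */
    static String parse(List<String> tokens, Map<String, Integer> arityOf) {
        String name = tokens.get(0);
        String plus = tokens.get(1);
        String number = tokens.get(2);
        if (!plus.equals("+")) {
            throw new IllegalArgumentException("expected '+', got " + plus);
        }

        Integer arity = arityOf.get(name);
        if (arity == null) {
            // Toy behaviour only: without a declaration we cannot choose a reading
            throw new IllegalStateException("arity of " + name + " is unknown");
        }
        if (arity == 0) {
            // Reading 1: dunno() + 4, with + as a binary operator
            return "add(call(" + name + "), " + number + ")";
        }
        // Reading 2: dunno(+4), with + as a unary operator on 4
        return "call(" + name + ", plus(" + number + "))";
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("dunno", "+", "4");
        System.out.println(parse(tokens, Map.of("dunno", 0))); // add(call(dunno), 4)
        System.out.println(parse(tokens, Map.of("dunno", 1))); // call(dunno, plus(4))
    }
}
```

The token stream never changes; only the table does. In this toy the table is handed in up front, whereas the difficulty described above is that a real parser has to work that information out for itself.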

The second source of problems is that Perl 5 is old (1994) - and dates from a time when automated language tools were rather more lacking than they are today. When Larry Wall was working on the first versions of p5, lex and yacc were pretty much the state of the art in terms of what was practical for autogeneration, and a skilled practitioner could outperform them by modifying the output, or writing from scratch.

Perl wasn't written with formal parser theory in mind, and has now reached the stage where the implementation is really all we have. It does not fit well into a rigorous model of language, and during its development flexible language features were considered to be more important than linguistic concerns (such as static analysis).

Simply put, there's no grammar, and attempting to write something which matches the only existing implementation is a major undertaking. No existing automated language tools will help much; it's largely a matter of needing to completely reimplement the existing C code in Java (or bytecode). This is a huge amount of work, if it's possible at all, and is not going to be fun - it is likely to be very frustrating for a large chunk of the time spent on it.

So, here we are. I've had a lot of fun working on this (and particular thanks to James Laver and Shevek, both of whom provided insight, help and encouragement - and to the many other people in the Perl, Java and Ruby worlds with whom I had interesting and sometimes amazing conversations) and I'd like to close with a short summary of what I've learned from this project:

Too much backwards-compatibility is a huge millstone
Always ensure that the people you're talking to have the same definitions you do
If you're going to use formal language, you must have proofs available. Declaring a problem to be in a particular class by fiat does not help anyone.
Perl's block / closure / subroutine definitions are too overlapping and unclear. This is a major problem.
Indirect object syntax in Perl 5 was a misstep

So, that's it for now.

I'll be moving on to other problems on my stack now, so my next posts will be about broader topics than just language design / implementation but I'm sure I'll return to language design in due course - after all, I just can't seem to stay away from it.

Saturday, 1 August 2009

My Interview Checklist

Someone asked me recently about what sort of job interview prep I do, and having recently found myself a new job, I thought I'd post a sample here.

This is the bare bones of what I polished before my most recent job hunt. It's skewed towards Java for some of the actual technology bits, but the CS fundamentals should be language-independent.

My attitude is that the working practitioner should have a good command of a lot of this (especially the CS topics) at all times, and should only need to briefly revisit each subject to ensure the polish and the details are 100% there.

The books I used most heavily were "Introduction to Algorithms" (Cormen, Leiserson, Rivest and Stein) and Doug Lea's "Concurrent Programming in Java".

Comments and suggestions for things other people have found useful would be most welcome.

Algorithms
Details of order notation (eg Omega etc)
Mergesort
Quicksort
String Matching
NP / NP-completeness

Trees
Basic Trees and Tree Construction
Red / Black Trees
Hashing / Hashtable
HashMap / TreeMap
B-Trees

Graphs
Representations of Graphs in code (object / pointers, matrix, adjacency list)
Graph Traversal (BFS, DFS)
Minimal Spanning Tree
Dijkstra

Discrete Maths / Probability / "Logic Puzzles"
Probability Exercises
Decision Trees Exercises
n-choose-k Problems
Permutation Groups, Cycles, Representations, etc
"Perfectly Logical Beings" puzzles
Decision / Ply problems (eg Monty Hall)

DB / Hibernate
Normal Form
Having clause
Outer joins
XML Form of Hibernate - Basics
JPA / Hibernate
Indexes and Optimisation

Java Internals and Details
Bitwise operators
Collections / Generics nitty-gritty (ie to bytecode level)
OO nitty-gritty (and nasty edge cases)
Annotations nitty-gritty
Arrays nitty-gritty

OS/Concurrency details
Safety, Liveness, Performance, Reusability
Permanent Fail: Deadlock, Missed Signals, Nested Monitor Locks, LiveLock, Starvation, Resource Exhaustion, Distributed Fail
Immutability
Block-structured Locking, Synchronisation, JMM, Fully Synchronized Objects
Other Constructs: Mutex, Latch, Futures, Callable / Command Adapter
Real-world multithreaded application development

Future Web Tech
HTML 5
ECMAScript 4 hassles
Flex vs Silverlight (ref issues with LCDS and the Adobe approach)
Asynch Messaging for webapps