This essay is still in progress. Send me comments if it interests you.
Compositionality (or composability) is a strong word in computer science. It means you can put two things together, and get the effect of both. This is pretty much what everyone always wants. It is at the core of reusable programs -- programs that are not only useful in their own right but can actually be used by other programs to produce new, powerful functionality.
The standard, classic example of this is UNIX command-line programs. The shell pipe (|) feeds the output of one command into the input of the next. This is useful because there are lots of programs that do something useful by themselves, but can do something more than twice as useful when you put them together. Look at sort and uniq. sort orders the lines of a file lexicographically -- useful on its own whenever a program produces output you'd like in alphabetical order. uniq removes duplicate lines that are next to each other -- useful when a program emits lots of identical log messages and you only want to see one of each. But they become much more useful when combined: sort some output so that all the identical lines end up next to each other, let uniq remove the duplicates, and all of a sudden you've turned a sequence which may contain duplicate elements into a set which does not. I don't have to think too hard about how sort or uniq work; I just use them.
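The same sort-then-dedup composition can be sketched in Haskell (the function name `dedup` is made up for illustration): sorting puts duplicates next to each other, grouping collapses each run, and we keep one representative of each.

```haskell
import Data.List (group, sort)

-- The moral equivalent of `sort file | uniq`: compose three small,
-- well-understood pieces into a deduplicator.
dedup :: Ord a => [a] -> [a]
dedup = map head . group . sort
```

For example, `dedup ["b", "a", "b"]` gives `["a", "b"]`.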
Subroutines in a program are also an example of compositionality. I can write a subroutine X for one purpose, then, while writing subroutine Y, realize that part of the requirement is to do the thing X does, and just call X to handle that part rather than writing it again -- without having to think about it. This is standard practice, right? Well, I have found, especially in large systems, that it rarely works out that way in practice.
You have to think hard about X! In Java, X is probably a method on a class. The method might mutate the instance, which screws everything up because it wasn't expected to be called from that location -- but if you don't think about it, you just use it, and then you have a difficult bug. This has happened to me on tons of occasions. Systems code is even scarier, because X might disable interrupts, or sleep, or do something weird that you can't do from whatever context you're in, and everything gets upset. Programming languages like C are designed for composability, but they fail at it because low-level code can take actions that high-level code doesn't expect or can't deal with. Even going into a loop is a risky action on the part of low-level code, because the code that calls you might be expecting a certain level of performance.
In thread programming the problem is even worse -- look at locking. If your code uses locking, you must know exactly which locks are taken by the code you call, because otherwise you risk deadlock. Software transactional memory helps with this, but the performance of a composition of atomic blocks can be worse than the performance of its components: since the combined transactions are larger, either the commit probability or the concurrency goes down.
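The STM point is easiest to see in code. Here is a minimal sketch using GHC's stm library (the account setup and function names are invented for illustration): each transfer is atomic on its own, and two transfers compose into one larger atomic action without either needing to know anything about the other's locking discipline.

```haskell
import Control.Concurrent.STM

-- Move money between two balances atomically.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amt = do
  modifyTVar' from (subtract amt)
  modifyTVar' to (+ amt)

-- Composition: both transfers commit together or not at all.
-- No deadlock analysis of `transfer` was required to write this.
chained :: TVar Int -> TVar Int -> TVar Int -> STM ()
chained a b c = transfer a b 10 >> transfer b c 5
```

Run it with `atomically (chained a b c)`. The composability is free; the cost, as noted above, is that the combined transaction is bigger than either piece.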
My argument is that compositionality in most modern languages is terrible. It exists, and it is certainly possible to write compositional code, but making your user think hard about what your code does is nearly as bad as making them do it themselves.
Now, there are a lot of ways the situation is slowly being rectified. Haskell is a really interesting example of strongly compositional code: functions outside the IO monad have no side effects, period. Since they have no side effects, they accept your data and return a result. You are guaranteed that they don't do anything else.
It is worthwhile to note how monads fit into this. Monadic functions are a little less predictable, since the monad syntax hides the possibility of changing monad state, but (outside the IO monad) the side-effect possibilities are highly controlled, and they are part of the function's interface.
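A small sketch of what "within the interface" means (the names `fresh` and `twoLabels` are invented for illustration): the type `State Int` announces exactly which effect this code may perform, so a caller knows everything it needs to without reading the body.

```haskell
import Control.Monad.Trans.State (State, get, put, runState)

-- A counter effect whose existence is visible in the type.
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  pure n

-- Composing two stateful computations threads the counter predictably.
twoLabels :: State Int (Int, Int)
twoLabels = (,) <$> fresh <*> fresh
```

`runState twoLabels 0` evaluates to `((0, 1), 2)`: two distinct labels and the final counter value.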
In terms of performance, Haskell's composability is only a little better than that of most languages. Because of laziness, unneeded work of a called function is not done. But since the results of the called function will be needed before the calling function can produce a value, there's not usually very much performance difference in the long run compared to eager languages -- if you don't know whether an inner function is slow, then you don't know anything about the performance of the function you're writing. And that's bad.
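For what laziness does buy, a minimal sketch (the name `firstEvens` is made up for illustration): the called function, `filter`, does only as much work as the caller demands, even over an infinite list.

```haskell
-- Laziness in action: `filter even [1 ..]` never terminates on its
-- own, but `take n` forces only the first n results to be computed.
firstEvens :: Int -> [Int]
firstEvens n = take n (filter even [1 ..])
```

So `firstEvens 3` returns `[2, 4, 6]` and stops, despite the infinite input.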
I want to mention combinators. Combinators are components that build new components out of existing ones. The UNIX pipe I mentioned above is a combinator: it means "make a new component, which is the result of taking the output of that one, and sending it to this one." C and Java don't have the idea of combinators for functions, because there's little that could be done: if you want a function C which is the result of calling A and sending its output to B, you write C by hand. Haskell has several combinators; the dot operator is like the UNIX pipe (though it composes right to left): A . B means A(B(x)).
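A concrete instance of the dot combinator (the function `shout` is invented for illustration): build a new function from two existing ones, much as a pipe builds a new command from two commands, with no glue code.

```haskell
import Data.Char (toUpper)

-- (++ "!") . map toUpper : uppercase the string, then append "!".
-- Note the right-to-left order: `map toUpper` runs first.
shout :: String -> String
shout = (++ "!") . map toUpper
```

`shout "hi"` gives `"HI!"`.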
Combinators are only useful when you have a lot of different components that are highly compositional. Flapjax is one of my favorite languages because of this. You have two types of objects -- behaviors (time-varying values) and events (streams of occurrences) -- and you can combine them with all sorts of exciting combinators (merge these two streams, snapshot a behavior with an event, etc.) to produce new behaviors or events. Then you can create all sorts of interesting interactions with the rest of the system.
The language we're designing has a different characteristic. We aren't solving the problem completely, but we think our approach has a lot of merit in terms of compositionality.