Friday, April 07, 2006

Usable Languages

Do you remember the first language you programmed in? For me, it was BASIC. I don't mean Visual Basic. I mean the old BASIC, where each line had a line number and you used GOTO statements. All the things that folks like Dijkstra told us were bad, bad ideas, they tossed into BASIC. (Admittedly, if memory serves, the reason he didn't like goto is that it made correctness hard to prove. Dijkstra was big into correctness, which faded from popularity when people realized he wanted to automate proofs, and that was hard, even for the simplest of programs.)

The nice thing about BASIC is how easy it is to implement. Using a language like, say, Java, you could probably write a BASIC interpreter in a few thousand lines, maybe fewer.
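
To give a rough sense of how little machinery classic BASIC needs, here is a toy interpreter sketch. It's my own illustration, not a real BASIC: the class name TinyBasic is made up, and it only handles numbered lines, integer variables, LET, PRINT, GOTO, IF ... THEN (jumping to a line number), END, and left-to-right + and - with space-separated tokens. The core trick is that a sorted map from line number to statement doubles as both the program and the control-flow mechanism: falling through means "go to the next-higher line," and GOTO just picks a different key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// A toy interpreter for a tiny BASIC subset (hypothetical, for illustration only):
// numbered lines, integer variables, LET, PRINT, GOTO, IF ... THEN <line>, END.
public class TinyBasic {
    // Program text keyed by line number; TreeMap keeps lines in numeric order.
    private final TreeMap<Integer, String> program = new TreeMap<>();
    private final Map<String, Integer> vars = new HashMap<>();

    // Parses lines like "20 PRINT X" into the program map.
    public void load(String source) {
        for (String raw : source.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            int space = line.indexOf(' ');
            program.put(Integer.parseInt(line.substring(0, space)), line.substring(space + 1).trim());
        }
    }

    // An operand is either an integer literal or a variable (unset variables are 0).
    private int value(String token) {
        return Character.isDigit(token.charAt(0)) ? Integer.parseInt(token) : vars.getOrDefault(token, 0);
    }

    // Evaluates space-separated "a + b - c" strictly left to right.
    private int expr(String e) {
        String[] parts = e.trim().split("\\s+");
        int result = value(parts[0]);
        for (int i = 1; i + 1 < parts.length; i += 2) {
            int rhs = value(parts[i + 1]);
            switch (parts[i]) {
                case "+": result += rhs; break;
                case "-": result -= rhs; break;
                default: throw new IllegalArgumentException("bad operator: " + parts[i]);
            }
        }
        return result;
    }

    // Evaluates a single comparison using <, > or =.
    private boolean cond(String c) {
        for (String op : new String[] {"<", ">", "="}) {
            int i = c.indexOf(op);
            if (i >= 0) {
                int l = expr(c.substring(0, i));
                int r = expr(c.substring(i + 1));
                return op.equals("<") ? l < r : op.equals(">") ? l > r : l == r;
            }
        }
        throw new IllegalArgumentException("bad condition: " + c);
    }

    // Runs from the lowest line number and returns everything PRINTed.
    public String run() {
        StringBuilder out = new StringBuilder();
        Integer pc = program.isEmpty() ? null : program.firstKey();
        while (pc != null) {
            String stmt = program.get(pc);
            if (stmt == null) throw new IllegalStateException("no such line: " + pc);
            Integer next = program.higherKey(pc); // default: fall through to the next line
            if (stmt.startsWith("LET ")) {
                String[] kv = stmt.substring(4).split("=", 2);
                vars.put(kv[0].trim(), expr(kv[1]));
            } else if (stmt.startsWith("PRINT ")) {
                out.append(expr(stmt.substring(6))).append('\n');
            } else if (stmt.startsWith("GOTO ")) {
                next = Integer.parseInt(stmt.substring(5).trim());
            } else if (stmt.startsWith("IF ")) {
                int t = stmt.indexOf(" THEN ");
                if (cond(stmt.substring(3, t))) next = Integer.parseInt(stmt.substring(t + 6).trim());
            } else if (stmt.equals("END")) {
                next = null;
            } else {
                throw new IllegalArgumentException("unknown statement at line " + pc + ": " + stmt);
            }
            pc = next;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TinyBasic basic = new TinyBasic();
        basic.load("10 LET X = 1\n20 PRINT X\n30 LET X = X + 1\n40 IF X < 4 THEN 20\n50 END");
        System.out.print(basic.run()); // prints 1, 2, 3 on separate lines
    }
}
```

A real BASIC adds operator precedence, strings, GOSUB/RETURN, and so on, but the overall shape stays about this simple.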

Sometime in the 80s, there was this idea that we ought to move past text-based languages. Somehow, text seemed primitive. Shouldn't we draw circles and arrows and nice pictures? Wouldn't this make for nicer-looking code? This never caught on. I'm sure a few theses came out of it, but for the most part, we still type text.

Yet, in my opinion, something isn't right about that. For example, think about comments. There are many types of comments. There are comments that describe what's going on: input parameters, return results, exceptions thrown, the kind of stuff Javadoc is supposed to be good at. But beyond that, you may want something that reminds you to get back to this piece of code. Many of us come up with our own ways to notify ourselves, perhaps a "TODO:" that we can grep for.

And what about bug fix comments? At times, I want to write comments specifically to indicate that the code I just changed was a bug fix. (Will IDEs eventually integrate with bug tracking systems, so it's easy to link a fix to its bug report?) This would give me some idea of where I've had to fix code, but it isn't the kind of comment that explains what the code means.

The next issue I have is test code. We all know we're supposed to write it, but here's a source of problems. I would like to have a visibility keyword, "test-visible", that I could apply to methods, e.g., test-visible getFooBar(), which could only be called by classes that declare themselves to be test classes. Non-test classes could not call these methods.

The most drastic idea I have along these lines is somewhat related to research one of my housemates is doing. Jaime (pronounced "Jay-me" rather than "Hi-me") is collecting data from introductory programming students to determine how they program and what kinds of bugs they produce. (At least, that's ostensibly what the data will be used for; really, building the testing infrastructure is just as important, or at least just as time-consuming.)

One suggestion, made by Professor Mike Hicks, is to use this data to create programming languages that are more usable. I know, something screams "Isn't that what Visual Basic is for?", but really, has a language ever been designed like this? How many C programmers have written = instead of == to test for equality and gotten burned? Could we learn how to design usable languages?

I've done some interviews where I've asked people to code up data structures. I find some people understand what to do conceptually, but when it comes to actually coding it up, they are on far shakier ground. The funny thing is that this can happen to teachers too. At a university, it's somehow OK for a professor to say "I know very little about programming language X" and still teach data structures. After all, data structures are an abstract concept that can be quite independent of any programming language.

Why is it challenging? For the teachers, it's typically not because they lack the intellectual ability to program, but because they don't want to learn the language. It's yet another way of doing the same thing (like learning a new foreign language), and they find that unappealing. But for students, who have a vested interest in knowing the language (they need to submit programs and take tests), this difficulty is surprising.

Part of the problem is lack of use. Some people forget how to program in a language after only a few months away from it. Even experts make the same mistakes over and over (such as confusing = with ==, or believing that comparison functions return -1, 0, and 1 instead of negative, zero, and positive). Those kinds of mistakes, it seems, could be avoided with a more "usable" language.
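
Here is a small Java illustration of the comparison-function mistake. Java's String.compareTo is documented to return the difference between the first pair of differing characters, so its result is often something other than -1, 0, or 1. The class and method names below are made up for this example.

```java
// Shows why code must not assume compareTo returns exactly -1, 0, or 1.
public class CompareContract {
    // Buggy: relies on compareTo returning exactly -1 for "less than".
    static boolean lessThanBuggy(String a, String b) {
        return a.compareTo(b) == -1;
    }

    // Correct: the contract only promises a negative, zero, or positive result.
    static boolean lessThan(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        // The first characters differ, so compareTo returns 'a' - 'c'.
        System.out.println("apple".compareTo("cherry"));      // prints -2
        System.out.println(lessThanBuggy("apple", "cherry")); // prints false (wrong answer)
        System.out.println(lessThan("apple", "cherry"));      // prints true
    }
}
```

The buggy version happens to work whenever the first differing characters are adjacent ("a" vs. "b"), which is exactly why the mistake survives casual testing.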

And yet, it seems rare to design a language this way. Java has done a little. For example, one of the most common errors in C/C++, especially for programmers coming from Pascal and Fortran, is using = instead of == to test for equality. In C/C++, every scalar type has a "zero" value. For pointers, it's NULL. For characters, it's '\0'. C/C++ treats zero as false and every other value as true. People would write conditions like if (x = 3), thinking they were testing whether x equals 3, but instead they assigned 3 to x, and since 3 is non-zero, the condition was always true.

Java fixed this by insisting that conditions have type boolean, which C did not even have as a built-in type until C99 added _Bool. That eliminated most of these errors, since the compiler flags (x = 3) as having type int, not boolean. It isn't foolproof, since assigning to a boolean variable inside a condition still compiles, but that's fairly uncommon.
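
A minimal Java sketch of both halves of that point: the int case is rejected at compile time (shown commented out, since it would stop the file from compiling), while the boolean case compiles and silently misbehaves. The names are hypothetical.

```java
// Java rejects `if (x = 3)` for an int x, but a boolean assignment still slips through.
public class AssignInCondition {
    // Buggy: `=` assigns true to flag, so this returns true no matter what was passed in.
    static boolean isDoneBuggy(boolean flag) {
        if (flag = true) {
            return true;
        }
        return false;
    }

    // Correct: test the boolean directly instead of comparing it to a literal.
    static boolean isDone(boolean flag) {
        return flag;
    }

    public static void main(String[] args) {
        // int x = 5;
        // if (x = 3) { }  // compile error: int cannot be converted to boolean
        System.out.println(isDoneBuggy(false)); // prints true
        System.out.println(isDone(false));      // prints false
    }
}
```

Writing if (flag) rather than if (flag == true) sidesteps the remaining loophole entirely, since there's no literal to mistype.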

Little things like that make it easier to avoid pitfalls while programming. I'll agree that programming is hard enough that people will always struggle with it. Still, that doesn't mean we shouldn't make some things easier.

One complaint about designing for usability is that it would lead to really basic languages. That depends on your audience. If you're trying to appeal to the run-of-the-mill web developer, then higher-order functions, continuations, and other advanced features may be too intellectually complex. On the other hand, even languages like O'Caml might benefit from figuring out what kinds of mistakes people make. All I'm saying is that usability studies might provide another angle from which to design programming languages.
