Saturday, December 31, 2005

Joel vs. Java

Ah Joel Spolsky, you ranter you! The latest Joel article is all about his views on teaching Java in computer science departments. Now clearly, his view of the goals of a computer science department contrasts greatly with what these departments are trying to do.

On the one hand, Joel wants these departments to produce the best programmers possible, and to him, Java ain't the way there. That's because Joel has boiled down great programming to knowing two concepts: recursion and pointers. This is strange to me for several reasons. First, Java does indeed have pointers: it simply calls them references.

To be fair, I don't exactly know what Joel thinks one should know about pointers. For example, you can use pointers to build linked lists and trees, but that can be done in Java just as well as in C. Languages like C create a strong correspondence between arrays and pointers, and offer something languages like Pascal don't have: pointer arithmetic. Also, C pretty much allows you to cast an int to any arbitrary pointer type.
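To make that concrete, here's a minimal sketch of my own (the class names are invented for illustration) of a singly linked list in Java, where the `next` reference plays exactly the role a C pointer would:

```java
// A minimal singly linked list in Java: the 'next' reference plays
// the role a C pointer would. (Illustrative sketch, not from Joel's article.)
class Node {
    int value;
    Node next; // a reference, which is Java's name for a restricted pointer

    Node(int value, Node next) {
        this.value = value;
        this.next = next;
    }
}

class LinkedListDemo {
    // Walk the chain of references, just as you'd walk pointers in C.
    static int sum(Node head) {
        int total = 0;
        for (Node n = head; n != null; n = n.next) {
            total += n.value;
        }
        return total;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(sum(list)); // prints 6
    }
}
```

The only thing missing relative to C is the ability to do arithmetic on `next` itself.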

This means you can point to memory that you don't own. Most modern OSes do not allow rampant access of memory; they use techniques like virtual memory to prevent a process from arbitrarily accessing memory that belongs to other processes. Attempting to do so causes a segmentation fault. Java makes it pretty much impossible to get segmentation faults because you can't perform arbitrary casting. Furthermore, it does bounds-checking on arrays, which is its way of making sure you don't access memory you don't own.
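As a quick illustration (a made-up snippet, with hypothetical names), here is Java's bounds-checking turning an out-of-range access into a catchable exception rather than a trip into someone else's memory:

```java
// Java's bounds-checking in action: an out-of-range index raises an
// exception instead of silently reading memory the program doesn't own.
class BoundsDemo {
    static String access(int[] arr, int i) {
        try {
            return "value: " + arr[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "caught: index " + i + " is out of bounds";
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(access(data, 1));  // prints "value: 20"
        System.out.println(access(data, 99)); // prints "caught: index 99 is out of bounds"
    }
}
```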

OK, I admit, since Java is a safe language, the worst you can do is a null pointer exception. In general, it protects you from arbitrary memory access, and even more, it guarantees that any reference either points to an object of a valid type or is null. C and languages like it offer no such guarantees.
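For instance (again a hypothetical sketch of mine), the worst a bad reference can do in Java is raise a NullPointerException at a well-defined point:

```java
// The worst a dangling reference can do in Java: a NullPointerException,
// raised at a well-defined point, rather than a crash or silent corruption.
class NullDemo {
    static int length(String s) {
        try {
            return s.length();
        } catch (NullPointerException e) {
            return -1; // the failure is caught in-language, not a segfault
        }
    }

    public static void main(String[] args) {
        System.out.println(length("hello")); // prints 5
        System.out.println(length(null));    // prints -1
    }
}
```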

But there's more. Those languages also don't have garbage collection. There's a certain skill to tracking down memory errors, but by golly, they're evil. Real-world programmers need memory leak tools to help trace these problems, and I tell you that it is an incredible waste of time. These bugs are insidious because they are so hard to find. Without tools like Purify, people would go nuts.

Joel also talks about the ability to pack things into bits. Even programmers in C don't actually have to do this. A curriculum can get along perfectly well without teaching bit operations, even if it sticks to C or C++. And it's not as if Java doesn't have these operations, because it does. The only headache in Java is that it lacks unsigned types, which creates problems as you cast to larger types (sign-extension and the like).
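For example (an illustrative snippet of my own), the bit operations are all there; the only wrinkle is masking off the sign extension when you widen a byte:

```java
// Java has the bit operations, but no unsigned types: widening a byte
// to an int sign-extends, so masking with 0xFF is the usual fix.
class SignExtendDemo {
    // Pack four bytes into an int (big-endian), masking each byte
    // to defeat sign-extension.
    static int pack(byte a, byte b, byte c, byte d) {
        return ((a & 0xFF) << 24) | ((b & 0xFF) << 16)
             | ((c & 0xFF) << 8)  |  (d & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        int wrong = b;        // sign-extends to -1 (0xFFFFFFFF)
        int right = b & 0xFF; // 255, the unsigned value
        System.out.println(wrong + " " + right); // prints "-1 255"
        // prints "ff000001"
        System.out.println(Integer.toHexString(pack(b, (byte) 0, (byte) 0, (byte) 1)));
    }
}
```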

It's not that you can't learn these concepts in Java, because you can, especially if the person teaching it is aware that they need to teach it. It's just that when you don't know it well in languages like C or C++, it bites you much harder, and the best students are more aware of why these bad things happen. It's like being a tightrope walker with and without a net. When you don't have a net, you know it really hurts when you fall, and you're that much more attuned to the mistake and how best to deal with it.

As far as recursion goes, I mean, come on, which CS dept. worth its salt doesn't cover this? To be fair, I somehow managed to avoid understanding this concept until really late, pretty much until I had graduated. However, if you take a data structures course, you've got to do some recursion, especially with any tree manipulation. Algorithms courses tend to rely on some recursion too.
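Here's the kind of recursion a data structures course can't avoid, sketched in Java (the class and names are my own invented example): computing the height of a binary tree.

```java
// Recursion shows up unavoidably in tree manipulation: computing the
// height of a binary tree is the classic one-line-per-case example.
class Tree {
    int value;
    Tree left, right;

    Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Height of the empty tree is 0; otherwise 1 plus the taller subtree.
    static int height(Tree t) {
        if (t == null) return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        Tree t = new Tree(1,
                          new Tree(2, new Tree(4, null, null), null),
                          new Tree(3, null, null));
        System.out.println(height(t)); // prints 3
    }
}
```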

But understanding recursion and pointers isn't enough to do world-class programming. It just isn't. A person who understands both may be far more likely to deal with the complexities of programming, because there's a strong correlation between the ability of a mind to grasp these concepts and to do good programming, but that's all it is: a correlation. In pro football, there's this thing called the combine, where college players eligible for the draft who aren't sure things (Reggie Bush, say, is a sure thing) go. They run 40-yard dashes, and do all sorts of stuff to impress the scouts.

What they don't do is play the game in the combine. The scouts hope the numbers correlate well to playing football well. This is the same with recursion and pointers.

Now that I've been in the "real world" for a while, I'm beginning to see a lot of other skills that are just as important. First and foremost is the ability to work with other people's code. Most CS depts prefer their kids to write code from scratch. There's a huge number of benefits from doing so, not the least of which is that the person writing the programming project doesn't have to supply students with code. Heaven knows that teachers of programming don't want to supply code that has to be tested and debugged. Given their druthers, most teachers want to describe a project in 2-3 paragraphs, so it only takes them an hour to write up. Real world code isn't small and isolated like that.

You also have to have some idea of what it means to architect code. I have no idea how to do this. Well, that's not true. I have some idea, but I don't have the definitive idea on how to do this. And intro books are no better at it than anyone else. Intro books are good at covering the syntax of a language, and perhaps dealing with a few basic things like data structures, algorithms, and some testing. I bet many of these authors have never had to write much beyond a thousand lines of code.

Once you get past the fundamentals of the language, of algorithms, and of data structures, the next step is thinking about how to design code, and that ain't easy. People are still coming up with new ways of how to write code, and nothing jumps out as the "right" way.

But, wait, there's more. What software company doesn't use version control, or a bug tracking system, or debuggers? How many CS departments teach the ins and outs of this technology? In the days of C, the build tool of choice was make, and this was so old and kludgy that I think some high school kid wrote it, warts and all. If you're coding in Java, you use Ant as a build language, or maybe Maven, which builds on top of Ant.

Well, ladies and gents, Ant wasn't around five years ago (and if it was, it certainly wasn't around ten years ago). You can't do serious Java coding without some rudimentary understanding of Ant (OK, you can, because IDEs do a good job of hiding it, but still). Ant is a piece of technology, just like CVS, just like JUnit, that is part of the Java programming lexicon, and you've got to learn it if you're going to do business in this field.
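To give a flavor of it, here's roughly what a minimal Ant build file looks like (the project name and directory layout here are my own invented example, not anything from Joel's article):

```xml
<!-- A minimal Ant build file (illustrative sketch): "ant" compiles
     everything under src/ into build/, and "ant clean" removes it. -->
<project name="demo" default="compile">
    <target name="compile">
        <mkdir dir="build"/>
        <javac srcdir="src" destdir="build"/>
    </target>
    <target name="clean">
        <delete dir="build"/>
    </target>
</project>
```

Trivial, yes, but it's one more piece of machinery that a Java curriculum either teaches or quietly ignores.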

Which leads me to the flip side of the argument. Joel is ranting that CS departments are going to stop producing the coding whizzes he so desperately wants, or at the very least, he's going to have to do that much more work to identify the stars. The first scenario is a dilemma, the second, an inconvenience.

But to address this point head-on, why are CS departments doing this? Why are they "dumbing down" the curriculum? Joel has an elitist view of the world, especially the programming world. If there were 100 CS majors, he'd weed out 98 as hopeless, to keep the two superstars. The rest should find something else to do. Of course, if everyone believed that, then no one would ever graduate from college. Not that he cares. He wants the best. Getting to the best through the rest is too much work, and wasted work at that. Coding whizzes are effectively born, not made.

But CS departments, like many other departments, want students to graduate. That's a nice goal, right?

So why "dumb" it down? This leads me to my next point, which will come as a shocker to most people who aren't in computer science. There are plenty of professors that can't program. Oh, I don't mean they can't program. They can, but they choose not to.

To illustrate this idea, let me introduce a mathematician of note: David Hilbert. (I could have picked anyone, but Hilbert will do.) Hilbert was a brilliant German mathematician at the turn of the twentieth century. If he were still alive now, and in full mental capacity, he could probably still do mathematics today with the best of them. Math notation hasn't changed much in a hundred years (although he was responsible for some of the way math looks today). Sure, there are many advanced techniques in math that just weren't around in his day, but even so, math notation and the idea of proof are basically unchanged.

On the other hand, programming languages are highly ephemeral. I once heard of a professor who lamented that he had to learn Java to teach data structures. Since when did he stop knowing stuff about trees and stacks and graphs and so forth? He knew how to code it in Pascal! Why did he have to know it in Java? What did inheritance and interfaces and exceptions have to do with data structures? He could draw pictures of nodes and arrows, and that was a perfectly fine way of talking about data structures.

Many professors simply don't like the change. They aren't fans of technology. They like the fact that the proof of the irrationality of the square root of two that they used ten years ago is still good today, and will be good ten years from now. They don't like the fact that they had to explain a binary tree in FORTRAN twenty-five years ago (the horror!), then in Pascal fifteen years ago, and now in Java. Why do they have to know OO programming? What's wrong with good old procedural programming?

The paper that Joel cites in his rant is typical of this. The committee that was formed (mostly profs with an interest in teaching programming) wants to find some subset of the Java API they can use to teach year after year after year. They want to ignore the fast-paced changes of the language so that teachers don't have to keep up with what's going on in it. In fact, as they see it, things are getting worse. Languages are revised every year. Did Pascal really change once it got popular? I don't think so.

The reason they switched to Java was because C++ is an extremely difficult language to learn in its full extent. Try to read a book by Alexandrescu on templates, or explain to me what an abstract base class is, and you begin to appreciate just what a mess the language can be. You can easily create problems that can stump even Bjarne Stroustrup. Java promised to hide a lot of that ugliness away, and to a great extent, a lot of people are happy about it. In particular, teachers are happy about it.

I've heard a counter-retort. If ease of language is so important, then why not teach Visual Basic? OK, so people teaching programming realize that learning Visual Basic would take many kids out of the market for real jobs. The reality is that most kids who learn to program are going to do best in exactly the language they were taught. And the "tougher" that language is, the better their chances of working in the real world. Yet, if it's too tough, then even the teachers struggle. And let's face it, Visual Basic is Microsoft, and people in academia loathe Microsoft.

OK, then why not teach a semi-respectable language like Scheme? Its syntax is far simpler than Visual Basic's or Java's, yet it has powerful features like closures and continuations. Yet the world doesn't do much programming in Scheme, so kids who learn it would have to learn Java and C#, and the change is so great for the average programmer that it would be a disaster. It's no surprise, then, that powerhouse universities like MIT, which attract kids that already program well in Java or what have you, are the ones teaching Scheme.

The point is that academia wants to teach a language that is easy enough for professors to understand, without the messiness that students generally don't want to deal with, yet not so easy that it can't be used to hit important concepts, in particular object-oriented programming.

And there are plenty of profs in the world that think OO programming is just a fad.

This leads me to a phrase that Microsoft espouses. They seek folks with a passion for technology. This means a willingness to embrace the new, to keep up with changes, much like fans of Hollywood want to know the latest dirt on Angelina Jolie or Brad Pitt or Ben Affleck. Programming has become the flavor du jour. The hot topics (outside of MS land) are Ajax and Ruby on Rails. There are people pushing programming with mock objects. People who talk up Spring and Hibernate. People who think Groovy is groovy! Three years ago, these topics weren't on anyone's breath. Three years from now, they won't be on anyone's breath (or they may be!).

Academics want the ability to recycle their notes for the next ten years. I have a colleague who learned to teach programming in the eighties and early nineties. The big idea then was "top-down programming". Now the buzzwords are design patterns and unit testing. Newer buzzwords include mock object programming. Extreme programming. UML. XML. This teacher might well wonder what all this faddish stuff is about, and why she has to learn it to teach programming. She doesn't realize that programming is like following the hot movie stars. Each year, something new comes out.

And somehow, someway, knowing about recursion and pointers gets equated to wanting to learn about all these technologies, and much of that on your own time. There's a big difference between the ability to learn something and the desire to learn something. Many academics find it a complete waste of their time to learn a new programming language. Those that have graduated more recently have begun to embrace these changes because they are in the midst of a kind of education revolution that is now taking place beyond the confines of academia.

Do you think kids who program now and know about CSS or RSS or Ajax or PHP or MySQL learned it in class? Come on. Kids who dabble in programming these days pick this stuff up on their own, while professors scoff at it as some kind of fad that's not really computer science.

And to some extent they are right.

If you're a C coder, do you have to know about Jar files and classpaths or Ant? Why did Java programmers have to learn about this, and why is it given such short shrift in many intro books? Because a lot of these Java books were written by C coders who didn't know they had to worry about classpaths.

And to be honest, why does knowledge of classpaths make you any smarter? How does it make you a better programmer? The flat answer is that it doesn't. But the followup answer is that you better know this stuff if you intend to code in Java. And so you have book writers and academics who throw their arms up in the air lamenting "Why am I forced to care about these things when I didn't have to care about them before?" and designers who say "Look at what wonderful things you can do now that you couldn't do before!", and academics who retort "but I don't care to learn that stuff, it's not a new proof, it's not profound, it's just technology!".

So I say that academics have it wrong and Joel has it wrong. Academics don't understand that the computer industry, by its very nature, is faddish, and that to be successful in this industry means having to follow fads all the time, every year. Teachers are uncomfortable with the idea that what they know now and what they teach now will become at least partly obsolete (the basics of algorithms and data structures will remain the same, but how you design and structure code, plus the tools that let you build, test, and deploy code and track bugs, will change and change and change).

Ask an academic in computer science if they are going to teach version control and unit testing and regression testing and how to create and merge branches and use the debugger and so forth, and they'll say "What does that have to do with programming?". What they don't realize is that these tools are now part and parcel of the way code is developed today, and this may change again in five years time.

Knowing recursion and pointers isn't going to guarantee that the people Joel finds will want to learn the newest technology and embrace it. They may say, let's just code in Scheme and be done with it, once and for all, and that simply isn't enough (though people like Paul Graham would disagree and say that we should code in Scheme or Lisp and, yes, be done with all these other crap languages).

It's no wonder some people give up on programming, tired of chasing the latest technology, tired of realizing Unicode is not just a single encoding standard but a family of them, tired of dealing with RSS, tired of learning yet another tool that is supposed to make them more productive. They want to step off this road to nowhere, learn a trade, get good at it, and stick to that same skill set for the rest of their lives.

1 comment:

Anonymous said...

I just finished reading and enjoying your blog post / answer to Joel’s rant on why not using Java for college students is the best way to go. I come from a medical background and found many of your arguments similar to what I encountered when first entering the field. My education for laboratory clinical testing/result reporting was very theoretical in nature and really did help with on-the-job performance, but only when there was a difficult problem to solve that actually required some thinking. That is, 90% of the education did not help with my actual job because of the manufacturer automation that goes on in the workplace, and how a technician well versed in a particular analyzer knows the ins and outs that make him/her a very productive technician. The theoretical education comes into play when you get a critical sodium result on a patient and it doesn’t follow the other electrolytes that should also be elevated, yet you know enough to call the ER or clinic that collected the sample and ask if they remembered to draw from below the IV line, and to check. This really could mean life or death for the patient, because normally the attending physician will prescribe the necessary drug to counter what he or she believes is going on with the patient. Sorry for being long winded, but my point is this: as you argue, how most code is written and managed in today’s industry may be far removed from what the current college CS professor believes is “real” computer science. What is taught is not necessarily what is used in practice, and the underlying theory and knowledge is in fact important – regardless of whether those concepts are covered in Scheme, LISP, Java, C++, COBOL, etc.