Programming Language Thoughts

Summary

A lot of programming language advocacy (even when restricted to procedural/object-oriented languages) seems to revolve around side issues such as syntax and neatness. I believe that language-level support for the core data types (strings, arrays/lists, associative arrays), library development and resource management are more important issues that are often overlooked.

Last updated 2006/03/22

History of procedural languages

Summarised from posts in a thread at http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=319&ixReplies=105

There always seem to be a lot of arguments about which programming language people think is best. While I believe in 'the right tool for the job', I feel that people often fail to notice some core areas where some languages excel, or fall down.

As a programmer who went through a computer science course at university and has worked on commercial code for about 10 years, I have programmed in and been exposed to a fairly common set of languages that someone in my position might have seen: C, C++, VB, Delphi, Tcl, Perl, Python, ML, Scheme, Prolog, etc. In general, procedural and object-oriented languages seem to dominate the majority of software development these days, so most of the arguments below apply to these rather than to functional (LISP, Scheme, etc) and logic (Prolog) languages.

Historically, most procedural languages seem to be very good at the computational, conditional and looping control aspects of programming (eg. integer/floating point math, if/then statements, while/for loops). However, the early legacy of C and Pascal, which abstracted away the underlying CPU/memory hardware only minimally, has meant that more complex structures were not built in to these early procedural languages. Programmers were expected to manage memory, strings, arrays, etc closely themselves, using whatever code/libraries they felt they needed to trade off the small memory size/CPU speed and optimise for their particular application.

Given that the original memory/CPU constraints are no longer relevant for a lot of programs these days (though embedded programmers, or programmers with huge data sets or strict performance requirements, would still disagree), it's become reasonable over the last 20 years to expect more and more libraries to be used to manage these low-level structures and details for the programmer.

Garbage collection and resource management

Joel feels that automatic memory management is an important feature required in a language these days. I'd say that memory management is only one part of the 'garbage collection' process, and a language should support some form of general guaranteed finalisation. C++ offers this through stack-based objects and destructors; Perl and Python offer it through their reference-counted semantics. Interestingly, more languages are planning on moving away from this to non-deterministic garbage collection!

People seem to want full garbage collection semantics, something that reference counting doesn't provide (because you can create loops of objects that reference each other, so their counts never reach zero). While full garbage collection ensures that even looped structures are deallocated, it's still not without problems. With even one overlooked reference, you can still get a large 'leak' if that reference refers to many other large structures, which is why Java still has memory leak detection tools.
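The difference between reference counting and full collection can be seen in a small Python sketch. It assumes CPython, which uses reference counting plus a separate cycle collector; the Node class and function names are made up for illustration. With the cycle collector switched off, a two-object loop stays alive after the last outside reference is gone, and it is only freed once the full collector runs.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at another Node (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Build two nodes that reference each other and return a weak
    reference to one of them, so we can observe when it is really freed."""
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a  # a -> b -> a: a reference loop
    return weakref.ref(a)


def cycle_survives_refcounting():
    """Return (alive_before_gc, alive_after_gc) for a looped structure."""
    gc.disable()  # rely on reference counting alone for the moment
    try:
        probe = make_cycle()  # the only strong references are inside the loop
        alive_before = probe() is not None
        gc.collect()  # run the full, cycle-detecting collector
        alive_after = probe() is not None
    finally:
        gc.enable()
    return alive_before, alive_after
```

Under CPython I would expect this to report that the loop is still alive before the collection and gone after it. Other Python implementations manage memory differently, so the first value is not guaranteed there.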

The other main problem with full garbage collection is that you lose guaranteed destruction semantics on objects, something that can be very useful. The C++ idiom of "resource acquisition is initialization", where the corresponding destructor releases the resource, is probably one of the most powerful idioms for dealing with all forms of resource acquisition and disposal. It covers far more than memory leaks, which, given the push to full garbage collection everywhere, is what people seem the most obsessed about.

Examples where you want guaranteed finalisation include the common examples of closing of a file at the end of a function/method, or releasing a mutex/semaphore at the end of a function/method. The second one is especially commonly done in C++ thread libraries as a way to ensure that any exception thrown inside a method doesn't leave a locked mutex lying around.

Still, full garbage collection semantics doesn't mean that you can't have guaranteed finalisation code. It's possible to have garbage collected memory, and still have guaranteed finalisation through language semantics. Perl 6 is looking at having a POST block which is always executed at the end of a block. Unfortunately, almost no other language seems to be enabling something like this.

It's possible that it would be a very useful idiom to allow "enter scope" and "exit scope" actions to be placed together. Enter scope actions are commonly just code at that particular point, but being able to specify at the same point something that must occur when the same stack point is unwound could be a really useful feature.

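  function do_something() {
    mutex.acquire();
    // registered here, runs whenever this scope is left (return or exception)
    onexit { mutex.release(); }
    // ... rest of the function body ...
  }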
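Something close to this can already be approximated in Python. The sketch below is only an analogy, not the language feature proposed above: it uses contextlib.ExitStack from the standard library, whose callback() method registers a function to run when the with block is left. The module-level mutex and the fail parameter are assumptions added for illustration.

```python
import threading
from contextlib import ExitStack

mutex = threading.Lock()


def do_something(fail=False):
    """Acquire the mutex and register its release at the same point."""
    with ExitStack() as on_exit:
        mutex.acquire()
        on_exit.callback(mutex.release)  # plays the role of onexit { ... }
        if fail:
            # Simulate an error while the mutex is held.
            raise RuntimeError("something went wrong while holding the mutex")
        return "done"
```

Whether the function returns normally or raises, the registered release runs as the with block unwinds, so the mutex should not be left locked. The acquire and its matching release sit on adjacent lines, which is the point of the idiom.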

Strings, lists, hashes

Along with resource acquisition and disposal, however, I'd add at least 3 main structures that a language should support intrinsically in an efficient and well planned way:

  1. Strings
  2. Lists
  3. Hashes/maps/associative arrays/dictionaries/or whatever your language calls them (I'll call them maps)

Historically, C was probably the worst for each of these, especially 3, which is non-existent. Most scripting languages do well at all of these, to lesser and greater degrees. C++ worked on the principle that rather than putting anything in the language per se, the language should be powerful enough to create a library to do whatever you want. This seems to have worked reasonably well, since the STL and Boost libraries created some really powerful features that do all 3 of the above, with your choice of space/complexity tradeoff. Lisp derivatives go so far as to make both code and data just a list. Surprisingly, even modern languages like Java seemed relatively poor at these, for example having to use fixed-size arrays, or vectors that required boxing/unboxing. Only the recent generics support has made this a lot better.

Whenever I use VB, it's the poor support for 2 & 3 above that really annoys me. Also, I'm not sure if it's still true, but I remember reading that string concatenation was an O(N^2) operation under VB, meaning item 1 was of suspect usefulness as well. VB array support is horrible, and maps were an additional component rather than an inbuilt language feature.
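To see where a quadratic cost like that can come from, here is a small Python sketch that just counts character copies under a simple model. It assumes every concatenation allocates a fresh string and copies the whole result into it, and it says nothing about how any particular VB version actually behaves. The function names are made up for illustration.

```python
def chars_copied_by_repeated_concat(pieces):
    """Count characters copied if every `result = result + piece`
    allocates a fresh string and copies the whole new value into it."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the entire new string is written out again
    return copied


def chars_copied_by_single_join(pieces):
    """Count characters copied if the final size is known up front and
    each piece is copied exactly once."""
    return sum(len(piece) for piece in pieces)
```

For 1000 one-character pieces the first model copies 1 + 2 + ... + 1000 = 500500 characters, while the second copies 1000. That gap is why a language's built-in string type matters so much once strings are built up in loops.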

Map and list interoperability is a particularly interesting issue. All maps allow you to iterate over the keys or key-value pairs of the map. Whether this uses an iterator, or returns some list of items, seems to depend on whether the designer is more comfortable with a procedural or functional programming system.
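Python happens to offer both styles on the same map type, which makes the contrast easy to sketch. The ages data and the names_over helper below are made up for illustration.

```python
ages = {"alice": 31, "bob": 27, "carol": 45}


def names_over(limit, mapping):
    """Iterator style: key-value pairs are produced one at a time as
    the loop asks for them, and no intermediate list is built."""
    result = []
    for name, age in mapping.items():
        if age > limit:
            result.append(name)
    return result


# List style: materialise all the pairs as a list first, then treat the
# result like any other list (here, sort it by value, largest first).
pairs = list(ages.items())
oldest_first = sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The iterator form suits a straight walk over the map. The list form is handy when the pairs need to be sorted, sliced or passed to code that expects a list.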

Library support

While libraries have been around almost since programming started, their importance seems to keep increasing. Many early C programmers seemed to have a "roll your own" attitude to most things, which harks back to the "it's not quite optimised for what I need" syndrome. These days a language can live or die by its library support.

One of the core features of a language is the ability to create libraries and use them. This has been one of C++'s greatest strengths, and weaknesses. Because for quite some time there was no standard string class/array class/map class, dozens of libraries and users created their own. This made it hard to build additional higher-level libraries on top of these core structures, because which one did you use? The result was lots of libraries that still just used char * pointers for strings, and ..., int length, t_type * items, for arrays. While mostly fine for input data, this still creates problems for methods that want to return data, and results in the annoying "call once to get the size, call again to get the data" type interfaces that abound in C libraries.

Even with the STL providing a powerful standard library, the lack of a common implementation, bugs in different implementations, a change in the standard (#include vs #include) and the way template compilation occurs meant it was basically impossible to create a binary linkable C++ library. In this respect, the idea of a .dll/.so with an interface definition that you can compile against and then link to has never really been possible with C++ and the STL.

Libraries for other modern languages seem to fall into two main categories: loose community-supplied libraries and highly structured, centralised, hierarchical class libraries.

When Perl 5 was introduced, it included a fairly simple packaging and library system, and the CPAN community site where libraries could be uploaded, indexed, searched and installed from. This has been immensely popular, and other scripting languages have tried to emulate it (eg PEAR for PHP). As Perl is interpreted, all the libraries/modules are provided as source and loaded at runtime as they are used. Perl also provided a simple documentation format (perldoc) that encourages a certain comment style in code, and a certain layout. The most obviously visible aspect of this is the SYNOPSIS section at the top of most Perl modules, which gives a 1 page introduction on how to immediately use the module in code. In over 50% of cases I've found this lets you pretty much use the module straight away without having to read the full documentation.

.Net and Java have included large libraries from their respective creators which are regarded as part of the language. These are usually pretty comprehensive with regard to file, I/O, thread, container, etc.

Misc comments on VB

Although I complained about the lack of lists/maps in VB above, I have to say that there are some things I really like about it.

  1. Building a straightforward Windows UI in VB is nice and easy (yes, Delphi is good in this respect too). The idea of using 'packers' and 'layout engines' (Tk/Java) is sometimes neat because it allows easy resizing of windows and the like, but generally it's a bit of a pain when you just know what you want to stick down and where.
  2. The COM integration is great. Being able to access all sorts of external libraries really easily and in a 'Do What I Mean' way is really powerful. This is especially true with VBA. I've created numerous (mostly smallish) applications using Access and Excel as basically front ends with powerful objects to build my code logic around. One place where this falls down though is actually trying to use both of them at once. Using DAO/ADO (basically the Access data engine) from Excel works fine, but trying to use an Excel object from Access seems to randomly freeze up and cause all sorts of problems. At least it did the last time I tried (which was Office 2k).
  3. Variants _and_ strict types. Sometimes you just don't care about the data type being used and want 'Do What I Mean' semantics. This is often the case when dealing abstractly with DBs or data you don't know about. On the other hand, sometimes you do care about the type, because you're about to create a giant array of Ints and want it stored efficiently. For most compiled languages, it's strict typing only. For most scripting languages it's always weak typing. With VB I get to choose my time/space/easiness tradeoff for myself. And it's all at the language level so I don't have to worry about calling helper methods, wrapper classes, casting functions, etc. Yay!
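The variant versus strict type tradeoff in the last point can be sketched in Python, though only through a standard library module rather than at the language level as in VB. A plain list holds anything, while array.array from the standard library holds one fixed machine type in a compact buffer. The names and the size N are made up for illustration.

```python
import sys
from array import array

N = 100_000

# "Variant" style: a list happily mixes types, one object reference per slot.
anything = [1, "two", 3.0, None]

# Strict style: every element is a C int, stored inline in one buffer.
ints = array("i", range(N))
loose = list(range(N))


def rejects_wrong_type():
    """Return True if the typed array refuses a non-integer value."""
    try:
        ints.append("not an int")
    except TypeError:
        return True
    return False


def storage_bytes():
    """Return (list_bytes, array_bytes) for the two containers themselves.
    The list figure excludes the int objects it points at, so it
    understates the list's real footprint."""
    return sys.getsizeof(loose), sys.getsizeof(ints)
```

The typed array gives up flexibility in exchange for a fixed number of bytes per element (ints.itemsize), which is the same choice VB lets you make per variable.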

Language syntax

I think most arguments about syntax are pretty much hot air. {} vs keywords vs indenting are all relatively minor points in comparison to core data structure support, in my mind. Of course whole different paradigms like functional and logic languages are another kettle of fish that you can argue about on completely different grounds. I wish I had some time to try out OCaml and Haskell, which look really interesting.

However, I must admit to having a bias here. When I first used Perl, I used to be really flustered by the 'noise' level of the language. Over time though, I've grown to really like it, as well as Larry's design idea of 'Huffman coding' a language. Basically the idea is that the most common things you want to do should be easy, and hard things not impossible. This extends down to how much typing should be needed for basic language features. This thinking manifests itself all through Perl, including really deep support for strings, lists and maps.

Many people comment on Perl as a 'write only' language. I think any language can be, but sometimes people confuse 'noise' with 'conciseness'. Do you prefer reading legalese like "the aim, notwithstanding other obstacles to the successful completion of the exercise, is to increase the general vibrational energy of the H2O molecules present in the ceramic container, or in other words, to create a strong level of Brownian motion", or do you prefer "heat the cup of tea"? I see Perl as the second, and some other languages as the first. Of course it's a bit different to that, because from someone else's perspective the second statement "heat the cup of tea" might be written in Russian, so it's actually harder to understand than the long-winded version...