Wednesday, June 30, 2004

Programming Language Pragmatics

Programming Language Pragmatics
Michael Scott; Morgan Kaufmann; 1-55860-442-1

I recently finished reading this and thought I should provide a review as it is a truely remarkable book. Fundamentally the book is an informal survey of the various fields of computer science and engineering that impact the design and implementation of programming languages, their interpreters, and compilers --- and it succeeds admirably. Specifically the chapters include:

  1. Introduction
  2. Programming Language Syntax. (FA/PDA's; regular expressions; context-free grammars; LL/LR parsers, including one of the clearest descriptions of LR parsing I have come across.)
  3. Names, Scopes, and Bindings. (Value lifetime; garbage collection; different scoping rules; different implementation approaches to scoping.)
  4. Semantic Analysis. (Attribute grammars; attribute flow. While well written, I personally found this one of the weaker chapters.)
  5. Assembly-Level Computer Architecture. (Effectively a one chapter synopsis of Hennersey and Patterson.)
  6. Control Flow. (Selection; Iteration; Recursion; and Non-determinisim)
  7. Data Types. (Static and Dynamic checking; discussion of implementation; A short discussion of HM-style type inference; as well as discussions on the particular issues associated with Records, Arrays, Pointers, Lists, and IO.)
  8. Subroutines and Control Abstraction. (Calling sequences and conventions; call-stack/tree management; exceptions - including continuations; coroutines.)
  9. Building a running program. (Compiler backends; intermediate code; register allocation; linking - including dynamic linking and pic.)
  10. Data Abstraction and Object Orientation. (OOP, including semantics and implementation concerns of encapsulation; inheritance; finalisation; polymorphism. All discusses with reference to a multitude of different languages and approaches to OOP).
  11. Nonimperative Programming Models: Functional and Logic Languages. (Split into two sections, one on functional and one on logic programming. Of particular interest are introductions to both the lambda and first-order predicate calculi; and discussions on the advantages these mathmatical groundings provide programmers).
  12. Concurrency. (Background; Theory; Shared-memory; and Message-Passing. A very impressive introduction to concurrency.)
  13. Code Improvement. (By necessity of space, a very limited introduction to code optimisation. Still it manages to cover a reasonable amount of ground, although I would have liked more discussion on cache optimisation techniques. Peephole Optimisation; Global Redundancy; Data Flow Analysis; Loop Invariants and Unrolling; Instruction Scheduling; Register Allocation).
Given the breadth of the coverage, the reader will require additional reading to gain any appreciable capacity in any specific area. However this book does provide a very reasonable introduction to areas that will provide much needed context when approaching any more focused work. It's not going to replace the Dragon Book, but it will make the Dragon Book far more approachable, and I wish I had read PLP before I had to wade my way through the much less approachable prose of Aho, Sethi and Ullman. Similarly, the reader is still going to want to read Patterson and Hennersey. However the discussions of language semantics and their associated demands on computer architecture will answer many questions of motivation given short treatment in the more advanced work.

In conclusion, I would highly recommend this to any programmer who has experience in only one or two languages; any undergraduate wanting to broaden their understanding of programming; any professional wanting to learn more about compiler/interpreter/parser theory and implementation and wants context for the more focuses works in this field.

5-stars.

Tuesday, June 29, 2004

Domain Specific Languages

I came across this article on Lambda by Simon Johnston on DSL's. He takes two examples from the world of art, and asks that we consider possible parallels in software.
1. My son and I recently read a book on Leonardo da Vinci (great link), in particular looking at the way Leonardo’s work contained a great body of work from very rough sketches to beautiful and complete works of art. Explaining how artists start with rough pencil sketches, refining the lines, the perspective and then move onto oil to complete was a particularly interesting discussion. 2. I know there are many analogies that we in computer science draw between our world and that of construction – here’s another. Look at how buildings are really constructed, the architect does not build blue prints, they draw or make a model of the envisioned building (some of these drawings have become as well known as the actual buildings themselves). For example, Frank Lloyd Wright’s Fallingwater started with a truly beautiful drawing that sold the client. Then followed floor plans and blueprints. Only then did wiring diagrams, plumbing details and specific engineering drawings for features such as the cantilevered balconies complete the story.
Simon then goes on to state his position:
My position is that the creation of domain specific languages that do not seamlessly support the ability to transform information along the refinement scale are not helpful to us. So, for example, a component designer that is a stand alone tool unconnected to the class designer that provides the next logical level of refinement (classes being used to construct components) is a pot hole in the road from concept to actual implementation. Now, this is not as I have said to indicate that domain specific languages are bad, just that many of us in this industry love to create new languages be they graphical, textual or conceptual. We have to beware of the tendency to build these disjoint languages that force the user to keep stopping and jumping across another gap.
I couldn't read this and not immediately be struck by Paul Graham's reflections on creating what became Yahoo Store, specifically his comments regarding lisp. Writing this it also occurred to me that RMS's comment in Why you should not use Tcl is also appropriate:
The principal lesson of Emacs is that a language for extensions should not be a mere "extension language". It should be a real programming language, designed for writing and maintaining substantial programs. Because people will want to do that! Extensions are often large, complex programs in their own right, and the people who write them deserve the same facilities that other programmers rely on.
...which can only apply doubly for applications written in a DSL.

It also brings to mind the miriad of embedded dsl's spawned by the Haskell community though their wholesale adoption and exploitation of monads (and more recently arrows).

It only reinforces my increasing belief that the biggest gap in many popular modern languages is the lack of syntactic closure; the ability to introduce new syntatic support for a feature that becomes a first-class citizen of your programming environment.

Monday, June 28, 2004

...but at least it's done.

In the original, memory based, version of tks queryies returned answers; and answers had a method getRowCount() which did exactly that. The problem with that is that memory dosn't scale as well as disk does. So we moved to a disk backed architecture. Now when you are memory based, you want to release your memory as soon as possible, so you want to resolve (and subsequently release) any intermediate results immediately. Once you are backed by disk, you have much better uses for memory than storing an answer the user isn't using yet, so you want to evaluate your query as lazilly as possible. Hence Kowari/TKS has this concept of a Tuples, which is a thunked, localised, intermediate result. This gets wrapped in an Answer object, which is returned to the user, and resolves the Tuples by need.

The problem is that calculating an accurate row count becomes a potentially expensive operation; you can generally only afford an upper-bound instead. So unfortunately while the semantics changed, the name stayed the same, to the general confusion of all.

Subsequentely I have spent the past 2.5 days reverting the semantics, and introducing new methods to access the upper-bound (and cardinality). Tedious work, but at least it's done.

Now on to getting resolvers to play nicely with transactions.

"So, what do you do for a living?"

Like most people I am forever being asked to explain what I do for a living. Answering that question got harder when I started my current job.

My last job was with a company called Braintree, who make world class communication gateways --- mainly for the eftpos and financial industry. So I could always answer "I help design boxes to connect old EFTPOS machines to the bank using new technologies". It glosses over a lot, is largely incomplete, but remains something most people could comprehend. The worst job I've had to try and explain was my first. "I am writing the user-interface to an Electron Paramagnetic Resonance simulation"? I've seen geeks eyes glaze over on that one :).

So far for my current job I've had to revert to the classic "I'm a programmer in an IT company", which is sufficiently devoid of information to be mutually unsatisfying. So what am I currently working on? I am employed by Tucana Technologies to work on an open-source RDF database Kowari. Kowari is used by Tucana as the base on which we build our enterprise rdf-datastore TKS. OTOH, if having difficuty explaining my job is the price to pay for having this much fun I suppose it's worth it. Tucana is a great company to work for, and Kowari is really fun engineering. Still, if anyone can think of a one or two sentence description that isn't quite as insipid I would appreciate it.

Friday, June 25, 2004

Once more with feeling...

Well not an auspicious start for blogger. I posted a short note last night, mainly to avoid leaving an empty blog. Somehow blogger has managed to lose it.

Let us start with my morning reading:

  • Lambda The Ultimate - Probably the premiere blog on the topic of computer languages; their design, semantics, and implementation.
  • Squawks of the Parrot - The blog of Dan Suglaski, the lead developer on parrot --- the new perl-6 vm.
  • About Kim - Another blog. Kim's posts invariably betray a striking intelligence; are always thought provoking; and periodically on programming language topics.
  • Groklaw - Because even geeks need soap-opera sometimes.
  • Irregular Web Comic - There are other web comics I enjoy, but this has rapidly become my favourate.