Friday, December 23, 2005

Humorous post on ocaml-list

Ok so the fact that I found this hilarious probably suggests I need my head examined... I still thought I'd share it though.

From: hars@bik-gmbh.de
Subject: Re: [Caml-list] Camlp4 in a Unix pipe
Message-Id: <4383605A.7030309@bik-gmbh.de>

Alessandro Baretta wrote:
Cool command line! How did you discover the '-' option? 

Out of habit? In the unix shell monad, "foo >>= bar" happens to be written
as "foo | bar -" ("return" is written "cat", in case you wondered. So "Hey,
that is a useless use of cat!" really means 'Hey, you ignored the monad laws!")

Yours, Florian (time to go home, I know).
So yeah, at this point I probably have to concede I'm strange :).
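
For anyone wondering why the joke lands, here is a toy sketch in OCaml (my own construction, nothing from the thread): model each shell command as a text filter, with cat playing return and the pipe playing bind. The "useless use of cat" is then, loosely, the identity law.

  (* A pipeline stage: reads its whole stdin as a string, writes its stdout. *)
  type filter = string -> string

  (* "cat" copies stdin to stdout -- the analogue of return. *)
  let cat : filter = fun input -> input

  (* "foo | bar -" feeds one stage's output into the next -- the analogue of bind. *)
  let ( >>= ) (foo : filter) (bar : filter) : filter = fun input -> bar (foo input)

  let sort_lines : filter = fun s ->
    String.concat "\n" (List.sort compare (String.split_on_char '\n' s))

  (* "cat | sort -" behaves exactly like "sort": a useless use of cat. *)
  let () = assert ((cat >>= sort_lines) "b\na" = sort_lines "b\na")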

Friday, December 16, 2005

Alternative Logics

Found via Google Blogs: A Neighbourhood of Infinity: Alternative Logic.

An interesting read.

Google Blogging and an interesting assertion.

Read Paul's post on the latest addition to Google Labs. Installed Google Blogger, and WOW this is cool. Naturally first stop was to check out etymon (disappointing), and second to visit sites I visit regularly. Stumbled upon this quote recounted by Martin Fowler; would love to see some measurements.

While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing".

Tuesday, December 13, 2005

Reading List Reprise

When I started this blog I posted my reading list. Having recently finished Types and Programming Languages, I thought it was probably time to revisit that list.

Essentials of Programming Languages, Daniel Friedman
I finished this early in the year. Unfortunately I never did finish working the examples beyond the first couple of chapters. There is some very interesting material in this book, particularly the last couple of chapters.
Types and Programming Languages, Benjamin Pierce
I recently finished this (back in October), and loved it. This book is illuminating, and it definitely achieved its goal. It is a superb example of how to introduce a reader to a field and provide the necessary preparation for accessing further works.
Introduction to Graph Theory, Richard J. Trudeau
Read, and yes it was a fun book, but unfortunately it never really reached the depth I needed for it to be immediately applicable. What I took from this book was mostly vocabulary.
Making reliable distributed systems in the presence of software errors, Joe Armstrong
I know Greg was annoyed by some trivial errors in some of the example code; however I am very glad I've read this thesis. This is the closest I have seen to practical self-repairing systems. A gentle and easy read, it lays out a practitioner's framework for designing reliable systems. Highly recommended.
The Polyadic Pi-Calculus: A Tutorial, Robin Milner
Never did finish reading this, as I ended up borrowing Milner's book and reading it instead. I'm curious to know if there has been any work on the relationship between pi-calculus processes and continuations.
Genuinely Functional User Interfaces, Antony Courtney, Conal Elliott
As I said, this is my primary area of interest, and it remains the long-term focus of my efforts. This paper remains a core influence on my interest in Type Theory, Category Theory, and Language Semantics.
Arrows, Robots, and Functional Reactive Programming, Paul Hudak, Henrik Nilsson, Antony Courtney, John Peterson
Oops, that's right, I was going to read this one... I'll have to carry it over to my new reading list...
Other works read worthy of note include:
Purely Functional Data Structures, Chris Okasaki
This book stands alongside my copies of Cormen, Date, Brooks, and K&R without shame. Only a year old and already starting to wear from use. Personally I consider it essential reading for anyone serious about programming. I am seriously considering buying a second copy so I can lend one without losing it from my library. This book is important enough that I will be posting separately about it.
Other things read and recommended:
  • I reread Fred Brooks, The Mythical Man-Month
  • I reread Kernighan and Pike, The Practice of Programming
  • I'm reading selected chapters of Benjamin Pierce, Advanced Topics in Types and Programming Languages, which is proving to be even better than TAPL due to its focus on more practical issues (although it does rather assume you have at least a basic grasp of the theory in TAPL :)
Other things of interest include:
  • I'm half way through Joseph Landin, An Introduction to Algebraic Structures
  • Several papers on Core Erlang (I'll track down citations if anyone is interested)
  • Various standards including the RDF stack and the XMPP RFCs
  • Started Roy Crole, Categories for Types

Monday, December 12, 2005

Found on Lambda
... I'm guessing you won't like my new Redneck Scheme dialect, in which call-with-current-continuation is renamed y'all-come-back-soon-now-y'hear?... Anton van Straaten

Wednesday, December 07, 2005

Kowari insert/select and blank-nodes.

I was asked recently to justify Kowari's decision to maintain blank-node identity across an insert/select command. My response was along the lines of treating Kowari as a 4-graph partitionable into RDF-compliant 3-graphs (actually, technically a superset, as we don't restrict the domains of subjects and predicates as RDF does). At this point the entire 4-graph is within the scope of the existential, and consequently it is valid to retain blank-node identity between the RDF partitions (Kowari models).

However, it did send me back to the RDF-SEMANTICS spec, where I found the following in 1.5. Blank Nodes as Existential Variables:

This effectively treats all blank nodes as having the same meaning as existentially quantified variables in the RDF graph in which they occur, and which have the scope of the entire graph. In terms of the N-Triples syntax, this amounts to the convention that would place the quantifiers just outside, or at the outer edge of, the N-Triples document corresponding to the graph. This in turn means that there is a subtle but important distinction in meaning between the operation of forming the union of two graphs and that of forming the merge. The simple union of two graphs corresponds to the conjunction ( 'and' ) of all the triples in the graphs, maintaining the identity of any blank nodes which occur in both graphs. This is appropriate when the information in the graphs comes from a single source, or where one is derived from the other by means of some valid inference process, as for example when applying an inference rule to add a triple to a graph. Merging two graphs treats the blank nodes in each graph as being existentially quantified in that graph, so that no blank node from one graph is allowed to stray into the scope of the other graph's surrounding quantifier. This is appropriate when the graphs come from different sources and there is no justification for assuming that a blank node in one refers to the same entity as any blank node in the other.
(emphasis added)
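
To make the union/merge distinction concrete, here is a tiny illustration of my own (the URIs are hypothetical, and the example is not from the spec). Suppose two source graphs each contain a single triple:

  G1:  _:b <http://example.org/hasChild> <http://example.org/Mary> .
  G2:  _:b <http://example.org/hasChild> <http://example.org/John> .

The simple union keeps _:b as one node, so the result claims that some single unnamed thing has both Mary and John as children. The merge keeps the blank nodes apart (renaming one, say, to _:b2), so it only claims that something has Mary as a child and something, possibly different, has John as a child.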

Later, in 2. Simple Entailment between RDF graphs, we come across the Merging Lemma:
Merging lemma. The merge of a set S of RDF graphs is entailed by S, and entails every member of S. [Proof]
This means that a set of graphs can be treated as equivalent to its merge, i.e. a single graph, as far as the model theory is concerned. This can be used to simplify the terminology somewhat: for example, the definition of S entails E, above, can be paraphrased by saying that S entails E when every interpretation which satisfies S also satisfies E. The example given in section 1.5 shows that it is not the case, in general, that the simple union of a set of graphs is entailed by the set.
This is relevant because an alternative view of Kowari is as a set of named graphs.

Last time I read this document was when I first started with Tucana and hadn't heard of RDF before. It is amazing how much more understandable the rdf-semantics document is once you've spent some time working with RDF.

Wednesday, August 17, 2005

Persistent Datastructures

Many moons ago I wrote a short tutorial on pthreads. It was adopted (with permission) by a number of universities to provide a brief introduction for their students doing pthread-based coursework. It strikes me as bizarre, but the tutorial remains the top Google hit for 'pthreads tutorial'. Consequently I periodically get requests for help with threading issues.

When I wrote the tutorial I was still enamoured with threads, an infatuation that didn't last very long :). Since then, my experience working with pthreads, auto-parallelising compilers, Java, Erlang, and Kowari, along with 10 years of professional programming, has fundamentally changed my perspective on how to approach concurrency.

  1. Concurrency bad
  2. See 1
  3. Shared-memory concurrency very very bad
  4. See 3
However, we can't always avoid concurrency, so:
  • Process isolation good
  • Message-Passing/Rendezvous good
  • Co-routine/Continuation good
Again, sometimes we can't avoid shared-memory concurrency if we want to make good use of some modern CPU architectures, so:
  • Locks are bad - you've accepted all the pain and suffering associated with shared-memory concurrency, and now you're going to SERIALISE operations!!

Assuming you are stuck using shared-memory threads, what do you do now? Well, first hope like hell you have a garbage collector. If you don't, it is going to hurt enough that I would suggest you give up now, go back, and reconsider the reasons why you discounted the other options above -- anything is better than this. If you do have a garbage collector, start reading up on persistent datastructures.

Persistent datastructures are very important in safe, efficient thread-based design because they don't require locking. They are called persistent because they are immutable. Once you have a reference to a structure, that structure is guaranteed const. What this means is that any method that would mutate an ephemeral (i.e. what you are used to) structure instead returns a new structure.

As an example, consider a list:

x = []              | x = null 
y = x.prepend(1)    | y = { 1, x }
z = y.prepend(2)    | z = { 2, y }
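
The same example in OCaml (my choice of notation, not the original post's): each prepend shares the old list, and x, y, and z all remain valid, unchanged versions.

  let x = []                  (* []      *)
  let y = 1 :: x              (* [1]     *)
  let z = 2 :: y              (* [2; 1]  *)
  let () = assert (x = [] && y = [1] && z = [2; 1])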
Now if you need the rest of your program to see the updated list, you use some sort of reference cell. The simplest is just a one-element queue. This is a classic reader-writer problem, except you don't have to worry about evicting readers before the writer does his thing.
RefCell:
  init(T data);   // Initialises the cell with data.  All subsequent access to data must originate with this cell.
  T read();       // Returns the current value of data (internally access is protected by the 'access lock').
  T writeLock();  // Obtains the 'write lock' (or blocks), and returns the current value of data.
  unLock(T data); // Updates the current value of data under the 'access lock', and releases the 'write lock'.
Note there is almost no contention for readers, no book-keeping, and once a reader has a reference to data, all further access to the structure is lock-free.
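
Here is a minimal OCaml sketch of such a RefCell (my own guess at an implementation, not anything from Kowari; it assumes the standard Mutex module from OCaml's threads library, and the names init/read/write_lock/un_lock simply mirror the interface above). Only the cell's pointer is ever mutated; the structure it points to stays immutable.

  type 'a ref_cell = {
    access : Mutex.t;       (* guards the pointer swap; held only briefly    *)
    write  : Mutex.t;       (* serialises writers against each other         *)
    mutable current : 'a;   (* the current immutable (persistent) structure  *)
  }

  let init data =
    { access = Mutex.create (); write = Mutex.create (); current = data }

  (* Readers hold the access lock just long enough to copy the pointer. *)
  let read cell =
    Mutex.lock cell.access;
    let v = cell.current in
    Mutex.unlock cell.access;
    v

  (* A writer holds the write lock for its whole read-modify-write cycle. *)
  let write_lock cell = Mutex.lock cell.write; read cell

  let un_lock cell data =
    Mutex.lock cell.access;
    cell.current <- data;
    Mutex.unlock cell.access;
    Mutex.unlock cell.write

  (* Usage: prepend to the shared persistent list; readers holding the old
     version are completely unaffected. *)
  let () =
    let cell = init [] in
    let old = write_lock cell in
    un_lock cell (1 :: old);
    assert (read cell = [1])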

However, take another look at it. Step back and think about what this is and you will realise that what we have written here is a transaction-based datastore. What this means is that you can now go and examine all your transaction-management theory and introduce multiple writers, lock stealing, and the rest. Now, as long as T is persistent, you are atomic, consistent, and isolated. If you ensure unLock flushes the structure to disk you now have an ACID datastore (surprised? :). Add consistency checks to the mutation operations and you can call it a database :).

Granted, without a garbage collector, memory management is a bitch. The standard reference on persistent structures is Chris Okasaki. A quick Google will get you a fair way; however CiteSeer is the motherlode. If you have access to a university library, I would look up his book (Purely Functional Data Structures), which will save you a lot of searching on CiteSeer :).

Another thing worth noting is that once you leave the ephemeral space, you need to start making decisions beyond the simple "Vector/List/Hash/Tree". There are a lot more tradeoffs to consider. Do you really need the efficient random access of a vector, or are you actually using it as a queue, or a deque? A persistent queue can be implemented with O(1) amortized operations, but the best I have been able to find (quickly) for a true 'vector' is log(i) access (i = index) [citeseer]. Just as importantly, a persistent queue is very simple, almost identical in fact to an efficient ephemeral queue; whereas a persistent random-access list is much more complicated than a vector.
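
To make the queue claim concrete, here is a sketch of the classic two-list persistent queue in OCaml (the textbook structure described in Okasaki's book, written from memory rather than copied from any particular source). Enqueue pushes onto the back list and dequeue pops the front list, reversing the back list only when the front runs dry; the O(1) amortised bound holds for single-threaded use, and Okasaki's lazy variants recover it under heavily persistent use.

  type 'a queue = { front : 'a list; back : 'a list }

  let empty = { front = []; back = [] }

  (* Invariant: the front list is empty only if the whole queue is empty. *)
  let check = function
    | { front = []; back } -> { front = List.rev back; back = [] }
    | q -> q

  let enqueue x q = check { q with back = x :: q.back }

  let dequeue q =
    match q.front with
    | [] -> None
    | x :: rest -> Some (x, check { q with front = rest })

  let () =
    let q = enqueue 2 (enqueue 1 empty) in
    match dequeue q with
    | Some (x, _) -> Printf.printf "first out: %d\n" x   (* prints 1 *)
    | None -> ()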

Saturday, June 04, 2005

Using Jabber as a communications framework

Adrian Sutton asks about using Jabber as a communications framework for developing a bug-tracking/QA application. Two items immediately spring to mind.

I haven't had any experience with Jabber except as a casual user; however, these are projects that have caught my eye as things I would like to investigate when/if I ever have time. So Adrian, if you do check them out, let me know how you go :).

Wednesday, May 25, 2005

Silly old lady

There was an old lady who swallowed a bird;
How absurd, to swallow a bird!
She swallowed the bird to catch the spider
That wriggled and jiggled and wiggled inside her.
She swallowed the spider to catch the fly.
But I dunno why she swallowed that fly -
Perhaps she'll die

Thursday, January 13, 2005

Long Time no See

I realise it has been almost two months since I last blogged; I can only plead two excuses for this lapse. The first reason is that I have met an incredible woman and have dedicated a substantial amount of time recently to getting to know her better. The second is that on the 26th of December, Tucana Technologies (my employer) closed its doors. This is an incredible blow as, without reservation, this was the best job I have ever had. I can only hope that I will be able to find something half as involving, challenging, and inspiring as my work on Kowari and RDF. Anyway, unless I should suddenly find myself with employment, I should be able to address some of the unfinished posts I have lined up for this blog over the next week or two.

On that note, I have spent some time trying to install GHC under OSX 10.2; unfortunately the only installer I have been able to find thus far is for 10.3. If anyone out there knows of a package or install method for 10.2 it would be much appreciated.

Finally if anyone is interested in employing a Senior Computer Systems Engineer with a strong background in metadata, RDF, and in the use and design of more languages than you can poke a stick at, feel free to contact me on andrae@humbug.org.au; CV available on request.