Many moons ago I wrote a short tutorial on pthreads. It was adopted (with permission) by a number of universities to provide a brief introduction for their students doing pthread based coursework. It strikes me as bizzare, but the tutorial remains the top google hit for 'pthreads tutorial'. Consequently I periodically get requests for help with threading issues.
When I wrote the tutorial I was still enamoured with threads, an infatuation that didn't last very long :). Since then, my experience working with pthreads, auto-parallelising compilers, java, erlang, kowari, and 10 years of professional programming, have fundamentally changed my perspective on how to approach concurrency.
- Concurrency bad
- See 1
- Shared-memory concurrency very very bad
- see 3
- Process isolation good
- Message-Passing/Rendevous good
- Co-routine/Continuation good
- Locks are bad - you've accepted all the pain and suffering associated with shared-memory concurrency, and now you're going to SERIALISE operations!!
Assuming you are stuck using shared-memory threads, what do you do now? Well first hope like hell you have a garbage collector. If you don't it is going to hurt enough I would suggest you give up now and go back and reconsider the reasons why you discounted the other options above -- anything is better than this. If you do have a garbage collector, start reading up on persistent datastructures.
Persistent datastructures are very important in safe efficient thread-based design because they don't require locking. They are called persistant because they are immutable. Once you have a reference to a structure that structure is guaranteed const. What this means is that any method that would mutate an empherial (ie, what you are used to) structure returns a new structure.
As an example consider a list
x = [] | x = null y = x.prepend(1) | y = { 1, x } z = y.prepend(2) { z = { 2, y }Now if you need the rest of your program to see the updated list you use some sort of reference cell. The simplest is just a one element queue. This is a classic reader-writer problem; except you don't have to worry about evicting readers before the writer does his thing.
RefCellNote there is almost no contention for readers, no book-keeping, and once a reader has a reference to data, all further access to the structure is lock-free.: init(T data); // Initialises the cell with data. All subsequent access to data must originate with this cell. T read(); // Returns the current value of data (internally access is protected by the 'access lock'). T writeLock(); // Obtains the 'write lock' (or blocks), and returns the current value of data. unLock(T data); // Updates the current value of data under the 'access lock', and releases the 'write lock'.
However take another look at it. Step back and think about what this is and you will realise that what we have written here is a transaction based datastore. What this means is that you can now go examine all your transaction management theory and introduce multiple-writers, lock-stealing, etc, et al. Now as long as T is persistent, you are atomic, consistent, and isolated. If you ensure unLock flushes the structure to disk you now have an ACID datastore (surprised? :). Add consistency checks to the mutation operations and you can call it a database :).
Granted, without a garbage collector, memory management is a bitch. The standard reference on persistent structures is Chris Okasaki. A quick google will get you a fair way; however citeseer is the motherlode. If you have access to a university library, I would lookup his book (Purely Functional Data Structures), which will save you alot of searching on citeseer :).
Another thing worth noting is that it is worth noting is that once you leave the ephemeral space, you need to start making decisions beyond the simple "Vector/List/Hash/Tree". There are alot more tradeoffs to consider. Do you really need the efficient random-access of a vector, or are you actually using it as a queue, or dqueue? A persistent queue can be implemented with O(1) amortized operations, but the best I have been able to find (quickly) for a true 'vector' is log(i) access (i = index) [citeseer]. Just as importantly, a persistant queue is very simple, almost identical in fact to an efficient ephemeral queue; whereas a persistant random-access-list much more complicated than a vector.
1 comment:
My sources tell me that you probably want to read the paper referred to near the bottom of this report from PLDI 2005.
I found a PDF copy of the paper; it is
Threads Cannot be Implemented as a Library by Boehm.
Post a Comment