What the Arduino Software Needs
The Arduino software has a very "one job" perspective. For instance, reading an A/D value takes about 100 µs. Sure, that isn't much, unless you also want to take an ultrasonic reading. Most timing in the system simply spins and waits, yet you should be able to do something else whilst you are waiting.
Take, for instance, reading the A/D converter. The code looks like this:
        // ADSC is cleared when the conversion finishes
        while (bit_is_set(ADCSRA, ADSC));
The Arduino code just waits for the conversion to happen. What if we added a "yield()"? That snippet of code would look like this:
        // ADSC is cleared when the conversion finishes
        while (bit_is_set(ADCSRA, ADSC))
                yield();
Adding this line allows you to get called back during A/D operations. When you have some flexibility in the "exactness" of timing, you can do other work while you are waiting.
You then write a "yield()" function, which gets called during long operations when, for all practical purposes, you are just waiting.
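Here is a minimal sketch of what such a yield() could look like. It is not the actual patch: backgroundWork() is a hypothetical job, the guard flag is my own assumption (it stops the callback from nesting if the background work itself reaches a wait loop), and waitWithYield() is only a stand-in for a hardware wait loop so the idea can be tried off the board.

```cpp
#include <stdint.h>

// Hypothetical counter standing in for real background work.
static uint32_t backgroundRuns = 0;

// Guard flag: true while yield() is already running.
static bool insideYield = false;

// Hypothetical background job. It must be short and must never block.
void backgroundWork() {
    ++backgroundRuns;
}

// Called from inside wait loops, e.g. while an A/D conversion is pending.
void yield() {
    if (insideYield) {
        return;             // already in a callback: do not nest
    }
    insideYield = true;
    backgroundWork();
    insideYield = false;
}

// Stand-in for a hardware wait loop (assumption for illustration):
// the simulated device stays busy for 'pollsUntilDone' polls.
uint32_t waitWithYield(uint32_t pollsUntilDone) {
    uint32_t polls = 0;
    while (polls < pollsUntilDone) {   // plays the role of bit_is_set(ADCSRA, ADSC)
        yield();
        ++polls;
    }
    return polls;
}
```

Each pass through the wait loop gives backgroundWork() one chance to run, so the work it does per call has to stay well below the timing slack you can tolerate.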
Sounds good? Sure it does, but there is a catch. Unless you want to implement a whole "green thread" system, you have to think about tasks differently. Rather than view them as a "loop", you have to view them as state machines. Each callback does something trivial.
(Green threads are cooperative threads where each green thread has its own stack and state block for registers. They can be implemented with setjmp/longjmp and some inline assembler, like the old POSIX threads, but IMHO they are too heavy for an Arduino.)
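To make the "state machine instead of loop" idea concrete, here is a rough sketch of a range-finder style task. Everything in it is an assumption for illustration: the PingTask name, the three states, the 30 ms timeout, and the echoSeen flag that stands in for reading the echo pin. The time is passed in as a parameter, so on the board you would feed it from the microsecond clock.

```cpp
#include <stdint.h>

// Hypothetical timeout: give up if no echo arrives within 30 ms.
const uint32_t PING_TIMEOUT_US = 30000UL;

enum PingState { PING_IDLE, PING_WAIT_ECHO, PING_DONE };

struct PingTask {
    PingState state;
    uint32_t  startedUs;   // time the ping was sent
    uint32_t  echoUs;      // round-trip time, valid when done and not timed out
    bool      timedOut;

    PingTask() : state(PING_IDLE), startedUs(0), echoUs(0), timedOut(false) {}

    // One trivial step per callback. Never loops, never waits.
    void step(uint32_t nowUs, bool echoSeen) {
        switch (state) {
        case PING_IDLE:
            startedUs = nowUs;          // real code would fire the trigger here
            timedOut = false;
            state = PING_WAIT_ECHO;
            break;
        case PING_WAIT_ECHO:
            if (echoSeen) {
                echoUs = nowUs - startedUs;   // unsigned math survives clock wrap
                state = PING_DONE;
            } else if (nowUs - startedUs >= PING_TIMEOUT_US) {
                timedOut = true;
                state = PING_DONE;
            }
            break;
        case PING_DONE:
            break;                      // result stays put until the task is reset
        }
    }
};
```

The blocking version would spin in a while loop until the echo arrives. Here the "where am I" information lives in the state variable instead of in the program counter, which is what lets each callback return right away.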
to be continued....

State Machines
I am testing now and will be writing soon.
The Arduino is a slow processor, true, but if you are mainly using it for I/O and control, there is plenty of headroom to do more with it. The problem is the single-task paradigm: it is difficult to structure multiple logical tasks.
I created a class called "yieldtask" that will run multiple state machine tasks during yields. A state machine breaks down a task into a number of successive states, where each state is a logically stand-alone piece of functionality. Right now the deadman timer and the ultrasonic range finding code work off it. I have a patch for the Arduino 1.0.1 code to insert yield() in the appropriate places.
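To give a feel for the shape of such a runner, here is a rough sketch. It is not the yieldtask class itself: StepTask, TaskRunner, DeadmanTask, the round-robin order and the four-slot table are all hypothetical names and choices made for illustration. The idea is that yield() would call runOne() with the current millisecond count.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical interface: a task does one short step and returns.
class StepTask {
public:
    virtual ~StepTask() {}
    virtual void step(uint32_t nowMs) = 0;
};

const size_t MAX_TASKS = 4;   // fixed-size table, no heap use

class TaskRunner {
public:
    TaskRunner() : count_(0), next_(0), running_(false) {}

    // Register a task; returns false when the table is full.
    bool add(StepTask* task) {
        if (count_ >= MAX_TASKS) return false;
        tasks_[count_++] = task;
        return true;
    }

    // Run one step of the next task, round robin. Does nothing if a
    // step is already in progress, so tasks cannot nest.
    void runOne(uint32_t nowMs) {
        if (count_ == 0 || running_) return;
        running_ = true;
        StepTask* task = tasks_[next_];
        next_ = (next_ + 1) % count_;
        task->step(nowMs);
        running_ = false;
    }

private:
    StepTask* tasks_[MAX_TASKS];
    size_t count_;
    size_t next_;
    bool running_;
};

// Hypothetical deadman timer: trips (and stays tripped) if nobody
// calls kick() within limitMs.
class DeadmanTask : public StepTask {
public:
    explicit DeadmanTask(uint32_t limitMs)
        : limitMs_(limitMs), lastKickMs_(0), tripped_(false) {}

    void kick(uint32_t nowMs) { lastKickMs_ = nowMs; }
    bool tripped() const { return tripped_; }

    virtual void step(uint32_t nowMs) {
        if (nowMs - lastKickMs_ >= limitMs_) {
            tripped_ = true;   // real code would stop the motors here
        }
    }

private:
    uint32_t limitMs_;
    uint32_t lastKickMs_;
    bool tripped_;
};
```

Running only one task step per call keeps each yield short, at the price of each task being visited less often as more tasks are added.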
Stay tuned!
The callback approach is interesting. Eager to read the next article.
Arduino seems limited once you start pushing the limits in timing. I feel like that's well beyond its intended use.
By Green Thread I take it you mean a cooperative multitasking approach?
Michael