[Stackless] Stackless based replacement

Arnar Birgisson arnarbi at gmail.com
Fri Oct 3 19:04:13 CEST 2008


Hi Larry,

On Fri, Oct 3, 2008 at 17:45, Larry Dickson <ldickson at cuttedge.com> wrote:
> Am I following io_operation right: the callback is an externally spawned (?)
> process which blocks waiting on the real IO operation but does not take up a
> slot in the Stackless round robin?

No, see below.

> And event.notify_when_ready_for_op returns instantly even if not ready?

Yes, these always return immediately, hence the asynchrony.

> If so, then libevent seems to introduce
> a layer of multitasking that is not under the Stackless umbrella.

Yes, in a way. libevent actually has nothing to do with Stackless; it
is simply a library that wraps the asynchronous mechanisms of various
platforms - presenting them as a unified interface and using the best
underlying mechanism available. In essence, all that libevent provides
is this:

Set up a read (or write) event on a file descriptor, such that
whenever a read (or write) on that file descriptor would succeed
immediately (i.e. not block), a callback is invoked.

This is the essence of asynchronous mechanisms, and yes - this can be,
and certainly is, used to construct multitasking layers.
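
To make that contract concrete, here is a minimal sketch of the same
idea built on Python's standard select module instead of libevent (the
class and method names - MiniLoop, notify_when_readable, dispatch_once
- are mine, not libevent's API):

import select

class MiniLoop(object):
    def __init__(self):
        self.read_callbacks = {}  # fd -> callback

    def notify_when_readable(self, fd, callback):
        # Register interest and return immediately - nothing blocks here.
        self.read_callbacks[fd] = callback

    def dispatch_once(self, timeout=0):
        # Ask the OS which of the registered fds are readable right now.
        ready, _, _ = select.select(list(self.read_callbacks), [], [], timeout)
        for fd in ready:
            # A read() on fd will now succeed without blocking.
            self.read_callbacks.pop(fd)(fd)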

Now, for your first question: there is no process spawned for the
callback. I simply say to libevent, "When this file descriptor is
ready for reading (or writing), call this callback." libevent sets up
the event and returns immediately, after which I do a receive on a
channel. This naturally blocks the tasklet in question.

After the FD becomes ready, the next time the dispatcher tasklet
runs, libevent will invoke the callback. The callback then simply
performs the I/O operation, knowing that it will not block, and sends
the result on the channel. This makes the tasklet that requested the
read runnable again, with the result of the I/O operation passed over
the channel.

All of this happens inside just one thread.
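
In code, the request side looks roughly like this - a sketch only,
reusing the MiniLoop object from above and the standard Stackless API
(channel, send, receive, schedule); the function names async_read and
dispatcher are mine:

import os
import stackless

def async_read(loop, fd, nbytes):
    ch = stackless.channel()

    def on_readable(fd):
        # Invoked from the dispatcher tasklet once fd is ready;
        # this read is guaranteed not to block.
        ch.send(os.read(fd, nbytes))  # makes the requester runnable again

    loop.notify_when_readable(fd, on_readable)  # returns immediately
    return ch.receive()  # blocks only the requesting tasklet

def dispatcher(loop):
    while True:
        loop.dispatch_once()   # fire callbacks for any ready fds
        stackless.schedule()   # hand control to the other tasklets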

> The half-busy loop (with the time.sleep(0.0001)) is not necessary if the
> blocking select is used when no tasklets can run.

Hmm. What if there are other I/O operations in the queue, waiting for
a notification? In other words, what happens in the following
scenario:

There are two tasklets running, A and B.

1. Tasklet A requests to read from file descriptor 1, performing a
non-blocking select (since tasklet B is also runnable).

2. Tasklet B then requests to read from file descriptor 2, now
performing a blocking select since there are no other runnable
tasklets.

Now, the process is blocked, waiting for FD 2 to become readable. But
it so happens that FD 2 is actually a network device and it won't
become readable until after several hundred milliseconds. FD 1 however
is a memory-mapped file and becomes readable within a few milliseconds
or less. The process will not be resumed until FD 2 becomes readable,
because that's the one we did the blocking select on.

To my mind, mixing blocking and non-blocking calls is generally not a
good idea - except that you may allow yourself one blocking operation,
namely the sleep(0.0001).
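
Concretely, the dispatcher loop I have in mind never blocks on any
single file descriptor; when nothing is runnable it polls all pending
events and yields the CPU with that one tiny sleep. A sketch, building
on the names above (run_dispatcher is mine):

import time
import stackless

def run_dispatcher(loop):
    while True:
        loop.dispatch_once(timeout=0)    # non-blocking poll of *all* fds
        if stackless.getruncount() > 1:  # anyone runnable besides us?
            stackless.schedule()
        else:
            time.sleep(0.0001)           # the one blocking call we allow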

> The 1 ms and 2.5 ms were determined by experiment (you just loop on a usleep
> or nanosleep of some tiny positive amount - this always waits one tick, at
> least in C/Linux). This is obviously motherboard-dependent, and the newer
> motherboard had the slower response. I suspect interrupt response in general
> is getting sluggish, and they are afraid of a pileup of event code chained
> to the timer tick.

I did the following experiment in a Python interpreter on an otherwise
busy machine, a MacBook running OS X 10.5.5:

>>> import time
>>> def timeit(f):
...     t0 = time.time()
...     f()
...     delta = time.time() - t0
...     print "Function executed in %.4f seconds" % delta
...
>>> def sleep10000():
...     i = 10000
...     while i > 0:
...         time.sleep(0.0001)
...         i -= 1
...
>>> timeit(sleep10000)
Function executed in 1.6180 seconds
>>>
>>> def sleep100000():
...     i = 100000
...     while i > 0:
...         time.sleep(0.00001)
...         i -= 1
...
>>> timeit(sleep100000)
Function executed in 1.9185 seconds

This shows that sleeping 10,000 times for 100 microseconds takes ~1.6
seconds, and sleeping 100,000 times for 10 microseconds takes ~1.9
seconds. I think the extra 0.6 and 0.9 seconds are not unreasonable
times for the overhead of decrementing i and doing the while-loop
test, for Python.
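
Spelling that arithmetic out (the requested sleep time is the same one
second in both runs):

nominal = 10000 * 0.0001               # 1.0 second of requested sleep
overhead = (1.6180 - nominal) / 10000
print overhead                         # ~6.2e-05, i.e. ~62 us per call

nominal = 100000 * 0.00001             # also 1.0 second
overhead = (1.9185 - nominal) / 100000
print overhead                         # ~9.2e-06, i.e. ~9 us per call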

In other words, sleeping for 10 microseconds works just fine. For fun,
let's try 1 microsecond:

>>> def sleep1000000():
...     i = 1000000
...     while i > 0:
...         time.sleep(0.000001)
...         i -= 1
...
>>> timeit(sleep1000000)
Function executed in 11.5436 seconds

Ah, that does not look right - indeed each call seems to cost far more
than 1 microsecond: a million iterations in 11.5 seconds is about 11.5
microseconds per call.

Did you possibly mean microseconds and not milliseconds when you cited
the 2.5 number? I am quite willing to believe you if that's the case
:)

cheers,
Arnar



