[Stackless] Stackless based replacement

Fri Oct 3 19:52:35 CEST 2008

Hi Arnar,

A few questions below...

On 10/3/08, Arnar Birgisson <arnarbi at gmail.com> wrote:
>
> Hi Larry,
>
> On Fri, Oct 3, 2008 at 17:45, Larry Dickson <ldickson at cuttedge.com> wrote:
> > Am I following io_operation right: the callback is an externally spawned (?)
> > process which blocks waiting on the real IO operation but does not take up a
> > slot in the Stackless round robin?
>
> No, see below.
>
> > And event.notify_when_ready_for_op returns instantly even if not ready?
>
> Yes, these always return immediately, hence the asynchrony.
>
> > If so, then libevent seems to introduce
> > a layer of multitasking that is not under the Stackless umbrella.
>
> Yes, in a way. libevent actually has nothing to do with Stackless; it
> is simply a library that wraps the asynchronous mechanisms of various
> platforms, presenting them as a unified interface and using the best
> underlying mechanism available. In essence, all that libevent provides
> is this:
>
> Set up a read (or write) event on a file descriptor, such that
> whenever a read (or write) on that file descriptor would succeed
> immediately (i.e. not block), a callback is invoked.
>
> This is the essence of asynchronous mechanisms, and yes, it can be,
> and certainly is, used to construct multitasking layers.
>
> Now, for your first question. There is no process spawned for the
> callback. I simply say to libevent: "When this file-descriptor is
> ready for reading (or writing), call this callback". Libevent sets up
> the event and returns immediately, after which I do a receive on a
> channel. This naturally blocks the tasklet in question.
>
> After the FD becomes ready, libevent will invoke the callback in the
> first round in which the dispatcher tasklet runs.

Two questions:

(1) Why does it wait for the first round in which the dispatcher tasklet
runs? Why doesn't the callback just get chained off the interrupt that
signals the readiness of the IO? The main tasklet got PAST the call to
event.notify_when_ready_for_op and is blocked on the receive, so is there
another tasklet that just keeps polling for readiness every time the round
robin comes back?

(2) If libevent has nothing to do with Stackless, how can its callback
perform a Stackless function (the channel send)?

> The callback then simply
> performs the I/O operation, knowing that it will not block, and sends
> the result on the channel. This makes the tasklet that requested the
> read runnable with the result of the IO operation passed over the
> channel.
>
> All of this happens inside just one thread.
>
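
The mechanism described above can be sketched roughly as follows. This is
only an illustration under loud assumptions: Python's standard selectors
module stands in for libevent, a plain queue stands in for a Stackless
channel, and the names Channel, notify_when_readable and dispatch_once are
made up for the example (they are not the API discussed in this thread).
It assumes a Unix-like system and Python 3.

```python
import os
import selectors
from collections import deque

class Channel:
    """Hypothetical stand-in for a Stackless channel: a queue of results."""
    def __init__(self):
        self.items = deque()

    def send(self, value):
        self.items.append(value)

sel = selectors.DefaultSelector()  # plays the role of libevent here

def notify_when_readable(fd, channel):
    """Register a one-shot read callback, then return immediately."""
    def callback():
        sel.unregister(fd)
        # fd was reported readable, so this read will not block
        channel.send(os.read(fd, 1024))
    sel.register(fd, selectors.EVENT_READ, callback)

def dispatch_once(timeout=0):
    """One dispatcher round: invoke callbacks for descriptors that are ready."""
    for key, _events in sel.select(timeout):
        key.data()

r, w = os.pipe()
ch = Channel()
notify_when_readable(r, ch)   # returns at once; nothing is ready yet
dispatch_once()               # no callback fires in this round
before = list(ch.items)
os.write(w, b"hello")         # now the descriptor becomes readable
dispatch_once()               # the callback runs inside this call
after = list(ch.items)
```

Nothing here runs outside the single thread: the callback is an ordinary
function call made from inside the dispatcher round, which is why it can
safely touch the channel.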
> > The half-busy loop (with the time.sleep(0.0001)) is not necessary if the
> > blocking select is used when no tasklets can run.
>
> Hmm. What if there are other I/O operations in the queue, waiting for
> a notification? In other words, what happens in the following
> scenario:
>
> There are two tasklets running, A and B.
>
> 1. Tasklet A requests to read from file descriptor 1, performing a
> non-blocking select (since tasklet B is also runnable).
>
> 2. Tasklet B then requests to read from file descriptor 2, now
> performing a blocking select since there are no other runnable
> tasklets.
>
> Now, the process is blocked, waiting for FD 2 to become readable. But
> it so happens that FD 2 is actually a network device and it won't
> become readable until after several hundred milliseconds. FD 1 however
> is a memory-mapped file and becomes readable within a few milliseconds
> or less. The process will not be resumed until FD 2 becomes readable,
> because that's the one we did the blocking select on.
>
> In my mind, mixing blocking and non-blocking is generally not a good
> idea - except you may allow yourself one blocking operation, namely
> sleep(0.0001).

A non-blocking select is just a blocking select that always happens to be
ready (because of the 0 timeout). The blocking select does not have a
timeout (unless that is explicitly coded), but it selects on ALL the
currently outstanding IOs. Note that a select does not actually do the IO,
but it returns when at least one of them is ready. In your case, the
blocking select on both FD1 and FD2 would be won by FD1, and Tasklet A would
wake up able to read the memory-mapped file immediately.
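
As a minimal sketch of that point (assuming a Unix-like system and Python 3,
with two os.pipe() descriptors standing in for FD 1 and FD 2; the helper
name wait_for_any is made up for illustration), a single select with no
timeout over every outstanding descriptor returns as soon as the first one
is ready:

```python
import os
import select

def wait_for_any(fds, timeout=None):
    """Return the subset of fds that are readable.

    timeout=None blocks until at least one is ready; timeout=0 only polls.
    """
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable

# Hypothetical stand-ins: fd1 is the fast descriptor, fd2 the slow one.
fd1, w1 = os.pipe()
fd2, w2 = os.pipe()

polled = wait_for_any([fd1, fd2], timeout=0)   # nothing ready yet

os.write(w1, b"fast")                          # only fd1 becomes readable

ready = wait_for_any([fd1, fd2])               # blocking form, woken by fd1
data = os.read(ready[0], 4)                    # the read itself cannot block
```

The blocking call is not tied to the descriptor that was requested last; it
waits on the whole list, so the slow descriptor cannot hold up the fast one.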

> > The 1 ms and 2.5 ms were determined by experiment (you just loop on a usleep
> > or nanosleep of some tiny positive amount - this always waits one tick, at
> > least in C/Linux). This is obviously motherboard-dependent, and the newer
> > motherboard had the slower response. I suspect interrupt response in general
> > is getting sluggish, and they are afraid of a pileup of event code chained
> > to the timer tick.
>
> I did the following experiment in a Python interpreter on an otherwise
> busy machine, a MacBook running OS X 10.5.5:
>
> >>> import time
> >>> def timeit(f):
> ...     t0 = time.time()
> ...     f()
> ...     delta = time.time() - t0
> ...     print "Function executed in %.4f seconds" % delta
> ...
> >>> def sleep10000():
> ...     i = 10000
> ...     while i > 0:
> ...         time.sleep(0.0001)
> ...         i -= 1
> ...
> >>> timeit(sleep10000)
> Function executed in 1.6180 seconds
> >>>
> >>> def sleep100000():
> ...     i = 100000
> ...     while i > 0:
> ...         time.sleep(0.00001)
> ...         i -= 1
> ...
> >>> timeit(sleep100000)
> Function executed in 1.9185 seconds
>
> This shows that sleeping 10,000 times for 100 microseconds takes ~1.6
> seconds, and sleeping 100,000 times for 10 microseconds takes ~1.9
> seconds. In both cases the requested sleep adds up to 1 second, and I
> think the extra 0.6 and 0.9 seconds are not unreasonable times for the
> overhead of decrementing i and doing the while-loop test, for Python.
>
> In other words, sleeping for 10 microseconds works just fine. For fun,
> let's try 1 microsecond:
>
> >>> def sleep1000000():
> ...     i = 1000000
> ...     while i > 0:
> ...         time.sleep(0.000001)
> ...         i -= 1
> ...
> >>> timeit(sleep1000000)
> Function executed in 11.5436 seconds
>
> Ah, that does not look right - indeed it might be sleeping for longer
> than 1 microsecond.
>
> Did you possibly mean microseconds and not milliseconds when you cited
> the 2.5 number? I am quite willing to believe you if that's the case
> :)

No, I mean milliseconds. It would seem that either your time.sleep is using
some other mechanism than a timer interrupt, or you have faster timer ticks
(10 usec?) than my motherboard. You might want to try C/Linux nanosleep or
usleep and see if you get the same result. And/or do a longer Stackless test
and watch CPU usage, just in case Stackless is using a busy loop.
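
One way to run such a check from Python is sketched below. It is written
for Python 3 (time.perf_counter and time.process_time do not exist in the
2.x interpreter used above), and the function name measure_sleep is made up
for illustration. It reports the average wall-clock time per sleep call and
the fraction of that time spent on the CPU.

```python
import time

def measure_sleep(requested, count):
    """Return (mean wall-clock seconds per sleep, CPU time / wall time)."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    for _ in range(count):
        time.sleep(requested)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return wall / count, cpu / wall

for requested in (1e-4, 1e-5, 1e-6):
    per_call, cpu_fraction = measure_sleep(requested, 100)
    print("requested %.6f s, got %.6f s per call, CPU fraction %.2f"
          % (requested, per_call, cpu_fraction))
```

A per-call time far above the requested value would point to timer-tick
granularity, while a CPU fraction close to 1 would suggest the sleep is
really a busy loop.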

Larry

> cheers,
> Arnar
>