[Stackless] Interthread channel bug

Christian Tismer tismer at stackless.com
Tue Jan 15 04:47:49 CET 2008


Hi Richard,

> I have been looking into an old interthread channel usage bug which
> Paul Sijben kindly provided a reproduction case for back in July, with
> the aid of Kristjan.  Kristjan, please step in and correct me or add
> whatever you think is necessary if I have missed something.
> 
> Here is the sequence of events which is happening:

Ok. I will try to insert comments about the desired behavior.
I'm sure I overlooked something, but I remember what I thought
were valid assumptions. Maybe these need to be made valid.

> 1. There are two threads, a client thread and a server thread, each
> sharing the same channel, using it for interthread communication
> between tasklets.

Yes. The idea was to let stackless handle all necessary
locking, so the user can simply send across threads.

> 2. The server thread sends something through the channel and what gets
> awoken to receive that something is the tasklet on the client thread.
> 
> 3. channelobject.c:generic_channel_action() is reached, and the
> following code is invoked.  This places the awakened tasklet in the
> scheduler of the thread it belongs to.
> 
> 		if (interthread) {
> 			/* interthread, always keep target! */
> 			slp_current_insert(target);
> 		}

What I wanted is that the target tasklet gets activated like in
a normal channel.send (with mode==1).
I think I assumed that the target's thread was locked as well
and it would be safe to modify its tasklet chain.
This is wrong, of course.

> 4. channelobject.c:generic_channel_action() then calls:
> 
> 	retval = slp_schedule_task(source, target, stackless);
> 
> 5. scheduling.c:schedule_task_unblock() is reached.  This proceeds to
> do some interthread locking in order to get the other thread to
> schedule the awakened tasklet in different circumstances.  Part of
> this is releasing the GIL while it does so.

waah:

> 6. In the case where the other thread is not locked, because we have
> already scheduled the awakened tasklet, it can run in the meantime on
> the other thread when that thread acquires the GIL.  And if the tasklet
> exits after it blocked on the channel, we can end up with a dead,
> garbage-collected tasklet to which our reference is no longer valid.

Ok, and then we mess up.
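
To make steps 3 to 6 concrete, here is a small Python model of the
failure. It is only an analogy and not the Stackless C code: Tasklet,
run_queue and simulate() are made-up names, a weakref stands in for
the raw pointer the sender keeps, and a threading.Lock stands in for
the GIL. It relies on CPython's reference counting to free the object
immediately.

```python
import threading
import weakref


class Tasklet:
    """Hypothetical stand-in for a tasklet object."""

    def __init__(self, name):
        self.name = name


def simulate(hold_strong_ref):
    """Return True if the sender's view of the target is still valid."""
    gil = threading.Lock()   # stand-in for the GIL
    run_queue = []           # stand-in for the client thread's runnable chain

    def other_thread():
        # The client thread runs as soon as it can take the "GIL".
        with gil:
            t = run_queue.pop()   # the tasklet runs ...
            del t                 # ... and exits, dropping this reference

    gil.acquire()
    target = Tasklet("receiver")
    borrowed = weakref.ref(target)   # models the sender's raw pointer
    run_queue.append(target)         # step 3: insert into the other scheduler
    keep = target if hold_strong_ref else None   # optional owned reference
    del target

    worker = threading.Thread(target=other_thread)
    worker.start()
    gil.release()    # step 5: the "GIL" is released during the locking dance
    worker.join()    # step 6: the other thread ran in the meantime
    gil.acquire()
    still_valid = borrowed() is not None
    gil.release()
    return still_valid
```

Under these assumptions, simulate(False) finds the target gone after
the window, while simulate(True) keeps it alive because the sender
owns a reference across the window.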

> Any thoughts on what would be the correct fix for this?  My guess is
> we should avoid scheduling the tasklet into the other thread, then
> unlocking the GIL and then proceeding to operate on it when it is
> unknown what state the other thread would have put it into.

I am thinking of locking the other thread while doing the transaction.
The target tasklet should be inserted while the target thread is
locked, so that it cannot run before the channel action is done.
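
A rough Python sketch of that idea, assuming each thread's scheduler
had its own lock. ThreadScheduler, send_across_threads() and demo()
are hypothetical names for illustration. Nothing like this exists in
Stackless under these names, and the real fix would have to live in
the C code.

```python
import threading


class ThreadScheduler:
    """Hypothetical per-thread scheduler guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.runnable = []

    def run_next(self, log):
        # The owning thread must take the lock before it may run anything.
        with self.lock:
            if self.runnable:
                log.append("run " + self.runnable.pop(0))


def send_across_threads(target_sched, name, log, after_insert=lambda: None):
    # Hold the target scheduler's lock for the whole transaction.
    with target_sched.lock:
        target_sched.runnable.append(name)   # insert the target tasklet
        log.append("inserted " + name)
        after_insert()                       # the other thread may wake up now
        log.append("channel action done")    # but it cannot run before this


def demo():
    log = []
    sched = ThreadScheduler()
    inserted = threading.Event()

    def client():
        inserted.wait()       # wake as soon as the tasklet is queued
        sched.run_next(log)   # blocks until the sender releases the lock

    worker = threading.Thread(target=client)
    worker.start()
    send_across_threads(sched, "receiver", log, after_insert=inserted.set)
    worker.join()
    return log
```

In this model the target is always inserted, so it is guaranteed to
run, but only after the sender has finished the channel action.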

> So I would lean towards changing
> channelobject.c:generic_channel_action() to not insert the awakened
> tasklet into its thread's scheduler, and to free the reference we have
> to it after the slp_schedule_task() call.  Of course, if it is
> possible that a tasklet known to be blocked on another channel will
> always have another reference perhaps as a local variable, we might be
> able to free it sooner.
> 
> Of course this might be the completely wrong fix :-)  If you can give
> me your insight on what the correct course to proceed with this is, I
> would appreciate it.

Avoiding the insert fixes the crash, but it does not give the
expected behavior (prioritizing the target tasklet).
I'm not sure if this is really needed, but it is somewhat annoying
that this cannot be controlled.

Maybe I should look again at the locking mechanism. At the time of the
channel action, both threads should be in a defined state where
only one of them can run. The other one should be blocked in a way
that lets us safely modify things.

Scheduling the awakened tasklet should not happen before we are
ready, of course. My fault.

I think the target should be inserted, obeying the channel's
flags. Not inserting at all is probably wrong. Who guarantees that
the target gets run at all?

Any idea how to do it right?

cheers - chris
-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/



