[Stackless] Interthread channel bug

Richard Tew richard.m.tew at gmail.com
Thu Dec 6 09:30:15 CET 2007


Hi Christian,

I have been looking into an old interthread channel usage bug which
Paul Sijben kindly provided a reproduction case for back in July, with
the aid of Kristjan.  Kristjan, please step in and correct me or add
whatever you think is necessary if I have missed something.

Here is the sequence of events:

1. There are two threads, a client thread and a server thread, each
sharing the same channel, using it for interthread communication
between tasklets.

2. The server thread sends something through the channel, and the
tasklet that is awakened to receive it is the one on the client thread.

3. channelobject.c:generic_channel_action() is reached and the
following code is invoked, which places the awakened tasklet in the
scheduler of the thread it belongs to:

		if (interthread) {
			/* interthread, always keep target! */
			slp_current_insert(target);
		}

4. channelobject.c:generic_channel_action() then calls:

	retval = slp_schedule_task(source, target, stackless);

5. scheduling.c:schedule_task_unblock() is reached.  This does some
interthread locking so that, depending on the circumstances, the other
thread schedules the awakened tasklet.  Part of this is releasing the
GIL while it does so.

6. If the other thread is not locked then, because we have already
scheduled the awakened tasklet, that tasklet can run on the other
thread in the meantime, as soon as that thread acquires the GIL.  If
the tasklet, which had been blocked on the channel, then exits, we can
end up with a dead and garbage-collected tasklet, and our reference to
it is no longer valid.

7. We reach the following piece of code in
scheduling.c:schedule_task_unblock():

	else {
		PR("unlocker: other is NOT LOCKED or dead");
		if (next->flags.blocked) {
			/* unblock from channel */

8. next is a pointer to a now garbage-collected tasklet, and an access
violation occurs when "flags" is accessed.
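
The sequence above can be replayed with a toy, single-threaded sketch.
Nothing in it is real Stackless code: toy_tasklet, toy_decref and
other_thread_runs_tasklet are hypothetical stand-ins, the GIL release is
modelled as a plain function call, and the sketch assumes the only
reference to the tasklet is the one handed to the other thread's
scheduler in step 3.  The object is only marked as freed rather than
deallocated, so the sketch can report the problem without touching
released memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tasklet; not the real Stackless struct. */
typedef struct {
    int  refcount;  /* number of owners */
    bool blocked;   /* models flags.blocked */
    bool freed;     /* set where a real object would be deallocated */
} toy_tasklet;

/* Drop one reference; mark the object freed when the last one goes. */
static void toy_decref(toy_tasklet *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        t->freed = true;
}

/* Models step 6: once the sender lets go of the GIL, the client thread
 * runs the already-scheduled tasklet to completion and its scheduler
 * drops the only reference (an assumption of this model). */
static void other_thread_runs_tasklet(toy_tasklet *t)
{
    t->blocked = false;
    toy_decref(t);
}

/* Replays steps 3 to 8 in order.  Returns true if the sender would be
 * reading "flags" from an object that has already been freed. */
bool sender_reads_freed_tasklet(void)
{
    toy_tasklet  tasklet = { 1, true, false };
    toy_tasklet *next = &tasklet;  /* sender's pointer, no reference of its own */

    /* step 3: the tasklet is already in the other thread's scheduler */
    /* step 5: the GIL is released, so the other thread gets to run */
    other_thread_runs_tasklet(next);

    /* step 7: the real code would now evaluate next->flags.blocked */
    return next->freed;
}
```

In this model sender_reads_freed_tasklet() returns true, which
corresponds to the access violation in step 8.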

Any thoughts on what the correct fix for this would be?  My guess is
that we should avoid scheduling the tasklet into the other thread,
unlocking the GIL, and then continuing to operate on the tasklet when
we cannot know what state the other thread has put it into.

So I would lean towards changing
channelobject.c:generic_channel_action() so that it does not insert the
awakened tasklet into its thread's scheduler, and frees the reference
we have to it only after the slp_schedule_task() call.  Of course, if
a tasklet known to be blocked on another channel will always have
another reference, perhaps as a local variable, we might be able to
free it sooner.
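
As a rough illustration of that ordering, here is a toy, single-threaded
sketch.  It is not the real Stackless code: toy_tasklet,
toy_schedule_task and the other names are hypothetical, and the sketch
assumes that the scheduling path takes its own reference before handing
the tasklet to the other thread, which the real slp_schedule_task() may
or may not do.  The point is only that the sender keeps its reference
across the window in which the GIL is released and drops it afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tasklet; not the real Stackless struct. */
typedef struct {
    int  refcount;  /* number of owners */
    bool blocked;   /* models flags.blocked */
    bool freed;     /* set where a real object would be deallocated */
} toy_tasklet;

static void toy_incref(toy_tasklet *t)
{
    assert(!t->freed);
    t->refcount++;
}

static void toy_decref(toy_tasklet *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        t->freed = true;
}

/* The other thread runs the tasklet to completion and its scheduler
 * drops the reference it was given. */
static void other_thread_runs_tasklet(toy_tasklet *t)
{
    t->blocked = false;
    toy_decref(t);
}

/* Stand-in for the scheduling call.  It gives the other thread's
 * scheduler its own reference (an assumption of this sketch), lets that
 * thread run, and then reports whether "next" is still safe to inspect. */
static bool toy_schedule_task(toy_tasklet *next)
{
    toy_incref(next);
    other_thread_runs_tasklet(next);  /* window in which the GIL is released */
    return !next->freed;
}

/* Proposed order: no early insert, and the sender releases its own
 * reference only after the scheduling call has returned. */
bool proposed_order_is_safe(void)
{
    toy_tasklet  tasklet = { 1, true, false };  /* sender holds one reference */
    toy_tasklet *target = &tasklet;

    bool still_valid = toy_schedule_task(target);
    toy_decref(target);                         /* release after the call */

    /* Safe during the call, and still cleaned up once we let go. */
    return still_valid && tasklet.freed;
}
```

Here the tasklet stays alive while the sender inspects it and is only
freed once the sender drops its reference, so nothing is leaked either.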

Of course this might be completely the wrong fix :-)  If you can give
me your insight into the correct way to proceed with this, I would
appreciate it.

Thanks,
Richard.

Here, for reference, is the call stack from VS.NET 2003.  It is the
same as one of the other call stacks which Paul Sijben has provided to
Kristjan and myself:

 	python25_d.dll!slp_channel_remove_slow(_tasklet * task=0x00b7d038)  Line 109 + 0x3	C
 	python25_d.dll!schedule_task_unblock(_tasklet * prev=0x00b7d248, _tasklet * next=0x00b7d038, int stackless=128)  Line 757 + 0x9	C
>	python25_d.dll!slp_schedule_task(_tasklet * prev=0x00b7d248, _tasklet * next=0x00b7d038, int stackless=128)  Line 793 + 0x11	C
 	python25_d.dll!generic_channel_action(_channel * self=0x00b7a980, _object * arg=0x00b79bb8, int dir=1, int stackless=128)  Line 468 + 0x11	C
 	python25_d.dll!impl_channel_send(_channel * self=0x00b7a980, _object * arg=0x00b79bb8)  Line 480 + 0x13	C
 	python25_d.dll!channel_send(_object * myself=0x00b7a980, _object * arg=0x00b79bb8)  Line 491 + 0xd	C
 	python25_d.dll!call_function(_object * * * pp_stack=0x0134adb8, int oparg=12036480)  Line 3879 + 0xcf	C
 	python25_d.dll!PyEval_EvalFrame_value(_frame * f=0x00b97820, int throwflag=0, _object * retval=0x1e2d0cec)  Line 2492	C
 	python25_d.dll!PyEval_EvalFrameEx_slp(_frame * f=0x00b97820, int throwflag=0, _object * retval=0x1e2d0cec)  Line 782 + 0x15	C
 	python25_d.dll!PyEval_EvalFrame_value(_frame * f=0x00b99120, int throwflag=0, _object * retval=0x1e2d0cec)  Line 2860	C
 	python25_d.dll!PyEval_EvalFrameEx_slp(_frame * f=0x00b99120, int throwflag=0, _object * retval=0x1e2d0cec)  Line 782 + 0x15	C
 	python25_d.dll!PyEval_EvalFrame_value(_frame * f=0x00b96288, int throwflag=0, _object * retval=0x1e2d0cec)  Line 2860	C
 	python25_d.dll!PyEval_EvalFrameEx_slp(_frame * f=0x00b96288, int throwflag=0, _object * retval=0x1e2d0cec)  Line 782 + 0x15	C
 	python25_d.dll!slp_eval_frame_newstack(_frame * f=0x00b96288, int exc=0, _object * retval=0x1e2d0cec)  Line 415 + 0x11	C
 	python25_d.dll!PyEval_EvalFrameEx_slp(_frame * f=0x00b96288, int throwflag=0, _object * retval=0x1e2d0cec)  Line 782	C
 	python25_d.dll!slp_frame_dispatch_top(_object * retval=0x1e2d0cec)  Line 692 + 0x10	C
 	python25_d.dll!slp_run_tasklet()  Line 1157 + 0x9	C
 	python25_d.dll!slp_eval_frame(_frame * f=0x00b96288)  Line 299 + 0x5	C
 	python25_d.dll!climb_stack_and_eval_frame(_frame * f=0x00b96288)  Line 266 + 0x9	C
 	python25_d.dll!slp_eval_frame(_frame * f=0x00b96288)  Line 294 + 0x9	C
 	python25_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00b62e08, _object * globals=0x00b96288, _object * locals=0x00000000, _object * * args=0x00b6f2ec, int argcount=1, _object * * kws=0x00000000, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000)  Line 3137 + 0x6	C
 	python25_d.dll!function_call(_object * func=0x00b7b610, _object * arg=0x00b6f2d8, _object * kw=0x00000000)  Line 525 + 0x40	C
 	python25_d.dll!PyObject_Call(_object * func=0x00b7b610, _object * arg=0x00b6f2d8, _object * kw=0x00000000)  Line 1863 + 0x3c	C
 	python25_d.dll!instancemethod_call(_object * func=0x00b7b610, _object * arg=0x00b6f2d8, _object * kw=0x00000000)  Line 2516 + 0x11	C
 	python25_d.dll!PyObject_Call(_object * func=0x00b2e0f8, _object * arg=0x00a01038, _object * kw=0x00000000)  Line 1863 + 0x3c	C
 	python25_d.dll!PyEval_CallObjectWithKeywords(_object * func=0x00b2e0f8, _object * arg=0x00a01038, _object * kw=0x00000000)  Line 3757 + 0x11	C
 	python25_d.dll!t_bootstrap(void * boot_raw=0x00a07788)  Line 425 + 0x1a	C
 	python25_d.dll!bootstrap(void * call=0x0021a128)  Line 179 + 0x7	C
 	msvcr71d.dll!_threadstart(void * ptd=0x00a47f50)  Line 196 + 0xd	C



