[Stackless] a stackless crash

Kristján Valur Jónsson kristjan at ccpgames.com
Fri Nov 2 14:59:58 CET 2007


I just want to clarify a few points.
In 2), the channel is being destroyed.  It has a tasklet with a cstack sleeping on it.  We have the following call stack:
        slp_channel_remove()
        channel_clear()
        slp_resurrect_and_kill()
        channel_dealloc()

slp_channel_remove() unlinks the tasklet, so that afterwards its prev and next pointers are NULL.  In channel_clear(), this tasklet is then just decrefed and the code "hopes that it goes away" (channelobject.c:29).
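
To make the unlinked state concrete, here is a minimal Python sketch of it.  It is not the Stackless C code: Node, make_channel_head, link_after, unlink and unlink_if_linked are hypothetical names made up for illustration, and the model only assumes a circular doubly linked chain in which a removed node ends up with both pointers cleared.

```python
class Node:
    """Toy stand-in for a tasklet or channel head in a circular chain."""

    def __init__(self, name):
        self.name = name
        self.prev = None  # None plays the role of a NULL pointer
        self.next = None


def make_channel_head(name):
    """Create a chain head that points at itself (an empty chain)."""
    head = Node(name)
    head.prev = head.next = head
    return head


def link_after(anchor, node):
    """Insert node right after anchor in the circular chain."""
    node.prev = anchor
    node.next = anchor.next
    anchor.next.prev = node
    anchor.next = node


def unlink(node):
    """Unguarded unlink: assumes node is linked, fails on None pointers."""
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = node.next = None


def unlink_if_linked(node):
    """Guarded unlink: skip nodes that are not part of any chain."""
    if node.prev is None or node.next is None:
        return False
    unlink(node)
    return True
```

In this model, calling unlink() a second time on the same node raises an AttributeError, which stands in for the NULL dereference; unlink_if_linked() simply returns False in that case.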

But now we have two problems:
1) PyStackless_kill_tasks_with_stacks() assumes that all cstacks have tasklets that are linked in a chain, and
2) we have lost a reference somewhere, as I described.
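
The second problem can be pictured with a toy reference count model in Python.  This is only a sketch under the assumption stated in the quoted message below, namely that the kill call ends with a decref of the tasklet.  Obj, incref, decref, kill and the two callers are hypothetical names, not Stackless or CPython APIs.

```python
class Obj:
    """Toy object with a manual reference count (not CPython's real one)."""

    def __init__(self):
        self.refcnt = 1      # one reference, held by the original owner
        self.freed = False


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    if obj.freed:
        # Stands in for the crash seen when a freed object is touched again.
        raise RuntimeError("decref on an already freed object")
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True


def kill(obj):
    """Assumed behaviour: the kill call drops one reference when it is done."""
    decref(obj)


def kill_borrowed(obj):
    """Caller holds no reference of its own, so kill eats the owner's."""
    kill(obj)


def kill_with_own_ref(obj):
    """Caller takes its own reference first; kill consumes that one."""
    incref(obj)
    kill(obj)
```

With kill_borrowed(), the object is freed while its original owner still believes it holds a reference, so the owner's later decref fails.  With kill_with_own_ref(), the count is back at 1 after the call and the owner's decref frees the object normally.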


Now, I can easily reproduce this in release EVE code, but not in vanilla .py code, as I explained, because I can never get a C stack tasklet to sleep on a channel without increasing the channel's reference count.

But I think I can do it with a simple .pyd module.  I will try and see if I manage that.

Cheers,

Kristján



-----Original Message-----
From: Christian Tismer [mailto:tismer at stackless.com]
Sent: Monday, October 29, 2007 01:48
To: Kristján Valur Jónsson
Cc: Stackless mailing list
Subject: Re: [Stackless] a stackless crash

Kristján Valur Jónsson wrote:
> Ok, I managed to analyze this a bit better during my flight from Boston
> to SF.

> 1)      A tasklet with a C stack is sleeping in blue.pyos.Yield.  It
> does a receive on a channel.  It does not increment any references to
> that channel.

Yes. I did not add references for the channels. No idea why,
but I tried to treat this like a weak-ref.

> 2)      We start shutting down and as part of that, we release the
> channel.  It gets destroyed, and in the process it unlinks the sleeping
> tasklet, which is now unlinked (with next and prev pointers being NULL).

Sounds bad. next and prev should point at the channel, prior to
destruction. If they don't, it is an error.

> 3)      We enter PyStackless_kill_tasks_with_stacks.  It finds the
> tasklet and tries to unlink it in order to relink it.  But it is nowhere
> linked so the unlink crashes.

Sounds bad, bad, bad.

> Now, I tried fixing it by checking for NULL in step 3 and skipping the
> unlink.  But I then got a crash, because a subsequent PyTasklet_Kill()
> call effectively steals the reference to the tasklet (it does a decref
> after killing it) and this causes a crash during GC.
>
> Increfing the tasklet before calling KILL fixes that.
>
> Now, here are some questions, mostly for our chief Guru:
>
> 1)      Is this scenario supposed to be possible?  I tried reproing it
> with pure python but I found no way to let a tasklet sleep on a channel
> without increfing the channel.  See the attached .py file.  On the other
> hand, the code that deletes a channel does unlink any sleeping tasklets,
> so it would appear that the design is intended to cope with that.  At
> any rate, we probably don't want a sleeping tasklet to hold a reference
> to a channel since then we have a reference cycle.

I need to have a look at it.
Maybe it is of course insane to sleep on a channel without incref.
On the other hand, what about reference cycles?
Do you still refuse using gc???

> 2)      Assuming it is possible, and the only way to trigger it is
> within the context of eve, are my bugfixes correct?  I.e. checking for
> NULL and not unlinking, and increfing before the PyTasklet_Kill?

I have to have a closer look. Please give me some time, I'm just busy
with the Leopard update :-)

> Also, any help in reproing the case in pure python would be appreciated.
>
> N.b. I get the expected behavior (no incref in sleep) if I run in
> softswitching mode.  Only then the bug isn't encountered because only
> tasklets with cstacks matter here.

Well, softswitching is the real thing, but what should I say.
Hard switching will not die, not even in PyPy!

cheers - chris
--
Christian Tismer             :^)   <mailto:tismer at stackless.com>
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/



