[Stackless] LLVM coroutine support [was: [LLVMdev] Proposal: stack/context switching within a thread]

OvermindDL1 overminddl1 at gmail.com
Sat Apr 10 01:38:02 CEST 2010


2010/4/9 Kristján Valur Jónsson <kristjan at ccpgames.com>:
> Hello Jeffrey.
> This is very interesting.  I am not familiar with the C apis that this appears to emulate.  The concept of a "linked" context is novel to me.
>
> One thing that is not immediately apparent to me is whether this system supports "stack slicing".
> I see the "makecontext" call.  This has some problems:
> 1) You have to know beforehand how much stack you will use
> 2) You therefore have to allocate conservatively.
> 3) This generous amount of preallocated memory, which remains fixed for the lifetime of the context, causes memory fragmentation, similar to what you see with regular threads.  This limits the number of contexts that can be alive, bounded by virtual memory in the same way the number of threads is.
>
> In stackless python, we use "stack slicing".  If you are not familiar with the concept, it involves always using the same C stack, which can therefore grow freely, and storing the "active" part of the stack away into heap memory when contexts are switched.  An inactive context therefore has not only cpu registers associated with it, but also a slice of the stack (as little as required to represent that particular context) tucked away into a heap memory block.
>
> It is unclear to me if a context created by "getcontext" could be used as a base point for stack slicing.
> Could one create such a base point (ctxt A), then descend deeper into the stack, and then "swapcontext" from a new context B back to the previous point on the stack?  Would the stack data between points A and B on the stack be "tucked away", to be restored when returning to context B?
>
> When doing stack slicing, one has to define a "base", and in the example above, context A would be the base, and all other contexts would have to be from deeper on the stack.  I don't see a provision for identifying such a base.  And indeed, if some of the contexts come from a separate "makecontext" area, they would have a different base.
>
> If stack slicing is not supported, as I suspect, it would be relatively simple to add it by allowing a "base context" to be specified to "getcontext" and "swapcontext"; this would serve as the base point on the stack, and the memory between it and the current stack position would need to be saved.  The contexts thus generated would have an associated stack slice with them.
>
> Adding such a "base context" argument to getcontext() and swapcontext() would enable us to build current stackless behaviour on top of such an API.
>
>
> Just to add weight to my argument:  My company runs internet servers using stackless python, each of which runs a single stackless python process handling 30,000 TCP connections.  Each connection's stack, when not active, is tucked away in a tight malloc'd block.  Having 30,000 preallocated, relatively large stacks present at fixed positions in virtual memory for the duration of the process would be impossible in 32 bits.
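
To make the quoted makecontext concern concrete for anyone who has not
used the C API being emulated, a minimal ucontext sketch might look
like the following (STACK_SIZE here is an arbitrary value I picked; the
point is that *some* fixed size must be chosen up front):

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* must be guessed up front: points (1) and (2) */

static ucontext_t main_ctx, task_ctx;

static void task(void)
{
    printf("in task\n");
    swapcontext(&task_ctx, &main_ctx);   /* yield back to main */
    printf("task resumed\n");
}

int main(void)
{
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp   = malloc(STACK_SIZE);  /* fixed for the context's lifetime: point (3) */
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link          = &main_ctx;           /* the "linked" context */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);   /* run task until it yields */
    swapcontext(&main_ctx, &task_ctx);   /* resume it; uc_link brings us back when task() returns */
    free(task_ctx.uc_stack.ss_sp);
    return 0;
}

Nothing here ever copies part of the stack to the heap; each context
owns its whole fixed-size stack for its whole lifetime, which is
exactly what makes 30,000 of them painful in a 32-bit address space.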

I was under the impression that Stackless Python had two forms of
switching, hard and soft.

Hard - As you described: it mallocs memory and copies a chunk of
the stack and registers, just enough to represent the execution
context.  It only uses hard switching when a C frame is on the stack.

Soft - Every Python function is already fully enclosed and does not
use the C stack at all, so it just switches Python frames.  It can
only do this if *everything* on the stack is *only* Python frames, no
C frames at all; this method, however, is a *great* deal faster than
hard switching.
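
Purely as an illustration (these are not Stackless' actual structures,
just my guess at the shape of the saved state), the difference is
roughly:

#include <stddef.h>

struct py_frame;                 /* opaque: a chain of Python frame objects */

struct saved_cstack {            /* hard switch: a slice of the C stack copied to the heap */
    void   *stack_copy;          /* malloc'd buffer holding the active slice */
    size_t  size;                /* only as many bytes as that slice needs */
    void   *original_sp;         /* where it came from, so it can be copied back */
};

struct soft_state {              /* soft switch: no C-level state at all */
    struct py_frame *top_frame;  /* just swap which frame chain the interpreter runs */
};

struct tasklet_state {
    int is_hard;                 /* which representation this suspended tasklet uses */
    union {
        struct saved_cstack hard;   /* needed only when C frames are on the stack */
        struct soft_state   soft;   /* possible only when the stack holds pure Python frames */
    } u;
};

The soft case is why it is so much faster: switching is just a pointer
swap, with no copying of stack bytes at all.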

Consequently, I see speed issues in the proposed implementation.  I
already use LLVM for a stackless-python'ish switching language, but I
break things up into separate function calls so that everything is
tail called, which seems to mitigate the main problems.  Any 'stack'
variables that need to exist across calls are stored on a 'stack'
that is passed from function to function for a thread of execution.
That fixes the speed issue of the makecontext-style tasks, but it
requires a pass over your code to properly translate it into pure
tail calls (at least any calls that can 'halt' need to be made into
proper tail calls).
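
A rough sketch of that shape (hypothetical names; my real generated
code is LLVM IR, not C): every step keeps its live variables in an
explicit frame and "returns" its continuation, so a trampoline can
drive it with nothing growing on the C stack:

#include <stdio.h>
#include <stdlib.h>

struct frame;
typedef struct frame *(*step_fn)(struct frame *);

struct frame {
    step_fn next;        /* continuation: which step to run when resumed */
    int     counter;     /* a 'stack' variable that survives across switch points */
};

static struct frame *count_step(struct frame *f)
{
    if (f->counter >= 3) {
        f->next = NULL;             /* task finished */
        return f;
    }
    printf("step %d\n", f->counter);
    f->counter++;
    f->next = count_step;           /* where to continue; a switch could happen here */
    return f;
}

int main(void)
{
    struct frame *f = malloc(sizeof *f);
    f->next = count_step;
    f->counter = 0;

    /* the trampoline: each iteration is effectively a tail call into the
     * task, so the C stack never grows between steps */
    while (f->next)
        f = f->next(f);

    free(f);
    return 0;
}

Between any two iterations of that loop a scheduler is free to park
the frame and run a different task's frame instead, which is what
replaces swapcontext in this style; the cost is the extra pass that
rewrites calls which can 'halt' into this form.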


