[Stackless] LLVM coroutine support [was: [LLVMdev] Proposal: stack/context switching within a thread]

Wed Apr 14 03:03:34 CEST 2010

Kristján Valur Jónsson <kristjan at ...> writes:

> 
> Hello Jeffrey.
> This is very interesting.  I am not familiar with the C apis
> that this appears to emulate.  The concept of a
> "linked" context is novel to me.
> 
> One thing that is not immediately apparent to me is whether this
> system supports "stack slicing".
> I see the "makecontext" call.  This has some problems:
> 1) You have to know beforehand how much stack you will use
> 2) You therefore have to allocate conservatively.
> 3) This generous amount of preallocated memory that remains fixed
> for the lifetime of the context causes memory fragmentation, similar
> to what you see with regular threads.  This limits the number of
> contexts that can be alive by virtual memory in the same way as the
> number of threads is limited.
> 
> In stackless python, we use "stack slicing".  If you are not
> familiar with the concept, it involves always using the same C
> stack, which therefore can grow, and storing the "active" part of
> the stack away into heap memory when contexts are switched.  An
> inactive context therefore has not only cpu registers associated
> with it, but also a slice of the stack (as little as required to
> represent that particular context) tucked away into a heap memory
> block.
> 
> It is unclear to me if a context created by "getcontext" could be
> used as a base point for stack slicing.  Could one create such a
> base point (ctxt A), and then descend deeper into the stack, then
> "swapcontext" from a new context B back to the previous point on the
> stack?  Will the stack data between points A and B on the
> stack be "tucked away", to be restored when returning to context B?
> When doing stack slicing, one has to define a "base", and in the
> example above, context A would be the base, and all other contexts
> would have to be from deeper on the stack.  I don't see a provision
> for identifying such a base.  And indeed, if some of the contexts
> come from a separate "makecontext" area, they would have a different
> base.
> 
> If stack slicing is not supported, as I suspect, it would be
> relatively simple to add it by being able to specify a "base
> context" to "getcontext" and "swapcontext", which would serve as
> the base point on the stack; the memory between it and the current
> stack position would need to be saved.  The contexts thus
> generated would have an associated stack slice with them.  Adding
> such a "base context" argument to getcontext() and swapcontext()
> would enable us to build current stackless behaviour on top of such
> an API.

Thanks for the detailed analysis. I forwarded it to the author of
the proposal, and we discussed it some in the thread at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030887.html. He's
uploaded a new version to
http://code.google.com/p/llvm-stack-switch/wiki/Proposal,
although it doesn't yet include all of the comments I asked for.

To try to answer some of your questions above:

The C functions are documented, though not very well, at
http://www.opengroup.org/onlinepubs/009695399/functions/makecontext.html.
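
For readers who have not used these calls, here is a minimal sketch of
the POSIX API itself (not of the LLVM proposal), assuming a platform
that still ships <ucontext.h>, such as Linux with glibc. The names
run_linked_context_demo, coroutine_body and STACK_SIZE are made up for
illustration. It shows the fixed, preallocated stack that makecontext()
needs, and the "linked" context (uc_link) that is resumed when the
context's function returns.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* must be chosen up front (illustrative value) */

static ucontext_t main_ctx, co_ctx;
static int steps;

static void coroutine_body(void) {
    steps++;                            /* first activation */
    swapcontext(&co_ctx, &main_ctx);    /* yield back to the caller */
    steps++;                            /* resumed by the second swapcontext */
    /* Returning from here resumes co_ctx.uc_link, i.e. main_ctx. */
}

/* Returns the number of steps the coroutine ran, or -1 on failure. */
int run_linked_context_demo(void) {
    void *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;
    steps = 0;

    if (getcontext(&co_ctx) == -1) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;      /* separate, fixed-size stack */
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;         /* the "linked" context */
    makecontext(&co_ctx, coroutine_body, 0);

    swapcontext(&main_ctx, &co_ctx);    /* run until the first yield */
    swapcontext(&main_ctx, &co_ctx);    /* resume; body returns via uc_link */

    free(stack);
    return steps;
}
```

Calling run_linked_context_demo() from main() should switch into the
coroutine twice and come back each time. Note that STACK_SIZE is fixed
for the lifetime of the context, which is the allocation problem
Kristján describes above.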

The basic mechanism Kenneth is proposing only swaps out the
context's registers, not its actual stack space. It would be up
to Stackless's scheduler to copy the stack data back and forth
from the heap. He thinks it'll be enough for your use to add a
way to query the top of the stack. Then, before calling anything
you'd have to hard-switch, you would call getcontext() once and
save the stack top. When switching, you'd call getcontext() again
and save out the data between the two tops. I guess that matches
your suggestion for a "base" context? The idea would be to give
schedulers enough portable primitives to do what you need,
without making LLVM know how to allocate memory.
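
A rough sketch of the "save out the data between the two tops" step
might look like the code below. It is only an illustration under
several assumptions: the GCC/Clang builtin __builtin_frame_address(0)
stands in for the stack-top query that does not exist yet, the names
stack_slice, save_slice, descend and run_slice_demo are hypothetical,
and only the save half is shown. Restoring a slice and resuming inside
it is the hard part and is not attempted here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A heap copy of the stack region between a base point and a deeper point. */
typedef struct {
    uintptr_t low;    /* lowest stack address that was saved */
    size_t    size;   /* number of bytes saved */
    void     *copy;   /* heap block holding the saved bytes */
} stack_slice;

/* Copy everything between the two recorded stack positions to the heap.
   Works for either stack growth direction. Returns 0 on success. */
static int save_slice(stack_slice *out, void *base, void *top) {
    uintptr_t a = (uintptr_t)base, b = (uintptr_t)top;
    uintptr_t low = a < b ? a : b;
    uintptr_t high = a < b ? b : a;

    out->low = low;
    out->size = (size_t)(high - low);
    out->copy = malloc(out->size ? out->size : 1);
    if (out->copy == NULL)
        return -1;
    memcpy(out->copy, (const void *)low, out->size);
    return 0;
}

/* Descend a few frames below the base, then save the slice. */
__attribute__((noinline))
static int descend(int depth, void *base, stack_slice *out) {
    volatile char pad[256];             /* make each frame take some space */
    int result;

    pad[0] = (char)depth;
    if (depth > 0)
        result = descend(depth - 1, base, out);
    else
        result = save_slice(out, base, __builtin_frame_address(0));
    pad[0] = 0;                         /* work after the call: no tail call */
    return result;
}

/* Returns the number of bytes saved, or 0 on failure. */
size_t run_slice_demo(void) {
    stack_slice s = {0, 0, NULL};
    void *base = __builtin_frame_address(0);   /* the "base" stack top */
    size_t saved;

    if (descend(4, base, &s) != 0)
        return 0;
    saved = s.size;
    free(s.copy);
    return saved;
}
```

The point of the sketch is that the scheduler, not LLVM, owns the
malloc() and memcpy(): the only thing it needs from the runtime is a
way to learn the two stack positions.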

In the llvmdev thread, I suggested that Stackless might want to
use makecontext(new_space) any time it entered a C function, on
the assumption that C frames are short-lived, and you could
re-use the space as soon as the C frame returned, but I think
that was wrong.

Jeffrey