[Stackless] LLVM coroutine support [was: [LLVMdev] Proposal: stack/context switching within a thread]

Jeffrey Yasskin jyasskin at gmail.com
Thu Apr 8 02:30:06 CEST 2010


Hi Stackless folks,

The following message was recently sent to the LLVM developers list
proposing to add primitives along the lines of the POSIX makecontext
and swapcontext functions. Since you guys have implemented coroutines
(in assembly if I'm remembering right), you have much more expertise
about this than I do. I'd like to make sure that LLVM supports your
needs in case you want to support jitted code, so could some of you
look over this to see if it does what you need? If there's anything
broken about it, you can either reply to llvmdev at cs.uiuc.edu or send
me comments and I'll forward them.

Thanks,
Jeffrey


---------- Forwarded message ----------
From: Kenneth Uildriks <kennethuil at gmail.com>
Date: Wed, Apr 7, 2010 at 12:14 PM
Subject: [LLVMdev] Proposal: stack/context switching within a thread
To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>


Right now the functionality is available, sometimes, from the C
standard library.  But embedded environments (often running a limited
standard library) and server environments would benefit heavily from a
standard way to specify context switches within a single thread in the
style of makecontext/swapcontext/setcontext, and built-in support for
these operations would also open the way for optimizers to begin
handling these execution paths.

The use cases for these operations, and things like coroutines built
on top of them, will only increase in the future as developers look
for ways to get more concurrency while limiting the number of
high-overhead, difficult-to-manage native threads, locks, and
mutexes.

-------------- next part --------------
//===----------------------------------------------------------------------===//
//                         Stack and context switching
//===----------------------------------------------------------------------===//

4/7/2010 - Initial Revision

At the time of this writing, LLVM supports standard function calls on a thread
of execution, but does not support switching function contexts or stacks
within a thread of execution.  Such support would enable creating coroutines,
which in turn offer safer concurrency with lower overhead than native threads,
and which enable concurrency on systems that lack native thread support.

Some C library implementations include support for stack and context switching
with functions such as makecontext, getcontext, and setcontext.  However,
these calls may not exist on some embedded systems, which may also lack native
thread support and therefore have a greater need for context switching.  Also,
built-in support for context switching allows such operations to be lowered
to inline assembly rather than a call into the C library.  Implementation of
kernels in IR would benefit from these intrinsics.  In addition, the C library
functions depend on machine-specific structures to represent the context,
whereas the intrinsics treat the context as an opaque block of memory.
Finally, with intrinsic functions for handling context switches, optimizers
can be made to recognize and optimize code whose flow of execution is affected
by these context switches.

//===----------------------------------------------------------------------===//
// Implementation Approach

We will model the intrinsics after the C library functions.  In environments 
where the library functions are present, these intrinsics should behave 
identically to the library functions, so that code using them can interact 
gracefully with platform native code using the library functions.  In
environments where the library functions are absent, these intrinsics can be 
lowered to inline assembly language instructions with the proper effect.

We will constrain the definition of the function passed to llvm.makecontext,
and unconditionally pass the function's own context to it, in order to make
it easy for the optimizer to determine when the function uses
llvm.swapcontext to temporarily return execution to its "link" context or
when it passes its own context as the "link" context to a newly
created context.  This should make it possible to optimize some common
coroutine patterns.

A context must be executing within at most one thread at any given time.
A context may execute in one thread, and later execute in a different thread.

; Returns the number of bytes needed to hold a stack context.  Because the size
; is not a constant in the IR, mem2reg cannot promote an alloca of this size.
; That has little practical cost: the context typically includes a copy of most
; or all machine registers plus additional data, so mem2reg would offer little
; advantage for it anyway.
declare i64 @llvm.context.size() readonly

; pCtx shall be a pointer to a memory area of at least the number of bytes
; returned by llvm.context.size().  That memory area will be filled with
; stack context data.  A call to llvm.setcontext or llvm.swapcontext with
; that context data will cause execution to proceed as if from a return from
; this llvm.getcontext call.
declare void @llvm.getcontext({}* %pCtx)

; pCtx shall be a pointer to context data generated by a call to llvm.getcontext
; or llvm.makecontext.  llvm.setcontext will cause execution to transfer to
; the point specified in the context data.
declare void @llvm.setcontext({}* %pCtx)
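
For comparison, the C library pair that these two intrinsics are modeled on
can be sketched as follows.  This is only a minimal sketch, assuming a POSIX
system whose C library still ships <ucontext.h> (glibc, for example);
count_resumes is a made-up name for illustration and is not part of the
proposal.

```c
#include <ucontext.h>

/* Returns how many times execution passed the getcontext() return point. */
int count_resumes(int limit)
{
    ucontext_t ctx;
    volatile int passes = 0;   /* volatile: kept in memory across the jump */

    getcontext(&ctx);          /* setcontext(&ctx) resumes execution here */
    passes++;
    if (passes < limit)
        setcontext(&ctx);      /* does not return on success */
    return passes;
}
```

Calling count_resumes(3) passes the getcontext() return point three times:
once from the original call and twice more as a result of setcontext().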

; The first argument is a pointer to a memory area of at least the number of
; bytes returned by llvm.context.size().  That memory area will be filled with
; new context data.  The second argument is a pointer to the function where 
; execution of this context shall begin.  The third and fourth arguments define 
; the memory reserved for the context's local stack; this stack will not grow, 
; and overflow of this stack will lead to undefined behavior.  The fifth
; argument is a pointer to the "linked" or "previous" context; when %func
; returns, this pointer is dereferenced and execution continues at the linked
; context.  When the context begins executing, %func is called with a pointer to
; its corresponding context as its first parameter, and the sixth parameter to
; llvm.makecontext as its second parameter.  If execution unwinds past %func, 
; undefined behavior results.  If the memory pointed to by %newCtx is
; overwritten by anything other than a call to llvm.swapcontext, a later return
; from %func will result in undefined behavior.
declare void @llvm.makecontext({}* %newCtx, void({}*, {}*)* %func,
    i8* %stackbegin, i64 %stacksize, {}* %linkCtx, {}* %data)

; Retrieves the link context pointer from the given context.  The link context
; pointer is the same one that was passed to llvm.makecontext to create the
; given context.  If pCtx was populated by llvm.getcontext rather than
; llvm.makecontext, this function returns a null pointer.  If pCtx was
; populated only by llvm.swapcontext, the return value is undefined.
declare {}* @llvm.getlinkcontext({}* %pCtx)

; The first argument shall point to a memory area of at least the number of
; bytes returned by llvm.context.size().  That memory area will be filled with
; stack context data corresponding to the current execution context.  The
; second argument shall point to context data generated by a call to
; llvm.getcontext or llvm.makecontext.  Execution will transfer to the context
; specified by pNewCtx.  Using llvm.swapcontext or llvm.setcontext to set
; the context to that stored in pThisCtx will cause execution to transfer back
; to this context as if from a return from llvm.swapcontext.  The linked
; context pointer corresponding to %pThisCtx is not modified.
declare void @llvm.swapcontext({}* %pThisCtx, {}* %pNewCtx)
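
The fixed-size stack and the "link" context described above correspond to the
uc_stack and uc_link fields of the C library's ucontext_t.  The sketch below
shows that correspondence under the assumption of a POSIX system with
<ucontext.h>.  The names run_worker, worker, and worker_result are
hypothetical.  Note one difference from the proposal: POSIX makecontext
passes int arguments to the entry function rather than a context pointer and
a data pointer, so this sketch uses file-scope variables instead.

```c
#include <ucontext.h>

static ucontext_t caller_ctx, worker_ctx;
static char worker_stack[64 * 1024];   /* fixed-size stack; it will not grow */
static int worker_result;

static void worker(void)
{
    worker_result = 42;
    /* Returning from the entry function resumes the context in uc_link. */
}

int run_worker(void)
{
    worker_result = 0;
    getcontext(&worker_ctx);                    /* initialise before makecontext */
    worker_ctx.uc_stack.ss_sp = worker_stack;   /* where the stack begins */
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &caller_ctx;           /* the "link" context */
    makecontext(&worker_ctx, worker, 0);

    /* Save the current context in caller_ctx and run worker on its own stack. */
    swapcontext(&caller_ctx, &worker_ctx);
    return worker_result;
}
```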

A simple example using these intrinsics:

define void @co1({}* %thisCtx, {}* %data) nounwind {
entry:
  %prevCtx = call {}* @llvm.getlinkcontext({}* %thisCtx)

  ; Now call print messages.  After each print message, temporarily yield
  ; control back to the previous context.
  call void @printCo1FirstMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo1SecondMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo1ThirdMessage()

  ; Returning from this function resumes the link context.
  ret void
}

define void @co2({}* %thisCtx, {}* %data) nounwind {
entry:
  %prevCtx = call {}* @llvm.getlinkcontext({}* %thisCtx)

  ; Now call print messages.  After each print message, temporarily yield
  ; control back to the previous context.
  call void @printCo2FirstMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo2SecondMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo2ThirdMessage()

  ; Returning from this function resumes the link context.
  ret void
}

define i32 @main() nounwind {
entry:
  ; alloca space for the contexts.
  %ctxSize = call i64 @llvm.context.size()
  %p0 = alloca i8, i64 %ctxSize
  %mainCtx = bitcast i8* %p0 to {}*
  %p1 = alloca i8, i64 %ctxSize
  %ctx1 = bitcast i8* %p1 to {}*
  %p2 = alloca i8, i64 %ctxSize
  %ctx2 = bitcast i8* %p2 to {}*

  ; Stacks for the contexts
  %stack1 = alloca i8, i64 4096
  %stack2 = alloca i8, i64 4096

  ; Create contexts for co1 and co2.
  call void @llvm.makecontext({}* %ctx1, void({}*, {}*)* @co1, i8* %stack1,
                              i64 4096, {}* %mainCtx, {}* null)
  call void @llvm.makecontext({}* %ctx2, void({}*, {}*)* @co2, i8* %stack2,
                              i64 4096, {}* %mainCtx, {}* null)

  ; Run co1 and co2 in an alternating fashion.
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
  ret i32 0
}

This code should make the following sequence of messaging calls:

printCo1FirstMessage()
printCo2FirstMessage()
printCo1SecondMessage()
printCo2SecondMessage()
printCo1ThirdMessage()
printCo2ThirdMessage()
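
The same alternating pattern can be tried today with the C library functions.
The following is a rough equivalent, not a translation of the IR: it assumes a
POSIX system with <ucontext.h>, it records the messages in a buffer instead of
calling the print functions, and every name in it (run_example, emit, make,
trace, and so on) is invented for illustration.  Because POSIX makecontext
does not pass the context to the entry function, the coroutines refer to
their contexts through file-scope variables.

```c
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, ctx1, ctx2;
static char stack1[64 * 1024], stack2[64 * 1024];
static char trace[128];                 /* collected messages, one per line */

static void emit(const char *msg)
{
    strcat(trace, msg);
    strcat(trace, "\n");
}

static void co1(void)
{
    emit("co1 first");
    swapcontext(&ctx1, &main_ctx);      /* yield to the previous context */
    emit("co1 second");
    swapcontext(&ctx1, &main_ctx);
    emit("co1 third");                  /* returning resumes uc_link */
}

static void co2(void)
{
    emit("co2 first");
    swapcontext(&ctx2, &main_ctx);
    emit("co2 second");
    swapcontext(&ctx2, &main_ctx);
    emit("co2 third");
}

/* Prepare ctx to run fn on the given stack, linked back to main_ctx. */
static void make(ucontext_t *ctx, char *stack, size_t size, void (*fn)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = size;
    ctx->uc_link = &main_ctx;
    makecontext(ctx, fn, 0);
}

const char *run_example(void)
{
    trace[0] = '\0';
    make(&ctx1, stack1, sizeof stack1, co1);
    make(&ctx2, stack2, sizeof stack2, co2);

    /* Run co1 and co2 in an alternating fashion, three turns each. */
    for (int i = 0; i < 3; i++) {
        swapcontext(&main_ctx, &ctx1);
        swapcontext(&main_ctx, &ctx2);
    }
    return trace;
}
```

The string returned by run_example() lists the messages in the same
interleaved order as the sequence above.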

//===----------------------------------------------------------------------===//
// Moving forward
//

An easy first step would be to lower the intrinsics to corresponding calls to
makecontext and friends.  Once that is working on test platforms with these
calls in their C libraries, we can start adding platform-specific lowering
code for platforms that need it, and then offer that lowering as an option on
all platforms so that this functionality can be used in kernels and other
environments that lack a standard C library.  In the meantime, optimization
passes can be introduced that know how to optimize across some context
switches.

