[Stackless] Is Stackless single core by nature?

Sat Jul 4 08:57:51 CEST 2009

On Sat, Jul 4, 2009 at 5:26 PM, Henning Diedrich
<hd at authentic-internet.de> wrote:
>> Stackless is a microthreading solution.  It is not a scalability
>> solution in and of itself.
>
> But regardless of how much the GIL (1) may be an additional obstacle, or not,
> the immediate reach of Stackless' tasklets seems to be one process and there
> is no transparency across system processes, cores and boxes (as Erlang
> features). Is this correct?

This is correct.

>> You can take the basic
>> functionality and build up your own framework around this.  Tired of
>> callbacks?  Make a function that wraps an asynchronous operation in a
>> channel and whatever calls it will just read as a synchronous call.
>> Of course, a programmer needs to be aware of the effect of blocking
>> and when blocking might happen on the code they write, but in practice
>> this is rarely much of a concern.
>
> Could I do this if I left single core behind ... ? To my eye that is part of
> the advantages you achieved with the very clear architecture decisions you
> opted for with EVE. The more flexible and complex ways you had referred to,
> might have turned out way more complex in this regard.

I don't understand what you are asking here.  The ability to write a
function that blocks synchronously while wrapping asynchronous IO is
a benefit that comes with any real coroutine-like solution.  It can
be used as a building block in any framework you build, whether that
framework uses a single process on one core or one process per core.
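
As a rough illustration of wrapping a callback-style operation so that
the caller reads like a synchronous call, here is a minimal sketch.  It
does not use the stackless module: plain Python generators stand in for
tasklets, and Channel, Scheduler, async_lookup and lookup are all
hypothetical names invented for this example.

```python
from collections import deque

class Channel:
    """Stand-in for a channel: a receiver parks here until a value arrives."""
    def __init__(self):
        self.waiting = deque()          # parked receiver generators

    def receive(self):
        value = yield self              # suspend; the scheduler parks us here
        return value

class Scheduler:
    """Tiny cooperative scheduler; generators stand in for tasklets."""
    def __init__(self):
        self.ready = deque()            # zero-argument callables to run

    def call_soon(self, fn):
        self.ready.append(fn)

    def spawn(self, gen):
        self.call_soon(lambda: self._step(gen, None))

    def send(self, channel, value):
        gen = channel.waiting.popleft() # wake one parked receiver
        self.call_soon(lambda: self._step(gen, value))

    def _step(self, gen, value):
        try:
            channel = gen.send(value)   # run until the task blocks again
        except StopIteration:
            return
        channel.waiting.append(gen)

    def run(self):
        while self.ready:
            self.ready.popleft()()

def async_lookup(sched, key, callback):
    # Hypothetical callback-style API: the result arrives on a later tick.
    sched.call_soon(lambda: callback(key.upper()))

def lookup(sched, key):
    # The wrapper: hide the callback behind a channel receive.
    ch = Channel()
    async_lookup(sched, key, lambda result: sched.send(ch, result))
    return (yield from ch.receive())

log = []

def worker(sched, name, key):
    log.append((name, "start"))
    value = yield from lookup(sched, key)   # reads like a blocking call
    log.append((name, value))

sched = Scheduler()
sched.spawn(worker(sched, "a", "x"))
sched.spawn(worker(sched, "b", "y"))
sched.run()
```

Only the calling task waits inside lookup(); the other worker starts
while the first one is parked on its channel.  With real tasklets the
"yield from" would not be needed, since a channel receive suspends the
tasklet directly.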

>> Stackless has a scheduler which runs on a real thread, and
>> all microthreads created on that thread are run within that scheduler.
>> You can have multiple threads each running their own scheduler, with
>> their own tasklets running within them.
>
> Can channels reach out of their interpreter/scheduler? Or can a Stackless
> interpreter run across multiple cores, or even blades? Are there modules or
> extensions that provide for this, or for transparency in this regard?

This is mostly up to the user.  Stackless provides a basic set of
functionality (scheduling of microthreads, microthread serialisation).
There are no modules or extensions that take it further in other
directions.

However, if you can ensure your newly launched thread goes on the core
you want, then the interpreter can be considered to run across
multiple cores.  This is a Python problem, not a Stackless one.

Running a single Stackless interpreter across multiple blades makes no
sense, and Erlang would not be able to do it either.  What I am saying
is that a program runs on one machine, not several, and the Python
interpreter is a program.

I should note that while at CCP, I wrote part of a framework that ran
an agent on each machine involved.  There was a master program and it
would communicate with each running agent telling it to start
sub-applications to farm work out to.  All of the programs, whether
agent, master or sub-application, were specialisations of the CCP
Stackless Python based application.  There was no pickling involved,
however.
Unless I am mistaken, this sort of arbitrary ability to start up
instances of the interpreter on involved machines is as close as you
would be able to get to "or even blades", no matter the language (and
framework) used.
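
To make the general shape of that arrangement concrete, here is a
minimal sketch in which a master starts extra interpreter instances and
farms work out to them.  It is not the CCP framework: the workers are
local subprocesses rather than agents on other machines, and the JSON
line protocol, WORKER_SOURCE, start_worker and farm_out are all
assumptions made up for the example.

```python
import json
import subprocess
import sys

# Source for a hypothetical worker: read one JSON job per line on stdin,
# write one JSON result per line on stdout.
WORKER_SOURCE = r"""
import json, sys
for line in iter(sys.stdin.readline, ""):
    job = json.loads(line)
    result = {"id": job["id"], "total": sum(job["numbers"])}
    sys.stdout.write(json.dumps(result) + "\n")
    sys.stdout.flush()
"""

def start_worker():
    # Launch another instance of the current interpreter as a worker.
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )

def farm_out(jobs, worker_count=2):
    """Hand jobs to the workers round-robin and collect their replies."""
    workers = [start_worker() for _ in range(worker_count)]
    results = {}
    try:
        for index, job in enumerate(jobs):
            worker = workers[index % worker_count]
            worker.stdin.write(json.dumps(job) + "\n")
            worker.stdin.flush()
            # Waits for this reply before sending the next job, for brevity.
            reply = json.loads(worker.stdout.readline())
            results[reply["id"]] = reply["total"]
    finally:
        for worker in workers:
            worker.stdin.close()        # end of input tells the worker to exit
            worker.wait()
            worker.stdout.close()
    return results

totals = farm_out([
    {"id": 1, "numbers": [1, 2, 3]},
    {"id": 2, "numbers": [10, 20]},
])
```

Jobs travel as plain data rather than pickled tasklets.  Reaching other
machines would mean replacing the pipes with sockets and having an agent
on each machine do the launching.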

> That pickling works even across diverse OSses is an exciting feature (2).
> And I am still working to get my head around what happens to state when
> sending tasklets over to another box (3). It doesn't look quite trivial.

I don't have a clear picture either.  But anyone who intends to use
this functionality should be able to get a handle on it with a little
experimentation.  In my book, it is better to have a choice in how
this works (as you would with Stackless) than to have an inflexible
predetermined solution forced upon you (as I think you get with
Erlang, though I may be wrong).

Regarding Pyro: its webpage says "you just call a method on a remote
object as if it were a local object".  For this statement to be true,
the call must block the current thread while it takes place.  This
would be incompatible with Stackless, as the tasklet making the call
would block the scheduler, preventing any other tasklet from running.
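
The effect of a blocking call inside a cooperative scheduler can be
sketched without Stackless or Pyro.  In this minimal example generators
stand in for tasklets, time.sleep stands in for a blocking remote call,
and run, ticker and blocking_caller are hypothetical names.

```python
import time
from collections import deque

def run(tasks):
    """Round-robin scheduler: each task runs until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        ready.append(task)

log = []

def ticker(count):
    # A well-behaved task: does a little work, then yields to the others.
    for i in range(count):
        log.append(("tick", i))
        yield

def blocking_caller():
    log.append(("call", "begin"))
    time.sleep(0.05)    # blocks the whole thread, and so the scheduler too
    log.append(("call", "end"))
    yield

run([ticker(3), blocking_caller()])
```

No tick is logged between ("call", "begin") and ("call", "end"): while
the call blocks the thread, the scheduler cannot switch to any other
task.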

Writing an RPC mechanism using Stackless is straightforward, if you
are familiar with networking and Stackless.  Here is one I have
written:

http://code.google.com/p/stacklessexamples/source/browse/#svn/trunk/examples/networking/stacklessrpc

It is of course possible to write a simpler version; mine is a little
abstract.

> But is pickling fast enough to do more interactive stuff than load balancing
> (e.g. loading complete solar systems off to a different blade that has
> better hardware or because the current blade had more than one solar system
> mounted). Is it fast enough to completely distribute entities?

I have no experience with this, so cannot say one way or the other.

> As this is what I can't yet fathom about Erlang, how its paradigm of not
> sharing state may work well for telecom but not for games. Since that virtue
> is achieved by taking the liberty from the programmer, it could be
> replicated by discipline in other languages. But the language inherent
> features of Erlang would have to be coded in Python, almost every time that
> they would come into play, making the source more complicated, losing
> readability.

I don't believe I agree with this.  Maybe you are thinking of some
Erlang features I am not familiar with.

Cheers,
Richard.

> Yet I'm not a Pythonista in this regard and don't hold readability to be
> decisive. In regard to multi-core processing I am looking mostly at
>
> - performance, predictability of performance and what might turn out to be
> 'out of reach' for optimizing, hardwired and part of the
> compiler/interpreter.
>
> - where state is physically located or how its mobility is managed and
> whether it may work in any scenario to share state between microthreads
> across multiple blades for near realtime calculations.
>
>> So, given you are willing to take the time to write a framework to
>> take care of it, this allows you to move running logic to other
>> 'nodes', to be resumed there.
>
> To arrive at maximum control it may be best to code a fitting custom
> implementation. Given the complexity of the topic though, I'll surely be best
> advised to learn more about it by using what's already there, first. I am
> even expecting to learn that what I am looking for simply can't work,
> specifically with regard to cross blade state access. And I am far from
> fluent enough in Python to write a concurrency framework on my own. Does
> Pyro (3) work with Stackless? Does the processing module (4) work across
> multiple blades?