[Stackless] Is Stackless single core by nature?
hd at authentic-internet.de
Thu Jul 9 17:20:23 CEST 2009
coming back to the thread, I meanwhile found the treasure trove to dive
into for my questions: http://wiki.python.org/moin/ParallelProcessing
That's a list of efforts to nudge Python towards parallel/multicore/
etc. Did you work with any of them? Is there any specific or general
rule, as to which will work with Stackless?
I am not yet done looking at all of them.
>>> You can take the basic
>>> functionality and build up your own framework around this. Tired of
>>> callbacks? Make a function that wraps an asynchronous operation in a
>>> channel and whatever calls it will just read as a synchronous call.
>>> Of course, a programmer needs to be aware of the effect of blocking
>>> and when blocking might happen on the code they write, but in practice
>>> this is rarely much of a concern.
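The channel-wrapping pattern described above can be sketched without Stackless itself; here is a minimal emulation using a stdlib queue for the channel and a worker thread for the asynchronous operation (the names `async_fetch` and `sync_fetch` are invented for illustration):

```python
import threading
import queue

def async_fetch(url, callback):
    """Stand-in for a callback-based asynchronous operation."""
    def work():
        callback("data for %s" % url)  # pretend network I/O happened here
    threading.Thread(target=work).start()

def sync_fetch(url):
    """Wrap the async operation so the caller simply blocks for the result,
    the way a tasklet would block on a Stackless channel."""
    channel = queue.Queue(maxsize=1)   # emulates stackless.channel()
    async_fetch(url, channel.put)      # the callback feeds the channel
    return channel.get()               # emulates channel.receive()

print(sync_fetch("http://example.net/"))  # reads like an ordinary call
```

The caller of `sync_fetch()` never sees a callback; whether the "channel" suspends a tasklet or an OS thread is an implementation detail hidden behind the same synchronous-looking interface.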
>> Could I do this if I left single core behind ...? To my eye that is part of the advantage you achieved with the very clear architecture decisions you opted for with EVE. The more flexible and complex ways you had referred to might have turned out way more complex in this regard.
> I don't understand what you are asking here. The ability to provide a
> function that blocks in a synchronous way wrapping asynchronous IO is
> a benefit that comes with any real coroutine-like solution. And it
> can be applied as a building block in any framework you build, whether
> one core/process or multiple cores/process per core.
But in an SMP environment you run into concurrent resource access, as one
effect of blocking, issues that you are completely isolated from when
staying single-core, protected by the guaranteed sequentiality that this
provides.
In that sense I had referred to "a programmer needs to be aware ... but
in practice this is rarely much of a concern": I wondered if this may get
way more complicated as soon as you have multi-core concurrency and
the need to protect resources from contention.
The hiding of asynchronous operations, in its simplest incarnation in a
physically sequential environment, is not called into question. But what
about the implications of such a layout in a distributed or multi-core
environment, across multiple blades even, at best handled 'transparently'?
That's very much what I am looking for: the *implications* of hiding
complexity (in the extreme, even 'buried' inside the language itself)
and of different syntactical approaches in different languages. And by
default I would *not* like to see vital functions 'out of my reach',
whether hidden inside a function or a language, see below.
But eventually it should be 'easy' to use, as much as possible, for the
very reasons you were citing that speak for Python as a language. MPI4Py
then, at least unwrapped, seems not really what I am looking for by
itself, but probably a stepping stone. The blocking tutorial samples
are nice, but the *asynchronous* stuff comes with strong warnings. It would
have to be wrapped again to make it robust, and maybe that is the
way to go. I can't tell if that can still end up as 'elegant' as Erlang.
The promise with Erlang is that the restriction of "no shared state" is
going to provide for the transparency of messaging, obviously then, by
design, deadlock-free. Depending on how much you have heard about it, that
may even sound hard to believe, but it's a deceptively simple principle.
Without that, i.e. with shared state, you'll have the usual
opportunities for races and deadlocks. And yes, that can be done right;
and still, in practice, from a certain size of the system on, it can
also turn into a nightmare.
And it's a factor here that not everyone in a team of programmers may
turn out to be a genius (I won't protest if you tell me that at CCP you
are ;-), so restriction by design, if well chosen, may save a hell of a
lot of trouble.
But all this, regarding "architecture decision", is what you avoid when
you stay on one core, and basically stay protected by the underlying
ensured sequentiality of (micro-)threads, while creating parallelly
formulated code.
(1) - Joe Armstrong, Erlang's van Rossum, on concurrency strategies (among
other things): http://www.pragprog.com/articles/erlang
(2) - Guido v. Rossum pro GIL (and why removing the GIL is *not* what
people *really* want):
>>> Stackless has a scheduler which runs on a real thread, and
>>> all microthreads created on that thread are run within that scheduler.
>>> You can have multiple threads each running their own scheduler, with
>>> their own tasklets running within them.
>> Can channels reach out of their interpreter/scheduler? Or can a Stackless
>> interpreter run across multiple cores, or even blades? Are there modules or
>> extensions that provide for this, or for transparency in this regard?
> This is mostly on the user. Stackless is a basic set of functionality
> (scheduling of microthreads, microthread serialisation). There are no
> modules or extensions to take it further in other directions.
> However, if you can ensure your newly launched thread goes on the core
> you want, then the interpreter can be considered to run across
> multiple cores. This is a Python problem, not a Stackless one.
As I said, I am still going through the approaches presented at the wiki
page mentioned above.
Is there a rule of thumb, or a list, of what modules and libraries run
with Stackless?
I am still hoping to find the Stackless-compatible concurrency support I
am looking for. But otherwise, would Stackless then stay close, and 'on
top' of, the main Python branch, which in turn will likely not implement
multi-threading, as that would be obstructed by the GIL philosophy? (see (2))
Even if Stackless is not originally about multi-core or distributed
processing: just as it is not a *language* issue that CPython has the
GIL, but an implementation issue of CPython (as discussed at (2)),
would not the Stackless syntax be just what one wanted to use
multi-cores and distribute calculations to multiple computers?
Potentially extended (or reduced!) to deal with shared resources? It
just seems to lend itself to that exceptionally well and would not have
to pass the last hurdle, as Java does (see (6) at the bottom). The
last hurdle being microthreads, which make Erlang and Stackless seem very
much alike.
Would not even EVE have to expect that, in the future, blades will
become faster at a much slower pace, measured per core, but offer
more cores instead as today's proposition of speed improvement? Growth
by hardware should get harder to realize staying with one core. But
maybe you fork out different stuff to keep cores busy in a different way.
Erlang got multi-threaded only quite recently, in 2007.
As would be expected, with no language changes; only the VM was adapted,
which the people at Ericsson were rightfully proud about. I imagine the
Erlang hype of 2007/8 was fired up by this fact. I had initially thought
Stackless was just as destined for that feat.
This may neatly clarify similarities and differences between Stackless
and Erlang (Joe Armstrong, quoted from (3)):
"The Erlang VM is written in C and run as one process on the host
operating system (OS). Within the Erlang VM an internal scheduler is
responsible for running the Erlang processes (which can be many
thousands). In the SMP version of the Erlang VM, there can be many such
schedulers running in separate OS threads. As default there will be as
many schedulers as there are processors or processor cores on the system.
"The SMP support is totally transparent for the Erlang programs. That
is, there is no need to change or recompile existing programs. Programs
with built-in assumptions about sequential execution must be rewritten
in order to take advantage of the SMP support, however."
That this worked was because of the way that Erlang had focused on
making distributed computations possible: again, the paradigm of no
shared state. As this is inherent in Erlang, Erlang could transparently
be made to use multi-cores.
Even if Stackless cannot follow that leap, my impression was
that it may be the natural starting point for Python to get there,
though probably with syntactic modification needed. It comes from a
different approach of (not) dealing with state in concurrency, but seems
as microprocess-centered by design as Erlang.
Maybe I just haven't found the project that is doing this yet. Or
there is a fundamental problem that still eludes me (I know that Erlang
followers would immediately second that. But 'shared state' is not
universally thought of as a bad thing).
> Running a Stackless interpreter across multiple blades makes no sense
> - as such Erlang wouldn't be able to do it either. A program runs on
> one machine, not several, is what I am saying. And the Python
> interpreter is a program.
Just wanted to make sure that I am not missing a point that was too far
out for me to imagine.
Erlang is somewhat transparent, though, across multiple machines: as
if channels worked exactly the same way across machines or locally.
But sure, there'd be one Erlang VM running per machine.
Maybe that's the Gordian knot: thinking of a single multi-computer VM.
> I should note that while at CCP, I wrote part of a framework that ran
> an agent on each machine involved. There was a master program and it
> would communicate with each running agent telling it to start
> sub-applications to farm off work to. All programs, whether agents,
> master and sub-applications were specialisations of the CCP Stackless
> Python based application. There was no pickling involved, however.
> Unless I am mistaken, this sort of arbitrary ability to start up
> instances of the interpreter on involved machines is as close as you
> would be able to get to "or even blades", no matter the language (and
> framework) used.
Plus, what you did with no pickling is probably close to the Erlang
philosophy (if not literally, because you can send all sorts of things
with an Erlang message): if you didn't pickle, you probably also did
not expect state sent back as immediate answers, except for basic 'ok's.
Which is close to Erlang's 'return-less' (Actor model) messages.
>> That pickling works even across diverse OSes is an exciting feature (2).
>> And I am still working to get my head around what happens to state when
>> sending tasklets over to another box (3). It doesn't look quite trivial.
> I don't have a clear picture either. But it should be something that
> someone who intends to use this functionality should be able to easily
> get a handle on with a little experimentation. In my book, it is
> better to have a choice in how this works (as you would with
> Stackless), than to have an inflexible predetermined solution forced
> upon you (as I think you get with Erlang, but may be wrong).
Same here. That it is *not* an integral part, and thus less out of
reach, can be considered a *plus*, as it would be accessible,
changeable, fixable and tuneable.
On the other hand, Erlang's track record is impressive enough to
infer that so much work went into solving exactly these problems
that it should not be passed up, and that that wheel need not be
re-invented. Where it is *not* a language issue but one of implementation,
I trust that they found out over the years which caches make sense where
and which un-intuitive modifications bring extra speed and/or stability.
However, you'd find out the contrary only with much pain; at worst after
the system is ready for prime time and suddenly starts to sputter. I
spoke with Thorsten Schütt of Scalaris and he confirmed in a way that
'soft real time', as the Erlang claim goes, does not mean 'real time'.
But the real difference is in the concurrency philosophy of Erlang,
which simply prevents the worst sort of problems from happening in the
first place. That productivity argument should be easy to accept.
Specifically speaking of pickled tasklets, though: Erlang does not send
running processes over the wire at all; a process is spawned on the
remote node instead.
> Regarding Pyro. Its webpage says "you just call a method on a remote
> object as if it were a local object". For this to be a true
> statement, it must block the current thread while the call takes
> place. This would be incompatible with Stackless, as the tasklet
> making the call would block the scheduler preventing any other tasklet
> from running.
Thanks for looking into that. I found it hard to find any mention of Pyro and Stackless together, which only supports your conclusion.
This reminds me of the raison d'etre for StacklessIO. Could it yield equal rewards? You had mentioned a drop-in replacement you wrote with the same functionality as StacklessIO. How difficult could it be to shift the blocking to the level of the tasklet, away from the system thread, with Pyro?
Shouldn't that be rather painless, given that Pyro is native Python (http://pyro.sourceforge.net/manual/1-intro.html)?
Or am I missing something there?
> Writing an RPC mechanism using Stackless is straightforward, if you
> are familiar with networking and Stackless. Here is one I have written.
> It is possible to write a simpler version of course, my one being a
> little abstract.
>> But is pickling fast enough to do more interactive stuff than load balancing
>> (e.g. loading complete solar systems off to a different blade that has
>> better hardware or because the current blade had more than one solar system
>> mounted). Is it fast enough to completely distribute entities?
> I have no experience with this, so cannot say one way or the other.
I have meanwhile read some about MPI4Py
(http://pypi.python.org/pypi/mpi4py) and what they do is direct access
down to the memory blocks where the Python objects lie, as binary, to
prevent the overhead of pickling and marshaling. Unless endian
conversion is needed, and unless the underlying MPI lib should add some
overhead I am not aware of, that should be as fast as it can get.
And again, doing this with asynchronous calls gets a big warning in the
manual, because the programmer must see to it that the state of the
memory block being sent is never changed while the send is pending, as
on a deeper level there may obviously be partial sending, or re-sending,
going on. (See the manual in the installation package, docs/mpi4py.pdf,
p. 4. Did not find it online.)
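The hazard the manual warns about can be illustrated with plain threads; `isend` below is a toy stand-in (not the real mpi4py API, whose non-blocking send is a different call entirely) whose `copy=True` path snapshots the buffer first, which is roughly what a pickling layer buys you:

```python
import threading

def isend(buf, dest, copy=True):
    """Toy non-blocking send: a thread reads `buf` and appends it to `dest`.
    With copy=True we snapshot the buffer up front, as a pickling layer
    would; with copy=False the live buffer is read while the caller may
    still be mutating it, which is the race the MPI4Py manual warns about."""
    data = bytes(buf) if copy else buf
    t = threading.Thread(target=lambda: dest.append(bytes(data)))
    t.start()
    return t

received = []
payload = bytearray(b"solar-system-state")
req = isend(payload, received, copy=True)
payload[:] = b"XXXXXXXXXXXXXXXXXX"   # caller mutates while the send is pending
req.join()
assert received[0] == b"solar-system-state"   # the snapshot arrived intact
```

With `copy=False` the assertion could fail nondeterministically, depending on whether the sender thread reads the buffer before or after the mutation; that is exactly the class of bug the zero-copy fast path exposes.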
MPI4Py does that to allow for better performance and for sending even
"objects bigger than half the available memory" (:-).
Erlang is "different" here again, as I understand, exactly for this very
occasion. Variables in Erlang cannot change state: once a variable is
bound to a value it cannot be assigned a new value, ever. That should in
fact allow for peace of mind when using the internal, binary buffer of a
variable itself to send a 'copy' of state over the network or to another
process. And again, it's not only about what is possible but what
improves productivity. These restrictions are for the robustness of
concurrency what Java's elimination of pointers was for the robustness of
memory management (I know, there were others before Java and before
Erlang, respectively, and everything was already there in LISP a long
time ago).
This is the 'discipline' question again. Certainly, "no shared state"
and "no re-binds" can be *emulated* in any language that can share state
and can change assigned state. But then, the problem is not the theory;
it is that actively preventing this from happening, as Erlang does, may
save a crucial lot of nerves and time, and make more complex things
possible, because it prevents the 0.001% of inadvertent stupid things
from ruining five nines of good.
In the concrete case of MPI4Py and asynchronous sends, I am not sure why
the immutability concept in Python does not pre-empt this problem. Is
that because MPI4Py works on the memory level even with complex data
constructs, which *are* mutable?
However, immutability in Python seems to come from the same corner
as that in Erlang: improved clarity, fewer chances for errors. I can't
tell for sure.
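One reason Python's immutability does not give Erlang's guarantee is that it is shallow: an immutable container can still reach mutable state, which is exactly what a memory-level send would see. A minimal illustration:

```python
# Python's immutability is shallow: the tuple's slots can never be
# re-bound, but any mutable object it references can still change,
# so a buffer-level view of the structure is not frozen at all.
point = (1, [2, 3])
try:
    point[0] = 9          # re-binding a tuple slot is refused...
except TypeError:
    pass
point[1].append(4)        # ...but the referenced list mutates freely
print(point)              # (1, [2, 3, 4])
```

In Erlang, by contrast, every term reachable from a bound variable is immutable all the way down, which is what makes handing its binary representation straight to the network safe.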
>> As this is what I can't yet fathom about Erlang: how its paradigm of not
>> sharing state may work well for telecom but not for games. Since that virtue
>> is achieved by taking liberty from the programmer, it could be
>> replicated by discipline in other languages. But the language-inherent
>> features of Erlang would have to be coded in Python, most every time that
>> they would come into play, making the source more complicated, losing
>> readability.
> I don't believe I agree with this. Maybe you are thinking of some
> Erlang features I am not familiar with.
It's along the lines that Stackless introduced a whole new concept,
making possible a whole different way to formulate solutions. But very
much by changes 'under the hood'. Will it be possible to bring Erlang's
model to Python without loss of style?
Let me bother you with the details, in a nutshell, if you care.
In Erlang, a message is sent like this: Receiver ! Message.
And the receiver can be anywhere.
To some extent this sending syntax is the equivalent of a method call in
OO. It is the way Erlang "processes" communicate. And processes are as
ubiquitous in Erlang as objects are in Python. In fact, they can be
regarded as objects in some sense (see (8) below, really 'actors'), and
as such <Receiver ! Message> is as common as, and some say really *is*,
a method call in Erlang. Again, some will agree, some won't.
Python is about making things elegant and simple, to achieve readability
and, in the end, productivity. It goes further; as Torvalds lectured the
Google crowd recently, "if you can do something really fast and really
well, people start using it differently", which holds for performance
and for elegance and simplicity, too, I'd say. This is why the abstraction
and transparency of, in essence, any "method call" (message sending) in
Erlang may make a difference: *it's in the same form*, no matter how far
it travels, to a local or non-local receiver. Like, say, UDP, there is no
(explicit) handshake as I understand it, with the same gains and drawbacks.
As Thorsten of Scalaris described, he started out on his Laptop and then
found it "made no difference" if he simulated 100 nodes locally or
distributed them on several machines.
<Receiver> can contain the equivalent of a function name that is
registered with a name service, and some of the remaining address part is
a proper domain name. <Message> can be pretty much anything allowed as an
Erlang expression. But it is never a reference, not even if the message
goes to an Erlang process in the same system process; it is always a
copy. This is essential to the way Erlang works and this is what I have
doubts about: that may work better for telecom than for many other
things and may confine Erlang to its niche.
However, this construct will call any 'method' anywhere in the visible
network of nodes: in the same process, on the same core, the sibling core
on the same CPU, a co-CPU, or a remote machine.
Such a call is always one-way and non-blocking; it does *not* return a
message from the Receiver. If the receiver wants to send something back,
it does so the same way, by sending a message. Back, one-way.
The counterpart to the sending construct is a receive block, which is
always blocking, but can easily be given an individual timeout as a
native part of the language.
With "losing readability" I referred to my wondering whether the actor
model can be emulated in Stackless, or programmed on top of it, with the
result being something as elegant and addressee-distance-unaware as in
Erlang. Or whether there is a Python package out there for this that
will work with Stackless. I still have some candidates to check out.
The mentioned "discipline" would be to make only calls that are
non-blocking and do not even expect anything back, ever. Have receiver
functions, called from remote processes, that are blocking, as you
suggested, and never return anything to the remote process. The
discipline would also demand not sharing state, which would provide for
no deadlocks. That might do; I can't think it through to the end
yet. But I can also hardly imagine a bunch of programmers sticking to
such 'voluntary' restrictions without deviating "for good reasons", of
course, here and there. So yes, a framework would have to be built and
declared mandatory, knowing full well that it is a subset of the
available possibilities, a restriction for good reason. I have no idea
how complicated and/or inelegant the use of such a framework would turn
out to be. MPI itself sure sounds like a different planet from Erlang.
If I could kindle your interest: I found the following post by Slava
Akhmechet a rewarding read, both for humor and enlightenment.
It plays through the thought of how Java could be extended in the
direction of Erlang, and why, and what for. It also stops exactly at an
insurmountable hurdle for Java, which happens to be Stackless'
strength: microthreads.
(6) - http://www.defmacro.org/ramblings/concurrency.html
This is a commendable article by Bruce Tate of IBM, which is looking at
Erlang from the Java angle, too:
(7) - http://www.ibm.com/developerworks/java/library/j-cb04186.html
Ralph Johnson explains why Erlang processes are objects, even if this
should send Joe Armstrong, Erlang's creator, kicking and screaming: (8) -
Where I currently got to is Candygram (
http://candygram.sourceforge.net/overview.html ), explicitly an Erlang
epigone, looking quite quiet since 2004. Probably suffering, too, from
the fact that it can't have many threads, nowhere near the numbers of
Erlang and Stackless. I can't tell why it couldn't run with Stackless;
you surely can?
If you have an answer to that, it would be very much appreciated.
There is a post from 2006, Bob Ippolito answering Ivan Krstić:
"Candygram is heavyweight by trade-off, not because it has to be.
Candygram could absolutely be implemented efficiently in current
Python if a Twisted-like style was used. An API that exploits Python
2.5's with blocks and enhanced iterators would make it less verbose
than a traditional twisted app and potentially easier to learn.
Stackless or greenlets could be used for an even lighter weight API,
though not as portably."
. . .
"> * Introduce microthreads, declare that Python endorses Erlang's
"> no-sharing approach to concurrency, and incorporate something like
"> candygram into the stdlib.
"We have cooperatively scheduled microthreads with ugly syntax (yield),
or more platform-specific and much less debuggable microthreads with
stackless or greenlets.
"The missing part is the async message passing API and the libraries to
go with it."
End of quote.
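The "cooperatively scheduled microthreads with ugly syntax (yield)" that Ippolito mentions can be sketched as a round-robin scheduler driving plain generators; a deliberately minimal toy, nothing like a full framework:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based microthreads: each `yield` hands
    control back to the scheduler, each StopIteration retires the task."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still alive: put it back in the queue
        except StopIteration:
            pass                # task finished, drop it

log = []

def worker(name, steps):
    for i in range(steps):
        log.append((name, i))
        yield               # cooperative switch point

scheduler([worker("a", 2), worker("b", 3)])
print(log)  # interleaved: [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

What this has over threads is determinism and near-zero cost per task; what it lacks is exactly the missing piece the quote names: an async message passing API on top, so tasks on different schedulers (or machines) can talk to each other.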
*This* is exactly what MPI4Py now *does* provide, I take it, but rather
'dangerously', as outlined, and it could be wrapped and brought into the
form of Candygram?
I will look around still more, since I am still suspecting that what I
am looking for is probably out there already.
What puzzles me is how you seem rather unfazed about these multi-core
issues. Isn't Stackless *the* place from where this should come to
CPython? Is the potential in this irrelevant for some reason I am
missing? Or for some reason uninteresting for CCP?
Best regards, and thank you for your, or any other reader's, thoughts on
this,