<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hi Richard,<br>

<br>

Coming back to the thread, I have meanwhile found the treasure trove to dive

into for my questions: <tt><a class="moz-txt-link-freetext"

 href="http://wiki.python.org/moin/ParallelProcessing">http://wiki.python.org/moin/ParallelProcessing</a></tt><br>

<br>

That's a list of efforts to nudge Python towards parallel/multicore/

etc. Did you work with any of them?&nbsp; Is there any specific or general

rule, as to which will work with Stackless? <br>

<br>

Does MPI4Py?<br>

<br>

I am not yet done looking at all of them.<br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">You can take the basic

functionality and build up your own framework around this.  Tired of

callbacks?  Make a function that wraps an asynchronous operation in a

channel and whatever calls it will just read as a synchronous call.

Of course, a programmer needs to be aware of the effect of blocking

and when blocking might happen on the code they write, but in practice

this is rarely much of a concern.

      </pre>

    </blockquote>

    <pre wrap="">Could I do this if I left single core behind ... ? To my eye that is part of the advantages you achieved with the very clear architecture decisions you opted for with EVE. The more flexible and complex ways you had referred to, might have turned out way more complex in this regard.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I don't understand what you are asking here.  The ability to provide a

function that blocks in a synchronous way wrapping asynchronous IO is

a benefit that comes with any real coroutine-like solution.  And it

can be applied as a building block in any framework you build, whether

one core/process or multiple cores/process per core.

  </pre>

</blockquote>

But in an SMP environment you run into concurrent resource access as
one effect of blocking, an issue
that you are completely isolated from when staying on one core, protected
by the guaranteed sequentiality that this yields.<br>

<br>

In that sense I had referred to "a programmer needs to be aware ... but
in practice this is rarely much of a concern": I wondered if this may get
way more complicated as soon as you have multi-core concurrency and
the need to protect resources from contention.<br>

<br>

The hiding of asynchronous operations, in its simplest incarnation in
a physically sequential environment, is not called into question. What I
wonder about are the implications of such a layout in a distributed or
multi-core environment, across
multiple blades even, at best 'transparently distributed'. <br>

<br>
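To make concrete what I mean by hiding an asynchronous operation, here is a
minimal sketch using only the standard library. The names async_fetch and
fetch are made up for illustration, and a queue.Queue stands in for a
Stackless channel. Note that this version blocks a whole OS thread, whereas
with Stackless only the calling tasklet would block, as far as I understand.<br>
<br>
<pre>
```python
import queue
import threading


def async_fetch(key, callback):
    """Hypothetical callback-style operation: replies later from a worker thread."""
    def work():
        callback(key.upper())
    threading.Thread(target=work).start()


def fetch(key, timeout=5.0):
    """Blocking wrapper: the caller just sees an ordinary function call."""
    reply = queue.Queue(maxsize=1)   # stands in for a Stackless channel
    async_fetch(key, reply.put)      # the callback delivers the result
    return reply.get(timeout=timeout)


result = fetch("spam")
print(result)
```
</pre>
The caller of fetch never sees the callback. The open question for me is
what happens to such a wrapper once callers and workers run on different cores.<br>
<br>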

That's very much what

I am looking for: the *implications* of hiding complexity (in the
extreme even 'buried' inside the language itself) and of different

syntactical approaches in different languages. And, by default I

would *not* like to see vital functions 'out of my reach', whether

hidden inside a

function or a language, see below. <br>

<br>

But eventually it should be 'easy'

to use, as much as possible, for the very reasons you were citing that

speak for Python as a language. MPI4Py then, at least unwrapped, seems

not really what I am

looking for, by itself, but probably a stepping stone.&nbsp; The blocking

tutorial samples are nice but *asynchronous*

stuff comes with strong warnings. It would have to be wrapped again to
make it robust, and maybe that is the way to go. I can't tell if that
can still end up being as 'elegant' as I'd hope.<br>

<br>

The promise with Erlang is that the
restriction of "no shared state" is going to provide for the
transparency
of messaging, which is then obviously, by design, deadlock-free. Depending on how
much you have heard about it, that may even sound hard to believe, but it's
a deceptively simple principle. Without that,
i.e. with shared state, you'll have the usual opportunities for races
and deadlocks. And yes, that can be done right, and still, in
practice, from a certain size of the system on, it can also turn into a
nightmare.<br>

<br>

And, it's a factor here that not everyone in a team of programmers may

turn out to be a genius (I won't protest if you tell me that at CCP you are

;-), so restriction by design, if well chosen, may save a hell of a lot

of time. <br>

<br>

But all this, regarding "architecture decision", is what you avoid when
you stay on one
core and basically stay protected by the underlying, ensured sequentiality
of (micro-)threads, while still writing code that is formulated in a parallel way. <br>

<br>

(1) -&nbsp; Joe Armstrong, Erlang's Rossum, on concurrency strategies (among

other things): <a class="moz-txt-link-freetext"

 href="http://www.pragprog.com/articles/erlang">http://www.pragprog.com/articles/erlang</a><br>

<br>

(2)&nbsp;

- Guido v. Rossum pro GIL (and why removing the GIL is *not* what

people

*really* want):

<a class="moz-txt-link-freetext"

 href="http://www.artima.com/forums/flat.jsp?forum=106&amp;thread=214235&amp;start=0&amp;msRange=15">http://www.artima.com/forums/flat.jsp?forum=106&amp;thread=214235&amp;start=0&amp;msRange=15</a>

<br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">Stackless has a scheduler which runs on a real thread, and

all microthreads created on that thread are run within that scheduler.

You can have multiple threads each running their own scheduler, with

their own tasklets running within them.

      </pre>

    </blockquote>

    <pre wrap="">Can channels reach out of their interpreter/scheduler? Or can a Stackless

interpreter run across multiple cores, or even blades? Are there modules or

extensions that provide for this, or for transparency in this regard?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

This is mostly on the user.  Stackless is a basic set of functionality

(scheduling of microthreads, microthread serialisation).  There are no

modules or extensions to take it further in other directions.

However, if you can ensure your newly launched thread goes on the core

you want, then the interpreter can be considered to run across

multiple cores.  This is a Python problem, not a Stackless one.

  </pre>

</blockquote>

As I said, I am still going through the presented approaches at <tt><a

 class="moz-txt-link-freetext"

 href="http://wiki.python.org/moin/ParallelProcessing">http://wiki.python.org/moin/ParallelProcessing</a><br>

<br>

</tt>Is there a rule of thumb, or a list, of what modules and libraries

run with Stackless? <br>

<br>

I am still hoping to find the stackless-compatible concurrency

support I am looking for. But otherwise, would Stackless then stay

close, and 'on top' of the main Python branch, which in turn will

likely

not implement multi-threading as that would be obstructed by the

GIL-philosophy? (see (2) above)<br>

<br>

Even if Stackless is not originally about multi-core or distributed

processing: just as it is not a *language* issue that CPython has

the GIL, but an implementation issue of CPython (as discussed at (2))-&nbsp;

would not the Stackless syntax be just what one wanted

to use multi-cores and distribute calculations to multiple computers?

Potentially extended (or reduced!) to deal with shared resources? It

just seems to lend itself to that exceptionally well and would not have
to give up at the last hurdle, as Java does (see (6) at the bottom). The
last hurdle being microthreads, which make Erlang and Stackless seem
very close. <br>

<br>

Would not even EVE have to expect that, in the future, blades will
become faster at a much
slower pace, measured per core, and offer more cores instead
as today's proposition of speed improvement? Growth by hardware should
get harder to realize while staying on one core. But maybe you fork off
different stuff to keep cores busy in a different way.<br>

<br>

Erlang got multi-threaded only quite recently, in 2007<br>

(3) - <a class="moz-txt-link-freetext"

 href="http://www.ericsson.com/technology/opensource/erlang/news/archive/erlang_goes_multi_core.shtml">http://www.ericsson.com/technology/opensource/erlang/news/archive/erlang_goes_multi_core.shtml</a>

<br>

As would be expected, there were no language changes; only the VM was adapted,
which people at Ericsson were rightfully proud of. I imagine the
Erlang hype of 2007/8 was fired up by this fact. I had initially
thought Stackless was just as destined for that feat.<br>

<br>

This may neatly clarify similarities and differences between Stackless

and Erlang (Joe Armstrong, quote from (3)): <br>

<br>

"The

Erlang VM is written in C and run as one process on the host operating

system (OS). Within the Erlang VM an internal scheduler is responsible

for running the Erlang processes (which can be many thousands). In the

SMP version of the Erlang VM, there can be many such schedulers running

in separate OS threads. As default there will be as many schedulers as

there are processors or processor cores on the system.<br>

<br>

"The SMP support is totally transparent for the Erlang programs. That

is, there is no need to change or recompile existing programs. Programs

with built-in assumptions about sequential execution must be rewritten

in order to take advantage of the SMP support, however."<br>

<br>

That this worked was because of the way that Erlang had focused on

making distributed computations possible: again, the paradigm of no

shared state. As this is inherent in Erlang, Erlang could transparently

be made to use multi-cores. <br>

<br>

Even if Stackless cannot follow that leap (sic, p), my impression was

that it may be the natural starting point for Python to get there, if

probably with

syntactic modification needed. It's coming from a different approach of

(not) dealing with state in concurrency but seems as microprocess

centered by design as Erlang. <br>

<br>

Maybe I just
haven't found the project that is doing this yet. Or there is a
fundamental problem that still eludes me (I know that Erlang followers
would
immediately second that. But 'shared state' is not universally thought
of as a bad thing). <br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <pre wrap="">Running a Stackless interpreter across multiple blades makes no sense

- as such Erlang wouldn't be able to do it either.  A program runs on

one machine, not several, is what I am saying.  And the Python

interpreter is a program.

  </pre>

</blockquote>

Just wanted to make sure that I am not missing a point that was too far

out for me to imagine.<br>

<br>

Erlang is somewhat transparent across multiple machines, though, as
if channels worked exactly the same way across machines as locally.

<br>

<br>

But sure, there'd be one Erlang VM running per machine.<br>

<br>

Maybe that's the Gordian knot: thinking of a
single multi-computer VM.<br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <pre wrap="">I should note that while at CCP, I wrote part of a framework that ran

an agent on each machine involved.  There was a master program and it

would communicate with each running agent telling it to start

sub-applications to farm off work to.  All programs, whether agents,

master and sub-applications were specialisations of the CCP Stackless

Python based application.  There was no pickling involved, however.

Unless I am mistaken, this sort of arbitrary ability to start up

instances of the interpreter on involved machines is as close as you

would be able to get to "or even blades", no matter the language (and

framework) used.

  </pre>

</blockquote>

Yes. <br>

<br>

Plus, what you did with no pickling is probably close to the Erlang
philosophy (if not literally, because you can send all sorts of things
with
an Erlang message): if you didn't pickle, you probably also did not
expect state sent back as immediate answers, except for basic 'ok's.
Which is close to Erlang's 'return-less' (Actor model) messages.&nbsp; <br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">That pickling works even across diverse OSses is an exciting feature (2).

And I am still working to get my head around what happens to state when

sending tasklets over to another box (3). It doesn't look quite trivial.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I don't have a clear picture either.  But it should be something that

someone who intends to use this functionality should be able to easily

get a handle on with a little experimentation.  In my book, it is

better to have a choice in how this works (as you would with

Stackless), than to have an inflexible predetermined solution forced

upon you (as I think you get with Erlang, but may be wrong).

  </pre>

</blockquote>

Same here. That it is *not* an integral part, thus potentially less

removed, can be considered a

*plus*, as it would be accessible, changeable, fixable and tuneable. <br>

<br>

On the other hand, Erlang's track record is impressive enough to
infer that so much work went into solving exactly these problems
that it should not be passed up, and that that wheel need not be
re-invented. Where it is *not* a language issue but one of implementation,
I trust that they found over the years what caches make sense where
and what unintuitive modifications bring extra speed and/or stability.

<br>

<br>

However, you'd find out the contrary only with much pain. At worst

after

the system is ready for prime time and suddenly starts to sputter. I

spoke with Thorsten Sch&uuml;tt of Scalaris and he confirmed in a way that

'soft real time', as the Erlang claim goes, does not mean 'real time'.<br>

<br>

But the real difference is in the concurrency philosophy of Erlang that

simply prevents the worst sort of problems from happening in the first

place. That productivity argument should be easily acceptable in the

Python world.<br>

<br>

Specifically speaking of pickled tasklets though, Erlang does not send

processes around. <br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <pre wrap="">Regarding Pyro.  Its webpage says "you just call a method on a remote

object as if it were a local object".  For this to be a true

statement, it must block the current thread while the call takes

place.  This would be incompatible with Stackless, as the tasklet

making the call would block the scheduler preventing any other tasklet

from running.</pre>

</blockquote>

<pre wrap="">Thanks for looking into that. I found it hard to find any mention of Pyro and Stackless together, which only supports your conclusion.

Reminds me of the raison d'etre for StacklessIO. Could it yield equal rewards? You had mentioned a drop-in you wrote with the same functionality as StacklessIO. How difficult could it be to shift the blocking to the level of the tasklet, away from the system thread, with Pyro?

Shouldn't that be rather painless, given that Pyro is native Python (<a class="moz-txt-link-freetext" href="http://pyro.sourceforge.net/manual/1-intro.html">http://pyro.sourceforge.net/manual/1-intro.html</a>)?

Or am I missing something there?

</pre>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <pre wrap="">Writing an RPC mechanism using Stackless is straightforward, if you

are familiar with networking and Stackless.  Here is one I have

written:

<tt><a class="moz-txt-link-freetext"

 href="http://code.google.com/p/stacklessexamples/source/browse/#svn/trunk/examples/networking/stacklessrpc">http://code.google.com/p/stacklessexamples/source/browse/#svn/trunk/examples/networking/stacklessrpc</a></tt>

It is possible to write a simpler version of course, my one being a

little abstract.

  </pre>

  <blockquote type="cite">

    <pre wrap="">But is pickling fast enough to do more interactive stuff than load balancing

(e.g. loading complete solar systems off to a different blade that has

better hardware or because the current blade had more than one solar system

mounted). Is it fast enough to completely distribute entities?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I have no experience with this, so cannot say one way or the other.

  </pre>

</blockquote>

I have meanwhile read some about MPI4Py

(<a class="moz-txt-link-freetext"

 href="http://pypi.python.org/pypi/mpi4py">http://pypi.python.org/pypi/mpi4py</a>)

and what they do is direct access

down to the memory blocks where the Python objects lie, as binary, to

prevent the overhead of pickling and marshaling. Unless endian

conversion is needed, and unless the underlying MPI lib should add some

overhead I am not aware of, that should be as fast as it can get.<br>

<br>

And again, doing this with asynchronous calls gets a big warning in the
manual, because the programmer must see to it that the state of the memory
block being sent is never changed while the send is pending, as on a
deeper level there may obviously be partial sending, or re-sending,
going on. (See the manual in the installation
package, docs/mpi4py.pdf, pg. 4. I did not find it online.)<br>

<br>

MPI4Py does that to allow for better performance and for sending even
of "objects bigger than half the available memory" (:-). <br>

<br>

Erlang is "different" here again and, as I understand it,
exactly for this very occasion. Variables in Erlang cannot change

state. Once a variable is bound to a value it cannot be assigned a new

value, ever. That should in fact allow for peace of mind when using the

internal, binary buffer of a variable itself to send a 'copy' of state

over the

network or to another process. And again, it's not only about what is

possible but what improves productivity. These restrictions are for

robustness of concurrency what Java's

elimination of pointers was for robustness of memory management (I

know, there were others before Java and before Erlang, respectively,

and there was everything already there in LISP a century ago).<br>

<br>

This is the 'discipline' question again. Certainly, "no shared state"
and "no re-binds" can be *emulated*
in any language that can share state and can change assigned state. But
then, the problem
is not the theory. It is that actively preventing this from happening, as
Erlang does, may save a crucial lot of nerves and time and make more
complex things possible,
because it prevents the 0.001% of inadvertent
stupid
things from ruining Five Nines of good.<br>

<br>

In the concrete case of MPI4Py and asynchronous sends I am not sure why the
immutable concept in
Python does not pre-empt this problem. Is that because MPI4Py works on
memory level even with complex data constructs, which *are* mutable?<br>

<br>
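To illustrate my suspicion, here is a small standard-library sketch. It makes
no MPI call at all; the memoryview merely stands in for a raw buffer handed to
a pending asynchronous send, which is my assumption about what happens
underneath. It shows that an immutable tuple does not protect mutable
contents, and that a zero-copy view sees later changes while a snapshot does
not.<br>
<br>
<pre>
```python
# An immutable tuple can still hold mutable objects.
record = ("position", [1, 2, 3])
record[1].append(4)          # allowed: the list inside is mutable

# A mutable buffer, as a stand-in for the memory block being sent.
buf = bytearray(b"state-1")

# Zero-copy view, similar in spirit to a buffer handed to a pending send
# (assumption: no real MPI call is made here).
pending_view = memoryview(buf)

# Defensive snapshot taken before the "send" is started.
snapshot = bytes(buf)

# The program carries on and changes the buffer while the send is pending.
buf[-1] = ord("2")

seen_by_view = pending_view.tobytes()
print(record)
print(seen_by_view)
print(snapshot)
```
</pre>
The snapshot costs a copy, which is exactly the overhead the direct memory
access is meant to avoid, so this is a trade-off rather than a fix.<br>
<br>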

However, immutability in Python seems to come from the same corner
as
that in Erlang: improved clarity, fewer chances for errors. I can't tell,
though.<br>

<blockquote

 cite="mid:952d92df0907032357w71bad16j537be4c5bb104fd1@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">As this is what I can't yet fathom about Erlang, how it's paradigma of not

sharing state may work well for telecom but not for games. Since that virtue

is achieved by taking the liberty from the programmer, it could be

replicated by discipline in other languages. But the language inherent

features of Erlang would have to be coded in Python, most everytime that

they would come into play, making the source more complicated, losing

readability.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I don't believe I agree with this.  Maybe you are thinking of some

Erlang features I am not familiar with.

  </pre>

</blockquote>

It's along the lines that Stackless introduced a whole new concept,

making possible a whole different way to formulate solutions. But very

much by changes 'under the hood'. Will it be possible to bring Erlang's

model to Python without loss of style?<br>

<br>

I shall bother you with details, if you care, in a nutshell.<br>

<br>

In Erlang, a message is sent like this: Receiver ! Message.<br>

<br>

And the receiver can be anywhere.<br>

<br>

To some extent this sending syntax is the equivalent of a method call

in OO. It is the way Erlang "processes" communicate. And processes are

as

ubiquitous in Erlang as objects in Python. In fact, they can be
regarded as objects in some sense (see (8) below, really 'actors') and

as such

&lt;Receiver ! Message&gt; is as common as, and some say, really *is*,

a method call in Erlang. Again, some will agree, some won't. <br>

<br>

Python is about making things elegant and simple to achieve readability
and in the end productivity. It goes further, as Torvalds lectured the
Google crowd recently: "if you can do something really fast and really
well, people start using it differently", which holds for performance
and elegance
and simplicity, too, I'd say. This is why the abstraction and
transparency of, in
essence, any "method call" (message sending) in Erlang may make a
difference: *it's in the same form*, no matter how far it travels, to a
local or non-local receiver. Like, say, with UDP, there is no (explicit)
handshake as I understand it, with the same gains and drawbacks. As
Thorsten of Scalaris described, he started out on his laptop and
then found it "made no difference" if he simulated 100 nodes locally or
distributed them on several machines.<br>

<br>

&lt;Receiver&gt; can contain the equivalent of a function name that
is registered to a name service, and some of the remaining address part
is a proper domain name. &lt;Message&gt; can be pretty much anything
allowed as an Erlang expression. But it is never a reference; even if
the message goes to an Erlang process in the same system process, it is
always a copy. This is essential to the way Erlang works and this is
what I have doubts about: that may work better for telecom than for
many other things and may confine Erlang to its niche.<br>

<br>
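In Python terms, "always a copy" could be emulated roughly as follows. This
is only a sketch under my own assumptions: the Mailbox class is made up for
illustration, and copy.deepcopy stands in for whatever Erlang does internally
when a message is sent.<br>
<br>
<pre>
```python
import copy
import queue


class Mailbox:
    """Hypothetical mailbox; every message is deep-copied on send."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        # One-way and non-blocking: nothing is returned to the sender.
        self._queue.put(copy.deepcopy(message))

    def receive(self, timeout=None):
        return self._queue.get(timeout=timeout)


box = Mailbox()
order = {"ship": "Rifter", "cargo": ["ore"]}
box.send(order)
order["cargo"].append("ice")      # sender mutates its own data afterwards
received = box.receive(timeout=1.0)
print(received)
```
</pre>
The receiver still sees only "ore", because it never held a reference to the
sender's data. The price is a full copy per message, which is the part I have
doubts about outside of telecom.<br>
<br>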

However, this construct will call any 'method' anywhere in the visible

network of nodes, in the same process, the same core, the sibling core

on the same CPU, a co-CPU, or a remote machine. <br>

<br>

Such a call is always one-way and non-blocking; it does *not* return a
return message from the Receiver. If the receiver wants to send
something back, it would be the same way, sending a message. Back,
one-way.<br>

<br>

The counterpart to the sending construct is a receive block, which is
always blocking, but can easily be given an individual timeout as a
native part of the language.<br>

<br>
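A rough Python rendering of this pairing, again only a sketch: OS threads
and queue.Queue stand in for Erlang processes and their mailboxes, and the
names receive and echo_actor are made up for illustration.<br>
<br>
<pre>
```python
import queue
import threading


def receive(mailbox, timeout):
    """Blocking receive with a timeout; returns None if nothing arrives."""
    try:
        return mailbox.get(timeout=timeout)
    except queue.Empty:
        return None


def echo_actor(inbox):
    """Hypothetical actor: answers each request with a message of its own."""
    while True:
        message = receive(inbox, timeout=1.0)
        if message is None or message[0] == "stop":
            break
        reply_to, text = message[1], message[2]
        reply_to.put(("echo", text))     # the answer is just another one-way send


actor_inbox = queue.Queue()
my_inbox = queue.Queue()
worker = threading.Thread(target=echo_actor, args=(actor_inbox,))
worker.start()

actor_inbox.put(("say", my_inbox, "hello"))   # one-way, returns nothing
answer = receive(my_inbox, timeout=1.0)
actor_inbox.put(("stop",))
worker.join()
silence = receive(my_inbox, timeout=0.1)      # nothing more arrives: timeout
print(answer, silence)
```
</pre>
The sender passes its own inbox along so that the receiver knows where to
send its answer, which mirrors the "back, one-way" idea above.<br>
<br>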

With "losing readability" I referred to my wondering whether the actor
model can be emulated in Stackless, or programmed on top of it, with
the result being something as elegant and addressee-distance-unaware as in
Erlang. Or whether there is a Python package out there for this that will
work with Stackless. I still have some candidates to check out.<br>

<br>

The mentioned "discipline" would be to make only calls that are
non-blocking and do not even expect anything back, ever. Have
receiver functions to be called from remote processes that are
blocking, as you suggested, and never return anything to the remote
process. The discipline would also demand not sharing state, which
would provide for no deadlocks. That might do, though I can't think it
through to the end yet. But I can also hardly imagine a bunch of
programmers sticking to such 'voluntary' restrictions without deviating,
"for good reasons" of course, here and there. So, yes, a framework
would
have to be built and declared mandatory, knowing full well that it is a
subset of available possibilities, a restriction for good reason. I
have no idea how complicated and/or inelegant the use of such a
framework would turn out to be. MPI itself sure sounds like a different
planet than Erlang.<br>

<br>

If I could kindle your interest, I found the following post by Slava

Akhmechet a rewarding read, both for humor and enlightenment.<br>

<br>

It plays

through the thought of how Java could be extended in the direction of

Erlang and why and what for. It also stops exactly at an unsurmountable

hurdle for Java, which happens to be Stackless' specialty:

microprocesses.&nbsp; <br>

(6) - <a class="moz-txt-link-freetext"

 href="http://www.defmacro.org/ramblings/concurrency.html">http://www.defmacro.org/ramblings/concurrency.html</a><br>

<br>

This is a commendable article by Bruce Tate of IBM, which is looking at

Erlang from the Java angle, too: <br>

(7) - <a class="moz-txt-link-freetext"

 href="http://www.ibm.com/developerworks/java/library/j-cb04186.html">http://www.ibm.com/developerworks/java/library/j-cb04186.html</a><br>

<br>

Ralph Johnson explains why Erlang processes are objects, even if this

should send Joe Armstrong, Erlang's creator, kicking and screaming: (8)

- <a class="moz-txt-link-freetext"

 href="http://www.cincomsmalltalk.com/userblogs/ralph/blogView?entry=3364027251">http://www.cincomsmalltalk.com/userblogs/ralph/blogView?entry=3364027251</a><br>

<br>

---<br>

<br>

Where I currently got to is Candygram (

<a class="moz-txt-link-freetext"

 href="http://candygram.sourceforge.net/overview.html">http://candygram.sourceforge.net/overview.html</a>

), explicitly an Erlang
epigone, looking quite quiet since 2004. It probably suffers, too, from the
fact that it can't have many threads, not near the numbers of
Erlang and Stackless. I can't tell why it couldn't run with Stackless;
surely you can?<br>

<br>

If you have an answer to that, it would be very much appreciated.<br>

<br>

There is a post from 2006 on

<a class="moz-txt-link-freetext"

 href="http://mail.python.org/pipermail/python-3000/2006-September/003718.html">http://mail.python.org/pipermail/python-3000/2006-September/003718.html</a>

, Bob Ippolito answering Ivan Krsti&#263;:<br>

<br>

"Candygram is heavyweight by trade-off, not because it has to be.

Candygram could absolutely be implemented efficiently in current<br>

Python if a Twisted-like style was used. An API that exploits Python

2.5's with blocks and enhanced iterators would make it less verbose<br>

than a traditional twisted app and potentially easier to learn.

Stackless or greenlets could be used for an even lighter weight API,<br>

though not as portably."<br>

. . . <br>

<br>

"&gt; * Introduce microthreads, declare that Python endorses Erlang's

no-sharing approach to concurrency, and incorporate something like<br>

"&gt; candygram into the stdlib.<br>

<br>

"We have cooperatively scheduled microthreads with ugly syntax (yield),

or more platform-specific and much less debuggable microthreads with

stackless or greenlets.<br>

<br>

"The missing part is the async message passing API and the libraries to

go with it."<br>

<br>

End of quote.<br>

<br>

*This* is exactly what MPI4Py now *does* provide, I take it, but rather
'dangerously', as
outlined. Could that be wrapped and brought into the form of
Candygram?<br>

<br>

I will look around still more, since I am still suspecting that what I

am looking for is probably out there already.<br>

<br>

What puzzles me is how you seem rather unfazed by these
multi-core issues. Isn't Stackless *the* place from where this should
come to CPython? Is the potential in this irrelevant for some reason I
am missing out on? Or for some reason uninteresting for CCP?<br>

<br>

Best regards and thank you for your, or any other reader's, thoughts on
this,<br>

Henning<br>

</body>

</html>