[Stackless] That darn GIL rant again... (was: Re: Server Components)
Jeff Senn
senn at maya.com
Fri Mar 28 15:20:53 CET 2008
On Mar 28, 2008, at 9:18 AM, Simon Pickles wrote:
> Midway thru my server dev, I am seeing the light... well, a light
> anyway.
>
> It's a Stackless Python server and not too bad, running on Ubuntu
> Linux.
> Stackless Python is great, but single threaded (I DO do my DB lookups
> and network comms in other threads).
>
> I figure it's GOT to be more concurrent, if I'm looking to use future
> architecture.
> ...
> SO - Am I reinventing the wheel? Does Twisted do this, or other
> frameworks? In my limited experience, I need to create a server hub,
> then have all the component modules connect to that as clients.
>
> Is there a better way?
Well... here's a post I'm on the verge of making every time
someone posts about how they are going to make a great distributed
scheduling system by just managing to "distribute" the computation
of Stackless around a little bit... [Note: I hope this doesn't wind
up having a pessimistic/downer tone -- there is actually a
(rather difficult) suggestion below...]
Simon is really asking the right question here: has it been done before?
The answer is yes. Quite a few times. Perhaps most "spectacularly"
with architectures like CORBA (and DCOM and ...). (Note that
"spectacular" probably doesn't mean what you think it does. Look
it up!) :-)
What winds up happening (and one of the reasons these
architectures get abandoned by people who are
actually interested in performance, and replaced
with things like Twisted or Apache with dynamically
loaded modules, etc.) is that you wind up spending
all of your computing resources on "hidden costs", that is:
serializing/de-serializing messages, the memory for all of those
copies, network latency, and really inefficient
(or unpredictable) partitioning of task scheduling (if not deadlocks).
Now, I'm not saying that one can't do better - if your goal is to
use CPUs that *don't* have shared memory, you have little choice: go
for it! In fact the nature of Python may give you some advantages
(e.g. dynamic task serialization) that previous architectures lacked!
But just be aware of the dragons livin' in them-thar caves... :-)
My interest in Stackless is adjacent: the main pragmatic
performance advantage of Stackless over using
things like distributed object models to slice up computation is that
the "messages" in Stackless are objects resident in memory and do not
need to be serialized or copied (i.e. they are passed as references).
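To make the difference concrete, here is a minimal sketch in plain CPython. It assumes queue.Queue as a stand-in for a Stackless channel (Stackless itself is not imported here), and the message contents are made up for illustration. The in-process hand-off delivers the very same object; the "distributed" hand-off has to build bytes and then a second copy.

```python
import pickle
import queue

# Hypothetical message, invented for this illustration.
message = {"player": "simon", "position": [1.0, 2.0, 3.0]}

# In-process hand-off: the queue stores a reference, nothing is copied.
# (queue.Queue is only a stand-in for a Stackless channel.)
inbox = queue.Queue()
inbox.put(message)
received_local = inbox.get()

# Cross-process style hand-off: serialize to bytes, then rebuild a copy.
wire_bytes = pickle.dumps(message)
received_remote = pickle.loads(wire_bytes)

print("local hand-off is the same object: ", received_local is message)
print("remote hand-off is the same object:", received_remote is message)
print("remote copy compares equal:        ", received_remote == message)
print("bytes that had to be produced:     ", len(wire_bytes))
```

Every remote message pays for the pickling, the bytes, and the rebuilt copy; the local one pays for none of them.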
Stackless is fast... very fast... but it is that way because it
relies on a BIG COMPROMISE that is central to how CPython works:
the assumption that no one else is mucking with your memory
out of sight of your interpreter (including other threads!!)
(This is known as the "GIL Issue"). Getting rid of the GIL (besides
breaking LOTS of code) makes Python and Stackless slower...
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
So we are sort of stuck between the rock and the hard place...
Choose one: fast Python or truly multi-threaded Python.
About the only space between rock and hard-place I see is:
1) notice that most (well-designed) systems that
do asynchronous messaging (in practice) send far less data
between "processes" than they handle locally.
2) realize that declaring beforehand which data is local
and which is "possibly remote" is not very "Pythonic". And
probably difficult given the "reference heavy" nature
of Python objects.
3) Rework the Python interpreter so that it dynamically acquires
"per-object" locks as it notices an object reference
crossing a "thread boundary". (Notice that descending into the tree
of references is necessary for containers!)
This way "local objects" (ones with a reference only within
a single "thread") can be accessed with very little overhead, while
"shared objects" slow you down (necessarily for safety).
Objects can get upgraded to "shared", but not generally
downgraded again (until they are copied).
This would probably be an intractable amount of work with the current
Python implementation... and would certainly cost memory (but possibly could
meet Guido's single-thread-must-be-as-fast stipulation).
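For what it's worth, idea 3 can be mimicked as a toy at the Python level. This is only a sketch under loud assumptions: the Tracked class and its share/get/set methods are hypothetical names invented here, the "thread boundary" is modelled as an explicit share() call by the owner before handing the reference over, and nothing here resembles how a real interpreter would do it.

```python
import threading


def _tracked_children(value):
    """Yield Tracked objects held directly inside a container value."""
    if isinstance(value, dict):
        items = value.values()
    elif isinstance(value, (list, tuple, set)):
        items = value
    else:
        items = ()
    for item in items:
        if isinstance(item, Tracked):
            yield item


class Tracked:
    """Hypothetical wrapper: no lock while local, locked once shared."""

    def __init__(self, value):
        self._value = value
        self._lock = None  # None means "still local to one thread"

    @property
    def shared(self):
        return self._lock is not None

    def share(self):
        """One-way upgrade, done by the owner before the hand-off."""
        if self._lock is not None:
            return  # already shared; this also stops recursion on cycles
        self._lock = threading.Lock()
        # Descend into containers: their contents become reachable too.
        for child in _tracked_children(self._value):
            child.share()

    def get(self):
        if self._lock is None:
            return self._value  # fast path for local objects
        with self._lock:
            return self._value

    def set(self, value):
        if self._lock is None:
            self._value = value  # fast path for local objects
            return
        # Anything stored into a shared object becomes shared as well.
        if isinstance(value, Tracked):
            value.share()
        for child in _tracked_children(value):
            child.share()
        with self._lock:
            self._value = value


# Usage: 'local' never leaves this thread; 'box' and its contents do.
local = Tracked(1)
leaf = Tracked("payload")
box = Tracked([leaf])

results = []

def worker(obj, out):
    out.append(obj.get()[0].get())

box.share()  # the reference is about to cross a thread boundary
t = threading.Thread(target=worker, args=(box, results))
t.start()
t.join()

print(local.shared, box.shared, leaf.shared, results)
```

Note what the toy leaves out: get() hands back the raw container, so mutating it in place escapes the lock, and there is no downgrade path. Doing this transparently for every object is exactly the hard part.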
So... some of us await what PyPy has to say about all of this...
(Yes - essentially I want a Stackless Python "operating system"
with a shared, flat copy-on-write-between-tasks memory subsystem!)
-Jas