[Stackless] Multi core/cpu?

Santiago Gala sgala at apache.org
Fri Oct 19 16:31:06 CEST 2007

On Thu, 18-10-2007 at 23:17 -0700, Allen Fowler wrote:
> Richard Tew <richard.m.tew at gmail.com> wrote:
>         On 10/18/07, Allen Fowler wrote:
>         > Is there any (near-term) hope to fix the common case of
>         > trying to easily get more than one CPU core in a single
>         > physical machine to process tasklets?
>         No. Not by anyone I can name on this list. But if you want it,
>         I encourage you to give it a shot. Most of the things people
>         post about wanting to this mailing list, they could probably
>         write themselves if they set themselves to it rather than to
>         posting.
> I suppose you are correct. However, as a Python newbie, if I knew how
> to do this, I probably would not have posted here in the first place.
> Frankly, I am not sure I know what a "thread" really is.
> I get the idea that a "process" has its own slab of memory space
> into which program code (and variables) are loaded.

Basically true, though "text" memory (read-only, typically executable
and library code but sometimes also some data) is usually shared
between processes to save memory. Hence the name "shared" libraries.

> As I understand it, the CPU then runs through the slab of memory
> executing instructions (jmp-ing around inside as the code dictates).

Again, basically yes.

> Now, threads supposedly "share a memory space".  In my simplistic view
> of things, I don't see how that could work.
> I mean variables would be changing under-foot at all times as other
> threads do their thing. (?!)
> How could you even make something simple as a c-style for loop?  The
> index variable would be clobbered by the other thread running the same
> code.

Well, there is a piece you have not considered: the stack. Threads
share memory, except for the processor registers and the stack: each
thread gets its own. Local variables are allocated on the stack in most
if not all languages, which means the "automatic" (non-extern/static) C
variables are allocated separately for each thread. Any introductory
book on languages or compilers should explain how stack allocation of
local variables works.
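A minimal Python sketch of the point above, using the standard `threading` module: four threads run the same loop, but the loop index `i` and the running sum are locals of each call, so every thread has its own copy and nothing gets clobbered. (CPython keeps Python-level locals in per-call frames rather than directly on the C stack, but the effect is the same: per-thread, per-call storage.) Only the module-level `total` is really shared, which is why updates to it go through a lock. The names `worker`, `total` and `results` are made up for illustration.

```python
import threading

total = 0                      # module-level: shared by every thread
total_lock = threading.Lock()  # guards read-modify-write updates to `total`
results = {}                   # one entry per thread, keyed by thread name

def worker(name, n):
    global total
    local_sum = 0
    # `i` and `local_sum` live in this call's own frame, so each thread
    # gets a private copy even though all threads run the same code.
    for i in range(n):
        local_sum += i
    results[name] = local_sum
    # `total += ...` is a read-modify-write on shared data: serialize it.
    with total_lock:
        total += local_sum

threads = [threading.Thread(target=worker, args=(f"t{k}", 1000)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(total)
```

Each thread ends up with the same private sum, and the locked shared counter receives all four contributions.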

Thread switching is way cheaper than process switching, as the OS needs
only to save/restore registers and the stack pointer (and the stack
needs to get into cache, etc.). In a process switch, typically all the
cache is stale and needs to be refreshed from the new process memory.

> But, obviously, it is do-able.  I just don't see how.
>         Put a Stackless Python interpreter on each core. Make them aware
>         of each other. Then pickle the tasklets you want to move to
>         other cores, and send them there. Do the same across sockets to
>         other cores on other computers on the network. There are some
>         technical issues involved in which tasklets can be pickled in
>         this way, but finding out what they are is half the fun.
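Stock CPython cannot pickle a running generator or thread; being able to pickle tasklets is something Stackless adds. As a rough stand-in for the "pickle it and send it to another core" idea, the sketch below keeps a task's progress in an ordinary object, runs a few steps, serializes it with `pickle`, and resumes it from the bytes, as an interpreter on another core would after receiving them over a pipe or socket. The `CountdownTask` class and its `step` method are hypothetical; this is not the Stackless tasklet-pickling API.

```python
import pickle

class CountdownTask:
    """Hypothetical resumable task whose whole state is plain, picklable data."""

    def __init__(self, start):
        self.remaining = start
        self.log = []

    def step(self):
        """Do one unit of work; return False once there is nothing left to do."""
        if self.remaining <= 0:
            return False
        self.log.append(self.remaining)
        self.remaining -= 1
        return True

# "Core A": run part of the task, then freeze it.
task = CountdownTask(5)
for _ in range(2):
    task.step()
payload = pickle.dumps(task)  # bytes that could be sent over a pipe or socket

# "Core B": thaw the task and let it run to completion.
moved = pickle.loads(payload)
while moved.step():
    pass

print(moved.log)
```

The receiving side must be able to import the same class definition, which is one of the practical issues hinted at above: only state the other interpreter can reconstruct can be moved.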
> IPC is complex. IPC is slow. IPC is hard.
> IPC (and concurrency) has already been figured out by many smart
> people. (MPI,Spread,etc.)
> IPC seems like an overkill for my laptop's dual-core CPU.
> I really don't want to fork/spawn/whatever a 2nd/3rd/4th 100MB+ Python
> process.

Take a look at the Wide Finder code from Fredrik Lundh
( http://effbot.org/zone/wide-finder.htm ), starting with my code
( http://memojo.com/~sgala/blog/2007/09/29/Python-Erlang-Map-Reduce ),
and you'll see how spawning extra processes can pay off for a lot of
programs. The timing goes from 1.9 seconds (his optimized serial code,
1 GB input) to 0.9 seconds on a two-core machine.
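The general shape of that approach can be sketched with the standard `multiprocessing` module (which arrived later, in Python 2.6): split the log lines into one chunk per core, count matches in each chunk in a separate process (map), then merge the partial counts (reduce). This is not the code from either linked post; the regex is modeled on the Wide Finder request pattern and, like the helper names `map_chunk`, `reduce_counts`, `split` and `count_parallel`, should be treated as an assumption.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Assumed log pattern, modeled on the Wide Finder benchmark: capture the
# article path from requests like "GET /ongoing/When/200x/2007/09/29/Foo ".
PATTERN = re.compile(r"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")

def map_chunk(lines):
    """Map step: count article hits in one chunk of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def reduce_counts(partials):
    """Reduce step: merge per-chunk counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing
    return total

def split(lines, n):
    """Cut the input into at most n contiguous chunks of similar size."""
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def count_parallel(lines, workers=2):
    """Run map_chunk in `workers` processes, then reduce in the parent."""
    with Pool(workers) as pool:
        return reduce_counts(pool.map(map_chunk, split(lines, workers)))
```

Call `count_parallel` only from under an `if __name__ == "__main__":` guard: on platforms that start workers by spawning a fresh interpreter rather than forking, each worker re-imports the main module. For large inputs it is usually cheaper to have each worker read its own byte range of the file than to pickle lines across to it.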

Again, a lot of effort nowadays goes not into optimizing "small"
problems, but into making them scale: things like "use 1000 machines
to count hits per URL across 20000 log files totaling 3 TB".

>         > It would be a huge boon to me, and i'm sure many others.
>         Then I hope you will find value in giving it a shot.
> I'd like to, however I'd first need to understand how plain-old
> threads work.
> Then I'd need to understand how CPython works.
> Then, I'd need to comprehend what exactly stackless is doing inside
> CPython.

In very broad terms, it stops using the OS stack for the interpreter's
internals, which makes it possible to have "microthreads" that can be
switched very quickly. (Please correct my overgeneralization.)
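To give a feel for cheap, cooperatively switched microthreads, here is a sketch in plain Python that uses generators as stand-ins for tasklets: each `yield` is a voluntary switch point, and a tiny round-robin scheduler resumes them in turn. This is only an analogy. Stackless tasklets are created with `stackless.tasklet` and switch via `stackless.schedule()`; unlike generators, they can switch from inside ordinary nested function calls and can be pickled. The `ticker` and `run_round_robin` names are made up.

```python
from collections import deque

def ticker(name, count, trace):
    """A toy microthread: record one step, then voluntarily give up control."""
    for i in range(count):
        trace.append(f"{name}{i}")
        yield  # switch point: hand control back to the scheduler

def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)      # run this task up to its next yield
        except StopIteration:
            continue        # finished: drop it from the queue
        queue.append(task)  # still alive: send it to the back of the line

trace = []
run_round_robin([ticker("a", 3, trace), ticker("b", 2, trace)])
print(trace)
```

A "switch" here is just a Python-level function return and resume, with no OS involvement, which is why this style of scheduling is so much cheaper than switching OS threads.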

> All this being hampered by a near-zero knowledge of C.
> And at the end, I have no guarantee that any of this is actually
> possible to do in a performance-sane way.

I agree it is difficult, but at least you don't have all the
preconceptions people acquire after learning "The Way" 20 years ago.


> _______________________________________________
> Stackless mailing list
> Stackless at stackless.com
> http://www.stackless.com/mailman/listinfo/stackless
