[Stackless] stackless python in a multicore environment

Chris Lee c.j.lee at tnw.utwente.nl
Mon Sep 24 11:28:10 CEST 2007


Well, yes and no. pp doesn't really use threads; rather, it starts an 
entirely new Python instance and runs the job in that environment (thus 
avoiding the GIL). pp also does all the load balancing, so each worker 
gets an appropriate amount of work.
Data is synchronized using the pickle module.
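
To make the "new Python instance plus pickle" idea concrete, here is a 
minimal sketch using only the standard library. It is not pp's actual 
implementation, only an illustration under that assumption; the names 
CHILD_SOURCE and run_in_new_interpreter are made up for this example.

```python
import pickle
import subprocess
import sys

# Source executed by the child interpreter (hypothetical example job):
# read pickled input from stdin, write the pickled result to stdout.
CHILD_SOURCE = """
import pickle, sys
numbers = pickle.load(sys.stdin.buffer)
pickle.dump(sum(n * n for n in numbers), sys.stdout.buffer)
"""


def run_in_new_interpreter(numbers):
    """Run the job in a separate Python process, so it has its own GIL."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=pickle.dumps(numbers),   # arguments travel as pickled bytes
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(proc.stdout)   # result comes back pickled as well
```

Anything passed in or out this way has to be picklable, which is the 
same constraint that applies to job arguments and results.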

job = job_server.submit(.....)

the_result = job()  # this blocks until the job returns

In my simulations, I submit between 6 (a debug test) and 550 jobs 
simultaneously and then wait for them all to return before moving on to 
the next step of my calculation (which is also broken up into jobs for 
the job server, but is still buggy).
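
The submit-everything-then-wait pattern can be sketched without pp, 
again with the standard library only. This is an assumption-laden 
illustration, not pp's API: submit and run_batch are hypothetical 
helpers, and unlike pp this sketch starts one process per job with no 
load balancing or cap on the number of workers.

```python
import subprocess
import sys

# Hypothetical job: sum of squares below n, computed in a child interpreter.
CHILD_SOURCE = (
    "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"
)


def submit(n):
    """Start a worker process right away; return a callable that blocks."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(n)],
        stdout=subprocess.PIPE,
    )

    def job():
        # Blocks until the child has finished, then parses its output.
        out, _ = proc.communicate()
        return int(out)

    return job


def run_batch(sizes):
    jobs = [submit(n) for n in sizes]   # all jobs are started first
    return [job() for job in jobs]      # then wait for every result
```

Calling run_batch returns only once every job has finished, which is 
the point at which the next step of a calculation could start.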

Using stackless *within* a job will work fine, but stackless cannot help 
between jobs... unless you pickle the results, and in situations like 
that pp will slow down, because there will be a lot of blocking as the 
channels communicate via the hard drive. You would really want that to 
happen as rarely as possible.

Also, using stackless within a thread would indicate that you haven't 
broken the task up optimally for parallelisation with pp (at least that 
is my thought).

Finally, the job that contains the server and initiates the jobs is not 
counted as a worker. If you run top while your script is running, you 
will notice num_cpus + 1 Python instances running until all jobs are 
submitted. At that point, the job server blocks until a job returns, so 
you will generally only see num_cpus Python instances in top.

Cheers
Chris


Johan Carlsson wrote:
> I've had a quick look at pp and it looks to me like it threads off
> tasks when going remote, that's why there is a wait() method
> to synchronize on a group.
>
> pp is very high level, the API is minimal and examining the stats
> print outs I would say it does a lot of optimization of scheduling
> internally, for instance it seems to try to run as many tasks
> as close as possible (I don't have a SMP machine so I run a cluster
> consisting of in process on my desktop and one on my office server,
> making the total number of servers 3? (the local process seems to count
> like a server too)) .
>
> Anyway, my guess is that stackless will not play nicely with pp
> due to pp's internal threading.
> It is still possible to run a stackless application in parallel, but
> I'm not sure pp is the best package to do that with.
> (Like I mentioned before, Spread might be better suited to
> integrate with stackless; Spread is much lower level and
> doesn't provide any scheduling of tasks, just the multicast protocol.)