[Stackless] stackless python in a multicore environment
c.j.lee at tnw.utwente.nl
Mon Sep 24 11:28:10 CEST 2007
Well, yes and no. pp doesn't really use threads; rather, it starts an
entirely new Python instance and runs the job in that environment (thus
avoiding the GIL). pp also does all the load balancing, so every worker
gets an appropriate share of the work.
Data is passed between the processes by serializing it with the pickle module.
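Since each job runs in its own interpreter, job arguments and results only get across if they survive a pickle round trip. Below is a minimal sketch of checking that with the standard `pickle` module. `survives_pickle` is a hypothetical helper, not part of pp:

```python
import pickle


def survives_pickle(obj):
    """Hypothetical helper: True if obj can be pickled and unpickled."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Plain data (numbers, lists, dicts) crosses a process boundary fine.
print(survives_pickle({"step": 3, "energies": [1.5, 2.25]}))  # True

# Standard CPython cannot pickle a suspended generator.
print(survives_pickle(x * x for x in range(3)))  # False

# A lambda cannot be looked up by name, so pickle rejects it.
print(survives_pickle(lambda x: x + 1))  # False
```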
job = job_server.submit(.....)
the_result = job()  # this blocks until the job returns
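The same submit-then-block pattern exists in the standard library's `concurrent.futures` module, which also runs jobs in separate processes. The sketch below uses `ProcessPoolExecutor` rather than pp. There, `submit()` returns a `Future` and `result()` plays the role of calling `job()`. The `sum_of_squares` function is a made-up stand-in for a simulation job:

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """Hypothetical CPU-bound job; it runs in a worker process, not the caller."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Each worker is a separate Python process, so they do not share a GIL.
    with ProcessPoolExecutor(max_workers=2) as job_server:
        job = job_server.submit(sum_of_squares, 10)  # returns immediately
        the_result = job.result()  # blocks until the job returns
        print(the_result)  # 285
```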
In my simulations, I submit between 6 (a debug test) and 550 jobs
at once and then wait for all of them to return before moving on to
the next step of my calculation (which is also broken up into jobs for
the job server, but that part is still buggy).
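Here is a two-stage version of that workflow, again sketched with `concurrent.futures` instead of pp. Every job of one stage is submitted, `wait()` acts as the barrier, and only then is the next stage split into jobs. The stage functions `simulate` and `refine` are hypothetical placeholders:

```python
from concurrent.futures import ALL_COMPLETED, ProcessPoolExecutor, wait


def simulate(x):
    """Hypothetical first-stage job."""
    return x * x


def refine(value):
    """Hypothetical second-stage job that consumes first-stage output."""
    return value + 1


def run_stages(executor, inputs):
    # Stage 1: submit every job at once, then block until all have returned.
    first = [executor.submit(simulate, x) for x in inputs]
    wait(first, return_when=ALL_COMPLETED)
    stage1 = [f.result() for f in first]  # results in submission order

    # Stage 2: only now split the next step into jobs.
    second = [executor.submit(refine, v) for v in stage1]
    return [f.result() for f in second]


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(run_stages(pool, range(6)))  # [1, 2, 5, 10, 17, 26]
```

`run_stages` accepts any `concurrent.futures` executor, so the staging logic does not depend on which pool runs the jobs.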
Using Stackless *within* a job will work fine, but Stackless cannot help
between jobs... unless you pickle the results, and in situations like
that pp will slow down, because there will be a lot of blocking while the
channels communicate via the hard drive. You would want that to
happen as rarely as possible.
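This sketch keeps the fine-grained cooperation inside one job. Several cooperative micro-threads run within a single job, and only a small, picklable summary goes back. Standard CPython has no tasklets, so generators driven by a tiny round-robin loop stand in for Stackless tasklets here. The names `worker`, `run_tasklets` and `job` are hypothetical, and the pool is `concurrent.futures`, not pp:

```python
from collections import deque
from concurrent.futures import ProcessPoolExecutor


def worker(name, steps):
    """Generator standing in for a tasklet: yields control after each step."""
    total = 0
    for i in range(steps):
        total += i
        yield  # cooperative switch point, playing the role of stackless.schedule()
    return name, total


def run_tasklets(tasks):
    """Round-robin scheduler: resume each generator until all have finished."""
    queue = deque(tasks)
    results = {}
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration as done:
            name, total = done.value
            results[name] = total
        else:
            queue.append(task)  # not finished yet, back of the queue
    return results


def job(n_tasks):
    # All switching between micro-threads happens inside this one process;
    # only the final dict is pickled and sent back to the caller.
    return run_tasklets(worker(f"t{k}", k + 1) for k in range(n_tasks))


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(pool.submit(job, 3).result())  # {'t0': 0, 't1': 1, 't2': 3}
```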
Also, using Stackless within a thread would suggest that you haven't
broken the task up optimally for parallelisation with pp (at least, that
is my impression).
Finally, the process that contains the job server and submits the jobs is
not counted as a worker. If you run top while your script is running, you
will see num_cpus + 1 Python instances until all jobs are
submitted. At that point, the job server blocks until a job returns, so
you will generally see only num_cpus Python instances in top.
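A similar split between a coordinating process and its workers shows up with `concurrent.futures`. The process that owns the pool submits jobs but never runs them, which matches the one extra interpreter visible in top. This sketch uses `concurrent.futures`, not pp, and `report_pid` and `worker_pids` are hypothetical helpers:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def report_pid(_):
    """Job body: return the PID of the worker process that ran it."""
    return os.getpid()


def worker_pids(n_jobs, n_workers):
    """Run n_jobs tiny jobs and collect the distinct worker PIDs that ran them."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return set(pool.map(report_pid, range(n_jobs)))


if __name__ == "__main__":
    ncpus = os.cpu_count() or 1
    pids = worker_pids(n_jobs=20, n_workers=ncpus)
    # At most ncpus distinct workers did the work; the parent was not one of them.
    print(len(pids) <= ncpus, os.getpid() not in pids)  # True True
```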
Johan Carlsson wrote:
> I've had a quick look at pp, and it looks to me like it threads off
> tasks when going remote; that's why there is a wait() method
> to synchronize on a group.
> pp is very high level; the API is minimal, and judging from the stats
> printouts I would say it does a lot of scheduling optimization
> internally. For instance, it seems to try to run as many tasks as it can
> as close as possible (I don't have an SMP machine, so I run a cluster
> consisting of one process on my desktop and one on my office server,
> making the total number of servers 3? (the local process seems to count
> as a server too)).
> Anyway, my guess is that Stackless will not play nicely with pp
> due to pp's internal threading.
> It is still possible to run a Stackless application in parallel, but
> I'm not sure pp is the best package to do that with.
> (As I mentioned before, Spread might be better suited to
> integration with Stackless; Spread is much more low-level and
> doesn't provide any task scheduling, just the multicast protocol.)