[Stackless] question on preemptive scheduling semantics
senn at maya.com
Wed Mar 25 13:34:02 CET 2009
One word: "overhead"
30% does seem a little bit high... but when you have 2 threads
you are going from basically zero overhead
to whatever it takes to try to get both threads to run...
(as you say: GIL passes in/out, OS cost to schedule your
thread, etc.). Consider that this might be just enough
to bump a nice tight processor cache into a much less efficient state.
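For illustration, a minimal sketch (my own, not from Mads's actual setup) of why splitting one CPU-bound workload across two threads under the GIL can cost more wall-clock time than running it on one:

```python
import threading
import time

def crunch(n):
    # Pure-Python number crunching; holds the GIL the whole time it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# One thread doing all the work.
start = time.perf_counter()
crunch(N)
single = time.perf_counter() - start

# The same total work split across two threads. Under the GIL they
# cannot run Python bytecode in parallel, so this is typically no
# faster, and the lock hand-offs and cache churn add overhead.
start = time.perf_counter()
threads = [threading.Thread(target=crunch, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
double = time.perf_counter() - start

print(f"one thread: {single:.3f}s, two threads: {double:.3f}s")
```

The exact ratio depends on the machine and Python version, but the two-thread run gains nothing for pure-Python work.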
On Mar 25, 2009, at 5:55 AM, Mads Darø Kristensen wrote:
> Thank you for the explanation. That does make sense, because when I
> measure the time spent performing the tasklets it takes more than
> twice as long when performing two (identical) tasklets, so the added
> 30% is definitely not being spent on my number crunching tasklets.
> I'll be reimplementing my execution environment using processes
> sometime soon :-)
> Best regards
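As an aside, a minimal sketch of the process-based approach Mads mentions, using the standard multiprocessing module (the worker function here is a stand-in, not his actual workload):

```python
import multiprocessing

def crunch(n):
    # Stand-in for a CPU-bound "number crunching" tasklet.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL,
    # so CPU-bound work can genuinely run on both cores at once.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(crunch, [1_000_000, 1_000_000])
    print(results)
```

The trade-off is that data must be pickled between processes, so this pays off for coarse-grained, CPU-heavy tasks rather than fine-grained tasklet switching.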
> Kristján Valur Jónsson wrote:
>> There are probably two reasons for this.
>> a) The GIL is released for the duration of any time-consuming
>> system call. This allows time for another thread to step in.
>> b) Acquiring the lock, at least on windows, will cause the thread to
>> do a few hundred trylock spins. In fact, this should be removed on
>> windows since it is not appropriate for a resource normally held for long periods.
>> The effect of b) is probably small. But a) is real, and it would
>> suggest that a large portion of the time is spent outside of
>> python, performing system calls such as send() and recv(), hardly holding the GIL at all.
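A minimal sketch of point a): because CPython releases the GIL around blocking system calls, two threads blocked in the OS can overlap in time, which is how a second core picks up work (here time.sleep stands in for a blocking send()/recv()):

```python
import threading
import time

def blocking_work():
    # The GIL is released while the thread is blocked in the OS call,
    # so another thread is free to run in the meantime.
    time.sleep(0.5)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Two 0.5s blocking calls overlap: total is ~0.5s, not ~1.0s.
print(f"elapsed: {elapsed:.2f}s")
```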
>> -----Original Message-----
>> From: stackless-bounces at stackless.com [mailto:stackless-bounces at stackless.com] On Behalf Of Mads Darø Kristensen
>> Sent: 25. mars 2009 08:29
>> To: stackless list
>> Subject: Re: [Stackless] question on preemptive scheduling semantics
>> Replying to myself here...
>> I have now tested it more thoroughly, and I get some surprising results
>> (surprising to me, at least). When running a single-threaded stackless
>> scheduler I get the expected 100% CPU load when I try to stress it, but
>> running two threads on my dual core machine yielded a CPU load of
>> approximately 130%. What gives?
>> Seeing as the global interpreter lock should get in the way of using
>> more than one core, shouldn't I be seeing that using two threads (and
>> two schedulers) would yield the same 100% CPU load as using a single
>> thread did?
>> I'm not here to start another "global interpreter lock" discussion, so
>> if there are obvious answers to be found in the mailing list archives,
>> just tell me to RTFM :)
>> Best regards
>> Mads Darø Kristensen wrote:
>>> Hi Jeff.
>>> Jeff Senn wrote:
>>>> Hm. Do you mean "thread" or "process"? Because of the GIL you cannot
>>>> use threads to overlap python execution within one interpreter (this
>>>> has been discussed at great length here many times...) -- depending
>>>> on how you are measuring, perhaps you would aspire to get 200%,
>>>> 400%, etc. for multicore...
>>> I mean thread, not process. And what I meant by 100% utilization was
>>> 200% for the 2-core Mac I tested on... At least that was what I
>>> saw - I'll have to test that again some time :-)
>>> Best regards
>>> Stackless mailing list
>>> Stackless at stackless.com
> Med venlig hilsen / Best regards
> Mads D. Kristensen
> Blog: http://kedeligdata.blogspot.com/
> Work homepage: http://www.daimi.au.dk/~madsk