[Stackless] question on preemptive scheduling semantics

Jeff Senn senn at maya.com
Wed Mar 25 13:34:02 CET 2009


One word: "overhead"

30% does seem a little bit high... but with two threads
you are going from basically zero overhead
to whatever it takes to try to get both threads to run
(as you say: GIL passes in/out, OS cost to schedule your
threads, etc.). Consider that this might be just enough
to bump a nice, tight processor cache into a more
"thrashy" one...
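A rough sketch of that effect (plain CPython threading rather than Stackless tasklets, and `crunch` is a made-up stand-in for the number-crunching workload): two CPU-bound threads cannot overlap Python execution under the GIL, so the second thread mostly buys you GIL hand-offs and cache pressure instead of speedup.

```python
# Illustration of GIL serialization: two CPU-bound threads take roughly
# as long as running the same work twice in one thread, plus overhead.
import threading
import time

def crunch(n=2_000_000):
    # Pure-Python number crunching; holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(num_threads):
    # Run `num_threads` identical CPU-bound jobs concurrently and
    # return the wall-clock time for all of them to finish.
    threads = [threading.Thread(target=crunch) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

one = timed(1)
two = timed(2)
# With the GIL, the two-thread run takes roughly twice as long as the
# one-thread run, not the same time you would expect from two cores.
print(f"1 thread: {one:.2f}s  2 threads: {two:.2f}s")
```

The extra CPU load the thread discusses below comes from the hand-off machinery itself, not from useful parallel work.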


On Mar 25, 2009, at 5:55 AM, Mads Darø Kristensen wrote:

> Thank you for the explanation. That does make sense: when I measure
> the time spent performing the tasklets, it takes more than twice as
> long to perform two (identical) tasklets, so the added 30% is
> definitely not being spent on my number-crunching tasklets.
>
> I'll be reimplementing my execution environment using processes
> sometime soon :-)
>
> Best regards
> Mads
>
> Kristján Valur Jónsson wrote:
>> There are probably two reasons for this.
>> a) The GIL is released for the duration of any time-consuming
>> system call. This allows time for another thread to step in.
>> b) Acquiring the lock, at least on Windows, will cause the thread
>> to do a few hundred trylock spins. In fact, this should be removed
>> on Windows, since it is not appropriate for a resource that is
>> normally occupied...
>>
>> The effect of b) is probably small, but a) is real, and it suggests
>> that a large portion of the time is spent outside of Python
>> performing system calls such as send() and recv(), which is hardly
>> surprising.
>>
>> K
>>
>> -----Original Message-----
>> From: stackless-bounces at stackless.com [mailto:stackless-bounces at stackless.com 
>> ] On Behalf Of Mads Darø Kristensen
>> Sent: 25 March 2009 08:29
>> To: stackless list
>> Subject: Re: [Stackless] question on preemptive scheduling semantics
>>
>> Replying to myself here...
>>
>> I have now tested it more thoroughly, and I get some surprising
>> results (surprising to me, at least). When running a single-threaded
>> Stackless scheduler I get the expected 100% CPU load when I try to
>> stress it, but running two threads on my dual-core machine yielded a
>> CPU load of approximately 130%. What gives?
>>
>> Seeing as the global interpreter lock should get in the way of
>> utilizing more than one core, shouldn't I be seeing the same 100%
>> CPU load with two threads (and two schedulers) as I did with a
>> single thread?
>>
>> I'm not here to start another "global interpreter lock" discussion,
>> so if there are obvious answers to be found in the mailing list
>> archives, just tell me to RTFM :)
>>
>> Best regards
>> Mads
>>
>> Mads Darø Kristensen wrote:
>>> Hi Jeff.
>>>
>>> Jeff Senn wrote:
>>>> Hm. Do you mean "thread" or "process"? Because of the GIL you
>>>> cannot use threads to overlap Python execution within one
>>>> interpreter (this has been discussed at great length here many
>>>> times...) -- depending on how you are measuring, perhaps you
>>>> would aspire to 200%, 400%, etc. for multicore...
>>> I mean thread, not process. And what I meant by 100% utilization
>>> was 200% for the 2-core Mac I tested on... At least that was what
>>> I thought I saw - I'll have to test that again some time :-)
>>>
>>> Best regards
>>> Mads
>>>
>>> _______________________________________________
>>> Stackless mailing list
>>> Stackless at stackless.com
>>> http://www.stackless.com/mailman/listinfo/stackless
>>
>
> -- 
> Med venlig hilsen / Best regards
> Mads D. Kristensen
>
> Blog: http://kedeligdata.blogspot.com/
> Work homepage: http://www.daimi.au.dk/~madsk
>
>
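Since the plan above is to move to processes: a minimal sketch, assuming the workload can be expressed as a picklable top-level function (again a hypothetical `crunch`), using the stdlib multiprocessing module. Each worker runs in its own interpreter with its own GIL, so CPU-bound work really can use both cores.

```python
# Process-based alternative to the threaded scheduler: one worker
# process per core, each with an independent interpreter and GIL.
import multiprocessing

def crunch(n):
    # Stand-in for a CPU-bound tasklet workload.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Two identical CPU-bound jobs, distributed across the workers.
        results = pool.map(crunch, [1_000_000, 1_000_000])
    print(results)
```

On a dual-core machine, two such workers should be able to drive both cores near 100% each, which two GIL-bound threads cannot.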
