[Stackless] verify my performance numbers?

Santiago Gala santiago.gala at gmail.com
Tue Jan 30 23:47:38 CET 2007


On Tue, 30-01-2007 at 02:37 +0000, Andrew Dalke wrote:
> On Jan 29, 2007, at 12:13 AM, Santiago Gala wrote:
> >  My version replaces the bloated and hyperlocked Queue class with a
> > deque + simple locking, using "with" for cleaner code.
> 
> Interesting.  I didn't think about that because the deque class
> doesn't have built-in guarantees that it would be usable across
> multiple threads.  It works in CPython because of the implementation
> guarantees, but I don't know if that can be considered portable.
> 
> Because the CPython implementation uses a C type and none of the
> append or pop calls are re-entrant it works.
> 

It doesn't need to be re-entrant. It works because the producer and
consumer code that touches the deque is wrapped in a shared lock, i.e.,
marked as critical sections in a fairly standard way. Those are the

with lock:
    <code>

blocks in the producer and the consumer. Basically, only one thread
touches the deque at a time.
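
To make the critical-section idea concrete, here is a minimal sketch
written for a current Python 3 (not the code from my test). The name
fill_and_drain and its parameters are made up for illustration. Several
threads append to one deque, and every access goes through the same
lock:

```python
import threading
from collections import deque

def fill_and_drain(n_threads=4, per_thread=1000):
    """Hypothetical helper: fill a shared deque from several threads,
    then drain it, touching the deque only while holding the lock."""
    queue = deque()            # shared buffer
    lock = threading.Lock()    # the one lock every access goes through

    def producer():
        for i in range(per_thread):
            with lock:         # critical section: one thread at a time
                queue.append(i)

    threads = [threading.Thread(target=producer) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    drained = []
    while True:
        with lock:             # same lock on the consuming side
            if not queue:
                break
            drained.append(queue.popleft())
    return drained
```

Because the lock serializes every append and popleft, nothing here
depends on the deque itself being safe to share between threads.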

The Queue class is supposed to do the same but, at a quick glance, it
looks over-optimized, which makes it slower. I could be wrong, since I'm
no Python expert, but it looks like too much Python code for what it is
supposed to do.

> > A substantial part of the overhead, thus, is due to the Queue class,
> > though stackless is still 50%+ faster than a simple deque+lock
> > solution, plus cleaner code
> 
> In checking the analysis I did realize that the thread doing
> the pushes onto the Queue/deque (the producer side) can be faster
> than the consumer.  I tested it and found that the Queue can have
> several thousand pending messages while the stackless version of
> course only has one.
> 
> Using Queue.Queue(1) nearly doubled the run-time but I need
> to verify that.  I got the exception
> 
>     Unhandled exception in thread started by
>     Error in sys.excepthook:
> 
>     Original exception was:
> 
> and need to figure out what that means.
> 

I guess it is the same one I got before I added the test for len(queue):
the consuming thread arrives before the producing one has put anything
in. I let the consumer sleep for 0.01 seconds while the queue was empty.
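
A rough sketch of that consumer loop, again for a current Python 3 and
not my actual test code. The names run, done and _EMPTY are assumptions
added for illustration; in particular the Event used to tell the
consumer when to stop is my addition here:

```python
import threading
import time
from collections import deque

_EMPTY = object()  # hypothetical sentinel meaning "nothing in the deque"

def run(n_items, poll_interval=0.01):
    queue = deque()
    lock = threading.Lock()
    done = threading.Event()   # assumption: set once the producer is finished
    received = []

    def producer():
        for i in range(n_items):
            with lock:
                queue.append(i)
        done.set()

    def consumer():
        while True:
            # Sample "done" before looking at the deque, so that an empty
            # deque seen after that really means nothing more will arrive.
            finished = done.is_set()
            with lock:
                # The len() test avoids calling popleft() on an empty deque.
                item = queue.popleft() if len(queue) else _EMPTY
            if item is not _EMPTY:
                received.append(item)
            elif finished:
                return
            else:
                time.sleep(poll_interval)  # nothing yet: back off and retry

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()                  # consumer first, so it meets an empty deque
    p.start()
    p.join()
    c.join()
    return received
```

Without the len() test, popleft() on the empty deque raises IndexError
in the consumer thread, which is the kind of failure I was seeing.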

But, as a Java programmer, and leaving aside potential problems in the
Queue class, I really like how simple the stackless approach is for this
kind of producer-consumer problem, and it is 50% faster than a much
trickier and more bug-prone implementation like the shared-lock one.

> Also, the iTunes XML has a reference to a DTD and the expat
> parser interface tries to resolve it, so part of the overhead
> is doing a network lookup of
>    "http://www.apple.com/DTDs/PropertyList-1.0.dtd"
> 
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> 
> _______________________________________________
> Stackless mailing list
> Stackless at stackless.com
> http://www.stackless.com/mailman/listinfo/stackless



