[Stackless] verify my performance numbers?

Andrew Dalke dalke at dalkescientific.com
Thu Feb 22 10:29:23 CET 2007


On Jan 30, 2007, at 3:47 PM, Santiago Gala wrote:
> No need for it being re-entrant, it works because both producer and
> consumer calls that use it are wrapped with a shared lock, i.e.,
> marked as critical sections in a fairly standard way. Those are the
>
> with lock:
>     <code>
>
> calls in the producer and the consumer. Basically only one thread
> touches it at a time.
>
> The Queue class is supposed to do the same, but it looks, on a quick
> glance, overoptimized, which makes it slower. I could be wrong, I'm
> not an expert pythoneer, but it looks like too much python code for
> what it is supposed to do.

I took a look at your approach and included its timing numbers
in my PyCon talk.  I found that using Stackless was slightly
faster than using a thread+lock:

    Normal Python, counter in handler:    1.46 s
    Stackless iterparser:                 2.25 s
    threaded, lock around a deque:        2.34 s
    threaded, using Queue.Queue(1000):   13.1  s

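(For reference, the Queue.Queue variant is the same structure with
the deque+lock replaced by a bounded queue.  The sketch below shows
the idea, not the exact code I timed, and the names are made up.
Queue.Queue does its own locking and notification per put/get, all
in Python code, which is presumably where the extra time goes.)

import Queue
import threading
import xml.sax

class QueueIterHandler(xml.sax.ContentHandler):
    # Queue.Queue handles the locking itself, one put per event.
    def __init__(self, queue):
        self.queue = queue
    def startElement(self, tag, attrs):
        self.queue.put(("startElement", (tag, attrs)))
    def endDocument(self):
        self.queue.put(None)

def queue_iterparse(filename):
    queue = Queue.Queue(1000)   # at most 1000 read-ahead events
    handler = QueueIterHandler(queue)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    threading.Thread(target=parser.parse, args=(filename,)).start()
    while 1:
        data = queue.get()      # blocks; one lock round-trip per item
        if data is None:
            return
        yield data
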
Here's my final test code:

from __future__ import with_statement
import xml.sax
import thread
import threading
import collections
import time

class LockingIterHandler(xml.sax.ContentHandler):
    # SAX handler run in the producer thread; it appends events to
    # a deque shared with the consumer, under the shared lock.
    def __init__(self, lock, queue):
        self.lock = lock
        self.queue = queue
    def startElement(self, tag, attrs):
        with self.lock:
            self.queue.append(("startElement", (tag, attrs)))
    def endDocument(self):
        # None is the sentinel telling the consumer to stop.
        with self.lock:
            self.queue.append(None)

def iterparse(filename):
    queue = collections.deque()
    lock = thread.allocate_lock()
    handler = LockingIterHandler(lock, queue)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Parse in a separate thread; this generator is the consumer.
    threading.Thread(target=parser.parse, args=(filename,)).start()
    while 1:
        if queue:
            # Drain everything that has accumulated using a single
            # lock acquisition, instead of locking per item.
            with lock:
                while queue:
                    data = queue.popleft()
                    if data is None:
                        return
                    yield data
        time.sleep(0.001)

t1 = time.time()
count = 0
for event, args in iterparse("iTunes Music Library.xml"):
    if event == "startElement":
        count += 1
t2 = time.time()

print count, "elements"
print t2 - t1

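For the Stackless timing, the structure is the same but a channel
does the hand-off between the parser tasklet and the consumer.  A
minimal sketch of that pattern (again not the exact code from the
talk, and the names here are made up):

import stackless
import xml.sax

class ChannelIterHandler(xml.sax.ContentHandler):
    # Each event goes over the channel; send blocks until the
    # consumer is ready to receive, so no explicit lock is needed.
    def __init__(self, channel):
        self.channel = channel
    def startElement(self, tag, attrs):
        self.channel.send(("startElement", (tag, attrs)))
    def endDocument(self):
        self.channel.send(None)

def stackless_iterparse(filename):
    channel = stackless.channel()
    handler = ChannelIterHandler(channel)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Run the parser as a tasklet; blocking on receive() switches
    # to it whenever the consumer needs the next event.
    stackless.tasklet(parser.parse)(filename)
    while 1:
        data = channel.receive()
        if data is None:
            return
        yield data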

With some experimentation I found that the reason the
lock+deque approach does so well is that the producer is
much faster than the consumer.  The queue accumulates a lot
of items, which are then processed in one lump.

That amortizes the lock acquisition and function call
overhead of the with statements over many items.

For example, if I replace

            with lock:
                while queue:
                    data = queue.popleft()
                    if data is None:
                        return
                    yield data

with something that consumes one item at a time inside the lock
rather than a block of items

            with lock:
                data = queue.popleft()
                if data is None:
                    return
                yield data

then the time lengthens to (as I recall) about 20 seconds.

So for this example (converting XML SAX processing
into an iterator) it works, but the approach is not
a general-purpose solution.

Also, it assumes the deque may grow without bound, while
some may want to cap the read-ahead at some fixed N items.
Bounding it would slow down processing a bit, though I
don't think it would be much.

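A bounded version only needs the producer to wait when it gets
too far ahead.  A rough sketch of a drop-in replacement for the
handler above (maxsize and the polling wait are illustrative
choices here, not something I timed):

from __future__ import with_statement
import time
import xml.sax

class BoundedLockingIterHandler(xml.sax.ContentHandler):
    # Like LockingIterHandler, but the parser thread waits whenever
    # the consumer falls more than maxsize events behind.
    def __init__(self, lock, queue, maxsize=1000):
        self.lock = lock
        self.queue = queue
        self.maxsize = maxsize
    def startElement(self, tag, attrs):
        while len(self.queue) >= self.maxsize:
            time.sleep(0.001)
        with self.lock:
            self.queue.append(("startElement", (tag, attrs)))
    def endDocument(self):
        with self.lock:
            self.queue.append(None)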

                                 Andrew
                                 dalke at dalkescientific.com


