[Stackless] Connecting To A Database

Jeff Senn senn at maya.com
Mon Jun 11 17:42:22 CEST 2007


On Jun 11, 2007, at 9:25 AM, Christopher Armstrong wrote:

>
>> So... generally you are not going to get this "pure performance"
>> benefit unless you use another thread (than the one that is running
>> Python) or your OS supports some kind of asynch I/O directly and you
>> have Python support for it (unlikely).
>
> Hold on, what? I must be misunderstanding you. As far as I can tell,
> it's extremely likely that your OS supports asynchronous I/O and that
> your Python does support it. It's called select() (or poll(), or
> epoll, or kqueue, or WaitForMultipleEvents, or Windows' IOCP API).
> Python supports many of these, and Twisted adds support for more. This
> covers most platforms. Is this not the kind of thing you were talking
> about?

Well... yes, you are correct... sort of. (You spotted a weakness in my
attempt to sum up the whole topic in a page. I had the database stuff
too much in my head; perhaps I should have said "... *Stackless* Python
support for it ...". I was trying to generalize my argument too much.)

But we should also remember that (in general) these are just
mechanisms for dividing time between the OS kernel and your process.
The kernel and your process are (probably) both doing their work on the
same CPU (yes, multiple-processor systems muddy the waters).

At some point (possibly) the work is offloaded to whatever is at the
other end of a "device driver" -- and if this thing (the device) has
another processor, you indeed *might* see some benefit from keeping its
buffers full/empty.

[Aside: my worry here is that folks might be sort-of-thinking that once
you get your request on the "other side" of
select/poll/internal-db-call etc., it's suddenly "free" and has no
performance impact.]

A problem from the (Stackless) Python programmer's point of view is one
of scheduling.  Since you don't have access to all of these "devices"
*directly* from Python, there is (generally) no way to integrate them
into a scheduler for Stackless... you are at the mercy of whatever
method of "polling for events" your OS provides.  And you can pretty
much use only one method at a time (hence the whole
ManageSleepingTasklets confusion....) unless you use OS threads to
capture the polling state and "join" back into the scheduler.
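
A minimal sketch of that last idea, using only the standard library so
it runs on a stock Python.  Generators stand in for tasklets here, and
the names poller, worker and run_demo are made up for illustration;
none of this is the Stackless API.  One OS thread blocks in select()
and hands what it reads to the cooperative loop through a queue:

```python
import queue
import select
import socket
import threading


def poller(sock, events):
    """Runs in an OS thread: block in select() and forward data to the scheduler."""
    while True:
        select.select([sock], [], [])      # blocks only this thread
        data = sock.recv(4096)
        if not data:                       # peer closed the connection
            events.put(None)
            return
        events.put(data)


def worker(log):
    """Generator standing in for a tasklet: one unit of CPU work per step."""
    for i in range(3):
        log.append(("work", i))
        yield


def run_demo():
    a, b = socket.socketpair()
    events = queue.Queue()
    log = []
    thread = threading.Thread(target=poller, args=(b, events), daemon=True)
    thread.start()
    a.sendall(b"row-1")                    # pretend this is a database reply
    a.close()

    tasks = [worker(log)]
    io_done = False
    while tasks or not io_done:
        # Give every cooperative task one step.
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)
        # "Join" I/O events back in; block only when there is nothing else to run.
        try:
            event = events.get(block=not tasks)
        except queue.Empty:
            continue
        if event is None:
            io_done = True
        else:
            log.append(("io", event))

    thread.join()
    b.close()
    return log
```

The scheduler loop never calls select() itself, so a second polling
method could get its own thread feeding the same queue.  The price is
the extra thread and the queue hand-off; nothing here makes the I/O
itself cheaper.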

Twisted helps in that it makes many higher-level protocols act
similarly (in terms of asynchrony) -- so "use Twisted and Stackless" is
good advice if you are trying to overlap I/O & CPU in a situation where
there is potential overlap (and all of your asynchrony is available in
the Twisted implementation).  You have to be willing to write your code
a certain way... but one might argue that Stackless augments the Twisted
core in some way.  If I understand correctly, this still requires one
extra thread (but I suppose that is not theoretically necessary?).

In the database situation, I suppose the thing to think about is
whether there is any overlap to be exploited -- and, more importantly,
where it is.  (E.g., say in some join operation you are trying to
intersect two sets... it doesn't matter much whether you interleave the
fetching or not.  You still have to wait for the whole thing to complete
before you can move on.)
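
A toy model of the set-intersection case, assuming (purely for
illustration) that every fetched row costs one unit of waiting on a
single CPU.  The fetch helper and both intersect functions are
hypothetical, not a real database API:

```python
from itertools import zip_longest


def fetch(rows, steps):
    """Hypothetical row source: each yielded row costs one unit of waiting."""
    for row in rows:
        steps[0] += 1
        yield row


def intersect_sequential(left, right):
    """Fetch all of left, then all of right, then intersect."""
    steps = [0]
    a = set(fetch(left, steps))
    b = set(fetch(right, steps))
    return a & b, steps[0]


def intersect_interleaved(left, right):
    """Alternate between the two sources row by row, then intersect."""
    steps = [0]
    a, b = set(), set()
    missing = object()                     # marks an exhausted source
    pairs = zip_longest(fetch(left, steps), fetch(right, steps),
                        fillvalue=missing)
    for x, y in pairs:
        if x is not missing:
            a.add(x)
        if y is not missing:
            b.add(y)
    return a & b, steps[0]
```

Under this cost model both functions return the same set after the same
number of steps: the intersection cannot be produced until the last row
of both inputs has arrived.  Interleaving would only pay off if
something else were genuinely doing the fetching in parallel.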

Maybe I just should have said: "Just because you make it asynchronous
does not mean it will be faster". :-)

-Jas