[Stackless] Connecting To A Database
Jeff Senn
senn at maya.com
Mon Jun 11 17:42:22 CEST 2007
On Jun 11, 2007, at 9:25 AM, Christopher Armstrong wrote:
>
>> So... generally you are not going to get this "pure performance"
>> benefit unless you use another thread (than the one that is running
>> Python) or your OS supports some kind of asynch I/O directly and you
>> have Python support for it (unlikely).
>
> Hold on, what? I must be misunderstanding you. As far as I can tell,
> it's extremely likely that your OS supports asynchronous I/O and that
> your Python does support it. It's called select() (or poll(), or
> epoll, or kqueue, or WaitForMultipleEvents, or Windows' IOCP API).
> Python supports many of these, and Twisted adds support for more. This
> covers most platforms. Is this not the kind of thing you were talking
> about?
Well... yes, you are correct... sort of. (And you spotted a weakness
in my attempt to sum up the whole topic in a page -- I had the
database stuff too much in my head. Perhaps I should have said
"... *Stackless* Python support for it ..."; I was trying to
generalize my argument too much.)
But we should also remember that (in general) these are just
mechanisms for dividing time between the OS kernel and your process.
Both are (probably) running on the same CPU (yes, multiple-processor
systems muddy the waters).
At some point (possibly) the work is offloaded to whatever is at the
other end of a "device driver" -- and if this thing (device) has
another processor you indeed *might* see some benefit from keeping
its buffers full/empty.
[Aside: my worry here is that folks might be sort-of-thinking that
once you get your request on the "other side" of
select/poll/internal-db-call etc., it's suddenly "free" and has no
performance impact.]
A problem from the (Stackless) Python programmer's point of view is
one of scheduling. Since you don't have access to all of these
"devices" *directly* from Python, there is (generally) no way to
integrate them into a scheduler for Stackless... you are at the mercy
of whatever method of "polling for events" your OS provides. And you
can pretty much use only one method at a time (hence the whole
ManageSleepingTasklets confusion....) unless you use OS threads to
capture the polling state and "join" back into the scheduler.
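
To make the "one polling method at a time" point concrete, here is a
minimal sketch of a cooperative scheduler with a single select() call
as its only wait point. Assumptions: plain generators stand in for
tasklets (this is not the Stackless API), and run, reader, worker and
demo are names made up for the illustration.

```python
import select
import socket
from collections import deque


def run(tasks):
    """Round-robin scheduler; generators stand in for tasklets.

    A task yields a socket to mean "wake me when this is readable",
    or yields None to simply give up the CPU.
    """
    ready = deque(tasks)
    waiting = {}  # socket -> task parked on it
    while ready or waiting:
        if waiting:
            # The single polling point. Block only if nothing is runnable.
            timeout = 0 if ready else None
            readable, _, _ = select.select(list(waiting), [], [], timeout)
            for sock in readable:
                ready.append(waiting.pop(sock))
        if not ready:
            continue
        task = ready.popleft()
        try:
            wanted = next(task)
        except StopIteration:
            continue  # task finished
        if wanted is None:
            ready.append(task)
        else:
            waiting[wanted] = task


def reader(sock, log):
    yield sock  # park until data arrives
    log.append(("read", sock.recv(16)))


def worker(sock, log):
    for i in range(3):
        log.append(("work", i))  # CPU work done while reader is parked
        yield
    sock.sendall(b"done")


def demo():
    a, b = socket.socketpair()
    log = []
    run([reader(a, log), worker(b, log)])
    a.close()
    b.close()
    return log
```

Anything that cannot be handed to that one select() call (a database
driver's internal wait, say) has no way into this loop, which is
exactly the integration problem described above.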
Twisted helps in that it makes many higher-level protocols act
similarly (in terms of asynchrony) -- so "use Twisted and Stackless"
is good advice if you are trying to overlap I/O and CPU in a situation
where there is potential overlap (and all of your asynchrony is
available in the Twisted implementation). You have to be willing to
write your code a certain way... but one might argue that Stackless
augments the Twisted core in some way. If I understand correctly,
this still requires one extra thread (but I suppose that is not
theoretically necessary?).
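
The "extra thread" arrangement might look roughly like the sketch
below. This is only a standard-library stand-in, not how Twisted or
Stackless actually do it: blocking_poll and main_loop are made-up
names, and the blocking call is faked by a simple put() on a queue.

```python
import queue
import threading


def blocking_poll(results):
    # Stands in for a blocking wait (select(), a DB driver call, ...)
    # running on its own OS thread.
    results.put("io-complete")


def main_loop():
    results = queue.Queue()
    poller = threading.Thread(target=blocking_poll, args=(results,))
    poller.start()
    ticks = 0
    event = None
    while event is None:
        ticks += 1  # one slice of ordinary scheduler work
        try:
            # "Join" the polling thread's result back into the main loop
            # without ever blocking the thread that runs the scheduler.
            event = results.get_nowait()
        except queue.Empty:
            pass
    poller.join()
    return event, ticks
```

The scheduler thread never blocks; it just checks the queue once per
pass. The cost is the extra thread and the hand-off between the two.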
In the database situation, I suppose the thing to think about is
whether there is any overlap to be exploited -- and, more importantly,
where it is. (E.g. say in some join operation you are trying to
intersect two sets... it doesn't matter much whether you interleave
the fetching or not. You still have to wait for the whole thing to
complete before you can move on.)
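
A toy model of that intersection case, assuming every row costs one
"step" on the same CPU (so there is no real latency to hide). The
fetch generator is a hypothetical stand-in for a database cursor.

```python
from itertools import zip_longest


def fetch(rows):
    """Hypothetical cursor: hands back one row per step."""
    for row in rows:
        yield row


def intersect_sequential(left, right):
    steps = 0
    seen_left, seen_right = set(), set()
    for row in fetch(left):
        seen_left.add(row)
        steps += 1
    for row in fetch(right):
        seen_right.add(row)
        steps += 1
    # The answer is only known once both sides are fully fetched.
    return seen_left & seen_right, steps


def intersect_interleaved(left, right):
    steps = 0
    seen_left, seen_right = set(), set()
    missing = object()  # marks the shorter side running out
    pairs = zip_longest(fetch(left), fetch(right), fillvalue=missing)
    for l_row, r_row in pairs:
        if l_row is not missing:
            seen_left.add(l_row)
            steps += 1
        if r_row is not missing:
            seen_right.add(r_row)
            steps += 1
    # Same condition: nothing can be returned until both are exhausted.
    return seen_left & seen_right, steps
```

With left = [1, 2, 3, 4] and right = [3, 4, 5], both versions return
{3, 4} after 7 steps. Interleaving changes the order of the work, not
the amount, unless something on the other end really can run in
parallel.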
Maybe I just should have said: "Just because you make it asynchronous
does not mean it will be faster". :-)
-Jas
_______________________________________________
Stackless mailing list
Stackless at stackless.com
http://www.stackless.com/mailman/listinfo/stackless