[Stackless] Stackless and Psyco

Christian Tismer tismer at stackless.com
Wed Jul 21 18:38:54 CEST 2004

Shane Holloway (IEEE) wrote:

> Chris,
> This is great news!  Stackless is becoming more and more a part of the 
> way I think, and as such, wriggling itself into many of the applications 
> I write.  Simply, Thank You.

That's nice to hear!

Well, let me tell you about the features I'm working on.
I'm trying to make the Stackless/Psyco combination as
efficient as possible, without changing too much of
Psyco, just the things which are easy with Stackless support.

Here we go:

- Psyco cannot accelerate generators.
- Stackless' generators are not very fast.

Psyco's problem is that it is not written in a way that
allows jumping out of a generator call via yield.

Stackless' problem is that it makes generators pickleable,
which means generators must always start from the toplevel,
and this is expensive.

I have been planning to do something special with real C stacks
for quite a while:
For special-purpose tasklets, which are supposed to be
*very* fast with respect to context switching, I want to
add a third way of switching: not copying stacks, not
leaving the interpreter, but having a stack pool, which
allows switching the hardware stack with a few instructions.

For Stackless only, the effect of this would be moderate,
since we are still interpreting slow code.
Now, the combination with Psyco changes things dramatically:
If we have Psyco-accelerated tasklets which do many context
switches, we also want these switches to be ultra fast, in
order to keep frequent switching feasible.

This then led me to the following approach:
Whenever a user uses Psyco on a tasklet, that tells us that
he wants ultimate speed and doesn't care about space.
Exactly in that case, we also use the stack pooling.
By doing that, I can implement generators as simple function
calls, including a stack switch. That means I can change
Psyco for Stackless to not forbid generators any longer,
but to just call a special function, which happens to
switch stacks, and continue with the interrupted generator.
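
To make the "generator as a plain function call plus a stack switch"
idea concrete, here is a minimal sketch in pure Python. It is only an
analogy: a thread stands in for a pooled C stack, and the names
StackGenerator, next_value and count_up are made up for this
illustration. Neither Stackless nor Psyco works this way internally.

```python
import threading

class StackGenerator:
    """Emulate a generator as a plain call that switches to another stack.

    Assumption for illustration: a thread plays the role of a pooled
    C stack. Only one side runs at a time; semaphores do the "switch".
    """

    _DONE = object()  # sentinel: the generator body has returned

    def __init__(self, func, *args):
        self._to_gen = threading.Semaphore(0)     # caller -> generator
        self._to_caller = threading.Semaphore(0)  # generator -> caller
        self._value = None
        self._started = False
        self._finished = False
        self._thread = threading.Thread(
            target=self._run, args=(func, args), daemon=True)

    def _run(self, func, args):
        self._to_gen.acquire()        # wait for the first next_value()
        func(self._yield, *args)      # body runs on its own stack
        self._value = self._DONE
        self._finished = True
        self._to_caller.release()     # final switch back to the caller

    def _yield(self, value):
        self._value = value
        self._to_caller.release()     # switch back to the caller
        self._to_gen.acquire()        # stay suspended until resumed

    def next_value(self):
        """Looks like an ordinary call; internally it switches stacks."""
        if self._finished:
            raise StopIteration
        if not self._started:
            self._started = True
            self._thread.start()
        self._to_gen.release()        # switch to the generator
        self._to_caller.acquire()     # wait until it yields or ends
        if self._value is self._DONE:
            raise StopIteration
        return self._value


def count_up(yield_, n):
    # An ordinary function: "yield" is just a call that switches away.
    for i in range(n):
        yield_(i)


gen = StackGenerator(count_up, 3)
values = [gen.next_value() for _ in range(3)]
```

The point of the sketch is that count_up never has to be unwound to
the toplevel: it stays suspended on its own stack between calls, which
is what a real stack pool would provide far more cheaply than threads.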


A completely different speed optimization is the following:
Psyco (at least the Stackless version) will allow enforcing
the inlining of function calls.

Reason: For writing fast applications with a couple of small
functions involved, having real function calls is a
show stopper, since in Psyco, return values are *always*
turned into PyObjects.

Using self.some_attr to pass values around between different
methods is expensive, too. Using nested functions with
shared variables also doesn't work, since Psyco can't handle
them.

Instead, looking for a simple, halfway structured way to
keep code readable, but to produce light-speed code, Armin
and I came up with the idea to enforce inlining, and to use
a tuple to move state around.

Here is a code example, untested, just sketching:

class PDFParser:
     def parse(self, ...):

         def handle_string(state):
             # note that this must read continuation lines
             line, left, source = state
             # process line, and maybe follow-up lines
             state = line, left, source
             return state, tok
         def handle_xxx...

         source = self.stream

         for line in source:
             left = len(line)
             while left:
                 c = line[-left]
                 left -= 1
                 state = (line, left, source)
                 if c in whitespace:
                     state, tok = handle_white(state)
                 elif c == "/":
                     state, tok = handle_label(state)
                 elif c == "(":
                     state, tok = handle_string(state)
                 elif ....
                 line, left, source = state

                 yield tok  # see part 1 of the marriage

The important part of this implementation is that if all the
sub-functions are inlined, the assignment to the state tuple
doesn't take place at all. It is just optimized away, and
all variables stay where they are, as optimized as they are.

No parameters are passed by nested scopes, since this would
drop all optimization. No parameters are passed through
self or other object references, but everything travels through
the virtual tuple. In the end, the whole parser function will
run at almost C speed.
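
For readers who want something they can run, here is a small
self-contained variant of the state-tuple pattern. It is a sketch
under assumptions, not the planned PDF parser: the token kinds, the
handle_word handler, the WHITESPACE and DELIMITERS constants and the
tokenize function are invented for the illustration, and the state is
reduced to (line, left), so strings do not read continuation lines.
In plain CPython the tuples are really built; the claim that they
vanish applies only to the enforced inlining described above.

```python
WHITESPACE = " \t\r\n"
DELIMITERS = WHITESPACE + "/("

# Every handler takes the state tuple and returns (new_state, token).
# `left` counts the characters still unread, so line[-left] is the
# next character, exactly as in the sketch above.

def handle_white(state):
    line, left = state
    while left and line[-left] in WHITESPACE:
        left -= 1
    return (line, left), None          # whitespace produces no token

def handle_label(state):
    line, left = state
    start = len(line) - left           # first character after "/"
    while left and line[-left] not in DELIMITERS:
        left -= 1
    return (line, left), ("label", line[start:len(line) - left])

def handle_string(state):
    line, left = state
    start = len(line) - left           # first character after "("
    while left and line[-left] != ")":
        left -= 1
    text = line[start:len(line) - left]
    if left:
        left -= 1                      # consume the closing ")"
    return (line, left), ("string", text)

def handle_word(state):
    line, left = state
    start = len(line) - left - 1       # include the consumed character
    while left and line[-left] not in DELIMITERS:
        left -= 1
    return (line, left), ("word", line[start:len(line) - left])

def tokenize(source):
    for line in source:
        left = len(line)
        while left:
            c = line[-left]
            left -= 1
            state = (line, left)       # pack the local variables
            if c in WHITESPACE:
                state, tok = handle_white(state)
            elif c == "/":
                state, tok = handle_label(state)
            elif c == "(":
                state, tok = handle_string(state)
            else:
                state, tok = handle_word(state)
            line, left = state         # unpack them again
            if tok is not None:
                yield tok

tokens = list(tokenize(["/Type (hello world) obj\n", "  /Len 42"]))
```

All data flows through the handler arguments and return values. No
handler touches self, a global, or a closure variable, which is the
property that would let an inliner keep line and left as plain local
variables.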

Well, this is the plan. Some initial tests are done.

cheers -- chris

Christian Tismer             :^)   <mailto:tismer at stackless.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  mobile +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
