  From: becka@rz.uni-duesseldorf.de
  To  : ggi-develop@eskimo.com
  Date: Tue, 2 Feb 1999 00:45:56 +0100 (MET)

Re: The battles.

Hi Morten, nice to have you on that project as well ;-) !

> I write a GGI-application, and I want it to run fast and
> use accels where available.  If accels are not available
> for some operation I want to do, like "put-some-image"
> (which is easily detected as you explained), I may *still*
> have to do "put-some-image"... Question is *how* do I do
> it?

> Case 1: If "put-some-image" is part of the GGI-primitives,
>         it would be insane not relying on GGI to do a good
>         job doing it for me, so I use the GGI-primitive.

Yes.

> Case 2: If "put-some-image" is not part of the GGI-
>         primitives, like z-buffering that you mentioned,
>         I may have to do it myself, using GGI-primitives
>         when my algorithm finishes.

Yes.

> In order to make my program run everywhere, I should as a
> last resort, rely only upon GGI-primitives available
> everywhere.  

The _LIB_GGI primitives are available everywhere. That is:
If some card doesn't do something accelerated, LibGGI jumps in and
does it in software. LibGGI can query the available accel features
and can fall back to other methods, where they are not available.

All LibGGI calls jump through a redirection table that gets filled
by several layers of modules. We start with a "generic-stub"
library that does everything by falling back to simpler primitives.

On top of that, one usually loads a framebuffer method, say "linear-8",
which implements Get/PutPixel and overrides those generic-stub
functions that it can do better (like HLine via memset()).

On top of that, one can for example have the generic KGI ioctl layer,
that overrides some functions after runtime-querying their availability
in hardware.

On top of that, you can even go further, and if you have a special
"direct" acceleration lib for the card (via a mmaped accelerator
engine), this again can override the "slower" ioctl methods.
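To make the layering concrete, here is a minimal sketch of such a
redirection table. It is not LibGGI's actual code: struct visual,
struct ops, the load_*() functions, CAP_BOX and the tiny in-memory
framebuffer are all hypothetical names invented for illustration, and
the "accel" layer only simulates hardware by writing to memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_W 64
#define FB_H 16
#define CAP_BOX 1u /* hypothetical capability bit: hardware can fill boxes */

struct visual;

/* The redirection table: every drawing call goes through these pointers. */
struct ops {
    void (*putpixel)(struct visual *v, int x, int y, uint8_t c);
    void (*hline)(struct visual *v, int x, int y, int w, uint8_t c);
    void (*box)(struct visual *v, int x, int y, int w, int h, uint8_t c);
};

struct visual {
    uint8_t fb[FB_W * FB_H]; /* stand-in for an 8-bit linear framebuffer */
    struct ops ops;
    const char *hline_impl;  /* which layer currently provides hline */
    const char *box_impl;    /* which layer currently provides box */
};

/* Layer 1: generic stub. Everything falls back to a simpler primitive. */
static void stub_hline(struct visual *v, int x, int y, int w, uint8_t c) {
    for (int i = 0; i < w; i++) v->ops.putpixel(v, x + i, y, c);
}
static void stub_box(struct visual *v, int x, int y, int w, int h, uint8_t c) {
    for (int j = 0; j < h; j++) v->ops.hline(v, x, y + j, w, c);
}
static void load_generic_stub(struct visual *v) {
    v->ops.hline = stub_hline; v->hline_impl = "generic-stub";
    v->ops.box = stub_box;     v->box_impl = "generic-stub";
}

/* Layer 2: framebuffer method. Provides putpixel, overrides hline. */
static void lin8_putpixel(struct visual *v, int x, int y, uint8_t c) {
    assert(x >= 0 && x < FB_W && y >= 0 && y < FB_H);
    v->fb[y * FB_W + x] = c;
}
static void lin8_hline(struct visual *v, int x, int y, int w, uint8_t c) {
    assert(x >= 0 && w >= 0 && x + w <= FB_W && y >= 0 && y < FB_H);
    memset(&v->fb[y * FB_W + x], c, (size_t)w);
}
static void load_linear8(struct visual *v) {
    v->ops.putpixel = lin8_putpixel;
    v->ops.hline = lin8_hline; v->hline_impl = "linear-8";
}

/* Layer 3: accel layer. Overrides only what the hardware reports it can do.
 * A real layer would issue an ioctl here; this one just writes memory. */
static void accel_box(struct visual *v, int x, int y, int w, int h, uint8_t c) {
    for (int j = 0; j < h; j++) memset(&v->fb[(y + j) * FB_W + x], c, (size_t)w);
}
static void load_accel(struct visual *v, unsigned hw_caps) {
    if (hw_caps & CAP_BOX) { v->ops.box = accel_box; v->box_impl = "accel"; }
}

/* Fill the table bottom-up; later layers overwrite earlier entries. */
void visual_init(struct visual *v, unsigned hw_caps) {
    memset(v, 0, sizeof *v);
    load_generic_stub(v);
    load_linear8(v);
    load_accel(v, hw_caps);
}
```

The application would only ever call v.ops.box(&v, ...) and friends.
With hw_caps == 0 the box call ends up in the generic stub, which in
turn uses the memset()-based hline from the framebuffer layer; with
CAP_BOX set, the same call lands in the accel layer instead.
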

> My program may even be very simple, and *only*
> uses primitives and has no special tests for accels.

Yes. Tests are evil and shouldn't be done by the application.
They lead to very messy programs.

> Now, the problem is:  What if 50% of those primitives
> are available as accels, and 50% are not?
> 
> Example: I draw lines, so I tell GGI to draw lines.  My
> gfx only supports accels for horizontal and vertical
> lines (rectangles), but that is something GGI should
> figure out, right?

Yes. You just tell LibGGI to draw a line. It will automatically hit
the best implementation due to that "override" scheme I detailed
above.

> The result would be mixed user-space SW and accels,
> introducing context-switching that reduces the speed of
> the draw-line primitive.

Yes. However, LibGGI does some checks to keep things sane.
If the accelerated action (like DrawRectangle) is so small
that it doesn't make sense to call into the kernel at all,
it won't do so. That is: It knows the cost of the accel call
and will use software methods if that is faster.

That is: Even if your program _is_ running suboptimally, because
you do box();line();box();line(); and thus alternate between
SW and HW drawing, it still runs as fast as possible with that
command ordering and without mixing up HW and software rendering
in one protection ring.

_If_ the boxes can be done faster in SW, they are.
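A rough sketch of that kind of check, assuming a simple linear cost
model. The struct cost_model fields, choose_box_path() and any numbers
you feed it are hypothetical; how LibGGI really estimates the cost of
an accel call is not spelled out here.

```c
#include <assert.h>

enum path { PATH_SOFTWARE, PATH_ACCEL };

/* Illustrative cost model; units are arbitrary. */
struct cost_model {
    unsigned long accel_call_overhead; /* fixed cost of calling into the kernel */
    unsigned long accel_per_pixel;     /* hardware fill cost per pixel */
    unsigned long sw_per_pixel;        /* software fill cost per pixel */
};

/* Pick the cheaper path for filling a w x h box.
 * Ties go to software, which avoids the kernel call. */
enum path choose_box_path(const struct cost_model *m, int w, int h) {
    assert(w >= 0 && h >= 0);
    unsigned long pixels = (unsigned long)w * (unsigned long)h;
    unsigned long sw = pixels * m->sw_per_pixel;
    unsigned long hw = m->accel_call_overhead + pixels * m->accel_per_pixel;
    return (sw <= hw) ? PATH_SOFTWARE : PATH_ACCEL;
}
```

With an overhead of 2000, 1 per pixel in hardware and 5 per pixel in
software (made-up figures), the break-even point is 500 pixels: a 4x4
box is drawn in software, a 100x100 box goes to the accel.
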

> If the primitive had a kernel-side SW implementation,
> a context-switch would possibly have been avoided.

Yes, but that would blow up kernel size too much. Linus would kill us.

> Intermingled accels/SW operations would be a selling
> point for user-space accels only (Ie. the X-server),

Well. We always said, that it is _a_little_ faster, if you bang the
hardware directly. On the other hand, it is completely unsafe.

You do not gain that much from avoiding the context-switch.
Not enough to do what Win NT has done since 4.0: It has the whole
graphics subsystem at kernel level. Well ... It is faster.
But the price you pay is stability.
The less code in the kernel, the better. On that point, I agree
with Linus.

And after all: The question is pretty academic. Most modern accels
implement the most common primitives, and implement all of them. And
most programs will either use the higher primitives a lot, or rather
use DirectBuffer.

Or to put it in another way: Nothing is perfect. And we should try 
to find a reasonably good middle path.

1. Putting all in either the kernel or userspace is fastest. No question.
1a. Putting all in kernelspace bloats the kernel.
1b. Putting all in userspace is insecure.

2. Splitting things up introduces a speed loss due to the context switches.

For GGI we have chosen method 2, because:

1a. would never be accepted by the kernel folks, and due to the rising
probability of having a bug somewhere, if we put more and more into the
kernel, I have a bad feeling about that as well. Keep critical code
simple. That's why passwd isn't a graphical application.

1b. is insecure, which is reason enough.

Also note, that on really _GOOD_ cards (which are rare), you don't need
to go into the kernel for simple acceleration stuff at all.
You just mmap the accel and bang at it from userspace.

But you should still have to ask the kernel to set up a mode of
4000x3000 at 134Hz, as this might crash-and-burn your monitor ...

> since context-switching would be eliminated, and the
> input feed to X would be heavily buffered, shared
> memory even. Hmm.

Note, that feeding X requires a context-switch. Even a quite expensive
one. It requires a task-switch. Of course you can then again save some
switches _within_ the X server. It's always a tradeoff ...

> I know this is a can of worms, and I can't really
> imagine how Mr. Torvalds would react upon discovering
> an attempt to put software line-drawing into the
> kernel.

I suppose I can tell you. It will have something to do with a wet haddock
or, in that case, probably some larger fish or even a marine mammal.

CU, Andy

-- 
= Andreas Beck                    |  Email :  <andreas.beck@ggi-project.org> =
