  From: Jos Hulzink <josh@stack.nl>
  To  : ggi-develop@eskimo.com
  Date: Tue, 3 Aug 1999 16:26:24 +0200 (CEST)

Re: Matrox GGI accelerator

On Tue, 3 Aug 1999, Rodolphe Ortalo wrote:

> However, I suspect that, to provide such mechanisms, in fact, we
> need to make assumptions on the way the hardware is designed.
> IF the 'safe' acceleration registers are well isolated, say
> between 0x800 and 0x8ff (offsets in the MMIO aperture) then
> we can allow direct access from userspace in that area. (At least
> from the security point of view: safe means that a malicious
> program will not burn your monitor or lock the PCI bus.)
> BUT this is apparently not the case with 'common' hw.

With many video processors, you can forget that. Internally the huge
address range is converted to a small register mapping. The registers for
DMA access are the same as for line drawing (on a ViRGE you even need to
reinitialize the engine if you use 2D and 3D accel at the same time,
because many 2D configuration registers are also Z-buffer configuration
registers). So disabling the DMA address space cannot prevent using the 2D
line address space to do DMA!! (I know... shoot S3 and the others)
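
A minimal sketch of why an offset filter alone may not be enough. The
0x800..0x8ff window comes from the quoted example; the 6-bit decode mask
and all function names are hypothetical and do not describe any real
chip. If the card decodes only a few low address bits, an offset inside
the 'safe' window and one outside it can reach the same register:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 'Safe' window from the quoted example (offsets in the MMIO aperture). */
#define SAFE_MMIO_START 0x800u
#define SAFE_MMIO_END   0x8ffu

/* Hypothetical: the chip decodes only the low 6 bits of an offset. */
#define REG_INDEX_MASK  0x3fu

/* Offset-based access check, as a kernel driver might apply it. */
static bool mmio_offset_is_safe(uint32_t offset)
{
    return offset >= SAFE_MMIO_START && offset <= SAFE_MMIO_END;
}

/* Internal register the (hypothetical) chip selects for an offset. */
static uint32_t decode_register(uint32_t offset)
{
    return offset & REG_INDEX_MASK;
}

/* True if two aperture offsets end up at the same internal register. */
static bool offsets_alias(uint32_t a, uint32_t b)
{
    assert(a != b); /* comparing an offset with itself is meaningless */
    return decode_register(a) == decode_register(b);
}
```

Under these assumptions, 0x810 passes the filter and 0x410 does not, yet
both select the same internal register, so the filter protects nothing.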

<SNIP>

> First:
> 
> It seems people always think of virtualization at the register
> level. But in fact, this is not mandatory. E.g. to access the
> useful acceleration features of a card like a ViRGE or
> Cirrus Logic, you need to virtualize things like
> 'DRAW_BOX', 'COPY_BOX' (2D) or 'LINE', 'TRIANGLE'
> (3D), or TEXTURE_SETUP, etc.

I thought that was called GGI ? :)

> 
> So, first, I'd say that the accesses that could be convenient
> may not always be at the register level. So, a single
> command unit may be longer than a byte, it may be a
> whole struct.
> (I don't say that exporting acceleration registers to
> userspace is bad: I say that there may exist different
> access granularities that may be satisfying enough not
> to bother doing something differently.)
> 
> So, in fact, you need a good way to communicate such commands
> from the userspace application to the graphic card.
> 
> If the command is simply:
>    {register_offset, value}
> then.... well, you will have a lot of them, and in that situation,
> it may be slow, and difficult to control. So, it's typically NOT the
> ideal case. (Except for criticism of the whole kernel
> mediation interest. ;-)
> 
> But if commands look like:
>   { DRAW_BOX, X1, X2, Y1, Y2, COLOR }
> or {TRIANGLE, ... etc... [1] }
> you will have fewer commands than in the above case. Especially
> when they are 2D commands. With 3D you may have a lot of
> commands (1000 triangles for one picture is not unusual) - but
> typically they come in a long stream.

The point is clear, but for huge commands, you will need a kernel-space
interpreter, increasing the size of the acceleration driver. And that
must be small.
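
To make the two granularities concrete, here is a minimal sketch. The
field order of the box command follows the quoted
{ DRAW_BOX, X1, X2, Y1, Y2, COLOR }; the type names, opcode values and
the validation rule are assumptions for illustration, not an existing
GGI or KGI interface. It also shows what the smallest possible
kernel-side 'interpreter' step could look like: a bounds check before
the command is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for struct-level commands. */
enum accel_opcode { ACCEL_DRAW_BOX = 1, ACCEL_COPY_BOX = 2 };

/* Register-level command: many of these are needed per primitive. */
struct reg_write {
    uint32_t register_offset;
    uint32_t value;
};

/* Struct-level command: one of these describes a whole box. */
struct draw_box_cmd {
    uint32_t opcode;          /* must be ACCEL_DRAW_BOX */
    uint16_t x1, x2, y1, y2;  /* inclusive corners, in pixels */
    uint32_t color;
};

/* Access control for one box: it must lie inside the visible mode. */
static bool draw_box_is_valid(const struct draw_box_cmd *cmd,
                              uint16_t width, uint16_t height)
{
    assert(cmd != NULL);
    return cmd->opcode == ACCEL_DRAW_BOX
        && cmd->x1 <= cmd->x2 && cmd->y1 <= cmd->y2
        && cmd->x2 < width && cmd->y2 < height;
}
```

A check like this is cheap for 2D boxes. How much code the same idea
needs for 3D commands is exactly the size concern raised above.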

> 
> Note that, even in this case, you _may_ need to do some
> access control (cf: the example I gave you for the 546x).
> In case someone tries to use 'strange' accelerator
> commands. [2]

Is it possible to create 'trusted software'? I mean, knowing for sure
that the accel driver you give your file handle to for mmapping the accel
space is the GGI driver you wrote yourself?

And with knowing for sure I mean knowing 22898902490247890 % sure, not
just 100 %....

Hmm, just thinking... Open source isn't that cool at all here: Some user
can hack the GGI libs and thus still crash the system. Damn.

> 
> We end up with something that resembles ioctl, but
> the problem is that we face long streams of commands,
> and _very_ _very_ impatient users.
> 
> So, if, each time one command is issued by the application
> software there is a switch to kernel context, an access
> control check, and the emission of an accelerated
> command to the graphic card before going back to
> the application software for the next command....
> Well, we end up with only 4x or 5x acceleration in
> 2D. (No figures available yet for 3D. And that's a
> pity.)
> [This is the 'ioctl' way.]

I got 10 to 11 times here for 2D. I never thought a ViRGE was that much
faster at copying data than a Pentium 2 233, so I'm happy with it :) I
must say, my kernel and libs are built with pgcc -O7 -march=pentiumpro,
so ioctl will be fast here, I guess...

> However, we can take advantage of the fact that
> these commands come in 'stream' to process
> them in whole batches.
> This is the idea of the pingpong buffer. It's pretty
> similar to buffering in fact.
> 2 mmap pages are set up. But only one of them
> is available in userspace. Both are swapped as soon
> as the userspace page is filled:
> The empty page is put up in userspace and the application
> program can continue to put its commands in it.
> The other page is handled by the card driver which
> reads the commands, controls them, and issues them
> to the card.

Heh, I finally got the idea. But define mmap pages please. Are these 1)
part of the framebuffer (video card RAM), 2) kernel memory, or 3)
something very logical my stupid mind forgets?

And how do you tell a program that an mmapped space it has a pointer to is
no longer valid? How does the kernel know that the buffer is full? Still
ioctl? I'm thinking of an ioctl that asks the kernel to switch buffers and
does not return before the other buffer is empty and ready to be filled by
the program again.
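
A minimal single-threaded sketch of the ping-pong idea, as a plain
userspace simulation. It assumes ordinary memory for both pages and
models the swap as an explicit pp_flush() call, which plays the role of
the blocking ioctl suggested above. All names and the page size of 4
commands are hypothetical; a real driver would swap page mappings and
drain the full page while the application keeps writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_SLOTS 4  /* hypothetical: commands that fit in one page */

struct pingpong {
    uint32_t page[2][PP_SLOTS];
    size_t   fill;       /* commands written to the user page so far */
    int      user_page;  /* index of the page the application fills */
    uint64_t issued;     /* commands the 'driver' has issued in total */
};

static void pp_init(struct pingpong *pp)
{
    pp->fill = 0;
    pp->user_page = 0;
    pp->issued = 0;
}

/* Driver side: read each command, check it, issue it to the card. */
static void pp_drain(struct pingpong *pp, int page, size_t count)
{
    assert(count <= PP_SLOTS);
    for (size_t i = 0; i < count; i++) {
        (void)pp->page[page][i];  /* real driver: validate, then issue */
        pp->issued++;
    }
}

/* Swap pages: the application gets the empty one, the driver the other. */
static void pp_flush(struct pingpong *pp)
{
    int    full  = pp->user_page;
    size_t count = pp->fill;

    pp->user_page = 1 - full;
    pp->fill = 0;
    pp_drain(pp, full, count);
}

/* Application side: append one command, swapping when the page is full. */
static void pp_put(struct pingpong *pp, uint32_t cmd)
{
    pp->page[pp->user_page][pp->fill++] = cmd;
    if (pp->fill == PP_SLOTS)
        pp_flush(pp);
}
```

In this model the application never sees an invalid pointer, because it
always writes through user_page. A sync is simply a pp_flush() on a
partly filled page.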

> 
> You can put whatever command you want in that
> page (reg+value or complex commands). But the
> idea is that the latency of issuing ONE command
> will be greatly reduced, and this is very desirable,
> especially for 3D.
> Now, of course, you will always find someone who
> needs something even faster and who wants full
> control over this or that register. Well... I'd give
> it then (or cancel the account).

Direct register access? Well... There is an OS for that, called Windows.
It has some more nice features, including crashing every 5 minutes. Maybe
we can include some crashing features for compatibility (see notes below).

I'm just wondering to myself how big a pingpong buffer must be for optimal
speed. You don't want a sync to take ages, just because there was a huge
buffer still waiting to be filled completely that still needs to be
executed.
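
The trade-off can be written down as a rough cost model. This is only a
sketch: it assumes a fixed cost per kernel switch and a fixed cost per
issued command, and both function names are made up for illustration.
No measured figures are implied.

```c
#include <assert.h>
#include <stdint.h>

/* Average cost per command when one kernel switch is shared by a whole
 * batch. A bigger batch spreads the switch cost over more commands. */
static uint64_t amortized_cost_ns(uint64_t switch_ns, uint64_t cmd_ns,
                                  uint64_t batch)
{
    assert(batch > 0);
    return cmd_ns + switch_ns / batch;
}

/* Time a sync may have to wait for one full page that is still queued.
 * A bigger batch makes this worse. */
static uint64_t worst_case_sync_ns(uint64_t cmd_ns, uint64_t batch)
{
    return cmd_ns * batch;
}
```

With made-up numbers (1000 ns per switch, 100 ns per command), going
from 1 to 10 commands per buffer cuts the average cost from 1100 ns to
200 ns per command, while the sync wait for a full buffer grows from
100 ns to 1000 ns. The gain per command flattens out as the batch
grows, but the sync wait keeps growing linearly.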

> Rodolphe (Ortalo)
> 
> 
> [1] I don't remember the parameters in fact... You need
> the top, left, right and 2 slopes, no? And the color or
> texture of course? :-(
> [2] Note that both the commands AND the accelerator
> are strange. ;-)

[1] You forget the fogging, alpha, and time-before-crashing parameters
(the last for Win98 compatibility).

[2] Hey, aren't accelerator and strange the same? Besides, you forget
the accelerator programmers.

Jos

P.S. Sorry for this huge number of questions. I'm an electrical engineer
learning kernel programming, not a real engineer in computer science.
