  From: Jon M. Taylor <taylorj@gaia.ecs.csus.edu>
  To  : ggi-develop@eskimo.com
  Date: Tue, 3 Aug 1999 16:30:54 -0700 (PDT)

Re: Matrox GGI accelerator

On Tue, 3 Aug 1999, Jos Hulzink wrote:

> On Tue, 3 Aug 1999, Rodolphe Ortalo wrote:
> 
> > 
> > So, first, I'd say that the accesses that could be convenient
> > may not always be at the registers level. So, a single
> > command unit may be longer than a byte, it may be a
> > whole struct.
> > (I'm not saying that exporting acceleration registers to
> > userspace is bad: I'm saying that there may exist different
> > access granularities that may be satisfying enough not
> > to bother doing something differently.)
> > 
> > So, in fact, you need a good way to communicate such commands
> > from the userspace application to the graphic card.
> > 
> > If the command is simply:
> >    {register_offset, value}
> > then.... well, you will have a lot of them, and in that situation,
> > it may be slow, and difficult to control. So, it's typically NOT the
> > ideal case. (Except for criticism of the whole kernel
> > mediation interest. ;-)
> > 
> > But if commands look like:
> >   { DRAW_BOX, X1, X2, Y1, Y2, COLOR }
> > or {TRIANGLE, ... etc... [1] }
> > you will have fewer commands than in the above case. Especially
> > when they are 2D commands. With 3D you may have a lot of
> > commands (1000 triangles for one picture is not unusual) - but
> > typically they come in a long stream.
>
> The point is clear, but for huge commands, you will need a kernel-space
> interpreter, increasing the size of the acceleration driver. And that
> must be small.

	Which is why you design the kgicommand interface for your driver to
be as close to the actual hardware command interface as possible, so you have
as little translation work as possible that must be done for each command. 
The translation and sanity-checking process is what kills performance.  If
people want to run their drivers as fast as possible but with a chance to
lock up the accelerator, the solution is not to just give them direct access
to the registers but to turn off sanity checking in the KGI driver's command
interpreter.  Luckily, most new hardware does not need too much sanity
checking anymore, and people will not expect blazing performance out of their
old hardware anyway....
 
> > 
> > Note that, even in this case, you _may_ need to do some
> > access control (cf. the example I gave you for the 546x),
> > in case someone tries to use 'strange' accelerator
> > commands. [2]
> 
> Is it possible to create 'trusted software'? I mean, knowing for sure
> that the accel driver you give your file handle to for mmapping accel
> space is the GGI driver you wrote yourself?

	That is what XFree86 does.

> And by knowing for sure I mean knowing 22898902490247890 % sure, not
> just 100 %....

	Not unless you implement some sort of formal authentication protocol,
which is a bit out of place in a device driver IMHO.

> Hmm, just thinking... Open source isn't that cool at all here: Some user
> can hack the GGI libs and thus still crash the system. Damn.

	If the KGI drivers are open-source, people can hack them and crash
the system as well.  In fact I do that almost every day |->.  But the average
end-user expects their drivers to work correctly before all else. 
Potentially unsafe performance tweaks such as boosting the RAMDAC clock rate
are allowed by Windows drivers, but they are certainly not the default.  In
fact, many such tweaks require third-party utilities to turn on.
 
> > However, we can take advantage of the fact that
> > these commands come in 'stream' to process
> > them in whole batches.
> > This is the idea of the ping-pong buffer. It's pretty
> > similar to double buffering, in fact.
> > Two mmap pages are set up. But only one of them
> > is available in userspace. Both are swapped as soon
> > as the userspace page is filled:
> > The empty page is put up in userspace and the application
> > program can continue to put its commands in it.
> > The other page is handled by the card driver which
> > reads the commands, checks them, and issues them
> > to the card.
> 
> Heh, I finally got the idea. But please define mmap pages. Are these 1)
> part of the framebuffer (the video card's RAM), 2) kernel memory, or 3)
> something very logical my stupid mind forgets?

	Kernel memory.  You ask the KGI driver for a mapping and one page
shows up in userspace.
 
> And how do you tell a program that a mmapped space it has a pointer to is
> no longer valid?

	You don't need to.  When the page is full, the KGI driver's
page-fault handler maps the second page into userspace at the end of the
first page's address space, so the userspace code can continue writing into a
virtual linear address space of arbitrary length.  All the kernel does is
keep swapping the two pages in and out so that userspace can continuously
fill one while the kernel continuously drains the other.

> How does the kernel know that the buffer is full?

	A page fault, when userspace writes off the end of the buffer.

> Still an ioctl? I'm thinking of an ioctl that asks the kernel to switch
> buffers and does not return until the other buffer is empty and ready to
> be filled by the program again.

	A "flush me please" ioctl would be necessary when userspace needs to
implement a glFinish() call or something similar.  Otherwise the last
partially filled buffer page would not get flushed until the next
frame.

> > You can put whatever command you want in that
> > page (reg+value or complex commands). But the
> > idea is that the latency of issuing ONE command
> > will be greatly reduced, and this is very desirable,
> > especially for 3D.
> > Now, of course, you will always find someone who
> > needs something even faster and who wants full
> > control over this or that register. Well... I'd give
> > it then (or cancel the account).
> 
> Direct register access? Well... There is an OS for that, called Windows.
> It has some more nice features, including crashing every 5 minutes. Maybe
> we can include some crashing features for compatibility (see notes below).
> 
> I'm just wondering how big a ping-pong buffer must be for optimal
> speed. You don't want a sync to take ages just because a huge buffer
> of commands is still waiting to be executed.

	If you use the flush-on-pagefault approach, you are guaranteed that
the userspace code cannot get very far out of sync with the kernel code.  A
4KB page cannot hold a huge number of commands as far as 3D rendering is
usually concerned, and for slower stuff like 2D GUI accels you will probably
see userspace doing its own manual flushes.

Jon

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
	- Scientist G. Richard Seed
