  From: Steffen Seeger <s.seeger@physik.tu-chemnitz.de>
  To  : ggi-develop@eskimo.com
  Date: Wed, 4 Aug 1999 17:06:25 +0200 (CEST)

Re: Acceleration: A summary

Hello,

> Ping-Pong buffers are implemented by taking two memory pages (isn't that a
> little large for 2D acceleration?). One of the pages is mapped into user
> space, the other only in kernel space. Since this needs segfault trapping in
> the kernel, KGICON is not suitable, but KGI is the way to go.
> 
> User-level drivers get a remapped region, with size 2x the memory page
> size. This mmapped region is defined by a base pointer. Writes to this
> region are done in linear (increasing) order, wrapping around after 2x the
> memory page size, like:
> 
> Offset = (Offset + CommandSize) & (2 * Pagesize - 1)
> 
> Each time a write is done at the start of a new page (that is, at offset
> "0" and "Pagesize"), a page fault is generated (because the kernel
> driver has unmapped that page). This page fault is caught by the
> kernel driver, which maps the page, unmaps the other page, and tells some
> accelerator procedure it has work to do. That procedure takes the data of
> the unmapped page and processes it.
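The ping-pong scheme quoted above can be modelled in user space to make the page-flipping order concrete. This is a minimal sketch, not KGI code: "mapped" is just a flag per page, the hypothetical `pp_write()` stands in for the application's stores, and the hypothetical `pp_fault()` plays the part of the kernel's page-fault handler. A page size of 4096 bytes is assumed, and the offset update is the formula given above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP_PAGESIZE 4096u                /* assumed page size */
#define PP_REGION   (2u * PP_PAGESIZE)   /* two pages: ping and pong */

struct pingpong {
	uint8_t  mem[PP_REGION];  /* backing store for both pages */
	int      mapped[2];       /* 1 if page i is currently visible to the writer */
	size_t   offset;          /* next write position in the region */
	unsigned handed_off;      /* pages passed on to the accelerator so far */
	int      last_page;       /* page most recently handed off, -1 if none */
};

static void pp_init(struct pingpong *pp)
{
	memset(pp, 0, sizeof(*pp));
	pp->mapped[0] = 1;        /* the writer starts in page 0 */
	pp->mapped[1] = 0;        /* page 1 stays kernel-only until the first fault */
	pp->last_page = -1;
}

/* Simulated kernel fault handler: map the faulting page, unmap the
 * other one and hand the now-unmapped page to the accelerator. */
static void pp_fault(struct pingpong *pp, int page)
{
	int other = 1 - page;
	pp->mapped[page] = 1;
	pp->mapped[other] = 0;
	pp->last_page = other;
	pp->handed_off++;
}

/* Simulated user write of one command. Assumes commands never straddle
 * a page boundary (e.g. fixed-size commands that divide PP_PAGESIZE). */
static void pp_write(struct pingpong *pp, const void *cmd, size_t size)
{
	int page = (int)(pp->offset / PP_PAGESIZE);
	if (!pp->mapped[page])
		pp_fault(pp, page);   /* a real access here would trap; we call the handler */
	memcpy(pp->mem + pp->offset, cmd, size);
	pp->offset = (pp->offset + size) & (PP_REGION - 1);
}
```

Writing 256 commands of 16 bytes fills page 0 without a fault; the next write lands in page 1, faults, and hands page 0 off. The model deliberately ignores one real-world detail: a driver must not give the writer a page back before the accelerator has finished consuming it.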
> 
> The data can consist of two things:
> 
> 1) Command structures which have to be interpreted by the kernel
> driver. The downside here is that this makes the kernel accelerator
> bigger compared with the 2nd option. The upside is that the kernel only has
> to check coordinates for security (if bad coordinates can crash the system;
> otherwise even that might be omitted). Note from me: 2D acceleration
> in the kernel doesn't take that much space; see for example the ViRGE
> kgicon driver.
> 
> 2) A list of registers and data. The upside here is that the kernel driver
> can be very small, since no interpretation is needed. The downside is that
> the registers and data must be checked (no DMA commands allowed, maybe
> coordinate checking if bad coordinates can crash the video chipset).
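Option 2 comes down to a whitelist check that the kernel runs over a flushed page before touching the hardware. The sketch below is hypothetical throughout: the register numbers, the 1024x768 screen size and `check_reg_list()` are invented for illustration, and a real driver would take them from the chipset documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbers, invented for this example only. */
enum {
	REG_DST_X    = 0x10,
	REG_DST_Y    = 0x11,
	REG_WIDTH    = 0x12,
	REG_HEIGHT   = 0x13,
	REG_FG_COLOR = 0x14,
	REG_DMA_ADDR = 0x40   /* must never be reachable from user space */
};

struct reg_write {
	uint32_t reg;    /* register number */
	uint32_t value;  /* value to be written */
};

#define SCREEN_W 1024u   /* assumed current mode */
#define SCREEN_H 768u

/* Returns 1 if every entry may be passed to the chip, 0 otherwise.
 * Each value is checked on its own; a real driver would also have to
 * check combinations such as x + width. */
static int check_reg_list(const struct reg_write *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t v = list[i].value;
		switch (list[i].reg) {
		case REG_DST_X:
			if (v >= SCREEN_W) return 0;
			break;
		case REG_DST_Y:
			if (v >= SCREEN_H) return 0;
			break;
		case REG_WIDTH:
			if (v > SCREEN_W) return 0;
			break;
		case REG_HEIGHT:
			if (v > SCREEN_H) return 0;
			break;
		case REG_FG_COLOR:
			break;            /* any colour value is harmless */
		default:
			return 0;         /* not whitelisted, e.g. REG_DMA_ADDR */
		}
	}
	return 1;
}
```

Rejecting the whole page on the first bad entry keeps the check simple; whether to drop only the offending entry instead is a driver policy choice.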
> 
> Am I right so far?

Almost. This is the ggi-0.0.9 scheme. In short, the KGI-0.9 extensions are:

1)	Pagesize can vary, but must be a power of two greater than or equal
	to CPU_PAGE_SIZE, e.g. 4k, 8k, ..., 128k

2)	more than one buffer (Page) is allowed (the total size of the
	linear region seen by the application is accordingly bigger).

3)	partial buffers (Offset % Pagesize bytes) can be flushed by accessing 
	(Offset+Pagesize) & (Regionsize-1)

4)	given the proper hardware/trusted code, buffers can be fed directly
	to the accelerator using DMA transfers (i.e. the data protocol
	is driver/hardware specific). Wrong data here may crash the system;
	if it can, the resource may only be exported to trusted code
	(e.g. indicated by a suid-graphics bit).

5)	each mapping is assigned a context and a priority, with context
	switches done transparently by the driver. This allows you to
	have a virtual accelerator for the 2D driver in X, a separate
	(virtual) accelerator for the Mesa3D driver inside X, and further
	mappings for other processes as well (direct rendering).

Planned, but only partially implemented:

6)	dependent contexts may be attached to a context, e.g. to have
	graphics-process-local AGP texture memory, etc.
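Item 5 leaves the switching policy to the driver. One possible policy is sketched here under loudly stated assumptions: each context holds a saved copy of the accelerator state and a count of flushed buffers waiting to run, the driver always serves the highest-priority context with pending work, and it saves/restores state only when that context differs from the one currently loaded. The structures, the eight-register state and `run_one()` are hypothetical, not KGI's.

```c
#include <stddef.h>
#include <string.h>

#define NREGS 8   /* assumed size of the accelerator state */

struct accel_ctx {
	int      priority;      /* higher value runs first */
	unsigned pending;       /* flushed buffers waiting for this context */
	unsigned regs[NREGS];   /* saved accelerator state */
};

struct accel {
	unsigned regs[NREGS];       /* live hardware state (simulated) */
	struct accel_ctx *current;  /* context whose state is loaded */
	unsigned switches;          /* number of state swaps performed */
};

/* Pick the highest-priority context with pending work, or NULL. */
static struct accel_ctx *pick(struct accel_ctx *ctx, size_t n)
{
	struct accel_ctx *best = NULL;
	for (size_t i = 0; i < n; i++)
		if (ctx[i].pending && (!best || ctx[i].priority > best->priority))
			best = &ctx[i];
	return best;
}

/* Run one buffer, switching contexts first if necessary. Returns the
 * context that ran, or NULL if nothing was pending. */
static struct accel_ctx *run_one(struct accel *a, struct accel_ctx *ctx, size_t n)
{
	struct accel_ctx *next = pick(ctx, n);
	if (!next)
		return NULL;
	if (next != a->current) {
		if (a->current)   /* save the outgoing context's state */
			memcpy(a->current->regs, a->regs, sizeof a->regs);
		memcpy(a->regs, next->regs, sizeof a->regs);   /* restore */
		a->current = next;
		a->switches++;
	}
	next->pending--;      /* the buffer would be fed to the hardware here */
	return next;
}
```

Consecutive buffers from the same context cost no state swap, which is what makes the switching cheap enough to be transparent. A strict priority rule like this one can starve low-priority contexts; a real driver might prefer time slices or aging.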

Well-designed hardware can easily use these mechanisms to provide a
very low-level, API-independent, high-performance interface with minimal
kernel-driver code. Broken hardware may need work-arounds, but
that's a problem with the broken hardware, not with the concept.

This is basically the core KGI acceleration concept.
I have not heard any reasonable argument from the 'kernel gurus' as to why
this is bad, other than that it is KGI.

			Steffen

PS: The Permedia is well-designed, which is why I am using it for a reference
driver.

----------------- e-mail: seeger@physik.tu-chemnitz.de -----------------
