  From: Jon M. Taylor <taylorj@gaia.ecs.csus.edu>
  To  : ggi-develop@eskimo.com
  Date: Tue, 3 Aug 1999 16:02:32 -0700 (PDT)

Re: Matrox GGI accellerator

On Tue, 3 Aug 1999, Rodolphe Ortalo wrote:

> James Simmons wrote:
> 
> > They know that accel handling, not accel registers, belongs in the kernel.
> 
> Yes, that's the usual distinction between the 'authorization to use'
> and 'real use' of a resource. Of course, initial authorization as well
> as register access setup should be done by privileged software (and,
> as these resources are hardware resources, it really seems to me
> that the target software is the kernel).
> The 'normal usage' accesses may be done differently - but, well,
> it often seems to depend on the hardware design.

	The kernel controls access to various regions of hardware address
spaces.  What it gives to userspace is an abstract, virtualized acceleration
context.  As long as the concept of 'accelerator context' is kept nice and
abstract and generic, pretty much anything can be put behind it.
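The split between 'authorization to use' and 'real use' can be made concrete with a small kernel-side context table. This is a minimal sketch under stated assumptions: the names `accel_open`, `accel_may_submit` and `accel_close` are hypothetical, and a process id stands in for whatever credential a real driver would check. Privileged code hands out an opaque handle once; every later request only has to prove ownership of that handle, and no hardware register is ever exposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kernel-side table of accelerator contexts. Userspace
 * only ever sees an index into this table, never the hardware. */
#define MAX_CONTEXTS 8

struct accel_context {
    bool in_use;
    int  owner_pid;   /* process authorized to use this context */
};

static struct accel_context contexts[MAX_CONTEXTS];

/* "Authorization to use": done once by privileged code.
 * Returns a handle, or -1 if every context is taken. */
int accel_open(int pid)
{
    for (int i = 0; i < MAX_CONTEXTS; i++) {
        if (!contexts[i].in_use) {
            contexts[i].in_use = true;
            contexts[i].owner_pid = pid;
            return i;
        }
    }
    return -1;
}

/* "Real use": each later request names a handle, and the kernel only
 * checks that the caller owns it. */
bool accel_may_submit(int handle, int pid)
{
    if (handle < 0 || handle >= MAX_CONTEXTS)
        return false;
    return contexts[handle].in_use && contexts[handle].owner_pid == pid;
}

/* Release a context so another process can be authorized. */
void accel_close(int handle)
{
    assert(handle >= 0 && handle < MAX_CONTEXTS);
    contexts[handle].in_use = false;
}
```

Whatever sits behind a handle (a command FIFO, a DMA buffer, a software renderer) can change per chipset without changing this check.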

> > That is the distinction that has to get across. There has been some
> > discussion on this. In fact, I had a talk with one of the kernel developers
> > today about this. The BIGGEST thing is the need for proper virtualization
> > of the accel engine. Security naturally comes out of this. I have been
> 
> I understand that such virtualization is very difficult to obtain. 

	Not if the accel interface is abstracted enough from the hardware. 

> In fact,
> from all the issues that have been discussed in the past on the
> GGI project wrt. that, it seems that a single 'perfect' approach
> may not exist. 

	I would say that the concept of an accel command FIFO with associated
state mapping buffers should be close enough to a perfect approach that any
hardware that I know about could be handled well this way...?
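The core of the command FIFO idea is a single-producer, single-consumer ring: userspace appends commands, the driver removes them and feeds the hardware. The sketch below assumes a fixed-size command record (`struct accel_cmd` and its fields are hypothetical), runs in a single thread, and leaves out both the memory barriers a real shared buffer would need and the associated state mapping buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accel command: an opcode plus a few operands. */
struct accel_cmd {
    uint32_t opcode;
    uint32_t args[4];
};

#define FIFO_SIZE 16  /* power of two, so indices can be masked */
static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0,
              "FIFO_SIZE must be a power of two");

struct cmd_fifo {
    struct accel_cmd slots[FIFO_SIZE];
    uint32_t head;  /* free-running count of commands written (producer) */
    uint32_t tail;  /* free-running count of commands read (consumer) */
};

/* Producer side (userspace): fails when the FIFO is full. */
bool fifo_push(struct cmd_fifo *f, const struct accel_cmd *c)
{
    if (f->head - f->tail == FIFO_SIZE)
        return false;               /* full: producer must wait */
    f->slots[f->head & (FIFO_SIZE - 1)] = *c;
    f->head++;
    return true;
}

/* Consumer side (driver): fails when there is nothing to send. */
bool fifo_pop(struct cmd_fifo *f, struct accel_cmd *out)
{
    if (f->head == f->tail)
        return false;               /* empty: hardware has nothing to do */
    *out = f->slots[f->tail & (FIFO_SIZE - 1)];
    f->tail++;
    return true;
}

/* Number of commands queued; unsigned wraparound keeps this correct. */
uint32_t fifo_count(const struct cmd_fifo *f)
{
    return f->head - f->tail;
}
```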

> > working on the design for /dev/gfx to deal with such issues. I'm modeling
> > /dev/gfx after the way it works on SGI workstations. Right now if I were
> 
> SGI hardware is probably the strongest hw design. 

	Up until a couple of years ago, this was true.  However, after having
seen the guts of Savage4 and TNT2, I can say that this is not true anymore,
and after the next generation of PC video chipsets hits the streets, SGI will
be left in the dust.  That is why they have chosen to partner so closely with
nVidia recently.

> So you may have the possibility
> to cleanly separate access control from 'functional' accesses - that is to
> say that you can feasibly define a /dev/gfx interface. But with other
> types of hardware it may be difficult.

	I am not so sure that we should be slavishly following SGI's 
/dev/gfx interface, however good the general idea of a virtualized accel 
engine device may be.
 
> > stations. Plus, with the window idea you can mmap the zbuffer the size
> > of the window. Even if the card only allows a zbuffer the size of the
> > mode, /dev/gfx will only mmap the window area of the zbuffer. This way
> 
> To do that you really need very fine-grained control. Is it really possible to
> mmap any portion of an SGI card framebuffer (at 1 pixel resolution)? Don't
> you have things like 'page granularity' etc. that would hinder the idea?
> Note that this idea addresses confidentiality of the fb in fact, no?
> Do you really think that window granularity is necessary? Personally,
> I'd say that a secure application could simply open a _new_ graphic
> console. This would prevent a secure and an insecure application
> from displaying within two windows on the _same_ graphic console. But
> would that be a (security) feature or a bug? (Your mileage may vary,
> of course, but the answer is not so evident - to me at least. :-)

	Correct.  If you cannot do fine-grained locking in hardware, you have
to make your context virtualization more abstract and more disconnected from
the hardware.  And it is only recently with AGP that PC video hardware has
even had the ability to manage page-mapping over its address aperture, never
mind doing its own fault handling, etc.  This type of functionality cannot be
assumed to exist when designing an interface to the accelerator. 
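The page granularity problem can be quantified. The sketch assumes a linear framebuffer, 4096-byte MMU pages and a pitch given in bytes; `window_bytes` and `mapped_bytes` are hypothetical helpers. Each scanline of a window is a separate span of memory, and the MMU can only grant whole pages, so mapping "just the window" also exposes neighbouring pixels on every page the window touches.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed MMU page size */

/* Bytes that actually belong to a w x h window. */
uint64_t window_bytes(uint32_t w, uint32_t h, uint32_t bytes_pp)
{
    return (uint64_t)w * h * bytes_pp;
}

/* Bytes that become visible if every page touched by the window at
 * (x, y) is mapped. Each row is rounded out to whole pages; rows are
 * visited in increasing address order, so a page shared with the
 * previous row is counted only once. */
uint64_t mapped_bytes(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                      uint32_t pitch, uint32_t bytes_pp)
{
    uint64_t total = 0;
    uint64_t last_page_end = 0;  /* end of the highest page counted so far */
    assert(bytes_pp > 0);
    for (uint32_t row = y; row < y + h; row++) {
        uint64_t start  = (uint64_t)row * pitch + (uint64_t)x * bytes_pp;
        uint64_t end    = start + (uint64_t)w * bytes_pp;   /* exclusive */
        uint64_t pstart = start / PAGE_SIZE * PAGE_SIZE;
        uint64_t pend   = (end + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
        if (pstart < last_page_end)
            pstart = last_page_end;     /* part already counted */
        if (pend > pstart)
            total += pend - pstart;
        if (pend > last_page_end)
            last_page_end = pend;
    }
    return total;
}
```

For a 100x100 window at 4 bytes per pixel on a 4096-byte pitch, the window holds 40000 bytes but the pages covering it hold 409600 bytes, so roughly ten times as much framebuffer as the window itself becomes readable.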

> > struct gfx_registers {
> >         line_x 0x34;
> >         line_x_mask 0xff12;
> >         line_y 0x35;
> >         ...
> >         ...
> > };
> 
> Again, this requires very fine-grained control over hardware access. Is
> it really feasible to provide such fine access control (at the bit level
> sometimes) without implicitly requiring kernel mediation on each
> access to a register? If it's possible, this approach is IMHO
> pretty good.

	It is impossible on any MMU I have ever heard of. 
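Since an MMU protects whole pages, per-bit rules like the quoted `gfx_registers` masks can only be enforced by software sitting on every write. The sketch reads each quoted pair as a register offset plus a mask of user-writable bits; that interpretation, the `reg_policy` table and `mediated_write` are all assumptions, and the register file is simulated with an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: register offset and the bits userspace may set
 * (numbers taken from the quoted sketch, interpreted as offset/mask). */
struct reg_policy {
    uint32_t offset;
    uint32_t writable_mask;
};

static const struct reg_policy policy[] = {
    { 0x34, 0xff12 },   /* line_x */
    { 0x35, 0xff13 },   /* line_y */
};

/* Simulated register file, indexed by offset for simplicity; a real
 * driver would write to the mapped MMIO aperture instead. */
static uint32_t mmio[0x100];

/* Every user write has to pass through here, i.e. through the kernel:
 * the MMU cannot enforce per-bit masks, so this check is what keeps the
 * protected bits unchanged. */
bool mediated_write(uint32_t offset, uint32_t value)
{
    for (size_t i = 0; i < sizeof policy / sizeof policy[0]; i++) {
        if (policy[i].offset == offset) {
            uint32_t m = policy[i].writable_mask;
            assert(offset < sizeof mmio / sizeof mmio[0]);
            mmio[offset] = (mmio[offset] & ~m) | (value & m);
            return true;
        }
    }
    return false;   /* register not exported to userspace at all */
}
```

The cost is one trip into the kernel per register write, which is exactly the overhead direct register access was supposed to avoid.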
 
> However, I suspect that, to provide such mechanisms, in fact, we
> need to make assumptions on the way the hardware is designed.
> IF the 'safe' acceleration registers are well isolated, say
> between 0x800 and 0x8ff (offsets in the MMIO aperture) then
> we can allow direct access from userspace in that area. (At least
> from the security point of view: safe means that a malicious
> program will not burn your monitor or lock the PCI bus.)
> BUT this is apparently not the case with 'common' hw.

	Correct.  This is why this approach works with SGI's hardware.  
Older PC video hardware is designed to run with one global resource lock 
for all hardware resources (so no protection/privilege levels are 
necessary).  Even in newer hardware that uses command FIFOs, all register 
accesses are assumed to be encapsulated by the device drivers.
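Whether a 'safe' register range such as 0x800-0x8ff can be handed to userspace depends on page boundaries, not just on the range itself. A minimal check, assuming 4 KiB pages; `struct range`, `page_isolated` and `mapped_span` are hypothetical names. A range is only isolated if it starts and ends exactly on page boundaries; otherwise whatever shares those pages is exposed too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u   /* assumed 4 KiB MMU pages */

/* Inclusive range [first, last] of offsets into the MMIO aperture. */
struct range { uint32_t first, last; };

/* True if mapping the pages that cover the range exposes nothing else. */
bool page_isolated(struct range safe)
{
    return safe.first % PAGE_SIZE == 0 &&
           (safe.last + 1) % PAGE_SIZE == 0;
}

/* The span that would actually become visible to userspace. */
struct range mapped_span(struct range r)
{
    assert(r.first <= r.last);
    struct range m = {
        r.first / PAGE_SIZE * PAGE_SIZE,
        (r.last / PAGE_SIZE + 1) * PAGE_SIZE - 1,
    };
    return m;
}
```

With 4 KiB pages, the 0x800-0x8ff example maps as 0x000-0xfff, so every register in that first page, safe or not, becomes writable.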

> > Hum. Can you explain the ping-pong method in detail? I sort of have an
> > idea of how it works, but not quite. Thanks.
> 
> First:
> 
> It seems people always think of virtualization at the register
> level. But in fact, this is not mandatory. 

	I would actually say that under no circumstances should it be
_necessary_ to give userspace direct access to hardware registers in order to
get acceptable performance or implement a direct rendering model.  A command
FIFO with ping-pong buffers will reduce the number of kernel-user transitions
to a level where they are no longer the weak link in the performance chain. 
Latency will go up somewhat, but given that most video hardware cannot
display more than ~100 frames per second anyway, this is unlikely to be a
noticeable problem.
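The saving from ping-pong buffers comes from entering the kernel only when a whole buffer changes hands. In this single-threaded sketch, `kernel_drain` is a hypothetical stand-in that only counts words and finishes immediately; in a real system the kernel would drain one buffer concurrently while userspace fills the other, and userspace would block only if the other buffer were still busy. The buffer size and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 256   /* hypothetical size of each command buffer */

/* Two command buffers: userspace fills the active one while the
 * kernel drains the other. */
struct pingpong {
    uint32_t buf[2][BUF_WORDS];
    size_t   fill;      /* words written into the active buffer */
    int      active;    /* index of the buffer userspace writes to */
    unsigned swaps;     /* user-to-kernel transitions so far */
};

/* Stand-in for the kernel side: a real driver would feed the words to
 * the hardware FIFO or DMA engine. Here it only counts them. */
static size_t words_drained;

static void kernel_drain(const uint32_t *words, size_t n)
{
    (void)words;
    words_drained += n;
}

/* Hand the active buffer to the kernel and switch to the other one. */
void pp_flush(struct pingpong *pp)
{
    kernel_drain(pp->buf[pp->active], pp->fill);
    pp->swaps++;
    pp->active ^= 1;
    pp->fill = 0;
}

/* Write one command word; the kernel is entered only when full. */
void pp_write(struct pingpong *pp, uint32_t word)
{
    if (pp->fill == BUF_WORDS)
        pp_flush(pp);
    assert(pp->fill < BUF_WORDS);
    pp->buf[pp->active][pp->fill++] = word;
}
```

Writing 1000 command words this way costs three transitions while writing, plus one final flush, instead of one transition per word.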

> E.g. to access the
> useful acceleration features of a card like a ViRGE or a
> Cirrus Logic, you need to virtualize things like
> 'DRAW_BOX', 'COPY_BOX' (2D) or 'LINE', 'TRIANGLE'
> (3D), or TEXTURE_SETUP, etc.
> 
> So, first, I'd say that the accesses that could be convenient
> may not always be at the register level. So a single
> command unit may be longer than a byte; it may be a
> whole struct.
> (I don't say that exporting acceleration registers to
> userspace is bad: I say that there may exist different
> access granularities that may be satisfying enough not
> to bother doing something differently.)

	Exactly!  Why go through the hassle of trying to safely allow
userspace to directly access registers (which will be different for every
single chipset), when it just plain is not necessary?
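Commands at the granularity of DRAW_BOX or COPY_BOX can be represented as tagged structs, and the kernel can control them with cheap bounds checks without knowing the register layout. The command set, field types and `cmd_valid` below are hypothetical; the visible area is assumed to start at (0, 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command set: one command is a whole struct. */
enum cmd_op { CMD_DRAW_BOX, CMD_COPY_BOX, CMD_LINE };

struct box { int16_t x, y, w, h; };

struct gfx_cmd {
    enum cmd_op op;
    union {
        struct { struct box dst; uint32_t color; } draw_box;
        struct { struct box src; int16_t dst_x, dst_y; } copy_box;
        struct { int16_t x0, y0, x1, y1; uint32_t color; } line;
    } u;
};

/* True if the box lies entirely inside a width x height area. */
int box_inside(struct box b, int16_t width, int16_t height)
{
    return b.x >= 0 && b.y >= 0 && b.w >= 0 && b.h >= 0 &&
           b.x + b.w <= width && b.y + b.h <= height;
}

/* Reject any command that would touch pixels outside the context's
 * area; accepted commands can be passed on without rewriting. */
int cmd_valid(const struct gfx_cmd *c, int16_t width, int16_t height)
{
    switch (c->op) {
    case CMD_DRAW_BOX:
        return box_inside(c->u.draw_box.dst, width, height);
    case CMD_COPY_BOX: {
        struct box dst = c->u.copy_box.src;   /* same size as source */
        dst.x = c->u.copy_box.dst_x;
        dst.y = c->u.copy_box.dst_y;
        return box_inside(c->u.copy_box.src, width, height) &&
               box_inside(dst, width, height);
    }
    case CMD_LINE:
        return c->u.line.x0 >= 0 && c->u.line.x0 < width &&
               c->u.line.x1 >= 0 && c->u.line.x1 < width &&
               c->u.line.y0 >= 0 && c->u.line.y0 < height &&
               c->u.line.y1 >= 0 && c->u.line.y1 < height;
    }
    return 0;   /* unknown opcode */
}
```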

[snip]
 
> When you think about it, the whole system is pretty
> well adapted to graphics card programming. In fact,
> these 'commands' can be made very similar
> to display lists - so the kernel driver may not
> really need to interpret or rewrite them, it
> simply needs to control them. The latency was
> the big problem, and with that scheme, the
> limiting factor may be the acceleration engines
> hardware fifos (which typically contain a few
> commands) or the speed of the accesses from
> the card to the display lists. (At least: maybe something
> the GGI project will not be blamed for. Well, they
> will find something else... :-( )

	Newer hardware either has deep (hundreds of commands) FIFOs or DMA
command buffers.  Buffering, latency and synchronization of accel commands
will be a significant issue, whether or not this is done in userspace or the
kernel.  The advantage of an in-kernel approach like the ping-pong buffers is
that the flush synchronization can be hooked to interrupts (flush-on-vsync,
flush-on-hardware-fifo-low-watermark, etc.) or to page faults when writing 
off the end of the buffer in userspace, as KGI does with the ping-pong 
buffers.
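The flush triggers listed above can be collected into one decision function that an interrupt or fault handler would call. This is a sketch under assumptions: the field names, the enum and the priority order (buffer full first, then hardware FIFO low, then vsync) are hypothetical design choices, and "buffer full" stands in for the page-fault case of writing past the end of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons for pushing buffered commands to the hardware. */
enum flush_reason { FLUSH_NONE, FLUSH_BUFFER_FULL, FLUSH_FIFO_LOW, FLUSH_VSYNC };

struct flush_state {
    size_t pending;        /* commands buffered but not yet sent */
    size_t buffer_size;    /* capacity of the ping-pong buffer */
    size_t hw_fifo_level;  /* entries still queued in the hardware FIFO */
    size_t low_watermark;  /* refill when the FIFO drains to this level */
    bool   vsync;          /* a vertical-retrace interrupt arrived */
};

enum flush_reason should_flush(const struct flush_state *s)
{
    if (s->pending == 0)
        return FLUSH_NONE;              /* nothing to send */
    if (s->pending >= s->buffer_size)
        return FLUSH_BUFFER_FULL;       /* userspace ran off the end */
    if (s->hw_fifo_level <= s->low_watermark)
        return FLUSH_FIFO_LOW;          /* hardware is about to go idle */
    if (s->vsync)
        return FLUSH_VSYNC;             /* sync with the display refresh */
    return FLUSH_NONE;
}
```

Putting the full-buffer case first keeps userspace from blocking; the FIFO-low case keeps the hardware busy; vsync is the least urgent trigger.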

	The primary concern is to enable the userspace code to write commands
into its buffer as fast as it can, while the KGI drivers simultaneously
translate the kgicommands into hardware-specific commands and feed them into
the hardware FIFOs/DMA buffers.  If the kernel part is not
efficient enough, the userspace code will get blocked a lot when the
ping-pong buffers fill up faster than they can be drained to the hardware. 
If the userspace code is not efficient enough, OTOH, the kernel drivers (and
the hardware they control) will be spending too much time waiting around for
new data to process.

	The bottleneck here is the translation from kgicommands to hardware
commands/register writes.  This is why it is _critical_ that the kgicommand
interface for a particular KGI driver be able to map closely to the actual
hardware command structure, and that the userspace GGI target code cannot
assume that there will be some kind of generic kgicommands which exist across
all drivers.
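When a driver's command format mirrors its chip's own fields, translation reduces to a handful of stores. The chip, its register offsets (`REG_DST_XY` and friends), the packing of x/y into one word and `translate_fill_rect` are all invented for illustration; the point is only the shape of a cheap, driver-specific translation step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-specific command that mirrors the chip's fields. */
struct fill_rect_cmd { uint16_t x, y, w, h; uint32_t color; };

/* One register write as it would be queued to the hardware. */
struct reg_write { uint32_t reg; uint32_t value; };

/* Made-up register offsets for an imaginary chip. */
enum { REG_DST_XY = 0x10, REG_DST_WH = 0x14, REG_FG_COLOR = 0x18, REG_GO = 0x1c };

/* Translate one command into register writes. Returns the number of
 * writes produced, or 0 if the output array is too small. */
size_t translate_fill_rect(const struct fill_rect_cmd *c,
                           struct reg_write *out, size_t max)
{
    if (max < 4)
        return 0;
    out[0] = (struct reg_write){ REG_DST_XY, ((uint32_t)c->x << 16) | c->y };
    out[1] = (struct reg_write){ REG_DST_WH, ((uint32_t)c->w << 16) | c->h };
    out[2] = (struct reg_write){ REG_FG_COLOR, c->color };
    out[3] = (struct reg_write){ REG_GO, 1 };   /* start the operation */
    return 4;
}
```

A generic command set shared by all drivers would force this function to reshuffle or split fields on chips whose layout differs, which is the cost being warned about.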

> However, this scheme does _not_ prevent someone
> from mmapping a full portion of the registers to userspace
> and using them directly from there to obtain maximal
> throughput - POTENTIALLY bypassing the kernel
> driver for any control, of course. (I say potentially
> because this may depend on the hardware features: on
> the SGI you may set up such interaction without
> compromising safety - but frankly, even with SGI
> hardware I would be surprised if security were
> really guaranteed [think malicious! ;-)]. On the Cirrus
> Logic, anyway, forget about both...)

	Again, I question the necessity for this.  In the past, we have
presented the ability to directly map the registers as an issue of allowing
for less safety for those who want maximum speed.  I am increasingly of the
opinion that the extra locking required of the userspace code in this type of
situation removes any potential for real-world performance gains.

Jon 

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
	- Scientist G. Richard Seed

