  From: Jon M. Taylor <taylorj@ecs.csus.edu>
  To  : GGI mailing list <ggi-develop@eskimo.com>
  Date: Sun, 21 Feb 1999 01:54:08 -0800 (PST)

RFC: Abstract register handling in KGI

	I'll get to the title subject in a bit but first some background:

	In the course of my KGI driver development at Creative, I've quickly
come to realize something: KGI does not have enough development and debugging
tools.  This makes driver development a LOT harder and more time-consuming
than it needs to be.  I mentioned once before that a KGI-based equivalent of
WHATVGA220.EXE would be very useful.  WHATVGA220.EXE comes with VGADOC4B.ZIP
and is a very useful and versatile video card testing and debugging utility. 
It is a combination of the functionality of tunemode, setmon, testpattern and
grabmode, with a nifty interactive register dumping/twiddling feature thrown
in.  Quite nice. 

	Anyway, I was thinking about how best to design such a utility for
KGI drivers, and I came around to thinking about how to implement in the KGI
drivers themselves the ability to dump and twiddle their own registers.  Upon
consideration, I realized that the current state of register handling in KGI
drivers is a real mess.  Registers and register sets are defined ad hoc,
there's too much emphasis on the [S]VGA-style register sets, and the register
I/O methods are way too ad hoc as well.  A chipset_io.inc file is what most
extant KGI drivers use right now.  It will be a real PITA to wrap a good
register-level debugging toolset around this mess without having to write
tons of bridge code for all the special-case handling for each
chipset/ramdac/clockchip's peculiarities.

	An example:  I want my KGI driver debugger/tester to be able to dump
a mode timing set from the clock driver AND also directly from the registers. 
This would be quite useful to determine if your clock driver is doing what it
should be.  But to do this requires knowing what bits of what registers the
PLL M, N, etc values are in, and on top of that you may have to know how to
read/write the registers to do so.  But you really don't want to burden the
mode-timing dump code itself with all that crap.  It should only have to know
about syncstart, syncend, blankstart, etc etc.  It should be able to tell
KGI or the KGI driver, 'Ah, you are an S3 Trio64V+.  My little data structure
here tells me that the mode timing values I want are represented by these
bits in these registers of yours.  Please do whatever you have to do to 
get me the info I need'.
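
	A minimal sketch of that "little data structure" in C.  Every name
here (bit_slice, field_desc, field_read) is hypothetical and not part of KGI,
and the register numbers and bit positions are placeholders rather than a
verified Trio64V+ register map.  The point is only that the dump code walks a
table and calls a driver-supplied read hook:

```c
#include <stddef.h>
#include <stdint.h>

/* `width` bits starting at bit `lsb` of register `reg`, placed at bit `dst`
 * of the assembled value. */
struct bit_slice {
    uint8_t reg, lsb, width, dst;
};

/* A named logical value and the slices it is built from. */
struct field_desc {
    const char *name;
    const struct bit_slice *slices;
    size_t nslices;
};

/* Driver-supplied register read hook; the dump code only ever calls this. */
typedef uint8_t (*reg_read_fn)(void *ctx, uint8_t reg);

/* Assemble a field value without knowing how the registers are accessed. */
uint32_t field_read(const struct field_desc *f, reg_read_fn rd, void *ctx)
{
    uint32_t value = 0;
    for (size_t i = 0; i < f->nslices; i++) {
        const struct bit_slice *s = &f->slices[i];
        uint32_t mask = (1u << s->width) - 1u;      /* width is 1..8 */
        uint32_t bits = ((uint32_t)rd(ctx, s->reg) >> s->lsb) & mask;
        value |= bits << s->dst;
    }
    return value;
}

/* Stand-in for real hardware: a plain array of register values. */
static uint8_t fake_read(void *ctx, uint8_t reg)
{
    return ((const uint8_t *)ctx)[reg];
}

/* Usage: a 9-bit syncstart split across two (made-up) register locations. */
uint32_t demo_syncstart(void)
{
    static const struct bit_slice slices[] = {
        { 0x04, 0, 8, 0 },   /* low 8 bits */
        { 0x5D, 4, 1, 8 },   /* overflow bit */
    };
    const struct field_desc syncstart = { "syncstart", slices, 2 };
    uint8_t regs[256] = { 0 };

    regs[0x04] = 0x5A;
    regs[0x5D] = 0x10;
    return field_read(&syncstart, fake_read, regs);   /* 0x15A */
}
```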

	So, naturally my thoughts turned to how best to rectify this problem. 
If ad hoc is the problem, I thought, a consistent yet abstract framework for
register and register set definition, representation, handling and metadata
association might be the answer.  So, as with LibGGI3D, I tried to strip
things down to the core concepts.  What are registers?  What are register
sets?  What categories does each fall into?  What metadata needs to be
attachable to registers and register sets in order to know what needs to be
done with them, when it needs to be done or not done, and under what
circumstances?  If those questions could be answered, it would then be
possible to abstract out all register I/O from the drivers into a common
framework which would be light-years easier to work with, as well as more
powerful, and would also be flexible enough to handle all the weird things
that are done with registers on video cards.  And there's a lot of really
weird stuff there to handle, so the design had better be good. 

	I believe I have come up with a decent design for such a framework.  
It consists of the following basic underlying abstractions:

* Register I/O spaces.  This is already handled by the underlying KGI
bus/system abstraction, which will be even better in Steffen's new KGI 0.9. 
It needs virtualization, however.  A register I/O space is a linear chunk of
registers and sub-I/O spaces with associated metadata.  Think of I/O spaces
as register sets with common properties or behaviors.

* Registers, which are bitstrings of arbitrary length.  A register can be
physical or virtual.  Physical registers are those which are directly
addressable as a single unit by the hardware.  Virtual registers are composed
of a set of bitstrings, each of which can come from any set of bits in any
register(s), physical or virtual.  Virtual registers can be super-registers
(like a PLL M value composed of 22 bits in three 8-bit registers) or
sub-registers (like when the top four bits of an 8-bit register hold the DAC
mode and the lower three hold clock control bits or something).  Registers
also have associated metadata.

* I/O operations.  These are basic primitive operations that can be 
applied to physical and virtual I/O spaces and physical and virtual 
registers: Read, write, read-modify-write, remap, etc.  Each op has a 
function hook associated with it, much as a kgi_display can have a 
set_origin() hook.

* Metadata.  Registers and register I/O spaces can represent an almost
infinite number of different data types, and furthermore each data type can
be handled in a number of different ways.  In order for the driver to be able
to divide the hardware resources into whatever resource abstractions it needs
to and still be able to do everything it needs to do with those 
resources, a metadata hook must be provided.  Same principle as the 
private data hook in most kgi_* data structures.
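
	The four abstractions above could be sketched in C roughly as
follows.  This is one possible layout under my own assumptions, not an
existing KGI interface: kreg, kreg_ops and kreg_space are invented names, only
read and write hooks are shown, and a byte in memory stands in for the
hardware.  A sub-register's write hook is a read-modify-write on its parent:

```c
#include <stddef.h>
#include <stdint.h>

struct kreg;

/* I/O operation hooks; a fuller version would add read-modify-write, remap... */
struct kreg_ops {
    uint32_t (*read)(struct kreg *r);
    void     (*write)(struct kreg *r, uint32_t v);
};

struct kreg {
    const char *name;
    unsigned width;               /* length of the bitstring, 1..32 */
    const struct kreg_ops *ops;
    void *meta;                   /* driver-private metadata hook */
    uint8_t *backing;             /* physical: where the bits live (fake hw) */
    struct kreg *parent;          /* virtual: register the bits come from */
    unsigned lsb;                 /* virtual: lowest parent bit used */
};

/* A register I/O space: registers plus nested sub-spaces plus metadata. */
struct kreg_space {
    const char *name;
    struct kreg **regs;
    size_t nregs;
    struct kreg_space **subspaces;
    size_t nsubspaces;
    void *meta;
};

static uint32_t width_mask(unsigned w)
{
    return w >= 32 ? 0xFFFFFFFFu : (1u << w) - 1u;
}

static uint32_t phys_read(struct kreg *r) { return *r->backing; }
static void phys_write(struct kreg *r, uint32_t v) { *r->backing = (uint8_t)v; }

static uint32_t sub_read(struct kreg *r)
{
    return (r->parent->ops->read(r->parent) >> r->lsb) & width_mask(r->width);
}

static void sub_write(struct kreg *r, uint32_t v)
{
    uint32_t mask = width_mask(r->width) << r->lsb;
    uint32_t old = r->parent->ops->read(r->parent);
    r->parent->ops->write(r->parent, (old & ~mask) | ((v << r->lsb) & mask));
}

static const struct kreg_ops phys_ops = { phys_read, phys_write };
static const struct kreg_ops sub_ops = { sub_read, sub_write };

/* Usage: set MISC b[3] through a sub-register; the other bits survive. */
uint32_t demo_misc(void)
{
    uint8_t hw = 0x63;
    struct kreg misc = { "MISC", 8, &phys_ops, NULL, &hw, NULL, 0 };
    struct kreg active = { "MISC_DISPLAY_ACTIVE", 1, &sub_ops, NULL, NULL,
                           &misc, 3 };

    active.ops->write(&active, 1);
    return misc.ops->read(&misc);   /* 0x6B */
}
```

	Code that holds only a struct kreg pointer can read or write it
without knowing whether it is physical, virtual, indexed or memory-mapped.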


	By using a system like this, the driver can handle the hardware
registers in any layout, represent them in whatever alternative layout it
wants, and handle all possible dependencies and interrelationships between
them in an infinitely fine-grained manner.  And yet, complete abstraction of
all data types allows higher levels of abstraction in the driver code and/or
in userspace utilities to have access to all of the internal data structures
described above without having to know **ANYTHING** about what the data
structures represent or how to access them!  Such code will of course not
necessarily know what the data structures really represent (although such
intelligence could be duplicated outside the driver if necessary), but since
all the I/O methods and the data structure relationships are abstract, it
doesn't have to! 

	Such a system would also make it MUCH easier to implement
cross-platform abstract I/O in general, which is necessary for portability. 
As I have said before, a Trio64 chipset driver should not have to worry about
what type of bus the card uses, what type of CPU is driving the bus, whether
or not the registers are VGA offset-mapped or MMIO banked or linear MMIO
mapped, whether two 16-bit reads instead of one 32-bit read are needed for a
particular set of registers in a particular memory mapping configuration, etc
etc.  It should program registers and let other code handle the gory details. 
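
	As a rough illustration of "let other code handle the gory details",
here is a sketch of a 32-bit read that falls back to two 16-bit reads when the
bus layer offers nothing wider.  The bus_ops structure and its hooks are
assumptions made for this example, as is the little-endian register layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Access hooks supplied by the bus/platform layer.  read32 may be NULL when
 * the current mapping only supports 16-bit cycles. */
struct bus_ops {
    uint16_t (*read16)(void *ctx, uint32_t offset);
    uint32_t (*read32)(void *ctx, uint32_t offset);
};

/* What the chipset driver calls; it never sees how the read is carried out. */
uint32_t bus_read32(const struct bus_ops *ops, void *ctx, uint32_t offset)
{
    if (ops->read32 != NULL)
        return ops->read32(ctx, offset);

    /* Fall back to two 16-bit reads, low half first. */
    uint32_t lo = ops->read16(ctx, offset);
    uint32_t hi = ops->read16(ctx, offset + 2);
    return lo | (hi << 16);
}

/* Fake 16-bit-only bus backed by a byte array. */
static uint16_t fake_read16(void *ctx, uint32_t offset)
{
    const uint8_t *mem = ctx;
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}

/* Usage: the caller asks for 32 bits and gets them either way. */
uint32_t demo_bus_read(void)
{
    uint8_t mem[8] = { 0x78, 0x56, 0x34, 0x12, 0, 0, 0, 0 };
    const struct bus_ops ops16 = { fake_read16, NULL };

    return bus_read32(&ops16, mem, 0);   /* 0x12345678 */
}
```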

	Let me give some abstract examples of how this virtual register
thingy would work.  This is off the top of my head, so I'm sure this will be
suboptimal.  Also, I basically made up some of the register definitions, bit
field definitions, etc.  Roll with it.  The basic ideas I'm trying to
illustrate should be clear, however:

===============

Chipset: S3 Trio64V+

  Register space 1: VGA 8-bit I/O.
  read_hook: { read all hardware regs, using offsets when necessary }
  write_hook: { same for writing all regs }
  read-modify-write hook: NULL (not applicable to this object)
  metadata:
   dirty bit
   vertical retrace bit  

    Register space 1a: Physical MISC, FCTRL, CRT/SEQ/ARC/GRC index regs, 
    etc.  No indexed 'virtual' regs, physical only.

      Register 1a1: MISC

        Virtual register 1a1a: MISC_VGA_COLOR_OR_MONO = b[2:1] of MISC
        Virtual register 1a1b: MISC_DISPLAY_ACTIVE = b[3] of MISC

    Register space 1b: Virtual indexed CRT reg linear mapping

      Register 1b1: CR36

        Virtual register 1b1a: CR36_BUS_TYPE = b[1:0]

    Register space 1c: Virtual indexed SEQ reg linear mapping

    Register space 1d: Virtual indexed ARC reg linear mapping

    Register space 1e: Virtual indexed GRC reg linear mapping

  Register space 2: 16-bit 2D engine regs

  Register space 3: 16-bit 3D engine regs

  Register space 4: 32-bit PCI config regs

  Register 1: Virtual MCLK PLL M parameter register
    read_hook: { 
      read CR11, CR12 and CR53
      mask the right bits out of CR11, CR12 and CR53
      shift them together to form virtual register
    }
    write_hook: {
     read-modify-write CR11, CR12, and CR53 with masking 
    }
    read-modify-write hook: {
      switch (OPERATION) {
        case INC_MVAL: {
          MVAL++
          mask off CR11 bits and call CR11's write() hook
          mask off CR12 bits and call CR12's write() hook
          mask off CR53 bits and call CR53's read-modify-write() hook
          (CR53 MCLK overflow bits share CR53 with other virtual regs)
          ...etc...

==========================
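
	The virtual MCLK PLL M register from the example above might look
like this in C.  As in the pseudocode, the bit layout is invented: this sketch
assumes the 22-bit value is CR11 (bits 7:0), CR12 (bits 15:8) and the low six
bits of CR53 (bits 21:16), with the top two bits of CR53 belonging to some
other virtual register.  An array stands in for the indexed CRT registers:

```c
#include <stdint.h>

enum { CR11 = 0x11, CR12 = 0x12, CR53 = 0x53 };
#define CR53_MVAL_MASK 0x3Fu    /* assumed: low 6 bits hold M[21:16] */
#define MVAL_MASK      0x3FFFFFu

struct crt_regs { uint8_t cr[256]; };   /* fake indexed CRT register file */

static uint8_t cr_read(struct crt_regs *h, uint8_t i) { return h->cr[i]; }
static void cr_write(struct crt_regs *h, uint8_t i, uint8_t v) { h->cr[i] = v; }

/* Replace only the bits selected by `mask`, leaving the rest untouched. */
static void cr_rmw(struct crt_regs *h, uint8_t i, uint8_t mask, uint8_t v)
{
    cr_write(h, i, (uint8_t)((cr_read(h, i) & ~mask) | (v & mask)));
}

/* read hook: mask the pieces out and shift them together */
uint32_t mval_read(struct crt_regs *h)
{
    return (uint32_t)cr_read(h, CR11)
         | (uint32_t)cr_read(h, CR12) << 8
         | (uint32_t)(cr_read(h, CR53) & CR53_MVAL_MASK) << 16;
}

/* write hook: CR11 and CR12 are owned outright, CR53 is shared */
void mval_write(struct crt_regs *h, uint32_t m)
{
    cr_write(h, CR11, (uint8_t)(m & 0xFF));
    cr_write(h, CR12, (uint8_t)((m >> 8) & 0xFF));
    cr_rmw(h, CR53, CR53_MVAL_MASK, (uint8_t)((m >> 16) & CR53_MVAL_MASK));
}

/* read-modify-write hook, INC_MVAL case */
void mval_inc(struct crt_regs *h)
{
    mval_write(h, (mval_read(h) + 1) & MVAL_MASK);
}

/* Usage: increment across the 16-bit boundary; CR53's top bits must survive. */
static struct crt_regs demo_hw;

uint32_t demo_mval_inc(void)
{
    demo_hw.cr[CR11] = 0xFF;
    demo_hw.cr[CR12] = 0xFF;
    demo_hw.cr[CR53] = 0xC0;    /* top two bits owned by another register */
    mval_inc(&demo_hw);
    return mval_read(&demo_hw); /* 0x010000, with CR53 now 0xC1 */
}
```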

	As you can see, essentially any conceivable representational or
abstraction scheme for the hardware registers can be represented using this
type of system.  As long as you hook the necessary I/O methods in at the 
proper places and define the data structures and their relationships 
correctly, it will all hang together as a consistent whole.  Maintenance 
of state becomes much easier, because each level of abstraction only has to 
know how to maintain its own state and how to manage the operation of the 
levels of abstraction underneath it.

	No more spending hours laboriously tracing through your driver code
and sticking DEBUG()s everywhere to track down where a register wasn't being
written, or was being written but then rewritten incorrectly by a buggy
softcopy handler.  No more wondering if your IS_CRITICAL block is too large
(it usually is).  No more confusion during a modeset, where you have to
unlock regs, then disable the display, then disable interrupts, then program
the PLL logic, then program the dac, then write the register softcopies in
bizarre and hardware-specific ways, then strobe a modeset toggle bit, then
reenable the display, then toggle another strobe bit for the DAC, then... you
get the idea.  Just tell the card's top-level register abstraction to flush
its state to hardware and the whole nasty dependency tree is traversed
correctly!
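
	One way such a flush could be traversed, sketched in C under my own
assumptions: each node carries a dirty flag plus optional pre, commit and post
hooks, and children are flushed in declaration order between the parent's pre
and post steps.  The names (state_node, state_flush) are hypothetical, and the
demo logs the order of operations instead of touching hardware:

```c
#include <stddef.h>
#include <string.h>

struct state_node {
    const char *name;
    int dirty;                              /* softcopy differs from hardware */
    void (*pre)(struct state_node *n);      /* e.g. unlock regs, blank display */
    void (*commit)(struct state_node *n);   /* write this node's own registers */
    void (*post)(struct state_node *n);     /* e.g. strobe, unblank display */
    struct state_node **children;           /* flushed in array order */
    size_t nchildren;
};

static int subtree_dirty(const struct state_node *n)
{
    if (n->dirty)
        return 1;
    for (size_t i = 0; i < n->nchildren; i++)
        if (subtree_dirty(n->children[i]))
            return 1;
    return 0;
}

void state_flush(struct state_node *n)
{
    if (!subtree_dirty(n))
        return;                     /* nothing at or below here needs writing */
    if (n->pre)
        n->pre(n);
    for (size_t i = 0; i < n->nchildren; i++)
        state_flush(n->children[i]);
    if (n->dirty && n->commit)
        n->commit(n);
    n->dirty = 0;
    if (n->post)
        n->post(n);
}

/* Demo hooks: record what would have happened. */
static char flush_log[128];
static void log_event(const char *s) { strcat(flush_log, s); strcat(flush_log, " "); }
static void log_commit(struct state_node *n) { log_event(n->name); }
static void blank(struct state_node *n) { (void)n; log_event("blank"); }
static void unblank(struct state_node *n) { (void)n; log_event("unblank"); }

const char *demo_flush(void)
{
    struct state_node pll  = { "pll",  1, NULL, log_commit, NULL, NULL, 0 };
    struct state_node dac  = { "dac",  0, NULL, log_commit, NULL, NULL, 0 };
    struct state_node crtc = { "crtc", 1, NULL, log_commit, NULL, NULL, 0 };
    struct state_node *kids[] = { &pll, &dac, &crtc };
    struct state_node card = { "card", 0, blank, NULL, unblank, kids, 3 };

    flush_log[0] = '\0';
    state_flush(&card);
    return flush_log;   /* "blank pll crtc unblank " */
}
```

	The clean dac node is skipped entirely, and the display is blanked
and unblanked exactly once around whatever actually needed writing.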

	This would also make optimization for speed much easier.  At each 
level of abstraction, caching/pipelining/queuing can be implemented 
within a nice, clean, well-defined domain of operation.  Want to have 
your display updates synchronized to the vertical blank?  Just have a 
high-level object be able to intelligently queue 3D accel commands and 
flush the queue on vblank interrupt.  On cards with multiple hardware 
buffers, a continuous stream of high-level drawing ops could be taken in 
at high speed, buffered, and drawn to a back buffer which is flipped to 
the front on a vblank interrupt.  
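
	A bare-bones sketch of the queue-then-flush-on-vblank idea, assuming
a fixed-size ring of accel commands and an execute hook that submits one
command to the hardware.  All names are hypothetical, and the locking or
interrupt masking a real driver would need between the drawing path and the
vblank handler is left out:

```c
#include <stdint.h>

#define QUEUE_LEN 8u   /* must be a power of two */

struct accel_cmd { uint32_t opcode, arg; };

struct cmd_queue {
    struct accel_cmd slots[QUEUE_LEN];
    unsigned head;                               /* next command to execute */
    unsigned tail;                               /* next free slot */
    void (*execute)(const struct accel_cmd *c);  /* hardware submit hook */
};

/* Called by drawing code at any time; returns -1 when the queue is full. */
int queue_push(struct cmd_queue *q, struct accel_cmd c)
{
    if (q->tail - q->head == QUEUE_LEN)
        return -1;
    q->slots[q->tail++ % QUEUE_LEN] = c;
    return 0;
}

/* Called from the vblank handler; returns the number of commands issued. */
unsigned queue_flush_on_vblank(struct cmd_queue *q)
{
    unsigned n = 0;
    while (q->head != q->tail) {
        q->execute(&q->slots[q->head++ % QUEUE_LEN]);
        n++;
    }
    return n;
}

/* Demo: count executed commands instead of driving an accelerator. */
static unsigned executed;
static void fake_execute(const struct accel_cmd *c) { (void)c; executed++; }

int demo_vblank_queue(void)
{
    struct cmd_queue q = { .execute = fake_execute };

    executed = 0;
    for (uint32_t i = 0; i < 3; i++)
        queue_push(&q, (struct accel_cmd){ 1, i });
    if (executed != 0)
        return -1;                           /* nothing may run before vblank */
    return (int)queue_flush_on_vblank(&q);   /* 3 */
}
```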

	Hierarchical state maintenance and coordinated flushing of this
complexity is a nightmare to make work correctly when things like interrupts
come into play, because the number of potential state domain clashes grows
very large very rapidly.  The only way to be able to coordinate all that 
crap without losing your mind and/or having your driver loaded with edge 
condition bugs is to abstract the hardware and formalize the maintenance 
and representation of hardware state domains.

END OF SPEW

	Whew.  Feedback on this stuff is most welcome.  I have high hopes for
this idea.  I intend to begin implementing such a scheme, bit by bit, in the
KGI driver I am working on at Creative right now.  I have to do something to
simplify the current chaos in order to avoid losing my mind (don't ask...),
but I think that the inherent utility of this way of doing things will
immediately show itself.

Jon

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
	- Scientist G. Richard Seed
