  From: Martin Eli Erhardsen <mee@daimi.aau.dk>
  To  : ggi-develop@eskimo.com
  Date: Thu, 09 Jul 1998 05:00:14 +0200

Re: Fast lines

Jon M. Taylor wrote:
> 
>         The fastest linedrawing/clipping routine possible uses the x and y
> coordinates of the two endpoints of the line and two corners of the
> clipping region as offsets into a sparse precalculated array of x,y pairs
> stored as offsets into a linear framebuffer + one extra entry at the
> beginning for the number of pixels in the line, making up all possible
> lines.  For a 320x200x8bpp framebuffer, this would require ~3.5 heptabytes
> of RAM, but the computational cost is:
> 
> Setup:
> * One memory read + one register write to set the pixel color
> * Three multiplies to get the array offset
> * One memory read + one register write to set the pixel count
> 
> Pixel-setting loop:
> * One memory read + one register write to get the address of the next pixel
> * One register read + one memory write to set the pixel at that address
>   with the value in the pixel color register
> * One decrement-and-branch-if-not-zero instruction that tests the pixel
>   count register
> 
>         No compares, multiplies, or adds in the loop at all!  If your
> processor can do indirect writes with the offset coming from memory
> without additional cycle cost, that loop reduces to one indirected write
> and the conditional branch.  A good pipeline will reduce this loop to ONE
> CYCLE per pixel.  Beat *that*! |->
> 
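The table-lookup scheme quoted above can be sketched in C on a toy scale. This is only an illustration under loud assumptions: the framebuffer is shrunk to 8x8 so the table fits in a few kilobytes, clipping is left out (the table is indexed by the two endpoints only), and every name here (LineEntry, line_index, build_line_table, draw_line) is made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy dimensions (assumption): 8x8 pixels, so no line exceeds 8 pixels. */
enum { W = 8, H = 8, MAXPIX = 8 };

/* One table entry: the pixel count first, then the framebuffer offsets. */
typedef struct {
    uint8_t count;
    uint8_t offset[MAXPIX];
} LineEntry;

LineEntry line_table[W * H * W * H];

/* Three multiplies to turn the four coordinates into a table index. */
size_t line_index(int x0, int y0, int x1, int y1)
{
    return (size_t)(((x0 * H + y0) * W + x1) * H + y1);
}

/* Fill the table once, using an ordinary all-octant Bresenham walk. */
void build_line_table(void)
{
    for (int x0 = 0; x0 < W; x0++)
    for (int y0 = 0; y0 < H; y0++)
    for (int x1 = 0; x1 < W; x1++)
    for (int y1 = 0; y1 < H; y1++) {
        LineEntry *e = &line_table[line_index(x0, y0, x1, y1)];
        int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
        int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
        int err = dx + dy, x = x0, y = y0;

        e->count = 0;
        for (;;) {
            e->offset[e->count++] = (uint8_t)(y * W + x);
            if (x == x1 && y == y1)
                break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x += sx; }
            if (e2 <= dx) { err += dx; y += sy; }
        }
    }
}

/* The drawing loop: no compares, multiplies, or adds on coordinates,
 * just a table read, a framebuffer write, and a counted branch. */
void draw_line(uint8_t *fb, int x0, int y0, int x1, int y1, uint8_t color)
{
    const LineEntry *e = &line_table[line_index(x0, y0, x1, y1)];
    const uint8_t *p = e->offset;
    uint8_t n = e->count;          /* always at least 1 */

    do {
        fb[*p++] = color;
    } while (--n);
}
```

Call build_line_table() once at startup, then draw_line(fb, 0, 0, 7, 7, color) on a W*H byte buffer. Adding the two clip corners as further index dimensions is what makes the real table so absurdly large.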

I think the fastest would be a dedicated Bresenham instruction.
You would just supply the endpoints, the clip rectangle and the color,
and it would automatically write the pixels and do write combining
where appropriate. This isn't too far-fetched; many graphics chips do
this already.
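
As a rough software picture of what such an instruction would have to do, here is a minimal C sketch of Bresenham line drawing with a clip rectangle. The names (ClipRect, bresenham_clipped) and the 8bpp linear framebuffer layout are assumptions for the example. It tests every pixel against the clip rectangle, which is simple rather than fast; real hardware would more likely clip the endpoints first, and write combining is not modeled at all.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clip rectangle with inclusive bounds (hypothetical type).
 * Assumption: the rectangle lies entirely inside the framebuffer. */
typedef struct {
    int xmin, ymin, xmax, ymax;
} ClipRect;

/* Draw a line from (x0,y0) to (x1,y1) into an 8bpp linear framebuffer
 * with `pitch` bytes per row, writing only pixels inside `clip`. */
void bresenham_clipped(uint8_t *fb, int pitch,
                       int x0, int y0, int x1, int y1,
                       ClipRect clip, uint8_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;             /* combined error term for both axes */

    for (;;) {
        /* Per-pixel clip test: skip anything outside the rectangle. */
        if (x0 >= clip.xmin && x0 <= clip.xmax &&
            y0 >= clip.ymin && y0 <= clip.ymax)
            fb[y0 * pitch + x0] = color;

        if (x0 == x1 && y0 == y1)
            break;

        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

Because the endpoints themselves may lie outside the framebuffer, only the clip test keeps the writes in bounds, which is why the clip rectangle has to stay inside the buffer.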

When someone integrates the graphics chips with the CPU, it
shouldn't be too difficult to add the whole OpenGL library
to the instruction set, because on x86 we already have much of 
the math library there. OpenGL would take lots of transistors,
but it would enable us to avoid the slow PCI and issue graphics
commands at the full CPU clock speed.

Memory bandwidth would of course be a problem, but you could just
add lots of Direct Rambus channels, maybe 8 or 16. This should
solve the bandwidth problem.

Wait a minute! What I am describing is NOT a CPU with built-in graphics,
but a MONSTER graphics chip with a built-in CPU.

It could only happen if graphics demands grow much faster than
CPU demands, i.e. everything gets designed to run Quake 6 as fast
as possible, while the normal programs are neglected.

Nevertheless, when Intel takes a step backward and releases a
Pentium II without L2 cache especially for games, I can't rule this out.
