  From: Andreas Beck <becka@rz.uni-duesseldorf.de>
  To  : Vesna i Lale <vzivkovic@cent.co.yu>
  Date: Tue, 23 Nov 1999 21:26:56 +0100

Re: Something's wrong with my elaboration???

Hi !

[I CC this to the list, as others might have wondered about similar effects.]

> I have "tested" "./showaccel" (under /degas/lib/libggi/programs/check) 
> and "./demo" (under <same-until-check>/demos)
> for GGI_DISPLAY="x", GGI_DISPLAY="xlib", GGI_DISPLAY="fbdev"
> (BTW, what is the difference between "x" and "xlib", I mean it is 
> obvious in executing but "what is what?", if "x" is "X-server", 
> than what is "x-lib"?)

xlib is an alternate way to talk to the X server. The default way (display-x)
allocates a piece of memory on the client side, draws into it and flushes
it to the server at regular intervals.

This is of course unaccelerated, except maybe for the final blit. It is very
efficient if lots of little changes are made each frame and few high-level
(acceleratable) commands are executed per frame. This is the default, as most
programs fall into this category.

xlib, on the other hand, translates every LibGGI primitive call into calls to
Xlib, thus handing the requests through to the X server, which can potentially
accelerate them. However, this initiates a network or pseudo-network (unix
domain socket) transfer for each primitive. Thus it is only fast if you
call only a few primitives per frame, and best if those are acceleratable as
well.
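
To make the trade-off concrete, here is a toy cost model of the two targets.
It is only a sketch: all four cost constants are invented for illustration
and are not measurements, and the function names are hypothetical.

```python
# Toy cost model of display-x vs. display-xlib.
# Every constant below is an illustrative assumption, not a measurement.
DRAW_LOCAL = 1.0    # one primitive drawn into the client-side buffer
FLUSH = 500.0       # pushing the whole buffer to the X server, once per frame
REQUEST = 20.0      # per-primitive transport overhead (socket transfer) for xlib
DRAW_SERVER = 0.1   # one primitive executed (possibly accelerated) by the server


def frame_cost_x(primitives: int) -> float:
    """display-x style: draw everything locally, then flush once."""
    return primitives * DRAW_LOCAL + FLUSH


def frame_cost_xlib(primitives: int) -> float:
    """display-xlib style: every primitive becomes its own server request."""
    return primitives * (REQUEST + DRAW_SERVER)


def cheaper_target(primitives: int) -> str:
    """Name of the target with the lower modelled cost for one frame."""
    if frame_cost_xlib(primitives) < frame_cost_x(primitives):
        return "xlib"
    return "x"
```

With these made-up numbers, cheaper_target(5) picks xlib and
cheaper_target(1000) picks x. Where the crossover lies depends entirely on
the constants; only the shape of the trade-off is the point here.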

> Whn I run ./showaccel I got this results:
> (this is table with 4 cols , so 'maximaze' letter)
> TARGET        ACC        NOT        RAT
> only-xsrv    158904    109385    =1.45...
> x+1procs     79499     54646    =1.45...

The speedups are only due to better optimized code and less calling overhead in
libggi as compared to the test program's internal code.

> ---------------------------------------------------------
> only-xlib    296782        4275    =69.4222...
> xlib+1pr.    178438        2811    =63.47....

Oh - yes. showaccel falls into the "few commands, high accel gain"
category. Here I get (1 min runs on a PII/300 with a Permedia2):
x:    Ratio :   241890 :   214815 = 1.126039
xlib: Ratio :    81260 :     1685 = 48.225519

This nicely demonstrates the calling overhead xlib has. I wonder a bit about
your values - how long did you measure ? 60 seconds each ? That's a bit weird,
then. The absolute values show a very fast system for xlib, but a rather slow
one for x ...
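
A quick way to put the two sets of numbers side by side, as a sketch. The
counts are copied from this mail; reading the ACC and NOT columns as
"accelerated" and "not accelerated" is an assumption, and the helper names
are hypothetical.

```python
# (ACC, NOT) counts from the showaccel runs quoted in this mail, 60 s each.
RUNS = {
    "yours": {"x": (158904, 109385), "xlib": (296782, 4275)},
    "mine": {"x": (241890, 214815), "xlib": (81260, 1685)},
}


def ratio(acc: int, non_acc: int) -> float:
    """The ACC : NOT ratio, as printed by showaccel."""
    return acc / non_acc


def relative_speed(target: str) -> float:
    """ACC count of 'yours' divided by ACC count of 'mine' for one target."""
    return RUNS["yours"][target][0] / RUNS["mine"][target][0]


for target in ("x", "xlib"):
    print(target, round(relative_speed(target), 2))
```

A value below 1 for x together with a value well above 1 for xlib is the
odd combination I mean.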

What's your system like ? (Hardware, X server used ?)
 
> ---------------------------------------------------------
> o.kgicon    175953    127056    =1.4229861
> kgicon+1    88767      62786    =1.4138....
> (ALL in 60 sec)

Here:
fbcon: Ratio :   274042 :    48519 = 5.648138

> (to explain) This "+1" or + "1proc" means that on different console (or 
> in different window) there is one "heavy" process running
> (some process that constantly loads disk-file in regular time 
> intervals). 

That is not a very good test, as it does not put much load on the CPU (except
on PIO-mode ISA systems), but only on the bus, and not very heavily at that.
Calculating Mandelbrot sets (CPU) or writing random values to a large memory
space (bus) might be better, but it depends on what exactly you want to
test.
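
A minimal sketch of the two kinds of load generator, with hypothetical
function names and arbitrary default sizes. Note that in an interpreted
language the memory scribbler is also CPU-bound to a good part; a compiled
version would stress the bus more directly.

```python
import random


def mandelbrot_iterations(width: int = 80, height: int = 40,
                          max_iter: int = 200) -> int:
    """Pure CPU load: total iteration count over a small Mandelbrot grid."""
    total = 0
    for row in range(height):
        for col in range(width):
            c = complex(-2.0 + 3.0 * col / width, -1.0 + 2.0 * row / height)
            z = 0j
            for _ in range(max_iter):
                z = z * z + c
                if abs(z) > 2.0:
                    break
                total += 1
    return total


def scribble_memory(size: int = 1 << 20, writes: int = 100_000,
                    seed: int = 0) -> bytearray:
    """Memory load: write random bytes to random offsets of a large buffer."""
    rng = random.Random(seed)
    buf = bytearray(size)
    for _ in range(writes):
        buf[rng.randrange(size)] = rng.randrange(256)
    return buf
```

To use either one as a load producer, call it in an endless loop in a
second console while the benchmark runs.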

> So I am trying to simulate what happens in multitasking pardon multiprocess 
> environment, meaning different processing doing at the same time and how 
> does this affects kgicon and how this affects same "ggilib" but under 
> existing targts (non-kernel-drivers)...

Without the advanced queueing mechanisms that few drivers implement so far,
differences shouldn't be too big.

> So, FIRST I was little confused when I detected that in 60 sec, 
> "xlib" has 1.6 times faster ACCELs than "kgicon", 

Are you using a stock XFree server for your card ? It may well be better
optimized for it. KGIcon in the standard setup with ioctl-accel is not too fast
compared to a good X server. However, I wonder why the server is so slow with
the x target ... Was the display remote ?

> meaning it has  ~3*10**5 and "kgi" has ~1.8*10**5? Why? Why does "xlib" 
> do it that much time (context-commutation perhaps?)?

Hmm - looking at KGIcon vs. X: Is your kgicon driver accelerated at all ?
A ratio of 1.4 looks like it comes from calling overhead alone. That would
explain everything.

> And (what definitely defeated me) when "this heavy load process takes in 
> part" "xlib" drops down 1.66 times, and kgicon drops down 1.98 times!!!! 
> (that is 2 times) 
> To be honest I espected that KGIcon will have better performance when 
> multiprocess takes apart than Xlib, 

That is a bit tricky, but "correct":

When the X server is running, three processes are producing load:
The client, the server and the "load producer".

Without the loader, you have 2x 50% for the drawing stuff. With it, you
only have 2x 33% for drawing and 33% for the load producer.
Thus you get about 2/3 of the original performance.

Without X, your situation is worse, as there are only 2 processes competing
for CPU: The graphics drawer and the load producer.

Thus CPU usage goes from 100% for graphics to 50% graphics, 50% load
producer - about half the original performance ...

Both cases are supported by your values: 1.66 is about 3/2=1.5 and 1.98 is
about 2/1=2 ...

Thus effectively the "fair" scheduling strategy of Linux produces the effect
you encounter. When you do the drawing in many processes you will get
more CPU for it, which is why it appears faster.

It is effectively cheating ... :-).
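
The arithmetic above, as a small sketch. It assumes a single CPU, that all
processes are runnable all the time, and that the scheduler hands out equal
shares per process; the function names are hypothetical. The measured
values are the ACC counts from your xlib and kgicon runs.

```python
def drawing_share(draw_procs: int, load_procs: int) -> float:
    """CPU fraction left for drawing under equal per-process shares."""
    return draw_procs / (draw_procs + load_procs)


def slowdown(draw_procs: int, load_procs: int = 1) -> float:
    """Expected drop in drawing speed once the load producers start."""
    return drawing_share(draw_procs, 0) / drawing_share(draw_procs, load_procs)


# X: client + server draw (2 processes). KGIcon: only the client (1 process).
predicted = {"xlib": slowdown(2), "kgicon": slowdown(1)}

# Measured: ACC count without load divided by ACC count with load.
measured = {"xlib": 296782 / 178438, "kgicon": 175953 / 88767}

for name in predicted:
    print(name, predicted[name], round(measured[name], 2))
```

The model is crude (client and server do not really need equal amounts of
CPU), which is one plausible reason the xlib value is a bit above 1.5.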

> AND THAT HAPPENED the OTHER SIDE ROUND! XLIB has better performanse than 
> KGIcon in multitasking!!!????
> WHY? WHAT I DO WRONG? OR IT IS SUPPOSE TO BE?

See above. Moreover, from your values, I really suspect an accelerated X server
but no kgicon acceleration.

> Functions:  |  t(sec)  |  TAR     |   times(only) |      times(with 1proc)
> -----------------------------------------------------------------------------------------------
> Puts        |    3    |    X         |    7580        |            3951
> CrossBlit   |    3    |              |      21        |              15
> DrawBox     |    4    |              |   14184        |            7373
> DrawLine    |    3    |              |   16935        |            8976
> CopyBox     |    4    |              |    5686        |            2876
> -----------------------------------------------------------------------------------------------
> Puts        |     3   |     Xlib     |      97         |             72
> CrossBlit   |     3   |              |       9         |              7
> DrawBox     |     4   |              |   14660         |          10610
> DrawLine    |     3   |              |   11216         |           7465
> CopyBox     |     4   |              |    6428         |           4384
> --------------------------------------------------------------------------------------------
> Puts        |      3  |     KGIcon   |    8395         |           4631
> CrossBlit   |      3  |              |      32         |             15
> DrawBox     |      4  |              |   15606         |           8276
> DrawLine    |      3  |              |   18435         |          10147
> CopyBox     |      4  |              |     247         |            147
> -----------------------------------------------------------------------------------------------

> So when we are talking of  Puts, or CrossBlit, and the other things, 
> that is obviously more powerfull in "KGIcon" than "xlib",
> or better than "x". BUT, WHY DOES COPY-BOX are 20 times or more lower 
> than in "xlib" and "x"?

You might have very slow read access to video memory and no copybox support
in the KGIcon driver. This will crawl. The X target works in main memory
which is usually very fast at read access, and X might be accelerated
properly or it can use backbuffer data.

The low "puts" value for xlib is interesting as well. It shows that drawing
text is very inefficient for the xlib target, as it induces tons of primitive
calls.
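
To compare the per-function numbers more easily, here is a sketch that
works on the values from your table (count alone, count with one load
process). The helper names are hypothetical.

```python
# Counts copied from the table above: (alone, with one load process).
RESULTS = {
    "x": {
        "Puts": (7580, 3951), "CrossBlit": (21, 15), "DrawBox": (14184, 7373),
        "DrawLine": (16935, 8976), "CopyBox": (5686, 2876),
    },
    "xlib": {
        "Puts": (97, 72), "CrossBlit": (9, 7), "DrawBox": (14660, 10610),
        "DrawLine": (11216, 7465), "CopyBox": (6428, 4384),
    },
    "KGIcon": {
        "Puts": (8395, 4631), "CrossBlit": (32, 15), "DrawBox": (15606, 8276),
        "DrawLine": (18435, 10147), "CopyBox": (247, 147),
    },
}


def degradation(target: str, func: str) -> float:
    """How many times slower a function gets when the load process runs."""
    alone, loaded = RESULTS[target][func]
    return alone / loaded


def relative_to(target: str, baseline: str, func: str) -> float:
    """Unloaded count of target divided by unloaded count of baseline."""
    return RESULTS[target][func][0] / RESULTS[baseline][func][0]


for func in ("Puts", "CopyBox"):
    for target in ("x", "xlib"):
        print(func, target, "vs KGIcon:",
              round(relative_to(target, "KGIcon", func), 2))
```

CopyBox and Puts are the two rows where the targets differ by an order of
magnitude; the other rows are within a factor of about two of each other.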

> And MULTIPROCESSing DEGRADATION (when more then one process are present - 
> sam example with disk-writes-in-regular-time-period-process) is near 
> 2 times for KGIcon (???) and in case of "xlib" that is 1.2 or 1.5 but 

Yes. See above. This actually speaks for KGIcon. The X model will
"steal" performance _from the system_ by "pretending" to be two entities
that want CPU. They will both get an equal share, which together is of course
more than what a single process would get.

> it is much lower than 2!!!! (YOU wil agree with this???)

Your loader process should be slower with X by a factor of about 1.5
which compensates the difference.
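
The factor of 1.5 follows from the same equal-share assumption (single CPU,
every process always runnable). A sketch with a hypothetical helper name:

```python
from fractions import Fraction


def loader_share(draw_procs: int, load_procs: int = 1) -> Fraction:
    """CPU fraction the load producer gets under equal per-process shares."""
    return Fraction(load_procs, draw_procs + load_procs)


share_with_x = loader_share(2)       # client + server + loader
share_with_kgicon = loader_share(1)  # client + loader

# How much slower the loader runs next to X than next to KGIcon.
loader_slowdown = share_with_kgicon / share_with_x
print(share_with_x, share_with_kgicon, loader_slowdown)
```

So what the drawing side gains under X is exactly what the loader loses.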

> Why does KGIcon have this degradation in multitasking environment?
> (I am asking this, because I AM TRYING TO BUILD SOME CONCEPTION THAT 
> WILL SAVE ITS PERFORMANSE OR AT LEAST TRY TO SAVE IT WHEN OTHER PROCESSES 
> ARE PRESENT! This obviously have some connection with X-server nature - 
> which I am not familiar with, but WOULDn't it BE LOGIVAL that in 
> multitasking system real-driver get goal above the user-level one?)

This is not quite right. KGIcon _is_ slower than a userlevel driver due to
the extra context switch an accel call induces. This is the unavoidable
price of allowing halfway "direct" access by usermode programs, as opposed
to access by a trusted binary like the X server. However, this additional
layer does not cause a very big performance loss.

The main thing you encounter is the 2 processes vs. 3 processes trick. 

And it seems you have a pretty much unaccelerated kgicon driver or a very
good X server. Please tell me, what hardware you have and what X server 
and kgicon driver you used for testing, and maybe I can shed some more
light.

As you can see, in my case with similarly accelerated X server and KGIcon
driver, results are as expected.

CU, Andy

-- 
= Andreas Beck                    |  Email :  <andreas.beck@ggi-project.org> =

