Up to this point we have an idea of how X uses a client-server architecture, where the clients are our application programs. Within this client-server graphical system, we can choose among several window managers, which manage our screen real estate. We also have our client applications, which are where we actually get our work done, and these clients can be written using any of several different toolkits.
Here's where the mess begins. Each window manager takes a different approach to managing the clients, and their behavior and decorations differ from one to the next. On top of that, each client looks and behaves according to the toolkit it was written with. Since nothing requires authors to use the same toolkit for all their applications, a user can easily be running, say, six different applications, each written with a different toolkit, and all of them look and behave differently. This is a mess because behavior is not consistent between applications. If you've ever used a program written with the Athena widgets, you'll have noticed that it's not very similar to something written with Gtk. You'll also remember how awkward it is to use a collection of applications that look and feel so different. This largely negates the advantage of using a GUI environment in the first place.
From a more technical standpoint, using lots of different toolkits increases resource usage. Modern operating systems support dynamic shared libraries. This means that if I have two or three applications using Gtk, and I have a dynamic shared version of Gtk, those applications all share the same copy of Gtk, both on disk and in memory. This saves resources. On the other hand, if I have a Gtk application, a Qt application, something Athena-based, a Motif-based program such as Netscape, a program that uses FLTK and another that uses XForms, I'm now loading six different libraries into memory, one for each toolkit. Keep in mind that all of these toolkits provide basically the same functionality.
There are other problems here. The way programs are launched varies from one window manager to the next. Some have a nice menu for launching applications; others don't, and expect us to open a command-launching box, use a certain key combination, or even open an xterm and launch all our applications by typing their commands. Again, there's no standardization here, so it becomes a mess.
Finally, there are niceties we expect from a GUI environment that our scheme hasn't covered, such as a configuration utility or "control panel", or a graphical file manager. Of course, these can be written as client applications. And, in typical free software fashion, there are hundreds of file managers and hundreds of system configuration programs, which only adds to the mess of dealing with many disparate software components.