As the performance of commodity computer and network hardware increases and its price decreases, it becomes more and more practical to build parallel computational systems from off-the-shelf components rather than buying CPU time on very expensive supercomputers. In fact, the price/performance ratio of a Beowulf-type machine is three to ten times better than that of traditional supercomputers. The Beowulf architecture scales well and is easy to construct, and you pay only for the hardware, since most of the software is free.
This HOWTO is designed for a person with at least some exposure to the Linux operating system. Knowledge of Beowulf technology or an understanding of more complex operating system and networking concepts is not essential, but some exposure to parallel computing would be advantageous (after all, you must have some reason to read this document). This HOWTO will not answer every question you might have about Beowulf, but it should give you ideas and guide you in the right direction. Its purpose is to provide background information, links and references to more advanced documents.
Famed was this Beowulf: far flew the boast of him, son of Scyld, in the Scandian lands. So becomes it a youth to quit him well with his father's friends, by fee and gift, that to aid him, aged, in after days, come warriors willing, should war draw nigh, liegemen loyal: by lauded deeds shall an earl have honor in every clan.
Beowulf is the earliest surviving epic poem written in English. It is a story about a hero of great strength and courage who defeated a monster called Grendel. See History to find out more about the Beowulf hero.
There are probably as many Beowulf definitions as there are people who build or use Beowulf supercomputer facilities. Some claim that a system can be called a Beowulf only if it is built in the same way as NASA's original machine. Others go to the other extreme and call any system of workstations running parallel code a Beowulf. My definition of Beowulf fits somewhere between these two views, and is based on many postings to the Beowulf mailing list:
Beowulf is a multi-computer architecture which can be used for parallel computations. It is a system which usually consists of one server node and one or more client nodes connected together via Ethernet or some other network. It is built using commodity hardware components, like any PC capable of running Linux, standard Ethernet adapters, and switches. It does not contain any custom hardware components and is trivially reproducible. Beowulf also uses commodity software like the Linux operating system, Parallel Virtual Machine (PVM) and Message Passing Interface (MPI).
The server node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and gateway to the outside world. Large Beowulf machines might have more than one server node, and possibly other nodes dedicated to particular tasks, for example consoles or monitoring stations.
In most cases client nodes in a Beowulf system are dumb: the dumber the better. Nodes are configured and controlled by the server node, and do only what they are told to do. In a diskless client configuration, client nodes don't even know their IP address or name until the server tells them what it is.
One of the main differences between Beowulf and a Cluster of Workstations (COW) is that Beowulf behaves more like a single machine than like many workstations. In most cases client nodes do not have keyboards or monitors, and are accessed only via remote login or possibly a serial terminal. Beowulf nodes can be thought of as a CPU + memory package which can be plugged into the cluster, just like a CPU or memory module can be plugged into a motherboard.
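The server/client arrangement described above can be pictured with a small model. What follows is only an illustrative sketch, not how any real Beowulf boot service works: the class names ServerNode and ClientNode, the nodeNN naming scheme, the example network, and the choice to reserve the first host address for the server are all assumptions made for the example. It shows a diskless client that has no name or IP address until the server assigns them.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network
from typing import Dict, Optional


@dataclass
class ClientNode:
    """A 'dumb' compute node, known at first only by its hardware address."""
    mac: str
    name: Optional[str] = None  # unknown until the server assigns it
    ip: Optional[str] = None    # unknown until the server assigns it


@dataclass
class ServerNode:
    """The node that configures and controls the clients (hypothetical model)."""
    network: IPv4Network
    clients: Dict[str, ClientNode] = field(default_factory=dict)

    def register(self, client: ClientNode) -> ClientNode:
        """Give a client the next free address and a sequential name."""
        if client.mac in self.clients:
            # Already known: hand back the existing assignment.
            return self.clients[client.mac]
        hosts = list(self.network.hosts())
        index = len(self.clients) + 1  # hosts[0] is kept for the server itself
        if index >= len(hosts):
            raise RuntimeError("no free addresses left on this network")
        client.ip = str(hosts[index])
        client.name = f"node{index:02d}"
        self.clients[client.mac] = client
        return client


# Hypothetical private network and MAC addresses, for illustration only.
server = ServerNode(IPv4Network("192.168.1.0/24"))
first = server.register(ClientNode(mac="00:11:22:33:44:55"))
second = server.register(ClientNode(mac="00:11:22:33:44:66"))
```

In this sketch a client is just a hardware address plus whatever the server decides to tell it, which mirrors the idea that client nodes are interchangeable CPU + memory packages.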
Beowulf is not a special software package, new network topology or the latest kernel hack. Beowulf is a technology of clustering Linux computers to form a parallel, virtual supercomputer. Although there are many software packages such as kernel modifications, PVM and MPI libraries, and configuration tools which make the Beowulf architecture faster, easier to configure, and much more usable, one can build a Beowulf-class machine using a standard Linux distribution without any additional software. If you have two networked Linux computers which share at least the /home file system via NFS, and trust each other to execute remote shells (rsh), then it could be argued that you have a simple, two-node Beowulf machine.
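One of the two conditions above, a /home file system shared via NFS, can be checked on a Linux node by reading the mount table. The sketch below is a minimal illustration, assuming mount information in the /proc/mounts format (device, mount point, file system type, options, ...). The function name nfs_source_for and the sample mount table, including the host name "server", are hypothetical. It does not test the rsh trust relationship.

```python
from typing import Optional

NFS_TYPES = ("nfs", "nfs4")  # file system types reported for NFS mounts


def nfs_source_for(mount_point: str, mounts_text: str) -> Optional[str]:
    """Return the NFS export mounted at mount_point, or None if it is not NFS.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, where, fs_type = fields[0], fields[1], fields[2]
        if where == mount_point:
            # A later entry for the same mount point shadows earlier ones.
            source = device if fs_type in NFS_TYPES else None
    return source


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
server:/home /home nfs rw,relatime,vers=3 0 0
"""

home_export = nfs_source_for("/home", SAMPLE_MOUNTS)
```

On a real node the same function could be given the contents of /proc/mounts instead of the sample text.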
Beowulf systems have been constructed from a variety of parts. For the sake of performance, some non-commodity components (i.e., components produced by a single manufacturer) have been employed. To account for the different types of systems and to make discussions about machines a bit easier, we propose the following simple classification scheme:
CLASS I BEOWULF
This class of machines is built entirely from commodity "off-the-shelf" parts. We shall use the "Computer Shopper" certification test to define commodity "off-the-shelf" parts. (Computer Shopper is a one-inch-thick monthly magazine/catalog of PC systems and components.) The test is as follows:
A CLASS I Beowulf is a machine that can be assembled from parts found in at least 3 nationally/globally circulated advertising catalogs.
The advantages of a CLASS I system are:
The disadvantages of a CLASS I system are:
CLASS II BEOWULF
A CLASS II Beowulf is simply any machine that does not pass the Computer Shopper certification test. This is not a bad thing. Indeed, it is merely a classification of the machine.
The advantages of a CLASS II system are:
The disadvantages of a CLASS II system are:
One CLASS is not necessarily better than the other. It all depends on your needs and budget. This classification system is only intended to make discussions about Beowulf systems a bit more succinct. The "System Design" section may help determine what kind of system is best suited for your needs.
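The classification above reduces to a simple rule, sketched here for illustration. The sketch assumes one particular reading of the certification test, namely that every individual part must appear in at least three catalogs. The function name beowulf_class and the example part lists, including their catalog counts, are hypothetical.

```python
from typing import Dict

MIN_CATALOGS = 3  # threshold taken from the certification test


def beowulf_class(catalogs_listing_part: Dict[str, int]) -> str:
    """Classify a machine from the number of catalogs that list each part.

    Assumption: the test is applied to every part individually.
    """
    if not catalogs_listing_part:
        raise ValueError("a machine needs at least one part")
    if all(count >= MIN_CATALOGS for count in catalogs_listing_part.values()):
        return "CLASS I"
    return "CLASS II"


# Hypothetical machines; the counts are made up for illustration.
commodity_machine = {"PC": 12, "Ethernet adapter": 9, "Ethernet switch": 5}
custom_machine = {"PC": 12, "single-vendor interconnect": 1}

commodity_result = beowulf_class(commodity_machine)
custom_result = beowulf_class(custom_machine)
```

Here a single part that fails the test is enough to make the whole machine CLASS II, which matches the statement that CLASS II is simply any machine that does not pass.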