
8. Redundancy & Performance

For performance reasons, it is possible to spread data in a 'stripe' over multiple disks. This means that block 1 is on Physical Volume A, and block 2 is on PV B, while block 3 may be on PV A again. You can also stripe over more than 2 disks.

This arrangement means that you have more disk bandwidth available. It also means that more 'spindles' are involved. More on this later.

Besides increasing performance, it is also possible to keep copies of your data on multiple disks. This is called mirroring. Currently, LVM does not support this natively, but there are ways to achieve it.

8.1 Why stripe?

Disk performance is influenced by at least three things. The most obvious is the speed at which data on a disk can be read or written sequentially. This is the limiting factor when reading or writing a large file over a SCSI/IDE bus with only a single disk on it.

Then there is the bandwidth available TO the disk. If you have 7 disks on a SCSI bus, the share of that bandwidth available to each disk may well be less than the speed at which the disk itself can write. If you spend enough money, you can prevent this bottleneck from being a problem.

Then there is the latency. As the saying goes, latency is always bad news. And even worse, you can't spend more money to get lower latency! Most disks these days appear to have a latency somewhere around 7ms. Then there is the SCSI latency, which used to be something like 25ms.

FIXME: need recent numbers!

What does this mean? The combined latency would be around 30ms in a typical case. You can therefore perform only around 33 disk operations per second. If you want to be able to do many thousands of queries per second, and you don't have a massive cache, you are very much out of luck.

If you have multiple disks, or 'spindles', working in parallel, you can have multiple commands being performed concurrently, which nicely circumvents your latency problem. Some applications, like a huge news server, don't even work anymore without striping or other IO smartness.

This is what striping does. And, if your bus is up to it, even sequential reading and writing may go faster.

8.2 Why not

Striping without further measures raises the chance of losing any given bit of data: if any one of your disks dies, the entire Logical Volume is gone. If you just concatenate data, only part of your filesystem is lost.

The ultimate option is the mirrored stripe.

FIXME: make a mirrored stripe with LVM and md
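One possible way to build such a mirrored stripe, sketched here under the assumption that /dev/md0 and /dev/md1 are two already-configured software RAID-1 (mirror) devices (see the section on Linux software RAID below), and that 'mirrorgroup' and 'safelv' are just example names:

# pvcreate /dev/md0 /dev/md1
# vgcreate mirrorgroup /dev/md0 /dev/md1
# lvcreate -n safelv -i 2 -I 64 -L 200M mirrorgroup

Each md device survives the loss of one of its disks, while the Logical Volume is striped over both mirrors for performance.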

8.3 LVM native striping

Stripe configuration is specified when creating the Logical Volume with lvcreate. There are two relevant parameters. With -i we tell LVM over how many Physical Volumes it should scatter the data. Striping is not really done on a bit-by-bit basis, but in blocks; with -I we specify the stripe size in kilobytes. Note that this must be a power of 2, and that the coarsest granularity is 128KByte.

Example:

# lvcreate -n stripedlv -i 2 -I 64 mygroup -L 20M
lvcreate -- rounding 20480 KB to stripe boundary size 24576 KB / 6 PE
lvcreate -- doing automatic backup of "mygroup"
lvcreate -- logical volume "/dev/mygroup/stripedlv" successfully created
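To verify the result, you can inspect the new volume with lvdisplay (using the names from the example above); the output should show the number of stripes and the stripe size:

# lvdisplay /dev/mygroup/stripedlv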

Performance notes

The performance 'gain' may well be very negative if you stripe over 2 partitions of the same disk - take care to prevent that. Striping with two disks on a single IDE bus also appears useless - unless IDE has improved beyond what I remember.

FIXME: is this still true?

Older motherboards may have two IDE buses, but the second one was often crippled, dedicated to serving a slow CD-ROM drive.

You can perform benchmarks with several tools, the most noteworthy being 'Bonnie'. The ReiserFS people have released Bonnie++, which may be used to measure performance data.
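As an illustration, a Bonnie++ run over a filesystem mounted on the striped volume could look like the following; /mnt/striped is just an example mount point, -s sets the size of the test file in megabytes (it should be well above your RAM size so the cache does not dominate the result), and -u is needed when running as root:

# bonnie++ -d /mnt/striped -s 2048 -u nobody

Comparing the figures with a run on a non-striped volume on the same disks gives an idea of what the striping actually buys you.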

8.4 Hardware RAID

Many high-end Intel x86 servers have hardware RAID controllers, most of which have at least 2 independent SCSI channels. Fortunately, this has very little bearing on LVM. Before Linux can see anything on such a controller, the administrator must define a logical drive within the RAID controller itself. As an example, he or she could choose to stripe together two disks on SCSI channel A and then mirror them onto two disks on channel B. This is a typical RAID 0/1 configuration that maximises performance and data security.

When Linux boots on a machine with this configuration, it can only 'see' one disk on the RAID controller: the logical drive that contains the four disks in the RAID 0/1 stripeset. As far as LVM is concerned, there is just one disk in the machine, and it is used as such. If one of the disks fails, LVM won't even know. When the administrator replaces the disk (even on the fly with hot-swap hardware), LVM won't know about that either; the controller will resync the mirrored data and all will be well.

This is where most people take a step back and ask, "Then what good does LVM do for me with this RAID controller?" The easy answer is that, in most cases, once you define a logical drive in the RAID controller you can't add more disks to that drive later. So if you miscalculate your space requirements, or simply need more space, you can't add a new disk or set of disks to a pre-existing stripeset. Instead you must create a new RAID stripeset in the controller, and with LVM you can then simply extend the Logical Volume so that it seamlessly spans both stripesets.
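A sketch of that last step, assuming the new stripeset appears as /dev/sdb and that the existing Volume Group and Logical Volume are called 'mygroup' and 'biglv' (all names here are examples):

# pvcreate /dev/sdb
# vgextend mygroup /dev/sdb
# lvextend -L +20G /dev/mygroup/biglv

After extending the Logical Volume you still have to grow the filesystem on it with the tool appropriate for your filesystem.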

FIXME: Is there more needed on this subject ?

8.5 Linux software RAID

Linux 2.4 comes with very good software RAID in place. Stock Linux 2.2, as released by Alan Cox, features an earlier RAID version that is not well regarded. The reason 2.2 still ships this older code is that the kernel people don't want to make changes within a stable series that require userland updates.

Most vendors, including Red Hat, Mandrake and SuSE, chose to replace it with the 0.90 version, which appears to be excellent.

We will only treat the 0.90 version here.
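To give a taste of what the 0.90 tools look like, here is a minimal sketch of a two-disk RAID-1 mirror; the partition names are examples, and /etc/raidtab is the configuration file the raidtools read:

raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              64
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1

The array is then initialised with 'mkraid /dev/md0', after which /dev/md0 can be handed to pvcreate just like any other disk.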

FIXME: write more of this

