In 1984 Richard Stallman's Free Software Foundation (FSF) began the GNU project, a project to create a free version of the Unix operating system. By free, Stallman meant software that could be freely used, read, modified, and redistributed. The FSF successfully built many useful components but was having trouble developing the operating system kernel [FSF 1998]. In 1991 Linus Torvalds began developing an operating system kernel, which he named ``Linux'' [Torvalds 1999]. This kernel could be combined with the FSF material and other components, producing a freely-modifiable and very useful operating system. This paper will term the kernel itself the ``Linux kernel'' and the entire combination ``Linux'' (many use the term GNU/Linux instead for this combination).
Different organizations have combined the available components differently. Each combination is called a ``distribution,'' and the organizations that develop distributions are called ``distributors.'' Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel, and Debian. This paper is not specific to any distribution; it does presume Linux kernel version 2.2 or greater and the C library glibc 2.1 or greater, which are valid assumptions for essentially all current major Linux distributions.
Increased interest in such ``free software'' has made it increasingly necessary to define and explain it. A widely used term is ``open source software,'' which is further defined in [OSI 1999]. Eric Raymond [1997, 1998] wrote several seminal articles examining its development process.
Linux is not derived from Unix source code, but its interfaces are intentionally Unix-like. Therefore, Unix lessons learned apply to Linux, including information on security. Much of the information in this paper actually applies to any Unix-like system, but Linux-specific information has been intentionally added to enable those using Linux to take advantage of its capabilities. This paper intentionally focuses on Linux systems to narrow its scope; including all Unix-like systems would require an analysis of porting issues and other systems' capabilities, which would have greatly increased the size of this document.
Since Linux is intentionally Unix-like, it has Unix security mechanisms. These include user and group ids (uids and gids) for each process, a filesystem with read, write, and execute permissions (for user, group, and other), System V inter-process communication (IPC), socket-based IPC (including network communication), and so on. See Thompson [1974] and Bach [1986] for general information on Unix systems, including their basic security mechanisms. Section 3 summarizes key Linux security mechanisms.
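As a small illustration of the mechanisms just listed, the following C sketch reads a process's real and effective ids and renders a file's nine basic permission bits (user, group, other). The helper names ids_differ, format_permissions, and show_ids_and_permissions are hypothetical and chosen only for this example; the calls they wrap (getuid(2), geteuid(2), getgid(2), getegid(2), and stat(2)) are the standard interfaces.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: nonzero when the real and effective ids differ,
 * as they do inside a setuid or setgid program. */
int ids_differ(void)
{
    return getuid() != geteuid() || getgid() != getegid();
}

/* Hypothetical helper: write the nine basic permission bits of mode into
 * out in ls(1) style, e.g. "rwxr-x---". out must hold at least 10 chars.
 * The setuid, setgid, and sticky bits are deliberately ignored here. */
void format_permissions(mode_t mode, char *out)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* user  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
        S_IROTH, S_IWOTH, S_IXOTH    /* other */
    };
    static const char symbols[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? symbols[i] : '-';
    out[9] = '\0';
}

/* Hypothetical helper: print this process's ids and a file's permissions.
 * Returns 0 on success, -1 if stat(2) fails. */
int show_ids_and_permissions(const char *path)
{
    struct stat st;
    char perms[10];

    printf("uid=%ld euid=%ld gid=%ld egid=%ld\n",
           (long)getuid(), (long)geteuid(),
           (long)getgid(), (long)getegid());
    if (stat(path, &st) != 0)
        return -1;
    format_permissions(st.st_mode, perms);
    printf("%s %s owner=%ld group=%ld\n",
           perms, path, (long)st.st_uid, (long)st.st_gid);
    return 0;
}
```

Calling show_ids_and_permissions("/etc/passwd") from an ordinary program would typically show equal real and effective ids; a setuid program would show them differing, which is the case ids_differ() detects.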
There are many general security principles which you should be familiar with; consult a general text on computer security such as [Pfleeger 1997].
Saltzer [1974] and Saltzer and Schroeder [1975] list the following principles of the design of secure protection systems, which are still valid:
Many different types of programs may need to be secure programs (as the term is defined in this paper). Some common types are:
This paper merges the issues of these different types of program into a single set. The disadvantage of this approach is that some of the issues identified here don't apply to all types of program. In particular, setuid/setgid programs have many surprising inputs and several of the guidelines here only apply to them. However, things are not so clear-cut, because a particular program may cut across these boundaries (e.g., a CGI script may be setuid or setgid, or be configured in a way that has the same effect). The advantage of considering all of these program types together is that we can consider all issues without trying to apply an inappropriate category to a program. As will be seen, many of the principles apply to all programs that need to be secured.
There is a slight bias in much of this paper towards programs written in C, with some notes on other languages such as C++, Perl, Python, Ada95, and Java. This is because C is the most common language for implementing secure programs on Linux (other than CGI scripts, which tend to use Perl), and most other languages' implementations call the C library. This is not to imply that C is somehow the ``best'' language for this purpose, and most of the principles described here apply regardless of the programming language used.
The primary difficulty in writing secure programs is that writing them requires a different mindset: in short, a paranoid mindset. The reason is that the impact of errors (also called defects or bugs) can be profoundly different.
Normal non-secure programs have many errors. While these errors are undesirable, they usually involve rare or unlikely situations, and if a user should stumble upon one, they will try to avoid using the tool that way in the future.
In secure programs, the situation is reversed. Certain users will intentionally search out and cause rare or unlikely situations, in the hope that such attacks will give them unwarranted privileges. As a result, when writing secure programs, paranoia is a virtue.
Several documents help describe how to write secure programs (or, alternatively, how to find security problems in existing programs), and were the basis for the guidelines highlighted in the rest of this paper.
AUSCERT has released a programming checklist [AUSCERT 1996], based in part on chapter 22 of Garfinkel and Spafford's book discussing how to write secure SUID and network programs [Garfinkel 1996]. Matt Bishop [1996, 1997] has developed several extremely valuable papers and presentations on the topic. Galvin [1998a] described a simple process and checklist for developing secure programs; he later updated the checklist in Galvin [1998b]. Sitaker [1999] presents a list of issues for the ``Linux security audit'' team to search for. Shostack [1999] defines another checklist for reviewing security-sensitive code. The Secure Unix Programming FAQ also has some useful suggestions [Al-Herbish 1999]. Some useful information is also available from Ranum [1998]. Some recommendations must be taken with caution; for example, Anonymous [unknown] recommends the use of access(2) without noting the dangerous race conditions that usually accompany it. Wood [1985] has some useful but dated advice in its ``Security for Programmers'' chapter. Bellovin [1994] and FreeBSD [1999] also include useful guidelines.
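The race condition mentioned above arises because access() checks a file name, while the later open(2) may resolve that same name to a different file if an attacker replaces it in between. The following C sketch contrasts that check-then-use pattern with one common alternative: open the file first, then examine the resulting descriptor with fstat(2), so that the check and the use refer to the same object. The function names open_after_access_racy and open_regular_file are hypothetical, and this is only a minimal sketch. In particular, it does not answer the question access() is usually asked (whether the real user may read the file); a setuid program would typically also need to change its effective ids before opening, which is not shown here.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Racy pattern, shown only for contrast: the file that path names can
 * be swapped (e.g., for a symbolic link) between access() and open(). */
int open_after_access_racy(const char *path)
{
    if (access(path, R_OK) != 0)
        return -1;
    return open(path, O_RDONLY);   /* may open a different file */
}

/* Hypothetical safer helper: open first, then check the descriptor.
 * Returns a readable descriptor for a regular file, or -1 on failure. */
int open_regular_file(const char *path)
{
    struct stat st;
    /* O_NONBLOCK keeps open() from blocking if path names a FIFO. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd < 0)
        return -1;
    /* fstat() inspects the object already opened, not the name, so a
     * later change to the name cannot affect this check. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use the descriptor returned by open_regular_file() for all further operations on the file, rather than referring to the file by name again.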
There are many documents giving security guidelines for programs using the Common Gateway Interface (CGI) to interface with the web. These include Gundavaram [unknown], Kim [1996], Phillips [1995], Stein [1999], and Webber [1999].
There are also many documents describing the issue from the other direction (i.e., ``how to crack a system''). One example is McClure [1999], and there is a great deal of material from that vantage point on the Internet.
This paper is a summary of what I believe are the most useful guidelines; it is not a complete list of all possible guidelines. The organization presented here is my own (every list has its own, different structure), and the Linux-unique guidelines (e.g., on capabilities and the fsuid value) are also my own. Reading the documents referenced above is also highly recommended.
One question that could be asked is ``why did you write your own document instead of just referring to other documents?'' There are several answers:
System manual pages are referenced in the format name(number), where number is the section number of the manual. C and C++ treat the character '\0' (ASCII 0) specially, and this value is referred to as NIL in this paper. The pointer value that means ``does not point anywhere'' is called NULL; C compilers will convert the integer constant 0 to the value NULL when it is used where a pointer is expected, but note that nothing in the C standard requires that NULL actually be implemented by a series of all-zero bits.
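To make the distinction between NIL and NULL concrete, here is a minimal C sketch. The function name length_before_nil is hypothetical; it simply walks a string until it reaches the NIL character, and treats a NULL pointer as a separate case that must be checked before any character is examined.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that precede the
 * terminating NIL ('\0'). Returns 0 if s is NULL. */
size_t length_before_nil(const char *s)
{
    size_t n = 0;

    if (s == NULL)           /* NULL: the pointer points nowhere */
        return 0;
    while (s[n] != '\0')     /* NIL: the character that ends a C string */
        n++;
    return n;
}
```

Note that the test against NULL is written as a pointer comparison. Because NULL need not be represented by all-zero bits, code should compare pointers with NULL (or the constant 0) rather than, say, inspecting a pointer's bytes or relying on memset(3) to produce NULL pointers.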