Secure programmer: Keep an eye on inputs
By David A. Wheeler - 2004-01-26

Common input sources

The following sections discuss some of the common inputs and what to do about them. You should consider each of these inputs when you're writing your program, and if they are untrusted, carefully filter them.

Environment variables

Environment variables can be incredibly dangerous, especially for setuid/setgid programs and the programs they call. Three factors make them so dangerous:

  1. Many libraries and programs are controlled by environment variables in ways that are incredibly obscure -- in fact, many are completely undocumented. The command shell /bin/sh uses environment variables such as PATH and IFS, the program loader ld.so (/lib/ld-linux.so.2) uses environment variables such as LD_LIBRARY_PATH and LD_PRELOAD, lots of programs use the environment variables TERM, HOME, and SHELL -- and all of these environment variables have been used to exploit programs. There are a huge number of these environment variables; many of them are recondite variables intended for debugging, and it's pointless to try to list them all. In fact, you can't know them all, since some aren't even documented.
  2. Environment variables are inherited. If program A calls B, which calls C, which calls D, then D will receive the environment variables that A received unless some program changes things along the way. This means that if A is a secure program, and the developer of D adds an undocumented environment variable that helps in debugging, that addition to D might create a vulnerability in A! This inheritance isn't accidental -- it's what makes environment variables useful -- but it also makes them a serious security problem.
  3. Environment variables can be completely controlled by locally running attackers, and attackers can exploit this in surprising ways. As described by the environ(5) man page (see Resources), environment variables are internally stored as an array of character pointers (the array is terminated by a NULL pointer), and each character pointer points to a NUL-terminated string of the form NAME=value (where NAME is the name of the environment variable). Why is this detail important? Because nothing in this representation stops attackers from doing odd things such as creating multiple entries for the same environment variable name (like two different LD_LIBRARY_PATH values). This can easily lead libraries that use environment variables to do unexpected things, which may be exploitable. The GNU glibc library has routines that work to counter this, but other libraries, and any routines that walk the environment variable list themselves, can get into trouble in a hurry.
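To make the third point concrete, here is a minimal C sketch that walks a NULL-terminated environment array laid out as environ(5) describes. The helper names count_env_entries() and has_duplicate_names() are hypothetical, not part of any library. getenv() hands back just one value for a name, but code that walks the array itself sees every entry, including a second LD_LIBRARY_PATH an attacker slipped in.

```c
#include <stddef.h>
#include <string.h>

/* Count the entries in a NULL-terminated environment array whose name is
   exactly `name`, i.e. entries that start with "name=". */
static size_t count_env_entries(char *const *env, const char *name)
{
    size_t count = 0;
    size_t len = strlen(name);

    if (env == NULL)
        return 0;
    for (; *env != NULL; env++) {
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            count++;
    }
    return count;
}

/* Return 1 if any variable name appears more than once in `env`.
   An entry without '=' is treated as a name with no value. */
static int has_duplicate_names(char *const *env)
{
    if (env == NULL)
        return 0;
    for (size_t i = 0; env[i] != NULL; i++) {
        const char *eq = strchr(env[i], '=');
        size_t len = eq ? (size_t)(eq - env[i]) : strlen(env[i]);

        for (size_t j = i + 1; env[j] != NULL; j++) {
            /* Same name: same first `len` chars, then '=' or end of string. */
            if (strncmp(env[i], env[j], len) == 0 &&
                (env[j][len] == '=' || env[j][len] == '\0'))
                return 1;
        }
    }
    return 0;
}
```

In a real program you would pass environ (declared as extern char **environ;) instead of a hand-built array; a setuid program that finds duplicates has good reason to refuse to run.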

In some cases, programs have been modified to make it harder to exploit them using environment variables. Historically, many attacks exploited the way the command shell handled the IFS environment variable, but most of today's shells (including GNU bash) have been modified to make IFS harder to exploit.

What's the IFS problem?
Although it's not as serious a problem today, the IFS environment variable once caused many security problems in older Unix shells. IFS was used to determine what separated words in commands sent to the original Unix Bourne shell, and it was passed down like any other environment variable. Normally IFS would contain a space, a tab, and a newline, and any of those characters would be treated as a word separator. But attackers could set IFS to sneaky values; for example, they might add "/" to IFS. Then, when the shell tried to run /bin/ls, the old shell would treat each "/" as a separator, so it would run a program named "bin" (wherever it could find one) with "ls" as its argument! The attacker would then supply a malicious "bin" program in a directory where the shell would find it.

Thankfully, most of today's shells counter this by at least automatically resetting the IFS variable when they start -- and that includes GNU bash, the usual shell for GNU/Linux systems. GNU bash also limits the use of IFS so it's only used on the results of expansions. This means that IFS is used less often, and, thus, it's much less dangerous (the original sh split all words using IFS, even commands). Unfortunately, not all shells protect themselves (Practical Unix & Internet Security -- see Resources for a link -- has sample code to test this). And although this particular problem has been (for the most part) countered, it exemplifies the subtle problems that can occur from unchecked environment variables.
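The old IFS behavior can be imitated in a few lines of C. This sketch is a deliberate simplification (real Bourne shell parsing involves quoting and much more), and split_like_old_sh() is a hypothetical helper: it just treats every IFS character as a separator across the whole command, which is enough to show how adding "/" to IFS turns /bin/ls into a program named bin with ls as its argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Simplified model of how the original Bourne shell split a whole command
   line: every character in `ifs` is a separator. Splits `cmd` in place,
   stores up to `max` word pointers in `words`, and returns the word count. */
static size_t split_like_old_sh(char *cmd, const char *ifs,
                                char **words, size_t max)
{
    size_t n = 0;
    char *save = NULL;

    for (char *tok = strtok_r(cmd, ifs, &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, ifs, &save))
        words[n++] = tok;
    return n;
}
```

With the default IFS of " \t\n", "/bin/ls" stays a single word; with "/" added, the first word (the program to run) becomes "bin".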

Unfortunately, while this hardening is a good idea, it's not enough; you still need to deal with environment variables carefully. An extremely important (though complicated) example involves how programs are run on Unix-like systems. Unix-like systems (including GNU/Linux) run dynamically linked programs, which today means most programs, by first running a system loader (it's /lib/ld-linux.so.2 on most GNU/Linux systems), which then locates and loads the necessary shared libraries. The loader is normally controlled by, you guessed it, environment variables.

On most Unix-like systems, the loader's search for libraries normally begins with any directories listed in the environment variable LD_LIBRARY_PATH. I should note that LD_LIBRARY_PATH works on many Unix-like systems, but not all; HP-UX uses the environment variable SHLIB_PATH, and AIX uses LIBPATH instead. Also, in GNU-based systems (including GNU/Linux), the list of libraries specified in the environment variable LD_PRELOAD is loaded first and overrides everything else.

The problem is that if an attacker can control the underlying libraries used by a program, the attacker can completely control the program. For example, imagine that the attacker could run /usr/bin/passwd (a privileged program that lets you change your password) and use environment variables to change the libraries used by the program. An attacker could write their own version of crypt(3), the password encryption function, and when the privileged program called that function, the attacker could make the program do anything, including granting permanent, unlimited control over the system. Today's loaders counter this problem by detecting whether the program is setuid/setgid, and if it is, they ignore or restrict environment variables such as LD_PRELOAD and LD_LIBRARY_PATH.
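A program can make a similar check in its own code. This C sketch (running_setid() is a hypothetical helper, not a library function) uses the classic comparison of real and effective user and group IDs. It is only an approximation of what the loader does: glibc's loader actually relies on the kernel's AT_SECURE flag, which on Linux a program can also read with getauxval(AT_SECURE) from <sys/auxv.h> (glibc 2.16 and later), and which is also set in cases, such as programs granted file capabilities, where the IDs match.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return nonzero if the real and effective IDs differ, as they do while a
   setuid or setgid program is running on behalf of an ordinary user.
   This is a portable approximation of the loader's "secure mode" test. */
static int running_setid(void)
{
    return getuid() != geteuid() || getgid() != getegid();
}
```

Run as an ordinary, non-setuid program this returns 0; installed setuid and started by a different user, it returns nonzero.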

So, are we safe? No. If that malicious LD_PRELOAD or LD_LIBRARY_PATH value isn't erased by the setuid/setgid program, it will be passed down to other programs and cause the very problem the loader is trying to counter. Thus, the loader makes it possible to write secure programs, but you still have to protect against malicious environment variables. And that still doesn't deal with the problem of undocumented environment variables.
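The inheritance problem is easy to observe. In this sketch, DEMO_INHERITED is a hypothetical stand-in for a dangerous variable such as LD_PRELOAD (a harmless name is used so the demonstration doesn't affect the loader), and child_sees() is a hypothetical helper that asks a child shell, started with popen(), what value it received.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Start a child shell with popen() and copy the value it sees for
   DEMO_INHERITED (a hypothetical variable) into buf.
   Returns 0 if the child ran and exited successfully, -1 otherwise. */
static int child_sees(char *buf, size_t len)
{
    FILE *p;
    size_t n;

    if (len == 0)
        return -1;
    p = popen("printf '%s' \"$DEMO_INHERITED\"", "r");
    if (p == NULL)
        return -1;
    n = fread(buf, 1, len - 1, p);
    buf[n] = '\0';
    return pclose(p) == 0 ? 0 : -1;
}
```

After setenv("DEMO_INHERITED", "/tmp/evil", 1) in the parent, the child reports /tmp/evil; only after the parent removes the variable (for example with unsetenv()) does the child see nothing. Unless the parent erases it, every program it starts receives it.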

For secure setuid/setgid programs, the only safe thing to do is to always "extract and erase" environment variables at the beginning of the program:

  • Extract the environment variables that you actually need (if any).
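  • Erase the entire environment. In C/C++, you can do this by setting the global environ variable to NULL (do this very early, in particular before creating any threads). POSIX requires programs to declare this variable themselves with extern char **environ; (glibc also declares it in <unistd.h> when _GNU_SOURCE is defined).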
  • Set just the environment variables you need to safe values. One value you'll almost certainly re-add is PATH, the list of directories to search for programs. Typically PATH should be set to /bin:/usr/bin or a similar value. Don't include the current directory in PATH; it can be written as "." or even as an empty entry, so a colon at the beginning or end, or two adjacent colons, would probably be exploitable. Typically you'll also set IFS (to its default of " \t\n": space, tab, and newline) and TZ (the time zone). Others you might set are HOME and SHELL. Your application might need a few more, but limit them: don't accept data from a potential attacker unless it's critically needed.
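The three steps above can be sketched in C as follows, under some stated assumptions: the program needs only one value from its caller (LANG is used purely as a hypothetical example), TZ is set to "UTC" only as an example value, and extract_and_erase() and path_has_unsafe_entry() are hypothetical helper names. The sketch follows the text in setting environ to NULL, which glibc's setenv() and getenv() handle; on other C libraries, pointing environ at an array holding just a NULL pointer may be the more cautious choice. The PATH check is slightly stricter than the text: it rejects any entry that isn't absolute, which covers "." and empty entries as well as other relative directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

extern char **environ;   /* POSIX: the program must declare this itself */

/* Return 1 if a colon-separated PATH value has an entry that is empty or
   not absolute (".", "", "bin", ...): such entries resolve relative to
   the current directory, which an attacker may control. */
static int path_has_unsafe_entry(const char *path)
{
    const char *start = path;

    for (;;) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len == 0 || start[0] != '/')
            return 1;
        if (end == NULL)
            return 0;
        start = end + 1;
    }
}

/* Hypothetical storage for the one value this program needs. */
static char saved_lang[64];

/* Extract, erase, and reset the environment. Call this first thing in
   main(), before creating any threads. Returns 0 on success. */
static int extract_and_erase(void)
{
    static const char safe_path[] = "/bin:/usr/bin";
    const char *lang = getenv("LANG");   /* hypothetical needed variable */

    /* 1. Extract: copy the value now, because getenv()'s result should not
          be relied on once the environment changes. Reject overlong values
          rather than truncating them, and validate the contents before use. */
    saved_lang[0] = '\0';
    if (lang != NULL && strlen(lang) < sizeof saved_lang)
        strcpy(saved_lang, lang);

    /* 2. Erase the entire environment. */
    environ = NULL;

    /* 3. Re-add only what is needed, with known-safe values. */
    if (path_has_unsafe_entry(safe_path))
        return -1;
    if (setenv("PATH", safe_path, 1) != 0 ||
        setenv("IFS", " \t\n", 1) != 0 ||
        setenv("TZ", "UTC", 1) != 0)     /* example value; an assumption */
        return -1;
    return 0;
}
```

Anything the caller set, such as a leftover LD_PRELOAD, is gone after extract_and_erase() returns, so it cannot leak into programs started later; only PATH, IFS, and TZ remain.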



First published by IBM developerWorks


Copyright 2004-2024 GrindingGears.com. All rights reserved.
Article copyright and all rights retained by the author.