Common input sources
The following sections discuss some of the common inputs and what to do about them. You should consider each of these inputs when you're writing your program, and if they are untrusted, carefully filter them.
Environment variables
Environment variables can be incredibly dangerous, especially for setuid/setgid programs and the programs they call. Three factors make them so dangerous:
- Many libraries and programs are controlled by environment variables in ways that are incredibly obscure -- in fact, many are completely undocumented. The command shell /bin/sh uses environment variables such as PATH and IFS, the program loader ld.so (/lib/ld-linux.so.2) uses environment variables such as LD_LIBRARY_PATH and LD_PRELOAD, lots of programs use the environment variables TERM, HOME, and SHELL -- and all of these environment variables have been used to exploit programs. There are a huge number of these environment variables; many of them are recondite variables intended for debugging, and it's pointless to try to list them all. In fact, you can't know them all, since some aren't even documented.
- Environment variables are inherited. If program A calls B, which calls C, which calls D, then D will receive the environment variables that A received unless some program changes things along the way. This means that if A is a secure program, and the developer of D adds an undocumented environment variable that helps in debugging, that addition to D might create a vulnerability in A! This inheritance isn't accidental -- it's what makes environment variables useful -- but it also makes them a serious security problem.
- Environment variables can be completely controlled by locally running attackers, and attackers can exploit this in surprising ways. As described by the environ(5) man page (see Resources), environment variables are internally stored as an array of character pointers (the array is terminated by a NULL pointer), and each character pointer points to a NUL-terminated string value of the form NAME=value (where NAME is the name of the environment variable). Why is this detail important? It's because attackers can do weird things such as create multiple values for the same environment variable name (like two different LD_LIBRARY_PATH values). This can easily lead libraries using the environment variables to do unexpected things, which may be exploitable. The GNU glibc library has routines that work to counter this, but other libraries and any routines that walk the environment variable list can get into trouble in a hurry.
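To make the duplicate-name problem concrete, here is a minimal C sketch that walks an environment array in the layout described above and counts entries that either repeat an earlier name or lack a proper NAME= prefix. The function name count_suspect_env_entries is hypothetical, invented for this illustration; it is not part of glibc or any other library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function).
 * envp is a NULL-terminated array of "NAME=value" strings.
 * Returns how many entries are malformed (no '=' or an empty name)
 * or reuse a NAME that already appeared earlier in the array. */
size_t count_suspect_env_entries(char *const envp[])
{
    size_t suspect = 0;

    if (envp == NULL)
        return 0;

    for (size_t i = 0; envp[i] != NULL; i++) {
        const char *eq = strchr(envp[i], '=');

        /* Malformed: no '=' at all, or nothing before it. */
        if (eq == NULL || eq == envp[i]) {
            suspect++;
            continue;
        }

        size_t name_len = (size_t)(eq - envp[i]);

        /* Compare this name against every earlier entry. */
        for (size_t j = 0; j < i; j++) {
            if (strncmp(envp[j], envp[i], name_len) == 0 &&
                envp[j][name_len] == '=') {
                suspect++;
                break;
            }
        }
    }
    return suspect;
}
```

A program could pass its own environ array to this function before trusting anything in it and refuse to continue on a nonzero result. That is only a detection aid; it does nothing about variables that are well formed but hostile.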
In some cases, programs have been modified to make it harder to exploit them using environment variables. Historically, many attacks exploited the way the command shell handled the IFS environment variable, but most of today's shells (including GNU bash) have been modified to make IFS harder to exploit.
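What's the IFS problem? IFS lists the characters the shell uses to split its input into words, so an attacker who changes it can alter how a shell run by a privileged program parses its commands. Thankfully, most of today's shells counter this by at least automatically resetting the IFS variable.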
Unfortunately, while this hardening is a good idea, it's not enough -- you still need to deal with environment variables carefully. An extremely important (though complicated) example involves how all programs are run on Unix-like systems. Unix-like systems (including GNU/Linux) run programs by first running a system loader (it's /lib/ld-linux.so.2 on most GNU/Linux systems), which then locates and loads the necessary shared libraries. The loader is normally controlled by -- you guessed it -- environment variables.
On most Unix-like systems, the loader's search for libraries normally begins with any directories listed in the environment variable LD_LIBRARY_PATH. I should note that LD_LIBRARY_PATH works on many Unix-like systems, but not all; HP-UX uses the environment variable SHLIB_PATH, and AIX uses LIBPATH instead. Also, in GNU-based systems (including GNU/Linux), the list of libraries specified in the environment variable LD_PRELOAD is loaded first and overrides everything else.
The problem is that if an attacker can control the underlying libraries used by a program, the attacker can completely control the program. For example, imagine that the attacker could run /usr/bin/passwd (a privileged program that lets you change your password), but use the environment variables to change the libraries used by the program. An attacker could write their own version of crypt(3), the password encryption function, and when the privileged program tries to call the library, the attacker can make the program do anything -- including allowing permanent, unlimited control over the system. Today's loaders counter this problem by detecting if the program is setuid/setgid, and if it is, they ignore environment variables such as LD_PRELOAD and LD_LIBRARY_PATH.
So, are we safe? No. If that malicious LD_PRELOAD or LD_LIBRARY_PATH value isn't erased by the setuid/setgid program, it will be passed down to other programs and cause the very problem the loader is trying to counter. Thus, the loader makes it possible to write secure programs, but you still have to protect against malicious environment variables. And that still doesn't deal with the problem of undocumented environment variables.
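One way to keep a hostile value from being passed down is to hand each child program an explicit environment rather than letting it inherit yours. The following is a minimal sketch, assuming a POSIX system with fork(2), execve(2), and waitpid(2). The helper name run_with_clean_env is hypothetical, and the values in clean_env are only illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at `path` with arguments `argv`
 * and a fixed, minimal environment, so nothing from the caller's
 * environment (LD_PRELOAD, LD_LIBRARY_PATH, ...) reaches the child.
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_with_clean_env(const char *path, char *const argv[])
{
    /* The only variables the child will see; values are illustrative. */
    static char *const clean_env[] = {
        "PATH=/bin:/usr/bin",
        "IFS= \t\n",
        NULL
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: execve() replaces the whole environment with clean_env. */
        execve(path, argv, clean_env);
        _exit(127);             /* reached only if execve() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that this passes a whitelist rather than stripping known-bad names; removing only LD_PRELOAD and LD_LIBRARY_PATH would still let undocumented variables through.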
For secure setuid/setgid programs, the only safe thing to do is to always "extract and erase" environment variables at the beginning of the program:
- Extract the environment variables that you actually need (if any).
- Erase the entire environment. In C/C++, erasing the environment can be done by including <unistd.h> and then setting the environ variable to NULL (do this very early, in particular before creating any threads).
- Set just the environment variables you need to safe values. One environment value you'll almost certainly re-add is PATH, the list of directories to search for programs. Typically PATH should just be set to /bin:/usr/bin or some similar value. Don't include the current directory in PATH, which can be written as "." or even as a blank entry (so a colon at the beginning or end would probably be exploitable). Typically you'll also set IFS (to its default of " \t\n" -- space, tab, and newline) and TZ (timezone). Others you might set are HOME and SHELL. Your application might need a few more, but limit them -- don't accept data from a potential attacker unless it's critically needed.
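The three steps can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: sanitize_environment, path_includes_cwd, and tz_looks_safe are hypothetical names, the TZ check is an arbitrary conservative whitelist of my own, and calling setenv(3) after setting environ to NULL is assumed to work as it does with GNU glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Returns 1 if a PATH-style list names the current directory, either as
 * "." or as an empty entry (leading, trailing, or doubled colon).
 * Other relative entries such as "./bin" are not detected here. */
int path_includes_cwd(const char *path)
{
    const char *entry = path;
    for (;;) {
        size_t len = strcspn(entry, ":");
        if (len == 0 || (len == 1 && entry[0] == '.'))
            return 1;
        if (entry[len] == '\0')
            return 0;
        entry += len + 1;
    }
}

/* Assumed policy (not from any standard): short, no leading '/',
 * and only letters, digits, '/', '_', '+', '-'. */
static int tz_looks_safe(const char *tz)
{
    size_t len = strlen(tz);
    if (len == 0 || len > 64 || tz[0] == '/')
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)tz[i];
        if (!isalnum(c) && c != '/' && c != '_' && c != '+' && c != '-')
            return 0;
    }
    return 1;
}

/* Call very early, before creating any threads.
 * Returns 0 on success, -1 on failure. */
int sanitize_environment(const char *safe_path)
{
    if (safe_path == NULL || path_includes_cwd(safe_path))
        return -1;

    /* 1. Extract: copy the one value we want to keep, if acceptable. */
    char *tz = NULL;
    const char *old_tz = getenv("TZ");
    if (old_tz != NULL && tz_looks_safe(old_tz)) {
        tz = strdup(old_tz);
        if (tz == NULL)
            return -1;
    }

    /* 2. Erase: drop every inherited variable. */
    environ = NULL;

    /* 3. Set: re-add only known-safe values. */
    int rc = 0;
    if (setenv("PATH", safe_path, 1) != 0) rc = -1;
    if (setenv("IFS", " \t\n", 1) != 0) rc = -1;
    if (tz != NULL && setenv("TZ", tz, 1) != 0) rc = -1;
    free(tz);               /* setenv() keeps its own copy */
    return rc;
}
```

A setuid/setgid program would typically call sanitize_environment("/bin:/usr/bin") as the first statement of main() and exit if it returns -1.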