Saturday, April 25, 2009

IO Redirection

UNIX had the concept of IO redirection long before DOS copied and bastardised the concept. The UNIX IO redirection concept is fundamental to many of the things that you can do with UNIX, and it is quite a well-developed idea, so we will explore this concept here.

Why do I mention UNIX at all? Well, Linux is a UNIX operating system!

Under UNIX, all programs that run are given three open files when they are started by a shell:
0.

Standard in, or STDIN.

This is where input comes from, and it normally points at your terminal device.

To find out what device is your terminal, use the tty(1) command. Note, the (1) after command names in UNIX refers to the section of the man pages that the documentation for the command exists in.

You can arrange to run any command and pass it input from a file in the following way:

$ some-command < /path/to/some/file

Note, the '$' is your prompt. Note also, you can always specify a complete path name for a file.

For example:

$ grep -i Fred < /etc/passwd

Would search for the string 'fred' in /etc/passwd, regardless of the case of the characters.

But wait a minute, you object, I always use:

$ grep -i Fred /etc/passwd

This is true, but you can also pass the file in on STDIN, and you will get different results if you do. Can you see what the difference is?
1.

Standard out, or STDOUT.

This is where the normal output from a program goes. It normally points at your terminal as well, but you can redirect it.

You can redirect output in the following way:

$ some-program > /path/to/some/file

For example:

$ grep -i Fred /etc/passwd > /tmp/results

2.

Standard error, or STDERR.

This is where error output from your program goes. This normally points at your terminal as well, but you can redirect it.

Why have different output places for standard out and standard error?

Well, as you will see when you come to writing shell scripts, you often do not want error messages cluttering up the normal output from a program.

You will forgive me for starting the above list at 0, I am sure, when you learn that each of these IO 'channels' are represented by small numbers, called file descripters (FDs), that have exactly those numbers. That is, STDIN is FD 0, while STDOUT is FD 1, and STDERR is FD 2.

When the shell runs a program for you, it opens STDIN as FD 0, STDOUT as FD 1, and STDERR as FD 2, and then runs the program (technically, it almost always does a fork(2) and then an exec(3) or one of the exec?? calls). If you have redirected one of STDIN, STDOUT or STDERR, your shell opens that file as the appropriate FD before running the program.

Now, what does this all have to do with you, I hear you ask?

Well, there are lots of neat things you can do, but some things to watch out for as well.

A lot of inexperienced UNIX users assume that they can redirect a file into a program and use the same name for redirecting the output:

$ some-program < mega-important-data-file > mega-important-data-file

They become very upset after doing the above, especially if that mega-important data file has never been backed up anywhere. Why is this?

The shell opens the mega-important-data-file for reading and associates it with FD 0 (or STDIN), and then opens it for writing, but truncates it to zero length, and associates it with FD 1 (or STDOUT) as well.

So, if you want to do something like the above, use a different file name for the output file. Oh, you should also back up files as well :-).

Now, there are lots of redirection symbols that you can use, and here are some of them:
< file means open a file for reading and associate with STDIN.
<< token Means use the current input stream as STDIN for the program until token is seen. We will ignore this one until we get to scripting.
> file means open a file for writing and truncate it and associate it with STDOUT.
>> file means open a file for writing and seek to the end and associate it with STDOUT. This is how you append to a file using a redirect.
n>&m means redirect FD n to the same places as FD m. Eg, 2>&1 means send STDERR to the same place that STDOUT is going to.

OK, here are some tricks that you might want to use in various places.

If you are gathering evidence for a bug report, you might want to redirect the output from a series of programs to a text file (never mind that you can use the script command to do the same :-). So you might do the following:

$ some-buggy-program > important-evidence.txt
$ echo '---------MARKER-------' >> important-evidence.txt
$ some-buggy-program some-params >> important-evidence.txt

The second and subsequent lines append the output from the commands issues to the evidence file rather than overwriting them. Try the following:

$ echo This is a line of text > /tmp/file.txt
$ echo This is another line > /tmp/file.txt

What do you get?

Now try:

$ echo This is a line of text > /tmp/file.txt
$ echo This is another line >> /tmp/file.txt

What do you get this time?

OK, for the last few tricks here. Sometimes you want to append STDOUT and STDERR to a file. How do you do it?

$ some-command >> /tmp/log.log 2>&1

The 2>&1 says make STDERR point to the same places as STDOUT. Since STDOUT is open already, and the shell has done a seek to the end, STDERR will also be appended to STDOUT.

If you want to append a line to a file, you can echo the line you want with a redirect, rather than firing up an editor:

$ echo Some text >> /path/to/some/file

It turns out that you can cause the shell to redirect to other file descriptors as well, and if you look in the configure scripts that come with many UNIX software packages, you will see examples of this.

Why is redirecting so important? Well, it is used in many shell scripts, it is a simple and conventient mechanism to sending output to any file without the programmer having to add code for handling command line instructions, and it is the UNIX way of doing things :-).

It is also the same as piping, where you redirect output to, or input from, a pipe device. The pipe device has a process living on the other side, but we will look at this later.

No comments: