Wednesday, April 29, 2009

CLI Magic: For geek cred, try these one-liners

In this context, a one-liner is a set of commands normally joined through a pipe (|). When joined by a pipe, the command on the left passes its output to the command on the right. Simple or complex, you can get useful results from a single line at the bash command prompt.

For example, suppose you want to know how many files are in the current directory. You can run:

rakesh@debian ~]$ ls | wc -l
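
Keep in mind that ls counts every visible entry, directories included, and skips dotfiles. A minimal sketch of a stricter count follows, assuming a find that supports -maxdepth (GNU and BSD find do). The scratch directory and its file names are invented for the demonstration:

```shell
# Build a throwaway directory so the counts are predictable.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/.hidden"
mkdir "$demo/subdir" "$demo/other"

# ls sees the visible entries: a.txt, b.txt, other, subdir
entries=$(ls "$demo" | wc -l)

# find limited to regular files at depth 1: a.txt, b.txt, .hidden
files=$(find "$demo" -maxdepth 1 -type f | wc -l)

echo "entries=$entries files=$files"
rm -rf "$demo"
```

Which count is the right one depends on whether directories and hidden files matter for the question being asked.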

That's a very simple example -- you can get more elaborate. Suppose you want to know about the five processes that are consuming the most CPU time on your system:

rakesh@debian ~]$ ps -eo user,pcpu,pid,cmd | sort -rn -k2 | head -5

The -e option to ps selects every process, and -o lets you specify the columns you want shown. sort -rn -k2 sorts in reverse numeric order, using the second column (pcpu) as the key; without -n the values would be compared as text, and 9.5 would rank above 25.0. head -5 keeps the first five lines of the ordered list. The header line is normally not among them: its %CPU label is not a number, so the numeric sort treats it as zero and it sinks toward the bottom. You can place pcpu first and drop the -k2 option, because by default sort compares each line starting from its first column. That illustrates how you may have to try several approaches on some one-liners; different versions and ways to manipulate the options may produce different results.
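
One caveat worth checking whenever a one-liner orders a numeric column: sort compares text unless told otherwise. The sketch below uses made-up ps-style rows (the users, PIDs, and commands are hypothetical) to show the difference the -n flag makes:

```shell
# Hypothetical ps-style sample: user, %CPU, pid, command.
sample='alice 9.5 101 vim
bob 10.2 102 firefox
carol 0.3 103 bash
dave 25.0 104 make'

# Plain reverse sort compares text, so "9.5" outranks "25.0" and "10.2".
lexical_top=$(printf '%s\n' "$sample" | LC_ALL=C sort -r -k2 | head -1)

# Adding -n compares the field as a number.
numeric_top=$(printf '%s\n' "$sample" | LC_ALL=C sort -rn -k2 | head -1)

echo "lexical: $lexical_top"
echo "numeric: $numeric_top"
```

LC_ALL=C is set here only to make the text comparison predictable across locales.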

Linux administrators on multi-user servers often need a quick, sorted list of users. One simple way to get that is with the command:

rakesh@debian ~]$ cat /etc/passwd | sort

If you just need the username, the above command returns too much information. You can fix it with something like this:

rakesh@debian ~]$ cat /etc/passwd | sort | cut -d":" -f1

The sorted list is passed to cut, where the -d option sets the field delimiter. cut splits each line into fields, and -f1 selects the first one, which is the username. That's better; it shows only usernames now. But you may not want to see all the system usernames, like apache, bin, and lp. If you just want human users, try this:

rakesh@debian ~]$ cat /etc/passwd | sort | gawk '$3 >= 500 {print $1 }' FS=":"

gawk evaluates each line piped to it. If the third field, the UID, is greater than or equal to 500, the action runs. Red Hat-style distros of this era number normal users from 500; Debian and many newer releases start at 1000, so check UID_MIN in /etc/login.defs and adjust the threshold. The action, written between braces, prints the first field, which is the username. The trailing FS=":" assignment sets gawk's field separator to a colon.
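
The filter can be tried without touching the real /etc/passwd. This is a sketch under stated assumptions: the records are invented, plain awk stands in for gawk so it runs where gawk is not installed, and the min_uid value and the upper bound that drops the nobody account are choices for this example rather than anything universal:

```shell
# Made-up passwd-style records: name:password:UID:GID:comment:home:shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
rakesh:x:1000:1000:Rakesh:/home/rakesh:/bin/bash
smith:x:1001:1001:Smith:/home/smith:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin'

# min_uid is an assumption; compare it with UID_MIN in /etc/login.defs.
min_uid=1000

# -F sets the field separator, -v passes the shell value into awk.
# The upper bound (assumed here) excludes the "nobody" account.
human_users=$(printf '%s\n' "$passwd_sample" |
  awk -F: -v min="$min_uid" '$3 >= min && $3 < 65534 { print $1 }' |
  sort)

echo "$human_users"
```

Passing the threshold with -v keeps the awk program itself unchanged when the same line is reused on a distro with a different starting UID.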

Now suppose you have a directory with lots of files with different extensions, and you want to back up only the .php files, calling them filename.bkp. The next one-liner should do the job:

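rakesh@debian ~]$ for f in *.php; do cp -- "$f" "$f.bkp"; done

The shell expands *.php to every file in the current directory whose name ends in .php, and the loop visits them one at a time. Each file's name is held in the $f variable, and a simple copy command does the backup, so index.php is copied to index.php.bkp. The double quotes keep names that contain spaces in one piece, and -- stops cp from treating a name that starts with a dash as an option. Notice that in this example we used a semicolon to execute the commands one after another, rather than piping output between them.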
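
Names with spaces, or a directory with no .php files at all, call for a little extra care in a loop like this. A minimal sketch on a throwaway directory (the file names are made up):

```shell
demo=$(mktemp -d)
touch "$demo/index.php" "$demo/my page.php" "$demo/notes.txt"

(
  cd "$demo" || exit 1
  for f in *.php; do
    # If nothing matched, the shell leaves the literal pattern; skip it.
    [ -e "$f" ] || continue
    cp -- "$f" "$f.bkp"
  done
)

# Count the copies that were created.
backups=$(ls "$demo" | grep -c '\.bkp$')
echo "backups=$backups"
rm -rf "$demo"
```

The parentheses run the loop in a subshell, so the cd does not change the directory of the calling shell.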

What about bulk copy? Consider this:

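rakesh@debian ~]$ tar cf - . | (cd /usr/backups/ && tar xfp -)

It creates a tar package recursively from the current directory and writes it to standard output, which the pipe hands to the next command. The parentheses create a temporary subshell that changes to a different directory and then extracts the content of the package, which is the whole original directory. Joining the two steps with && means the extraction runs only if the cd succeeded; with a plain semicolon, a missing target directory would cause the archive to be unpacked on top of the files being copied. The p option on the last tar command preserves file properties like time and permissions. Because the cd happened inside the subshell, your shell is still in the original directory after completion.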
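
To try the tar pipe safely, the sketch below copies between two scratch directories created with mktemp. The directory layout and file names are invented, and both ends run in subshells so the calling shell never changes directory:

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/docs"
echo "hello" > "$src/docs/readme.txt"
chmod 640 "$src/docs/readme.txt"

# Pack from src, unpack in dst; && guards each tar against a failed cd.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)

# Check that the content and the permission bits arrived intact.
copied=$(cat "$dst/docs/readme.txt")
perms=$(ls -l "$dst/docs/readme.txt" | cut -c1-10)

echo "copied: $copied perms: $perms"
rm -rf "$src" "$dst"
```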

A variant on the previous one-liner lets you do the same kind of backup on a remote server:

rakesh@debian ~]$ tar cf - . | ssh smith@remote.server tar xfp - -C /usr/backup/smith

Here, the command opens an SSH session to the remote server and untars the package there. The -C option makes tar change to the given directory, in this case /usr/backup/smith, before extracting.

grep and gawk and uniq, oh my!

Text processing is a common use for one-liners. You can accomplish marvelous things with the right set of commands. In the next example, suppose you have a log of incoming email messages that looks like this:


rakesh@debian ~]$ cat incoming_emails
2008-07-01 08:23:17 user1@example.com
2008-07-01 08:25:20 user2@someplace.com
2008-07-01 08:32:41 somebody@server.net
2008-07-01 08:35:03 spam not received, filtered
2008-07-01 08:39:57 user1@example.com
...

You are asked for a report with an ordered list of who received incoming messages. Many recipients would be repeated in the output of the cat command. This one-liner resolves the problem:

rakesh@debian ~]$ grep '@' incoming_emails | gawk '{print $3}' | sort | uniq

grep keeps only the lines that contain an @ character, which indicates an email address. Next, gawk extracts the third field, which contains the email address, and passes it to the sort command. Sorting is needed to group identical recipients together, because the last command, uniq, only drops repeated lines that are adjacent. The output is shown below. Most text processing one-liners use a combination of grep, sed, awk, sort, tr, cut, uniq, and other related commands.


somebody@server.net
user1@example.com
user2@someplace.com
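
A natural next step for such a report is to count how many messages each address received. This sketch reuses only the five sample lines shown earlier, written to a temporary file; plain awk stands in for gawk, and folding the @ test into the awk pattern is one possible variation, not the only way to do it:

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2008-07-01 08:23:17 user1@example.com
2008-07-01 08:25:20 user2@someplace.com
2008-07-01 08:32:41 somebody@server.net
2008-07-01 08:35:03 spam not received, filtered
2008-07-01 08:39:57 user1@example.com
EOF

# The awk pattern replaces the separate grep; uniq -c prefixes each
# address with its count, and sort -rn puts the busiest one first.
report=$(awk '$3 ~ /@/ { print $3 }' "$log" | sort | uniq -c | sort -rn)

# Normalize the spacing of the first report line for easy comparison.
top=$(printf '%s\n' "$report" | head -1 | awk '{ print $1, $2 }')

echo "$report"
rm -f "$log"
```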

If you like any of these one-liners but think they're too long to type often, you can create an alias for the command and put it in your .bashrc file. bash reads this file each time it starts an interactive shell (and on most distros the login startup files source it as well), so your personal aliases are ready at any time.

rakesh@debian ~]$ alias p5="ps -eo pcpu,user,pid,cmd | sort -rn | head -5"
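
An alias cannot take arguments, so when a one-liner needs a parameter a shell function in .bashrc is a common alternative. The function name topcpu is hypothetical, and the sketch assumes a procps-style ps that accepts -eo with these column names:

```shell
# Hypothetical helper: show the N processes using the most CPU (default 5).
# Assumes a procps-style ps; sort -rn orders the leading %CPU column
# numerically, highest first.
topcpu() {
  ps -eo pcpu,user,pid,cmd | sort -rn | head -n "${1:-5}"
}
```

Once the function is defined, topcpu prints five lines and topcpu 10 prints ten.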

You can certainly create better and simpler variations of all of the commands in this article, but they're a good place to start. If you are a Linux system administrator, it's good practice to collect, create, and modify your own one-liners and keep them handy; you never know when you are going to need them. If you have a good one-liner, feel free to share it with other readers in a comment below.
