Last modified: September 01, 2023

Twelve Useful Unix Commands

Introduction

Unix-based systems, such as Solaris, as well as Unix-like operating systems and environments, such as Linux and Cygwin, have a rich set of commands and utilities, probably more than any other operating system I've used. However, there are a few commands that I use almost daily, and intimate knowledge of what they are and what they do can make working with these kinds of systems easier and more effective. Most of them, except for "man" and "ls", are stream commands; that is, they can take input from an I/O stream, and create output on another I/O stream. This makes them extremely powerful. (Actually, output from "man" and "ls" can go to a stream, too; they just don't read from one.)

I'm not going to go into programming and scripting languages here, as that's outside the scope of the story I want to tell. Discussions of Bourne Shell, Perl, and so on are for another day. There are shades of gray here: "awk" can be used as a programming language, but I include it here only for simple command line tasks. I may also include one or two Perl "one-liners" for reference. Oh, and I'm leaving interactive text editors out of the mix, too.

Again, it's important to understand that the most powerful use of these Unix commands is not standalone, but combined together into streams. The concept of pipes and redirection, where commands can feed input to one another, accept input from arbitrary data files, and provide output to new data files, is one of the things that makes Unix stand out (and, yes, I know that other operating systems, such as MS-DOS, have copied these ideas, but the original Unix implementations are still superior). I was going to put a quick guide to redirection here, but decided instead to put examples inline as I go along. For now, please remember that the pipe ("|") takes the output of one command and makes it the input of another command, less-than ("<") redirects into a command from a file, and greater-than (">") redirects from a command into a file.

Also keep in mind that there are three main input/output streams: "stdin", or standard input, which is stream 0; "stdout", or standard output, stream 1; and "stderr", or standard error output, stream 2. Additional I/O streams can be created as needed. Pipes and redirects do not handle the standard error stream by default, so error messages generated by commands will not get passed to subsequent commands, or put into files; they will be displayed on your console instead.

Finally, I've provided parallel examples in cases where there might be syntax differences between Bourne Shell, which is primarily used for writing scripts, and C-Shell, which is used interactively. For example, to redirect a command's standard output and standard error output to the same file, we'd use this in Bourne Shell:

ls > myfile 2>&1

and this in C-Shell:

ls >& myfile
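Similarly, to pipe both standard output and standard error into another command, Bourne Shell puts "2>&1" before the pipe, while C-Shell has the "|&" operator. (The directory name here is a hypothetical nonexistent one, chosen just to force an error message.) In Bourne Shell:

ls /no/such/directory 2>&1 | more

and in C-Shell:

ls /no/such/directory |& more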

And, now, on with my list of twelve useful Unix commands:

1. man

I originally left this off the list; then, when I realized I had used it to double-check the options to all the other commands shown below, I decided it was too important to omit. The usage is pretty simple:

man awk

Gives the manual page for the awk command. It has a cousin named "apropos", which shows man pages that might be relevant to a particular topic; for example, "apropos filesystem" lists man pages that might have relevant information about filesystems.
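On many systems, "apropos" is actually equivalent to the "-k" flag of "man" itself, so this gives the same results:

man -k filesystem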

For all of the commands that follow, and for the man command itself, take time to read the man pages. There are many more interesting options than what I have presented here.

2. ls

This may seem kind of obvious, but the obvious uses of "ls" can cloud some of the really cool things it can do. In its simplest form, it lists files in the current directory, in alphabetical order. However, it has a lot of useful options; here are just a few (the ones used in the examples below):

-l
Uses the "long" format, showing permissions, ownership, size, and modification time.
-t
Sorts by modification time, newest first.
-r
Reverses the sort order.
-a
Includes "hidden" files, whose names begin with a dot.
-R
Recursively lists subdirectories.
-u
With "-l" and "-t", uses the last access time instead of the modification time.
-c
With "-l" and "-t", uses the inode change time instead of the modification time.

Some of these are so useful that we've set up some "alias" commands in the global login files, so that all users can have them as part of their interactive sessions.
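For example, the C-Shell syntax for an alias looks like this (these two particular aliases are just an illustration, not necessarily the ones we actually define):

alias ll 'ls -la'
alias lt 'ls -ltr'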

The options shown above can be combined in many interesting ways:

ls -ltur
Sorts from least-recently accessed to most-recently accessed; this can be useful in cases where you're looking at, say, a "bin" directory, and wondering which programs are not being used and are therefore candidates for removal.
ls -ltc
This sorts, not by file modification time, but by the time that the file inode entries were changed. Inodes are where file attributes are stored, and this can be useful to detect when file ownerships or permissions were altered.
ls -lRa | grep -- ---
Does a quick scan of a directory and its subdirectories to find any files that might have restrictive permissions. The three dashes in the "grep" pattern match a permission group (user, group, or other) that has no read, write, or execute access in the long listing; as for the two dashes before the pattern, these are necessary in order to pass a dash or dashes as arguments without them being interpreted as command line options, or "flags".

3. cat

At first glance, "cat" seems so simple, you'll wonder why I included it. Its sole purpose is to stream files to standard output, and/or accept streams from standard input. However, there are cases where it works better than just the shell "<" and ">" operators; one is that it can provide a sequential stream from multiple files, whereas "<" only works with one file per instance:

cat file1 file2 file3 | grep "awordwearelookingfor"

Also, it has some options to massage the data, such as "-n" to precede each line with its line number.
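For example, to see the last few lines of a file along with their line numbers (the file name here is hypothetical):

cat -n myscript.sh | tail -5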

4. tail (and head)

The "tail" command shows the last few lines of a file; the default is 10, but this can be changed. The most interesting command line option is "-f", which "follows" a growing file. In other words, it doesn't exit after printing the last few lines, but hangs on to the file waiting for more lines to appear, so they can be printed. This is extremely useful for monitoring log files.

The "-r" flag, which prints lines in reverse order, is interesting, but I'm not sure how useful it is.

A cousin to "tail", the "head" command, shows the first few lines of a file; the default is 10 lines, but this can be changed. When given multiple files, it precedes them with a header, which can actually be handy when preparing a summary list of a bunch of files. It's interesting to note that early versions of System V Unix left out this command, because they figured that the command "sed 10q" pretty much did the same thing, at least in the single file case.

5. grep (and egrep and fgrep)

Wow, is "grep" useful. I wish there was a handheld "grep" device that could scan through a stack of papers and look for words, like this command does for files. It's job is to scan data looking for patterns, and it does a great job of it. In fact, it's often used as a verb, as in, "I'm going to grep through my old emails for that address."

"grep" has two cousins: "egrep", which has an expanded set of search pattern options, and "fgrep", which does not honor any search pattern metacharacters (the "f" stands for "fixed", not "fast", contrary to popular belief).

Here are a few examples:

grep "foobar" textfile
Looks for the string "foobar" in the file "textfile".
grep -i "foobar" textfile
Same as the above, but case-insensitive.
grep -i "qu..k" textfile
Looks for any five-character sequence that starts with "qu" and ends with "k"; the period, or "dot", is a metacharacter (wildcard) that stands for any single character.
grep -i "  *" textfile
Looks for any sequence of two or more space characters (there are two spaces and an asterisk inside the quotes); the asterisk means "multiple occurrences of the previous character.
grep -l "#!/bin/sh" /eng/admin/etc/*
Lists the names of files in "/eng/admin/etc" that contain the string "#!/bin/sh" (i.e., are Bourne Shell scripts), but not the lines that match the pattern.
egrep 'Sid|Nancy' *.html
Looks for lines that contain either "Sid" or "Nancy".
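And one more, to illustrate the "fixed" behavior of "fgrep" (reusing the pattern from an earlier example):

fgrep "qu..k" textfile

Unlike "grep", this looks for the literal five-character string "qu..k", periods and all; no metacharacters apply.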

6. sort

Some commands, like "ls", have built-in sorting options. For other cases, the "sort" command is necessary. I have to be honest, it isn't the easiest command to use. Trying to sort on specific columns can be a real chore, and I often have to use trial-and-error to get it right. Also, it may not be obvious, but the default ordering is by ASCII code, and sometimes sorting fields of numbers can be confusing until you remember the "-n" flag. The ordering can be reversed with the "-r" flag, and multiple identical lines can be pruned to just one instance with the "-u" (for "unique") flag.

As but one example, if I am examining a group of directories on a disk, and trying to figure out how much space they're using, I use this command:

du -sk * | sort -r -n

It gives me a list sorted from largest to smallest, in numeric order. The default sort field is column 1, which is where "du" puts the file size, so I don't have to specify it.
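If you do need to sort on a specific column, the "-t" flag sets the field delimiter and "-k" picks the field to sort on (some older versions of sort use a positional "+2 -3" style of notation instead). For example, this sorts the password file numerically by its third colon-separated field, the user ID:

sort -t: -k3,3 -n /etc/passwd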

7. wc

The "wc", or "word count", command, seems primitive but is actually quite useful. I almost always find myself using it with the The "-l" option, which makes it count lines instead of words. For example, in our nightly "cleanup" scripts, we only reboot systems if no users are logged in to them, using a Bourne Shell construct like this (simplified from the original):

if [ `who | wc -l` = "0" ]; then
    shutdown -y -g30 -i6 "Nightly Reboot In Progress"
fi

8. diff (and cmp)

I can't emphasize enough the usefulness of "diff". If you need to find any and all differences between two files, this is the right tool for the job. It will show lines that were altered, lines that were added, and lines that were deleted, between two versions of a file.

The notation takes a little getting used to, but it's enough to remember that lines starting with "<" point to the first file listed on the command line, and lines starting with ">" point to the second.

The combination of flags "-wb" is useful for ignoring any differences caused by "white space" (e.g., combinations of space and tab characters), and the flag "-c" produces a "context diff", where the three lines before and after each difference are shown (this can make slogging through a complex set of changes much easier).
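So, to compare two versions of a script while ignoring whitespace changes, with some surrounding context (the file names here are hypothetical):

diff -wb -c oldscript.sh newscript.sh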

For example, want to see if a file is sorted or not? Try this:

sort file.txt | diff file.txt -

If it outputs anything, that means it isn't sorted. If it is sorted, then you won't see anything. The dash ("-") argument means "read from standard input instead of from a file."

A cousin to "diff", the "cmp" command, does a byte-by-byte file comparison. It's quicker than diff in cases where you only want to find what files are different, and not where they're different, and also the best way to determine if binary files are different ("diff" hates binary data).

9. find

If there's a diamond in the rough, it's probably the "find" command. It will walk down a directory tree, looking for anything that matches the specified criteria. I use this command almost daily. As the saying goes, though, "with great power comes great responsibility": this command can be extremely useful, because it can repeat a command on many files in a directory tree, but it can also be extremely destructive, for the same reason. If you're doing something complex, I strongly advise running it on a small test directory before inflicting it on a large number of files and directories.
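One good habit (a sketch, with hypothetical names): run the find with a harmless "-print" action first, look over the list, and only then substitute the destructive action:

find testdir -type f -name '*.tmp' -print
find testdir -type f -name '*.tmp' -exec /bin/rm '{}' \;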

This is the only command for which I will explicitly advocate the use of the GNU version in some cases, because they've added some really useful (and obvious) extensions.

Also, this command is best described through example:

find . -type f -user "haljordan" -print
Prints any "regular" files (not directories, symbolic links, or device files) owned by user "haljordan".
find / -xdev -mtime -3 -print
Prints any files on the root disk that have been modified within the last three days. The -xdev flag is really important: It tells find to only look on the current disk device, and not traverse into other mounted partitions, or (worse), NFS-mounted partitions.
gfind /var/mail/. -type f -name '*.lock' -cmin +60 -exec /bin/rm '{}' \;
This is an actual command that is run hourly on Fate, which deletes email lock files that are older than an hour; we use GNU find because it allows us to specify the modification, access, and inode change times in minutes instead of just days, as is the case with the standard version of find. It's also important to understand that when using the -exec option, the curly braces ("{}") are replaced with the name of every file found, and there has to be a \; at the end to terminate the command line to be executed.
find /archive -type f \! -perm 0664 -exec chmod 0664 {} \;
This finds any file that does not have mode 0664 (user and group read/write, other read) set, and fixes it. I use this, along with the aforementioned "ls -lRa | grep -- ---", after doing updates in directories like "/util/gnu", to make sure files have the correct permissions. Oh, and the backslash before the exclamation point is only necessary when using C-Shell.
gfind . -iname "*.jpg" -print
Finds all files ending with ".jpg", in a case-insensitive manner. The standard version of find only has the -name option, which matches exact case, so we'd have to use something like -name '*.[Jj][Pp][Gg]' to get the same results.

10. sed

sed is known as the "stream editor". It's a non-interactive text editor that can do a surprising number of the things you would normally expect to need an interactive editor for. It's important to remember that it only works on I/O streams, and does not modify files by itself. Again, it's probably best to show it through example:

sed -e 's/foo/bar/g' < infile > outfile
Replaces all occurrences of "foo" with "bar" in the file "infile", putting the output into "outfile". Without the "g" at the end of the substitute command, it would only have replaced the first occurrence on each line; "g" makes the command "global". By the way, the delimiter character for the search and replace portions does not have to be a slash; so, for example, if you're replacing a directory path, you can use "#" or anything else that isn't part of the pattern (see the example after this list).
sed -ne '/The/p'
This prints only the lines that contain the pattern "The" (just like "grep").
(echo "<p>"; sed -e '/^$/c\
</p>\
\
<p>' < infile; echo "</p>") > outfile
This is a weird one. The purpose is to take a text file containing paragraphs separated by blank lines, and wrap them in HTML paragraph tags. It does this by using echo to write an initial tag, then sed to change all blank lines into three lines: an end paragraph tag, a blank line, and a start paragraph tag. It then uses echo again to write a final end paragraph tag. sed commands which use "newline" characters are finicky; the backslashes at the ends of the lines are necessary to embed the newlines. We also run this sequence of commands in a subshell, denoted by the parentheses, so that we can redirect the output of all three commands into the output file. You'd almost certainly want to make a script out of this, and not type it in on the command line.
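And here is the promised example of using an alternate delimiter (the directory paths are hypothetical):

sed -e 's#/usr/local/bin#/opt/bin#g' < infile > outfile

This replaces one directory path with another, without having to escape all of the slashes inside the paths.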

11. awk

awk is a pattern scanning language. It, along with sed, was part of the inspiration for the Perl language.

I tend to use it mostly for its ability to split lines into fields (for more complicated things, I usually switch to Perl). By default, it will split on arbitrary whitespace, but that can be changed by specifying a new delimiter with the -F flag. For example, to split out the subnet part of a UB IP address:

echo "128.205.25.5" | awk -F. '{ print $3 }'

If fed multiple lines, like a file, it will loop through all of them, applying the same command. It also uses sed-style patterns, so we can do things like:

awk -F. '/^128\.205\./ { print $3 }' < iplist

Which would only print the subnets for UB addresses, when given an arbitrary list of IP addresses (the backslashes are necessary because a period is normally a wildcard for any single character, and the backslash tells awk to interpret it literally).
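Here's one more, using the default whitespace splitting (the exact output of "who" varies from system to system):

who | awk '{ print $1 }' | sort -u

This prints a sorted, duplicate-free list of the users currently logged in.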

12. cut

cut is a neat little command, used to split lines into pieces. I often use it instead of awk because it's small and simple, and its options are a bit easier to remember. It can split on either single-character delimiters, or on fixed column positions, making it ideal for processing delimited or fixed-width data.

Here are some simple examples:

cut -d: -f6 < /etc/passwd
This splits the password file on colon characters, and prints the sixth field, which is the home directory location.
cut -c20-30 < somedata
This prints columns 20-30 of the file "somedata".

This is another of those commands that looks simple on its own, but is powerful when combined with others in a stream.

Conclusion

It was hard to limit this list to twelve commands; I originally had ten, but then kept thinking of others (actually, I did take some liberties by listing similar commands in some cases). Commands such as ln, file, chmod, chown, chgrp, and ps almost made the cut, but I decided to leave them for a future list of either useful advanced user commands or useful system administrator commands.

The final item I want to present is one of the core Unix philosophies, "There's more than one right way to do it". There is no reason why we couldn't use:

cat iplist | grep '^128\.205\.' | cut -d. -f3

Instead of:

awk -F. '/^128\.205\./ { print $3 }' < iplist

It all boils down to personal preference, efficiency, portability, and readability. I tend to favor portability, which is why I often use these commands instead of a language like Perl, because you are almost guaranteed to find these commands on any Unix system you encounter. Your mileage may vary.
