The grep command


The name of this command comes from a command in the Unix text editor ed that takes the form g/re/p, meaning "search globally for a regular expression and print the lines where instances are found".

This acronym readily describes the default behaviour of the grep command. grep takes a regular expression on the command line, reads standard input or a list of files, and outputs the lines that match the regular expression. (Quoted from Wikipedia: http://en.wikipedia.org/wiki/Grep.)^[10]

grep can be used to do a whole host of tricks and magic. We can use it either as a filter on the output of other commands or to look inside files. In both cases it matches lines against regular expressions.

Let's start off with using grep to look inside files. If I wanted to determine which users use the bash shell, I could do it the following way:

	
grep "/bin/bash" /etc/passwd

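I'm enclosing /bin/bash inside double quotes out of habit. Forward slashes are not special to the shell, so the quotes are not strictly needed here, but quoting a pattern stops the shell from interpreting any special characters it may contain (such as * or [ ]) before grep sees them. ^[11]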

We could pipe this grep output to the cut or the sort commands, etcetera.
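As a rough sketch of what such a pipeline might look like, the example below builds a small passwd-style file (the user names and the temporary file are invented for illustration) instead of reading your real /etc/passwd, so the result is predictable:

```shell
# Build a small passwd-style sample file (made-up users, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'hamish:x:1000:1000:Hamish:/home/hamish:/bin/bash' \
  'alice:x:1001:1001:Alice:/home/alice:/bin/sh' > "$sample"

# Keep only the bash users, cut out the login name (field 1), and sort the names.
bash_users=$(grep "/bin/bash" "$sample" | cut -d: -f1 | sort)
echo "$bash_users"

# Clean up the temporary file.
rm -f "$sample"
```

On your own system you would replace "$sample" with /etc/passwd to get the real list of bash users.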

We can search any file, or group of files, for various patterns.

Remember that grep is looking for a pattern. In our example it is not looking for a fixed string; it is looking for a pattern beginning with a forward slash ( / ), followed by the letters 'b', 'i', 'n', followed by another forward slash ( / ), and so on.

Understand that it's searching for a pattern. Why am I emphasising this?

Primarily because it means we can use all of our pattern-matching characters. We could say:

grep "[hH][aA][Mm]" /etc/passwd

which would match any of the following strings, among others:

hAM
HaM
HAm
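To see the character classes at work, here is a minimal sketch that feeds a few made-up lines to grep through a pipe. The sample words are assumptions chosen for illustration. The second command uses the -i (ignore case) switch, which happens to give the same result for this pattern:

```shell
# Feed five sample lines to grep; only lines containing h-a-m in any mix of case survive.
matches=$(printf '%s\n' 'ham' 'HAM' 'hAm sandwich' 'hum' 'spam' | grep "[hH][aA][Mm]")
echo "$matches"

# The -i switch asks grep to ignore case, which gives the same result here.
matches_i=$(printf '%s\n' 'ham' 'HAM' 'hAm sandwich' 'hum' 'spam' | grep -i "ham")
echo "$matches_i"
```

The lines 'hum' and 'spam' are dropped because neither contains the three letters h, a, m in sequence.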

I could also search for a plain word:

grep "Linuz" bazaar.txt

We could equally have done a

grep "Linu." bazaar.txt

or better still

grep '[lL]inu.' bazaar.txt

which would've grep'd using the pattern 'l' or 'L', then 'i', 'n', 'u', and then any single character. This would match both Linux and Linus (or linux or linus).

You can see a similarity starting to appear between using regular expressions in grep and regular expressions in sed. They are all RE's, so there should be no difference!

For example I can combine these patterns now:

grep "[Ll]inu." bazaar.txt

If I wanted any 5 characters to follow Linu or linu, I would use the following:

grep "[Ll]inu.\{5\}" bazaar.txt
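The effect of the \{5\} interval is easiest to see on a small sample. The sketch below uses four made-up lines in a temporary file (both are assumptions for illustration); only the lines with at least five more characters after Linu or linu should be printed:

```shell
# Four sample lines: two are too short to satisfy the \{5\} interval.
sample=$(mktemp)
printf '%s\n' 'Linux' 'Linux is fun' 'linus' 'Linus Torvalds' > "$sample"

# Linu/linu must be followed by five characters of any kind.
long_matches=$(grep "[Ll]inu.\{5\}" "$sample")
echo "$long_matches"

rm -f "$sample"
```

The lines 'Linux' and 'linus' are rejected because only one character follows the 'u'.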

grep (naturally) has other switches that are useful:

switch	action
-B 5	display 5 lines of context before each match
-A 3	display 3 lines of context after each match
-v	invert the match: show only the lines that do not match the pattern
-n	prefix each output line with its line number in the file

The following command would print every line except those containing the pattern Linus/linus/Linux/linux etc., and it would prefix each printed line with its line number because of the -n switch.

grep -vn "[Ll]inu." bazaar.txt
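Here is a small sketch of these switches in action. The four sample lines stand in for bazaar.txt and are invented for illustration. The -A switch is assumed to be available, as it is in GNU and BSD grep:

```shell
# A tiny stand-in for bazaar.txt (invented content).
sample=$(mktemp)
printf '%s\n' \
  'Linux is a kernel' \
  'The bazaar model' \
  'Linus wrote it' \
  'The cathedral model' > "$sample"

# -v keeps the lines that do NOT match; -n prefixes each with its original line number.
non_matching=$(grep -vn "[Ll]inu." "$sample")
echo "$non_matching"

# -A 1 prints each matching line plus one line of context after it.
with_context=$(grep -A 1 "Linus" "$sample")
echo "$with_context"

rm -f "$sample"
```

Notice that the line numbers shown by -n are those of the original file, not a count of the output lines.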

If you wanted to grep through a whole stack of files, you could:

grep -n "Linux." *

which would show you the filename and the line number of the line that contains the pattern.
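The output format is easier to picture with an example. This sketch creates two throwaway files in a temporary directory (the file names a.txt and b.txt and their contents are assumptions for illustration) and greps across both:

```shell
# Two small files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Linux rocks' > "$dir/a.txt"
printf '%s\n' 'nothing here' 'I use Linux daily' > "$dir/b.txt"

# Run grep from inside the directory so that * expands to a.txt and b.txt.
multi=$(cd "$dir" && grep -n "Linux." *)
echo "$multi"

rm -rf "$dir"
```

Each output line takes the form filename:line number:text. Bear in mind that the trailing dot in "Linux." demands one more character, so a line that ends with the word Linux would not be reported.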

So far we have used grep fairly effectively to look inside a file or a series of files. Now we want to use grep as a filter.

The best way to see how this works is to use your messages file (/var/log/messages). This file logs messages from applications on the system. If you can't read your messages file, you need to be logged in as root; to follow this particular example, you need access to the /var/log/messages file.^[12]

Look at the time on your system with the following command:

date

Use grep as a filter to extract all messages that occurred between 11:00 and 11:59 in the morning. The following pattern should achieve this:

tail -20 /var/log/messages |grep '11:[0-5][0-9]'

In this case, we're using grep as a filter, filtering the input that's coming from the tail command and actually reducing the amount of output we receive.
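One thing to watch with a time pattern like this: it can also match the minutes and seconds of some other hour. The sketch below uses four invented log lines in the traditional syslog layout (an assumption; your log format may differ) and compares the loose pattern with a slightly stricter one that insists on a space before the hour and a colon after the minutes:

```shell
# Four made-up log lines; the first was logged at 10:11:45, not at 11 o'clock.
log=$(mktemp)
printf '%s\n' \
  'Nov 25 10:11:45 host sshd[1]: session opened' \
  'Nov 25 11:03:12 host cron[2]: job started' \
  'Nov 25 11:59:59 host cron[2]: job finished' \
  'Nov 25 12:00:01 host sshd[1]: session closed' > "$log"

# Loose pattern: also catches the "11:45" inside 10:11:45. -c counts matching lines.
loose=$(tail -n 20 "$log" | grep -c '11:[0-5][0-9]')

# Stricter pattern: space before the hour, colon after the minutes.
strict=$(tail -n 20 "$log" | grep -c ' 11:[0-5][0-9]:')

echo "loose=$loose strict=$strict"
rm -f "$log"
```

The loose pattern counts three lines and the strict one counts two, so it pays to make the pattern as specific as the log format allows.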

Now clearly, because of the ability to pipe commands, you can use the output of one grep command as input to the next.

So we start off with a huge amount of data, but by piping data through grep we filter out only the information that we want.

To continue our examples, let us count the number of lines that exist in the messages file.

cat /var/log/messages |wc -l

Now count the number of messages that occurred at 11 o'clock:

cat /var/log/messages |grep '11:[0-5][0-9]' |wc -l

Now count the number of messages that occurred at 11 o'clock on 25 November:

cat /var/log/messages |grep '11:[0-5][0-9]' |grep 'Nov 25' |wc -l

You should notice the count getting smaller as your pattern gets more specific.
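To watch the numbers shrink without depending on what happens to be in your own messages file, here is a sketch using five invented log lines (the dates, times and host name are assumptions for illustration):

```shell
# Five made-up log entries spread over three days.
log=$(mktemp)
printf '%s\n' \
  'Nov 24 11:15:00 host app: a' \
  'Nov 25 09:00:00 host app: b' \
  'Nov 25 11:05:09 host app: c' \
  'Nov 25 11:47:30 host app: d' \
  'Nov 26 11:20:00 host app: e' > "$log"

# Each extra grep in the pipeline narrows the result further.
total=$(cat "$log" | wc -l)
at_11=$(cat "$log" | grep '11:[0-5][0-9]' | wc -l)
at_11_nov25=$(cat "$log" | grep '11:[0-5][0-9]' | grep 'Nov 25' | wc -l)

echo "total=$total at_11=$at_11 at_11_nov25=$at_11_nov25"
rm -f "$log"
```

Five lines go in, four survive the first filter, and only two survive the second.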

We could keep on filtering as many times as we want. What I encourage you to do is to look for a pattern and use it to reduce the number of output lines. By cutting the output down to the lines that fit your criteria, you save yourself time.

We could use grep with ls:

cd /var/log

Let's only look for files that are directories:

ls -la |grep '^d'

Let's only look for files that are not directories:

ls -la |grep -v '^d'

Let's look for files that end in 'log':

ls -la |grep -v '^d' |grep 'log$'

You see how we are using grep as a filter for the ls command.
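If you would rather experiment away from /var/log, the following sketch builds a scratch directory with one subdirectory and two files (all names are invented for illustration) and applies the same filters, counting the surviving lines with grep -c:

```shell
# A scratch directory containing one subdirectory and two ordinary files.
dir=$(mktemp -d)
mkdir "$dir/applog"
touch "$dir/boot.log" "$dir/notes.txt"

# Lines starting with 'd' are directories: ., .. and applog.
dirs=$(ls -la "$dir" | grep -c '^d')

# Drop the directories first, then keep only names ending in 'log'.
log_files=$(ls -la "$dir" | grep -v '^d' | grep -c 'log$')

echo "directories=$dirs log_files=$log_files"
rm -rf "$dir"
```

The directory applog ends in 'log' too, but it is removed by the grep -v '^d' stage before the second filter ever sees it.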

grep is one of the "Swiss army knives" that you just cannot do without. The more we script the more we will use grep and the better we will get at it.

Look at the info pages on grep to find out all the other things that you can do with it.

grep, egrep and fgrep

There are three versions of the grep command:

type	function	speed
grep	basic regular expressions	in between
egrep	extended regular expressions	slowest
fgrep	no regular expressions	fastest

egrep is said to be the slowest of the three. You can test this using the following:

time egrep "[Ll]inu." bazaar.txt
time grep "[Ll]inu." bazaar.txt
time fgrep "[Ll]inu." bazaar.txt

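The times should decrease from top to bottom, although on a small file the differences may be too small to measure. Bear in mind that fgrep treats "[Ll]inu." as a literal string rather than a pattern, so it will not match the same lines as the other two. grep by default isn't very fast, so if the same job can be done with either grep or sed, it is worth timing both; sed may turn out to be significantly faster.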

I'm not going to cover egrep or fgrep (also available as grep -E and grep -F) because they work almost identically to grep; the only difference is that you can use extended regular expressions (egrep) or no regular expressions at all (fgrep).
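For a quick feel for the difference, here is a sketch using the -E and -F switches of grep, which POSIX defines as the equivalents of egrep and fgrep. The four sample lines are invented for illustration:

```shell
# Four sample lines; the last one contains the pattern text itself.
sample=$(mktemp)
printf '%s\n' 'Linux' 'Linus' 'Linum' 'the pattern [Ll]inu. is a regex' > "$sample"

# Basic regular expression: L or l, then inu, then any one character.
basic=$(grep -c "[Ll]inu." "$sample")

# Extended regular expression: (s|x) means "s or x".
extended=$(grep -E -c "Linu(s|x)" "$sample")

# Fixed string: the brackets and the dot are taken literally.
fixed=$(grep -F -c "[Ll]inu." "$sample")

echo "basic=$basic extended=$extended fixed=$fixed"
rm -f "$sample"
```

The basic pattern counts three lines, the extended one counts two, and the fixed-string search counts only the one line that literally contains the text [Ll]inu.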

Exercises:

Use fgrep, egrep and grep where you think it appropriate to achieve the following:

1. Search in the messages file for all log messages that occurred today.
2. How many times did users attempt to log in but were denied access with "Authentication Failure"?
3. Search in the emails* for lines containing email addresses.
4. Display a list of subjects in the emails.* text files.
5. Time the difference between using the egrep vs the fgrep in commands 3 and 4. Rewrite commands 3 and 4 in sed, and time the command. Compare the times to those obtained using the same pattern in egrep and fgrep.
6. Show 2 lines of context on either side of the lines in question in 3 and 4 above.
7. Look for the pattern linuxrus in the emails*.txt, irrespective of case!

Challenge sequence:

From the emails*.txt files, show only the lines NOT containing linuxrus.

Using the concept of grep as a filter, explain why this would be a useful command to use on large files.

^[10]This is a cool web site. I found it while looking for Far Side Cartoons on the Internet.

^[11]You can grep for any type of shell, such as /bin/false. Essentially you would be obtaining the matching lines in the password file without actually opening the file with a text editor like vi.

^[12]If you are not logged in as root, and you need to be, type the following command: su - root and enter the root password when prompted. Now you should be able to:

cat /var/log/messages

This could be rather long, so instead you could

	tail -20 /var/log/messages

which would show you only the last 20 lines of the messages file.

