Van Morrison - September 25, 2018

Using regular expressions with the grep command

The grep command is an essential tool for any Linux command-line user, from beginner to expert. It is easy to use and makes searching through text files much faster.

As a short definition, grep is a program that searches a file for a pattern and prints every line that matches. A grep command can be customized in multiple ways.

Basic examples:

‘grep PATTERN FILE’ searches the given FILE for lines containing the exact PATTERN (case-sensitive):

***

# grep files copyright.doc

The files tagged with this license contains the following paragraphs:

obtaining a copy of this software and associated documentation files

The files provide information regarding scaling of HTTP and SOCKS requests.

***

grep supports various parameters that can also be combined.

‘grep -i’ ignores case:

***

# grep -i files copyright.doc
Files: *

Files: schedutils/ionice.c

Files: schedutils/chrt.c

Files: disk-utils/raw.c

The files tagged with this license contains the following paragraphs:

obtaining a copy of this software and associated documentation files

The files provide information regarding scaling of HTTP and SOCKS requests.

***

‘grep -v’ excludes lines that match a PATTERN from the results:

***

# grep -i files copyright.doc | grep -v schedutils
Files: *

Files: disk-utils/raw.c

The files tagged with this license contains the following paragraphs:

obtaining a copy of this software and associated documentation files

The files provide information regarding scaling of HTTP and SOCKS requests.

***
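
To try these options without a copy of copyright.doc, here is a minimal sketch that builds a small throwaway file and runs the same three searches. The file contents and the variable names (sample, exact, any_case, filtered) are made up for illustration.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'Files: *' \
  'Files: schedutils/chrt.c' \
  'The files tagged with this license' \
  'Copyright: 2001 Andreas Dilger' > "$sample"

# Case-sensitive search: only the line with lowercase "files" matches.
exact=$(grep files "$sample")

# -i ignores case, so "Files" and "files" both match.
any_case=$(grep -i files "$sample")

# -v inverts the match: keep the "files" lines that do not mention schedutils.
filtered=$(grep -i files "$sample" | grep -v schedutils)

printf '%s\n---\n%s\n---\n%s\n' "$exact" "$any_case" "$filtered"
rm -f "$sample"
```

Piping one grep into another, as in the last search, is a simple way to combine an "include" pattern with an "exclude" pattern.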

That being said, let’s move on to our main topic: using grep with regular expressions.

A regular expression (regex) is a pattern that describes a set of strings. Regular expressions use certain operators to combine smaller expressions and are constructed much like arithmetic expressions.

Let’s explore the options.

The ^ character anchors the pattern to the beginning of a line, while the $ character anchors it to the end of a line:

***

# grep -i "^files" copyright.doc
Files: *

Files: schedutils/ionice.c

Files: schedutils/chrt.c

Files: disk-utils/raw.c
# grep -i "files$" copyright.doc

obtaining a copy of this software and associated documentation files

***
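
The same two anchors can be checked on a small throwaway file. This is only a sketch; the file contents and the variable names (sample, starts, ends) are assumptions made for the example.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'Files: disk-utils/raw.c' \
  'The files tagged with this license' \
  'associated documentation files' > "$sample"

# ^ anchors the pattern to the start of the line.
starts=$(grep -i '^files' "$sample")

# $ anchors the pattern to the end of the line.
ends=$(grep -i 'files$' "$sample")

printf 'starts with: %s\nends with:   %s\n' "$starts" "$ends"
rm -f "$sample"
```

The middle line contains "files" too, but neither at the start nor at the end, so both searches skip it.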

Using brackets, you can list a set of characters, exactly one of which must appear at that position in the pattern. The following command searches for any line that contains tam, tim or tom:

***

# grep -i "t[aio]m" copyright.doc

2012 Andy Lutomirski

Files: */timeutils.*

***
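
Here is a small sketch of the bracket syntax on made-up data. The sample lines and the variable names (sample, matches) are hypothetical.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  '2012 Andy Lutomirski' \
  'Files: */timeutils.*' \
  'a term of the license' \
  'stamp duty' > "$sample"

# [aio] stands for exactly one character: a, i or o.
# "Lutomirski" contains tom, "timeutils" contains tim, "stamp" contains tam.
matches=$(grep -i 't[aio]m' "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```

The line with "term" is skipped, because the letters between t and m there are "er", not a single a, i or o.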

Combining the brackets and the ^ character can help you exclude a character from the search. The following command will search for any string that contains “le” and a character preceding “le” EXCEPT the letter i:

***

# grep -i "[^i]le" copyright.doc

On Debian systems, the complete text of the GNU General Public

LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR

must display the following acknowledgement:

***

As you can see, the words “file” and “files” are not matched.
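
One detail worth knowing is that [^i] still has to match some character, so a line that starts with “le” is not matched either. A minimal sketch on invented data (the sample lines and variable names are assumptions):

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'the complete text' \
  'This file may be redistributed' \
  'legal notice' \
  'must display the following acknowledgement' > "$sample"

# [^i] matches any single character except i, and it must match something.
# "complete" (ple) and "acknowledgement" (wle) match.
# "file" is rejected, and "legal" has nothing in front of its "le".
matches=$(grep '[^i]le' "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```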

The period . will match any single character, including a space:

***

# grep -i 'file.' copyright.doc
Files: *

Files: schedutils/ionice.c

Files: schedutils/chrt.c

Files: disk-utils/raw.c

The files tagged with this license contains the following paragraphs:

This file may be redistributed under the terms of the

can be found in /usr/share/common-licenses/LGPL-2 file.

obtaining a copy of this software and associated documentation files

The files provide information regarding scaling of HTTP and SOCKS requests.

***

To match any two characters before the string, you can use two periods (..):

***

# grep -i '..le' copyright.doc
Files: *

Files: schedutils/ionice.c

Files: schedutils/chrt.c

Files: disk-utils/raw.c

The files tagged with this license contains the following paragraphs:

On Debian systems, the complete text of the GNU General Public

BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN

***
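
The period always needs a character to consume, which the following sketch makes visible. The sample file and the variable names (sample, one_after, two_before) are made up for illustration.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'Files: *' \
  'see the LGPL-2 file' \
  'this file may' \
  'ale' > "$sample"

# 'file.' needs one more character after "file".
# A line that ends in "file" has nothing left to match the period.
one_after=$(grep -i 'file.' "$sample")

# '..le' needs two characters in front of "le"; "ale" only has one.
two_before=$(grep '..le' "$sample")

printf '%s\n---\n%s\n' "$one_after" "$two_before"
rm -f "$sample"
```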

These operators must be “escaped” if you want to search for the literal character instead of its special meaning. “Escaping” means putting a \ in front of the character. Example:

grep "2." copyright.doc will match any line containing “2” followed by any character:

***

# grep "2." copyright.doc

License: BSD-2-clause

2. Redistributions in binary form must reproduce the above copyright

2) Redistributions in binary form must reproduce the above copyright notice,

2. Redistributions in binary form must reproduce the above copyright

2. The SOCKS5 protocol is defined in RFC 1928. It is an extension of the SOCKS4 protocol

***

While grep "2\." copyright.doc will only match lines containing the literal string “2.”:

***

# grep "2\." copyright.doc

2. Redistributions in binary form must reproduce the above copyright

2. Redistributions in binary form must reproduce the above copyright

2. The SOCKS5 protocol is defined in RFC 1928. It is an extension of the SOCKS4 protocol

***
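
The difference between the escaped and unescaped period can be reproduced with a short sketch. The sample lines and variable names are hypothetical. The last search uses the standard -F option, which treats the whole pattern as a fixed string, as an alternative to escaping.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'License: BSD-2-clause' \
  '2. Redistributions in binary form' \
  '2) Redistributions in binary form' \
  'version 2' > "$sample"

# Unescaped: "2" followed by any character ("2-", "2." and "2)" all match).
any_char=$(grep '2.' "$sample")

# Escaped: "2" followed by a literal period.
escaped=$(grep '2\.' "$sample")

# -F disables regex operators entirely, so no escaping is needed.
fixed=$(grep -F '2.' "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$any_char" "$escaped" "$fixed"
rm -f "$sample"
```

Single quotes around the pattern keep the shell from touching the backslash before grep sees it.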

Some predefined classes of characters can be included in the construction of a grep command.

These are, alphabetically, [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]

In order to match all the lines containing numbers, the syntax would be ‘grep "[[:digit:]]" copyright.doc’ (the quotes stop the shell from interpreting the brackets itself):

***

# grep "[[:digit:]]" copyright.doc
License: LGPL-2.1+

Copyright: 2001 Andreas Dilger

Copyright: 2008-2012 Karel Zak

License: LGPL-2.1+

Copyright: 2013, Red Hat, Inc.

***

You must use these classes between another set of brackets, so [[:digit:]] and not [:digit:].

Inside a bracket expression, [:digit:] is just a predefined name for the characters 0123456789, so [[:digit:]] is equivalent to [0123456789].
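
A short sketch of two of these classes in use, on invented data (the sample lines and the variable names are assumptions). Quoting the pattern keeps the shell from trying to expand the brackets as a filename pattern.

```shell
# Hypothetical sample file; the contents are invented for this example.
sample=$(mktemp)
printf '%s\n' \
  'License: LGPL-2.1+' \
  'Copyright: 2001 Andreas Dilger' \
  'All rights reserved' \
  'TODO' > "$sample"

# Lines that contain at least one digit.
digits=$(grep '[[:digit:]]' "$sample")

# Lines that contain two uppercase letters in a row.
two_upper=$(grep '[[:upper:]][[:upper:]]' "$sample")

printf '%s\n---\n%s\n' "$digits" "$two_upper"
rm -f "$sample"
```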

Finally, as stated in the grep manual:

A regular expression can be followed by one of several repetition operators:

? The preceding item is optional and will be matched at most once.

* The preceding item will be matched zero or more times.

+ The preceding item will be matched one or more times.

{N} The preceding item is matched exactly N times.

{N,} The preceding item is matched N or more times.

{N,M} The preceding item is matched at least N times, but not more than M times.

For example, oll* matches “ol” followed by zero or more additional “l” characters, so it matches lines containing ol, oll, olll and so on:

***

# grep "oll*" copyright.doc
       text-utils/col.c

       text-utils/colcrt.c

       text-utils/colrm.c

       text-utils/column.c

Files: libsmartcols/*

The files tagged with this license contains the following paragraphs:

modification, are permitted provided that the following conditions

***
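
The other repetition operators can be tried the same way. In grep's default (basic) syntax, ?, + and {} are taken literally unless they are backslash-escaped, so this sketch uses the -E option (extended regular expressions), where they work as described. The sample words and the variable names are made up for illustration.

```shell
# Hypothetical sample file, one word per line; invented for this example.
sample=$(mktemp)
printf '%s\n' 'col' 'following' 'olll' 'cot' > "$sample"

# * : "ol" followed by zero or more "l" (works without -E).
star=$(grep 'oll*' "$sample")

# {2} : "o" followed by exactly two "l" somewhere in the line.
exactly_two=$(grep -E 'ol{2}' "$sample")

# + : "o" followed by one or more "l", then the end of the line.
plus_end=$(grep -E 'ol+$' "$sample")

# ? : "ol" plus an optional extra "l", then the end of the line.
optional_end=$(grep -E 'oll?$' "$sample")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$star" "$exactly_two" "$plus_end" "$optional_end"
rm -f "$sample"
```

Note that 'ol{2}' still matches “olll”, because grep only needs the pattern to appear somewhere in the line; add anchors such as ^ and $ when the whole line has to fit.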

Using grep and your imagination can help you accomplish complex tasks in seconds.
