March 3

Awk Basics & Tutorial – 2

Today, we will look at pattern matching in awk.

Create an input.txt file with the contents below. We will use input.txt for most of the awk commands in this post.

$ cat input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
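If you prefer to create the file from the shell, a here-document writes the six lines exactly as shown. Quoting the delimiter ('EOF') stops the shell from expanding anything inside it. The final line-count check is only an optional sanity check, not part of the tutorial.

```shell
# Write the sample text to input.txt; the quoted 'EOF' disables $ and ` expansion inside the here-document
cat > input.txt <<'EOF'
The AWK utility is a data extraction and reporting tool that uses a data-driven
scripting language consisting of a set of actions to be taken against
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
string datatype, associative arrays (that is, arrays indexed by key strings),
and regular expressions.
EOF

# Optional check: in the END block NR holds the number of lines read, so this prints 6
lines=$(awk 'END { print NR }' input.txt)
echo "$lines"
```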

To print only the lines of input.txt that contain the word “awk”, use any of the commands below; they are equivalent.

# General form: /pattern/ can be a plain word or any regular expression.
awk '/pattern/' input.txt

$ awk '/awk/' input.txt
producing formatted reports. The language used by awk extensively uses the
$ awk '/awk/{print}' input.txt
producing formatted reports. The language used by awk extensively uses the
$ awk '/awk/{print $0}' input.txt
producing formatted reports. The language used by awk extensively uses the
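Why are these three equivalent? A pattern with no action prints the matching line, and `print` with no arguments prints `$0`. A bare `/awk/` pattern is also shorthand for `$0 ~ /awk/`, which is the form combined with `tolower()` in the next section. Here is a minimal sketch using two lines shortened from input.txt, piped in instead of read from the file:

```shell
# Two sample lines, shortened from input.txt
sample=$(printf '%s\n' \
  'The AWK utility is a data extraction and reporting tool' \
  'The language used by awk extensively uses the')

# The short form and the explicit "$0 ~ /regex/" form select the same lines
short=$(printf '%s\n' "$sample" | awk '/awk/')
explicit=$(printf '%s\n' "$sample" | awk '$0 ~ /awk/ { print $0 }')

echo "$short"
echo "$explicit"
```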

How to ignore case when matching a pattern?

We can use awk's tolower() or toupper() function to convert each line to a single case before matching it, so the match ignores case.

$ awk 'tolower($0)~/awk/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the
$ awk 'toupper($0)~/AWK/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the
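tolower() and toupper() return a converted copy of the string and leave `$0` unchanged. That is why the matched lines above are still printed in their original case. This small sketch prints both the original line and the lower-cased copy. The sample lines are shortened from input.txt:

```shell
# tolower($0) is used only for the comparison; $0 itself keeps its original case
result=$(printf '%s\n' 'The AWK utility' 'scripting language' |
  awk 'tolower($0) ~ /awk/ { print $0 " | " tolower($0) }')
echo "$result"
```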

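# GNU awk (gawk) has a special variable called IGNORECASE. By default it is 0, which means matching is case-sensitive; any non-zero value makes pattern matching ignore case. This is a GNU extension.
# The first command below uses -v, which assigns a value to a variable before the program (even BEGIN) runs; the second sets it in a BEGIN block.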

$ awk -v IGNORECASE=1 '/awk/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the
$ awk 'BEGIN{IGNORECASE=1}/awk/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the
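Other awk implementations, such as mawk or the BSD/macOS awk, treat IGNORECASE as an ordinary variable. With those, the two commands above silently print only the line containing the lower-case "awk". The sketch below is one quick way to check whether the awk on your system honors the variable:

```shell
# Set IGNORECASE, then test an upper-case string against a lower-case regex.
# GNU awk honors the variable; other awks treat it as an ordinary variable.
support=$(awk 'BEGIN {
  IGNORECASE = 1
  if ("AWK" ~ /awk/) print "honored"
  else print "ignored"
}')
echo "IGNORECASE is $support by this awk"
```

If it reports "ignored", use the tolower()/toupper() approach or bracket expressions instead, which work in any awk.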

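We can also put the upper-case and lower-case form of each letter in a bracket expression, so the pattern matches "awk" in any mix of cases. Unlike IGNORECASE, this works in every version of awk.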

$ awk '/[Aa][Ww][Kk]/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the

Now, let's see how to match two or more patterns in a file.

We can combine patterns with the logical operators && (and) and || (or).

How to search for the words “the” and “awk” in the same line of a file?

$ awk '/the/ && /awk/' input.txt
producing formatted reports. The language used by awk extensively uses the
# Ignore the case (GNU AWK)
$ awk 'BEGIN{IGNORECASE=1} /the/ && /AWK/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
producing formatted reports. The language used by awk extensively uses the
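The IGNORECASE version above works only in GNU awk. Here is a portable sketch of the same search. It lower-cases each line once, stores the copy in a variable, and tests both patterns against that copy. The variable name `line` is arbitrary, and the sample lines are shortened from input.txt:

```shell
# First rule: save a lower-cased copy of the current line in the variable "line"
# Second rule (pattern only): print $0 when the copy contains both "the" and "awk"
result=$(printf '%s\n' \
  'The AWK utility is a data extraction tool' \
  'textual data for the purpose of' \
  'The language used by awk uses the' |
  awk '
    { line = tolower($0) }
    line ~ /the/ && line ~ /awk/
  ')
echo "$result"
```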

To print lines that contain either pattern, use ||.

# Ignore the case (GNU AWK)
$ awk 'BEGIN{IGNORECASE=1} /the/ || /AWK/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
$ awk '/the/ || /awk/' input.txt
textual data (either in files or data streams) for the purpose of
producing formatted reports. The language used by awk extensively uses the
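An OR of two plain patterns can also be written as a single regular expression using alternation (`|`), so `/the|awk/` selects the same lines as `/the/ || /awk/`. The sketch below compares the two forms on a few lines shortened from input.txt:

```shell
sample=$(printf '%s\n' \
  'The AWK utility' \
  'textual data for the purpose of' \
  'used by awk' \
  'and regular expressions.')

# Two patterns joined with ||
two=$(printf '%s\n' "$sample" | awk '/the/ || /awk/')
# One regular expression using alternation
one=$(printf '%s\n' "$sample" | awk '/the|awk/')
echo "$one"
```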

How to find lines that start with “t” or “T”?

# Here ^ anchors the pattern to the start of the line
$ awk '/^[tT]/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
textual data (either in files or data streams) for the purpose of
# Ignore the case (GNU AWK)
$ awk 'BEGIN{IGNORECASE=1} /^t/' input.txt
The AWK utility is a data extraction and reporting tool that uses a data-driven
textual data (either in files or data streams) for the purpose of
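To see what the ^ anchor changes, count how many lines contain a "t" or "T" anywhere and how many start with one. This sketch uses a few sample lines shortened from input.txt. The counter `n` is a hypothetical name introduced here:

```shell
sample=$(printf '%s\n' 'The AWK utility' 'scripting language' 'textual data' 'and regular expressions.')

# n counts matching lines; "n + 0" prints 0 instead of an empty string when nothing matches
anywhere=$(printf '%s\n' "$sample" | awk '/[tT]/ { n++ } END { print n + 0 }')
at_start=$(printf '%s\n' "$sample" | awk '/^[tT]/ { n++ } END { print n + 0 }')
echo "t or T anywhere: $anywhere, at the start: $at_start"
```

Without the anchor, "scripting language" also matches because it contains a "t" in the middle.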

How to find lines that end with a specific word?

Use the special character “$” to anchor the pattern to the end of the line. For the examples below, create a second file, one.txt, with the following contents.

$ cat one.txt
one two three
two three one
three two one
one three two
# Find the lines that end with the word "one"
$ awk '/one$/' one.txt
two three one
three two one
# Find lines that end with the word "two"
$ awk '/two$/' one.txt
one three two
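Note that /one$/ matches any line whose last three characters are "one", including a line that ends with a longer word such as "someone". To require "one" as a separate word, this sketch accepts it only when it is the whole line or is preceded by a space. The line "call someone" is a hypothetical extra line, not part of one.txt:

```shell
sample=$(printf '%s\n' 'one two three' 'two three one' 'call someone')

# Matches "one" at the end of the line, even inside a longer word
loose=$(printf '%s\n' "$sample" | awk '/one$/')
# Matches only when "one" is the whole line or follows a space
strict=$(printf '%s\n' "$sample" | awk '/^one$/ || / one$/')
echo "$strict"
```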

I hope you enjoyed this post and learned something about the basics of pattern matching in awk.

We will look at some more basics in the next post.

bye
Kamaraj