Awk

WIKI AWK is a scripting language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and it is a standard feature of most Unix-like operating systems.

It works line by line, splitting input into fields and applying actions.

Basic Usage\Structure:

awk 'pattern { action }' file

pattern: condition to match (like if).
action: what to do if the pattern matches (default = print line).
file: input file (or you can pipe data in).

If no pattern → action applies to every line.

If no action → default action is { print $0 } (print entire line).

Fields

AWK splits each line by whitespace (default).

$0 → whole line
$1 → first field
$2 → second field
..can go indefinitely with integers
NF → number of fields
NR → record (line) number

echo "Alice 30 Doctor" | awk '{print $1, $2}'
# Output: Alice 30
echo "Alice 30 Doctor" | awk '{print $1, $3}'
# Output: Alice Doctor

test.txt:

cat test.txt

asdfasdf
fjfjfjfjfj
aaaaa

awk '{print $1}' test.txt
# Output:
# asdfasdf
# fjfjfjfjfj
# aaaaa

awk '{print $2}' test.txt
# Output:
# 
# 
#

Note each line only has 1 field.

Also using cat on a file and piping to awk is the same as above.

cat test.txt | awk '{print $1}'
# Output:
# asdfasdf
# fjfjfjfjfj
# aaaaa

cat test.txt | awk '{print $2}'
# Output:
# 
# 
#

Useful Examples

awk '{print NR, $0}' test.txt
# Output
# 1 asdfasdf
# 2 fjfjfjfjfj
# 3 aaaaa
awk '{print NR, $1}' test.txt
# Output
# 1 asdfasdf
# 2 fjfjfjfjfj
# 3 aaaaa
awk '{print NF, $0}' test.txt
# Output
# 1 asdfasdf
# 1 fjfjfjfjfj
# 1 aaaaa

Built in Variables

* NR → line number
* NF → number of fields
* FS → input field separator (default: space)
* OFS → output field separator (default: space)

cat test.txt
# Output
# a
# b
# c
awk 'BEGIN {OFS="|"} {print $1}' test.txt
# Output
# a
# b
# c
awk 'BEGIN {OFS="|"} {print $1,$2}' test.txt
# Output
# a|
# b|
# c|
awk 'BEGIN {OFS="|"} {print $1,$2,$3}' test.txt
# Output
# a||
# b||
# c||
cat test.txt
# Output
# a,1,2,3
# b,2,4,6
# c,3,6,9
awk 'BEGIN {FS=","; OFS="|"} {print $1,$2,$3}' test.txt
# Output
# a|1|2
# b|2|4
# c|3|6
awk 'BEGIN {FS=""; OFS="|"} {print $1,$2,$3,$4,$5,$6,$7}' test.txt
# Output
# a|,|1|,|2|,|3
# b|,|2|,|4|,|6
# c|,|3|,|6|,|9
awk 'BEGIN {FS=" "; OFS="|"} {print $1,$2,$3}' test.txt
# Output
# a,1,2,3||
# b,2,4,6||
# c,3,6,9||

BEGIN and END Blocks
BEGIN runs before reading lines.
END runs after all lines processed.

cat test.txt
# Output
# a,1,2,3
# b,2,4,6
# c,3,6,9
awk 'BEGIN {count=0} {count++} END {print "Total:", count}' test.txt
# Output
# Total: 3

Arithmetic You can do math on fields:

awk '{sum = $2 + $3; print $1, sum}' test.txt
# Output -> Notice without BEGIN and FS, even commas are part of $1 - field 1
# a,1,2,3 0
# b,2,4,6 0
# c,3,6,9 0

awk 'BEGIN {FS=","} {sum = $2 + $3; print $1, sum}' test.txt
# Output
# a 3
# b 6
# c 9

awk 'BEGIN {FS=","; OFS="|"} {sum = $2 + $3; print $1, sum}' test.txt
# Output
# a|3
# b|6
# c|9