Short practical use of the awk command

awk is a non-interactive text processing and splitting tool. It is hugely powerful. This page is not a complete reference for the awk command, but shows some of my real-world examples.

awk will be used here in conjunction with some of the examples prepared in the sed practical use page.

Quick use of awk

We can take the output of another command and pipe it into awk for processing. The basic piped command-line syntax is:

awk 'BEGIN {FS="delimiterCharacter"} {print $1 $2 $3}'
  • The complete expression must be entered between single quotes.
  • BEGIN marks a block that awk runs once, before any input is read. If you set FS outside a BEGIN block, awk will have already split the first record on its default whitespace separator before your assignment takes effect, which can give a totally different outcome to the expected one!
  • FS stands for Field Separator (not File Separator), and the complete FS sub-expression is entered between curly braces
  • delimiterCharacter is a placeholder for the single character (or regular expression) that will be used to split each record into fields. It is always specified between double quotes
  • A second set of curly braces holds the print sub-expression, which outputs the split fields. Each field is referenced by its number from the split, starting at $1 ($0 refers to the whole record).
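As a minimal sketch of this syntax (the sample string is made up for illustration), splitting a colon-delimited record:

```shell
# BEGIN sets FS before any input is read; print $2 outputs the second field.
echo "alpha:beta:gamma" | awk 'BEGIN {FS=":"} {print $2}'
# Prints: beta
```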

Example from the sed string above

$ var="This is an example string with values. Value1=100 Value2=200. We want to extract the value of Value1"
$ echo "$var" | sed 's/Value1=/£/g' | sed 's/Value2=/£/g' | awk 'BEGIN {FS="£"} {print $2}'
# Will output the second field containing 100.

After processing this string with £ as the field separator (a character we know is not present in the source text, so no unexpected extra fields are created), the returned field values are as follows:

  • $1 contains This is an example string with values.
  • $2 contains 100
  • $3 contains 200. We want to extract the value of Value1
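As an aside, most awk implementations also accept the -F command-line flag as a shorthand for setting FS, which avoids the BEGIN block entirely. A sketch with made-up colon-separated data:

```shell
# -F':' is equivalent to 'BEGIN {FS=":"}'.
printf 'root:x:0:0\nuser:x:1000:1000\n' | awk -F':' '{print $1}'
# Prints: root, then user
```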

awk magic

Get a list of all Apache page runtime values (expected to be numeric, in microseconds), check that they are numeric, keep only those values where the runtime is > 14000000 microseconds, then grep for each value to recover the corresponding full log line.

Here, we read our logfile apache.log and extract the 15th field, which in my logs holds the page processing time. We keep the numeric values above the threshold, sort them numerically, and process the resulting list one by one: each value is copied into the variable called timeout, then used as a key to pull all matching lines with grep into the file apache.timeouts.txt.

$ for timeout in `cat apache.log | awk 'BEGIN {FS=" "} {print $15}' | awk '{if($1==$1+0 && $1>14000000) print $1}' | sort -n` ;
do
    grep $timeout apache.log >> apache.timeouts.txt;
done
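The $1==$1+0 test in the second awk above is a common idiom for checking that a field is numeric: adding 0 coerces the field to a number, and a non-numeric string will not compare equal to its coerced value. A sketch with made-up input:

```shell
# Only lines whose first field is numeric survive the $1 == $1+0 test.
printf '123\nabc\n456\n' | awk '{if ($1 == $1+0) print $1}'
# Prints: 123, then 456 (abc is dropped)
```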

Note: We are inside a loop, so the grep will run once for each value returned by awk. If there is more than one result, you cannot use the > redirection operator, because it truncates the file and starts again on every iteration. You must use the >> redirection operator, which creates the file if it does not exist and appends new data to the end of it if it does.
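The difference between the two redirection operators can be demonstrated with a throwaway file (created here with mktemp, purely for illustration):

```shell
tmpfile=$(mktemp)
echo "first"  > "$tmpfile"   # > truncates the file, then writes one line
echo "second" >> "$tmpfile"  # >> appends, so the file now holds two lines
wc -l < "$tmpfile"
rm "$tmpfile"
```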

Here, awk is not used just for splitting the file: once the 15th field is extracted from the Apache log, conditional processing keeps only the values greater than 14000000, and each of those values is then used as a key to re-grep the file and pull out the affected record. This works on the assumption that there probably won't be many identical to-the-microsecond values in the file; if you do think there could be duplicates to remove, you can then look into sort'ing and uniq'ing those values.
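If duplicates do turn out to be a problem, sort -u (equivalent to sort | uniq for exact-duplicate removal) drops them in one pass. The values below are made up; in practice you would feed it the apache.timeouts.txt file from the example above:

```shell
# Sort the lines and remove exact duplicates in a single pass.
printf '15000001\n14000002\n15000001\n' | sort -u
# Prints: 14000002, then 15000001 (the duplicate is gone)
```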
