Short practical use of the sed command

sed, the Stream Editor, is a non-interactive search-and-replace tool. It can work on a file or on data supplied through standard input, for example the output of another command.

It is used for finding and replacing values in a file, and is especially handy when preparing data to be split with awk. In this workflow awk is simplest to drive with a single delimiter character rather than a string, so if you can identify the strings around the values you want to extract, you can replace them on the fly with sed.

On the command line, when piping input into sed, use the following general syntax:

sed "s/SearchValue/ReplaceValue/g"
  • s for substitute
  • SearchValue is the value to look for
  • ReplaceValue is the value that replaces SearchValue
  • g for global: replace every match on the line, not just the first one

The command will then output its processed string. You can either display that or pipe it into another command.
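As a minimal sketch of the syntax above, the following compares a substitution with and without the g flag, and then pipes the result into another command. The sample string and the variable names are only illustrative.

```shell
# Without g: only the first match on each line is replaced
first_only=$(echo "one-two-three" | sed "s/-/+/")
echo "$first_only"    # one+two-three

# With g: every match on the line is replaced
all_matches=$(echo "one-two-three" | sed "s/-/+/g")
echo "$all_matches"   # one+two+three

# The output can also be piped straight into another command,
# here wc -w, which counts the 3 resulting words
echo "one-two-three" | sed "s/-/ /g" | wc -w
```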

Note: If your search or replace strings contain special characters, such as a single or double quote or symbols like \, /, ., *, [ or &, then you may need to escape them by adding a backslash character \ before each one to force sed into handling that character as its literal value, not its possible special value.
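A short sketch of why escaping matters, using made-up sample strings: an unescaped dot matches any character, and a literal / has to be escaped when / is also the delimiter. sed also accepts another delimiter character, such as |, which avoids escaping slashes altogether.

```shell
# Unescaped, the dot matches any character, so both words are replaced
unescaped=$(echo "file.txt filextxt" | sed "s/file.txt/X/g")
echo "$unescaped"     # X X

# Escaped, the dot only matches a literal dot
escaped=$(echo "file.txt filextxt" | sed "s/file\.txt/X/g")
echo "$escaped"       # X filextxt

# A literal / must be escaped when / is also the delimiter
slashes=$(echo "/usr/local/bin" | sed "s/\/usr\/local/\/opt/")
echo "$slashes"       # /opt/bin

# Using | as the delimiter gives the same result without escaping
alt_delim=$(echo "/usr/local/bin" | sed "s|/usr/local|/opt|")
echo "$alt_delim"     # /opt/bin
```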

Example string - Recognising Patterns

This is an example string with values. Value1=100 Value2=200. We want to extract the value of Value1

We can see that:

  • Value1= is immediately followed by the value we want (100)
  • Value2 always follows Value1
  • We can replace Value1= with a unique delimiter character and Value2= with the same
  • From there, we can use awk to split on this delimiter and return the 2nd element
var="This is an example string with values. Value1=100 Value2=200. We want to extract the value of Value1"
echo "$var" | sed "s/Value1=/£/g" | sed "s/Value2=/£/g"
# This will return the following string:
# This is an example string with values. £100 £200. We want to extract the value of Value1
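To finish the extraction described in the list above, the delimited string can be handed to awk. This is only a sketch: the variable name value1 and the final sed that trims the trailing space are additions for illustration, not part of the original example.

```shell
var="This is an example string with values. Value1=100 Value2=200. We want to extract the value of Value1"

# Replace both markers with £, split on £ with awk and keep the 2nd element,
# then strip the trailing space left over from the original string
value1=$(echo "$var" | sed "s/Value1=/£/g" | sed "s/Value2=/£/g" \
  | awk 'BEGIN {FS="£"} {print $2}' | sed "s/ *$//")
echo "$value1"    # 100
```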

Cascading multiple sed commands

In the example above you can see that we output the value of $var to the console, piped that into sed, and then piped its output into another sed.

This works, and can be very useful when you explicitly need one pattern change to be atomic, meaning that each change is applied to the data in its own separate pass.

If this is not needed, then you can cascade your matches in one sed command with the -e option. For example:

var="This is a string with a time: [12/Mar/2018:17:14:43 +0100] and other data"
echo "$var" | sed -e "s/\[/£/g" -e "s/\]/£/g"
# This will return the following string:
# This is a string with a time: £12/Mar/2018:17:14:43 +0100£ and other data

So here only one sed was needed, as the search and replace expressions are cascaded with -e. The date part of the string is now enclosed between £ symbols, and if you use £ as a column separator, your date is in column 2.
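As a sketch of that last point, the £-delimited output can be split with awk to pull the timestamp out of column 2. The second command uses a made-up one-letter input to show that -e expressions run in order on each line, so a later expression sees the changes made by earlier ones.

```shell
var="This is a string with a time: [12/Mar/2018:17:14:43 +0100] and other data"

# Both brackets become £, so the timestamp is the 2nd £-separated column
timestamp=$(echo "$var" | sed -e "s/\[/£/g" -e "s/\]/£/g" | awk 'BEGIN {FS="£"} {print $2}')
echo "$timestamp"   # 12/Mar/2018:17:14:43 +0100

# Expressions run in order on each line, so a later -e sees earlier changes
chained=$(echo "a" | sed -e "s/a/b/" -e "s/b/c/")
echo "$chained"     # c
```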

When cascaded processes don't work

Remember that you are looking at patterns, and sometimes you may want or need to cascade awks and seds together, due to how the data you have is formatted.

For example, take the following 2 log lines. We are looking for patterns because we want to extract the client number from the URL-encoded string, along with the date.

1.2.3.4 [16/Jan/2020 01:02:03] "GET /page.html?one=2&url=https%3A%2F%2Fexample.com%2Fsomthing%3Fclient%3D12345%26pagename%3Dreadme
1.2.3.4 [16/Jan/2020 01:02:04] "GET /page.html?one=3&url=https%3A%2F%2Fexample.com%2Fsomthingelse%3Fsession%3D123456789ABCDE%26client%3D45678%26isconnected%3Dtrue%26pagename%3Ddifferenpage

So we have 2 log lines that each contain a date in square brackets, and the date is always in the same place.

The URL-encoded string contains a variable called client, but it is not always in the same place and is not always surrounded by the same values. We therefore cannot simply delimit the file on the string %26 (the URL-encoded form of &), because when we process the result with awk we would not be able to identify the correct column position. We need to proceed in several steps:

  • Replace [, ] and client%3D by £
    • This way we can start extracting our base strings where we expect them to be:
    • Date in the 2nd column
    • Client number at the start of the 4th column, followed by a lot of other unwanted data
  • We process this output with awk, pulling these two columns and adding a new separator between the date and the client value, for example |, and then send the awk output back to sed
  • We know that we have good values for the date, and that the client value starts cleanly but has other data we don't want at the end, so we can now replace the %26 values with £ and pass that new output to awk.
  • We then split on the new £ character with awk and export the 1st column, which is everything we want to keep from our clean string: a date and a client id.
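To make the encoded markers used in these steps easier to read, the URL part of the first log line can be decoded with a handful of sed expressions. This is an illustrative sketch only: it decodes just the five sequences that appear in the sample (%3A, %2F, %3F, %3D and %26) and is not a general URL decoder. Note that / and & have to be escaped in the replacement, since / is the delimiter and & has a special meaning there.

```shell
encoded="https%3A%2F%2Fexample.com%2Fsomthing%3Fclient%3D12345%26pagename%3Dreadme"

# %3A is :, %2F is /, %3F is ?, %3D is = and %26 is &
decoded=$(echo "$encoded" | sed -e "s/%3A/:/g" -e "s/%2F/\//g" -e "s/%3F/?/g" -e "s/%3D/=/g" -e "s/%26/\&/g")
echo "$decoded"   # https://example.com/somthing?client=12345&pagename=readme
```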

Note: See the Quick practical use of awk for specific explanations of the awk command.

First pass

$ cat log.txt | sed -e "s/\[/£/g" -e "s/\]/£/g" -e "s/client%3D/£/g" > 1.txt

$ cat 1.txt

1.2.3.4 £16/Jan/2020 01:02:03£ "GET /page.html?one=2&url=https%3A%2F%2Fexample.com%2Fsomthing%3F£12345%26pagename%3Dreadme
1.2.3.4 £16/Jan/2020 01:02:04£ "GET /page.html?one=3&url=https%3A%2F%2Fexample.com%2Fsomthingelse%3Fsession%3D123456789ABCDE%26£45678%26isconnected%3Dtrue%26pagename%3Ddifferenpage

Splitting the strings into columns on £ allows us to pull columns 2 and 4 with awk

$ cat 1.txt | awk 'BEGIN {FS="£"} {print $2 "|" $4}' > 2.txt

$ cat 2.txt

16/Jan/2020 01:02:03|12345%26pagename%3Dreadme
16/Jan/2020 01:02:04|45678%26isconnected%3Dtrue%26pagename%3Ddifferenpage

Note: The print part of awk can output both columns, referenced with $ + column number, and any other string as long as it is enclosed in double quotes, allowing you to re-create separators between your new columns.

The output above has the date in column 1, and our ID followed by extra characters in column 2.
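A small sketch of that behaviour, using a made-up £-delimited line: the quoted string placed between the two column references is printed as-is, which lets you build whatever separator or label you need.

```shell
line="16/Jan/2020 01:02:03£GET£12345"

# Print column 1, then a literal quoted string, then column 3
joined=$(echo "$line" | awk 'BEGIN {FS="£"} {print $1 " | client=" $3}')
echo "$joined"   # 16/Jan/2020 01:02:03 | client=12345
```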

Second pass

We will now replace each %26 by £. This creates several new columns, but we are only interested in the 1st one: the extra data is pushed into columns 2 and above, so it is dropped when we extract column 1 with awk.

$ cat 2.txt | sed "s/%26/£/g" | awk 'BEGIN {FS="£"} {print $1}'

16/Jan/2020 01:02:03|12345
16/Jan/2020 01:02:04|45678
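If the intermediate files are not needed, both passes can be chained into a single pipeline. The sketch below assumes the two sample log lines shown earlier, and writes them to a scratch directory created with mktemp so that it can be run on its own. The names workdir and result are only illustrative.

```shell
# Work in a scratch directory so no existing files are touched
workdir=$(mktemp -d)
cat > "$workdir/log.txt" <<'EOF'
1.2.3.4 [16/Jan/2020 01:02:03] "GET /page.html?one=2&url=https%3A%2F%2Fexample.com%2Fsomthing%3Fclient%3D12345%26pagename%3Dreadme
1.2.3.4 [16/Jan/2020 01:02:04] "GET /page.html?one=3&url=https%3A%2F%2Fexample.com%2Fsomthingelse%3Fsession%3D123456789ABCDE%26client%3D45678%26isconnected%3Dtrue%26pagename%3Ddifferenpage
EOF

# First pass: mark the date and the client value with £, keep columns 2 and 4
# Second pass: cut the client column at the first %26 and keep column 1
result=$(sed -e "s/\[/£/g" -e "s/\]/£/g" -e "s/client%3D/£/g" "$workdir/log.txt" \
  | awk 'BEGIN {FS="£"} {print $2 "|" $4}' \
  | sed "s/%26/£/g" \
  | awk 'BEGIN {FS="£"} {print $1}')
echo "$result"
# 16/Jan/2020 01:02:03|12345
# 16/Jan/2020 01:02:04|45678

rm -rf "$workdir"
```

Keeping the passes in separate files, as in the walkthrough, is easier to debug. The single pipeline is more convenient once the patterns are known to work.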