Using sed with regex capture groups – Ryan Chapin's Website

You will often have a file from which you want to extract specific strings based on a regex. Capture groups are a very efficient way to parse multiple strings from the same line.

I have found that sed is the easiest way to do so on the Linux command line.

Given the following input file:

This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2

Let’s say that you want to extract the year and the month digits from each line, and generate one line of output for each line of input that contains both, in the following format:

my year: <year>, my month: <month>

You would run the following command defining two capture groups:

sed -rn 's/.*year=([0-9]+).*month=([0-9]+).*/my year: \1, my month: \2/p' input.txt

Which will output:

my year: 2020, my month: 12
my year: 2021, my month: 1
my year: 2021, my month: 2
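To try this end to end, the sketch below recreates the sample input in a temporary file and runs the same command against it. The variable names (input, result) and the use of mktemp are just choices for this illustration, and it assumes GNU sed, as in the rest of the post.

```shell
# Recreate the sample input in a temporary file (the path is arbitrary).
input=$(mktemp)
cat > "$input" <<'EOF'
This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2
EOF

# Same command as above. The second input line has no year=/month=
# fields, so it produces no output at all.
result=$(sed -rn 's/.*year=([0-9]+).*month=([0-9]+).*/my year: \1, my month: \2/p' "$input")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$input"
```

Four lines go in and three come out, because the line without the two fields is silently skipped.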

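The -rn argument combines two flags: -r tells sed to use extended regular expressions in the script, and -n tells it to suppress automatic printing, so that a line is printed only when we explicitly ask for it after a match. (-r is the GNU sed spelling; -E is an equivalent option that BSD sed also accepts.)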
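The effect of -n is easiest to see by running a substitution with and without it. This is a minimal sketch using a shortened regex and two made-up input lines (line_match and line_plain are names invented for the example), again assuming GNU sed.

```shell
line_match='a year=2020 month=12 here'
line_plain='no fields here'

# Without -n (and without the p flag), sed prints every line:
# substituted when the regex matches, unchanged when it does not.
without_n=$(printf '%s\n%s\n' "$line_match" "$line_plain" | sed -r 's/.*year=([0-9]+).*/\1/')

# With -n plus the p flag, only lines where the substitution
# actually happened are printed.
with_n=$(printf '%s\n%s\n' "$line_match" "$line_plain" | sed -rn 's/.*year=([0-9]+).*/\1/p')

printf 'without -n:\n%s\n\nwith -n and p:\n%s\n' "$without_n" "$with_n"
```

The first run prints both lines (2020 and the untouched second line); the second prints only 2020.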

The s command tells sed that we are going to execute a substitution and that we will define a regex, a replacement string, and optional flags.

.*year=([0-9]+).*month=([0-9]+).*

Defines two capture groups: one for one or more contiguous digits after year= and another for one or more contiguous digits after month=. Each .* tells sed to match any number of any characters before, between, and after the groups. Because that text is part of the match but outside the groups, it is dropped from the output.
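A few quick probes help show what this regex will and will not accept. The input strings and variable names below are invented for the illustration, and the replacement is shortened to \1-\2 to keep the output compact.

```shell
pattern='s/.*year=([0-9]+).*month=([0-9]+).*/\1-\2/p'

# Matches: both fields are present, with year= before month=.
ok=$(echo 'year=2021 and month=7' | sed -rn "$pattern")

# No match: [0-9]+ needs at least one digit after year=.
no_digits=$(echo 'year= and month=7' | sed -rn "$pattern")

# No match: the regex expects year= to appear before month=.
reversed=$(echo 'month=7 and year=2021' | sed -rn "$pattern")

printf 'ok=[%s] no_digits=[%s] reversed=[%s]\n' "$ok" "$no_digits" "$reversed"
```

Only the first probe yields output (2021-7). The other two are empty, since the line does not match and -n suppresses it.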

my year: \1, my month: \2/p

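Tells sed how to format the output to include each capture group, \1 for capture group 1 and \2 for capture group 2. The trailing p flag prints the line only when the substitution was actually made, which is why the input line without a year and month produces no output.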
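The back-references are not tied to the order in which the groups were captured. As a small sketch (the output formats here are arbitrary examples, not part of the original command), the same regex can emit the groups swapped or reuse one of them:

```shell
line='This is a line of text with a year=2020 month=12 in it'

# Groups can be emitted in any order...
swapped=$(echo "$line" | sed -rn 's/.*year=([0-9]+).*month=([0-9]+).*/\2 \1/p')

# ...and a group can be used more than once in the replacement.
repeated=$(echo "$line" | sed -rn 's/.*year=([0-9]+).*month=([0-9]+).*/\1-\2 (year \1)/p')

printf '%s\n%s\n' "$swapped" "$repeated"
```

This prints 12 2020 followed by 2020-12 (year 2020).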

2 thoughts on “Using sed with regex capture groups”

Hi rchapin,
Thanks for this great example.
But there is a slight inaccuracy concerning the output.
The actual output is:
my year: 2020, my month: 12
my year: 2021, my month: 1
my year: 2021, my month: 2

To complete the explanation of the sed call in this example,
could you also explain what
‘s/
means, even though it might be trivial for you?

Thanks in advance

Takidoso

rchapin says:

May 24, 2022 at 12:34

Takidoso,

Thanks for pointing that out, and for your kind words! I updated the post and added a link to the GNU docs for the ‘s’ command.
