Regular Expressions
Lesson 10: Using subexpressions
Objective: Use the s/// operator with subexpressions to perform selective text replacement.

Using Subexpressions with Perl

The substitution below adds commas as thousands separators to a long number. The same technique works for periods, which many European locales use instead.
while ($number =~ s/(.*\d)(\d\d\d)/$1,$2/g) { }
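  1. The first subexpression, (.*\d), captures everything before the three digits matched by the second subexpression. Because it must end in a digit, the pattern matches only where at least four digits occur in a row.
  2. The second subexpression, (\d\d\d), captures those three digits. A single pass inserts only one comma into a run of digits, because the /g modifier resumes searching after the text it has just matched and never rescans the replacement. The empty while loop therefore repeats the substitution until it no longer matches, at which point every group of three digits has its comma.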
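
The loop described above can be wrapped in a small subroutine. This is a minimal sketch rather than a definitive implementation: the name add_separators and its optional separator argument are assumptions made for illustration and do not come from the lesson.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $sep (default ",") between groups of three digits.
sub add_separators {
    my ($number, $sep) = @_;
    $sep = ',' unless defined $sep;

    # Same substitution as in the lesson, with the separator in a variable.
    # Each pass adds one separator per run of digits; the loop stops when
    # no digit is followed directly by three more digits.
    while ($number =~ s/(.*\d)(\d\d\d)/$1$sep$2/g) { }

    return $number;
}

print add_separators('1234567'), "\n";        # 1,234,567
print add_separators('1234567', '.'), "\n";   # 1.234.567
print add_separators('123'), "\n";            # 123 (fewer than four digits, unchanged)
```

One limitation to keep in mind: the pattern does not know about decimal points, so an input such as 1234.5678 would also get a separator in its fractional part.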

  • Substitution Expression: The flow of this substitution expression can be followed step by step:

1) Matches a minimum of 4 digits in a row
2) First iteration through the regular expression
3) Second iteration through the regular expression
4) Third iteration through the regular expression
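
To see the iterations concretely, the sketch below records the string after every pass of the loop. The ten-digit sample value is an assumption chosen so that exactly three passes succeed; it is not taken from the lesson.

```perl
use strict;
use warnings;

my $number = '1234567890';   # illustrative value, not from the lesson
my @steps  = ($number);      # the string before any substitution

# The lesson's substitution, with the loop body recording each result.
while ($number =~ s/(.*\d)(\d\d\d)/$1,$2/g) {
    push @steps, $number;
}

print "$_\n" for @steps;
# 1234567890
# 1234567,890
# 1234,567,890
# 1,234,567,890
```

Each pass places one comma, working from the right, because the greedy .* pushes the match as far toward the end of the string as it can.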


Using Two Subexpressions in Perl

Here is an example of using two subexpressions:
while ($number =~ s/(.*\d)(\d\d\d)/$1,$2/g) { }

Beginning with Perl 5.005, the qr// operator lets us compile regular expressions and keep references to them without using a match or substitution operator. We can do all of our regular expression handling and preparation before we actually want to use the patterns. Since a regular expression reference is just a scalar like any other reference, we can store it in an array or hash, pass it as an argument, interpolate it into a string or a larger pattern, or use it in the many other ways we can use a scalar.
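
A short sketch of a compiled pattern being stored, passed around, and interpolated follows. The hash %pattern, the subroutine count_matches, and the sample text are hypothetical names and data used only for illustration.

```perl
use strict;
use warnings;

# Compile the patterns once and keep them in a hash, like any other scalars.
my %pattern = (
    digits => qr/\d+/,
    word   => qr/[A-Za-z]+/,
);

# A compiled pattern can be passed to a subroutine as an ordinary argument.
sub count_matches {
    my ($re, $text) = @_;
    my $count = () = $text =~ /$re/g;   # count all matches in list context
    return $count;
}

my $text = 'order 66 shipped 3 crates';

print count_matches($pattern{digits}, $text), "\n";   # 2
print count_matches($pattern{word},   $text), "\n";   # 3

# Compiled patterns can also be interpolated into a larger pattern.
my $pair = qr/($pattern{word})\s+($pattern{digits})/;
my @found = $text =~ $pair;
print "@found\n";                                     # order 66
```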

Before you begin the exercise, you may want to explore more examples of selective replacement in action.
Selective replacement: Trimming whitespace
Following is an example that trims the leading and trailing whitespace from a line of text; applied to each line in turn, it cleans up a whole file:
s/^\s*(.*?)\s*$/$1/;

Notice the ? after the .*, which prevents the quantifier from being greedy.
Without it, .* would consume the trailing whitespace, which would then end up in the subexpression and survive the substitution. The following version also trims Perl-style comments:
s/^\s*(.*?)(\s*|\s*#.*)$/$1/;

Notice that this one uses two subexpressions, but we just throw the second one away. The second pair of parentheses is there only for grouping: it limits the alternation to the two alternatives inside it, so that the | does not split the entire pattern in two.
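
The two substitutions above, along with a greedy variant for comparison, can be tried on a few sample strings. This is a sketch under stated assumptions: the subroutine names trim, trim_greedy, and strip_comment and the sample inputs are made up for illustration.

```perl
use strict;
use warnings;

# Non-greedy trim, as in the lesson.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s*(.*?)\s*$/$1/;
    return $s;
}

# Greedy variant for comparison: .* swallows the trailing whitespace,
# so it is captured in $1 and put straight back.
sub trim_greedy {
    my ($s) = @_;
    $s =~ s/^\s*(.*)\s*$/$1/;
    return $s;
}

# Trim whitespace and a trailing comment; the second group is discarded.
sub strip_comment {
    my ($s) = @_;
    $s =~ s/^\s*(.*?)(\s*|\s*#.*)$/$1/;
    return $s;
}

print '[', trim('   my $x = 1;   '), "]\n";                  # [my $x = 1;]
print '[', trim_greedy('   my $x = 1;   '), "]\n";           # [my $x = 1;   ]
print '[', strip_comment('  print $x;  # show it'), "]\n";   # [print $x;]
print '[', strip_comment('   # comment only'), "]\n";        # []
```

The comment pattern is deliberately simple: it treats the first # on the line as the start of a comment, even when that # sits inside a quoted string.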

Edit File using Selective Replacement - Exercise

Click the Exercise link below to write a program that uses selective replacement on an email file.
Edit File using Selective Replacement - Exercise
