Lesson 9 | Selective replacement |
Objective | Examine how the s/// operator allows for selective text replacement. |
Perl Selective Replacement
The substitute operator (s///) also allows more selective replacements, using parentheses to specify subexpressions.
- Sub-expressions
Just as with the pattern-matching operator (m//), subexpressions surrounded by parentheses on the left-hand side of a substitute operator (s///) are matched and placed in the special variables $1, $2, $3, and so on. But in this case, you can also use those variables on the right-hand side as part of the replacement. For example, many people write the nonword alot when they mean a lot (and they do it a lot). This expression can fix that:
s/(a)(lot)/$1 $2/ig
Preserving case in Perl
If the goal is to replace occurrences of alot with a lot, why not just do it this way?
s/alot/a lot/ig
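Using the subexpressions preserves the case of the original text: Alot becomes A lot, and ALOT becomes A LOT. The literal replacement above turns every occurrence into lowercase a lot, whatever its original capitalization.
Using subexpressions is usually preferable because it is more general: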
s/(a)(lot)/$1 $2/ig
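The difference between the two substitutions is easiest to see side by side. The following is a minimal sketch; the sample sentence and the variable names ($text, $literal, $captured) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample text is our own; it mixes capitalizations of "alot".
my $text = "Alot of people write alot. ALOT!";

# Literal replacement: every match becomes lowercase "a lot".
(my $literal = $text) =~ s/alot/a lot/ig;

# Subexpression replacement: $1 and $2 carry the letters as they
# appeared in the text, so the original case survives.
(my $captured = $text) =~ s/(a)(lot)/$1 $2/ig;

print "$literal\n";    # a lot of people write a lot. a lot!
print "$captured\n";   # A lot of people write a lot. A LOT!
```

In real text, both patterns would also split words such as zealot; putting a word boundary (\b) on each side of the pattern is one way to avoid that.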
- Named Subexpressions (5.10):
If you use Perl 5.10 or later, you can also use named subexpressions. Ordinarily, you refer to a captured group inside the regex with \1, \2, and so on. After a successful match, those groups are available as $1, $2, and so on. With named subexpressions you can give the groups names and make things easier to read. To name a subexpression, use the syntax (?<name>...). To refer to it again inside the regex, use \g{name}. To refer to the match outside of the regex, look it up by name in the special %+ hash, as in $+{name}. A double-word stripper is a natural candidate for this technique.
The %+ hash is a special variable that contains only entries for the last successfully matched named subexpressions in the current scope.
Thus, if a named subexpression fails to match, it will not have an entry in the %+ hash. There is a corresponding %- hash not covered here.
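A double-word stripper written with a named subexpression might look something like the sketch below. The capture name word, the variable names, and the sample sentence are assumptions made for illustration, not part of the lesson's own listing.

```perl
use strict;
use warnings;
use 5.010;   # named captures (and say) need Perl 5.10 or later

# Sample sentence is our own.
my $text = "This is is a test of the the stripper.";

# (?<word>\w+) names the capture; \g{word} must match the same text again.
# In the replacement, $+{word} reads the captured text from the %+ hash.
(my $fixed = $text) =~ s/\b(?<word>\w+)\s+\g{word}\b/$+{word}/g;

say $fixed;   # This is a test of the stripper.
```

The \b anchors keep the pattern from treating the tail of This followed by is as a doubled word.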
Using Independent subexpressions to prevent Backtracking
Independent subexpressions (or atomic subexpressions) are regular expressions, within the context of a larger regular expression, which function independently of the larger regular expression. That is, they consume as much or as little of the string as they wish without regard for the ability of the larger regexp to match. Independent subexpressions are represented by (?>regexp) or (starting in 5.32, experimentally in 5.28) (*atomic:regexp). We can illustrate their behavior by first considering an ordinary regexp:
$x = "ab";
$x =~ /a*ab/; # matches
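In the ordinary regexp, a* first grabs the a, finds that the rest of the pattern cannot match, and backtracks to give the a back. An independent subexpression refuses to give anything back. The sketch below contrasts the two; the variable names $ordinary and $atomic are ours.

```perl
use strict;
use warnings;

my $x = "ab";

# Ordinary subexpression: a* grabs the "a", then backtracks and gives it
# back so that the rest of the pattern ("ab") can match.
my $ordinary = ($x =~ /a*ab/)     ? "matches" : "does not match";

# Independent subexpression: (?>a*) grabs the "a" and never gives it
# back, so the following "ab" has only "b" left to work with.
my $atomic   = ($x =~ /(?>a*)ab/) ? "matches" : "does not match";

print "ordinary: $ordinary\n";   # ordinary: matches
print "atomic:   $atomic\n";     # atomic:   does not match
```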
Whenever you want to use part of the matched expression in the replacement, subexpressions will help you. For example, some characters in a CGI query string are encoded in hexadecimal to prevent conflicts with the URL. These hexadecimal numbers are always preceded by a % character. The following substitution decodes them:
s/%(..)/pack("C",hex($1))/ge;
Using the matched expression in replacement
Here is an example of using the matched expression in replacement:
s/%(..)/pack("C",hex($1))/ge;
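The left-hand side (%(..)) finds a % and puts the two characters that follow in a subexpression. Because of the /e modifier, the right-hand side is evaluated as Perl code rather than treated as a plain string: it passes that subexpression to the hex function, then converts the resulting number to a character with pack. The /g modifier repeats the substitution for every encoded character in the string.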
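To see the decoding at work, here is a small sketch run against a sample query string of our own. As a variation on the lesson's pattern, it assumes a stricter capture, [0-9A-Fa-f]{2}, so that a % followed by something other than two hexadecimal digits is left alone.

```perl
use strict;
use warnings;

# Sample query string is our own: %20 encodes a space, %21 encodes "!".
my $query = "name=Larry%20Wall&lang=Perl%21";

# /e evaluates the replacement as Perl code; /g repeats it for every match.
# hex() turns the two captured digits into a number, and pack("C", ...)
# turns that number into the corresponding single character.
(my $decoded = $query) =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;

print "$decoded\n";   # name=Larry Wall&lang=Perl!
```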
Some example REs
As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples.
Remember that to use them for matching, they should be placed between slashes, as in /.../.
[01]          # Either "0" or "1"
\/0           # A division by zero: "/0"
\/ 0          # A division by zero with a space: "/ 0"
\/\s0         # A division by zero with a whitespace:
              # "/ 0" where the space may be a tab etc.
\/ *0         # A division by zero with possibly some
              # spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0        # A division by zero with possibly some
              # whitespace.
\/\s*0\.0*    # As the previous one, but with decimal
              # point and maybe some 0s after it. Accepts
              # "/0." and "/0.0" and "/0.00" etc and
              # "/ 0." and "/ 0.0" and "/ 0.00" etc.
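One way to build confidence in patterns like these is to run them against a handful of strings and see which ones match. The sketch below does that for the division-by-zero patterns; the sample strings, their labels, and the %hits hash are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample strings are our own; each has a short label for the report.
my @samples = (
    [ 'no space'  => "x/0"      ],
    [ 'one space' => "x/ 0"     ],
    [ 'tab'       => "x/\t0"    ],
    [ 'decimal'   => "x/  0.00" ],
    [ 'nonzero'   => "x/1"      ],
);

# The division-by-zero patterns from the list, in the same order.
my @patterns = ('\/0', '\/ 0', '\/\s0', '\/ *0', '\/\s*0', '\/\s*0\.0*');

my %hits;   # pattern text => labels of the samples it matched
for my $pattern (@patterns) {
    my $re = qr/$pattern/;   # same as writing the pattern between slashes
    $hits{$pattern} = [ map { $_->[0] } grep { $_->[1] =~ $re } @samples ];
    printf "%-12s %s\n", $pattern, join(", ", @{ $hits{$pattern} });
}
```

Each output line shows a pattern followed by the labels of the samples it matched, which makes it easy to see how each added space, \s, or * widens or narrows the match.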
In the next lesson, we will continue examining examples of selective replacement and then write a program that performs selective text replacement.