REGEX SUBROUTINES
This is the feature that makes people say "regex can do THAT?" Most programmers never learn this exists. Once you know it, you'll wonder how you lived without it.
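The idea is simple: define a pattern in a group, then call that pattern again later without rewriting it. Just like a function in code. Write once, use many times. Not every engine supports this. Perl and PCRE do, but Python's built-in re module, JavaScript, Java, and Go's regexp do not. The examples here use GNU grep -P, which passes the pattern to PCRE.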
The Basic Syntax
If you have a pattern in group 1, you can re-invoke that pattern with (?1). Group 2 is called with (?2), and so on. Groups are numbered by their opening parenthesis, counting from the left.
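To see more than one numbered call in the same pattern, here is a small sketch, assuming GNU grep built with PCRE support. The key=value format and the pair_check function are made up for illustration: group 1 defines a key shape, group 2 a numeric value shape, and (?1)=(?2) reuses both shapes for the second pair.

```shell
#!/bin/sh
# Sketch: reuse two numbered groups. Assumes GNU grep with -P (PCRE).
# Group 1 = key shape (\w+), group 2 = value shape (\d+).
# (?1) and (?2) re-run those shapes; they do not require the same text.
pair_check() {
  if printf '%s\n' "$1" | grep -qP '^(\w+)=(\d+); (?1)=(?2)$'; then
    echo match
  else
    echo no-match
  fi
}

pair_check "width=80; height=24"    # different keys and values, same shapes
pair_check "width=80; height=tall"  # second value is not digits
```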
Say you want to match a date range like "2024-01-15 to 2024-12-31". The date format appears twice and it's the same format both times.
Without subroutines:
\d{4}-\d{2}-\d{2} to \d{4}-\d{2}-\d{2}
With subroutines:
(\d{4}-\d{2}-\d{2}) to (?1)
Group 1 defines the date pattern. (?1) says "match that same pattern again right here." Shorter, cleaner, and if you need to change the date format you only change it in one place.
echo "2024-01-15 to 2024-12-31" | grep -P '(\d{4}-\d{2}-\d{2}) to (?1)'
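The "change it in one place" point is easy to check: tighten group 1 and the second date gets the same rules automatically. A sketch, assuming GNU grep -P; the month (01-12) and day (01-31) alternations and the range_check function are illustrative choices, and this is still a shape check, not calendar validation (it accepts 2024-02-31).

```shell
#!/bin/sh
# Sketch: one edit to group 1 changes what BOTH dates accept.
# Assumes GNU grep with -P (PCRE). Month 01-12 and day 01-31 only;
# this is a shape check, not real calendar validation.
DATE_RANGE='^(\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])) to (?1)$'

range_check() {
  if printf '%s\n' "$1" | grep -qP "$DATE_RANGE"; then
    echo match
  else
    echo no-match
  fi
}

range_check "2024-01-15 to 2024-12-31"  # both dates pass the stricter shape
range_check "2024-01-15 to 2024-13-01"  # month 13 is rejected by the (?1) call
```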
Subroutines vs Backreferences
This is the critical distinction you need to internalize.
\1 matches the SAME TEXT that group 1 already captured
(?1) matches the SAME PATTERN that group 1 defines
In action:
echo "2024-01-15 to 2024-01-15" | grep -P '(\d{4}-\d{2}-\d{2}) to \1'
echo "2024-01-15 to 2024-12-31" | grep -P '(\d{4}-\d{2}-\d{2}) to \1'
First line matches (same text "2024-01-15" appears twice). Second line FAILS because \1 wants "2024-01-15" but finds "2024-12-31".
Now with subroutines:
echo "2024-01-15 to 2024-12-31" | grep -P '(\d{4}-\d{2}-\d{2}) to (?1)'
Matches. (?1) doesn't care what the first date was. It just reruns the pattern and matches whatever text fits the date shape (note that \d{4}-\d{2}-\d{2} checks shape only, so it would also accept 2024-99-99).
+----------------------------------------------------+
|  \1   = "match that exact same text again"         |
|  (?1) = "run that pattern again on new text"       |
|                                                    |
|  Backreference = same value                        |
|  Subroutine    = same shape                        |
+----------------------------------------------------+
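The two grep runs above can be put side by side in one loop. A sketch, assuming GNU grep -P; the matches helper is hypothetical. Both patterns are anchored with ^ and $ so each must account for the whole line.

```shell
#!/bin/sh
# Sketch: compare a backreference (\1) with a subroutine call (?1) on the
# two date-range lines. Assumes GNU grep with -P (PCRE).
BACKREF='^(\d{4}-\d{2}-\d{2}) to \1$'    # second date: same TEXT as group 1
SUBCALL='^(\d{4}-\d{2}-\d{2}) to (?1)$'  # second date: same SHAPE as group 1

# matches PATTERN LINE: prints "match" or "no-match"
matches() {
  if printf '%s\n' "$2" | grep -qP "$1"; then echo match; else echo no-match; fi
}

for line in "2024-01-15 to 2024-01-15" "2024-01-15 to 2024-12-31"; do
  printf '%s   \\1: %-8s (?1): %s\n' "$line" \
    "$(matches "$BACKREF" "$line")" "$(matches "$SUBCALL" "$line")"
done
```

The backreference column flips to no-match on the second line; the subroutine column stays match on both.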
Named Subroutines
Just like named capture groups (?<name>...), you can call named groups as subroutines with (?&name).
(?<date>\d{4}-\d{2}-\d{2}) to (?&date)
More readable. The (?&date) says "run the date pattern here."
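Here is the named version run through grep, assuming GNU grep -P. The -o flag prints only the matched range, which makes it easy to see that (?&date) consumed the second date; the named_range function is just a wrapper for illustration.

```shell
#!/bin/sh
# Sketch: a named group called again with (?&date).
# Assumes GNU grep with -P (PCRE).
named_range() {
  # -o prints only the matched range, not the whole line
  printf '%s\n' "$1" | grep -oP '(?<date>\d{4}-\d{2}-\d{2}) to (?&date)'
}

named_range "Booked 2024-01-15 to 2024-12-31 for Alice"
```

This prints 2024-01-15 to 2024-12-31, with the surrounding words dropped.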
The DEFINE Block
Sometimes you want to define reusable patterns WITHOUT actually matching anything at that point. That's what DEFINE blocks do.
(?(DEFINE)
(?<name1>pattern1)
(?<name2>pattern2)
)
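Everything inside is defined but not matched. Like declaring functions at the top of a file. Then call them with (?&name) wherever needed. The multi-line layout in these examples is for readability only: written that way, the pattern needs extended mode, turned on with a leading (?x), so the engine ignores the line breaks and indentation. The grep commands put everything on one line instead.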
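One way to convince yourself that DEFINE consumes nothing: a pattern made of only a DEFINE block matches the empty string, so grep counts every line, even an empty one, as a match. Only calling the definition actually consumes text. A sketch, assuming GNU grep -P; the group name digit is arbitrary.

```shell
#!/bin/sh
# Sketch: a pattern that is ONLY a DEFINE block matches the empty string,
# so every input line "matches" even though nothing is consumed.
# Assumes GNU grep with -P (PCRE).
printf 'abc\n123\n\n' | grep -cP '(?(DEFINE)(?<digit>\d))'

# Calling the definition is what actually consumes text:
printf 'abc\n123\n\n' | grep -cP '(?(DEFINE)(?<digit>\d))(?&digit)'
```

The first count is 3 (all three lines), the second is 1 (only 123 contains a digit).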
Matching a timestamp "2026-02-19 14:30:45":
(?(DEFINE)
(?<date>\d{4}-\d{2}-\d{2})
(?<time>\d{2}:\d{2}:\d{2})
)
(?&date)\s+(?&time)
The DEFINE block sets up two named patterns. The actual matching happens below it. DEFINE itself consumes no text.
echo "2026-02-19 14:30:45" | grep -P '(?(DEFINE)(?<date>\d{4}-\d{2}-\d{2})(?<time>\d{2}:\d{2}:\d{2}))(?&date)\s+(?&time)'
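If you want the spaced-out layout without multi-line strings, free-spacing mode works on one line too. A sketch, assuming GNU grep -P: (?x) tells PCRE to ignore unescaped whitespace, so the spaces below are layout only, and -o pulls each timestamp out of a log line. The extract_timestamps function and the log text are made up for illustration.

```shell
#!/bin/sh
# Sketch: the timestamp pattern in free-spacing (x) mode, kept on one line.
# (?x) makes the engine ignore unescaped whitespace in the pattern, so the
# spaces below are layout only. Assumes GNU grep with -P (PCRE).
TS='(?x) (?(DEFINE) (?<date> \d{4}-\d{2}-\d{2}) (?<time> \d{2}:\d{2}:\d{2}) ) (?&date) \s+ (?&time)'

extract_timestamps() {
  # -o prints each match on its own line
  grep -oP "$TS"
}

printf 'start 2026-02-19 14:30:45 ok\nretry 2026-02-19 14:31:02 ok\n' | extract_timestamps
```

Note that (?(DEFINE) and (?<date> must stay unbroken; x mode only ignores whitespace between tokens.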
A Realistic DEFINE Example
Validating CSV lines where each field can be bare or quoted:
(?(DEFINE)
(?<field>[^,"\s]+|"[^"]*")
)
^(?&field),(?&field),(?&field)$
Test it:
echo 'Alice,"Portland, OR",admin' | grep -P '(?(DEFINE)(?<field>[^,"\s]+|"[^"]*"))^(?&field),(?&field),(?&field)$'
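The field pattern accepts either a bare value (no commas, quotes, or whitespace) or a quoted value, which may contain commas and spaces. It is deliberately simple: it rejects empty fields and does not handle doubled quotes ("") inside a quoted value. Defined once, used three times. Compare to writing the field pattern three times inline: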
^([^,"\s]+|"[^"]*"),([^,"\s]+|"[^"]*"),([^,"\s]+|"[^"]*")$
The DEFINE version is dramatically easier to read and maintain.
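Running the DEFINE version over a few lines shows both what it accepts and where the simplified field definition stops. A sketch, assuming GNU grep -P; the csv_check function and sample rows are hypothetical.

```shell
#!/bin/sh
# Sketch: run the three-field CSV check over several lines.
# Assumes GNU grep with -P (PCRE). The field definition is the simplified
# one from the text: no empty fields, no spaces in bare fields, and no
# doubled quotes ("") inside quoted fields.
CSV3='(?(DEFINE)(?<field>[^,"\s]+|"[^"]*"))^(?&field),(?&field),(?&field)$'

csv_check() {
  if printf '%s\n' "$1" | grep -qP "$CSV3"; then echo valid; else echo invalid; fi
}

csv_check 'Alice,"Portland, OR",admin'  # quoted field may contain a comma
csv_check 'Bob,,user'                   # empty middle field
csv_check 'Carol,"say ""hi""",user'     # doubled quotes are not handled
csv_check 'Dan,Salt Lake,user'          # bare field with a space
```

Only the first line is reported valid. Fixing any of the other cases means editing the single field definition, not three copies.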
Matching Repeated Structures
Subroutines shine with repeated structured fields. Matching IP:port pairs in a firewall rule:
(?(DEFINE)
(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
(?<port>\d{1,5})
)
(?&ip):(?&port)\s+->\s+(?&ip):(?&port)
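Matches: 192.168.1.100:8080 -> 10.0.0.50:443
Both definitions are deliberately loose: \d{1,3} also accepts 999, and \d{1,5} accepts 99999. Because each is defined once, tightening them is a single edit.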
echo "192.168.1.100:8080 -> 10.0.0.50:443" | grep -P '(?(DEFINE)(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?<port>\d{1,5}))(?&ip):(?&port)\s+->\s+(?&ip):(?&port)'
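Here is one way to tighten the ip definition in one place: add an octet definition (0-255) and build ip from four calls to it. A sketch, assuming GNU grep -P; the octet group, the ^...$ anchors, and the rule_check function are illustrative additions, and port is left as the loose \d{1,5}. Subroutines can call other subroutines, so ip is written in terms of (?&octet).

```shell
#!/bin/sh
# Sketch: tighten "ip" by defining an octet (0-255) and reusing it.
# The octet group and the ^...$ anchors are illustrative additions;
# port is still the loose \d{1,5}. Assumes GNU grep with -P (PCRE).
DEFS='(?(DEFINE)(?<octet>25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?<ip>(?&octet)(?:\.(?&octet)){3})(?<port>\d{1,5}))'
RULE="$DEFS"'^(?&ip):(?&port)\s+->\s+(?&ip):(?&port)$'

rule_check() {
  if printf '%s\n' "$1" | grep -qP "$RULE"; then echo valid; else echo invalid; fi
}

rule_check "192.168.1.100:8080 -> 10.0.0.50:443"  # same rule as in the text
rule_check "192.168.1.300:8080 -> 10.0.0.50:443"  # 300 is not an octet
```

The octet alternatives are ordered longest first (three digits before one or two) so the first alternative that fits is the right one.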
Recursive Matching
Subroutines enable recursive patterns. Match balanced parentheses including nested ones:
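(\((?:[^()]++|(?1))*\))
The (?1) inside group 1 calls group 1's own pattern, so each nested pair is handled by another level of recursion, to whatever depth the engine's recursion limits allow. The possessive [^()]++ matters: with a plain [^()]* inside the repeated group, an unbalanced input (an opening paren followed by a long run of letters) can force the engine through a huge number of ways to split that text before it gives up. The (?0) variant, also written (?R), calls the entire regex:
echo "calc((a+b)*(c+d))" | grep -oP '\((?:[^()]++|(?0))*\)'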
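With -o, the recursive pattern prints every balanced group it finds and skips parens that never close. A sketch, assuming GNU grep -P; the balanced function and the sample text are made up for illustration.

```shell
#!/bin/sh
# Sketch: print every balanced (...) group, however deeply nested.
# Assumes GNU grep with -P (PCRE). [^()]++ is possessive, so a failed
# attempt on unbalanced input does not backtrack through every way of
# splitting the non-paren text.
balanced() {
  grep -oP '\((?:[^()]++|(?0))*\)'
}

echo "f(a(b(c)d)e) g((x)" | balanced
```

This prints (a(b(c)d)e) and then (x): the outer paren before (x) never closes, so the match at that position fails and the engine moves on to the inner pair.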
When To Use Subroutines
Think of subroutines any time you copy-paste the same sub-pattern:
- Repeated format validation (dates, times, IPs appearing multiple times in a line)
- Structured data with the same field type in multiple positions (CSV, logs, config files)
- Recursive structures like balanced delimiters
- Any time your pattern is getting long and repetitive
+----------------------------------------------------+
|  SUBROUTINES IN ONE LINE:                          |
|                                                    |
|  Define once with (?<name>...) or (...)            |
|  Reuse with (?&name) or (?1)                       |
|  Declare without matching via (?(DEFINE)...)       |
|                                                    |
|  It's functions. For regex. That's it.             |
+----------------------------------------------------+