REGEX FUNDAMENTALS - COMBINING PATTERNS
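This is where regex gets powerful. Quantifiers let you match repeated patterns, and alternation, grouping, and word boundaries let you combine smaller patterns into larger ones.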
Quantifiers - How Many?
Each quantifier applies to the single item immediately before it (a character, a character class, or a parenthesized group):
* Zero or more (greedy)
+ One or more (greedy)
? Zero or one (optional)
{n} Exactly n times
{n,} At least n times
{n,m} Between n and m times
Examples:
Pattern: ab*c
Matches: ac, abc, abbc, abbbc (zero or more b's)
Pattern: ab+c
Matches: abc, abbc, abbbc (one or more b's, NOT ac)
Pattern: ab?c
Matches: ac, abc (zero or one b only)
Pattern: a{3}
Matches: aaa (exactly three a's)
Pattern: a{2,4}
Matches: aa, aaa, aaaa (two to four a's)
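To check the examples above yourself, here is a minimal sketch using Python's re module (any regex engine would do; Python is just an assumption for illustration). The helper name matches is hypothetical. It uses re.fullmatch, so the whole string must match. With re.search, a{3} would also find "aaa" inside "aaaa".

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each pattern paired with a few candidate strings to test against it.
cases = {
    r"ab*c": ["ac", "abc", "abbc", "abbbc"],
    r"ab+c": ["ac", "abc", "abbc"],
    r"ab?c": ["ac", "abc", "abbc"],
    r"a{3}": ["aa", "aaa", "aaaa"],
    r"a{2,4}": ["a", "aa", "aaaa", "aaaaa"],
}

for pattern, candidates in cases.items():
    for text in candidates:
        verdict = "match" if matches(pattern, text) else "no match"
        print(f"{pattern:8} {text:6} {verdict}")
```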
Real World Quantifier Examples
Phone numbers (simple):
Pattern: \d{3}-\d{3}-\d{4}
Matches: 123-456-7890
Variable length words:
Pattern: \w+
Matches: Any word (one or more word characters)
Optional country code:
Pattern: \+?\d{10,11}
Matches: 1234567890 or +12345678901
The backslash in \+ makes the plus sign a literal character instead of a quantifier, and the ? after it makes that plus sign optional.
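A quick sketch of the three real-world patterns, again assuming Python's re module. The constant names PHONE, WORD, and NUMBER are illustrative only. Precompiling with re.compile is a convenience when a pattern is reused; it does not change what matches.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")   # 3 digits, dash, 3 digits, dash, 4 digits
WORD = re.compile(r"\w+")                   # one or more word characters
NUMBER = re.compile(r"\+?\d{10,11}")        # optional literal '+', then 10 or 11 digits

print(PHONE.fullmatch("123-456-7890") is not None)   # True
print(PHONE.fullmatch("123-4567-890") is not None)   # False: digits grouped wrongly
print(WORD.findall("Regex quantifiers, explained!"))  # ['Regex', 'quantifiers', 'explained']

for candidate in ["1234567890", "+12345678901", "+123"]:
    print(candidate, NUMBER.fullmatch(candidate) is not None)
```

Note that \+?\d{10,11} also accepts combinations such as +1234567890 (plus sign with ten digits), because the optional sign and the digit count are independent of each other.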
Greedy vs Non-Greedy
By default, quantifiers are "greedy" - they match as much as possible.
Given text: <title>Hello</title>
Pattern: <.*>
Matches: <title>Hello</title> (the whole thing!)
That's often not what you want. Add ? after the quantifier to make it non-greedy (lazy), so it matches as little as possible:
Pattern: <.*?>
Matches: <title> (stops at first >)
Non-greedy quantifiers:
*? Zero or more (lazy)
+? One or more (lazy)
?? Zero or one (lazy)
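The difference matters most when you extract data: a greedy .* can run past the end of the item you want and swallow everything up to the last possible match.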
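Here is the same title example as a small Python sketch (Python's re is an assumed choice of engine). It also shows <[^>]*>, which uses a negated character class instead of laziness and gives the same result on this input.

```python
import re

html = "<title>Hello</title>"

greedy = re.search(r"<.*>", html).group()   # .* grabs as much as it can
lazy = re.search(r"<.*?>", html).group()    # .*? stops at the first '>'
print(greedy)  # <title>Hello</title>
print(lazy)    # <title>

# findall with the lazy pattern extracts each tag separately.
print(re.findall(r"<.*?>", html))    # ['<title>', '</title>']
# A negated class can never cross a '>', so it needs no laziness.
print(re.findall(r"<[^>]*>", html))  # ['<title>', '</title>']
```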
Alternation - OR Logic
The pipe character means "or":
Pattern: cat|dog
Matches: "cat" or "dog"
Pattern: gray|grey
Matches: Both spellings
You can have multiple options:
Pattern: red|green|blue
Matches: Any of the three colors
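A short Python sketch of alternation (the sample sentence is made up for illustration). The last line shows a detail worth knowing. In Python's re, as in most backtracking engines, alternatives are tried left to right and the first one that succeeds wins, even when a later alternative would match more text.

```python
import re

text = "The cat chased the dog past a gray and grey fence, in red, green and blue."

print(re.findall(r"cat|dog", text))         # ['cat', 'dog']
print(re.findall(r"gray|grey", text))       # ['gray', 'grey']
print(re.findall(r"red|green|blue", text))  # ['red', 'green', 'blue']

# Left-to-right: 'cat' succeeds first, so 'category' is never tried here.
print(re.search(r"cat|category", "category").group())  # cat
```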
Grouping with Parentheses
Parentheses group patterns together. This is useful for:
1. Applying quantifiers to groups:
Pattern: (ab)+
Matches: ab, abab, ababab (one or more "ab")
2. Creating alternation groups:
Pattern: gr(a|e)y
Matches: gray or grey
3. Capturing for later use (more on this in the rename section):
Pattern: (hello) (world)
Captures: "hello" in group 1, "world" in group 2
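The three uses of parentheses, sketched with Python's re (an assumed engine). One detail the sketch makes visible: when a capturing group is repeated, as in (ab)+, the group keeps only its last repetition, while group(0) holds the entire match.

```python
import re

# 1. A quantifier applied to a whole group.
m = re.fullmatch(r"(ab)+", "ababab")
print(m.group(0), m.group(1))                     # ababab ab  (last repetition only)
print(re.fullmatch(r"(ab)+", "abba") is not None)  # False

# 2. Alternation confined to one part of the pattern.
for word in ["gray", "grey", "griy"]:
    print(word, re.fullmatch(r"gr(a|e)y", word) is not None)

# 3. Capturing pieces of the match for later use.
hw = re.search(r"(hello) (world)", "say hello world!")
print(hw.group(1), hw.group(2))  # hello world
print(hw.groups())               # ('hello', 'world')
```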
Non-Capturing Groups
Sometimes you need grouping but don't need to capture:
Pattern: (?:ab)+
Groups "ab" but doesn't capture it.
Use (?: ) when you only need grouping for logic, not extraction.
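The practical difference shows up clearly with Python's re.findall (Python is an assumption here; other engines expose groups differently). When a pattern contains a capturing group, findall returns the group's text instead of the whole match. A non-capturing group avoids that, and it also leaves group numbering untouched.

```python
import re

text = "ababab xab ab"

# Capturing group: findall returns the group's text (its last repetition).
print(re.findall(r"(ab)+", text))    # ['ab', 'ab', 'ab']
# Non-capturing group: findall returns the whole match.
print(re.findall(r"(?:ab)+", text))  # ['ababab', 'ab', 'ab']

# (?:...) is not counted, so the digits are group 1.
m = re.search(r"(?:ab)+-(\d+)", "abab-42")
print(m.group(1))  # 42
```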
Word Boundaries
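The \b anchor matches the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. It matches a position, not a character.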
Pattern: \bcat\b
Matches: "cat" but NOT "category" or "concatenate"
Without word boundaries:
Pattern: cat
Matches: cat, category, concatenate (anywhere "cat" appears)
Use word boundaries whenever you want to match whole words only.
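A Python sketch comparing the two patterns on one sentence (the sentence is invented for the example). It also shows a common pitfall in Python specifically. Without a raw string (r"..."), "\b" is a backspace character, not a word boundary, so the pattern silently finds nothing.

```python
import re

text = "A cat, a category, and concatenate."

print(re.findall(r"cat", text))      # ['cat', 'cat', 'cat']  (cat, category, concatenate)
print(re.findall(r"\bcat\b", text))  # ['cat']                (the whole word only)

# Pitfall: in a normal string literal "\b" is the backspace character.
print(re.findall("\bcat\b", text))   # []
```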
Putting It Together
Complex pattern example - email-like strings:
Pattern: \w+@\w+\.\w+
Matches: Simple addresses such as john@example.com
Let's break it down:
\w+ One or more word characters (username)
@ Literal @ sign
\w+ One or more word characters (domain)
\. Literal dot (escaped, because a bare . matches any character)
\w+ One or more word characters (TLD)
This isn't a production email validator, but it illustrates how patterns combine.
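Running the pattern in Python's re (an assumed engine; EMAIL_LIKE is an illustrative name) shows both what it catches and where it falls short. Dots and hyphens are not \w characters, so dotted usernames and multi-part domains are only partly matched.

```python
import re

EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")

text = "Contact alice@example.com or bob_99@mail.org, not carol@localhost."
print(EMAIL_LIKE.findall(text))  # ['alice@example.com', 'bob_99@mail.org']

# Limitations: '.' in the username or extra domain parts are not covered.
print(EMAIL_LIKE.search("first.last@example.com").group())  # last@example.com
print(EMAIL_LIKE.search("user@mail.example.com").group())   # user@mail.example
```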
Another Example - Date Patterns
Matching YYYY-MM-DD format:
Pattern: \d{4}-\d{2}-\d{2}
Matches: 2024-01-15, 2023-12-31
More flexible (allows single digits):
Pattern: \d{4}-\d{1,2}-\d{1,2}
Matches: 2024-1-5, 2024-01-15
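Both date patterns check only the shape of the text, not whether the date is real: 2024-13-99 still matches. A sketch in Python, assuming its re and datetime modules, wraps each part in a capturing group and then lets datetime.date reject impossible values. The names STRICT, FLEXIBLE, and to_date are hypothetical.

```python
import re
from datetime import date
from typing import Optional

STRICT = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
FLEXIBLE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

def to_date(text: str) -> Optional[date]:
    """Hypothetical helper: the regex checks the shape, date() checks the values."""
    m = FLEXIBLE.fullmatch(text)
    if m is None:
        return None
    year, month, day = (int(part) for part in m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 99
        return None

print(STRICT.fullmatch("2024-01-15").groups())       # ('2024', '01', '15')
print(STRICT.fullmatch("2024-1-5"))                  # None
print(FLEXIBLE.fullmatch("2024-13-99") is not None)  # True: the shape is fine
print(to_date("2024-1-5"))                           # 2024-01-05
print(to_date("2024-13-99"))                         # None
```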
Quick Reference - What We've Learned
Building Blocks:
. Any character except newline (by default)
[abc] Character class
[^abc] Negated class
\d \w \s Digit, word char, whitespace
^ $ Start/end anchors
\b Word boundary
Quantifiers:
* Zero or more
+ One or more
? Zero or one
{n} Exactly n
{n,} At least n
{n,m} n to m times
*? +? ?? Non-greedy versions
Grouping:
(...) Capture group
(?:...) Non-capture group
| Alternation (OR)