REGEX FUNDAMENTALS - THE BUILDING BLOCKS
Before we touch any tools, let's learn the alphabet of regex. These are the characters that have special meaning.
Literal Characters
Most characters match themselves. The pattern "cat" matches the text "cat". Simple.
Pattern: cat
Matches: "The cat sat on the mat"
              ^^^
No magic here. Letters match letters.
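To see a literal match in running code, here is a minimal sketch using Python's built-in re module. Python is just an assumption for illustration; the pattern itself works the same way in most tools. Note that literal matching is case-sensitive by default.

```python
import re

text = "The cat sat on the mat"

# re.search scans the text and returns the first match, or None.
match = re.search(r"cat", text)
print(match.group())  # the matched text: cat
print(match.span())   # (start, end) positions: (4, 7)

# Literal matching is case-sensitive: "cat" does not match "Cat".
no_match = re.search(r"cat", "The Cat sat")
print(no_match)       # None
```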
The Dot - Any Single Character
The dot (.) matches any single character. It's the wildcard. (In most engines the one exception is the newline character, unless you turn on a special mode.)
Pattern: c.t
Matches: cat, cot, cut, c9t, c!t
The dot doesn't care what the character is. One dot = one character.
Pattern: h.t
Matches: hat, hit, hot, hut, h@t
Real example - finding variations:
Pattern: gr.y
Matches: gray, grey (and also groy, gr9y, or anything else with one character in that slot)
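The dot's behavior can be checked with a short sketch, again assuming Python's re module. The word list and sample text are made up for the example. fullmatch is used so that the whole word has to fit the pattern.

```python
import re

words = ["cat", "cot", "c9t", "c!t", "ct", "coat"]
pattern = re.compile(r"c.t")

# fullmatch requires the entire string to match the pattern.
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)   # ['cat', 'cot', 'c9t', 'c!t']

# One dot = exactly one character, so "gry" is too short to match.
variants = re.findall(r"gr.y", "gray grey groy gry")
print(variants)  # ['gray', 'grey', 'groy']

# By default the dot does not match a newline in Python.
newline_match = re.search(r"c.t", "c\nt")
print(newline_match)  # None
```

"ct" fails because the dot needs a character to consume, and "coat" fails because one dot cannot stand in for two characters.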
Character Classes - Pick from a List
Square brackets let you specify which characters are allowed.
Pattern: [aeiou]
Matches: Any single vowel
Pattern: gr[ae]y
Matches: gray OR grey (not both at once)
You can use ranges with a dash:
Pattern: [a-z]       Any lowercase letter
Pattern: [A-Z]       Any uppercase letter
Pattern: [0-9]       Any digit
Pattern: [a-zA-Z]    Any letter (upper or lower)
Multiple ranges work too:
Pattern: [a-zA-Z0-9]
Matches: Any letter or digit
Negation - NOT these characters:
Pattern: [^0-9]
Matches: Any single character that's NOT a digit
The caret (^) at the start of a character class means "not".
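Here is a small sketch of character classes in action, assuming Python's re module. The sample strings are invented for the example. findall returns every non-overlapping match, which makes it easy to see that a class matches one character at a time.

```python
import re

# A class matches ONE character from the list, each time it is applied.
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)      # ['e', 'e']

# gr[ae]y accepts "a" or "e" in the third position, nothing else.
spellings = re.findall(r"gr[ae]y", "gray grey groy")
print(spellings)   # ['gray', 'grey']

# Several ranges in one class: letters and digits only.
alnum = re.findall(r"[a-zA-Z0-9]", "a-B_9!")
print(alnum)       # ['a', 'B', '9']

# A leading caret negates the class: everything that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)  # ['a', 'b']
```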
Shorthand Classes - The Common Ones
Typing [0-9] gets old. These shortcuts exist:
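\d    Any digit             Same as [0-9]
\D    NOT a digit           Same as [^0-9]
\w    Word character        Same as [a-zA-Z0-9_]
\W    NOT word character    Same as [^a-zA-Z0-9_]
\s    Whitespace            Space, tab, newline (and other whitespace such as carriage return)
\S    NOT whitespace        Anything except whitespace
The "same as" column holds for plain ASCII text. Some Unicode-aware engines also let \d and \w match digits and letters from other scripts.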
Examples:
Pattern: \d\d\d
Matches: Any three digits (123, 456, 789)
Pattern: \w+
Matches: One or more word characters in a row, such as "hello" or "user_42" (the + means "one or more")
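The shorthand classes can be tried out with a quick sketch, assuming Python's re module and made-up sample text. The last two lines show one engine-specific detail: in Python 3, \d also matches digits from other scripts unless the re.ASCII flag is passed. Other tools may behave differently.

```python
import re

# \d\d\d grabs three digits in a row; "1234" yields "123" and leaves the 4.
three = re.findall(r"\d\d\d", "call 555 or 1234")
print(three)   # ['555', '123']

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", "user_42 logged in!")
print(words)   # ['user_42', 'logged', 'in']

# \S+ collects runs of anything that is not whitespace.
chunks = re.findall(r"\S+", "a  b\tc\n")
print(chunks)  # ['a', 'b', 'c']

# "\u0663" is the Arabic-Indic digit three, a digit outside ASCII.
unicode_digit = re.fullmatch(r"\d", "\u0663")
ascii_only = re.fullmatch(r"\d", "\u0663", re.ASCII)
print(unicode_digit is not None, ascii_only is not None)  # True False
```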
Anchors - Position in the Text
These don't match characters. They match positions.
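^    Start of line
$    End of line
In line-oriented tools these apply to each line. In many programming languages they mean the start and end of the whole text unless multiline mode is turned on.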
Examples:
Pattern: ^Hello
Matches: "Hello world" but NOT "Say Hello"
Pattern: world$
Matches: "Hello world" but NOT "world peace"
Pattern: ^Hello$
Matches: Only a line containing exactly "Hello"
Anchors are crucial for precision. Without them, patterns match anywhere in the text.
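The anchor examples above can be verified with a short sketch, assuming Python's re module. It also shows the multiline detail: in Python, ^ and $ anchor to the whole string by default, and re.MULTILINE makes them work line by line. The three-line sample text is invented for the example.

```python
import re

# ^ pins the match to the start, $ pins it to the end.
starts = [bool(re.search(r"^Hello", s)) for s in ("Hello world", "Say Hello")]
ends = [bool(re.search(r"world$", s)) for s in ("Hello world", "world peace")]
print(starts)  # [True, False]
print(ends)    # [True, False]

text = "Say this\nHello\nworld peace"

# Default: ^ and $ refer to the whole string, so the middle line is missed.
whole_string = re.findall(r"^Hello$", text)
print(whole_string)  # []

# re.MULTILINE: ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^Hello$", text, re.MULTILINE)
print(per_line)      # ['Hello']
```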
Escaping Special Characters
What if you want to match an actual dot? Or a dollar sign?
Use a backslash to escape special characters:
\.    Literal dot
\$    Literal dollar sign
\^    Literal caret
\[    Literal opening bracket
\\    Literal backslash
Example - matching a filename:
Pattern: photo\.jpg
Matches: "photo.jpg" (not "photoxjpg")
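To see what the backslash buys you, here is a sketch comparing the escaped and unescaped filename pattern, assuming Python's re module. The r"..." raw-string prefix is a Python detail: it stops Python itself from interpreting the backslash before the regex engine sees it. The price string is made up for the example.

```python
import re

# Unescaped dot: still a wildcard, so "photoxjpg" slips through.
loose = re.search(r"photo.jpg", "photoxjpg")
print(loose is not None)    # True

# Escaped dot: only a real "." is accepted.
strict_miss = re.search(r"photo\.jpg", "photoxjpg")
strict_hit = re.search(r"photo\.jpg", "my photo.jpg file")
print(strict_miss)          # None
print(strict_hit.group())   # photo.jpg

# \$ matches a literal dollar sign instead of "end of line".
prices = re.findall(r"\$\d+", "cost: $15, was $20")
print(prices)               # ['$15', '$20']
```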
The Special Characters Summary
These characters have special meaning in regex:
.      Any character
^      Start of line (or NOT inside [^...])
$      End of line
[ ]    Character class
\      Escape character
|      OR (alternation)
( )    Grouping
*      Zero or more
+      One or more
?      Zero or one
{ }    Specific count
When you want the literal character, escape it with a backslash.
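If the text you want to match comes from somewhere else and may contain special characters, escaping by hand gets error-prone. As one possible approach, assuming Python, the re.escape function adds the backslashes for you. The sample string "a.b*c" is made up for the example.

```python
import re

literal = "a.b*c"

# Used as a pattern directly, the dot and star keep their special meaning:
# "a", any character, zero or more "b", then "c".
unescaped_hit = re.fullmatch(literal, "aXbbc")
print(unescaped_hit is not None)  # True

# re.escape puts a backslash in front of each special character.
escaped = re.escape(literal)
escaped_hit = re.fullmatch(escaped, literal)
escaped_miss = re.fullmatch(escaped, "aXbbc")
print(escaped_hit is not None)    # True
print(escaped_miss)               # None
```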
Practice - Read These Patterns
Before moving on, make sure you understand these:
Pattern      Matches
-------      -------
a.c          abc, aXc, a9c (any single char between a and c)
[abc]        a, b, or c (just one)
[^abc]       Any single character except a, b, or c
\d           Any single digit
^start       "start" at beginning of line
end$         "end" at end of line
file\.txt    Literally "file.txt"
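One way to check your reading of the practice patterns is to run them. The sketch below assumes Python's re module and pairs each pattern with one text that should match and one that should not. The test strings are invented for the example.

```python
import re

# (pattern, text, should_match)
cases = [
    (r"a.c", "abc", True),
    (r"a.c", "ac", False),             # the dot needs a character
    (r"[abc]", "b", True),
    (r"[abc]", "d", False),
    (r"[^abc]", "d", True),
    (r"[^abc]", "a", False),
    (r"\d", "7", True),
    (r"\d", "x", False),
    (r"^start", "start here", True),
    (r"^start", "restart", False),     # "start" is not at the beginning
    (r"end$", "the end", True),
    (r"end$", "endless", False),       # "end" is not at the end
    (r"file\.txt", "file.txt", True),
    (r"file\.txt", "fileXtxt", False), # escaped dot is literal
]

# Collect any case where the actual result differs from the expectation.
mismatches = [
    (pattern, text)
    for pattern, text, expected in cases
    if (re.search(pattern, text) is not None) != expected
]
print(mismatches)  # [] means every prediction was right
```

Try adding your own rows and guessing the result before you run it.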