Techalicious Academy / 2026-02-19-advanced-regex

REGEX FUNDAMENTALS - THE QUICK VERSION

This is the foundation everything else tonight builds on. If you were at Regex Therapy, this is a fast refresher. If this is your first time with pattern matching, this gets you up to speed.

Atoms and Left-to-Right Matching

A regex pattern is made of "atoms." An atom is the smallest unit the engine can match. A literal character is an atom. A character class is an atom. A group is an atom. The engine takes your pattern and breaks it into these atoms, then works through them left to right.

Here's what actually happens when the engine matches the pattern "cat" against the text "the cat sat":

Text:     t h e   c a t   s a t
Position: 0 1 2 3 4 5 6 7 8 9 10

Position 0: Try 'c' against 't' → no match, slide forward
Position 1: Try 'c' against 'h' → no match, slide forward
Position 2: Try 'c' against 'e' → no match, slide forward
Position 3: Try 'c' against ' ' → no match, slide forward
Position 4: Try 'c' against 'c' → match! Try 'a' against position 5
Position 5: Try 'a' against 'a' → match! Try 't' against position 6
Position 6: Try 't' against 't' → match! All atoms matched!

Result: "cat" found at position 4

The engine is persistent. It tries every starting position in the text until it finds a match or runs out of text. This is the fundamental mechanism behind everything we do tonight. Every advanced feature we cover is built on top of this left-to-right, atom-by-atom process.
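
To make the sliding concrete, here is a minimal sketch in Python that handles literal characters only. The function name find_literal is made up for this handout, and a real engine is far more sophisticated. This only mirrors the trace above: pick a start position, compare atom by atom, and slide forward on the first mismatch.

```python
def find_literal(pattern: str, text: str) -> int:
    """Return the first position where pattern occurs in text, or -1.

    Hypothetical helper for illustration: literal characters only.
    """
    for start in range(len(text) - len(pattern) + 1):
        # Compare atom by atom, left to right, from this start position.
        for offset, atom in enumerate(pattern):
            if text[start + offset] != atom:
                break  # mismatch: abandon this start and slide forward
        else:
            return start  # the loop never broke, so every atom matched
    return -1  # ran out of text


print(find_literal("cat", "the cat sat"))  # 4
print(find_literal("dog", "the cat sat"))  # -1
```

The outer loop is the "slide forward" step and the inner loop is the atom-by-atom comparison.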

Character Classes

Square brackets define a set of allowed characters. The engine matches exactly one character from the set.

[aeiou]       One vowel
[abc]         The letter a, b, or c

Ranges use a dash inside the brackets:

[a-z]         Any lowercase letter
[A-Z]         Any uppercase letter
[0-9]         Any digit
[a-zA-Z]      Any letter, upper or lower
[a-zA-Z0-9]   Any letter or digit

You can mix ranges and individual characters:

[a-z0-9_]     Lowercase letter, digit, or underscore

Negate with a caret at the start:

[^0-9]        Anything that is NOT a digit
[^aeiou]      Anything that is NOT a vowel

Shorthand classes save typing:

\d    Digit           same as [0-9]
\D    Not a digit     same as [^0-9]
\w    Word character  same as [a-zA-Z0-9_]
\W    Not word char   same as [^a-zA-Z0-9_]
\s    Whitespace      space, tab, newline, and other whitespace such as carriage return
\S    Not whitespace

Try it:

echo "Hello World 123" | grep -Po '[A-Z][a-z]+'

That matches a capital letter followed by one or more lowercase letters. Result: "Hello" and "World".
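
The same experiment can be run in Python's re module, shown here as a rough sketch rather than the only way to do it. One assumption to flag: Python's \d and \w also accept non-ASCII digits and letters by default, so the re.ASCII flag is passed to make them behave like the bracket forms in the table above.

```python
import re

text = "Hello World 123"

# Same pattern as the grep command: one capital, then one or more lowercase.
capitalized = re.findall(r"[A-Z][a-z]+", text)

# Shorthand classes next to their bracket equivalents.
digits_short = re.findall(r"\d+", text, re.ASCII)
digits_long = re.findall(r"[0-9]+", text)
words = re.findall(r"\w+", text, re.ASCII)

# Negated class: anything that is not a lowercase letter or whitespace.
leftovers = re.findall(r"[^a-z\s]", text)

print(capitalized)   # ['Hello', 'World']
print(digits_short)  # ['123']
print(digits_long)   # ['123']
print(words)         # ['Hello', 'World', '123']
print(leftovers)     # ['H', 'W', '1', '2', '3']
```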

Meta Characters

These characters have special meaning in regex. They're the grammar of the language. When the engine sees them, it doesn't match them literally. It interprets them.

.     Any single character (except newline by default)
^     Start of line or string
$     End of line or string
*     Zero or more of the previous atom
+     One or more of the previous atom
?     Zero or one of the previous atom (makes it optional)
|     Alternation (OR)
( )   Grouping and capturing
[ ]   Character class
{ }   Quantifier bounds like {3} or {2,5}
\     Escape character
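
A quick way to check your reading of the table is to run a handful of patterns and see which ones match. The sketch below uses Python's re.search. The sample strings are invented for illustration, and each row notes which meta character it exercises.

```python
import re

# (pattern, text, should it match?)
checks = [
    (r"^c.t$",      "cot",     True),   # . matches any one character
    (r"^c.t$",      "c\nt",    False),  # but not a newline by default
    (r"^ab*c$",     "ac",      True),   # * allows zero b's
    (r"^ab+c$",     "ac",      False),  # + needs at least one b
    (r"^ab+c$",     "abbbc",   True),
    (r"^colou?r$",  "color",   True),   # ? makes the u optional
    (r"^colou?r$",  "colour",  True),
    (r"^\d{3}$",    "123",     True),   # {3} means exactly three
    (r"^\d{2,5}$",  "1",       False),  # {2,5} means two to five
    (r"^cat",       "the cat", False),  # ^ anchors to the start
    (r"cat$",       "the cat", True),   # $ anchors to the end
]

for pattern, text, expected in checks:
    found = re.search(pattern, text) is not None
    assert found == expected, (pattern, text)
    print(f"{pattern:<12} {text!r:<10} {'match' if found else 'no match'}")
```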

When you need to match a literal meta character, escape it with a backslash:

\.    A literal dot
\$    A literal dollar sign
\^    A literal caret
\(    A literal opening parenthesis
\\    A literal backslash

Examples: matching a literal dot and a literal dollar sign:

photo\.jpg    matches "photo.jpg" (not "photoxjpg")
price: \$\d+  matches "price: $50"
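
Here is a short Python sketch of why the backslash matters. It also uses re.escape, a standard-library helper that adds the backslashes for you when the literal text comes from somewhere else. The sample strings are made up for illustration.

```python
import re

# An unescaped dot matches any character, so the wrong filename slips through.
unescaped_hit = re.search(r"photo.jpg", "photoxjpg") is not None
escaped_hit = re.search(r"photo\.jpg", "photoxjpg") is not None
print(unescaped_hit, escaped_hit)  # True False

price = re.search(r"price: \$\d+", "price: $50").group()
print(price)  # price: $50

# Literal text full of meta characters: +, (, ), and a dot.
literal = "a+b (v1.0)"
haystack = "see a+b (v1.0) here"

raw_attempt = re.search(literal, haystack)              # treated as a pattern
safe_attempt = re.search(re.escape(literal), haystack)  # treated as plain text

print(raw_attempt)           # None
print(safe_attempt.group())  # a+b (v1.0)
```

The raw attempt fails because "a+b" is read as "one or more a, then b", which never appears in the text.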

Alternation

The pipe character | means "or." The engine tries the left side first. If that doesn't match, it tries the right side.

cat|dog                 matches "cat" or "dog"
red|green|blue          matches any of the three
gray|grey               matches both spellings

Scope matters. The pipe applies to everything on each side unless you use parentheses to contain it:

cat|dog food            matches "cat" OR "dog food" (the pipe splits the entire pattern)
(cat|dog) food          matches "cat food" OR "dog food"
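
The difference in scope is easy to confirm with a small Python sketch. The variable names are just for this example. re.fullmatch is used so that the entire input has to match, which makes the two patterns behave visibly differently.

```python
import re

loose = re.compile(r"cat|dog food")       # alternatives: "cat", "dog food"
grouped = re.compile(r"(cat|dog) food")   # alternatives: "cat", "dog"

loose_results = {s: loose.fullmatch(s) is not None
                 for s in ["cat", "dog food", "cat food"]}
grouped_results = {s: grouped.fullmatch(s) is not None
                   for s in ["cat", "dog food", "cat food"]}

print(loose_results)    # {'cat': True, 'dog food': True, 'cat food': False}
print(grouped_results)  # {'cat': False, 'dog food': True, 'cat food': True}

# The parentheses also capture which alternative was chosen.
print(grouped.fullmatch("cat food").group(1))  # cat
```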

A cleaner way to handle spelling variations:

gr(a|e)y                matches "gray" or "grey"
colo(u|)r               matches "color" or "colour"

You can build entire day-of-week matchers:

(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day
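
As a rough sketch, here is the day-of-week pattern run against an invented sentence in Python. One Python-specific detail to be aware of: re.findall would return only the captured prefix ("Thurs", "Satur"), so this example uses finditer and group() to get the full match.

```python
import re

day = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")
text = "Meetups run on Thursday; the Saturday workshop moved to Sunday."

full_matches = [m.group() for m in day.finditer(text)]
prefixes = [m.group(1) for m in day.finditer(text)]

print(full_matches)  # ['Thursday', 'Saturday', 'Sunday']
print(prefixes)      # ['Thurs', 'Satur', 'Sun']

# The spelling-variation patterns from above work the same way.
print(bool(re.fullmatch(r"gr(a|e)y", "grey")))     # True
print(bool(re.fullmatch(r"colo(u|)r", "color")))   # True
print(bool(re.fullmatch(r"colo(u|)r", "colour")))  # True
```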

+-------------------------------------------------------+
|  If all of this is comfortable, you're ready for      |
|  tonight. If any of it feels fuzzy, the Regex         |
|  Therapy tutorial on techalicious.academy covers      |
|  every piece in detail. Grab it for five bucks.       |
+-------------------------------------------------------+