Have you ever needed to search for a pattern in text, like finding all emails, phone numbers, or dates in a file?
That’s exactly what Regular Expressions in Python — or Regex — help you do!

They’re like supercharged search tools that let you match patterns instead of typing exact words.
If you’ve ever used Ctrl + F to find something, think of Regex as its smarter, more powerful cousin.
What Is a Regular Expression?
A Regular Expression (Regex) is a special sequence of characters that helps you match, find, or manipulate strings using patterns.
Python provides this functionality through the re module.
Let’s start by importing it:
import re
Why Use Regular Expressions in Python?
Regular expressions are used for:
- Validating user inputs (like email, phone number, or password)
- Searching and extracting text patterns
- Replacing or formatting data
- Data cleaning in text analytics or data science
Basic Functions in re Module
Let’s explore the most useful functions from the re module.
| Function | Description | Example |
|---|---|---|
| `re.match()` | Checks for a match only at the beginning of the string | `re.match(r"Hello", "Hello World")` |
| `re.search()` | Searches the entire string and returns the first match | `re.search(r"World", "Hello World")` |
| `re.findall()` | Returns a list of all matches | `re.findall(r"\d+", "There are 12 apples and 5 mangoes")` |
| `re.split()` | Splits a string by the matched pattern | `re.split(r"\s", "Python is fun")` |
| `re.sub()` | Replaces all matches with a new string | `re.sub(r"\d", "#", "A1B2C3")` |
| `re.compile()` | Compiles a regex pattern for reuse | `pattern = re.compile(r"\d+")` |
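To see how these functions differ in practice, here is a minimal sketch that runs the table's examples side by side. The variable names are only illustrative.

```python
import re

text = "Hello World"

# match() only looks at the start of the string
at_start = re.match(r"World", text)    # None: "World" is not at position 0
# search() scans the whole string and stops at the first hit
anywhere = re.search(r"World", text)   # match object starting at index 6

numbers = re.findall(r"\d+", "There are 12 apples and 5 mangoes")
words = re.split(r"\s", "Python is fun")
masked = re.sub(r"\d", "#", "A1B2C3")

print(at_start)          # None
print(anywhere.start())  # 6
print(numbers)           # ['12', '5']
print(words)             # ['Python', 'is', 'fun']
print(masked)            # A#B#C#
```

Note the `r"..."` prefix: raw strings keep backslashes such as `\d` intact, so Python does not try to interpret them as string escapes.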
Regex Meta Characters (The Building Blocks)
Meta characters are symbols with special meanings in Regex.
| Symbol | Description | Example |
|---|---|---|
| `.` | Matches any character (except newline) | `re.search(r"P.thon", "Python")` |
| `^` | Matches start of string | `re.match(r"^Hello", "Hello World")` |
| `$` | Matches end of string | `re.search(r"World$", "Hello World")` |
| `*` | Matches 0 or more occurrences | `re.findall(r"ab*", "a ab abb abbb")` |
| `+` | Matches 1 or more occurrences | `re.findall(r"ab+", "a ab abb abbb")` |
| `?` | Matches 0 or 1 occurrence | `re.findall(r"ab?", "a ab abb abbb")` |
| `{n}` | Exactly n repetitions | `re.findall(r"a{3}", "aa aaaa aaa")` |
| `{n,}` | At least n repetitions | `re.findall(r"a{2,}", "aa aaaa aaa")` |
| `{n,m}` | Between n and m repetitions | `re.findall(r"a{2,4}", "a aa aaa aaaa")` |
| `[]` | Matches any one character in brackets | `[aeiou]` matches any lowercase vowel |
| `\|` | Acts as OR operator | `cat\|dog` matches “cat” or “dog” |
| `()` | Groups expressions | `(ab)+` matches repeated “ab” patterns |
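The quantifiers are easiest to tell apart by running them on the same input. The sketch below reuses the sample string from the table; the last three lines use made-up inputs to illustrate character classes, alternation, and grouping.

```python
import re

sample = "a ab abb abbb"

zero_or_more = re.findall(r"ab*", sample)  # 'a' followed by any number of 'b'
one_or_more = re.findall(r"ab+", sample)   # 'a' followed by at least one 'b'
zero_or_one = re.findall(r"ab?", sample)   # 'a' followed by at most one 'b'

print(zero_or_more)  # ['a', 'ab', 'abb', 'abbb']
print(one_or_more)   # ['ab', 'abb', 'abbb']
print(zero_or_one)   # ['a', 'ab', 'ab', 'ab']

# Character class, alternation, and grouping (illustrative inputs)
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']
pets = re.findall(r"cat|dog", "cat, dog, bird")     # ['cat', 'dog']
repeated = re.search(r"(ab)+", "xababab").group()   # 'ababab'
```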
Special Sequences in Regex
| Code | Description | Example |
|---|---|---|
| `\d` | Any digit (0–9) | `re.findall(r"\d", "A1B2C3")` → `['1', '2', '3']` |
| `\D` | Non-digit characters | `re.findall(r"\D", "A1B2")` → `['A', 'B']` |
| `\s` | Whitespace (space, tab, newline) | `re.findall(r"\s", "Python is fun")` |
| `\S` | Non-whitespace | `re.findall(r"\S", "Python is fun")` |
| `\w` | Word character (letters, digits, _) | `re.findall(r"\w", "A_B1!")` |
| `\W` | Non-word character | `re.findall(r"\W", "A_B1!")` |
| `\b` | Word boundary | `re.findall(r"\bword\b", "word world sword")` |
| `\B` | Non-word boundary | `re.findall(r"\Bword\B", "passwords")` |
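Word boundaries are the sequence people most often get wrong, so here is a small sketch of how `\b` and `\B` behave. The input strings are illustrative.

```python
import re

# \b requires a boundary on each side, so only the standalone "word" matches
whole_word = re.findall(r"\bword\b", "word world sword")
print(whole_word)  # ['word']

# \B requires that there is NO boundary on each side.
# In "password" the text "word" ends the string, which is a boundary, so nothing matches.
at_end = re.findall(r"\Bword\B", "password")
print(at_end)  # []

# In "passwords" the text "word" has word characters on both sides, so it matches.
inside = re.findall(r"\Bword\B", "passwords")
print(inside)  # ['word']

# \w and \W split the same string into word and non-word characters
word_chars = re.findall(r"\w", "A_B1!")
other_chars = re.findall(r"\W", "A_B1!")
print(word_chars, other_chars)  # ['A', '_', 'B', '1'] ['!']
```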
Examples of Common Use Cases
1. Validate an Email Address
import re
email = "user123@gmail.com"
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}$'
if re.match(pattern, email):
    print("✅ Valid email address")
else:
    print("❌ Invalid email")
Output:
✅ Valid email address
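If you need to check several addresses, the same pattern can be wrapped in a helper. This is a rough sketch: `is_valid_email` and the candidate list are hypothetical, and the pattern is a simple approximation that will not accept every address the email standards allow.

```python
import re

# Same pattern as above, without ^ and $ because fullmatch() anchors both ends
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}")

def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    return EMAIL_PATTERN.fullmatch(address) is not None

candidates = ["user123@gmail.com", "no-at-sign.com", "user@site", "a.b@mail.co.in"]
results = {address: is_valid_email(address) for address in candidates}

for address, ok in results.items():
    print(address, "->", ok)
# user123@gmail.com -> True
# no-at-sign.com -> False
# user@site -> False
# a.b@mail.co.in -> True
```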
2. Extract All Phone Numbers from Text
text = "Call me at 9876543210 or 9123456789"
phones = re.findall(r'\b\d{10}\b', text)
print(phones)
Output:
['9876543210', '9123456789']
3. Find All Uppercase Words
sentence = "Python is Fun and POWERFUL"
caps = re.findall(r'\b[A-Z]{2,}\b', sentence)
print(caps)
Output:
['POWERFUL']
4. Replace All Digits with “#”
data = "Order ID: 12345"
cleaned = re.sub(r'\d', '#', data)
print(cleaned)
Output:
Order ID: #####
5. Split a String by Multiple Delimiters
text = "apple,banana;grape orange"
fruits = re.split(r'[;,\s]+', text)
print(fruits)
Output:
['apple', 'banana', 'grape', 'orange']
6. Extract Dates from a Paragraph
text = "Meeting on 25-12-2025 and 01/01/2026."
dates = re.findall(r'\b\d{2}[-/]\d{2}[-/]\d{4}\b', text)
print(dates)
Output:
['25-12-2025', '01/01/2026']
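Groups become useful once you want the individual parts of a match. As a sketch, the date pattern above can be given three capturing groups; the assumption that the dates are in DD-MM-YYYY order is mine and is not stated by the data.

```python
import re

text = "Meeting on 25-12-2025 and 01/01/2026."

# Each pair of parentheses captures one part of the date
date_pattern = re.compile(r"\b(\d{2})[-/](\d{2})[-/](\d{4})\b")

parts = [m.groups() for m in date_pattern.finditer(text)]
print(parts)  # [('25', '12', '2025'), ('01', '01', '2026')]

# Rewrite every date as YYYY-MM-DD, assuming the source order is DD-MM-YYYY
iso_text = date_pattern.sub(r"\3-\2-\1", text)
print(iso_text)  # Meeting on 2025-12-25 and 2026-01-01.
```

In the replacement string, `\1`, `\2`, and `\3` refer back to the first, second, and third group.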
Using re.compile() for Reusability
Instead of writing the pattern every time, you can compile it once:
pattern = re.compile(r'\d{10}')
if pattern.search("My number is 9876543210"):
    print("Phone number found!")
Output:
Phone number found!
✅ Why use it?
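It keeps code cleaner when you reuse the same regex multiple times, because the pattern is defined and named in one place. It can also be slightly faster, although the re module already caches recently used patterns, so the speed gain is usually modest.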
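A compiled pattern pays off most when it is applied to many strings. Here is a minimal sketch with a made-up list of messages:

```python
import re

# Compile once, then reuse the pattern object's own methods
phone_pattern = re.compile(r"\b\d{10}\b")

messages = [
    "My number is 9876543210",
    "No digits here",
    "Office: 9123456789, home: 12345",
]

found = [phone_pattern.findall(msg) for msg in messages]
print(found)  # [['9876543210'], [], ['9123456789']]
```

Compiled pattern objects expose the same operations as the module-level functions (`search`, `match`, `findall`, `sub`, `split`), just without the pattern argument.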
Flags in RegEx
Regex flags let you modify the behavior of pattern matching.
| Flag | Description | Example |
|---|---|---|
| `re.IGNORECASE` or `re.I` | Case-insensitive match | `re.findall(r'python', 'PYTHON rocks', re.I)` |
| `re.MULTILINE` or `re.M` | `^` and `$` match start/end of each line | `re.findall(r'^Hello', text, re.M)` |
| `re.DOTALL` or `re.S` | `.` matches newline too | `re.findall(r'a.*b', 'a\nb', re.S)` |
| `re.VERBOSE` or `re.X` | Allows multi-line regex with comments | For complex patterns |
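The effect of each flag is clearest when the same pattern is run with and without it. The following sketch uses a made-up three-line string for the `re.M` comparison and rewrites the date pattern from earlier in verbose form.

```python
import re

text = "Hello world\nhello again\nHello there"

case_insensitive = re.findall(r"python", "PYTHON rocks", re.I)  # ['PYTHON']
first_line_only = re.findall(r"^Hello", text)                   # ['Hello']
every_line = re.findall(r"^Hello", text, re.M)                  # ['Hello', 'Hello']
no_dotall = re.findall(r"a.*b", "a\nb")                         # []
with_dotall = re.findall(r"a.*b", "a\nb", re.S)                 # ['a\nb']

# re.VERBOSE ignores whitespace in the pattern and allows # comments
date_pattern = re.compile(r"""
    \b\d{2}   # day
    [-/]      # separator
    \d{2}     # month
    [-/]      # separator
    \d{4}\b   # year
""", re.X)
verbose_dates = date_pattern.findall("Due 25-12-2025")          # ['25-12-2025']
```

Flags can be combined with `|`, for example `re.I | re.M`.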
Project: Text Data Extractor
Let’s make a mini project using everything we learned.
Problem:
We have a text file with messy data (emails, phone numbers, and dates).
We need to extract and clean this information using Regex.
Code:
import re
# Read text from file
with open("data.txt", "r") as file:
    text = file.read()
# Define patterns
email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}'
phone_pattern = r'\b\d{10}\b'
date_pattern = r'\b\d{2}[-/]\d{2}[-/]\d{4}\b'
# Extract data
emails = re.findall(email_pattern, text)
phones = re.findall(phone_pattern, text)
dates = re.findall(date_pattern, text)
# Display results
print("Emails Found:", emails)
print("Phone Numbers:", phones)
print("Dates Found:", dates)
Code Explanation
- re.findall() → Scans through the text read from the file and extracts all matches.
- We used different regex patterns for emails, phones, and dates.
- This kind of text extraction is common in data cleaning, NLP, and web scraping.
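If you want to try the extractor without creating a file, the sketch below runs the same three patterns on an in-memory string. The sample text, the `PATTERNS` dictionary, and the `extract_all` helper are hypothetical additions; the de-duplication step is one possible way to handle the "clean" part of the problem.

```python
import re

# Hypothetical sample standing in for the contents of data.txt
sample_text = """
Contact: alice@example.com, phone 9876543210, joined 25-12-2025.
Backup: bob.smith@mail.example.org / 9123456789 (since 01/01/2026)
Duplicate entry: alice@example.com
"""

# Same patterns as the project code, compiled once for reuse
PATTERNS = {
    "emails": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}"),
    "phones": re.compile(r"\b\d{10}\b"),
    "dates": re.compile(r"\b\d{2}[-/]\d{2}[-/]\d{4}\b"),
}

def extract_all(text):
    """Return a dict of de-duplicated matches, keeping first-seen order."""
    results = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys drops duplicates while preserving order
        results[name] = list(dict.fromkeys(pattern.findall(text)))
    return results

extracted = extract_all(sample_text)
for name, values in extracted.items():
    print(f"{name}: {values}")
# emails: ['alice@example.com', 'bob.smith@mail.example.org']
# phones: ['9876543210', '9123456789']
# dates: ['25-12-2025', '01/01/2026']
```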
Quick Regex Cheat Sheet
| Pattern | Description | Example |
|---|---|---|
| `\d` | Digit | 5 |
| `\w` | Word character | A, b, 3, _ |
| `\s` | Whitespace | Space, tab, newline |
| `.` | Any character except newline | `a.b` matches `acb` |
| `[a-z]` | Lowercase letters | a to z |
| `[A-Z]` | Uppercase letters | A to Z |
| `[0-9]` | Digits | 5, 9 |
| `[^abc]` | Not a, b, or c | d, e |
| `^pattern` | Pattern at start | `^Hello` |
| `pattern$` | Pattern at end | `World$` |
| `pattern1\|pattern2` | Either pattern | `cat\|dog` |
Final Thoughts
Regular Expressions in Python may look tricky at first — but once you understand their logic, they become an indispensable tool for data validation, cleaning, and automation.
You’ve just learned how to:
- Search, match, and replace text using patterns
- Validate emails and extract phone numbers and dates
- Use flags, groups, and special sequences
- Build a real-world data extraction project
Keep practicing with real examples — the more you use Regex, the more natural it becomes.
What’s Next?
In the next post, we’ll learn about the Requests Module in Python.