Regular Expressions in Python

A regular expression (regex or regexp) is a pattern that describes a set of strings. Python provides the re module for working with regular expressions, so you can search, match, and manipulate text based on such patterns. A single pattern can often replace many lines of manual string handling.

Key Features of Regular Expressions

Pattern Matching: Regular expressions use a combination of literal characters and special symbols (called metacharacters) to define search patterns.
Flexible Search: They can locate patterns anywhere within text, such as whole words, runs of digits, or characters in a particular order.
Text Manipulation: Beyond searching, regular expressions can be used to replace, extract, or split parts of text.
Validation: Regular expressions can be used to validate formats, such as email addresses, phone numbers, or postal codes.

Commonly Used Metacharacters in Regular Expressions

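.: Matches any single character except a newline (unless the re.DOTALL flag is set).
^: Matches the start of the string (and the start of each line when re.MULTILINE is set).
$: Matches the end of the string or just before a newline at the very end of the string (and the end of each line when re.MULTILINE is set).
*: Matches 0 or more repetitions of the preceding element (a character, set, or group).
+: Matches 1 or more repetitions of the preceding element.
?: Matches 0 or 1 occurrence of the preceding element.
[]: Defines a set of characters and matches any one of them, for example [abc] or [a-z].
|: Acts as an OR operator between alternatives, for example cat|dog.
\d: Matches any decimal digit. For str patterns this includes Unicode digits; with the re.ASCII flag it is limited to 0-9.
\D: Matches any non-digit character.
\w: Matches any word character (letters, digits, and underscore). For str patterns this is Unicode-aware unless re.ASCII is set.
\W: Matches any non-word character.
\s: Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).
\S: Matches any non-whitespace character.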
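
The metacharacters listed above can be tried out on a short string. The snippet below is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample line, chosen only to exercise the metacharacters above.
sample = "Order 42: 3 apples, 15 pears\n"

digits = re.findall(r"\d+", sample)               # \d+ : runs of digits
words = re.findall(r"[A-Za-z]+", sample)          # [] : a set of allowed characters
fruit = re.findall(r"apples|pears", sample)       # |  : either alternative
optional_s = re.findall(r"pears?", "pear pears")  # ?  : the "s" is optional
at_start = bool(re.search(r"^Order", sample))     # ^  : anchored at the start
first_line = re.search(r".+", sample).group()     # .  : stops at the newline

print(digits)            # ['42', '3', '15']
print(words)             # ['Order', 'apples', 'pears']
print(fruit)             # ['apples', 'pears']
print(optional_s)        # ['pear', 'pears']
print(at_start)          # True
print(repr(first_line))  # 'Order 42: 3 apples, 15 pears'
```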

Basic Functions in Python's `re` Module

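re.search(): Scans the string and returns a match object for the first location where the pattern matches, or None if there is no match.
re.match(): Checks whether the pattern matches at the beginning of the string; returns a match object or None.
re.findall(): Returns a list of all non-overlapping matches of the pattern in the string. If the pattern contains capture groups, the list holds the groups instead of the full matches.
re.sub(): Returns a new string in which occurrences of the pattern are replaced with a specified string (or the result of a function).
re.split(): Splits a string at each occurrence of the pattern and returns a list of the pieces.
re.compile(): Compiles a regular expression pattern into a pattern object for reuse.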
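
The difference between re.match() and re.search(), and the use of re.compile(), is easiest to see side by side. The following is a small sketch; the sample text and variable names are chosen here only for illustration.

```python
import re

text = "The rain in Spain"

# re.match() only tries the pattern at position 0 of the string.
no_match = re.match(r"rain", text)   # None: "rain" is not at the start
at_start = re.match(r"The", text)    # a match object for "The"

# re.search() scans forward until it finds a match anywhere in the string.
found = re.search(r"rain", text)
position = (found.start(), found.end())

# re.compile() builds a pattern object once so it can be reused.
ends_in_ain = re.compile(r"\w*ain\b")
ain_words = ends_in_ain.findall(text)

print(no_match)          # None
print(at_start.group())  # The
print(position)          # (4, 8)
print(ain_words)         # ['rain', 'Spain']
```

Compiling is mostly a convenience when the same pattern is used many times; the module-level functions accept the same pattern strings.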

Example of Regular Expressions in Python

import re

# Sample text
text = "The rain in Spain falls mainly on the plain."

# 1. Search for the word "rain" in the text
search_result = re.search(r"rain", text)
if search_result:
    print(f"Found: {search_result.group()}")  # Output: Found: rain

# 2. Find all words that start with "S" or "s"
find_all_result = re.findall(r"\b[Ss]\w+", text)
print(find_all_result)  # Output: ['Spain']

# 3. Replace the word "rain" with "snow"
replace_result = re.sub(r"rain", "snow", text)
print(replace_result)  # Output: The snow in Spain falls mainly on the plain.

# 4. Split the string at each whitespace
split_result = re.split(r"\s", text)
print(split_result)  # Output: ['The', 'rain', 'in', 'Spain', 'falls', 'mainly', 'on', 'the', 'plain.']

Explanation of the Example

re.search(): This searches for the first occurrence of "rain" in the string. If a match is found, the code prints the matched text.
re.findall(): This finds all words that start with "S" or "s" in the text, where \b represents a word boundary and \w+ matches a sequence of one or more word characters.
re.sub(): This replaces every occurrence of the substring "rain" with "snow". The pattern has no word boundaries, so it would also match "rain" inside a longer word.
re.split(): This splits the string at each whitespace character, returning a list of words. The final item keeps its trailing period.
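
Two details of the example are worth a closer look: substring matching in re.sub() and single-character splitting in re.split(). The sketch below uses a different, made-up sentence to show how \b and \s+ change the results.

```python
import re

# Hypothetical sentence with a double space, a tab, and "rain" inside "drains".
text = "The rain  in Spain\tdrains slowly"

# Without boundaries, "rain" is also replaced inside "drains".
loose = re.sub(r"rain", "snow", text)
# With \b on both sides, only the standalone word is replaced.
strict = re.sub(r"\brain\b", "snow", text)

# \s splits on every single whitespace character, so the double space
# produces an empty string in the result.
single = re.split(r"\s", text)
# \s+ treats a run of whitespace as one separator.
runs = re.split(r"\s+", text)

print(repr(loose))   # 'The snow  in Spain\tdsnows slowly'
print(repr(strict))  # 'The snow  in Spain\tdrains slowly'
print(single)        # ['The', 'rain', '', 'in', 'Spain', 'drains', 'slowly']
print(runs)          # ['The', 'rain', 'in', 'Spain', 'drains', 'slowly']
```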

Common Use Cases of Regular Expressions

Validating User Input:

Example: Ensuring an email address has the correct format.

import re

# Simplified format check; it does not cover every valid email address.
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"
email = "example@example.com"
if re.match(pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")
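
One subtlety with validation: $ also matches just before a trailing newline, so re.match() with ^...$ accepts a string that ends in "\n". A possible alternative is re.fullmatch(), sketched below with the same character classes as the pattern above. The helper name looks_like_email is hypothetical.

```python
import re

# No ^ or $ needed: fullmatch() requires the whole string to match.
EMAIL = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def looks_like_email(value):
    """Return True if the entire string fits the simplified email pattern."""
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("example@example.com"))    # True
print(looks_like_email("example@example.com\n"))  # False
print(looks_like_email("not an email"))           # False

# For comparison: match() with a trailing $ accepts the newline-terminated string.
anchored = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"
accepts_newline = bool(re.match(anchored, "example@example.com\n"))
print(accepts_newline)                            # True
```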

Finding Specific Patterns:

Example: Extracting all phone numbers from a document.

import re

text = "Contact me at 123-456-7890 or 987-654-3210"
phone_numbers = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phone_numbers)  # Output: ['123-456-7890', '987-654-3210']
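
When the parts of each number are needed separately, capture groups can be added to the pattern. This changes what re.findall() returns, so re.finditer() is often more convenient. A minimal sketch, with variable names chosen for illustration:

```python
import re

text = "Contact me at 123-456-7890 or 987-654-3210"
phone = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

# With capture groups, findall() returns tuples of the groups, not the full match.
parts = phone.findall(text)
print(parts)  # [('123', '456', '7890'), ('987', '654', '3210')]

# finditer() yields match objects, which expose the full text and its position.
full_matches = [m.group(0) for m in phone.finditer(text)]
spans = [m.span() for m in phone.finditer(text)]
print(full_matches)  # ['123-456-7890', '987-654-3210']
print(spans)         # [(14, 26), (30, 42)]
```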

Replacing Text:

Example: Rewriting all dates from one format (MM/DD/YYYY) to another (DD-MM-YYYY) by reordering captured groups.

import re

text = "Today's date is 12/25/2021."
new_text = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\2-\1-\3", text)
print(new_text)  # Output: Today's date is 25-12-2021.
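
Numbered backreferences such as \1 and \2 become hard to read as patterns grow. One alternative is to name the groups, or to pass a function as the replacement. The sketch below assumes the same input string; the helper name to_iso is hypothetical.

```python
import re

text = "Today's date is 12/25/2021."
date = re.compile(r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})")

# Named groups are referenced as \g<name> in the replacement string.
day_first = date.sub(r"\g<day>-\g<month>-\g<year>", text)
print(day_first)  # Today's date is 25-12-2021.

def to_iso(match):
    """Build a YYYY-MM-DD string from the named groups of one match."""
    return f"{match['year']}-{match['month']}-{match['day']}"

# A function can compute the replacement from each match object.
iso = date.sub(to_iso, text)
print(iso)        # Today's date is 2021-12-25.
```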

Summary

Regular expressions are used for pattern matching and text manipulation.
Python's re module provides functions like search(), match(), findall(), sub(), and split() to work with regular expressions.
Metacharacters such as ., *, +, and [] are used to define search patterns.
Regular expressions are widely used in tasks like data validation, string searching, text extraction, and text manipulation.
