Python Regular Expressions

1. What are Regular Expressions?

Regular Expressions (Regex) are patterns used to match and manipulate text.
They are a powerful tool for searching, extracting, and replacing text based on specific patterns.
Python provides the re module for working with regular expressions.

2. Basic Regex Syntax

1. Literal Characters

Match exact characters in the text.
Example: The regex cat matches the string "cat".

2. Metacharacters

Special characters with specific meanings in regex:
- . : Matches any single character except a newline (unless re.DOTALL is set).
- ^ : Matches the start of the string (with re.MULTILINE, also the start of each line).
- $ : Matches the end of the string, or just before a trailing newline.
- * : Matches 0 or more repetitions of the preceding element (a character, character class, or group).
- + : Matches 1 or more repetitions of the preceding element.
- ? : Matches 0 or 1 repetition of the preceding element.
- {m,n} : Matches between m and n repetitions of the preceding element.
- [] : Matches any single character listed within the brackets, e.g. [abc] or [a-z].
- | : Acts as an OR operator between alternatives.
- () : Groups patterns together and captures the matched text.

Examples:

a.b matches "aab" and "acb", but not "ab" (the dot requires exactly one character between a and b).
^abc matches "abc" at the start of a string.
xyz$ matches "xyz" at the end of a string.
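
The metacharacters above can be tried out with a small script. This is a minimal sketch: the sample patterns and strings are chosen for illustration, and re.search() is used so that only the anchors decide where a match may start.

```python
import re

# Each entry pairs a pattern with sample strings (all chosen for illustration).
checks = [
    (r"a.b", ["aab", "acb", "ab"]),      # . needs exactly one character between a and b
    (r"^abc", ["abcdef", "xabc"]),       # ^ anchors the match at the start
    (r"xyz$", ["wxyz", "xyzw"]),         # $ anchors the match at the end
    (r"colou?r", ["color", "colour"]),   # ? makes the preceding u optional
    (r"(ab)+", ["ababab"]),              # + repeats the whole group
    (r"cat|dog", ["hotdog"]),            # | accepts either alternative
]

results = {}
for pattern, samples in checks:
    for text in samples:
        # re.search returns None when the pattern does not match anywhere.
        results[(pattern, text)] = re.search(pattern, text) is not None
        print(f"{pattern!r:12} {text!r:10} {results[(pattern, text)]}")
```

Only "ab", "xabc", and "xyzw" should report False here; every other pair matches.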

3. Special Sequences

\d : Matches any digit (0-9).
\D : Matches any non-digit.
\w : Matches any word character (a-z, A-Z, 0-9, _; for str patterns also Unicode letters and digits, unless re.ASCII is set).
\W : Matches any non-word character.
\s : Matches any whitespace character (space, tab, newline).
\S : Matches any non-whitespace character.
\b : Matches a word boundary.
\B : Matches a non-word boundary.

Examples:

\d{3} matches any 3 digits (e.g., "123").
\w+ matches one or more word characters (e.g., "hello").
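
The special sequences can be checked the same way. The sketch below uses a made-up sample sentence; note how \b treats the underscore in "unit_7" as a word character, so "unit" alone is not a whole word there.

```python
import re

# Sample text invented for this illustration.
text = "Order 66 shipped to unit_7 on 2023-12-31."

digits = re.findall(r"\d+", text)            # runs of digits
words = re.findall(r"\w+", text)             # runs of word characters (underscore included)
whole_word = re.search(r"\bunit\b", text)    # None: "_" after "unit" is a word character
with_suffix = re.search(r"\bunit_7\b", text) # matches: a space follows "unit_7"

# For str patterns, \w also matches non-ASCII letters unless re.ASCII is passed.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(digits)   # ['66', '7', '2023', '12', '31']
print(words)
print(whole_word, with_suffix.group())
print(unicode_words, ascii_words)
```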

3. Using the `re` Module

1. `re.match()`

Checks if the regex matches at the beginning of the string.
Returns a match object if found, otherwise None.

Example:

import re
result = re.match(r"hello", "hello world")
print(result.group())  # Output: hello

2. `re.search()`

Searches the entire string for a match.
Returns a match object if found, otherwise None.

Example:

import re
result = re.search(r"world", "hello world")
print(result.group())  # Output: world
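
Both re.match() and re.search() return None when nothing matches, so calling .group() on the result without a check raises AttributeError. The sketch below contrasts the two functions on the same string and guards against None; the loop and variable names are only for illustration.

```python
import re

text = "hello world"

at_start = re.match(r"world", text)    # None: "world" is not at the beginning
anywhere = re.search(r"world", text)   # match object: "world" starts at index 6

for name, m in [("match", at_start), ("search", anywhere)]:
    if m is None:
        print(f"{name}: no match")
    else:
        print(f"{name}: {m.group()!r} at {m.span()}")
```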

3. `re.findall()`

Returns all non-overlapping matches of the regex in the string as a list.

Example:

import re
result = re.findall(r"\d+", "There are 3 apples and 5 oranges.")
print(result)  # Output: ['3', '5']
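
One detail worth knowing: when the pattern contains groups, re.findall() returns the group contents instead of the whole match. A short sketch, reusing the sentence from the example above:

```python
import re

text = "There are 3 apples and 5 oranges."

# No groups: the list holds the full matches.
whole = re.findall(r"\d+ \w+", text)

# One group: the list holds only that group's text.
counts = re.findall(r"(\d+) \w+", text)

# Two or more groups: the list holds one tuple per match.
pairs = re.findall(r"(\d+) (\w+)", text)

print(whole)   # ['3 apples', '5 oranges']
print(counts)  # ['3', '5']
print(pairs)   # [('3', 'apples'), ('5', 'oranges')]
```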

4. `re.finditer()`

Returns an iterator yielding a match object for each non-overlapping match.

Example:

import re
matches = re.finditer(r"\d+", "There are 3 apples and 5 oranges.")
for match in matches:
    print(match.group())  # Prints 3, then 5 (one per line)
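
Because re.finditer() yields match objects rather than strings, it is the natural choice when the position of each match is needed. A minimal sketch on the same sentence:

```python
import re

text = "There are 3 apples and 5 oranges."

# Collect the matched text together with its start and end index.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

for value, start, end in positions:
    print(f"{value!r} found at text[{start}:{end}]")
```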

5. `re.sub()`

Replaces all occurrences of the regex pattern in the string with a replacement string.

Example:

import re
result = re.sub(r"\d+", "X", "There are 3 apples and 5 oranges.")
print(result)  # Output: There are X apples and X oranges.
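
re.sub() also accepts a count limit, backreferences in the replacement string, and a function as the replacement. The following sketch shows all three on the same sentence; the variable names are illustrative.

```python
import re

text = "There are 3 apples and 5 oranges."

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d+", "X", text, count=1)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d+) (\w+)", r"\2: \1", text)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

print(first_only)  # There are X apples and 5 oranges.
print(swapped)     # There are apples: 3 and oranges: 5.
print(doubled)     # There are 6 apples and 10 oranges.
```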

6. `re.split()`

Splits the string at each occurrence of the regex pattern and returns the pieces as a list.

Example:

import re
result = re.split(r"\s+", "Split this sentence.")
print(result)  # Output: ['Split', 'this', 'sentence.']
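
A few variations on re.split() that plain str.split() cannot do in one call: several delimiters at once, a split limit, and keeping the delimiters. The input strings are made up for this sketch.

```python
import re

# Split on a comma or semicolon, swallowing any whitespace after it.
parts = re.split(r"[,;]\s*", "a, b;c,  d")

# maxsplit limits the number of splits; the rest stays in the last item.
limited = re.split(r"\s+", "Split this sentence.", maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r"(,)", "a,b,c")

print(parts)      # ['a', 'b', 'c', 'd']
print(limited)    # ['Split', 'this sentence.']
print(with_seps)  # ['a', ',', 'b', ',', 'c']
```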

4. Regex Groups

Use parentheses () to create groups in a regex.
Groups allow you to extract specific parts of a match.

Example:

import re
result = re.search(r"(\d{2})-(\d{2})-(\d{4})", "Date: 12-31-2023")
print(result.group(1))  # Output: 12 (month)
print(result.group(2))  # Output: 31 (day)
print(result.group(3))  # Output: 2023 (year)
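
Besides group(n), a match object offers group(0) for the whole match and groups() for all captured groups at once. The sketch below reads the sample date as month-day-year, which is an assumption (31 cannot be a month).

```python
import re

m = re.search(r"(\d{2})-(\d{2})-(\d{4})", "Date: 12-31-2023")

full = m.group(0)   # the entire matched text
parts = m.groups()  # tuple of all captured groups

# Unpack and convert in one step (month-day-year order is assumed here).
month, day, year = (int(p) for p in parts)

print(full)              # 12-31-2023
print(parts)             # ('12', '31', '2023')
print(year, month, day)  # 2023 12 31
```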

5. Named Groups

Assign names to groups using (?P<name>...) syntax.

Example:

import re
result = re.search(r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})", "Date: 12-31-2023")
print(result.group("month")) # Output: 12
print(result.group("day"))   # Output: 31
print(result.group("year"))  # Output: 2023
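
Named groups can also be collected into a dictionary with groupdict() and referenced in a replacement string as \g<name>. A minimal sketch, assuming the sample date is in month-day-year order; date_re is a name introduced only for this example.

```python
import re

# Month-day-year order is an assumption about the sample text.
date_re = re.compile(r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})")

m = date_re.search("Date: 12-31-2023")
fields = m.groupdict()  # maps each group name to its matched text

# \g<name> inserts the text of a named group into the replacement.
iso = date_re.sub(r"\g<year>-\g<month>-\g<day>", "Date: 12-31-2023")

print(fields)  # {'month': '12', 'day': '31', 'year': '2023'}
print(iso)     # Date: 2023-12-31
```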

6. Additional Examples

Matching Names:

import re
names = ["Raj", "Ram", "Anand", "Bala", "Karthik"]
pattern = r"^R\w+"  # Names starting with 'R'
matches = [name for name in names if re.match(pattern, name)]
print(matches)  # Output: ['Raj', 'Ram']

Extracting Phone Numbers:

import re
text = "Contact Raj at 123-456-7890 or Bala at 987-654-3210."
phone_numbers = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phone_numbers)  # Output: ['123-456-7890', '987-654-3210']

Replacing Text:

import re
text = "Hello Raj, how are you Raj?"
new_text = re.sub(r"Raj", "Ram", text)
print(new_text)  # Output: Hello Ram, how are you Ram?

Splitting Text:

import re
text = "Karthik,Suresh,Sathish"
names = re.split(r",", text)
print(names)  # Output: ['Karthik', 'Suresh', 'Sathish']

7. Best Practices

Use raw strings (r"...") for regex patterns so that backslashes reach the regex engine unchanged and do not need to be doubled.
Test regex patterns using tools like regex101.com.
Use comments and verbose mode (re.VERBOSE) for complex regex patterns.

Example:

import re
pattern = re.compile(r"""
    \b       # Word boundary
    \d{3}    # 3 digits
    -        # Hyphen
    \d{3}    # 3 digits
    -        # Hyphen
    \d{4}    # 4 digits
    \b       # Word boundary
""", re.VERBOSE)
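
The following sketch puts two of these practices to work. It first shows why raw strings matter ("\b" in a normal string is a backspace character, not a word boundary), then uses a verbose phone-number pattern like the one above on the sample text from the earlier example. The name phone_re is introduced here for illustration.

```python
import re

# Without the r prefix, "\b" is the backspace character, so nothing matches.
not_raw = re.search("\bcat\b", "a cat")
raw = re.search(r"\bcat\b", "a cat")
print(not_raw, raw.group())

# In verbose mode, whitespace and # comments inside the pattern are ignored.
phone_re = re.compile(r"""
    \b \d{3}   # 3 digits at a word boundary
    -          # hyphen
    \d{3}      # 3 digits
    -          # hyphen
    \d{4} \b   # 4 digits, then a word boundary
""", re.VERBOSE)

text = "Contact Raj at 123-456-7890 or Bala at 987-654-3210."
numbers = phone_re.findall(text)
print(numbers)  # ['123-456-7890', '987-654-3210']
```

Compiling the pattern once with re.compile() is convenient when the same regex is reused, since the compiled object exposes the same methods (search, findall, sub, and so on).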
