Python RegEx
RegEx is short for regular expression. It is a sequence of characters that defines a search pattern. RegEx is used to find the specified pattern in a string.
RegEx Module in Python
In Python, RegEx is available as a built-in module called re. Import the re module to use RegEx in your code.
Example
import re
Use RegEx in Python
After importing the re module, you can use its functions to find a pattern in a string.
Example
Here is an example of searching for the phrase "code is" in a string:
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
match = re.search(r'\bcode is\b', text)
if match:
    print("Found")
else:
    print("Not Found")
Output
Found
Example
The following example extracts the phrase "code is 9091" from the string:
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
result = re.search(r'\bcode is \d+\b',text).group(0)
print(result)
Output
code is 9091
RegEx Functions
The re module provides several functions to find the specified pattern in a string.
The compile() Function
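The compile(pattern, flags=0) function compiles a regular expression and returns a Pattern object. The Pattern object has its own methods, such as search() and findall(), so the same expression can be reused without being compiled again.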
Example
import re
text = "The code is 9091. The code is 9091."
pattern = "9091"
pattern_obj = re.compile(pattern)
print(pattern_obj.findall(text))
Output
['9091', '9091']
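A compiled pattern is most useful when the same expression is applied many times. The following is a minimal sketch, assuming a hypothetical list of strings named lines; the variable names and sample data are illustrative only. It also passes the re.IGNORECASE flag to compile().

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["The code is 9091.", "No digits here.", "CODE IS 42."]

# Compile once, with a flag, then reuse the Pattern object.
code_pattern = re.compile(r"code is (\d+)", re.IGNORECASE)

codes = []
for line in lines:
    match = code_pattern.search(line)
    if match:
        # group(1) is the text captured by the parentheses.
        codes.append(match.group(1))

print(codes)  # ['9091', '42']
```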
The findall() Function
The findall(pattern, string, flags=0) function returns a list of all non-overlapping matches of pattern in string.
Example
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\d+"
result = re.findall(pattern,text)
print(result)
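The shape of the list returned by findall() depends on whether the pattern contains groups. The sketch below, which uses a made-up sample string, shows both cases: without groups each item is the whole match, and with two groups each item is a tuple.

```python
import re

# Made-up sample text for illustration.
text = "mango=1, apple=22, pear=333"

# No groups: each list item is the whole match.
numbers = re.findall(r"\d+", text)

# Two groups: each list item is a (name, value) tuple.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(numbers)  # ['1', '22', '333']
print(pairs)    # [('mango', '1'), ('apple', '22'), ('pear', '333')]
```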
The search() Function
The search(pattern, string, flags=0) function scans a string for the first location where the pattern matches. It returns a Match object if a match is found, or None if no match is found.
Example
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\d+"
result = re.search(pattern,text)
print(result)
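Printing a Match object shows only its representation. In practice the matched text and its position are usually read through the object's methods. A minimal sketch, reusing the sample text from above and checking for None before using the result:

```python
import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."

match = re.search(r"\d+", text)
if match is not None:
    first_number = match.group()   # the matched text
    start, end = match.span()      # indexes of the match within text
    print(first_number, start, end)
else:
    # search() returns None when nothing matches.
    print("No match")
```

Only the first match is reported; later numbers in the string are ignored by search().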
The split() Function
The split(pattern, string, maxsplit=0, flags=0) function splits the source string at each match of the pattern and returns the resulting list of substrings.
Example
import re
text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"
result = re.split(pattern,text)
print(result)
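Two variations on split() are worth a short sketch: the maxsplit argument limits the number of splits, and a capturing group in the pattern keeps the separators in the result. The variable names below are illustrative.

```python
import re

text = "mango1apple2grapes3orange4pear"

# maxsplit=2 stops after two splits; the rest is left intact.
limited = re.split(r"\d+", text, maxsplit=2)

# A capturing group keeps the separators in the result.
with_separators = re.split(r"(\d+)", text, maxsplit=2)

print(limited)          # ['mango', 'apple', 'grapes3orange4pear']
print(with_separators)  # ['mango', '1', 'apple', '2', 'grapes3orange4pear']
```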
The sub() Function
The sub(pattern, repl, string, count=0, flags=0) function replaces the leftmost non-overlapping occurrences of the pattern in string with repl and returns the resulting string. If count is greater than 0, at most count occurrences are replaced. The unchanged string is returned if the pattern is not found.
Example
import re
text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"
repl = " "
result = re.sub(pattern,repl,text)
print(result)
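The count argument and group references in the replacement string can be sketched as follows. This reuses the sample text from above; the variable names are illustrative.

```python
import re

text = "mango1apple2grapes3orange4pear"

# count=2 replaces only the first two matches.
first_two = re.sub(r"\d+", " ", text, count=2)

# \1 in the replacement refers to the first captured group.
bracketed = re.sub(r"(\d+)", r"[\1]", text)

print(first_two)  # mango apple grapes3orange4pear
print(bracketed)  # mango[1]apple[2]grapes[3]orange[4]pear
```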
RegEx Metacharacters
Metacharacters are characters with a special meaning. The following is a list of RegEx metacharacters in Python:
Character | Description | Example |
. | This (dot) matches any character except a new line. | "sec..t" |
^ | This (caret) matches the start of the string. | "^complete" |
$ | This matches the end of the string. | "secret$" |
[ ] | This matches a set of characters. | "[A-Za-z]" |
+ | This matches 1 or more occurrences of the preceding pattern. | "[A-Za-z]+" |
* | This matches 0 or more occurrences of the preceding pattern. | "[A-Za-z]*" |
{ } | This matches a specified number of occurrences of the preceding pattern: {n} for exactly n, or {m,n} for between m and n. | "[a-z]{3,10}" |
| | This matches either the pattern before it or the pattern after it (alternation). | "a|b" |
(re) | This matches a pattern as a group. | "(hello)" |
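The metacharacters in the table can be tried out with a short sketch. The sample strings below are made up for illustration; the patterns are the ones from the Example column.

```python
import re

# "." matches any single character except a newline.
dot_match = re.search(r"sec..t", "a secret code").group()

# "^" and "$" anchor the match to the start and end of the string.
starts = bool(re.search(r"^complete", "complete the task"))
ends = bool(re.search(r"secret$", "top secret"))

# "[ ]" with "+" matches one or more characters from the set.
words = re.findall(r"[A-Za-z]+", "abc 123 Def")

# "{3,10}" accepts between 3 and 10 repetitions.
runs = re.findall(r"[a-z]{3,10}", "ab abc abcd")

# "|" matches either alternative; "( )" captures a group.
either = re.findall(r"a|b", "cab")
group = re.search(r"(hello) world", "hello world").group(1)

print(dot_match, starts, ends)  # secret True True
print(words, runs)              # ['abc', 'Def'] ['abc', 'abcd']
print(either, group)            # ['a', 'b'] hello
```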
RegEx Sequence Characters
RegEx uses the \ character in two ways: to escape a metacharacter so that it is matched literally, and to introduce a special sequence such as \d or \b. The following is a list of RegEx special sequences in Python:
Character | Description | Example |
\A | Matches the specified pattern only at the beginning of the string. | "\Ahello" |
\b | Matches the specified pattern at the beginning or at the end of a word. Write the pattern as a raw string (prefix r), because in a normal string "\b" is the backspace character. | r"\bhello", r"hello\b" |
\B | Matches the specified pattern where it is not at the beginning or at the end of a word. A raw string (prefix r) is recommended here as well. | r"\Bhello", r"hello\B" |
\d | Matches any decimal digit. | "\d" |
\D | Matches any character that is not a digit. | "\D" |
\s | Matches any whitespace character. | "\s" |
\S | Matches any character that is not whitespace. | "\S" |
\w | Matches any word character (a letter, a digit, or an underscore). | "\w" |
\W | Matches any character that is not a word character. | "\W" |
\Z | Matches the specified pattern only at the end of the string. | "buddy\Z" |
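The special sequences above can be checked with a short sketch. The sample string is made up for illustration, and every pattern is written as a raw string so that the backslashes reach the re module unchanged.

```python
import re

# Made-up sample text for illustration.
text = "hello world_1 42"

starts_with_hello = bool(re.search(r"\Ahello", text))  # True
ends_with_42 = bool(re.search(r"42\Z", text))          # True
whole_words = re.findall(r"\b\w+\b", text)             # ['hello', 'world_1', '42']
digits = re.findall(r"\d", text)                       # ['1', '4', '2']
non_space_runs = re.findall(r"\S+", text)              # ['hello', 'world_1', '42']
inside_word = bool(re.search(r"\Borl\B", text))        # True: "orl" sits inside "world_1"

# Without the r prefix, "\b" is a single backspace character.
plain_length = len("\b")   # 1
raw_length = len(r"\b")    # 2

print(whole_words, digits, plain_length, raw_length)
```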