Auto-generating regular expressions from a wordlist with Python

cr0hn
1 min read · Aug 12, 2022

Making regular expressions (REGEX) is sometimes an arduous task.

If we need to create a REGEX that matches a set of words, we need to analyse them and include all the necessary conditions to match them.

This is very common in web apps, filters, and any other software that has to check whether a word belongs to a given set of words.

Let’s say you want to detect foul language. You can write something like:

import re

bad_words_regex = re.compile(r'''([bB][aA][sS][tT][aA][rR][dD]|[fF][uU][cC][kK])''')
word = input("Enter your word")

if bad_words_regex.match(word) is not None:
    print("Moderate your language!")
else:
    print("Good boy")

Writing a REGEX by hand for a small wordlist is OK, but it is not manageable for a large set of words.
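One way to stop writing the pattern by hand is to build it from the list itself. The following is a minimal sketch, assuming a few placeholder words and a hypothetical helper named `build_flat_regex`; it also uses `re.IGNORECASE` instead of spelling out `[bB]`-style character classes:

```python
import re


def build_flat_regex(words):
    """Join escaped words into one case-insensitive alternation (hypothetical helper)."""
    # Deduplicate and sort so the generated pattern is deterministic
    alternatives = sorted(set(words))
    # re.escape protects any regex metacharacters inside the words
    pattern = "|".join(re.escape(w) for w in alternatives)
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)


# Placeholder wordlist, used only for illustration
bad_words_regex = build_flat_regex(["darn", "heck", "drat"])

print(bad_words_regex.pattern)
print(bad_words_regex.match("HECK") is not None)
```

This produces a flat alternation: every word is listed in full, so the pattern grows with each entry and repeats any prefixes the words share.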

Trieregex is a project that stores a set of words in a trie and automatically generates a REGEX matching them.

import re

from trieregex import TrieRegEx as TRE

words = ['bastard', 'fuck', 'loser']

# Add word(s)
tre = TRE(*words)  # word(s) can be added at instantiation

# Create regex pattern from the trie
regex_pattern = tre.regex()

# Add boundary context and compile for matching
bad_words_regex = re.compile(f'\\b{regex_pattern}\\b')

word = input("Enter your word")

if bad_words_regex.match(word) is not None:
    print("Moderate your language!")
else:
    print("Good boy")
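To see why a trie helps, here is a minimal pure-Python sketch of the idea. Words are stored in a nested-dict trie so that shared prefixes appear only once, and the trie is then walked to emit a pattern. The helpers `build_trie` and `trie_to_regex`, the `END` marker, and the sample words are illustrative assumptions; this is not necessarily how Trieregex is implemented internally.

```python
import re

END = ""  # marker key meaning "a word ends at this node"


def build_trie(words):
    """Store words in a nested-dict trie; shared prefixes share nodes."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    optional = END in node  # a word may stop here, so the rest is optional
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != END)
    ]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not optional:
        return branches[0]  # single path: no group needed
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if optional else group


sample_words = ["grape", "grapefruit", "lemon", "lime"]
regex_pattern = trie_to_regex(build_trie(sample_words))
print(regex_pattern)  # (?:grape(?:fruit)?|l(?:emon|ime))

pattern = re.compile(rf"\b{regex_pattern}\b")
print(all(pattern.fullmatch(w) for w in sample_words))
```

Compared with listing every word in full, the prefixes `grape` and `l` are written once, which keeps the pattern shorter as the wordlist grows.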

