Auto-generating regular expressions from a wordlist with Python

Making regular expressions (REGEX) is sometimes an arduous task.

If we need to create a REGEX that matches a set of words, we need to analyse them and include all the necessary conditions to match them.

This is very common in web apps, filters, or anything else that has to check whether an input matches one of a set of allowed words.

Let’s say you want to detect bad language. You could write something like:

import re

bad_words_regex = re.compile(r'''([bB][aA][sS][tT][aA][rR][dD]|[fF][uU][cC][kK])''')
word = input("Enter your word")

if bad_words_regex.match(word) is not None:
    print("Moderate your language!")
else:
    print("Good boy")

Writing a REGEX by hand for a small wordlist is OK, but it is not manageable for a large set of words.
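A naive way to automate this can be sketched with the standard library alone: escape every word, join the words with `|`, and let `re.IGNORECASE` replace the hand-written `[bB]`-style character classes. The helper name `build_flat_regex` and the sample wordlist are assumptions made for illustration; they are not part of any library.

```python
import re


def build_flat_regex(words):
    """Build one case-insensitive pattern that matches any word in `words`."""
    # Deduplicate and sort so the generated pattern is deterministic.
    ordered = sorted(set(words))
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax.
    alternation = "|".join(re.escape(word) for word in ordered)
    # \b on both sides makes the pattern match whole words only.
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


bad_words_regex = build_flat_regex(["bastard", "fuck"])

for candidate in ("Bastard", "hello", "bastardly"):
    matched = bad_words_regex.match(candidate) is not None
    print(candidate, "->", "blocked" if matched else "allowed")
```

This works, but the pattern is a flat alternation that grows with every word added, and a backtracking engine such as Python's `re` may have to try the alternatives one after another. Words that share prefixes are not merged, which is where a trie helps.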

trieregex is a project that uses a trie to automatically generate complex REGEX from a set of words.

import re

from trieregex import TrieRegEx as TRE

words = ['bastard', 'fuck', 'losser']

# Add word(s)
tre = TRE(*words)  # word(s) can be added when the instance is created

# Create regex pattern from the trie
regex_pattern = tre.regex()

# Add boundary context and compile for matching
bad_words_regex = re.compile(f'\\b{regex_pattern}\\b')

word = input("Enter your word")

if bad_words_regex.match(word) is not None:
    print("Moderate your language!")
else:
    print("Good boy")
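To give an intuition for how a trie can produce a compact pattern, here is a minimal sketch of the idea using only the standard library. It is not trieregex's actual implementation: the functions `build_trie` and `trie_to_regex` and the sample words are hypothetical, chosen to show how shared prefixes are merged.

```python
import re


def build_trie(words):
    """Store words in a nested-dict trie; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    is_end = "" in node  # a word finishes at this node
    branches = [
        re.escape(char) + trie_to_regex(child)
        for char, child in sorted(node.items())
        if char != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word also ends here, everything that follows is optional.
    return body + "?" if is_end else body


words = ["bad", "bat", "bats", "ball"]
pattern = trie_to_regex(build_trie(words))
print(pattern)

compiled = re.compile(rf"\b{pattern}\b")
print(all(compiled.fullmatch(word) for word in words))
```

The shared prefix `ba` appears only once in the generated pattern, so the engine does not re-check it for every word. A real library has to handle more cases than this sketch does, such as character classes and special characters in boundaries.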


REST API Cybersecurity and Hacking & Python Architect. +100 GitHub projects. Speaker

cr0hn