Database Regex Performance

Mastering Regular Expressions Comparing regex performance between MongoDB, MySQL and PostgreSQL

MongoDB has a very rich regular expression (regex) interface. Regex is available in many of the places that ordinary character strings are accepted. While convenient to use, I had some questions about how MongoDB regex performance stacks up to regex performance in other (SQL) Databases.

To run the examples in this post, clone the github repository at https://github.com/dwilkins/mongoregexp and follow the instructions in the README.md file

Simplicity

MongoDB regexes are simple to use. Instead of using double quotes, use forward slashes to delimit strings to match:

db.loglines.find( { logline: "66.249.65.20" ,$comment: "No Regex"}).count();
db.loglines.find( { logline: /^66\.249\.65\.20/ ,$comment: "Using Regex"}).count();

The difference is, double quotes require a full, exact string match. Forward slash (regex) will match rows where the string matches the pattern given. If no special regex characters are given columns that contain the search string anywhere in the column are considered positive matches.

The first query will only find rows that exactly match the IP Address of "66.249.65.20" The second query will find rows that match that IP address at the beginning of the line and with any (or none) characters following. The "^" character is an anchor which means that it helps the regex engine restrict the search to the beginning of a string.

MySQL and PostgreSQL regexes are just as simple to use:

MySQL

select count(1) into @rows from loglines where logline regexp '^66\.249\.65\.20';

PostgreSQL

select 'bolanchor' as Query,count(1) from loglines where logline ~ '^66\.249\.65\.20';

Notice that MySQL uses regexp instead of = . PostgreSQL uses the ~ (tilde) character instead of =

Tested Queries

I tested three different queries for performance on MongoDB, MySQL and PostgreSQL:

Regex query strings used to compart MongoDB, MySQL and PostgreSQL
Query	MongoDB Pattern	MySQL/PostgreSQL Pattern
bolanchor	`/^66\.249\.65\.20/`	`^66\.249\.65\.20`
msiecount	`/MSIE [0-9]{1,}[\.0-9]{0,}/i`	`MSIE [0-9]{1,}[\.0-9]{0,}`
eolanchor	`/baidu\.com\/search\/spider\.html\)"$/`	`baidu\.com\/search\/spider\.html\)"$`

bolanchor - Regex Query with a beginning of line anchor - only matches at the beginning of the line
msiecount - Regex Query to select lines with matching characters in the middle of the line
bolanchor - Regex Query with a end of line anchor - only matches at the end of the line

Results:

Millisecond timings of regex queries performed in MongoDB, MySQL and PostgreSQL
Query	Rows	MongoDB ms	PostgreSQL ms	MySQL ms	Winner
bolanchor	128	295ms	2768ms	7769ms	MongoDB
msiecount	371217	8512ms	5278ms	48027ms	PostgreSQL ¹
eolanchor	566760	3190ms	15892ms	61951ms	MongoDB

So, MongoDB wins handily in two cases. PostgreSQL makes a mostly respectable showing, but MySQL (MariaDB in this test) brings up a distant third place.

Database Regex Performance

Simplicity

Tested Queries

Results:

Comments

recent posts

archives

twitter

found on