Database Regex Performance

- - posted in Code Snippets, Database, MariaDB, MongoDB, MySQL, PostgreSQL

Mastering Regular Expressions Comparing regex performance between MongoDB, MySQL and PostgreSQL

MongoDB has a very rich regular expression (regex) interface. Regex is available in many of the places that ordinary character strings are accepted. While convenient to use, I had some questions about how MongoDB regex performance stacks up to regex performance in other (SQL) Databases.

To run the examples in this post, clone the github repository at https://github.com/dwilkins/mongoregexp and follow the instructions in the README.md file

Simplicity

MongoDB regexes are simple to use. Instead of using double quotes, use forward slashes to delimit strings to match:

1
2
db.loglines.find( { logline: "66.249.65.20" ,$comment: "No Regex"}).count();
db.loglines.find( { logline: /^66\.249\.65\.20/ ,$comment: "Using Regex"}).count();

The difference is, double quotes require a full, exact string match. Forward slash (regex) will match rows where the string matches the pattern given. If no special regex characters are given columns that contain the search string anywhere in the column are considered positive matches.

The first query will only find rows that exactly match the IP Address of "66.249.65.20" The second query will find rows that match that IP address at the beginning of the line and with any (or none) characters following. The "^" character is an anchor which means that it helps the regex engine restrict the search to the beginning of a string.

MySQL and PostgreSQL regexes are just as simple to use:

MySQL
1
select count(1) into @rows from loglines where logline regexp '^66\.249\.65\.20';
PostgreSQL
1
select 'bolanchor' as Query,count(1) from loglines where logline ~ '^66\.249\.65\.20';

Notice that MySQL uses regexp instead of = . PostgreSQL uses the ~ (tilde) character instead of =

Tested Queries

I tested three different queries for performance on MongoDB, MySQL and PostgreSQL:

Regex query strings used to compart MongoDB, MySQL and PostgreSQL
Query MongoDB Pattern MySQL/PostgreSQL Pattern
bolanchor /^66\.249\.65\.20/ ^66\.249\.65\.20
msiecount /MSIE [0-9]{1,}[\.0-9]{0,}/i MSIE [0-9]{1,}[\.0-9]{0,}
eolanchor /baidu\.com\/search\/spider\.html\)"$/ baidu\.com\/search\/spider\.html\)"$
  • bolanchor - Regex Query with a beginning of line anchor - only matches at the beginning of the line
  • msiecount - Regex Query to select lines with matching characters in the middle of the line
  • bolanchor - Regex Query with a end of line anchor - only matches at the end of the line

Results:

Millisecond timings of regex queries performed in MongoDB, MySQL and PostgreSQL
Query Rows MongoDB ms PostgreSQL ms MySQL ms Winner
bolanchor 128 295ms 2768ms 7769ms MongoDB
msiecount 371217 8512ms 5278ms 48027ms PostgreSQL 1
eolanchor 566760 3190ms 15892ms 61951ms MongoDB
    1 MongoDB wins on the second execution, once it caches some of the data in memory. MongoDB doesn’t have a query cache like MySQL, so you can’t get instant results for the exact same query.

So, MongoDB wins handily in two cases. PostgreSQL makes a mostly respectable showing, but MySQL (MariaDB in this test) brings up a distant third place.

Comments