Comparing regex performance between MongoDB, MySQL and PostgreSQL
MongoDB has a very rich regular expression (regex) interface. Regex is available in many of the places that ordinary character strings are accepted. While convenient to use, I had some questions about how MongoDB regex performance stacks up to regex performance in other (SQL) Databases.
To run the examples in this post, clone the github repository at https://github.com/dwilkins/mongoregexp and follow the instructions in the README.md file
Simplicity
MongoDB regexes are simple to use. Instead of using double quotes, use forward slashes to delimit strings to match:
1 2 |
|
The difference is, double quotes require a full, exact string match. Forward slash (regex) will match rows where the string matches the pattern given. If no special regex characters are given columns that contain the search string anywhere in the column are considered positive matches.
The first query will only find rows that exactly match the IP Address of
"66.249.65.20"
The second query will find rows that match that IP address at the beginning of the line
and with any (or none) characters following. The
"^"
character is an
anchor
which means that it helps the regex engine restrict the search to the beginning of a string.
MySQL and PostgreSQL regexes are just as simple to use:
MySQL1
|
|
1
|
|
Notice that MySQL uses
regexp
instead of
=
. PostgreSQL uses the
~
(tilde) character instead of
=
Tested Queries
I tested three different queries for performance on MongoDB, MySQL and PostgreSQL:
Query | MongoDB Pattern | MySQL/PostgreSQL Pattern |
---|---|---|
bolanchor | /^66\.249\.65\.20/ |
^66\.249\.65\.20 |
msiecount |
/MSIE [0-9]{1,}[\.0-9]{0,}/i
|
MSIE [0-9]{1,}[\.0-9]{0,}
|
eolanchor |
/baidu\.com\/search\/spider\.html\)"$/
|
baidu\.com\/search\/spider\.html\)"$
|
- bolanchor - Regex Query with a beginning of line anchor - only matches at the beginning of the line
- msiecount - Regex Query to select lines with matching characters in the middle of the line
- bolanchor - Regex Query with a end of line anchor - only matches at the end of the line
Results:
Query | Rows | MongoDB ms | PostgreSQL ms | MySQL ms | Winner |
---|---|---|---|---|---|
bolanchor | 128 | 295ms | 2768ms | 7769ms | MongoDB |
msiecount | 371217 | 8512ms | 5278ms | 48027ms | PostgreSQL 1 |
eolanchor | 566760 | 3190ms | 15892ms | 61951ms | MongoDB |
-
1
MongoDB wins on the second execution, once it caches some of the data in memory.
MongoDB doesn’t have a query cache like MySQL, so you can’t get instant results for
the exact same query.
So, MongoDB wins handily in two cases. PostgreSQL makes a mostly respectable showing, but MySQL (MariaDB in this test) brings up a distant third place.