How to write MySQL REGEXP? - sql

A table contains the string "Hello world!"
Thinking of * as the ordinary wildcard character, how can I write a REGEXP that evaluates to true for 'W*rld!' but false for 'H*rld!', since H is part of another word? 'W*rld' should evaluate to false as well because of the trailing '!'.

Use:
WHERE column REGEXP 'W[[:alnum:]]+rld!'
Alternately, you can use:
WHERE column RLIKE 'W[[:alnum:]]+rld!'
RLIKE is a synonym for REGEXP.
[[:alnum:]] matches any single alphanumeric character; [[:alnum:]]+ matches one or more of them.
REGEXP / RLIKE is not case sensitive, except when used with binary strings.
Reference: MySQL Regex Support
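To sanity-check the pattern outside the database, here is a minimal Python sketch. It is not MySQL: Python's re module has no [[:alnum:]] class, so the sketch substitutes [A-Za-z0-9], and re.IGNORECASE stands in for MySQL's case-insensitive matching on non-binary strings. The helper name matches is hypothetical.

```python
import re

ALNUM = "[A-Za-z0-9]"  # assumed stand-in for the POSIX class [[:alnum:]]


def matches(pattern: str, text: str) -> bool:
    """Unanchored, case-insensitive search, roughly like REGEXP on a non-binary string."""
    return re.search(pattern, text, re.IGNORECASE) is not None


text = "Hello world!"

# 'w' followed by 'o' and then 'rld!' is found inside the text.
print(matches(f"W{ALNUM}+rld!", text))

# After 'H', the run of alphanumerics stops at the space, so 'rld!' never follows.
print(matches(f"H{ALNUM}+rld!", text))
```

The second call fails for the reason the question asks for: the space between the two words is not alphanumeric, so the pattern cannot span from H to rld!.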

If you are just looking to match the word world, then do this:
SELECT * FROM `table` WHERE `field_name` LIKE "%w_rld!";
The _ is a wildcard for exactly one character. LIKE has to match the whole value, so the leading % is needed to skip over "Hello ".
Edit: I realize the OP requested a solution with REGEXP, but since the same result can be achieved without regular expressions, I provided this as a viable alternative that should perform faster than a REGEXP.
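A quick way to see how LIKE treats _ and % is to run it in an in-memory SQLite database. This is only a sketch: SQLite stands in for MySQL here, on the assumption that both treat _ as exactly one character, treat % as any run of characters, and require the pattern to cover the whole value. The helper name like is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(value: str, pattern: str) -> bool:
    """Evaluate `value LIKE pattern` in SQLite and return the result as a bool."""
    row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(row[0])


# Without a leading %, the pattern must cover the whole value.
print(like("Hello world!", "w_rld!"))

# The leading % absorbs 'Hello ', and _ stands for the single 'o'.
print(like("Hello world!", "%w_rld!"))

# When the column holds only the word itself, no % is needed.
print(like("world!", "w_rld!"))
```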

You can use regular expressions in MySQL:
SELECT 'Hello world!' REGEXP 'H[[:alnum:]]+rld!'
0
SELECT 'Hello world!' REGEXP 'w[[:alnum:]]+rld!'
1
More information about the syntax can be found here.

Related

return strings starting with any of the characters in snowflake

I am trying to get all strings starting with any of a set of characters, but it doesn't work in Snowflake. Is there any way to do this?
select * from table where name LIKE '[A-E]%';
You can use rlike instead of like and specify a regular expression:
select * from snowflake_sample_data.tpch_sf1.nation where n_name rlike '[A-E].*';
Use the regular expression wildcard .* after the letters rather than the LIKE wildcard %.
For case insensitive searches, you have some options:
-- Second syntax of rlike specifying the 'i' parameter for case insensitivity.
select * from snowflake_sample_data.tpch_sf1.nation where rlike (n_name, '[a-e].*', 'i');
-- First syntax option changing the regular expression to find either upper or lower case.
select * from snowflake_sample_data.tpch_sf1.nation where n_name rlike '[A-Ea-e].*';
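The patterns above work without a leading ^ because, as far as I understand Snowflake's documentation, RLIKE implicitly anchors the pattern at both ends. The Python sketch below mimics that with re.fullmatch. The sample values and the helper name rlike are made up for illustration and are not taken from the Snowflake sample data.

```python
import re

# Hypothetical sample values for illustration only.
names = ["ALGERIA", "brazil", "EGYPT", "FRANCE", "xCANADA"]


def rlike(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Whole-string match, mirroring the assumed implicit anchoring of RLIKE."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, value, flags) is not None


# Case-sensitive: only values whose first letter is an uppercase A-E.
upper_only = [n for n in names if rlike(n, "[A-E].*")]

# Case-insensitive, like passing the 'i' parameter.
any_case = [n for n in names if rlike(n, "[a-e].*", ignore_case=True)]

print(upper_only)
print(any_case)
```

Because the match is anchored, "xCANADA" is rejected in both cases even though it contains letters from the range.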

Regex not matching correct string

I am busy building a lookup table for specific names of merchants. I tried to make use of the following regex, but it's returning fewer results than the standard "like" function in Netezza SQL. Please refer to the examples below:
SQL Like function: where trim(upper(a.MRCH_NME)) like '%CNA %' -- returns 4622 matches
Regex function in Netezza SQL: where array_combine(regexp_extract_all(trim(upper(a.MRCH_NME)),'.*CNA\s','i'),'|') = 'CNA' -- returns 2226 matches
I looked at the two result sets and found that strings such as the following aren't matched:
!C CNA INT ARR
*CNA PLATZ 0400
015764 CNA CRAD
C#CNA PARK 0
I made use of the following regex: /.*CNA\s/
Any idea why the above strings aren't being returned as matches?
Thank you.
You probably should be using regexp_like:
SELECT *
FROM yourTable
WHERE REGEXP_LIKE(MRCH_NME, 'CNA[ ]', 'i');
Apart from case sensitivity, this is logically equivalent to the following query using LIKE:
SELECT *
FROM yourTable
WHERE MRCH_NME LIKE '%CNA %';
It seems to me the problem is more with your code than with the regex. Look: like '%CNA %' returns all entries that contain a CNA substring followed by a literal space anywhere inside the entry. The '.*CNA\s' regex matches any 0+ chars other than newline followed by CNA and any whitespace char.
According to this reference, \s matches "a white space character. White space is defined as [\t\n\f\r\p{Z}]".
Thus, you should in fact just use
WHERE REGEXP_LIKE(MRCH_NME, 'CNA ', 'i')
or, better with a word boundary check:
WHERE REGEXP_LIKE(MRCH_NME, '\bCNA\b', 'i')
where \b marks a transition from a word to non-word and non-word to word character, thus ensuring a whole word search and justifying the regex usage.
If you do not need to match the merchant name as a whole word, use the regular LIKE with '%CNA %'; it should be more efficient.
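To see why the extract-and-compare approach drops rows, it helps to look at what '.*CNA\s' actually captures. The sketch below uses Python's re module as a stand-in for the Netezza regex functions, so details of the real engine may differ. It runs the four unmatched strings from the question through the extraction, a plain substring test (the equivalent of LIKE '%CNA %'), and the \bCNA\b pattern. The helper name check is hypothetical.

```python
import re

samples = ["!C CNA INT ARR", "*CNA PLATZ 0400", "015764 CNA CRAD", "C#CNA PARK 0"]


def check(s: str):
    """Return (extracted matches, substring test, whole-word test) for one string."""
    extracted = re.findall(r".*CNA\s", s)              # includes everything before CNA
    contains = "CNA " in s                             # same idea as LIKE '%CNA %'
    whole_word = re.search(r"\bCNA\b", s) is not None  # CNA as a separate word
    return extracted, contains, whole_word


for s in samples:
    print(repr(s), check(s))
```

In this sketch the extracted text carries the leading characters and the trailing whitespace along with CNA, so comparing it to the bare string 'CNA' fails even though the regex itself matched.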

Excluding a field that has numbers in it, with regex for sql

I am trying to exclude any names that contain numbers from a query, but I seem to be way off in my attempts:
SELECT * FROM products WHERE name not REGEX '[0-9]|'
I thought you could do regex with sql, what am I doing wrong? How can you exclude a field with any numbers?
Here is some valid syntax.
MySQL (and SQLite if REGEXP has a supporting function):
WHERE name not REGEXP '[0-9]'
Oracle:
WHERE not regexp_like(name, '[0-9]')
Postgres:
WHERE NOT name ~ '[0-9]'
SQL Server:
WHERE name NOT LIKE '%[0-9]%'
You need to remove the alternation operator | (and note that in MySQL the operator is spelled REGEXP, not REGEX):
SELECT * FROM products WHERE name NOT REGEXP '[0-9]'
The regex [0-9]| matches either a digit or an empty string, so it matches every string.
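The effect of the trailing | is easy to reproduce outside the database. This is a small Python sketch using the re module rather than any SQL engine, with made-up product names; the helper has_match is hypothetical.

```python
import re


def has_match(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for name in ["widget", "widget2"]:
    with_bar = has_match(r"[0-9]|", name)    # digit OR empty string
    without_bar = has_match(r"[0-9]", name)  # digit only
    print(name, with_bar, without_bar)

# The empty alternative succeeds at the very first position.
print(repr(re.search(r"[0-9]|", "widget").group()))
```

With the empty alternative in place, every name "matches", so a NOT filter built on it excludes every row.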

What will be the regular expression for alphanumeric characters, space ,french characters and dash?

I just want to know what the regex would be for alphanumeric characters, spaces, French characters, and dashes. I tried this, but it doesn't work.
SELECT * FROM my_table
WHERE regexp_like(name_elem1,'[^[:alnum:]^[:blank:]^[àâçéèêëîïôûùüÿñæœ]^[\-]]');
Please help
I am not an Oracle SQL expert and cannot test the solution but I would rather write it the following way:
SELECT * FROM my_table WHERE regexp_like(name_elem1,'[0-9A-Za-z\ \tàâçéèêëîïôûùüÿñæœ]+');
Different sources say that one cannot combine regex character classes, so I have written them out explicitly: [0-9A-Za-z] for alnum, \ \t for whitespace, and an extended list of French characters.
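If you want those characters, then don't use the 'not' expression (the leading ^), and consider case-insensitivity. The POSIX classes and the literal characters go inside a single pair of outer brackets, with the dash last so it is taken literally:
... regexp_like(name_elem1,'[[:alnum:][:blank:]àâçéèêëîïôûùüÿñæœ-]', 'i');
Caveat: this only checks that at least one character matches the expression.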
Here's the official doc:
https://docs.oracle.com/database/121/SQLRF/ap_posix.htm#SQLRF020
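The difference between "contains at least one allowed character" and "consists only of allowed characters" can be sketched in Python. This uses Python's re module, not Oracle's regex engine, so the character class is written out explicitly instead of using [:alnum:] and [:blank:]. The names ALLOWED, has_allowed, and only_allowed are hypothetical.

```python
import re

FRENCH = "àâçéèêëîïôûùüÿñæœ"
# Letters, digits, space, tab, the French characters, and a literal dash (placed last).
ALLOWED = rf"[0-9A-Za-z \t{FRENCH}-]"


def has_allowed(name: str) -> bool:
    """Unanchored: true as soon as one allowed character appears anywhere."""
    return re.search(ALLOWED, name, re.IGNORECASE) is not None


def only_allowed(name: str) -> bool:
    """Anchored: true only if every character is in the allowed set."""
    return re.fullmatch(ALLOWED + "+", name, re.IGNORECASE) is not None


for name in ["Jean-François", "Hélène Dupont", "user@example"]:
    print(name, has_allowed(name), only_allowed(name))
```

In SQL, the anchored variant corresponds to wrapping the bracket expression as ^[...]+$, which is probably what is wanted when validating a whole name.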

Where filename like '%_123456_%'

I'm trying to query a table with a like statement. Is _ considered a wildcard symbol in postgres? Would
select * from table where field1 like '%_123_%'
return the same thing as
select * from table where field1 like '%123%'
Here is an example from the official documentation regarding wildcards in Postgres:
'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false
_ is a wildcard for exactly one character, while % is a wildcard for any sequence of zero or more characters.
Yes, _ is a wildcard symbol that matches one character. It can't be an empty match, so no, those statements are not the same. The first requires the string be at least 5 characters long while the second only requires 3 characters.
If you're familiar with regexes, %123% is equivalent to .*123.*, while %_123_% is equivalent to .+123.+.
From the PostgreSQL manual:
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
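The wildcard and escape behaviour can be tried out with an in-memory SQLite database. This is a sketch under an assumption: SQLite stands in for PostgreSQL, and since SQLite has no default escape character, the ESCAPE clause is spelled out explicitly (in PostgreSQL the backslash is the default, per the manual text above). The helper name like is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(value: str, pattern: str, escape=None) -> bool:
    """Evaluate `value LIKE pattern [ESCAPE escape]` in SQLite."""
    if escape is None:
        sql, args = "SELECT ? LIKE ?", (value, pattern)
    else:
        sql, args = "SELECT ? LIKE ? ESCAPE ?", (value, pattern, escape)
    return bool(conn.execute(sql, args).fetchone()[0])


# _ needs exactly one character on each side of 123; % accepts none.
print(like("123", "%_123_%"), like("123", "%123%"))

# An unescaped _ accepts any character, not just an underscore.
print(like("a123b", "%_123_%"))

# Escaped underscores only match literal underscores.
print(like("a123b", r"%\_123\_%", escape="\\"))
print(like("x_123_y", r"%\_123\_%", escape="\\"))
```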
Yes.
_ matches exactly one character, while % matches any number of characters, including none.