RegEx to match String tokens in any order? - sql

I'm looking for an Oracle Regular Expression that will match tokens in any order.
For example, say I'm looking for "one two".
I would want it to match both,
"one token two"
"two other one"
The number of tokens might grow larger than two, so generating the permutations for the regex would be a hassle.
Is there an easier way to do this than the following?
'(ONE.*TWO)|(TWO.*ONE)'
i.e.
select *
from some_table t
where regexp_like(t.NAME_KEY, '(ONE.*TWO)|(TWO.*ONE)')

Here's an alternative query that uses Full Text Search (FTS) functionality:
WHERE CONTAINS(t.name_key, 'ONE & TWO') > 0
See the Precedence Examples for criteria evaluation explanation.
Related:
Introduction to Oracle Text

You can use several different regular expressions:
SELECT *
FROM some_table t
WHERE regexp_like(t.NAME_KEY, 'ONE')
AND regexp_like(t.NAME_KEY, 'TWO')
One issue is that this will also match 'TWONE', which the original regular expression would not match. This can be fixed if you also check for separator characters or word boundaries around each token.
Also a regular expression is not necessary to match a constant string. You could just use LIKE instead.
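To illustrate the "one check per token" idea outside the database, here is a minimal Python sketch. The function name contains_all_tokens is hypothetical, and Python's \b word boundary stands in for whatever separator or boundary check your DBMS offers. Note that it is stricter than the original pattern: tokens glued together, as in 'TWONE', are rejected.

```python
import re

def contains_all_tokens(text, tokens):
    """Return True if every token occurs in text as a whole word, in any order."""
    return all(
        # One independent search per token, like ANDing several regexp_like calls.
        re.search(r"\b" + re.escape(token) + r"\b", text) is not None
        for token in tokens
    )

print(contains_all_tokens("ONE TOKEN TWO", ["ONE", "TWO"]))  # True
print(contains_all_tokens("TWO OTHER ONE", ["ONE", "TWO"]))  # True
print(contains_all_tokens("TWONE", ["ONE", "TWO"]))          # False
```

Because each token is tested separately, adding a third or fourth token only lengthens the list; no permutations are needed.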

Related

How to check if regex match with another regex in sql

How can I check whether one regex matches another regex in SQL? For example, ab.* matches abc.*
As the question is partial, I can only return a partial answer. If the regex goes through, it means it matches the regular expression. Same goes for the other. If you are trying to check if both go through, simply check if both of the regex statements are true.
Or, if you are trying to see whether both regexes "match", you can compare them to a third regex, which should match both.
Or, if you are simply trying to see whether they are equal, just compare them in an if statement.

How to determine whether a varchar field DOES NOT contain characters in set

I need to determine if all rows in varchar column in a db contain any characters outside of the particular set below:
abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|=''
I tried this but am not sure if it is correct:
select AccName
from Transactions
where AccName not like '%[!abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|='']%'
Should this work?
Any help appreciated.
You cannot use a regular expression inside an ordinary LIKE condition in a query. If you want to use regular expressions, you will have to use a special operator. In MySQL, you could try the following:
SELECT AccName
FROM Transactions
-- ^ negates the set; the hyphen goes last so it is literal; '' is an escaped single quote
WHERE AccName REGEXP '[^a-zA-Z0-9.#,/&%();:+_*?|=''\\\\-]';
If this doesn't run as written, then you may have to tidy up the regular expression you gave. And as marc_s asked, the exact regular expression and query will depend on the DB system you are using.
Database management systems vary in their support for matching regular expressions. Examples below use PostgreSQL, which supports POSIX regular expressions, along with other flavors. Examples below also test for case-sensitive matches to avoid sentences like "'Mike' doesn't not match the regular expression".
AFAIK, no DBMS lets you mix the like operator with a regular expression.
A like expression in the form column_name like '%a%' will match 'a' if it appears anywhere in the column. But you need your regular expression to match on the whole value of the column. Anchor the regular expression at the start and end of each value (^ and $), and tell the dbms to match one or more instances (+) of the atom.
select 'Mike' ~ '^[a-zA-Z0-9]+$'; -- 'Mike' matches the regex
Write a failing test.
select 'Mike?' ~ '^[a-zA-Z0-9]+$'; -- 'Mike?' doesn't match the regex
Add the question mark to the regex, and verify the test succeeds.
select 'Mike?' ~ '^[a-zA-Z0-9?]+$'; -- 'Mike?' matches the regex
Repeat failing test and succeeding test for each character. When you've caught all the characters you want, invert the logic using the !~ operator in place of the ~ operator.
When your data is clean, move this into a CHECK constraint.
PostgreSQL pattern matching
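The same test-driven approach can be tried outside the database. The following is a minimal Python sketch, assuming a small illustrative allowed set (letters, digits, and the question mark); the names ALLOWED, is_clean and find_dirty are hypothetical. fullmatch plays the role of the ^...$ anchors, and negating the result corresponds to switching from ~ to !~.

```python
import re

# Illustrative allowed set only: extend the class one character at a time,
# writing a failing test first, as described above.
ALLOWED = re.compile(r"[a-zA-Z0-9?]+")

def is_clean(value):
    """True if the whole value consists of one or more allowed characters."""
    return ALLOWED.fullmatch(value) is not None

def find_dirty(values):
    """Inverted logic: return the values containing a disallowed character."""
    return [v for v in values if not is_clean(v)]

print(is_clean("Mike"))                          # True
print(is_clean("Mike?"))                         # True
print(find_dirty(["Mike", "Mike!", "O'Neil"]))   # ['Mike!', "O'Neil"]
```

With the + quantifier an empty string also counts as dirty; use * instead if empty values should pass.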

Regular expression filter

I have this regular expression in my sql query
DECLARE @RETURN_VALUE VARCHAR(MAX)
IF @value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET @RETURN_VALUE = NULL
END
I am not sure why, but when a row contains 12 TEST, I get the value 12, whereas a three-digit number is filtered out. How can I modify the regular expression so that it also returns three-digit numbers?
Any help will be appreciated.
SQL Server's LIKE doesn't use regular expressions: it uses SQL wildcard expressions, which are much simpler than regular expressions. For instance, there is no way to specify alternation (a|b) or repetition (a*, a+, a?, a{m,n}) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.
Have you tried ([0-9]*)? I believe that this will capture every digit for you. However, I am not that strong at regex. When I ran this through rubular, it worked, though :) BTW, rubular is a great way to test out regular expressions.
You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).
First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.
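To check the examples above without a database, the LIKE pattern can be approximated with a regular expression. This is a rough Python sketch, not SQL Server's own matching: each % becomes .*, and [^A-Z] is written as [^A-Za-z] on the assumption of a case-insensitive collation. The names LIKE_EQUIVALENT and like_matches are hypothetical.

```python
import re

# Approximation of LIKE '%[0-9]%[^A-Z]%[0-9]%'.
# The leading and trailing % are covered by using search() instead of anchors.
LIKE_EQUIVALENT = re.compile(r"[0-9].*[^A-Za-z].*[0-9]", re.DOTALL)

def like_matches(value):
    """True if value would satisfy the LIKE pattern under the stated assumptions."""
    return LIKE_EQUIVALENT.search(value) is not None

for sample in ["111", "1^1", "1?7", "12 TEST", "123 TEST"]:
    print(repr(sample), like_matches(sample))
# '111' True, '1^1' True, '1?7' True, '12 TEST' False, '123 TEST' True
```

Under these assumptions '123 TEST' matches because the middle digit itself counts as the non-letter character, which would explain why three-digit values take the branch that sets the result to NULL while '12 TEST' does not.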

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in a particular field in my SQL Server table contain ONLY non-alphanumeric characters.
I'm thinking it's a regular expression that I need along the lines of [^a-zA-Z0-9], but I'm not sure of the exact syntax I need to return the rows if there are no valid alphanumeric chars in there.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. You just need leading and trailing wildcards, and to move the NOT:
WHERE whatever NOT LIKE '%[a-z0-9]%'
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different length. Otherwise you should use CLR user defined function and a proper regular expression - Regular Expressions Make Pattern Matching And Data Extraction Easier.
This will not work correctly in every case, e.g. abcÑxyz will pass through this as it has a, b, c... You need to work with COLLATE or check each byte.
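The "no alphanumeric character anywhere" logic behind the NOT LIKE filter can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and the character class is deliberately ASCII-only, so an accented letter such as Ñ is treated as non-alphanumeric here, whereas in SQL Server the outcome depends on the collation in use.

```python
import re

# ASCII letters and digits only; accented letters are NOT covered (assumption).
ALNUM = re.compile(r"[a-zA-Z0-9]")

def only_non_alphanumeric(value):
    """True if the value contains no ASCII letter or digit at all."""
    return ALNUM.search(value) is None

print(only_non_alphanumeric("!!!"))      # True
print(only_non_alphanumeric("a!"))       # False
print(only_non_alphanumeric("abcÑxyz"))  # False
print(only_non_alphanumeric("Ñ"))        # True (ASCII-only class)
```

As with the NOT LIKE version, an empty string also passes, so filter those out separately if they should not be counted.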

Regex for parsing SQL parameters

If I have a query such as SELECT * from authors where name = #name_param, is there a regex to parse out the parameter names (specifically the "name_param")?
Thanks
This is tricky because params can also occur inside quoted strings.
SELECT * FROM authors WHERE name = #name_param
AND string = 'don\'t use #name_param';
How would the regular expression know to use the first #name_param but not the second?
It's a problem that can be solved, but it's not practical to do it in a single regular expression. I had to handle this in Zend_Db, and what I did was first strip out all quoted strings and delimited identifiers, and then you can use regular expressions on the remainder.
You can see the code, because it's open-source.
See functions _stripQuoted() and _parseParameters().
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L200
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L140
Given you have no quoted strings or comments with parameters in them, the required regex would be quite trivial:
#([_a-zA-Z]+) /* match group 1 contains the name only */
I go with Bill Karwin's recommendation to be cautious, knowing that the naïve approach has its pitfalls. But if you know the data you deal with, this regex would be all you need.
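The two-step idea (strip quoted text first, then apply the simple regex) can be sketched in Python. This is not the Zend_Db code, just a minimal illustration under assumptions: quotes may be escaped with a backslash or by doubling, comments are not handled, and the function names strip_quoted and parse_parameters are hypothetical.

```python
import re

# Quoted string literals and delimited identifiers, allowing \' and '' escapes.
SINGLE_QUOTED = r"'(?:\\.|''|[^'\\])*'"
DOUBLE_QUOTED = r'"(?:\\.|""|[^"\\])*"'
QUOTED = re.compile(SINGLE_QUOTED + "|" + DOUBLE_QUOTED)

# The simple parameter regex from the answer above; group 1 is the name only.
PARAM = re.compile(r"#([_a-zA-Z]+)")

def strip_quoted(sql):
    """Remove quoted strings so parameters inside them are ignored."""
    return QUOTED.sub("", sql)

def parse_parameters(sql):
    """Return parameter names found outside quoted strings, in order."""
    return PARAM.findall(strip_quoted(sql))

sql = "SELECT * FROM authors WHERE name = #name_param AND string = 'don\\'t use #name_param'"
print(parse_parameters(sql))  # ['name_param']
```

Only the first #name_param is reported, because the second one disappears together with the quoted string before the parameter regex runs.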