Regular expression filter - sql

I have this regular expression in my sql query
DECLARE #RETURN_VALUE VARCHAR(MAX)
IF #value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET #RETURN_VALUE = NULL
END
I am not sure, but whenever I have this in my row 12 TEST then it gives me the value of 12, but if I have three digit number then it filters out the three digit numbers.How can I modify the regular expression to return me the three digits numbers too.
any help will be appreciated.

SQL doesn't have regular expressions: it has SQL wildcard expressions. They are much simpler than regular expressions and long predate regular expressions. For instance, there is no way to specify alternation (a|b) or repetition ( a*, a+, a?, a{m,n} ) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.

Have you tried ([0-9]*). I believe that this will capture every digit for you. However, I am not as strong at regex. When I ran this through rubular, it worked, though :) BTW, rubular is a great way to test out regular expressions

You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).

First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.

Related

Regular expression metacharacter in SQL yields different results in Oracle vs. Postgres

I'm trying to convert some queries from an Oracle environment to Postgres. This is a simplified version of one of the queries:
SELECT * FROM TABLE
WHERE REGEXP_LIKE(TO_CHAR(LINK_ID),'\D')
I believe the equivalent postgreSQL should be this:
SELECT * FROM TABLE
WHERE CAST(LINK_ID AS TEXT) ~ '\D'
But when I run these queries in their respective environments on the exact same dataset, the first query outputs no records (which is correct) and the second query outputs all records in the table. I didn't write the original code, but as I understand it, it's looking for any values in the numeric field LINK_ID that are non-digit characters. Is the \D metacharacter supposed to behave differently in Oracle vs. postgres? I'm not seeing anything in documentation to say they should.
The documentation for Oracle's TO_CHAR(number) states
If you omit fmt, then n is converted to a VARCHAR2 value exactly long enough to hold its significant digits.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions181.htm
This means that the only non-numeric character which might be produced is a negative sign or a decimal point. If the number is positive and has no fractional part, it will not match the regular expression \D.
On the other hand, on PostgreSQL CAST(numeric(38,8)as TEXT) returns a value with the number of decimal places specified by the type specification, in this case 8.
E.g.:
cast( cast(12341234 as numeric(38,8)) as TEXT)
Generates 12341234.00000000 The result of such a cast will always contain a decimal point and therefore will always match the regular expression \D.
You may find that replacing it with this solves your problem:
(LINK_ID % 1) <> 0.0
Alternatively, If you need to use the regex (e.g. to simplify migration work), consider changing it to '\.0*[1-9]' i.e. to find a decimal point with any nonzero digit after it.

Difference between _%_% and __% in sql server

I am learning basics of SQL through W3School and during understanding basics of wildcards I went through the following query:
--Finds any values that start with "a" and are at least 3 characters in length
WHERE CustomerName LIKE 'a_%_%'
as per the example following query will search the table where CustomerName column start with 'a' and have at least 3 characters in length.
However, I try the following query also:
WHERE CustomerName LIKE 'a__%'
The above query also gives me the exact same result.
I want to know whether there is any difference in both queries? Does the second query produce a different output in some specific scenario? If yes what will be that scenario?
Both start with A, and end with %. In the middle part, the first says "one char, then between zero and many chars, then one char", while the second one says "one char, then one char".
Considering that the part that comes after them (the final part) is %, which means "between zero and many chars", I can only see both clauses as identical, as they both essentially just want a string starting with A then at least two following characters. Perhaps if there were at least some limitations on what characters were allowed by the _, then maybe they could have been different.
If I had to choose, I'd go with the second one for being more intuitive. After all, many other masks (e.g. a%%%%%%_%%_%%%%%) will yield the same effect, but why the weird complexity?
For Like operator a single underscore "_" means, any single character, so if you put One underscore like
ColumnName LIKE 'a_%'
you basically saying you need a string where first letter is 'a' then followed by another single character and then followed by anything or nothing.
ColumnName LIKE 'a__%' OR ColumnName LIKE 'a_%_%'
Both expressions mean first letter 'a' then followed by two characters and then followed by anything or nothing. Or in simple English any string with 3 or more character starting with a.

How to determine whether a varchar field DOES NOT contain characters in set

I need to determine if all rows in varchar column in a db contain any characters outside of the particular set below:
abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|=''
I tried this but am not sure if it is correct:
select AccName
from Transactions
where AccName not like '%[!abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|='']%'
Should this work?
Any help appeciated.
You cannot use a regular expression inside an ordinary LIKE condition in a query. If you want to use regular expressions, you will have to use a special operator. In MySQL, you could try the following:
SELECT AccName
FROM Transactions
WHERE AccName REGEXP [!abcdefghijklmonpqrstuvwxyzABCDEFGHIJKLMONPQRSTUVWXYZ.-#,1234567890/\&%();:+#_*?|='']%';
If this doesn't run to boot, then you may have to tidy up the regular expression you gave. And as marc_s asked, the exact regular expression and query will depend on the DB system you are using.
Database management systems vary in their support for matching regular expressions. Examples below use PostgreSQL, which supports POSIX regular expressions, along with other flavors. Examples below also test for case-sensitive matches to avoid sentences like "'Mike' doesn't not match the regular expression".
AFAIK, no DBMS lets you mix the like operator with a regular expression.
A like expression in the form column_name like '%a%' will match 'a' if it appears anywhere in the column. But you need your regular expression to match on the whole value of the column. Anchor the regular expression at the start and end of each value (^ and $), and tell the dbms to match one or more instances (+) of the atom.
select 'Mike' ~ '^[a-zA-Z0-9]+$'; -- 'Mike' matches the regex
Write a failing test.
select 'Mike?' ~ '^[a-zA-Z0-9]+$'; -- 'Mike?' doesn't match the regex
Add the question mark to the regex, and verify the test succeeds.
select 'Mike?' ~ '^[a-zA-Z0-9?]+$'; -- 'Mike?' matches the regex
Repeat failing test and succeeding test for each character. When you've caught all the characters you want, invert the logic using the !~ operator in place of the ~ operator.
When your data is clean move this into a CHECK constraint.
PostgreSQL pattern matching

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in a particular field in my sql server table, contain ONLY non-alphanumeric characters.
I'm thinking it's a regular expression that I need along the lines of [^a-zA-Z0-9] but Im not sure of the exact syntax I need to return the rows if there are no valid alphanumeric chars in there.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. Just need leading+trailing wildcards and move the NOT
WHERE whatever NOT LIKE '%[a-z0-9]%'
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different length. Otherwise you should use CLR user defined function and a proper regular expression - Regular Expressions Make Pattern Matching And Data Extraction Easier.
This will not work correctly, e.g. abcÑxyz will pass thru this as it has a,b,c... you need to work with Collate or check each byte.

How do I check the end of a particular string using SQL pattern matching?

I am trying to use sql pattern matching to check if a string value is in the correct format.
The string code should have the correct format of:
alphanumericvalue.alphanumericvalue
Therefore, the following are valid codes:
D0030.2190
C0052.1925
A0025.2013
And the following are invalid codes:
D0030
.2190
C0052.
A0025.2013.
A0025.2013.2013
So far I have the following SQL IF clause to check that the string is correct:
IF #vchAccountNumber LIKE '_%._%[^.]'
I believe that the "_%" part checks for 1 or more characters. Therefore, this statement checks for one or more characters, followed by a "." character, followed by one or more characters and checking that the final character is not a ".".
It seems that this would work for all combinations except for the following format which the IF clause allows as a valid code:
A0025.2013.2013
I'm having trouble correcting this IF clause to allow it to treat this format as incorrect. Can anybody help me to correct this?
Thank you.
This stackoverflow question mentions using word-boundaries: [[:<:]] and [[:>:]] for whole word matches. You might be able to use this since you don't have spaces in your code.
This is ANSI SQL solution
This LIKE expression will find any pattern not alphanumeric.alphanumeric. So NOT LIKE find only this that match as you wish:
IF #vchAccountNumber NOT LIKE '%[^A-Z0-9].[^A-Z0-9]%'
However, based on your examples, you can use this...
LIKE '[A-Z][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9]'
...or one like this if you 5 alphas, dot, 4 alphas
LIKE '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9].[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
The 2nd one is slightly more obvious for fixed length values. The 1st one is slighty less intuitive but works with variable length code either side of the dot.
Other SO questions Creating a Function in SQL Server with a Phone Number as a parameter and returns a Random Number and Best equivalent for IsInteger in SQL Server