Difference between _%_% and __% in sql server - sql

I am learning the basics of SQL through W3Schools, and while studying wildcards I came across the following query:
--Finds any values that start with "a" and are at least 3 characters in length
WHERE CustomerName LIKE 'a_%_%'
As per the example, this query will search the table for rows where the CustomerName column starts with 'a' and is at least 3 characters long.
However, I also tried the following query:
WHERE CustomerName LIKE 'a__%'
The above query also gives me the exact same result.
I want to know whether there is any difference between the two queries. Does the second query produce a different output in some specific scenario? If yes, what would that scenario be?

Both start with A, and end with %. In the middle part, the first says "one char, then between zero and many chars, then one char", while the second one says "one char, then one char".
Considering that the part that comes after them (the final part) is %, which means "between zero and many chars", I can only see both clauses as identical, as they both essentially just want a string starting with A then at least two following characters. Perhaps if there were at least some limitations on what characters were allowed by the _, then maybe they could have been different.
If I had to choose, I'd go with the second one for being more intuitive. After all, many other masks (e.g. a%%%%%%_%%_%%%%%) will yield the same effect, but why the weird complexity?
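A quick way to check the equivalence is to run both patterns against the same rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption on my part; SQLite's LIKE treats % and _ the same way and is case-insensitive for ASCII by default). The sample names are made up for illustration.

```python
import sqlite3

# Hypothetical sample data, not taken from the W3Schools table.
NAMES = ["a", "ab", "abc", "abcd", "Alfreds", "bob"]

def names_like(pattern):
    """Return the sample names matching a LIKE pattern, sorted in Python."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?)", [(n,) for n in NAMES])
    rows = conn.execute(
        "SELECT CustomerName FROM Customers WHERE CustomerName LIKE ?",
        (pattern,),
    ).fetchall()
    conn.close()
    return sorted(r[0] for r in rows)

# Both patterns require an 'a' followed by at least two more characters.
print(names_like("a_%_%"))
print(names_like("a__%"))
```

Both calls print the same list: the names of three or more characters that start with "a" (in either case, because of SQLite's default case-insensitivity).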

For the LIKE operator, a single underscore "_" means any single character, so if you put one underscore, like
ColumnName LIKE 'a_%'
you are basically saying you need a string where the first letter is 'a', followed by another single character, followed by anything or nothing.
ColumnName LIKE 'a__%' OR ColumnName LIKE 'a_%_%'
Both expressions mean the first letter is 'a', followed by two characters, followed by anything or nothing. Or, in simple English, any string with 3 or more characters starting with a.

Related

How to use REGEXP_LIKE() for concatenation in Oracle

I need to make some changes in SQL within a CURSOR. Previously, the maximum length of the 'code' column was 4 characters (e.g. K100, K101, ..., K999), but now it needs to be 8 characters (e.g. K1000, K1001, K1002, ..., K1000000).
CURSOR c_code(i_prefix VARCHAR2)
IS
SELECT NVL(MAX(SUBSTR(code,2))+1,100) code
FROM users
WHERE code LIKE i_prefix||'___';
The 'code' column value starts from 100 and increments +1 each time a new record is inserted. Currently, the maximum value is 'K999' and I would like it to be K1000, K1001, K1002 and so on.
I have altered and modified the 'code' column to VARCHAR(8) in the users table.
Note: i_prefix value is always 'K'.
I have tried to amend the SQL -
CURSOR c_code(i_prefix VARCHAR2)
IS
SELECT NVL(MAX(SUBSTR(code,2))+1,100) code
FROM users
WHERE code LIKE i_prefix||'________';
However, it restarts from 100 and not from K1000, K1001, K1002, etc. each time a record is inserted.
It has been suggested that I use REGEXP_LIKE(), but I am not sure how to use it properly to get the desired outcome in this case.
Can you please guide me on how to get this result using REGEXP_LIKE()?
Thank you.
Your old code
WHERE code LIKE i_prefix||'___';
will match K followed by exactly three characters, which is what you had. Your new code
WHERE code LIKE i_prefix||'________';
will match K followed by exactly eight characters, which is one too many for a start, since you said the total length was eight, which means you need seven wildcard placeholders:
WHERE code LIKE i_prefix||'_______';
... but that still won't work at the moment since your existing values aren't that long. As all your current values are at least four, you could do:
WHERE code LIKE i_prefix||'___%';
which will match K followed by three or more characters - with no upper limit, but your column is restricted to eight too anyway.
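To see why the eight-underscore pattern makes the numbering restart at 100, here is a minimal sketch that runs the cursor's query against a few sample codes. It uses Python's sqlite3 as a stand-in for Oracle, so some details are assumptions: COALESCE replaces NVL, and I added an explicit CAST so that MAX compares numbers rather than strings (as strings, '999' sorts above '1000'). The helper name next_code and the sample data are hypothetical.

```python
import sqlite3

def next_code(codes, like_pattern):
    """Return MAX(numeric part) + 1 over the matching codes, or 100 if none match."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (code TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(c,) for c in codes])
    # COALESCE plays the role of Oracle's NVL; the CAST is an addition so that
    # MAX compares integers (1000 > 999) instead of strings ('999' > '1000').
    (value,) = conn.execute(
        "SELECT COALESCE(MAX(CAST(SUBSTR(code, 2) AS INTEGER)) + 1, 100) "
        "FROM users WHERE code LIKE ?",
        (like_pattern,),
    ).fetchone()
    conn.close()
    return value

codes = ["K100", "K101", "K999", "K1000"]
print(next_code(codes, "K________"))  # K plus eight characters: nothing is that long
print(next_code(codes, "K___%"))      # K plus three or more characters
```

With these sample codes the first call matches no rows, so MAX is NULL and the fallback value 100 is returned, while the second call sees every row and returns 1001.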
If you did want to use a regular expression, which are generally slower, you could do:
WHERE REGEXP_LIKE(code, '^'||i_prefix||'.{3,7}$');
which would match K followed by three to seven characters, or:
WHERE REGEXP_LIKE(code, '^'||i_prefix||'\d{3,7}$');
which would only match K followed by three to seven digits.
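The difference between the two regular expressions can be tried out with a small sketch in Python's re module (an assumption: Python's regex dialect is close enough to Oracle's for these two patterns). Note that REGEXP_LIKE succeeds when the pattern is found anywhere in the value unless it is anchored with ^ and $; the sketch uses re.fullmatch, which requires the whole code to match.

```python
import re

PREFIX = "K"  # stands in for i_prefix

any_chars = re.compile(re.escape(PREFIX) + r".{3,7}")     # K + 3..7 of anything
digits_only = re.compile(re.escape(PREFIX) + r"\d{3,7}")  # K + 3..7 digits

def matches(regex, code):
    """True when the whole code matches the compiled regex."""
    return regex.fullmatch(code) is not None

for code in ["K99", "K100", "K10A", "K1000000", "K10000000"]:
    print(code, matches(any_chars, code), matches(digits_only, code))
```

Here K10A passes the first pattern but not the digits-only one, and K10000000 (eight digits after the K) fails both.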
However, I would suggest you use a sequence to generate the numeric part, and just prefix that with the K character. The sequence could start from 100 on a new system with no data, or from the current maximum number in an existing system with data.
I would also consider zero-padding the data, including all the existing values, to allow them to be compared; so update K100 to K0000100. Or if you can't do that, once you get past K199 jump to K2000000. Either would then allow the values to be sorted easily as strings. Or, perhaps, add a virtual column that extracts the numeric part as a number.
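The sorting problem that zero-padding solves is easy to see with plain string comparison. This is only an illustrative sketch; the helper pad_code and the width of 7 digits (to fill an 8-character column after the one-letter prefix) are my assumptions.

```python
def pad_code(code, width=7):
    """Zero-pad the numeric part of a code such as 'K100' to a fixed width."""
    prefix, number = code[0], code[1:]
    return prefix + number.zfill(width)

codes = ["K100", "K999", "K1000"]

# As strings, 'K1000' sorts between 'K100' and 'K999'.
print(sorted(codes))

# Once padded, string order and numeric order agree.
print(sorted(pad_code(c) for c in codes))
```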

SQL Difference Between "a%" and "a%_"

I have been searching for good use cases and differences between LIKE and = when I ran into this problem regarding LIKE.
Since LIKE "_r%" matches values with 'r' as their second character, isn't it fair to assume that "r%_" matches values with 'r' as their first character, making it functionally the same as "r%"?
I am asking this because our lecture slides say otherwise, and I am not sure whether I am wrong or not. I have also run this SQL test program (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_like_underscore) to see it firsthand, and it seems to prove my point.
Have a good day.
No, r%_ and r% don't mean the same thing. The first version, r%_, will match any string starting with r, followed by zero or more of any character, followed by any single character. This pattern will match ra and ran, but it will not match a single r. The pattern r%, on the other hand, will match r, since % allows for zero characters following the leading r.
The difference is that a% will match anything that starts with a, including a by itself (% matches zero, one, or multiple characters), while a%_ will match anything that starts with a, but must be followed by at least one other character (_ matches exactly one character).
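The one case where the two patterns disagree can be checked directly. The sketch below evaluates LIKE through Python's sqlite3 module, which is an assumption standing in for whichever database the lecture uses; the helper name like is hypothetical.

```python
import sqlite3

def like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite and return a bool."""
    conn = sqlite3.connect(":memory:")
    (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    conn.close()
    return bool(result)

for value in ["r", "ra", "ran"]:
    print(value, like(value, "r%"), like(value, "r%_"))
```

Only the single-character value r is treated differently: r% accepts it, r%_ does not.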

SQL Server 2005 Update/Delete Substring of a Lengthy Column

I'm not sure if it is possible to do what I'm trying to do, but I thought I would give it a shot anyway. Also, I am fairly new to the SQL Server world and this is my first post, so I apologize if my wording is poor or if I leave information out. Also, I am working with SQL Server 2005.
Let's say I have a table named "table" and a column named "column". The contents of column is a jumbled mess of characters (ntext data type). These characters were all drawn in from multiple entry fields in a front end application. Now one of those entry fields was for sensitive information that we no longer need and would like to get rid of, but I can't just get rid of the whole column because it also contains other valuable information. Most of the solutions I have found so far only deal with columns that have short entries, so they are just able to update the whole string, but for mine I think I need to identify the beginning and the end of the specific substring that I need and replace it or delete it somehow. This is the closest I have gotten to at least selecting the data that I need... AAA and /AAA mark the beginning and the end of the substring that I need.
select
substring (column, charindex ('AAA', column), charindex ('/AAA',column))
from table
where column like '%/AAA%'
The problems I am having with this one are that the substring doesn't stop at /AAA, it just keeps going, and some of the results are just blank so it looks something like:
AAA 12345 /AAA abcdefghijklmnop
AAA 12346 /AAA qrstuvwxyzabcdef
AAA 12347 /AAA abcdefghijklmnop
With the characters in bold being the information I need to get rid of. Also even though row 3 is blank, it still does contain the info that I need but I'm guessing that it isn't returning it because it has a different amount of characters before it (for example, rows 1, 2, and 4 might have 50 characters before them but row 3 might have 100 characters before it), at least that's the only reason that I could think of.
So I suppose the first step would probably be to actually select the right substring, then to either delete it or replace it with a different, meaningless substring like "111111" or something.
If there is more information that you need to be provided with or if I was unclear about anything please let me know and thank you for taking the time to read through (and hopefully answer) my question!
Edit: Another one that gets close to the right results goes something like this
select substring(column,charindex('AAA',column),20) from table
where column like '%/AAA%'
I'm not sure if this approach would work better since the substring i'm looking for is always going to have the same amount of characters. The problem with this one though, is that instead of having blank rows, they are replaced with irrelevant substrings from that column, but all of the other rows do return exactly what I want.
First of all, check your usage of SUBSTRING(). The third argument is for length, not end character, so you would need to alter your query to something like:
select substring (column, charindex ('AAA',column)
, charindex ('/AAA',column)-charindex ('AAA',column))
from table where column like '%/AAA%'
Yes your approach of finding it and then either deleting or replacing it is sound.
If some of the results are blank, it's possible that you are finding and replacing the entire string. If the LIKE pattern had not matched, the row would not have been returned at all, which is different from returning a blank value in that column.
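As a rough sketch of the find-then-replace approach, the following runs the idea against one made-up row using Python's sqlite3 module. Several things here are assumptions: SQLite's substr and instr stand in for SUBSTRING and CHARINDEX (instr takes its arguments in the opposite order), the table and column are renamed notes and body, and I add 4 to the length so that the closing /AAA marker is included in the span. On SQL Server 2005 an ntext column would probably need to be cast before functions like REPLACE can be applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute(
    "INSERT INTO notes VALUES ('name=Bob AAA 12345 /AAA abcdefghijklmnop')"
)

# Start at the opening marker; the length is the distance between the two
# markers plus the 4 characters of '/AAA' itself.
MARKED = (
    "substr(body, instr(body, 'AAA'), "
    "instr(body, '/AAA') - instr(body, 'AAA') + 4)"
)

# Step 1: select exactly the marked span.
(found,) = conn.execute(
    f"SELECT {MARKED} FROM notes WHERE body LIKE '%/AAA%'"
).fetchone()

# Step 2: replace the span with a meaningless placeholder.
conn.execute(
    f"UPDATE notes SET body = replace(body, {MARKED}, '111111') "
    "WHERE body LIKE '%/AAA%'"
)
(cleaned,) = conn.execute("SELECT body FROM notes").fetchone()
conn.close()

print(found)
print(cleaned)
```

Note that replace swaps every occurrence of the extracted span in the row, which is fine here but worth keeping in mind if the same text could appear twice.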

Regular expression filter

I have this regular expression in my sql query
DECLARE @RETURN_VALUE VARCHAR(MAX)
IF @value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET @RETURN_VALUE = NULL
END
I am not sure, but whenever a row contains 12 TEST it gives me the value 12, but if the row has a three-digit number it filters that number out. How can I modify the regular expression to return the three-digit numbers too?
Any help will be appreciated.
SQL's LIKE doesn't use regular expressions: it uses SQL wildcard expressions. They are much simpler than regular expressions. For instance, there is no way to specify alternation (a|b) or repetition (a*, a+, a?, a{m,n}) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.
Have you tried ([0-9]*). I believe that this will capture every digit for you. However, I am not as strong at regex. When I ran this through rubular, it worked, though :) BTW, rubular is a great way to test out regular expressions
You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).
First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.
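To experiment with what the pattern does and does not match, one option is to translate it into an ordinary regular expression. The translator below is a minimal sketch of my own, not anything built into SQL Server: it handles only %, _, and simple [...] or [^...] classes, ignores ESCAPE clauses, and is case-sensitive (whereas the real behaviour depends on the collation).

```python
import re

def like_to_regex(pattern):
    """Translate a T-SQL style LIKE pattern into a compiled Python regex.

    Handles only %, _, and simple [...] / [^...] classes; no ESCAPE clause.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # Position of the bracket closing a class, or -1 if ch opens none.
        end = pattern.find("]", i + 2) if ch == "[" else -1
        if ch == "%":
            parts.append(".*")          # zero or more of any character
        elif ch == "_":
            parts.append(".")           # exactly one character
        elif end != -1:
            body = pattern[i + 1:end].replace("\\", "\\\\")
            parts.append("[" + body + "]")
            i = end                     # jump to the closing bracket
        else:
            parts.append(re.escape(ch))  # literal character
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

PATTERN = "%[0-9]%[^A-Z]%[0-9]%"
for value in ["111", "1^1", "1?7", "123", "12 TEST"]:
    print(repr(value), like(value, PATTERN))
```

In this sketch every three-digit value matches (the middle digit counts as a character other than A-Z), while 12 TEST does not, because nothing sits between its two digits.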

How do I check the end of a particular string using SQL pattern matching?

I am trying to use sql pattern matching to check if a string value is in the correct format.
The string code should have the correct format of:
alphanumericvalue.alphanumericvalue
Therefore, the following are valid codes:
D0030.2190
C0052.1925
A0025.2013
And the following are invalid codes:
D0030
.2190
C0052.
A0025.2013.
A0025.2013.2013
So far I have the following SQL IF clause to check that the string is correct:
IF @vchAccountNumber LIKE '_%._%[^.]'
I believe that the "_%" part checks for 1 or more characters. Therefore, this statement checks for one or more characters, followed by a "." character, followed by one or more characters and checking that the final character is not a ".".
It seems that this would work for all combinations except for the following format which the IF clause allows as a valid code:
A0025.2013.2013
I'm having trouble correcting this IF clause to allow it to treat this format as incorrect. Can anybody help me to correct this?
Thank you.
This stackoverflow question mentions using word-boundaries: [[:<:]] and [[:>:]] for whole word matches. You might be able to use this since you don't have spaces in your code.
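This is a LIKE-only solution (the [ ] character classes are a SQL Server extension rather than ANSI SQL).
This LIKE expression is meant to find any value that is not of the form alphanumeric.alphanumeric, so NOT LIKE keeps only the values that match what you want:
IF @vchAccountNumber NOT LIKE '%[^A-Z0-9].[^A-Z0-9]%'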
However, based on your examples, you can use this...
LIKE '[A-Z][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9]'
...or one like this if you have 5 alphanumerics, a dot, then 4 alphanumerics
LIKE '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9].[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
The 2nd one is slightly more obvious for fixed-length values. The 1st one is slightly less intuitive but works with variable-length codes on either side of the dot.
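For the specific problem in the question (a second dot slipping through), another possible approach, not taken from the answers above, is to combine a positive LIKE with a NOT LIKE that rejects any value containing two dots. The sketch below tries the idea with Python's sqlite3 module as a stand-in for SQL Server; it checks only the dot structure, since SQLite's LIKE has no [ ] character classes. In T-SQL one could presumably add a third condition such as NOT LIKE '%[^A-Z0-9.]%' to reject non-alphanumeric characters as well.

```python
import sqlite3

def is_valid_code(code):
    """True when code has exactly one dot with at least one character on each side."""
    conn = sqlite3.connect(":memory:")
    (ok,) = conn.execute(
        # '_%._%' : something, a dot, something; '%.%.%' : two or more dots.
        "SELECT (? LIKE '_%._%') AND (? NOT LIKE '%.%.%')",
        (code, code),
    ).fetchone()
    conn.close()
    return bool(ok)

samples = [
    "D0030.2190", "C0052.1925", "A0025.2013",
    "D0030", ".2190", "C0052.", "A0025.2013.", "A0025.2013.2013",
]
for code in samples:
    print(code, is_valid_code(code))
```

The three valid codes from the question pass and the five invalid ones are rejected.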
Other SO questions: "Creating a Function in SQL Server with a Phone Number as a parameter and returns a Random Number" and "Best equivalent for IsInteger in SQL Server"