SQL: Using <= and >= to compare string with wildcard - sql

Assume I have a table that looks like this:
Id | Name | Age
=====================
1 | Jose | 19
2 | Yolly | 26
20 | Abby | 3
29 | Tara | 4
And my queries are:
1) Select * from thisTable where Name <= '*Abby';
returns 0 rows
2) Select * from thisTable where Name <= 'Abby';
returns row with Abby
3) Select * from thisTable where Name >= 'Abby';
returns all rows // row 1-4
4) Select * from thisTable where Name >= '*Abby';
returns all rows; // row 1-4
5) Select * from thisTable where Name >= '*Abby' and Name <= "*Abby";
returns 0 rows.
6) Select * from thisTable where Name >= 'Abby' and Name <= 'Abby';
returns row with Abby;
My question: why do I get these results? How does the wildcard affect the result of the query? Why don't I get any rows when the condition is Name <= '*Abby'?

Wildcards are only interpreted when you use the LIKE operator.
So when you compare against the string with <= or >=, it is treated literally, and the comparison uses lexicographical order.
1) No name starts with a character that sorts before *, so no rows are returned.
2) A is the first letter of the alphabet, so all the other names sort after Abby; only Abby is equal to itself.
3) Opposite of 2).
4) Opposite of 1): every name sorts after '*Abby', so all rows are returned.
5) The Name <= '*Abby' half is the condition from 1), which matches nothing, so the whole AND returns no rows.
6) This condition is equivalent to Name = 'Abby'.

When working with strings in SQL Server, comparison is done character by character, and the order in which characters sort depends on the collation. For some characters the order is easy to understand because it is alphabetical or numerical: for example, 'a' < 'b' and '4' > '2'. Depending on the collation, this might be done by letter and then case ('AaBbCc...') or by case and then letter ('ABC...Zabc').
Let's take a string like 'Abby'. It is compared letter by letter: A, b, b, y (the order of those letters depends on your collation, which I don't know, but I'm going to assume an 'AaBbCc...' collation, as they are more common). Any string starting with something like 'Aba' would have a value less than 'Abby', as the third character (the first that differs) has a "lower value". So would a value like 'Abbie' ('i' has a lower value than 'y'). Similarly, a string like 'Abc' would have a greater value, as 'c' has a higher value than 'b' (which is the first character that differs).
If we throw numbers into the mix, then you might be surprised. For example, the string (important: I didn't say number) '123456789' has a lower value than the string '9'. This is because the first character that differs is the first character: '9' is greater than '1', and so '9' has the "higher" value. This is one reason why it's so important to store numbers in numerical datatypes, as the behaviour is unlikely to be what you expect or want otherwise.
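As a rough illustration of the character-by-character rule described above, here is a small Python sketch. It is only an approximation: Python compares strings by code point, which behaves like a binary, case-sensitive collation, so a real SQL Server collation may order some characters differently. The helper name first_difference is made up for this example.

```python
# Python compares str values code point by code point, which is similar to a
# binary collation. This is an assumption-laden stand-in, not SQL Server.

def first_difference(left: str, right: str) -> int:
    """Return the index of the first position where the two strings differ."""
    for index, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return index
    # One string is a prefix of the other (or they are equal).
    return min(len(left), len(right))

comparisons = {
    "'Aba' < 'Abby'": "Aba" < "Abby",          # differs at the third character
    "'Abbie' < 'Abby'": "Abbie" < "Abby",      # differs at the fourth character
    "'Abc' > 'Abby'": "Abc" > "Abby",          # 'c' sorts after 'b'
    "'123456789' < '9'": "123456789" < "9",    # decided by the first character
    "123456789 < 9 as numbers": 123456789 < 9, # numeric comparison disagrees
}

for label, result in comparisons.items():
    print(f"{label}: {result}")
```

Only the last comparison is false, which is the point about storing numbers in numeric datatypes.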
To what you are asking, however: the wildcards for SQL Server are '%' and '_' (there is also '^', but I won't cover that here). A '%' represents any number of characters (including none), while '_' represents a single character. If you want to look specifically for one of those characters, you have to enclose it in square brackets ([]).
Using the equals (=) operator won't parse wildcards. You need to use an operator that does, like LIKE. Thus, if you want a word that starts with 'A', you would use the expression WHERE ColumnName LIKE 'A%'. If you wanted to search for one that consists of 6 characters and ends with 'ed', you would use WHERE ColumnName LIKE '____ed'.
Like I said before, if you want to search for one of those specific characters, you put it in brackets. So, if you wanted to search for a string that contains an underscore, the syntax would be WHERE ColumnName LIKE '%[_]%'
Edit: it's also worth noting that operators like LIKE are affected by the collation's sensitivity, for example to case and accent. If you're using a case-sensitive collation, then the condition WHERE 'Abby' LIKE 'abb%' is not true, as 'A' and 'a' are not treated as the same character. Likewise, the condition WHERE 'Covea' = 'Covéa' would be false in an accent-sensitive collation ('e' and 'é' are not treated as the same character).
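To try the difference between = and LIKE without a SQL Server instance, here is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in. That substitution is an assumption: SQLite treats % and _ the same way for these two patterns, but it has no [...] bracket syntax and its LIKE is case-insensitive for ASCII by default. The table name words and the sample values are invented for the example.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server here (see the caveats above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("Abby",), ("Jose",), ("parsed",), ("used",), ("*Abby",)],
)

def matching(where_clause: str) -> list:
    """Return the words that satisfy the given WHERE clause, sorted."""
    rows = conn.execute(
        f"SELECT word FROM words WHERE {where_clause} ORDER BY word"
    )
    return [row[0] for row in rows]

starts_with_a = matching("word LIKE 'A%'")      # any length, first letter A
six_ending_ed = matching("word LIKE '____ed'")  # exactly 6 characters, ends in ed
equals_pattern = matching("word = 'A%'")        # = compares the literal text 'A%'

print(starts_with_a, six_ending_ed, equals_pattern)
```

The = query finds nothing because no row holds the literal text A%, while the LIKE queries interpret the same characters as wildcards.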

A wildcard character is used to substitute for any other characters in a string. Wildcards are used in conjunction with the SQL LIKE operator in the WHERE clause. For example:
Select * from thisTable WHERE name LIKE '%Abby%'
This will return any values with Abby anywhere within the string.
Have a look at this link for an explanation of all wildcards https://www.w3schools.com/sql/sql_wildcards.asp

This is because >= and <= are comparison operators. They compare strings on the basis of their ASCII values.
The ASCII value of * is 42 and the ASCII values of capital letters start at 65. So when you tried Name <= '*Abby', SQL Server compared the first character of your string (value 42) first. No value in your data has a first character with an ASCII value less than 42, so no rows were selected.
You can refer to an ASCII table for more detail:
http://www.asciitable.com/
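The six queries from the question can be replayed outside the database with a short Python sketch. This assumes plain code-point ordering, which matches the ASCII reasoning above but is not necessarily identical to the collation of a given SQL Server column.

```python
# The Name column from the question's table.
names = ["Jose", "Yolly", "Abby", "Tara"]

# Code points of '*' and 'A', as given in the ASCII table.
print(ord("*"), ord("A"))

q1 = [n for n in names if n <= "*Abby"]             # query 1
q2 = [n for n in names if n <= "Abby"]              # query 2
q3 = [n for n in names if n >= "Abby"]              # query 3
q4 = [n for n in names if n >= "*Abby"]             # query 4
q5 = [n for n in names if "*Abby" <= n <= "*Abby"]  # query 5
q6 = [n for n in names if "Abby" <= n <= "Abby"]    # query 6

for number, rows in enumerate([q1, q2, q3, q4, q5, q6], start=1):
    print(f"query {number}: {rows}")
```

The lists line up with the row counts reported in the question: nothing for queries 1 and 5, only Abby for 2 and 6, and every name for 3 and 4.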

There are a few answers, and a few comments - I'll try to summarize.
Firstly, the wildcard in SQL is %, not * (for multiple matches). So your queries including an * ask for a comparison with that literal string.
Secondly, comparing strings with greater/less than operators probably does not do what you want - it uses the collation order to see which other strings are "earlier" or "later" in the ordering sequence. Collation order is a moderately complex concept, and varies between machine installations.
The SQL operator for string pattern matching is LIKE.
I'm not sure I understand your intent with the >= or <= statements. Do you mean that you want to return rows where the name's first letter is after 'A' in the alphabet?

Related

Determine if a column has two equal vowels

How to determine if a column has two equal vowels in SQL Server?
For example 'maria' has two 'a' characters.
select
*
from
hr.locations
where
state_province is null
and
city like '...' <-- ?
You want to look for strings with a vowel appearing multiple times. You already have city like '...'.
Now, you may have in mind something like city like '%[aeiou]%<the same vowel>%', and you wonder how to make this <the same vowel> work. It simply is not possible; no such back-reference is available in LIKE. Instead, write the expression for a single vowel: city like '%a%a%'. Then use OR for the different vowels:
select *
from hr.locations
where state_province is null
and
(
city like '%a%a%' or
city like '%e%e%' or
city like '%i%i%' or
city like '%o%o%' or
city like '%u%u%'
);
If your city column is case sensitive, and you want to find 'Anna' in spite of one A being in upper case and the other in lower case, make this lower(city) like '%a%a%'.
If your intention is to find those entries that contain exactly two equal vowels:
One way to find out how often a certain character (in your case a vowel) appears in a string is to first take the length of the entire string.
As a second step, replace your character with an empty string and take the length of the new string.
This will be the length without all occurrences of this character.
If the original length minus the new length is 2, the character occurs exactly two times in your string.
So, you can create a query repeating this idea for every vowel, something like this:
SELECT yourcolumn
FROM yourtable
WHERE
LEN (yourcolumn) - LEN(REPLACE(yourcolumn,'a','')) = 2
OR LEN (yourcolumn) - LEN(REPLACE(yourcolumn,'e','')) = 2
OR LEN (yourcolumn) - LEN(REPLACE(yourcolumn,'i','')) = 2
OR LEN (yourcolumn) - LEN(REPLACE(yourcolumn,'o','')) = 2
OR LEN (yourcolumn) - LEN(REPLACE(yourcolumn,'u','')) = 2;
If your intention is to find those entries that contain at least two equal vowels: Just replace the "=" by ">=" or use LIKE instead.
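The length-difference idea is not specific to SQL, so here is a minimal Python sketch of it. The function names are made up for this illustration, and lowercasing the input first is an added assumption (it mirrors the lower(city) suggestion from the other answer rather than the query above).

```python
VOWELS = "aeiou"

def count_by_replace(text: str, char: str) -> int:
    """Count occurrences of char as len(text) - len(text without char)."""
    return len(text) - len(text.replace(char, ""))

def has_vowel_exactly_twice(text: str) -> bool:
    """True if at least one vowel occurs exactly two times (case-insensitive)."""
    lowered = text.lower()
    return any(count_by_replace(lowered, vowel) == 2 for vowel in VOWELS)

def has_vowel_at_least_twice(text: str) -> bool:
    """Same check with >= instead of =, as suggested for 'at least two'."""
    lowered = text.lower()
    return any(count_by_replace(lowered, vowel) >= 2 for vowel in VOWELS)

for city in ["maria", "Anna", "banana", "Rome"]:
    print(city, has_vowel_exactly_twice(city), has_vowel_at_least_twice(city))
```

Note how 'banana' fails the "exactly two" check (three a's) but passes the "at least two" check.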
Try out here: db<>fiddle

SQL Server - Regex pattern match only alphanumeric characters

I have an nvarchar(50) column myCol with values like these: 16 characters long, alphanumeric, and starting with '0':
0b00d60b8d6cfb19, 0b00d60b8d6cfb05, 0b00d60b8d57a2b9
I am trying to delete rows with myCol values that don't match those 3 criteria (16 characters, alphanumeric only, leading '0').
By following this article, I was able to select the records starting with '0'. However, despite the [a-z0-9] part of the regex, it also keeps selecting myCol values containing special characters like 00-d#!b8-d6/f&#b. Below is my select query:
SELECT * from Table
WHERE myCol LIKE '[0][a-z0-9]%' AND LEN(myCol) = 16
How should the expression be changed to select only rows with myCol values that don't contain special characters?
If the value must only contain a-z and digits, and must start with a 0 you could use the following:
SELECT *
FROM (VALUES(N'0b00d60b8d6cfb19'),
(N'0b00d60b8d6cfb05'),
(N'0b00d60b8d57a2b9'),
(N'00-d#!b8-d6/f&#b'))V(myCol)
WHERE V.myCol LIKE '0%' --Checks starts with a 0
AND V.myCol NOT LIKE '%[^0-9A-Za-z]%' --Checks only contains alphanumerical characters (A-z as one range can also cover punctuation in some collations)
AND LEN(V.myCol) = 16;
The second clause works because the LIKE pattern matches any value containing a character that isn't alphanumerical. The NOT then reverses that, meaning that the expression only resolves to TRUE when the value contains nothing but alphanumerical characters.
Pattern matching in SQL Server is not awesome, and there is currently no real regex support.
The % in your pattern is what lets through the special characters you show in your example. The [a-z0-9] only matches a single character. If your values are always 16 characters long and you're only interested in letters and numbers, then you can include a pattern for each position:
SELECT *
FROM Table
WHERE myCol LIKE '[0][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]';
Note: you don't need the AND LEN(myCol) = 16 with this.
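To see why the original pattern lets the bad value through, here is a sketch that expresses both patterns as Python regular expressions. This is only an analogy: Python's re module is not SQL Server's LIKE, and Python's a-z range is strictly lowercase, whereas a LIKE range depends on the column's collation. The names LOOSE and STRICT are invented for the example.

```python
import re

values = [
    "0b00d60b8d6cfb19",
    "0b00d60b8d6cfb05",
    "0b00d60b8d57a2b9",
    "00-d#!b8-d6/f&#b",
]

# Analogue of LIKE '[0][a-z0-9]%': only the first two characters are
# constrained, the rest (%) can be anything.
LOOSE = re.compile(r"0[a-z0-9].*", re.DOTALL)

# Analogue of the 16-position pattern: a 0 followed by exactly 15
# characters that must each be a lowercase letter or a digit.
STRICT = re.compile(r"0[a-z0-9]{15}")

loose_matches = [v for v in values if len(v) == 16 and LOOSE.fullmatch(v)]
strict_matches = [v for v in values if STRICT.fullmatch(v)]

print(loose_matches)
print(strict_matches)
```

The loose version accepts all four values, including the one with special characters; the strict version keeps only the first three.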

postgres replace calculated value in text

I have a table column numbers containing strings like:
1, 2, 2A, 14, 14A, 20
Listed in the desired ascending sort order.
How can I formulate an ORDER BY clause to achieve this order?
By default, Postgres sorts these strings in alphabetical order, which would be:
1, 14, 14A, 2, 20, 2A
Can this be done using only the string-manipulation features that come with Postgres (replace(), regexp_replace() etc.)?
My first idea was:
cut the letter, if present
number * 100
add ascii of letter, if present
This would yield the desired result as the mapped values would be:
100, 200, 265, 1400, 1465, 2000
I could also index this manipulated value to speed up sorting.
Additional restrictions:
I cannot use casts to hex numbers, because eg.: 14Z is valid too.
Ideally, the result is a single expression. I'd need to use this transformation for filtering and sorting like:
SELECT * FROM table WHERE transform(numbers) < 15 ORDER BY transform(numbers)
RESULT:
1, 2, 2A, 14, 14A
I tried to implement my idea, using what I learned from @klin's answer:
Cut the letter and multiply number by 100:
substring('12A' from '(\d+).*')::int*100
Cut the numbers and get ASCII of letter:
ascii(substring('12A' from '\d+([A-Z])'))
Add the two.
This works fine with 12A, but does not work with 12, as the second expression returns NULL and not 0 (numeric zero). Any ideas?
Based on these assumptions:
Numbers consist of digits and optionally one trailing letter and nothing else.
There is always at least one leading digit.
All letters are either upper case [A-Z] or lower case [a-z], but not mixed.
I would enforce that with a CHECK constraint on the table column to be absolutely reliable.
Create a tiny IMMUTABLE SQL function:
CREATE OR REPLACE FUNCTION f_nr2sort(text)
RETURNS int AS
$func$
SELECT CASE WHEN right($1, 1) > '9' COLLATE "C" -- no collation
THEN left($1, -1)::int * 100 + ascii(right($1, 1))
ELSE $1::int * 100 END -- only digits
$func$ LANGUAGE SQL IMMUTABLE;
Optimized for performance based on above assumptions. I replaced all regular expressions with the much cheaper left() and right().
I disabled collation rules with COLLATE "C" for the CASE expression (it's cheaper, too) to assure default byte order of ASCII letters. Letters in [a-zA-Z] sort above '9' and if that's the case for the last letter, we proceed accordingly.
This way we avoid adding NULL values and don't need to fix with COALESCE.
Then your query can be:
SELECT *
FROM tbl
WHERE f_nr2sort(numbers) < f_nr2sort('15C')
ORDER BY f_nr2sort(numbers);
Since the function is IMMUTABLE, you can even create a simple functional index to support this class of queries:
CREATE INDEX tbl_foo_id ON tbl (f_nr2sort(numbers));
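For readers who want to check the mapping without a Postgres instance, here is a rough Python port of the same logic. It is a sketch under the assumptions listed above (at least one leading digit, at most one trailing ASCII letter); the name nr2sort simply mirrors the SQL function and is not part of any library.

```python
def nr2sort(value: str) -> int:
    """Map '14A' to 14 * 100 + ord('A'), and a digits-only '14' to 14 * 100."""
    last = value[-1]
    if last > "9":  # ASCII letters have higher code points than '9'
        return int(value[:-1]) * 100 + ord(last)
    return int(value) * 100  # only digits

numbers = ["1", "2", "14", "20", "2A", "14A"]

# Equivalent of ORDER BY f_nr2sort(numbers)
ordered = sorted(numbers, key=nr2sort)

# Equivalent of WHERE f_nr2sort(numbers) < f_nr2sort('15C')
below_15c = [n for n in ordered if nr2sort(n) < nr2sort("15C")]

print([nr2sort(n) for n in ordered])
print(ordered)
print(below_15c)
```

The mapped values are the ones from the question (100, 200, 265, 1400, 1465, 2000), and digits-only inputs never produce NULL, so no COALESCE-style fix is needed.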
I am new at PostgreSQL, but I found this very useful post:
Alphanumeric sorting with PostgreSQL
So what about something like this:
select val
from test
order by (substring(val, '^[0-9]+'))::int, substring(val, '[^0-9_].*$') desc
Hope it helps

Get rows that contain only certain characters

I want to get only those rows that contain ONLY certain characters in a column.
Let's say the column name is DATA.
I want to get all rows where in DATA are ONLY (must have all three conditions!):
Numeric characters (1 2 3 4 5 6 7 8 9 0)
Dash (-)
Comma (,)
For instance:
Value "10,20,20-30,30" IS OK
Value "10,20A,20-30,30Z" IS NOT OK
Value "30" IS NOT OK
Value "AAAA" IS NOT OK
Value "30-" IS NOT OK
Value "30," IS NOT OK
Value "-," IS NOT OK
Try patindex:
select * from(
select '10,20,20-30,30' txt union
select '10,20,20-30,40' txt union
select '10,20A,20-30,30Z' txt
)x
where patindex('%[^0-9,-]%', txt)=0
For your table, try:
select
DATA
from
YourTable
where
patindex('%[^0-9,-]%', DATA)=0
As per your new edited question, the query should be like:
select
DATA
from
YourTable
where
PATINDEX('%[^0-9,-]%', DATA)=0 and
PATINDEX('%[0-9]%', LEFT(DATA, 1))=1 and
PATINDEX('%[0-9]%', RIGHT(DATA, 1))=1 and
PATINDEX('%[,-][-,]%', DATA)=0
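The four PATINDEX conditions can be mirrored with Python regular expressions to check them against the sample values from the question. This is a sketch, not the SQL itself; the function name and the require_all_three flag are invented for the example. The flag adds a check that a dash and a comma are both present, which the query above does not enforce but which the question's "30 IS NOT OK" example appears to require.

```python
import re

def is_valid(data: str, require_all_three: bool = True) -> bool:
    """Mirror the four PATINDEX checks; optionally demand a dash and a comma."""
    if re.search(r"[^0-9,-]", data):      # a character outside the allowed set
        return False
    if not re.match(r"[0-9]", data):      # first character must be a digit
        return False
    if not re.search(r"[0-9]\Z", data):   # last character must be a digit
        return False
    if re.search(r"[,-][,-]", data):      # two separators in a row
        return False
    if require_all_three and not ("-" in data and "," in data):
        return False                      # extra rule assumed from the question
    return True

samples = ["10,20,20-30,30", "10,20A,20-30,30Z", "30", "AAAA", "30-", "30,", "-,"]
for sample in samples:
    print(repr(sample), is_valid(sample))
```

With require_all_three=False the function follows the query as written, so a bare "30" is accepted; with the default it is rejected, as the question asks.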
Edit: Your question was edited, so this answer is no longer correct. I won't bother updating it since someone else already has updated theirs. This answer does not fulfil the condition that all three character types must be found.
You can use a LIKE expression for this, although it's slightly convoluted:
where data not like '%[^0123456789,!-]%' escape '!'
Explanation:
[^...] matches any character that is not in the ... part. % matches any number (including zero) of any character. So [^0123456789,-] matches any character outside the set that you want to allow.
However: - is a special character inside of [], so we must escape it, which we do by using an escape character, and I've chosen !.
So, you match rows that do not contain (not like) any character that is not in your allowed set.
Another option uses PATINDEX together with the LIKE operator:
SELECT *
FROM dbo.test70
WHERE PATINDEX('%[A-Z]%', DATA) = 0
AND PATINDEX('%[0-9]%', DATA) > 0
AND DATA LIKE '%-%'
AND DATA LIKE '%,%'
Demo on SQLFiddle
As already mentioned, you can use a LIKE expression, but it will only work with some minor modifications; otherwise too many rows will be filtered out.
SELECT * FROM X WHERE T NOT LIKE '%[^0-9!-,]%' ESCAPE '!'
see working example here:
http://sqlfiddle.com/#!3/474f5/6
edit:
to meet all 3 conditions:
SELECT *
FROM X
WHERE T LIKE '%[0-9]%'
AND T LIKE '%-%'
AND T LIKE '%,%'
see: http://sqlfiddle.com/#!3/86328/1
Maybe not the most beautiful but a working solution.

String comparison in SQL Server 2008

Does SQL Server 2008 have a string comparison method that checks which string is supposed to come first (ex 'abc' comes before 'abd' etc)? I need to do a <= comparison.
In what context? <= works in a SELECT statement.
<= works fine. The problem you're having is that you're expecting numeric sorting out of strings. That doesn't work without special handling.
String Sorting
a1 - a10 strings sort in this order:
a1
a10
a2
a3
a4
...
This is because both a1 and a10 start with "a1".
Since they're strings the numeric values are irrelevant. Look what happens when we substitute a-z for 0-9:
ab
aba
ac
ad
ae
Can you see now why you're getting the results you are? In a dictionary, aba comes before ac, and a10 comes before a2.
To solve your problem it's best to split your column into two: one a char and one a number. Some unpleasant expressions can get the right sort order for you, but that is a much worse solution unless you have absolutely no choice.
Here's one way. It may not suit or there may be a more efficient way, but I don't know what all your data is like.
SELECT *
FROM Table
WHERE
Col LIKE 'a%'
-- take everything from the first non-letter onwards and compare it as a number
AND Convert(int, Substring(Col, PatIndex('%[^a-z]%', Col + '0'), 1000)) <= 10
If the alpha part is always one character you can do it more simply. If the numbers can have letters after them then more twiddling is needed.
You could also try a derived table that splits the column into its separate alpha and numeric parts, then put conditions in the outer query.
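The splitting idea can be sketched in Python to show how the two orderings differ. The helper split_label is hypothetical and assumes every value is one or more lowercase letters followed by one or more digits; values with letters after the number would need more work, as noted above.

```python
import re

labels = [f"a{n}" for n in range(1, 11)]  # a1 .. a10

# Plain string sorting: 'a10' lands between 'a1' and 'a2'.
string_order = sorted(labels)

def split_label(label: str) -> tuple:
    """Split 'a10' into ('a', 10) so the numeric part compares as a number."""
    match = re.fullmatch(r"([a-z]+)([0-9]+)", label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return match.group(1), int(match.group(2))

# Sorting on the split parts gives the order most people expect.
natural_order = sorted(labels, key=split_label)

# Filtering on the parts: prefix 'a' and numeric part <= 10.
candidates = labels + ["a11", "b3"]
at_most_ten = [
    label for label in candidates
    if split_label(label)[0] == "a" and split_label(label)[1] <= 10
]

print(string_order)
print(natural_order)
print(at_most_ten)
```

Storing the two parts in separate columns achieves the same effect in the database without recomputing the split for every comparison.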
Collation
Be aware each string and char-based column has a collation setting that determines what letters are sorted together (mostly for case and accents) and this can change the results of an inequality operation.
SELECT *
FROM Table
WHERE Value <= 'abc'
SELECT CASE WHEN Value <= 'abc' COLLATE Latin1_General_CS_AS_KS_WS THEN 1 ELSE 0 END
FROM Table
The collation I used there is case sensitive, accent sensitive.
You can see all the collations available to you like so:
SELECT *
FROM ::fn_helpcollations()