I was helping a co-worker debug a query that was returning weird results. We narrowed it down to a line that looked like this:
WHERE COL BETWEEN '11201' AND '111226'
The value in COL comes from a call to substring, so it's a string type value. This returns no results.
Naively, I had always assumed that BETWEEN represented >= and <= and that if you call it with strings, it would cast everything to numerical type values. That works just fine if you have something like:
WHERE COL BETWEEN '11201' AND '11226'
Which returns results in the case we are using it.
Clearly, since the second snippet returns results but the first snippet does not, my understanding is mistaken.
I cast everything to numbers and tried it again, and got the expected behavior. From this, it seems like I can conclude that when it does string comparisons, it actually doesn't cast the values - instead, it goes character by character. When it gets to the third character and sees 2 > 1 in the lower bound argument, it quits based on the following behavior from the Oracle documents:
If expr3 < expr2, then the interval is empty.
Can anyone weigh in on whether this is what is truly happening beneath the hood?
Thank you!
The expression:
WHERE COL BETWEEN '11201' AND '111226'
is the same as:
WHERE COL >= '11201' AND COL <= '111226'
This returns nothing because -- as strings -- '11201' > '111226'. This uses alphabetic ordering, so this would be clearer if you used letters:
WHERE COL BETWEEN 'BBCAB' AND 'BBBCCG'
Clearly, nothing falls alphabetically between these values, because 'BBC' sorts after 'BBB'.
The moral? If you want comparisons that are intuitive, use the right types.
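To make the character-by-character ordering concrete, here is a minimal sketch in Python, whose string comparison is also lexicographic. It is an illustration only: the `between` helper is a hypothetical stand-in for SQL's BETWEEN, and the test value '11210' is made up. Oracle's actual collation rules may differ for non-ASCII data.

```python
# Strings compare character by character; numbers compare by magnitude.
lower, upper = "11201", "111226"

# The third characters differ ('2' > '1'), so the "lower" bound
# actually sorts after the "upper" bound when both are strings.
string_bounds_inverted = lower > upper


def between(value, lo, hi):
    """Hypothetical stand-in for SQL BETWEEN: lo <= value <= hi."""
    return lo <= value <= hi


# '11210' is a made-up sample value that lies numerically inside the range.
as_strings = between("11210", lower, upper)
as_numbers = between(int("11210"), int(lower), int(upper))

print(string_bounds_inverted, as_strings, as_numbers)
```

With string bounds the interval is empty, so no value can match; once everything is converted to numbers the same value falls inside the range.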
In the expression below
WHERE COL BETWEEN '11201' AND '111226'
You are comparing a text column COL against text. The string '11201' is lexicographically greater than the string '111226'. In other words, '11201' comes after '111226' in the dictionary, or the former is greater than the latter. This is why no results are coming back. However, if you cast COL to a number, and compare that to numbers, then the comparison might work, assuming there are matching records:
WHERE TO_NUMBER(COL) BETWEEN 11201 AND 111226
I have a column in my table with these values:
PING_TO_ME_20100828_Any87
TO_THESE_D_COLUMN_ENTRY_20200825
TO_THESE_D_20100829_COLUMN_ENTRY
201901_ARE_YOU_TRYING_TO_REACH47
ASK_TO_UOU_201008
I need to extract the date values into a separate column.
My output should be:
20100828
20200825
20100829
201901
201008
Any help is very much appreciated.
You will likely get (and already have gotten) comments telling you to fix your design. While that is probably true, I won't pick apart why you are doing this; I'll just give you the answer you came here for.
Your goal is to pick out either an 8 digit string of integers, or a 6 digit string of integers.
Here is one way you could do it:
SELECT x.y
, COALESCE(SUBSTRING(x.y, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', x.y), 0), 8)
, SUBSTRING(x.y, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9]%', x.y), 0), 6))
FROM (
VALUES ('PING_TO_ME_20100828_Any87'),
('TO_THESE_D_COLUMN_ENTRY_20200825'),
('TO_THESE_D_20100829_COLUMN_ENTRY'),
('201901_ARE_YOU_TRYING_TO_REACH47'),
('ASK_TO_UOU_201008')
) x(y)
Explanation:
Since you are looking for both 8 and 6 digit values, you need to check for the longer of the two first. So first I search for the occurrence of a string of 8 integers using:
NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', x.y), 0)
This returns the first position of a string of 8 integers. The reason I wrap it in a NULLIF() is because if the value is not found, then PATINDEX will return 0.
I use NULLIF() to return NULL in that case, essentially indicating nothing was found. If you pass a NULL value to SUBSTRING() then it also returns NULL.
This is all just a nice way of "failing over" to the 6 character string check.
So there I do the same thing again:
NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9]%', x.y), 0)
Except this time, I only repeat [0-9] six times. And again, I use the NULLIF() trick, so that it returns NULL if no string is found.
Throw that all into SUBSTRING() and COALESCE() and you've got a function that returns the results you're looking for.
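The same "try 8 digits first, then fall back to 6" logic can be sketched outside SQL Server. The Python below is an illustration under that assumption, not a translation of PATINDEX itself; `extract_date_digits` is a hypothetical helper name.

```python
import re


def extract_date_digits(text):
    """Return the first run of 8 digits, else the first run of 6 digits, else None."""
    # Check the longer width first, mirroring the COALESCE order in the query.
    for width in (8, 6):
        match = re.search(r"[0-9]{%d}" % width, text)
        if match:
            return match.group(0)
    return None


samples = [
    "PING_TO_ME_20100828_Any87",
    "TO_THESE_D_COLUMN_ENTRY_20200825",
    "TO_THESE_D_20100829_COLUMN_ENTRY",
    "201901_ARE_YOU_TRYING_TO_REACH47",
    "ASK_TO_UOU_201008",
]
results = [extract_date_digits(s) for s in samples]
print(results)
```

Like the SQL version, this only looks for digit runs; it does not check that the digits form a valid date.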
Potential downsides
There are a couple of downsides to this method.
It is not checking for a valid date, it's simply looking for a string of either 8 integers, or 6 integers. It could be 12345678 and it would still detect and return that.
If there are strings of integers longer than 8 digits, it will grab only the first 8 characters.
If there are multiple occurrences of 6 or 8 character integer strings...it will only return the first one.
There are much more robust ways you could write this, but it all depends on your data and what you need to do.
Other methods
Another way it could be done depending on which version of SQL Server you are using, is using STRING_SPLIT().
SELECT x.y, s.[value]
FROM (
VALUES ('PING_TO_ME_20100828_Any87'),('TO_THESE_D_COLUMN_ENTRY_20200825'),('TO_THESE_D_20100829_COLUMN_ENTRY'),('201901_ARE_YOU_TRYING_TO_REACH47'),('ASK_TO_UOU_201008')
) x(y)
CROSS APPLY (
SELECT [value]
FROM STRING_SPLIT(x.y, '_')
WHERE [value] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
OR [value] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]'
) s
This method handles a couple of the downsides mentioned earlier. For example, it will ONLY return integer strings of length 6 or 8. It will also return ALL integer strings of length 6 or 8 and not just the first one.
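As a rough sketch of the split-and-filter idea, assuming '_' is always the delimiter: the Python below keeps every token that is exactly 6 or 8 digits long. `date_tokens` is a hypothetical name, and this is not how STRING_SPLIT is implemented.

```python
def date_tokens(text):
    """Split on '_' and keep tokens that are exactly 6 or 8 ASCII digits."""
    return [
        token
        for token in text.split("_")
        if token.isascii() and token.isdigit() and len(token) in (6, 8)
    ]


print(date_tokens("PING_TO_ME_20100828_Any87"))
print(date_tokens("A_201008_B_20100829"))
print(date_tokens("X_123456789_Y"))
```

A 9-digit token is rejected outright instead of being truncated to its first 8 digits, and every matching token is returned, not just the first.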
There are other ways to identify the strings as well, like using ISNUMERIC(s.[value]) or TRY_CONVERT(int, s.[value]).
It all depends on how you are using this code. If it runs fast enough and it's a one-off script, then it really doesn't matter. If it's running for millions of records at a time, then you should experiment with other methods.
I am trying to compare two strings using a SQL query. For example, in table A I have A123.45 and in table B I have A12345. These two strings are the same if I ignore the decimal point, so as output I would want table A's value.
First, to avoid the XY problem, it's a little unclear to me why you'd want to do this in the first place - I'm not sure exactly why 123.45 should be equal to 12345. Definitely something to think about.
With that said, if you insist, you can do something like the following:
select case when replace(cast(floatingPointNumber as varchar(50)), '.', '') = cast(yourInteger as varchar(50)) then 1 else 0 end
from YourTable
Obviously, floatingPointNumber is a float and yourInteger is an integer.
I'm not sure what platform you're using since you didn't tag it but I wrote/tested this in SQL Server. You can do something similar in Oracle/MySQL if that's what you're using.
Basically, what this is doing is casting both the floating point number and the integer to strings, removing the decimal from the floating point number, and comparing them. If they're equal, it returns 1; otherwise it returns 0.
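The same remove-the-decimal-point-then-compare idea, sketched in Python on the string values from the question. `same_ignoring_decimal_point` is a hypothetical helper, and the sketch assumes both values are already strings.

```python
def same_ignoring_decimal_point(a, b):
    """Compare two strings after removing every '.' from each."""
    return a.replace(".", "") == b.replace(".", "")


print(same_ignoring_decimal_point("A123.45", "A12345"))
print(same_ignoring_decimal_point("A123.45", "A12346"))
```

Note that this treats 'A1.2345' and 'A123.45' as equal too, which may or may not be what you want.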
I was given the following statement:
LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
and am having trouble contacting the person who wrote it. Could someone explain what that statement does, and whether it is valid SQL? The goal of the statement is to compare the numeric characters in f.field4 to the DealNumber. DNumber and DealNumber are the same except for a wildcard at the end of DealNumber.
I am trying to use it in the context of the following statement:
SELECT d.Description, d.FileID, d.DateFiled, u.Contact AS UserFiledName, d.Pages, d.Notes
FROM Documents AS d
LEFT JOIN Files AS f ON d.FileID=f.FileID
LEFT JOIN Users AS u ON d.UserFiled=u.UserID
WHERE SUBSTRING(f.Field8, 2, 1) = #LocationIDString
AND f.field4=#DNumber OR LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber"
but my code keeps timing out when I execute it.
It's the CASE clause which is slowing things down, not LEFT per se (although LEFT may prevent the use of indexes, which will have an effect).
The CASE determines what should be compared with #DealNumber, and I think it does the following...
If f.field4 contains no non-digit character (PATINDEX returns 0), use LEFT(f.field4, LEN(f.field4))=#DealNumber: that's equivalent to f.field4=#DealNumber.
Otherwise, use {the digits before the first non-digit character}=#DealNumber. If f.field4 starts with a non-digit, that prefix is empty.
This sort of computation isn't very efficient.
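To trace what the LEFT/CASE expression computes for a single value, here is a small Python sketch. It assumes PATINDEX('%[^0-9]%', ...) returns the 1-based position of the first non-digit character and 0 when there is none; `leading_digits` is a hypothetical name.

```python
import re


def leading_digits(field):
    """Mimic LEFT(field, CASE WHEN PATINDEX(...) = 0 THEN LEN(field) ELSE PATINDEX(...) - 1 END)."""
    first_non_digit = re.search(r"[^0-9]", field)
    if first_non_digit is None:
        # No non-digit anywhere: the whole value is kept.
        return field
    # Keep everything before the first non-digit (possibly nothing).
    return field[: first_non_digit.start()]


print(repr(leading_digits("12345")))
print(repr(leading_digits("123ABC")))
print(repr(leading_digits("ABC")))
```

Because this has to be evaluated for every row before the comparison, an index on the column cannot help with it.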
I would attempt the following, which makes the large assumption that a mixed string can be cast as an integer — that is, that if you convert ABC to an integer you get zero, and if you convert 123ABC you get what can be converted, 123. I can't find any documentation which says whether that is possible or not.
AND f.field4=#DNumber
OR (f.field4=#DealNumber AND integer(f.field4)=0)
OR (integer(f.field4)=#DealNumber)
The first line is the same as your AND. The second line selects f.field4=#DealNumber only if f.field4 does not start with a number. The third line selects where the initial numeric portion of f.field4 is the same as #DealNumber.
As I say, there is an assumption here that integer() will work in this way. You may need to define a CAST function to do that conversion with strings. That's rather beyond me, although I would be confident that even such a function would be faster than a CASE as you currently have.
From the doc:
left(str text, n int)
Return first n characters in the string. When n is negative, return all but last |n| characters.
In MS SQL, I need an approach to determine the largest scale being used by the rows for a certain decimal column.
For example Col1 Decimal(19,8) has a scale of 8, but I need to know if all 8 are actually being used, or if only 5, 6, or 7 are being used.
Sample Data:
123.12345000
321.43210000
5255.12340000
5244.12345000
For the data above, I'd need the query to either return 5, or 123.12345000 or 5244.12345000.
I'm not concerned about performance, I'm sure a full table scan will be in order, I just need to run the query once.
Not pretty, but I think it should do the trick:
-- Find the first non-zero character in the reversed string...
-- And then subtract from the scale of the decimal + 1.
SELECT 9 - PATINDEX('%[1-9]%', REVERSE(Col1))
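To check the arithmetic, here is a Python sketch of the same reversed-string idea applied to the sample data. It assumes the values arrive as text with all 8 decimal places shown; `used_scale` is a hypothetical helper.

```python
def used_scale(decimal_text, declared_scale=8):
    """Scale actually used: (declared scale + 1) minus the 1-based position
    of the first non-zero digit in the reversed text."""
    reversed_text = decimal_text[::-1]
    # 1-based position, or 0 when there is no non-zero digit (like PATINDEX).
    position = next(
        (i for i, ch in enumerate(reversed_text, start=1) if ch in "123456789"),
        0,
    )
    return declared_scale + 1 - position


samples = ["123.12345000", "321.43210000", "5255.12340000", "5244.12345000"]
scales = [used_scale(s) for s in samples]
print(scales, max(scales))
```

In SQL the per-row expression would still need to be wrapped in MAX() to get a single answer for the column. Values with no fractional digits (or a value of zero) fall outside the expected range and need separate handling.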
I like @Michael Fredrickson's answer better and am only posting this as an alternative for specific cases where the actual scale is unknown but is certain to be no more than 18:
SELECT LEN(CAST(CAST(REVERSE(Col1) AS float) AS bigint))
Please note that, although there are two explicit CAST calls here, the query actually performs two more implicit conversions:
As the argument of REVERSE, Col1 is converted to a string.
The bigint is cast as a string before being used as the argument of LEN.
SELECT
MAX(CHAR_LENGTH(
SUBSTRING(column_name::text FROM '\.(\d*?)0*$')
)) AS max_scale
FROM table_name;
*? is the non-greedy version of *, so \d*? catches all digits after the decimal point except trailing zeros.
The pattern contains a pair of parentheses, so the portion of the text that matched the first parenthesized subexpression (that is \d*?) is returned.
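The behaviour of the non-greedy group is easy to check in isolation. Python's `re` module supports the same `*?` quantifier, so the sketch below reuses the pattern unchanged; `used_scale_regex` is a hypothetical helper and this is not PostgreSQL code.

```python
import re


def used_scale_regex(decimal_text):
    """Length of the fractional part once trailing zeros are dropped."""
    match = re.search(r"\.(\d*?)0*$", decimal_text)
    # No decimal point at all: nothing to measure.
    return len(match.group(1)) if match else None


samples = ["123.12345000", "321.43210000", "5255.12340000", "5244.12345000"]
scales = [used_scale_regex(s) for s in samples]
print(scales, max(scales))
```

Zeros inside the fraction are kept (for example '1.05010000' gives 4), because only the trailing run is consumed by `0*$`.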
References:
https://www.postgresql.org/docs/9.6/static/sql-createcast.html
https://www.postgresql.org/docs/9.6/static/functions-matching.html
Note this will scan the entire table:
SELECT TOP 1 [Col1]
FROM [Table]
ORDER BY LEN(PARSENAME(CAST([Col1] AS VARCHAR(40)), 1)) DESC
We have a legacy table where one of the columns, which is part of a composite key, was manually filled with values:
code
------
'001'
'002'
'099'
etc.
Now we have a feature request in which we must know MAX(code) in order to give the user the next possible value. In the example above, the next value is '100'.
We tried to experiment with this but we still can't find any reasonable explanation how DB2 engine calculates that
MAX('001', '099', '576') is '576'
MAX('099', '99', 'www') is '99' and so on.
Any help or suggestion would be much appreciated!
You already have the answer to getting the maximum numeric value, but to answer the other part with regard to 'www','099','99'.
The AS/400 uses EBCDIC to store values. This differs from ASCII in several ways; the most important for your purposes is that alphabetic characters come before numbers, which is the opposite of ASCII.
So for your MAX(), the three strings are sorted by EBCDIC value and the highest one is used, giving this order:
'www'
'099'
'99 '
As you can see, your '99' string is really '99 ', so it is higher than the one with the leading zero.
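The ordering can be reproduced with Python's built-in `cp037` codec, one common EBCDIC code page. This is a sketch only: it assumes the system uses CCSID 37 and that the column pads '99' with a trailing blank.

```python
values = ["www", "099", "99 "]


def ebcdic_key(text):
    """Sort key: the bytes of the text in EBCDIC code page 037 (an assumption)."""
    return text.encode("cp037")


ebcdic_order = sorted(values, key=ebcdic_key)
ascii_order = sorted(values, key=lambda text: text.encode("ascii"))

print(ebcdic_order, max(values, key=ebcdic_key))
print(ascii_order)
```

In EBCDIC the letters sort below the digits and the blank sorts below both, so '99 ' wins; in ASCII the same three strings come out in a different order with 'www' last.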
Cast it to int before applying max()
For the numeric maximum -- filter out the non-numeric values and cast to a numeric for aggregation:
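SELECT MAX(INT(FLD1))
FROM MYTABLE -- MYTABLE is a placeholder table name
WHERE FLD1 <> ' '
-- Turn every digit into a blank; only all-numeric values end up entirely blank
AND TRANSLATE(FLD1, ' ', '0123456789') = ' '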
SQL Reference: TRANSLATE
And the reasonable explanation:
SQL Reference: MAX
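Putting the pieces together for the original goal (the next code after the numeric maximum), here is a minimal Python sketch. It assumes codes are three characters wide and zero-padded, and that non-numeric entries should simply be ignored; `next_code` is a hypothetical helper.

```python
def next_code(codes, width=3):
    """Ignore non-numeric codes, take the numeric maximum, and return
    the next value as a zero-padded string."""
    numeric = [
        int(code.strip())
        for code in codes
        if code.strip().isascii() and code.strip().isdigit()
    ]
    # With no numeric codes at all, start from 0 so the first code is 1.
    return str(max(numeric, default=0) + 1).zfill(width)


print(next_code(["001", "002", "099"]))
print(next_code(["099", "99 ", "www"]))
```

If the maximum is already 999, the result is four characters long, so the column width would need to be checked separately.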
MAX works correctly for the column's declared type. If you want the maximum of integer values, convert the values to integers before calling MAX. But I see you are mixing in the string 'www'; how do you imagine that should work?
Keep only the integer-only values, cast them to int, and call MAX. This is not a well-designed solution, but given your problem I think it is enough.
Sharing the PostgreSQL solution that worked for me.
Suppose temporary_id has a character type in the database. The query below casts the character value to an integer, so the result comes back as an int.
SELECT MAX(CAST (temporary_id AS Integer)) FROM temporary
WHERE temporary_id IS NOT NULL
My requirement called for the MAX() aggregate function. You can remove it and the cast will work the same way.