Query returns rows outside of `between` range? - sql

I am querying a SQL Server database to get results from a table between two number values. Here is that statement:
select *
FROM [DATA].[dbo].[TableName] with (nolock)
where number between '1400' and '1500'
order by CAST(number as float);
For the most part, the results are within the range as expected. However, I do see some anomalies where a number that has the first four digits within the range is returned as a result. For example:
14550
In the result above, the first four digits are 1455 which would be within the range of 1400 to 1500. My guess is that this has to do with the CAST(number as float) part of the statement. Any suggestions on how I can update this statement to only return numbers between the stated values?
Here is the number info I get when running sp_help:
| Column_name | Type | Computed | Length | Prec | Scale | Nullable | TrimTrailingBlanks | FixedLenNullInSource | Collation |
=============================================================================================================================================================
| NUMBER | varchar | no | 4000 | | | yes | no | yes | SQL_Latin1_General_CP1_CI_AS |

Your comparison is being done as a string, because a column named number is stored as a string and the comparison values are strings. You could easily fix this just by changing the comparison values to numbers:
select *
FROM [DATA].[dbo].[TableName]
where number between 1400 and 1500
order by CAST(number as float);
But this is a hacky solution -- and it will return an error if any of the number values are not numbers. The real solution is to fix the data model, so it is not storing numbers as strings:
alter table tablename alter column number int;
This uses int because all the referenced values in the question are ints.
If you cannot do this because the column is erroneously called number and contains non-numbers, then use a safe conversion function:
select *
FROM [DATA].[dbo].[TableName]
where try_cast(number as float) between 1400 and 1500
order by try_cast(number as float);
Note: I'm also not sure if this is the logic you really want, because it includes 1500. You might really want:
select *
FROM [DATA].[dbo].[TableName]
where try_cast(number as float) >= 1400 and
try_cast(number as float) < 1500
order by try_cast(number as float);

You have to cast the number as an int...
select *
FROM [DATA].[dbo].[TableName]
where CAST(number as int) between 1400 and 1500
order by CAST(number as int);
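To see why 14550 slips through, here is a minimal sketch that runs the same test on literals, first as strings and then as numbers. It assumes SQL Server 2012 or later for TRY_CAST, and the column aliases are made up for illustration.

```sql
-- Compared as strings, '14550' sorts between '1400' and '1500':
-- against '1400' the first two characters tie and then '5' > '0';
-- against '1500' the second character '4' < '5'.
SELECT
    CASE WHEN '14550' BETWEEN '1400' AND '1500'
         THEN 'in range' ELSE 'out of range' END AS compared_as_string,  -- in range
    CASE WHEN TRY_CAST('14550' AS int) BETWEEN 1400 AND 1500
         THEN 'in range' ELSE 'out of range' END AS compared_as_number;  -- out of range
```

The second column only gives the expected answer because both sides of the comparison are numeric.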

Related

How do I query a column where a specific number does not exist in any of the rows of that column

I have ID | Name | Salary with types as Integer | String | Integer respectively.
I need to query the avg of all the rows of the Salary column, and then query the avg of all the rows of the Salary column again, but if any of those rows contain 0, remove 0 from those numbers, and calculate the avg.
So like if Salary returns 1420, 2006, 500, the next query should return 142, 26, 5. Then I calculate the avg of the subsequent numbers not containing 0.
I tried googling my specific problem but am not finding anything close to a solution. I'm not looking for an answer too much more than a shove in the right direction.
My Thoughts
Maybe I need to convert the integer data type to a varchar or string then remove the '0' digit from there, then convert back?
Maybe I need to create a temporary table from the first tables results, and insert them, just without 0?
Any ideas? Hope I was clear. Thanks!
Sample table data:
ID | Name | Salary
---+----------+-------
1 | Kathleen | 1420
2 | Bobby | 690
3 | Cat | 500
Now I need to query the above table but with the 0's removed from the salary rows
ID | Name | Salary
---+----------+-------
1 | Kathleen | 142
2 | Bobby | 69
3 | Cat | 5
You want to remove all 0s from your numbers, then take a numeric average of the result. As you are foreseeing, this requires mixing string and numeric operations.
The actual syntax will vary across databases. In MySQL, SQL Server and Oracle, you should be able to do:
select avg(replace(salary, '0', '') + 0) as myavg
from mytable
This involves two steps of implicit conversion: replace() forces string context, and + 0 turns the result back to a number. In SQL Server, you will get an integer result; if you want a decimal average, add a decimal value instead, that is, + 0.0 rather than + 0.
In Postgres, where implicit conversion is not happening as easily, you would use explicit casts:
select avg(replace(salary::text, '0', '')::int) as myavg
from mytable
This returns a decimal value.
Do you just want conditional aggregation?
select avg(salary), avg(case when salary <> 0 then salary end)
from t;
or do you want division?
select id, name, floor(salary / 10)
from t;
This produces the results you specify for 1420 and 690 (142 and 69), although 500 gives 50 rather than 5, and it has nothing to do with averages.
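For SQL Server specifically, the zero-stripping average can also be written with explicit casts. This is a rough sketch using the sample rows from the question; the table variable @t is hypothetical, and treating a salary that consists only of zeros as NULL (so AVG skips it) is an assumption, not something the question states.

```sql
-- Hypothetical table variable holding the question's sample data
DECLARE @t TABLE (ID int, Name varchar(20), Salary int);
INSERT INTO @t (ID, Name, Salary)
VALUES (1, 'Kathleen', 1420), (2, 'Bobby', 690), (3, 'Cat', 500);

SELECT
    -- plain average, cast to decimal to avoid integer division
    AVG(CAST(Salary AS decimal(18, 2))) AS avg_salary,
    -- strip every '0' digit, then average what remains;
    -- NULLIF turns an empty string (salary was all zeros) into NULL
    AVG(CAST(NULLIF(REPLACE(CAST(Salary AS varchar(20)), '0', ''), '')
             AS decimal(18, 2))) AS avg_without_zeros
FROM @t;
-- For the sample rows: avg_salary = 870, avg_without_zeros = 72 (from 142, 69, 5)
```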

Get Distinct value from a list in SQL Server

I have a DB column that has a comma delimited list:
VALUES ID
--------------------
1,11,32 A
11,12,28 B
1 C
32,12,1 D
When I run my SQL statement, in my WHERE clause I have tried IN, CONTAINS and LIKE with varying degrees of errors and success, but none offer an exact return of what I need.
What I need is a WHERE clause that returns all IDs with a value of '1' (NOT the number) in the list.
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the value (32,28,12). I would expect zero records.
Thanks in advance for your help!
I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES string into something looking like:
,11,12,28,
Then, we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, then every number in the CSV list is now guaranteed to have commas around it.
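The delimiter-wrapping trick can be tried against the question's sample rows. This sketch assumes the lists contain no spaces around the commas; the table variable @t and the variable @target are made up for illustration.

```sql
-- Hypothetical table variable holding the question's sample data
DECLARE @t TABLE ([VALUES] varchar(100), ID char(1));
INSERT INTO @t ([VALUES], ID)
VALUES ('1,11,32', 'A'), ('11,12,28', 'B'), ('1', 'C'), ('32,12,1', 'D');

DECLARE @target varchar(10) = '1';

-- Wrap both the list and the target in commas so only whole items match
SELECT ID
FROM @t
WHERE ',' + [VALUES] + ',' LIKE '%,' + @target + ',%'
ORDER BY ID;
-- With @target = '1' this returns A, C, D; with @target = '2' it returns no rows
```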
If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
from string_split(t.[values], ',') s
where s.value = 1
);
Exactly; I echo what jarlh and Tim say. A relational table is not the right place to store comma-delimited strings.
Here is an approach that can likely use an index if there is one on column x:
select distinct x
from t
cross apply string_split(t.x,',')
where value=1 /*out here you may parameterize, and also could make use of an index each if there is one in value*/
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--SQL Server older versions
with data
as (
SELECT t.c.value('.', 'VARCHAR(1000)') as val
,y
,x
FROM (
SELECT x1 = CAST('<t>' +
REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
,y
,x
FROM t
) a
CROSS APPLY x1.nodes('/t') t(c)
)
select x,y
from data
where val = '1'
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895

updating the column values by performing the rounding function

I need help with the following scenario.
I have a column Quantity in a table, and the values in that column are:
Quantity
234.8735 |
43.7611 |
477.654 |
I want to update each record by rounding it to 2 decimal places, so that the output is:
Quantity
234.87 |
43.76 |
477.65 |
You may simply check the length of the decimal portion:
UPDATE MYTABLE
set Quantity=round(Quantity,length(TO_CHAR(Quantity - floor(Quantity) ) ) - 3);
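If the goal is simply two decimal places for every row, a fixed scale may be enough. A minimal sketch, assuming Quantity is a numeric column and reusing the table name MYTABLE from the statement above; ROUND with a second argument is available in Oracle, SQL Server, MySQL, and PostgreSQL (for numeric types).

```sql
-- Preview the change before touching the data
SELECT Quantity, ROUND(Quantity, 2) AS rounded
FROM MYTABLE;

-- Round every row to two decimal places
UPDATE MYTABLE
SET Quantity = ROUND(Quantity, 2);
```

The column's declared scale is not changed by this, so some databases may still display trailing zeros after the second decimal place.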

Why FLOAT type data comparison is not working even providing the exact data

In real table with the same value
SELECT * FROM float_value WHERE val = 49640.2473896214 -- No data returns
If I round it to the full precision, then it works:
SELECT * FROM float_value WHERE ROUND(val, 10) = ROUND(49640.2473896214, 10) --Returning Data
After that, I created a temporary table with the same value 49640.2473896214, and there the first query, which failed above, works:
CREATE TABLE #testing(Vvalue FLOAT)
INSERT INTO #testing VALUES (49640.2473896214)
SELECT * FROM #testing WHERE Vvalue = 49640.2473896214 -- Simply returning row
Could you please help me figure out why the = comparison does not work in the first query above? If I always have to use ROUND, then I have another problem: figuring out which precision to round to before comparing.
I want a row to be returned when I supply exactly what is visible in the field, i.e. = 49640.2473896214 should return the row.
Thank you in advance.
Approximate-number data types for use with floating point numeric data. Floating point data is approximate; therefore, not all values in the data type range can be represented exactly.
https://learn.microsoft.com/en-us/sql/t-sql/data-types/float-and-real-transact-sql
This can be very frustrating, but some float values will fail comparisons by the equal operator and it is necessary to fix the decimal precision to enable reliable use of equal.
Floating point numbers cannot represent values exactly as you want, for example:
49640.2473896214321 gets stored as 49640.2473896214287378825247287750244140625
49640.2473896214521 gets stored as 49640.247389621450565755367279052734375
17.56 gets stored as... left as exercise
It is also worth noting that the displayed value of floats is usually an approximation. Some environments allow you to change the precision of the displayed value but I could not find any such setting in SQL Server. Having said all that:
SELECT * FROM float_value WHERE val = 49640.2473896214 -- No data returns
That is because 49640.2473896214 does not exist in the database. The value in database could be ...62139... or ...62141..., who knows.
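To look at more of the stored value than the default display shows, one option is CONVERT with style 3, which is documented as always producing 17 digits for float. This assumes SQL Server 2016 or later. The sketch reuses the float_value table and val column from the question; the alias and the 1e-9 tolerance are arbitrary choices for illustration.

```sql
-- Show the candidate rows next to a 17-digit rendering of the stored float
SELECT val,
       CONVERT(varchar(30), val, 3) AS val_17_digits
FROM float_value
WHERE ABS(val - 49640.2473896214) < 1e-9;  -- loose tolerance just to find the row
```

Comparing val_17_digits with the literal used in the failing query should make any difference in the trailing digits visible.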
Would you please help me to figure out this why = comparison is not working in the above?
It should work if you supply the exact value stored in database (used in previous INSERT or UPDATE operation). If you supply the value you see in the database then see notes above.
If I should use ROUND always then it would be another problem to figure out the precision to be rounded and compare.
ROUNDing returns FLOAT for FLOATs, so you could end up with similar issues. The most cited solution for this problem is to subtract the two numbers and check whether the difference is very small:
select * from #testing WHERE ABS(vvalue - 49640.24738962 ) < 1e-11
-- id | vvalue | actual_value
-- 1 | 49640.24738962 | 49640.24738962
select * from #testing WHERE ABS(vvalue - 49640.2473896214 ) < 1e-11
-- id | vvalue | actual_value
-- 2 | 49640.2473896214 | 49640.2473896214
select * from #testing WHERE ABS(vvalue - 49640.2473896214321) < 1e-11
-- id | vvalue | actual_value
-- 3 | 49640.2473896214 | 49640.2473896214321
select * from #testing WHERE ABS(vvalue - 49640.2473896214521) < 1e-11
-- id | vvalue | actual_value
-- 4 | 49640.2473896215 | 49640.2473896214521
The 1e-11 is referred to as epsilon, the amount of tolerance you can accept. You can set it to something smaller but not smaller than 1e-16 as far as I can tell.
Now I understand how the FLOAT column behaves: it appears to keep values only up to 10 decimal places, rounding whatever is given.
For example:
CREATE TABLE #testing(id INT IDENTITY(1,1), Vvalue FLOAT, actual_value VARCHAR(50))
INSERT INTO #testing VALUES
(49640.24738962, '49640.24738962'),
(49640.2473896214, '49640.2473896214'),
(49640.2473896214321, '49640.2473896214321'),
(49640.2473896214521, '49640.2473896214521')
value saved as:
id Vvalue actual_value
1 49640.24738962 49640.24738962 --Saved same as input
2 49640.2473896214 49640.2473896214 --Saved same as input
3 49640.2473896214 49640.2473896214321 --Saved upto 10 precision only by rounding leaving trailing zeros
4 49640.2473896215 49640.2473896214521 --Saved upto 10 precision only by rounding leaving trailing zeros
Now, apparently the following query should return two rows, 2 and 3, but the stored value of row 3 is not exactly the same as our input:
SELECT * FROM #testing WHERE Vvalue = 49640.2473896214
id Vvalue actual_value
2 49640.2473896214 49640.2473896214
In my case, it should return both rows 2 and 3. If I round the compared column to 10 decimal places, I get what I want, and it no longer matters to me what unseen digits the column is holding. I simply want to retrieve what is shown in the table:
SELECT * FROM #testing WHERE ROUND(Vvalue, 10) = 49640.2473896214
id Vvalue actual_value
2 49640.2473896214 49640.2473896214
3 49640.2473896214 49640.2473896214321
Thank you everyone for sharing your ideas and boost up my mind :)

SQL Server - Find records with identical substrings [closed]

I inherited a table that has a column containing hand-entered award numbers. It has been used for many years by many people. The award numbers in general look like this:
R01AR012345-01
R01AR012345-02
R01AR012345-03
Award numbers get assigned each year. Because so many different people have had their hands in this in the past, there isn't a lot of consistency in how these are entered. For instance, an award sequence may appear like this:
R01AR012345-01
1 RO1AR012345-02
12345-03
12345-05A1
1234506
The rule I've been given to find is to return any record in which 5 consecutive integers from that column match with another record.
I know how to match a given string, but am at a loss when the 5 consecutive integers are unknown.
Here's a sample table to make what I'm looking for more clear:
+----------------------+
| table: AWARD |
+-----+----------------+
| ID | AWARD_NO |
+-----+----------------+
| 12 | R01AR015123-01 |
+-----+----------------+
| 13 | R01AR015124-01 |
+-----+----------------+
| 14 | 15123-02A1 |
+-----+----------------+
| 15 | 1 Ro1XY1512303 |
+-----+----------------+
| 16 | R01XX099232-01 |
+-----+----------------+
In the above table, the following IDs would be returned: 12,13,14,15
The five consecutive integers that match are:
12,13: 01512
12,14: 15123
12,15: 15123
In our specific case, ID 13 is a false positive... but we're willing to deal with those on a case-by-case basis.
Here's the desired return set for the above table:
+-----+-----+----------------+----------------+
| ID1 | ID2 | AWARD_NO_1 | AWARD_NO_2 |
+-----+-----+----------------+----------------+
| 12 | 13 | R01AR015123-01 | R01AR015124-01 |
+-----+-----+----------------+----------------+
| 12 | 14 | R01AR015123-01 | 15123-02A1 |
+-----+-----+----------------+----------------+
| 12 | 15 | R01AR015123-01 | 1 Ro1XY1512303 |
+-----+-----+----------------+----------------+
Now... I'm OK with false positives (like 12 matching 13) and duplicates (because if 12 matches 14, then 14 also matches 12). We're looking through something like 18,000 rows. Optimization isn't really necessary in this situation, because it's only needed to be run one time.
This should handle removing duplicates and most false-positives:
DECLARE @SPONSOR TABLE (ID INT NOT NULL PRIMARY KEY, AWARD_NO VARCHAR(50))
INSERT INTO @SPONSOR VALUES (12, 'R01AR015123-01')
INSERT INTO @SPONSOR VALUES (13, 'R01AR015124-01')
INSERT INTO @SPONSOR VALUES (14, '15123-02A1')
INSERT INTO @SPONSOR VALUES (15, '1 Ro1XY1512303')
INSERT INTO @SPONSOR VALUES (16, 'R01XX099232-01')
;WITH nums AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS [Num]
FROM sys.objects
),
cte AS
(
SELECT sp.ID,
sp.AWARD_NO,
SUBSTRING(sp.AWARD_NO, nums.Num, 5) AS [TestCode],
SUBSTRING(sp.AWARD_NO, nums.Num + 5, 1) AS [FalsePositiveTest]
FROM @SPONSOR sp
CROSS JOIN nums
WHERE nums.Num < LEN(sp.AWARD_NO)
AND SUBSTRING(sp.AWARD_NO, nums.Num, 5) LIKE '%[1-9][0-9][0-9][0-9][0-9]%'
-- AND SUBSTRING(sp.AWARD_NO, nums.Num, 5) LIKE '%[0-9][0-9][0-9][0-9][0-9]%'
)
SELECT sp1.ID AS [ID1],
sp2.ID AS [ID2],
sp1.AWARD_NO AS [AWARD_NO1],
sp2.AWARD_NO AS [AWARD_NO2],
sp1.TestCode
FROM cte sp1
CROSS JOIN @SPONSOR sp2
WHERE sp2.AWARD_NO LIKE '%' + sp1.TestCode + '%'
AND sp1.ID < sp2.ID
--AND 1 = CASE
-- WHEN (
-- sp1.FalsePositiveTest LIKE '[0-9]'
-- AND sp2.AWARD_NO NOT LIKE
-- '%' + sp1.TestCode + sp1.FalsePositiveTest + '%'
-- ) THEN 0
-- ELSE 1
-- END
Output:
ID1 ID2 AWARD_NO1 AWARD_NO2 TestCode
12 14 R01AR015123-01 15123-02A1 15123
12 15 R01AR015123-01 1 Ro1XY1512303 15123
14 15 15123-02A1 1 Ro1XY1512303 15123
If IDs 14 and 15 should not match, we might be able to correct for that as well.
EDIT:
Based on the comment from @Serpiton I commented out the creation and usage of the [FalsePositiveTest] field since changing the initial character range in the LIKE clause on the SUBSTRING to be [1-9] accomplished the same goal and slightly more efficiently. However, this change assumes that no valid Award # will start with a 0 and I am not sure that this is a valid assumption. Hence, I left the original code in place but just commented out.
You want to use the LIKE command in your where clause and use a pattern to look for the 5 numbers. See this post here:
There are probably better ways of representing this but the below example looks for 5 digits from 0-9 next to each other in the data anywhere in your column value. This could perform quite slowly however...
Select *
from blah
Where column LIKE '%[0-9][0-9][0-9][0-9][0-9]%'
Create a sql server function to extract the 5 numbers and then use the function in your query.
Perhaps something like:
select GetAwardNumber(AwardNumberField) as AwardNumber
from Awards
group by GetAwardNumber(AwardNumberField)
I will not post the code, but here is an idea of how to do it.
First, you need a table-valued function that returns every number sequence of at least 5 characters found in a string (there are examples on SO).
So for each entry, your function will return a list of numbers.
After that, the query simplifies to something like:
;with res as (
select
id, -- hopefully there is an id on your table
pattern -- pattern is from the list of patterns the udtf returns
from myTable
cross apply udtf_custom(myString) -- myString is the string you need to split
)
select
pattern
from res
group by pattern
having count(distinct id)>1
I have to note that this is for example purposes, there should be some coding and testing involved, but this should be the story with it.
Good luck, hope it helps.
Here's what I ended up with:
SELECT a1.ID as AWARD_ID_1, a2.ID as AWARD_ID_2, a1.AWARD_NO as Sponsor_Award_1, a2.AWARD_NO as Sponsor_Award_2
FROM AWARD a1
LEFT OUTER JOIN AWARD a2
ON SUBSTRING(a1.AWARD_NO,PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%',a1.AWARD_NO + '1'),5) = SUBSTRING(a2.AWARD_NO,PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%',a2.AWARD_NO + '1'),5)
WHERE
a1.AWARD_NO <> '' AND a2.AWARD_NO <> ''
AND a1.ID <> a2.ID
AND a1.AWARD_NO LIKE '%[0-9][0-9][0-9][0-9][0-9]%' AND a2.AWARD_NO LIKE '%[0-9][0-9][0-9][0-9][0-9]%'
There's a possibility that the first substring of five characters might not match (when they should generate a match), but it's close enough for us. :-)
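The caveat about the first five-digit substring can be seen on three of the sample award numbers. This is a small sketch using an inline VALUES list (SQL Server 2008 or later); the aliases v and first_five_digits are made up for illustration.

```sql
-- Extract the first run of five consecutive digits from each award number
SELECT v.AWARD_NO,
       SUBSTRING(v.AWARD_NO,
                 PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', v.AWARD_NO),
                 5) AS first_five_digits
FROM (VALUES ('R01AR015123-01'),
             ('15123-02A1'),
             ('1 Ro1XY1512303')) AS v(AWARD_NO);
-- R01AR015123-01  -> 01512
-- 15123-02A1      -> 15123
-- 1 Ro1XY1512303  -> 15123
```

The first row yields 01512 because its digit run starts with a 0, so it does not join to the other two even though all three contain 15123.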