Calculating SQL Average with non-numeric data in Table - sql

I have this SQL query that is failing on this nvarchar, even though it seems to me that it is properly guarded for a cast to work. I am trying to get the average of the values for correctly formatted numerical values on a database that I only have read access to.
select TagName, count(TagName) as Freq,
sum(isnumeric(value)) as GoodNums,
avg(case isnumeric(value) when 1 then cast(value as numeric) else 0 end) as "Avg",
Min(Timestamp) as StartTime,
Max(Timestamp) as EndTime
from HH_Data_9 group by TagName
However, I never fully grokked the SQL syntax for combining CASE, aggregate functions, and GROUP BY, so maybe I am just writing it wrong (I tried quite a few things before posting this). Note that the "GoodNums" column works and gives reasonable answers, but when I add the "Avg" column the whole query errors out with:
Msg 8114, Level 16, State 5, Line 1
Error converting data type nvarchar to numeric.
This is Microsoft SQL Server 2014 by the way. Any ideas?

Just want to caution that this approach is very dangerous. Consider the following table structure:
CREATE TABLE TestTable
(
FieldType NVARCHAR(50) ,
FieldValue NVARCHAR(50)
)
GO
INSERT INTO dbo.TestTable
VALUES ( 'INT', '5' ),
( 'INT', '15' ),
( 'MONEY', '5.5' )
SELECT *
FROM dbo.TestTable
WHERE FieldType = 'INT'
AND CAST(FieldValue AS INT) > 10
On my machine this query works, but many people assume that the predicates will be evaluated from left to right. That is incorrect: the SQL Server engine may decide to evaluate them from right to left, running the CAST before the FieldType check, and in that case you will get a "Conversion failed" error. This is called the all-at-once principle. It does not apply to your example, but I wanted to mention it.
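As for the original query, one possible cause is that ISNUMERIC returns 1 for some strings that still cannot be cast to numeric (for example '$', '.', or '1e5'), so the CASE guard lets them through. A minimal sketch, assuming SQL Server 2012 or later (the question mentions 2014) and assuming numeric(18, 4) is wide enough for the data, is to use TRY_CAST, which returns NULL instead of raising an error:

```sql
-- Sketch only: numeric(18, 4) is an assumed precision/scale, adjust to the data.
SELECT TagName,
       COUNT(TagName)                          AS Freq,
       COUNT(TRY_CAST(value AS numeric(18, 4))) AS GoodNums,  -- counts only convertible values
       AVG(TRY_CAST(value AS numeric(18, 4)))   AS [Avg],     -- NULLs (bad values) are ignored
       MIN([Timestamp])                        AS StartTime,
       MAX([Timestamp])                        AS EndTime
FROM HH_Data_9
GROUP BY TagName;
```

Because AVG and COUNT(expression) skip NULLs, badly formatted values are left out of the average rather than being counted as 0, which is what the ELSE 0 branch in the original query would do.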

Related

Union leads to error converting varchar to numeric

I'm reposting my question from yesterday with the addition of code.
I'm creating a new table LAB_RESULT as a union of two tables (Labs and CTE). A column called result_num (numeric (18,5)) is causing problems in LAB_RESULT.
Table LAB_RESULT is declared first with explicit data types.
CREATE TABLE dbo.LAB_RESULT
(
[LAB_RESULT_CM_ID] VARCHAR (36),
[RESULT_NUM] NUMERIC (18,5)
)
INSERT INTO [dbo].[LAB_RESULT] (LAB_RESULT_CM_ID, RESULT_NUM)
SELECT LAB_RESULT_CM_ID, RESULT_NUM
FROM [etl].[lab_result]
LAB_RESULT draws from varchar columns in tables LABS and CTE. I have used try_cast in both A and B to make sure nothing slips through.
So when I select from (LABS U CTE), I get:
Msg 8114
Error converting data type varchar to numeric
but when I select from only LABS or only CTE (all there just commented out) the data loads fine.
This is happening even if I use cast(null as numeric).
SELECT LAB_RESULT_CM_ID, RESULT_NUM
FROM
((SELECT
NEWID() AS LAB_RESULT_CM_ID,
CAST(NULL AS NUMERIC(18, 5)) AS RESULT_NUM
FROM [dbo].[Labs] as labs)
UNION
(SELECT
NEWID() as LAB_RESULT_CM_ID,
CAST(NULL as NUMERIC(18, 5)) AS RESULT_NUM
FROM CTE)) AS combined
What gives? I would really appreciate some insight. TIA!

Finding max value for a column containing hierarchical decimals

I have a table where the column values are like '1.2.4.5', '3.11.0.6',
'3.9.3.14', '1.4.5.6.7', 'N/A', etc. I want to find the max of that column. However, when I use this query I do not get the max value.
(SELECT max (CASE WHEN mycolumn = 'N/A'
THEN '-1000'
ELSE mycolumn
END )
FROM mytable
WHERE column like 'abc')
I am getting 3.9.3.14 as max value instead of 3.11....
Can someone help me?
Those aren't really decimals - they're strings containing multiple dots, so it's unhelpful to think of them as being "decimals".
We can accomplish your query with a bit of manipulation. There is a type built into SQL Server that more naturally represents this kind of structure: hierarchyid. If we convert your values to this type, then we can find the MAX fairly easily:
declare @t table (val varchar(93) not null)
insert into @t(val) values
('1.2.4.5'),
('3.11.0.6'),
('3.9.3.14'),
('1.4.5.6.7')
select MAX(CONVERT(hierarchyid,'/' + REPLACE(val,'.','/') + '/')).ToString()
from @t
Result:
/3/11/0/6/
I leave fully converting this string representation back into the original form as an exercise for the reader. Alternatively, I'd suggest that you may want to start storing your data using this datatype anyway.
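One way to sidestep the conversion back, sketched here as an alternative rather than as part of the answer above, is to order by the hierarchyid value and return the original string. The table variable @t and its sample rows are the same illustrative data as above:

```sql
-- Sketch: sort by the hierarchyid form, but return the untouched original text.
DECLARE @t TABLE (val varchar(93) NOT NULL);

INSERT INTO @t (val) VALUES
    ('1.2.4.5'),
    ('3.11.0.6'),
    ('3.9.3.14'),
    ('1.4.5.6.7');

SELECT TOP (1) val
FROM @t
ORDER BY CONVERT(hierarchyid, '/' + REPLACE(val, '.', '/') + '/') DESC;
```

Values such as 'N/A' would need to be filtered out or turned into NULL first, since they do not convert to hierarchyid.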
MAX() on values stored as text performs an alphabetic sort.
Use FIRST_VALUE and HIERARCHYID:
SELECT DISTINCT FIRST_VALUE(t.mycolumn) OVER(
ORDER BY CONVERT(HIERARCHYID, '/' + REPLACE(NULLIF(t.mycolumn,'N/A'), '.', '/') + '/') DESC) AS [Max]
FROM #mytable t

Can SELECT expressions sometimes be evaluated for rows not matching WHERE clause?

I would like to know if it's possible for expressions that are part of the SELECT statement list to be evaluated for rows not matching the WHERE clause?
From the execution order documented here, it seems that the SELECT gets evaluated long after the WHERE, however I ran into a very weird problem with a real-life query similar to the query below.
To put you in context, in the example, the SomeOtherTable has a a_varchar column which always contains numerical values for the code 105, but may contain non-numerical values for other codes.
The query statement works:
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
The following query complains about being unable to cast a_varchar to int:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
And finally, the following query works:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT
an_id,
CASE code WHEN 105 THEN CAST(a_varchar AS int) ELSE NULL END
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
Therefore, the only explanation I could find was that with the JOIN, the query gets optimized differently in a way that CAST(a_varchar AS int) gets executed even if code <> 105.
The queries are run against SQL SERVER 2008.
Absolutely.
The documentation that you reference has a section called Logical Processing Order of the SELECT statement. This is not the physical processing order. It explains how the query itself is interpreted. For instance, an alias defined in the select clause cannot be referenced in the where clause, because the where clause is logically processed first.
In fact, SQL Server has the ability to optimize queries by doing various data transformation operations when it reads the data. This is a nice performance benefit, because the data is in memory, locally, and the operations can simply be done in place. However, the following can fail with a run-time error:
select cast(a_varchar as int)
from table t
where a_varchar not like '%[^0-9]%';
The filter is applied after the attempt at conversion, in the real process flow. I happen to consider this a bug; presumably, the folks at Microsoft do not think so, because they have not bothered to fix this.
Two workarounds are available. The first is try_convert() (available in SQL Server 2012 and later), which does conversions and returns NULL for a failure instead of a run-time error. The second is the case expression:
select (case when a_varchar not like '%[^0-9]%' then cast(a_varchar as int) end)
from table t
where a_varchar not like '%[^0-9]%';
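For the first workaround, a minimal sketch using the table and column names from the question might look like the following. Note that it assumes SQL Server 2012 or later, so it would not run on the 2008 instance mentioned in the question:

```sql
-- Sketch: TRY_CONVERT yields NULL for non-numeric text instead of raising an error,
-- so it no longer matters whether the conversion runs before or after the filter.
SELECT an_id, TRY_CONVERT(int, a_varchar) AS an_integer
FROM SomeOtherTable
WHERE code = 105;
```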

sql server rewrites my query incorrectly?

There is dirty data in the input.
We are trying to clean up the dataset and then run some calculations on the cleaned data.
declare @t table (str varchar(10))
insert into @t select '12345' union all select 'ABCDE' union all select '111aa'
;with prep as
(
select *, cast(substring(str, 1, 3) as int) as str_int
from @t
where isnumeric(substring(str, 1, 3)) = 1
)
select *
from prep
where 1=1
and case when str_int > 0 then 'Y' else 'N' end = 'Y'
--and str_int > 0
The last two lines do the same thing. The first one works, but if you uncomment the second one the query crashes with: Conversion failed when converting the varchar value 'ABC' to data type int.
Evidently, SQL Server is rewriting the query and mixing all the conditions together.
My guess is that it treats 'case' as a heavy operation and performs it as the last step, which is why the workaround with case works.
Is this behavior documented in any way, or is it a bug?
This is a known issue with SQL Server, and Microsoft does not consider it a bug although users do. The difference between the two queries is the execution path. One is doing the conversion before the filtering, the other after.
SQL Server reserves the right to re-order the processing. The documentation does specify the logical processing of clauses as:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
With (presumably but not explicitly documented here) CTEs being logically processed first. What does logically processed mean? Well, it doesn't mean that run-time errors are caught. It really determines the scope of identifiers during the compile phase.
When SQL Server reads from a data source, it can add new variables in. This is a convenient time to do this, because everything is in memory. However, this might occur before the filtering, which is what is causing the error when it occurs.
The fix to this problem is to use a case statement. So, the following CTE will usually work:
with prep as (
select *, (case when isnumeric(substring(str, 1, 3)) = 1 and str not like '%.%'
then cast(substring(str, 1, 3) as int)
end) as str_int
from @t
where isnumeric(substring(str, 1, 3)) = 1
)
Looks weird. And I think Redmond thinks so too. SQL Server 2012 introduced try_convert() (see here) which returns NULL if the conversion fails.
It would also help if you could instruct SQL Server to materialize CTEs. That would also solve the problem in this case. You can vote on adding such an option to SQL Server here.
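As a rough sketch of the try_convert() route applied to the sample data from the question (assuming SQL Server 2012 or later), the filter on isnumeric() can be dropped, because a failed conversion simply produces NULL:

```sql
-- Sketch: same sample rows as the question, with TRY_CONVERT instead of CAST.
DECLARE @t TABLE (str varchar(10));

INSERT INTO @t
SELECT '12345' UNION ALL SELECT 'ABCDE' UNION ALL SELECT '111aa';

WITH prep AS (
    SELECT *, TRY_CONVERT(int, SUBSTRING(str, 1, 3)) AS str_int
    FROM @t
)
SELECT *
FROM prep
WHERE str_int > 0;   -- rows where the conversion failed have NULL here and are filtered out
```

With this form the plain comparison on str_int should be safe regardless of where the optimizer places the conversion, since no conversion error can be raised.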

Check if field is numeric, then execute comparison on only those field in one statement?

This may be simple, but I am no SQL whiz so I am getting lost. I understand that sql takes your query and executes it in a certain order, which I believe is why this query does not work:
select * from purchaseorders
where IsNumeric(purchase_order_number) = 1
and cast(purchase_order_number as int) >= 7
MOST of the purchase_order_number fields are numeric, but we recently introduced alphanumeric ones. The data I am trying to get is to see if '7' is greater than the highest numeric purchase_order_number.
The IsNumeric() function filters out the alphanumeric fields fine, but doing the subsequent cast comparison throws this error:
Conversion failed when converting the nvarchar value '124-4356AB' to data type int.
I am not asking what the error means, that is obvious. I am asking if there is a way to accomplish what I want in a single query, preferably in the where clause due to ORM constraints.
Does this work for you?
select * from purchaseorders
where (case when IsNumeric(purchase_order_number) = 1
then cast(purchase_order_number as int)
else 0 end) >= 7
You can do a select with a subselect:
select * from (
select * from purchaseorders
where IsNumeric(purchase_order_number) = 1) as correct_orders
where cast(purchase_order_number as int) >= 7
Try this:
select * from purchaseorders
where try_cast(purchase_order_number as int) >= 7
I have to check which columns contain only numeric values.
Currently every field in the table is set to nvarchar(max), like tableName (field1 nvarchar(max), field2 nvarchar(max), field3 nvarchar(3)), and tableName has 25 lakh (2.5 million) rows.
On a manual check, Field2 appears to contain only numeric values. How can I check with T-SQL whether the whole column (Field2) contains only numeric or NULL values, and find the longest value length in the column?