MSSQL: IN with hardcoded value vs. IN with SQL query - sql

I have the following query:
SELECT c004, mesosafe, CAST(SUBSTRING (c000 ,1 , 5) as int) as sortorder, c001, c000, c002
FROM DE_DATA.dbo.t309
WHERE c001 IS NOT NULL
AND c002 = 1
AND mesocomp = 'EMTD'
AND mesoyear IN (
SELECT MAX(mesoyear) AS currmesoyear
FROM DE_DATA.dbo.t001
)
AND CAST(SUBSTRING (c000 ,1 , 5) as int) < 101
ORDER BY c000 ASC;
This query fails because some values of c000 cannot be cast; they look like 008- instead of 00052-. Note the subquery for mesoyear (mesoyear IN ...): if I run that part on its own, I get 1344 as the result.
On the other side this query works:
SELECT c004, mesosafe, CAST(SUBSTRING (c000 ,1 , 5) as int) as sortorder, c001, c000, c002
FROM DE_DATA.dbo.t309
WHERE c001 IS NOT NULL
AND c002 = 1
AND mesocomp = 'EMTD'
AND mesoyear IN (
'1344'
)
AND CAST(SUBSTRING (c000 ,1 , 5) as int) < 101
ORDER BY c000 ASC;
So what is the difference between hardcoded values and the SQL query?
Edit:
I think the reason is that the subquery is evaluated later than the main query. Can that be? What can I do about it?

You are assuming a certain order of execution, in which only the rows you believe are 'correct' get evaluated against the CAST operation. This is a fundamental fallacy. SQL is a declarative, set-oriented language that makes no promise about evaluation order the way imperative languages do. As such, your entire approach is flawed and you're asking the wrong question. Some query execution plans may work and some may fail, but the ones that work will start failing randomly later when the optimizer chooses a different plan.
Ultimately your problem is the data model, the fact that you have to crack this composite c000 field into substrings and cast to get out int values. Use string fields to store strings, use numeric fields to store numbers. Simple as that. What you're trying to achieve will never work.
See also On SQL Server boolean operator short-circuit and T-SQL Functions do not imply a certain order of execution.

Usually MsSql runs subqueries before their parent. Your query probably fails when the inner query finds the first value it cannot cast to the same type-and-size as mesoyear. Did you check whether columns t309.mesoyear and t001.mesoyear have the same type?
Nevertheless, there are some cases when MsSql doesn't follow the usual behaviour. If you think this is the case, consider using the FORCE ORDER query hint (although this situation will eventually emerge again).
Last, but not least, if the CAST operation is not safe, you can (should) check that the value can be cast before incurring a runtime error:
-- use
AND (1=1
AND (ISNUMERIC(SUBSTRING(c000, 1, 5)) = 1)
AND (CAST(SUBSTRING (c000 ,1 , 5) as int) < 101)
)
-- instead of
AND CAST(SUBSTRING (c000 ,1 , 5) as int) < 101
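Note that even the ISNUMERIC guard relies on evaluation order, as the linked articles explain. On SQL Server 2012 and later, TRY_CAST avoids the problem entirely, because it returns NULL instead of raising an error (a sketch, reusing the sample values from the question):

```sql
-- TRY_CAST returns NULL when the conversion fails, so the predicate
-- is safe regardless of the order the plan evaluates things in
SELECT TRY_CAST(SUBSTRING('008-',   1, 5) AS int);  -- NULL
SELECT TRY_CAST(SUBSTRING('00052-', 1, 5) AS int);  -- 52

-- the original predicate then becomes:
-- AND TRY_CAST(SUBSTRING(c000, 1, 5) AS int) < 101
```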

Related

SQL problem with data conversions and divide by zero (pulling my hair out)

I have been googling my butt off but I can't solve this. So I reach for those more experienced than myself. Thank you and please!
I have a table with a few important columns:
AvailableQty which is varchar(200)
hardAllocQty which is varchar(200)
Value which is numeric(19,4)
I am trying to add the quantities in the two qty columns and then divide by the value. But I'm getting "Error converting data type nvarchar to numeric." and sometimes other errors depending on how I try to troubleshoot it such as "Divide by zero error encountered". Here is the code. Any help on how to make this work would be great. I think the issue is somewhere in the first half of the isnull()
select distinct part.Part_Num, part.[Part Description], part.[Part Type],
isnull(
(select (sum(cast(stock.AvailableQty as bigint))+sum(cast(stock.HardAllocQty as bigint)))/
sum(cast(stock.Value as bigint)) from dbo.StockAvailability Stock
WHERE stock.PartNo = part.Part_Num),
(SELECT
CASE
WHEN Avg(cast(NULLIF(po.cost,0) as BIGINT)) IS NULL THEN format(Avg(po.cost),'c')
ELSE format(Avg(cast(NULLIF(po.cost,0) as BIGINT)),'c')
END
from dbo.tblPO_DETAIL_DATA PO WHERE PO.[Part Number] = part.Part_Num AND
cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
)
as 'Avg Cost'
from dbo.UNIQUE_PARTS_LIST part
tldr;
On first inspection, this CASE statement looks offending:
CASE
WHEN Avg(cast(NULLIF(po.cost,0) as BIGINT)) IS NULL
THEN format(Avg(po.cost),'c')
ELSE format(Avg(cast(NULLIF(po.cost,0) as BIGINT)),'c')
END
I can't justify it and suspect it is a cut'n'paste issue, or it's just half-formed. As written it says: when every cost is zero or NULL (so the NULLIF'd average is NULL), average across all rows without conversion; otherwise, exclude the zeros and average with conversion. That's an odd combination.
Do you want the 0s for po.cost to be included in the average cost or do you want to exclude them?
I suspect it is the format(Avg(po.cost),'c') that is causing the problems, mainly because every other reference to po.cost uses a cast and this does not.
Quick fix if you want to exclude the 0s from the Avg Cost (and po.cost already IS numeric):
(SELECT FORMAT(AVG(NULLIF(po.cost,0)),'c')
FROM dbo.tblPO_DETAIL_DATA PO
WHERE PO.[Part Number] = part.Part_Num
AND cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
as 'Avg Cost'
If po.Cost is NOT numeric, then use this instead:
(SELECT FORMAT(AVG(
CASE
WHEN ISNUMERIC(po.cost) = 1
THEN CAST(po.cost AS BIGINT)
ELSE 0
END
),'c')
FROM dbo.tblPO_DETAIL_DATA PO
WHERE PO.[Part Number] = part.Part_Num
AND cast(po.[PO Date] as date) > DATEADD(MONTH,#LookBack60,GETDATE()))
as 'Avg Cost'
But let's explore this in detail...
Before adding aggregates into the mix, take a step back and make sure your source data is correct, run the following queries to validate your conversions:
select cast(stock.AvailableQty as bigint)
, cast(stock.HardAllocQty as bigint)
, cast(stock.Value as bigint)
from dbo.StockAvailability Stock
select cast(po.cost as bigint)
, cast(po.[PO Date] as date)
from dbo.tblPO_DETAIL_DATA po
If the data from those two queries is OK, then conversions shouldn't be the issue. If there is an issue, try to isolate the rows with invalid values. The first option I would try is an ISNUMERIC() check, which you can use in two ways: 1 - just exclude the non-numerics, or 2 - query for the non-numerics so you can inspect the results and come up with a better solution. So run the following to identify the issues:
select stock.AvailableQty, stock.HardAllocQty, stock.Value
from dbo.StockAvailability Stock
WHERE ISNUMERIC(stock.AvailableQty) = 0 OR ISNUMERIC(stock.HardAllocQty) = 0 OR ISNUMERIC(stock.Value) = 0
-- Finding the number of invalid values might help drive the solution?
select po.cost, COUNT(*)
from dbo.tblPO_DETAIL_DATA po
WHERE ISNUMERIC(po.cost) = 0
GROUP BY po.cost
If the invalid value should be counted as a zero, then we can use CASE logic to resolve the value; if there is a specific value that should be treated as NULL, you can use NULLIF(); and if that specific value should instead become 0, you can use ISNULL(NULLIF()).
Why zero vs. null matters: zeros won't affect SUM() aggregates, but they will be counted in the row count when computing AVG().
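A quick illustration of that difference, using a table value constructor:

```sql
-- AVG skips NULLs entirely but counts zeros in the denominator
SELECT AVG(v) FROM (VALUES (10), (20), (0))    AS t(v);  -- 10
SELECT AVG(v) FROM (VALUES (10), (20), (NULL)) AS t(v);  -- 15
```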
If your recordset commonly has a placeholder like 'N/A' stored in columnA instead of null, and you want the whole record excluded from the recordset, an easy option is to add that to the WHERE clause:
WHERE columnA <> 'N/A'
However if you want the rest of the record, but just need this value interpreted as a NULL so that it will be excluded from the aggregate calculations you can use NULLIF():
My personal preference is to alias the column with the same name if an alias is required; the user or next operation doesn't need access to the original value after we have sanitized it like this:
SELECT NULLIF(columnA, 'N/A') AS columnA
Again, if you know a specific value that you want to treat as a zero in aggregates, we can use ISNULL(NULLIF):
SELECT ISNULL(NULLIF(columnA, 'N/A'),0) AS columnA
If you have many different values, you can use nested NULLIF() with COALESCE() or ISNULL(), but it quickly gets unmanageable:
SELECT COALESCE(NULLIF(NULLIF(columnA, 'N/A'), '-'), 0) AS columnA
You can get more control over this type of logic or short-circuit it with a call to ISNUMERIC() in a CASE statement:
-- output as null if non-numeric
SELECT CASE WHEN ISNUMERIC(columnA) = 1 THEN columnA END AS columnA
...
-- output as 0 if non-numeric
SELECT CASE WHEN ISNUMERIC(columnA) = 1 THEN columnA ELSE 0 END AS columnA
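One caveat on ISNUMERIC: it accepts more than you might expect (currency symbols, exponent notation, lone signs), so on SQL Server 2012+ TRY_CAST is the more precise test:

```sql
SELECT ISNUMERIC('$');          -- 1 (a lone currency symbol)
SELECT ISNUMERIC('1e4');        -- 1, yet CAST('1e4' AS int) fails
SELECT TRY_CAST('1e4' AS int);  -- NULL instead of an error
```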
Once you have identified the offending columns and decided whether they should be excluded or treated as zeros, you could re-write your inline sub-queries as a more set-based operation. This process and the reasoning are explained in the article How to Use GROUP BY with Distinct Aggregates and Derived tables, and it applies if there are many parts in the source list.
Having a DISTINCT in a result-set that returns aggregates is a red flag pointing to inefficiencies or flaws in the aggregation logic. A DISTINCT is often put in as a quick fix on a query; where aggregates are concerned, better scoping or a GROUP BY is usually more efficient.
In MS SQL Server, CROSS APPLY is a useful mechanism for evaluating conversions and other functions. It gives us a clean way to separate the steps involved in formatting your data, and to evaluate repetitive logic once and reuse the result instead of re-evaluating the same expression many times in a query.
That is the main point of structuring your query using CROSS APPLY: by defining the expression once, when you need to alter the logic you won't accidentally leave some expressions in their original (and now potentially invalid) state.
CTEs are used here to separate the concerns. You could write them as inline sub-query based joins, but IMO that gets harder to read and interpret:
;
WITH StockByPart as (
SELECT Stock.PartNo
, (SUM(numerics.AvailableQty)+SUM(numerics.HardAllocQty))/NULLIF(SUM(numerics.Value),0) AS AvgPrice -- NULLIF guards the divide-by-zero error
FROM dbo.StockAvailability Stock
CROSS APPLY (SELECT
CASE WHEN ISNUMERIC(stock.AvailableQty) = 1 THEN CAST(stock.AvailableQty as bigint) END AS AvailableQty
, CASE WHEN ISNUMERIC(stock.HardAllocQty) = 1 THEN CAST(stock.HardAllocQty as bigint) END AS HardAllocQty
, CASE WHEN ISNUMERIC(stock.Value) = 1 THEN CAST(stock.Value as bigint) END AS [Value]
) Numerics
GROUP BY Stock.PartNo
)
, POByPart as (
SELECT PO.[Part Number]
, AVG(numerics.cost) AS AvgCost
FROM dbo.tblPO_DETAIL_DATA PO
CROSS APPLY (SELECT
CASE WHEN ISNUMERIC(po.cost) = 1 THEN CAST(po.cost as bigint) ELSE 0 END AS cost
) Numerics
GROUP BY PO.[Part Number]
)
SELECT DISTINCT
part.Part_Num, part.[Part Description], part.[Part Type]
, Stock.AvgPrice
, PO.AvgCost as [Avg Cost]
FROM dbo.UNIQUE_PARTS_LIST part
LEFT OUTER JOIN StockByPart Stock ON stock.PartNo = part.Part_Num
LEFT OUTER JOIN POByPart PO ON PO.[Part Number] = part.Part_Num

Is there a performance benefit to repeating a WHERE filter in subqueries?

I have the following query:
WITH prices AS (
SELECT itemId
, monthId
, MIN(lastPrice / firstPrice) AS gain
FROM (
SELECT *
, FIRST_VALUE(price) OVER (PARTITION BY monthId
ORDER BY date) AS firstPrice
, LAST_VALUE(price) OVER (PARTITION BY monthId
ORDER BY date) AS lastPrice
FROM (
SELECT *
FROM foo
WHERE monthId = 82 -- a repeat of the final WHERE
) x
) x
WHERE firstPrice != 0
AND lastPrice != 0
GROUP BY itemId
, monthId
)
SELECT f.monthId
, f.itemId
, p.gain
FROM foo f
LEFT JOIN prices p
ON f.itemId = p.itemId
AND f.monthId = p.monthId
WHERE gain IS NOT NULL
AND monthId = 82 -- repeated above
As noted, the full query ends with a WHERE monthId = 82 clause, which is also present in the prices subquery.
If I remove the WHERE from the subquery, the result is the same. This makes sense since the result would be naturally constrained by the final WHERE.
However, the version without the subquery WHERE runs dramatically slower (40 vs. 3 minutes). I'm not proficient enough at SQL to know whether this is expected or merely an artifact of statistics (I've run the version with the subquery WHERE many, many times already and only now tried removing it).
It'd make sense for this to improve performance, since it allows the server to perform the operations within prices (there are many more in my real case) only on the subset of rows with monthId = 82. However, I don't know whether the compiler already optimizes the subquery with that filter regardless, in which case the benefit I'm seeing is merely an illusion.
For the record, my actual FIRST/LAST_VALUE calls use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; I omitted that here to simplify the query.
The SQL Server optimizer is smart enough to push where filters into subqueries under many circumstances. However, optimizers make mistakes and they miss situations -- as would appear to be the case here. In general, you can check the query plan to see if it makes a difference.
I would be inclined to repeat the logic, just to be sure that the query is as efficient as possible.
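If you want to verify rather than guess, one way (in SSMS, a sketch) is to run both variants back to back with statistics enabled:

```sql
-- compare logical reads and elapsed time in the Messages output;
-- the actual execution plan also shows whether the monthId predicate
-- was pushed down into the window-function subquery in each variant
SET STATISTICS IO, TIME ON;
```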

Bizarre result in SQL query - PostgreSQL

I discovered this strange behavior with this query:
-- TP4N has stock_class = 'Bond'
select lot.symbol
, round(sum(lot.qty_left), 4) as "Qty"
from ( select symbol
, qty_left
-- , amount
from trade_lot_tbl t01
where t01.symbol not in (select symbol from stock_tbl where stock_class = 'Cash')
and t01.qty_left > 0
and t01.trade_date <= current_date -- only current trades
union
select 'CASH' as symbol
, sum(qty_left) as qty_left
-- , sum(amount) as amount
from trade_lot_tbl t11
where t11.symbol in (select symbol from stock_tbl where stock_class = 'Cash')
and t11.qty_left > 0
and t11.trade_date <= current_date -- only current trades
group by t11.symbol
) lot
group by lot.symbol
order by lot.symbol
;
Run as is, the Qty for TP4N is 1804.42
Run with the two 'amount' lines un-commented, which as far as I can tell should NOT affect the result, yet Qty for TP4N = 1815.36. Only ONE of the symbols (TP4N) has a changed value, all others remain the same.
Run with the entire 'union' statement commented out results in Qty for TP4N = 1827.17
The correct answer, as far as I can tell, is 1827.17.
So, to summarize, I get three different values by modifying parts of the query that, as far as I can tell, should NOT affect the answer.
I'm sure I'm going to kick myself when the puzzle is solved, this smells like a silly mistake.
Likely, what you are seeing is caused by the use of union. This set operator deduplicates the combined resultset returned by the two queries, so adding or removing columns in the unioned sets can change the final resultset (adding more columns generally reduces the chance that two rows collide as duplicates).
As a rule of thumb: unless you do want deduplication, you should use union all (which is also more efficient, since the database does not need to search for duplicates).
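A minimal demonstration of the difference:

```sql
SELECT 1 AS v UNION     SELECT 1;  -- one row:  duplicates removed
SELECT 1 AS v UNION ALL SELECT 1;  -- two rows: kept as-is
```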

NVL function in Oracle

I'm using Oracle and I want to replace null values with 0 but it does not work. This is my query:
with pivot_data as (
select (NVL(d.annulation,0)) as annulation ,
d.id_cc1_annulation as nature ,
t.mois as mois
from datamart_cnss d , ref_temps t
where t.id_temps = d.id_temps
)
select * from pivot_data
PIVOT ( sum(annulation)
for nature in (2 as debit ,1 as redit)
)
order by mois asc;
I guess (because of the missing example) that your idea is to show no nulls in the result after pivoting. If so, your query (I guess) doesn't always return both nature values 1 and 2.
NVL in this case works fine, but you put it in the wrong place. This is the place that generates your NULLs, because no rows are found for the given criteria:
PIVOT ( sum(annulation)
If you apply NVL to that sum's result after the pivot, I am pretty sure it will work as you expect.
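A sketch of that fix, moving the NVL to the outer SELECT, where the post-pivot NULLs actually appear (the column names debit/redit come from the IN aliases):

```sql
with pivot_data as (
  select NVL(d.annulation, 0) as annulation ,
         d.id_cc1_annulation as nature ,
         t.mois as mois
  from datamart_cnss d , ref_temps t
  where t.id_temps = d.id_temps
)
select mois
     , NVL(debit, 0) as debit
     , NVL(redit, 0) as redit
from pivot_data
PIVOT ( sum(annulation)
        for nature in (2 as debit, 1 as redit)
      )
order by mois asc;
```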

WHERE clause on calculated field not working

I have an Access SQL query pulling back results from a Latitude & Longitude input (similar to a store locator). It works perfectly fine until I attempt to put in a WHERE clause limiting the results to only resultants within XXX miles (3 in my case).
The following query works fine without the WHERE distCalc < 3 clause being added in:
PARAMETERS
[selNum] Long
, [selCBSA] Long
, [cosRadSelLAT] IEEEDouble
, [radSelLONG] IEEEDouble
, [sinRadSelLAT] IEEEDouble;
SELECT B.* FROM (
SELECT A.* FROM (
SELECT
CERT
, RSSDHCR
, NAMEFULL
, BRNUM
, NAMEBR
, ADDRESBR
, CITYBR
, STALPBR
, ZIPBR
, simsLAT
, simsLONG
, DEPDOM
, DEPSUMBR
, 3959 * ArcCOS(
cosRadSelLAT
* cosRadSimsLAT
* cos(radSimsLONG - radSelLONG)
+ sinRadSelLAT
* sinRadSimsLAT
) AS distCalc
FROM aBRc
WHERE CBSA = selCBSA
AND cosRadSimsLAT IS NOT NULL
AND UNINUMBR <> selNum
) AS A
ORDER BY distCalc
) AS B
WHERE B.distCalc < 3
ORDER BY B.DEPSUMBR DESC;
When I add the WHERE distCalc < 3 clause, I get the dreaded
This expression is typed incorrectly, or it is too complex to be evaluated.
error.
Given that the value is created in the A sub-query I thought that it would be available in the outer B query for comparative calcs. I could recalculate the distCalc in the WHERE, however, I'm trying to avoid that since I'm using a custom function (ArcCOS). I'm already doing one hit on each row and there is significant overhead involved doing additional if I can avoid it.
The way you have it typed, you are limiting by B.distCalc, which requires calculation of A.distCalc, on which you are asking for a sort. Even if this worked, it would require n^2 calculations to compute.
Try putting the filter on distCalc in the inner query (using the formula for distCalc, not distCalc itself).
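Concretely (a sketch; the distance formula is just repeated from the inner SELECT in place of the alias):

```sql
WHERE CBSA = selCBSA
  AND cosRadSimsLAT IS NOT NULL
  AND UNINUMBR <> selNum
  AND 3959 * ArcCOS(
        cosRadSelLAT
      * cosRadSimsLAT
      * cos(radSimsLONG - radSelLONG)
      + sinRadSelLAT
      * sinRadSimsLAT
      ) < 3
```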
This is not an answer. It's a formatted comment. What happens when you do this:
select *
from (
select somefield, count(*) records
from sometable
group by somefield) temp
If that runs successfully, try it with
where records > 0
at the end. If that fails, you probably need another approach. If it succeeds, start building your real query using baby steps like this. Test early and test often.
I was able to "solve" the problem by pushing the complicated formula into the function and then returning the value (which was then able to be used in the WHERE clause).
Instead of:
3959 * ArcCOS( cosRadSelLAT * cosRadSimsLAT * cos(radSimsLONG - radSelLONG) + sinRadSelLAT * sinRadSimsLAT) AS distCalc
I went with:
ArcCOS2(cosRadSelLAT,cosRadSimsLAT,radSimsLONG, radSelLONG,sinRadSelLAT,sinRadSimsLAT) AS distCalc
The ArcCOS2 function contains the full formula. The upside is that it works; the downside is that it appears to be a tad slower. I appreciate everyone's help on this. Thank you.