SQL assign variable with subquery - sql

I have a question about the following two SQL statements:
declare @i1 bit, @b1 bit
declare @i2 bit, @b2 bit
declare @t table (Seq int)
insert into @t values (1)
-- verify data
select case when (select count(1) from @t n2 where 1 = 2) > 0 then 1 else 0 end
-- result 0
select @i1 = 1, @b1 = case when @i1 = 1 or ((select count(1) from @t n2 where 1 = 2) > 0) then 1 else 0 end from @t n where n.Seq = 1
select @i1, @b1
-- result 1, 0
select @i2 = 1, @b2 = case when @i2 = 1 or (0 > 0) then 1 else 0 end from @t n where n.Seq = 1
select @i2, @b2
-- result 1, 1
Before running it, I thought the CASE condition would be evaluated as null = 1 or (0 > 0), and that it would return 0.
Now I am wondering why the second statement returns 1.

Just to extend @Giorgi's answer:
See this execution plan:
Since @i2 is evaluated first (@i2=1), case when @i2 = 1 or anything returns 1.
See also this msdn entry: https://msdn.microsoft.com/en-us/library/ms187953.aspx and Caution section
If there are multiple assignment clauses in a single SELECT statement,
SQL Server does not guarantee the order of evaluation of the
expressions. Note that effects are only visible if there are
references among the assignments.
It's all related to internal optimization.
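One way to sidestep the evaluation-order question is to keep each assignment in its own statement, so that no assignment reads a variable written in the same SELECT. A minimal sketch, reusing the variable and table names from the question (splitting the statement is an illustration here, not something the documentation prescribes):

```sql
DECLARE @i2 bit, @b2 bit;
DECLARE @t table (Seq int);
INSERT INTO @t VALUES (1);

-- Step 1: assign @i2 on its own, so its value is settled before it is read.
SELECT @i2 = 1 FROM @t AS n WHERE n.Seq = 1;

-- Step 2: @b2 now reads a variable that no longer changes in this statement.
SELECT @b2 = CASE WHEN @i2 = 1 OR (0 > 0) THEN 1 ELSE 0 END
FROM @t AS n
WHERE n.Seq = 1;

SELECT @i2 AS i2, @b2 AS b2;
```

Because @i2 is assigned before the second statement runs, the result no longer depends on the order the optimizer picks. If the intent was to test the initial NULL value instead, the two steps can simply be swapped.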

I will post this as an answer as it is quite large text from Training Kit (70-461):
WHERE propertytype = 'INT' AND CAST(propertyval AS INT) > 10
Some assume that unless precedence rules dictate otherwise, predicates
will be evaluated from left to right, and that short circuiting will
take place when possible. In other words, if the first predicate
propertytype = 'INT' evaluates to false, SQL Server won’t evaluate the
second predicate CAST(propertyval AS INT) > 10 because the result is
already known. Based on this assumption, the expectation is that the
query should never fail trying to convert something that isn’t
convertible.
The reality, though, is different. SQL Server does
internally support a short-circuit concept; however, due to the
all-at-once concept in the language, it is not necessarily going to
evaluate the expressions in left-to-right order. It could decide,
based on cost-related reasons, to start with the second expression,
and then if the second expression evaluates to true, to evaluate the
first expression as well. This means that if there are rows in the
table where propertytype is different than 'INT', and in those rows
propertyval isn’t convertible to INT, the query can fail due to a
conversion error.
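A common way to make such a filter robust is to use a conversion that cannot raise an error, whatever order the predicates are evaluated in. A minimal sketch, assuming SQL Server 2012 or later (where TRY_CAST is available); the table variable @Properties and its sample rows are hypothetical, since the excerpt does not name the table:

```sql
-- Hypothetical sample data: one row holds a value that is not convertible to INT.
DECLARE @Properties table (propertytype varchar(20), propertyval varchar(20));
INSERT INTO @Properties VALUES
    ('INT',  '15'),
    ('INT',  '5'),
    ('DATE', '2020-01-01');

-- TRY_CAST yields NULL instead of failing, so the comparison is simply
-- not true for non-convertible rows, regardless of predicate order.
SELECT propertytype, propertyval
FROM @Properties
WHERE propertytype = 'INT'
  AND TRY_CAST(propertyval AS int) > 10;
```

The design choice is to make every predicate safe on every row, rather than relying on one predicate to shield another.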

Just to extend both answers.
From Dirty Secrets of the CASE Expression:
CASE will not always short circuit
The official documentation implies that the entire expression will short-circuit, meaning it will evaluate the expression from left-to-right, and stop evaluating when it hits a match:
The CASE statement evaluates its conditions sequentially and stops with the
first condition whose condition is satisfied.
And MS Connect:
CASE / COALESCE won't always evaluate in textual order
Aggregates Don't Follow the Semantics Of CASE
CASE Transact-SQL
The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input. Errors in evaluating these expressions are possible.
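The caveat in that last quote is easiest to see with aggregates. The following sketch is adapted from the example given in the CASE documentation: the first WHEN already matches, yet the statement is described there as failing with a divide-by-zero error, because the MAX aggregate is computed before CASE receives its inputs.

```sql
WITH Data (value) AS
(
    SELECT 0
    UNION ALL
    SELECT 1
)
SELECT CASE
           WHEN MIN(value) <= 0 THEN 0          -- true for this data
           WHEN MAX(1 / value) >= 100 THEN 1    -- still evaluated: 1 / 0
       END
FROM Data;
```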

Related

SQL Scalar UDF returns expected results and then returns a consistent weird value

I'm using a SQL Scalar UDF to calculate the Weighted Moving Average for a particular stock.
I created the following UDF [dbo].[fn_WeightedMovingAverageClosePriceCalculate]. (See below )
However, I get mixed results when calling the function. This is while executing both queries at the same time, but I'm getting different results. I took the code out of the function in a query, plugged in my test values and it works perfectly (WMA13 = 1540.8346). Would love to hear why I'm getting the value of 15.7313 as WMA13 in the second resultset, when both the queries are exactly the same.
ALTER FUNCTION [dbo].[fn_WeightedMovingAverageClosePriceCalculate]
(
@SecCode varchar(100),
@StartDateId int,
@MovingAverageCount int
)
RETURNS decimal(18,4)
AS
BEGIN
--Generate the Weighting Factor
Declare @WeightingFactor as decimal(18,8)
Set @WeightingFactor = (@MovingAverageCount*(@MovingAverageCount+1))/2 -- using the formula n(n+1)/2
-- Declare the return variable here
Declare @MovingAverage as decimal (18,4)
Set @MovingAverage = 0
if @MovingAverageCount <> 0
begin
Select @MovingAverage = SUM(ClosePrice*RowNum/@WeightingFactor) from
(
Select ROW_NUMBER() OVER(order by BusinessDateId asc) AS RowNum , ClosePrice, BusinessDateId
from
(
Select TOP (@MovingAverageCount) ClosePrice, BusinessDateId
from dbo.BhavCopy
where BusinessDateId <= @StartDateId
and SecCode = @SecCode
and Exchange = 'NSE'
order by BusinessDateId desc
)d
)a
end
Set @WeightingFactor = 0
Set @MovingAverageCount = 0
-- Return the result of the function
Return @MovingAverage
END
See Data that i'm working with :
So there were 2 different execution plans, which surprised me as well.
Right one - https://drive.google.com/file/d/1vPHbAS3X8Jmua8E5ReUgumsiUuovtL4p/view?usp=sharing
Wrong one - https://drive.google.com/file/d/180-Z3bMtzvV31En6z-zA-sVM_yPNyaQv/view?usp=sharing
This is a bug with scalar UDF inlining and how it treats scalar aggregates in some cases (report).
The issue is that when inlined the execution plan contains a stream aggregate with
ANY(SUM(ClosePrice*CONVERT_IMPLICIT(decimal(19,0),[Expr1007],0)/[Expr1002]))
The nesting of SUM in ANY here is incorrect.
ANY is an internal aggregate that returns the first NOT NULL value that it finds (or NULL if none were found).
So in your case the stream aggregate receives its first row (very likely to be the one with the lowest BusinessDateId out of the 13 eligible), calculates the SUM(ClosePrice*RowNum/@WeightingFactor) for that row, and passes the partial aggregate result to ANY, which considers its work done and uses that as the final result. Any contribution to the SUM from the remaining 12 rows is lost.
You can add WITH INLINE = OFF to the function definition to disable inlining until the issue is fixed.
A simpler demo is below (tested on SQL Server 2019 RTM and RTM-CU2)
Setup
DROP TABLE IF EXISTS dbo.Numbers
GO
CREATE TABLE dbo.Numbers(Number INT UNIQUE CLUSTERED);
INSERT INTO dbo.Numbers VALUES (NULL), (23), (27), (50);
Demo 1
CREATE OR ALTER FUNCTION [dbo].[fnDemo1]()
RETURNS INT
AS
BEGIN
DECLARE @Result as int, @Zero as int = 0
SELECT @Result = SUM(Number + @Zero) from dbo.Numbers
RETURN @Result
END
GO
DECLARE @Zero INT = 0
SELECT SUM(Number + @Zero) AS SanityCheck,
dbo.fnDemo1() AS FunctionResult
FROM dbo.Numbers
OPTION (RECOMPILE) --I found the inlining happened more reliably with this
Demo 1 Results
+-------------+----------------+
| SanityCheck | FunctionResult |
+-------------+----------------+
| 100 | 23 |
+-------------+----------------+
All 4 rows were read from the clustered index in key order. After the first one was read the SUM was NULL. After the second one was read the SUM was 23. ANY can then stop and considers its work done. The remaining two rows were still read but don't contribute to the returned ANY(SUM()).
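As a sketch of the WITH INLINE = OFF workaround mentioned earlier, the demo function can be redefined with inlining disabled. This assumes SQL Server 2019 and the dbo.Numbers table from the setup; with inlining off the function should run as an ordinary scalar UDF, so FunctionResult is expected to agree with SanityCheck.

```sql
CREATE OR ALTER FUNCTION [dbo].[fnDemo1]()
RETURNS INT
WITH INLINE = OFF -- opt this function out of scalar UDF inlining
AS
BEGIN
    DECLARE @Result AS int, @Zero AS int = 0;
    SELECT @Result = SUM(Number + @Zero) FROM dbo.Numbers;
    RETURN @Result;
END
GO
-- Same comparison as in Demo 1, now against the non-inlined function.
DECLARE @Zero INT = 0;
SELECT SUM(Number + @Zero) AS SanityCheck,
       dbo.fnDemo1() AS FunctionResult
FROM dbo.Numbers
OPTION (RECOMPILE);
```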
Demo 2
Without the intermediate @Result variable a spurious error is thrown
CREATE OR ALTER FUNCTION [dbo].[fnDemo2]()
RETURNS INT
AS
BEGIN
DECLARE @Zero as int = 0;
RETURN (SELECT SUM(Number + @Zero) from dbo.Numbers);
END
GO
Select dbo.fnDemo2()
OPTION (RECOMPILE)
Msg 512, Level 16, State 1, Line xx
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Only one row actually comes out of the stream aggregate but the stream aggregate also calculates a COUNT(*) along with the ANY(SUM()). This is not wrapped in ANY so totals 4 in this case. It is used in the assert operator to give the bogus error that too many rows will be returned.
Demo 3
CREATE OR ALTER FUNCTION [dbo].[fnDemo3]()
RETURNS INT
AS
BEGIN
DECLARE @Zero as int = 0;
RETURN (SELECT SUM(Number + @Zero) from dbo.Numbers GROUP BY ());
END
GO
Select dbo.fnDemo3()
OPTION (RECOMPILE)
This generates a stack dump and a different error
Msg 8624, Level 16, State 17, Line xx
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

View causing Invalid length parameter passed to the LEFT or SUBSTRING function error when providing where clause

I'm having an odd thing occurring in one of my views. Initially, I have a view that does the following:
SELECT id, CAST((CASE WHEN
LEN(line) = 1
THEN ISNULL(LTRIM(RTRIM(line)), '-1')
ELSE
ISNULL(LTRIM(RTRIM(SUBSTRING(line, 1, (CHARINDEX(CHAR(9),
line) - 1)))), '-1')
END) AS varchar(MAX)) AS ObjMarker
FROM dbo.tblM2016_RAW_Current_Import_File
WHERE ((CASE WHEN
LEN(line) = 1
THEN ISNULL(LTRIM(RTRIM(line)), '')
ELSE
LTRIM(RTRIM(SUBSTRING([line], 1, CHARINDEX(CHAR(9), [line]))))
END) <> CHAR(9))
AND
((CASE WHEN
LEN(line) = 1
THEN ISNULL(LTRIM(RTRIM(line)), '')
ELSE
LTRIM(RTRIM(SUBSTRING([line], 1, CHARINDEX(CHAR(9), [line]))))
END) NOT LIKE '%*%')
AND
((CASE WHEN
LEN(line) = 1
THEN ISNULL(LTRIM(RTRIM(line)), '')
ELSE LTRIM(RTRIM(SUBSTRING([line], 1, CHARINDEX(CHAR(9),
[line])))) END) <> '')
And it works fine. However, I have another view which uses the results of the above view, shown below:
SELECT curr.id
,curr.ObjMarker
,Nxt.id AS NxtID
,Nxt.ObjMarker AS NxtObjMarker
,Nxt.id - curr.id - 2 AS OFFSET
,curr.id + 1 AS StrtRec
,Nxt.id - 1 AS EndRec
FROM dbo.vwM2016_RAW_Import_File_Object_Line_Markers AS curr
LEFT OUTER JOIN
dbo.vwM2016_RAW_Import_File_Object_Line_Markers AS Nxt ON
Nxt.id =
(SELECT MIN(id) AS Expr1
FROM dbo.vwM2016_RAW_Import_File_Object_Line_Markers AS source
WHERE (id > curr.id))
WHERE curr.ObjMarker <> '0'
And apparently, if I leave the WHERE curr.ObjMarker <> '0' in the second query, it gives the error
Msg 537, Level 16, State 3, Line 1
Invalid length parameter passed to the LEFT or SUBSTRING function.
But if I remove WHERE curr.ObjMarker <> '0' it returns the result set without error.
Could this be a problem with the query optimizer not doing operations in order? I've checked the rows where 0 occurs for any special characters in an editor and I couldn't find any hidden whitespace characters or anything.
There's no guarantee about the order in which the criteria in your SELECT statements will be evaluated, and SQL Server does not guarantee short-circuit evaluation. Predicate pushdown can also happen for any criteria SQL Server estimates to be useful, so you can't assume that a given criterion will always be evaluated before another.
The problem is with this expression:
ISNULL(LTRIM(RTRIM(SUBSTRING(line, 1, (CHARINDEX(CHAR(9),
line) - 1)))), '-1')
If [line] does not contain a CHAR(9), CHARINDEX returns 0. This turns the substring expression into SUBSTRING(line, 1, -1), which is invalid because the length cannot be negative.
When not used in the WHERE clause, the expression is not evaluated until after the other filters are applied and the result set is reduced. At least one of the filters in the view eliminates the rows without tabs, so the expression never operates on those rows.
However, when the expression is used in the WHERE clause, it is combined with the view filters and evaluated in the order that SQL Server determines is best for performance. Unfortunately, some of the rows without tabs are still part of the result set when this is evaluated, causing the query to fail.
A possible fix: add an explicit test to the CASE expression (in the first view, where ObjMarker is defined) to handle rows that do not contain tabs.
WHEN CHARINDEX(CHAR(9), line) = 0 THEN '-1'
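Put together, the guarded expression might look like the sketch below. The table variable @lines and its three sample rows are hypothetical stand-ins for dbo.tblM2016_RAW_Current_Import_File; only the extra WHEN branch comes from the suggestion above.

```sql
-- Hypothetical sample rows: one with a tab, one without, one single character.
DECLARE @lines table (id int, line varchar(max));
INSERT INTO @lines VALUES
    (1, 'A' + CHAR(9) + 'rest'),
    (2, 'no tab here'),
    (3, 'X');

SELECT id,
       CAST(CASE
                WHEN LEN(line) = 1
                    THEN ISNULL(LTRIM(RTRIM(line)), '-1')
                -- Guard: no tab means CHARINDEX returns 0, so skip SUBSTRING entirely.
                WHEN CHARINDEX(CHAR(9), line) = 0
                    THEN '-1'
                ELSE ISNULL(LTRIM(RTRIM(SUBSTRING(line, 1, CHARINDEX(CHAR(9), line) - 1))), '-1')
            END AS varchar(MAX)) AS ObjMarker
FROM @lines;
```

With the guard in place the SUBSTRING branch should only be reached for rows that contain a tab, so the length argument is never negative wherever the expression ends up being evaluated.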

What is order of operational precedence in SQL Case statement?

I am having a hard time working out how best to write the following:
SET @SAMPLE = (SELECT CASE
WHEN @A < 0.01 AND @B < 0.01 THEN -1
WHEN @A < 0.01 THEN @B
ELSE @C
END )
I do not get what I expect here. I am finding that @SAMPLE contains 0.00 after running this. Thanks for any guidance.
The CASE statement evaluates its conditions sequentially and stops
with the first condition whose condition is satisfied
From your example, all one can deduce is that @B or @C is zero.
Operational precedence usually refers to which operator is first evaluated ("*" or "-", for example). Your question should probably be titled "Case evaluation order".
http://technet.microsoft.com/en-us/library/ms181765.aspx
Without values I cannot be sure that this is what you mean, but it seems that you are asking in what order a CASE expression evaluates its WHEN clauses. If that is indeed the question, then the answer is fairly simple. A CASE WHEN returns the value of the THEN for the first WHEN that is true and stops evaluating as soon as it returns. This means that in your example @SAMPLE is evaluated from the first WHEN to the last WHEN (and within each WHEN, left to right), so your logical checks are:
1. Is @A < 0.01? TRUE continues on this same line, while FALSE goes to the next line.
2. If 1 was TRUE, then @B < 0.01 is evaluated. TRUE returns -1 and the CASE expression ends, while FALSE goes to the next line.
3. If you are here, either 1 or 2 was FALSE. Either way, @A < 0.01 is evaluated again: TRUE returns @B and the CASE expression ends, while FALSE returns @C because all WHEN conditions were FALSE.
Hope this helps.
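The walk-through above can be checked with a small sketch that feeds one row per branch through the same CASE expression. The sample values are made up for illustration; the column names A, B and C stand in for the question's variables.

```sql
SELECT v.A, v.B, v.C,
       CASE
           WHEN v.A < 0.01 AND v.B < 0.01 THEN -1
           WHEN v.A < 0.01 THEN v.B
           ELSE v.C
       END AS Sample
FROM (VALUES
        (0.00, 0.00, 9.99),  -- both small: first WHEN matches, result is -1
        (0.00, 0.50, 9.99),  -- only A small: second WHEN matches, result is B
        (0.50, 0.50, 0.00)   -- A not small: ELSE branch, result is C (zero here)
     ) AS v (A, B, C);
```

The last row shows one way a result of 0.00 can come about: neither WHEN matches and the ELSE value itself is zero.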

How do I map true/false/unknown to -1/0/null without repetition?

I am currently working on a tool to help my users port their SQL code to SQL-Server 2005. For this purpose, I parse the SQL into a syntax tree, analyze it for constructs which need attention, modify it and transform it back into T-SQL.
One thing that I want to support is the "bools are values too" semantics of other RDBMSs. For example, MS-Access allows me to write select A.x and A.y as r from A, which is impossible in T-SQL because:
Columns can't have boolean type (column values can't be and'ed)
Logical predicates can not be used where expressions are expected.
Therefore, my transformation routine converts the above statement into this:
select case
when (A.x<>0) and (A.y<>0)
then -1
when not((A.x<>0) and (A.y<>0))
then 0
else
null
end as r
from A;
Which works, but is annoying, because I have to duplicate the logical expression (which can be very complex or contain subqueries etc.) in order to distinguish between true, false and unknown - the latter shall map to null. So I wonder if the T-SQL pro's here know a better way to achieve this?
UPDATE:
I would like to point out that solutions which try to keep the operands in the integer domain have to take into account that some operands may be logical expressions in the first place. This means that an efficient solution to convert a bool to a value is still required. For example:
select A.x and exists (select * from B where B.y=A.y) from A;
I don't think there's a good answer, really it's a limitation of TSQL.
You could create a UDF for each boolean expression you need
CREATE FUNCTION AndIntInt
(
@x as int,@y as int
)
RETURNS int
AS
BEGIN
if (@x<>0) and (@y<>0)
return -1
if not((@x<>0) and (@y<>0))
return 0
return null
END
used via
select dbo.AndIntInt(A.x,A.y) as r from A
Boolean handling
Access seems to use the logic that given 2 booleans
Both have to be true to return true
Either being false returns false (regardless of nulls)
Otherwise return null
I'm not sure if this is how other DBMS (Oracle, DB2, PostgreSQL) deal with bool+null, but this answer is based on the Access determination (MySQL and SQLite agree). The table of outcomes is presented below.
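+--------+--------+-------------+
| X      | Y      | A.X AND B.Y |
+--------+--------+-------------+
| 0      | 0      | 0           |
| 0      | -1     | 0           |
| 0      | (null) | 0           |
| -1     | 0      | 0           |
| -1     | -1     | -1          |
| -1     | (null) | (null)      |
| (null) | 0      | 0           |
| (null) | -1     | (null)      |
| (null) | (null) | (null)      |
+--------+--------+-------------+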
SQL Server helper 1: function for boolean from any "single value"
In SQL Server in general, this function will fill the gap for the missing any value as boolean functionality. It returns a ternary result, either 1/0/null - 1 and 0 being the SQL Server equivalent of true/false (without actually being boolean).
drop function dbo.BoolFromAny
GO
create function dbo.BoolFromAny(@v varchar(max)) returns bit as
begin
return (case
when @v is null then null
when isnumeric(@v) = 1 and @v like '[0-9]%' and (@v * 1.0 = 0) then 0
else 1 end)
end
GO
Note: taking Access as a starting point, only the numeric value 0 evaluates to FALSE
This uses some SQL Server tricks
everything is convertible to varchar. Therefore only one function taking varchar input is required.
isnumeric is not comprehensive: '.' returns 1 for isnumeric but will fail at @v * 1.0, so an explicit test for LIKE '[0-9]%' is required to "fix" isnumeric.
@v * 1.0 is required to overcome some arithmetic issues. If you pass the string "1" into the function without *1.0, it will bomb
Now we can test the function.
select dbo.BoolFromAny('abc')
select dbo.BoolFromAny(1)
select dbo.BoolFromAny(0) -- the only false
select dbo.BoolFromAny(0.1)
select dbo.BoolFromAny(-1)
select dbo.BoolFromAny('')
select dbo.BoolFromAny('.')
select dbo.BoolFromAny(null) -- the only null
You can now safely use it in a query against ANY SINGLE COLUMN, such as
SELECT dbo.BoolFromAny(X) = 1
SQL Server helper 2: function to return result of BOOL AND BOOL
Now the next part is creating the same truth table in SQL Server. This query shows you how two bit columns interact and the simple CASE statement to produce the same table as Access and your more complicated one.
select a.a, b.a,
case
when a.a = 0 or b.a = 0 then 0
when a.a = b.a then 1
end
from
(select 1 A union all select 0 union all select null) a,
(select 1 A union all select 0 union all select null) b
order by a.a, b.a
This is easily expressed as a function
create function dbo.BoolFromBits(@a bit, @b bit) returns bit as
begin
return case
when @a = 0 or @b = 0 then 0
when @a = @b then 1
end
end
SQL Server conversion of other expressions (not of a single value)
Due to lack of support for bit-from-boolean conversion, expressions that are already [true/false/null] in SQL Server require repetition in a CASE statement.
An example is a "true boolean" in SQL Server, which cannot be the result for a column.
select A > B -- A=B resolves to one of true/false/null
from C
Needs to be expressed as
select case when A is null or B is null then null when A > B then 1 else 0 end
from C
But if A is not a scalar value but a subquery like (select sum(x)...), then as you can see A will appear twice and be evaluated twice in the CASE statement (repeated).
FINAL TEST
Now we put all the conversion rules to use in this long expression
SELECT X AND Y=Z AND C FROM ..
( assume X is numeric 5, and C is varchar "H" )
( note C contributes either TRUE or NULL in Access )
This translates to SQL Server (chaining the two functions and using CASE)
SELECT dbo.BoolFromBits(
dbo.BoolFromBits(dbo.BoolFromAny(X), CASE WHEN Y=Z then 1 else 0 end),
dbo.BoolFromAny(C))
FROM ...
Access Bool or bool
For completeness, here is the truth table for Access bool OR bool. Essentially, it is the opposite of AND, so
Both have to be false to return false
Either being true returns true (regardless of nulls)
Otherwise return null
The SQL SERVER case statement would therefore be
case
when a.a = 1 or b.a = 1 then 1
when a.a = b.a then 0
end
(the omission of an ELSE clause is intentional as the result is NULL when omitted)
EDIT: Based on additional information added to the Question and comments made on one of the suggested Answers, I am reformulating this answer:
If you are porting to SQL Server then I would expect that you are also transforming the data to match SQL Server types. If you have a boolean field then True, False, and Unknown map to 1, 0, and NULL as a NULLable BIT field.
With this in mind, you only need to worry about transforming pure boolean values. Expressions such as:
exists (select * from B where B.y=A.y)
and:
A.x in (1,2,3)
are already in a workable form. Meaning, statements like:
IF (EXISTS(SELECT 1 FROM Table))
and:
IF (value IN (list))
are already correct. So you just need to worry about the fact that "1" for "True" is not by itself enough. Hence, you can convert "1" values to boolean expressions by testing if they are in fact equal to "1". For example:
IF (value = 1)
is the equivalent of what you previously had as:
IF (value)
Putting all of this together, you should be able to simply translate all instances of pure boolean values of the old code into boolean expressions in the form of "value = 1" since a 1 will produce a True, 0 will produce False, and NULL will give you False.
HOWEVER, the real complexity is that SELECTing the value vs. testing with it via a WHERE condition is different. Boolean expressions evaluate correctly in WHERE conditions but have no direct representation to SELECT (especially since NULL / Unknown isn't really boolean). So, you can use the "Value = 1" translation in WHERE conditions but you will still need a CASE statement if you want to SELECT it as a result.
As mentioned briefly a moment ago, since NULL / Unknown isn't truly boolean, it is meaningless trying to convert "NULL AND NULL" to NULL for the purposes of a WHERE condition. In effect, NULL is truly FALSE since it cannot be determined to be TRUE. Again, this might be different for your purposes in a SELECT statement which is again why the CASE statement is your only choice there.

sql query - true => true, false => true or false

Simple query, possibly impossible but I know there are some clever people out there :)
Given a boolean parameter, I wish to define my where clause to either limit a certain column's output - or do nothing.
So, given parameter @bit = 1 this would be the result:
where column = 1
given parameter @bit = 0 this would be the result:
where column = 1 or 0
i.e. have no effect/show all results (column is a bit field)
I'm not wanting dynamic sql - I can settle for fixing this in code but I just wondered if there's some clever magic that would make the above neat and simple.
Is there? I'm using sql server.
cheers :D
The answer column = 1 or @bit = 0 works if column may only be 0 or 1. If column may be any value you want: column = 1 or @bit = 0 and column = 0.
SELECT *
FROM mytable
WHERE column = 1 OR @bit = 0
If you have an index on column1, this one will be more efficient:
SELECT *
FROM mytable
WHERE column = 1 AND @bit = 1
UNION ALL
SELECT *
FROM mytable
WHERE @bit = 0
See this article in my blog for performance comparison of a single WHERE condition vs. UNION ALL:
IN with a comma separated list: SQL Server
where column BETWEEN @bit AND 1
select *
from MyTable
where (@bit = 0 OR MyColumn = 1)
select ...
from [table]
where @bit = 0 or (column = @bit)
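The "@bit = 0 OR column = 1" pattern suggested in several of these answers can be tried out with a small self-contained sketch. The table variable @mytable, the column name flag and the sample rows are hypothetical; flag is used instead of column because COLUMN is a reserved word in T-SQL.

```sql
DECLARE @bit bit = 0;  -- 0 = no filtering, 1 = only rows where flag = 1

DECLARE @mytable table (id int, flag bit);
INSERT INTO @mytable VALUES (1, 1), (2, 0), (3, 1);

-- When @bit = 0 the second predicate is true for every row, so nothing is filtered.
-- When @bit = 1 only the rows with flag = 1 (ids 1 and 3) remain.
SELECT id, flag
FROM @mytable
WHERE flag = 1 OR @bit = 0;
```

Changing the first line to @bit = 1 and rerunning shows the filtered case.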
I had come up with a different answer and felt dumb when seeing the consensus answer.
So, just for yucks, compared the two using my own database. I don't really know if they are really comparable, but my execution plans give a slight advantage to my goofy answer:
select *
from MyTable
where column <> case @bit when 1 then 0 else -1 end
I realize indices, table size, etc. can affect this.
Also, realized you probably can't compare a bit to a -1...
Just thought I'd share.
try this
select ...
from table
where column = case when @bit = 0 then 0 else column end
This works no matter what the datatype of column is (it could even be a string, for example). If it were, of course, it would need a different default value (not 0).
WHERE column >= @bit
However, this only works for > 0 values in a numeric column. @bit will be implicitly cast to int, smallint etc. because of data type precedence.