Order of execution in SQL Server variable assignment using SELECT - sql

Given the following example:
declare #i int
select #i = 1, #i = 2
select #i
Will #i always be 2?
This is about the most trivial example I can think of, but I am considering using this for swapping values in variables. I also believe this method of assignment (select) is not ANSI compliant (however useful), but don't really care to have portable code in this case.
UPDATE
Thanks to #MichaelFredrickson, we have #MartinSmith's answer and reference to MSDN on this. I am now struggling with what the second sentence in this documentation means, exactly (emphasis added):
If there are multiple assignment clauses in a single SELECT statement, SQL Server does not guarantee the order of evaluation of the expressions. Note that effects are only visible if there are references among the assignments.
The first sentence is plenty enough to keep me away from relying upon the behavior, however.

For variable assignment, Martin Smith answers this question here referencing MSDN:
If there are multiple assignment clauses in a single SELECT statement,
SQL Server does not guarantee the order of evaluation of the
expressions. Note that effects are only visible if there are
references among the assignments.
But...
If we're dealing with tables, instead of with variables, it is a different story.
In this case, Sql Server uses an All-At-Once operation, as discussed by Itzik Ben-Gan in T-Sql Fundamentals.
This concept states that all expressions in the same logical phase are evaluated as if the same point in time... regardless of their left-to-right position.
So when dealing with the corresponding UPDATE statement:
DECLARE #Test TABLE (
i INT,
j INT
)
INSERT INTO #Test VALUES (0, 0)
UPDATE #Test
SET
i = i + 10,
j = i
SELECT
i,
j
FROM
#Test
We get the following results:
i j
----------- -----------
10 0
And by using the All-At-Once evaluation... in Sql Server you can swap column values in a table without an intermediate variable / column.
Most RBDMSs behave this way as far as I know, but MySql is an exception.
EDIT:
Note that effects are only visible if there are references among the
assignments.
I understand this to mean that if you have a SELECT statement such as the following:
SELECT
#A = ColumnA,
#B = ColumnB,
#C = ColumnC
FROM MyTable
Then it doesn't matter what order the assignments are performed in... you'll get the same results no matter what. But if you have something like...
SELECT
#A = ColumnA,
#B = #A,
#C = ColumnC
FROM MyTable
There is now a reference among the assignments (#B = #A), and the order that #A and #B are assigned now matters.

Related

How to get a list of IDs from a parameter which sometimes includes the IDs already, but sometimes include another sql query

I have developed a SQL query in SSMS-2017 like this:
DECLARE #property NVARCHAR(MAX) = #p;
SET #property = REPLACE(#property, '''', '');
DECLARE #propList TABLE (hproperty NUMERIC(18, 0));
IF CHARINDEX('SELECT', #property) > 0 OR CHARINDEX('select', #property) > 0
BEGIN
INSERT INTO #propList
EXECUTE sp_executesql #property;
END;
ELSE
BEGIN
DECLARE #x TABLE (val NUMERIC(18, 0));
INSERT INTO #x
SELECT CONVERT(NUMERIC(18, 0), strval)
FROM dbo.StringSplit(#property, ',');
INSERT INTO #propList
SELECT val
FROM #x;
END;
SELECT ...columns...
FROM ...tables and joins...
WHERE ...filters...
AND HMY IN (SELECT hproperty FROM #propList)
The issue is, it is possible that the value of the parameter #p can be a list of IDs (Example: 1,2,3,4) or a direct select query (Example: Select ID from mytable where code='A123').
The code is working well as shown above. However it causes a problem in our system (as we use Yardi7-Voyager), and we need to leave only the select statement as a query. To manage it, I was planning to create a function and use it in the where clause like:
WHERE HMY IN (SELECT myFunction(#p))
However I could not manage it as I see I cannot execute a dynamic query in an SQL Function. Then I am stacked. Any idea at this point to handle this issue will be so appreciated.
Others have pointed out that the best fix for this would be a design change, and I agree with them. However, I'd also like to treat your question as academic and answer it in case any future readers ever have the same question in a use case where a design change wouldn't be possible/desirable.
I can think of two ways you might be able to do what you're attempting in a single select, as long as there are no other restrictions on what you can do that you haven't mentioned yet. To keep this brief, I'm just going to give you psuedo-code that can be adapted to your situation as well as those of future readers:
OPENQUERY (or OPENROWSET)
You can incorporate your code above into a stored procedure instead of a function, since stored procedures DO allow dynamic sql, unlike functions. Then the SELECT query in your app would be a SELECT from OPENQUERY(Execute Your Stored Prodedure).
UNION ALL possibilities.
I'm about 99% sure no one would ever want to use this, but I'm mentioning it to be as academically complete as I know how to be.
The second possibility would only work if there are a limited, known, number of possible queries that might be supported by your application. For instance, you can only get your Properties from either TableA, filtered by column1, or from TableB, filtered by Column2 and/or Column3.
Could be more than these possibilities, but it has to be a limited, known quantity, and the more possibilities, the more complex and lengthy the code will get.
But if that's the case, you can simply SELECT from a UNION ALL of every possible scenario, and make it so that only one of the SELECTs in the UNION ALL will return results.
For example:
SELECT ... FROM TableA WHERE Column1=fnGetValue(#p, 'Column1')
AND CHARINDEX('SELECT', #property) > 0
AND CHARINDEX('TableA', #property) > 0
AND CHARINDEX('Column1', #property) > 0
AND (Whatever other filters are needed to uniquely identify this case)
UNION ALL
SELECT
...
Note that fnGetValue() isn't a built-in function. You'd have to write it. It would parse the string in #p, find the location of 'Column1=', and return whatever value comes after it.
At the end of your UNION ALL, you'd need to add a last UNION ALL to a query that will handle the case where the user passed a comma-separated string instead of a query, but that's easy, because all the steps in your code where you populated table variables are unnecessary. You can simply do the final query like this:
WHERE NOT CHARINDEX('SELECT', #p) > 0
AND HMY IN (SELECT strval FROM dbo.StringSplit(#p, ','))
I'm pretty sure this possibility is way more work than its worth, but it is an example of how, in general, dynamic SQL can be replaced with regular SQL that simply covers every possible option you wanted the dynamic sql to be able to handle.

How to optimize a condition with a variable and multiple constants in T-SQL

I have a simple query in T-SQL:
SELECT
*
FROM
Table t
WHERE
t.Column IN ( 'Value1', 'Value2', 'Value3', ..., 'ValueN' )
;
Of course, the query is actually much more complex with a couple of JOINs and subqueries, but that does not matter at the moment.
The question is:
Which of the following is faster in terms of performance?
(1) The original condition of
t.Column IN ( 'Value1', 'Value2', 'Value3', ..., 'ValueN' )
(2) Using a table ValueEnumeration with just one column named Value (being possibly a primary key), the table being populated with the values 'Value1', 'Value2', ...
SELECT
*
FROM
Table t
WHERE
t.Column in ( SELECT ve.Value FROM ValueEnumeration ve )
;
(3) Using a user defined function (UDF), a scalar function to be precise, called IsAllowedValue.
The function:
CREATE FUNCTION dbo.IsAllowedValue ( #ValueToCheck VARCHAR(20) ) RETURNS INT
AS
BEGIN
IF #ValueToCheck = 'Value1'
OR #ValueToCheck = 'Value2'
OR #ValueToCheck = 'Value3'
...
OR #ValueToCheck = 'ValueN'
BEGIN
RETURN 1;
END;
RETURN 0;
END
;
The query:
SELECT
*
FROM
Table t
WHERE
dbo.IsAllowedValue(t.Column) = 1
;
Well, I guess the first one will be the quickest solution. However, I need to perform a similar check in many places in my stored procedures. Once the original enumeration of values changes in the future (which is very likely to happen - e.g. a new value has to be added to it), you will have to go to all the occurrences of the original condition in your code and add the new value in there. Therefore, I decided for a more re-usable solution. But I don't know which one to choose. I sometimes need to do a test the other way around (WHERE t.Column NOT IN (...)). It also came to my mind to use the table ValueEnumeration in an INNER JOIN (for positive checks) or a LEFT OUTER JOIN (for negative checks), but this will be painful to implement since I have approx. 50 locations of such conditions in the code and, in general, adding a JOIN considerably changes the look of the SQL query as well as the execution plan, the latter not always for good.
Do you have any idea?
Solution 2 ist fine as long as you store the allowed values before invoking the query. This will not impact your performance (you will get the values once, not for every record in a table) and will be more reusable than Solution 1.
declare #AllowedValues table(val varchar(...))
insert into #AllowedValues
SELECT ve.Value FROM ValueEnumeration ve
Then you can use it in your code:
......
WHERE
t.Column in ( SELECT val FROM #AllowedValues )
Well, I ultimately decided for the 3rd (generally not recommended) solution (creating a UDF). It appears to be O.K. in terms of performance. Or, at least, it is not slower than the 2nd solution (a table of the "allowed" values).
A function, although generally considered to be a bottleneck of many SQL queries, provides a couple of advantages:
(i) It is re-usable and easy to adjust (if some new values have to be added in the future, for example).
(ii) Unlike a table of enumerated values, whenever you see DDL of the function, the function definition, you can look at the values (the constants) currently in use (the advantage of the 1st solution which is not re-usable, though). If you used a table, you would have to execute a SELECT to check for values currently in place.
(iii) Even syntactically, it is easier to write
dbo.IsAllowedValue(t.Column) = 1
than
t.Column IN (SELECT Value FROM ValueEnumeration)
I'll provide more comments on the topic if any bad/good experience comes to me in the future.

T-Sql - Select query in another select query takes long time

I have a procedure with arguments but its calling takes a very long time. I decided to check what is wrong with my query and came to the conclusion that the problem is Column In (SELECT [...]).
Both queries return 1500 rows.
First query: time 45 second
Second query: time 0 second
1.
declare #FILTER_OPTION int
declare #ID_DISTRIBUTOR type_int_value
declare #ID_DATA_TYPE type_bigint_value
declare #ID_AGGREGATION_TYPE type_int_value
set #FILTER_OPTION = 8
insert into #ID_DISTRIBUTOR values (19)
insert into #ID_DATA_TYPE values (30025)
insert into #ID_AGGREGATION_TYPE values (10)
SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from #ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from #ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from #ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why this is happening?
How should I create a stored procedure that takes an array of arguments to use it quickly?
Edit:
Maybe it's a problem with indexes? indexes are created on these three columns.
For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do. First, look at the execution plans so you can see the different between the two versions. Then, test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.

Having trouble understanding this query

Basically I can't understand what this query below does:
UPDATE #so_stockmove
SET #total_move_qty = total_move_qty = (
CASE WHEN #so_docdt_id <> so_docdt_id THEN 0
ELSE ISNULL(#total_move_qty, 0)
END
) + ISNULL(move_qty,0),
balance = so_qty - #total_move_qty,
#so_docdt_id = so_docdt_id
I only can guess that it updates each row for the columns total_move_qty,balance,so_docdt_id.
Can someone explain to me in detail what the query means:
UPDATE tbl SET #variable1 = columnA = expression
Update
After reading #MotoGP comments, I did some digging and found this article by Jeff Moden where he states the following:
Warning:
Well, sort of. Lots of folks (including some of the "big" names in the SQL world) warn against and, sometimes, outright condemn the
method contained in this article as "unreliable" & "unsupported". One
fellow MVP even called it an "undocumented hack" on the fairly recent
"24 hours of SQL". Even the very core of the method, the ability to
update a variable from row to row, has been cursed in a similar
fashion. Worse yet, except for the ability to do 3 part updates (SET
#variable = columnname = expression) and to update both variables and
columns at the same time, there is absolutely no Microsoft
documentation to support the use of this method in any way, shape, or
form. In fact, even Microsoft has stated that there is no guarantee
that this method will work correctly all the time.
Now, let me tell you that, except for one thing, that's ALL true. The
one thing that isn't true is its alleged unreliability. That's part of
the goal of the article... to prove its reliability (which really
can't be done unless you use it. It's like proving the reliability of
the SELECT statement). At the end of the article, make up your own
mind. If you decide that you don't want to use such a very old ,yet,
undocumented feature, then use a Cursor or While loop or maybe even a
CLR because all of the other methods are just too darned slow. Heh...
just stop telling me that it's an undocumented hack... I already know
that and, now, so do you. ;-)
First edition
Well, this query updates columns total_move_qty and balance in a table variable called #so_stockmove, and in the same time sets values to the variables called #total_move_qty and #so_docdt_id.
I didn't know it's possible to assign values to more then one target this way in Sql server (#variable1 = columnA = expression) but apparently that is possible.
Here is my test:
declare #bla char(1)
declare #tbl table
(
X char(1)
)
insert into #tbl VALUES ('A'),('B'), ('C')
SELECT *
FROM #tbl
UPDATE #tbl
SET #Bla = X = 'D'
SELECT *
FROM #tbl
SELECT #bla
Results:
X -- first select before update
----
A
B
C
X -- second select after update
----
D
D
D
---- select the variable value after update
D
It just sets the value to the variable and updates the field.

Conditional Joins - Dynamic SQL

The DBA here at work is trying to turn my straightforward stored procs into a dynamic sql monstrosity. Admittedly, my stored procedure might not be as fast as they'd like, but I can't help but believe there's an adequate way to do what is basically a conditional join.
Here's an example of my stored proc:
SELECT
*
FROM
table
WHERE
(
#Filter IS NULL OR table.FilterField IN
(SELECT Value FROM dbo.udfGetTableFromStringList(#Filter, ','))
)
The UDF turns a comma delimited list of filters (for example, bank names) into a table.
Obviously, having the filter condition in the where clause isn't ideal. Any suggestions of a better way to conditionally join based on a stored proc parameter are welcome. Outside of that, does anyone have any suggestions for or against the dynamic sql approach?
Thanks
You could INNER JOIN on the table returned from the UDF instead of using it in an IN clause
Your UDF might be something like
CREATE FUNCTION [dbo].[csl_to_table] (#list varchar(8000) )
RETURNS #list_table TABLE ([id] INT)
AS
BEGIN
DECLARE #index INT,
#start_index INT,
#id INT
SELECT #index = 1
SELECT #start_index = 1
WHILE #index <= DATALENGTH(#list)
BEGIN
IF SUBSTRING(#list,#index,1) = ','
BEGIN
SELECT #id = CAST(SUBSTRING(#list, #start_index, #index - #start_index ) AS INT)
INSERT #list_table ([id]) VALUES (#id)
SELECT #start_index = #index + 1
END
SELECT #index = #index + 1
END
SELECT #id = CAST(SUBSTRING(#list, #start_index, #index - #start_index ) AS INT)
INSERT #list_table ([id]) VALUES (#id)
RETURN
END
and then INNER JOIN on the ids in the returned table. This UDF assumes that you're passing in INTs in your comma separated list
EDIT:
In order to handle a null or no value being passed in for #filter, the most straightforward way that I can see would be to execute a different query within the sproc based on the #filter value. I'm not certain how this affects the cached execution plan (will update if someone can confirm) or if the end result would be faster than your original sproc, I think that the answer here would lie in testing.
Looks like the rewrite of the code is being addressed in another answer, but a good argument against dynamic SQL in a stored procedure is that it breaks the ownership chain.
That is, when you call a stored procedure normally, it executes under the permissions of the stored procedure owner EXCEPT when executing dynamic SQL with the execute command,for the context of the dynamic SQL it reverts back to the permissions of the caller, which may be undesirable depending on your security model.
In the end, you are probably better off compromising and rewriting it to address the concerns of the DBA while avoiding dynamic SQL.
I am not sure I understand your aversion to dynamic SQL. Perhaps it is that your UDF has nicely abstracted away some of the messyness of the problem, and you feel dynamic SQL will bring that back. Well, consider that most if not all DAL or ORM tools will rely extensively on dynamic SQL, and I think your problem could be restated as "how can I nicely abstract away the messyness of dynamic SQL".
For my part, dynamic SQL gives me exactly the query I want, and subsequently the performance and behavior I am looking for.
I don't see anything wrong with your approach. Rewriting it to use dynamic SQL to execute two different queries based on whether #Filter is null seems silly to me, honestly.
The only potential downside I can see of what you have is that it could cause some difficulty in determining a good execution plan. But if the performance is good enough as it is, there's no reason to change it.
No matter what you do (and the answers here all have good points), be sure to compare the performance and execution plans of each option.
Sometimes, hand optimization is simply pointless if it impacts your code maintainability and really produces no difference in how the code executes.
I would first simply look at changing the IN to a simple LEFT JOIN with NULL check (this doesn't get rid of your udf, but it should only get called once):
SELECT *
FROM table
LEFT JOIN dbo.udfGetTableFromStringList(#Filter, ',') AS filter
ON table.FilterField = filter.Value
WHERE #Filter IS NULL
OR filter.Value IS NOT NULL
It appears that you are trying to write a a single query to deal with two scenarios:
1. #filter = "x,y,z"
2. #filter IS NULL
To optimise scenario 2, I would INNER JOIN on the UDF, rather than use an IN clause...
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(#Filter, ',') AS filter
ON table.FilterField = filter.Value
To optimise for scenario 2, I would NOT try to adapt the existing query, instead I would deliberately keep those cases separate, either an IF statement or a UNION and simulate the IF with a WHERE clause...
TSQL IF
IF (#filter IS NULL)
SELECT * FROM table
ELSE
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(#Filter, ',') AS filter
ON table.FilterField = filter.Value
UNION to Simulate IF
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(#Filter, ',') AS filter
ON table.FilterField = filter.Value
UNION ALL
SELECT * FROM table WHERE #filter IS NULL
The advantage of such designs is that each case is simple, and determining which is simple is it self simple. Combining the two into a single query, however, leads to compromises such as LEFT JOINs and so introduces significant performance loss to each.