Does a SQL join only execute the minimum number of conditions? - sql

In C#, if I run the following:
if (obj.a() && obj.b()) {
// do something
}
function b() will only execute if a() returns true. Does the same thing happen below?
select
*
from
tablea a
inner join tableb b
on isnumeric(b.col1) = 1
and cast(b.col1 as int) = a.id
Will the cast only be executed when b.col1 is numeric?

You can simulate short-circuit evaluation using a CASE expression.
ON CASE WHEN ISNUMERIC(b.col1) = 1
THEN CAST(b.col1 AS int)
ELSE NULL
END = a.id

This article covers short-circuit evaluation in SQL Server in depth:
http://www.sqlservercentral.com/articles/T-SQL/71950/
In short: the evaluation order depends on the query optimizer.
Edit: As Martin commented, this does not guarantee the order, since the subquery could also be optimized away. From the above link (I should have read it completely):
When run against SQL Server 2000, no error is thrown, but SQL Server 2005 and 2008 implement an optimization to push non-SARGable predicates into the index scan from the subquery, which causes the statement to fail.
To avoid this issue, the query can be rewritten incorporating a CASE expression, maybe a bit obscure, but guaranteed not to fail.
So this should guarantee that ISNUMERIC will be evaluated first:
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN #TableB bData
ON aData.id = CASE ISNUMERIC(bData.col1) WHEN 1 THEN CAST(bData.col1 AS INT) END
Ignore my first approach (which might not work every time):
You should modify your join to ensure that it gets evaluated correctly:
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN
(
SELECT col1
FROM #TableB b
WHERE ISNUMERIC(b.col1) = 1
) AS bData
ON aData.id = CAST(bData.Col1 AS int)
Sample data:
create table #TableA(id int)
create table #TableB(col1 varchar(10))
insert into #TableA values(1);
insert into #TableA values(2);
insert into #TableA values(3);
insert into #TableA values(4);
insert into #TableB values('1');
insert into #TableB values('2');
insert into #TableB values(null);
insert into #TableB values('4abc');
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN
(
SELECT col1
FROM #TableB b
WHERE ISNUMERIC(b.col1) = 1
) AS bData
ON aData.id = CAST(bData.Col1 AS int)
drop table #TableA;
drop table #TableB;
Result:
id col1
1 1
2 2

From HERE:
No. Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions is generally performed from left to right. However, it is implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or if the results of the expressions can be determined without completely evaluating all parts of the expression.
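That last sentence is the crux: a guard predicate does not guarantee the guarded expression is skipped. A minimal sketch of the failure mode discussed in the first answer (table and column names are hypothetical):

```sql
-- col1 is VARCHAR and contains non-numeric values such as '4abc'.
-- The optimizer MAY evaluate the CAST before (or regardless of) the
-- ISNUMERIC guard, so this can fail with a conversion error:
SELECT *
FROM dbo.SomeTable            -- hypothetical table
WHERE ISNUMERIC(col1) = 1
  AND CAST(col1 AS int) = 4;

-- CASE is the documented way to force the check to happen first:
SELECT *
FROM dbo.SomeTable
WHERE CASE WHEN ISNUMERIC(col1) = 1
           THEN CAST(col1 AS int) END = 4;
```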


Weird join on on behavior in tsql [duplicate]

I recently found old code that uses JOIN JOIN ON ON instead of the more familiar JOIN ON JOIN ON syntax.
DECLARE @a TABLE (
val INT
)
DECLARE @b TABLE (
val INT
)
DECLARE @c TABLE (
val INT
)
INSERT INTO @a VALUES (1),(2),(4)
INSERT INTO @b VALUES (1),(2),(4)
INSERT INTO @c VALUES (1),(2),(4)
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val on a.val = b.val
What I find weird now is that, if you consult the query plan, a and c are joined first, even though there is not even a join condition a.val = c.val.
Can anybody explain the implicit evaluation in this case?
I would say it is a query optimizer thing. First, your query:
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val;
is the same as:
SELECT *
FROM @a AS A
JOIN ( @b AS B
JOIN @c AS C ON B.Val = C.Val
) ON A.Val = B.Val;
Second, if you use the FORCE ORDER hint:
When you put this query hint on your query, it tells SQL Server not to change the order of the joins when it executes the statement. It will join the tables in the exact order specified in the query.
Normally the SQL Server optimizer will rearrange your joins into the order it thinks will be optimal for your query to execute.
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val
OPTION (FORCE ORDER);
you will get the joins performed in the order written.
Since you are joining @a with @b, and @b with @c, on the same val column, it's equivalent (and performs better) to join @a and @c together first (on a.val = c.val) and then bring in everything from @b in the final result set.
Your join condition between @a and @c is not explicit, but implicit.
Additional miscellaneous info:
Also, because you are joining table variables, the row estimate for each of the iterators in your execution plan (the table scans of @a, @b and @c) is most likely going to be 1.
With that information, SQL Server will most likely decide that there's no reason to join one-row tables in any particular order. So on some executions you could get @a and @b joined in the bottom branch of the execution plan, and on others @a and @c.
But this is all just speculation; what is certain is that the join condition between @a and @c is implicit rather than explicit, which is why you're getting @a and @c joined first.
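One way to avoid relying on the implicit condition is to state all three equalities yourself. A self-contained sketch (not code from the original question):

```sql
DECLARE @a TABLE (val INT);
DECLARE @b TABLE (val INT);
DECLARE @c TABLE (val INT);
INSERT INTO @a VALUES (1), (2), (4);
INSERT INTO @b VALUES (1), (2), (4);
INSERT INTO @c VALUES (1), (2), (4);

SELECT *
FROM @a AS a
JOIN @b AS b ON a.val = b.val
JOIN @c AS c ON b.val = c.val
            AND a.val = c.val;  -- the formerly implicit condition, now explicit
```

With every condition written out, the optimizer is still free to pick any join order, but nothing in the plan depends on a condition the reader cannot see in the query text.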

Sql update column from column update in single statement

I have a table with 50 columns, let's say a, b, c, d, etc.
I want to update b from a, and c from the new value of b, so:
b = b + a, c = c + b, d = d + c, ... etc.
I don't want to write an update like
Update [table] set b = b + a, c = c + b + a, ... etc
because for each column update I must write a huge calculated formula, which gets bigger and more complex for each column.
The table I want to update has about 50 million rows, so I think creating multiple update statements would be performance suicide.
How should I handle this?
Do you think using variables is a good idea?
Thank you, and sorry for my English.
Note: sorry for not being clear in my question.
There are 50 new columns in an existing table.
The first new column, a, is calculated using a join with other tables.
The second new column, b, is calculated by adding the new value of a to a calculation made using a join with other tables (the same tables as in the first update).
The same logic applies for calculating the values of all 50 columns.
UPDATED:
Thanks for updating your question. Definitely use subqueries in your UPDATE statement to do most of the work. You can still utilize indexes through SARGable predicates in your ON/WHERE clauses.
Since your logic makes each column depend on the previous ones via joins to the other tables, something like the following will help:
CREATE TABLE #TABLE1 (ColA INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE1 (ColA ) VALUES (1)
CREATE TABLE #TABLE2 (ColB INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE2 (ColB) VALUES (4)
CREATE TABLE #TABLE3 (ColC INT, ID INT IDENTITY(1,1) )
INSERT INTO #TABLE3 (ColC) VALUES (10)
SELECT ColC + B.ColB AS ColC, ColC, B.ColB, B.ColA, B.ORIGINAL, B.ID
FROM #TABLE3 A
RIGHT OUTER JOIN (SELECT ColB + B.ColA AS ColB, ColA, ORIGINAL, B.ID
FROM #TABLE2 A
RIGHT OUTER JOIN (SELECT ColA + ColB AS ColA, ColB AS ORIGINAL, A.ID
FROM #TABLE1 A
INNER JOIN #TABLE2 B ON A.ID = B.ID) B ON A.ID = B.ID ) B ON A.ID = B.ID
Note that for simplicity I just assumed the tables share the same ID column, as I kept the inner subquery's ID in the select list. SQL evaluates from the inside out (inner queries first), so keep that in mind.
It might be a long series of joins, but at least most of the repetitive logic is handled inside the subqueries themselves. The OUTER JOINs make sense, since you want to keep the value of the inner query and combine that newly computed value with the next outer subquery.
You can also compute each new value from the previous one in a single UPDATE statement. Note that a plain SET b = b + a, c = c + b would not do what you want, because within one UPDATE every column reference on the right-hand side reads the old value. Chained CROSS APPLY steps let each expression reuse the previously computed value without repeating the formulas:
UPDATE T
SET b = x1.b
,c = x2.c
,d = x3.d
FROM yourtable T
CROSS APPLY (SELECT T.b + T.a AS b) x1
CROSS APPLY (SELECT T.c + x1.b AS c) x2
CROSS APPLY (SELECT T.d + x2.c AS d) x3
-- ... continue the pattern for the remaining columns

Not in or Not exist Query Very Slow for Large Data Sybase

I have a table A with around 50000 records and a table B with around 50000 records as well.
sample data:
A B
1 1
2 2
3 null
4 null
I want to find records 3 and 4, which are present in table A but not in table B.
I am using
select id from A where id NOT IN (select id from B)
I have also tried NOT EXISTS, but as the number of records is very large, it still takes a lot of time:
select id from A where NOT EXISTS (select 1 from B where B.id = A.id)
A left outer join can't be used to find the missing records, as the id is not present in table B.
Is there any way to make the query run faster in Sybase itself?
Or is shifting the database to MongoDB the solution?
I'm not sure why you don't want to use a LEFT JOIN; I tried it and it returns your expected result.
Sample execution with the given data:
DECLARE @TableA TABLE (Id INT);
DECLARE @TableB TABLE (Id INT);
INSERT INTO @TableA (Id) VALUES (1), (2), (3), (4);
INSERT INTO @TableB (Id) VALUES (1), (2), (NULL), (NULL);
SELECT T1.Id
FROM @TableA T1
LEFT JOIN @TableB T2 ON T2.Id = T1.Id
WHERE T2.Id IS NULL
Result
3
4
From a performance perspective, always try to avoid inverse keywords like NOT IN and NOT EXISTS, because to find the non-matching items the DBMS needs to run through all the available records and discard the matches.
LEFT JOIN / IS NULL and NOT EXISTS are semantically equivalent, while NOT IN is not: these methods differ in how they handle NULL values in the right-hand table.
Therefore, you should go for the LEFT JOIN to improve your SQL performance.
select A.id from A LEFT JOIN B
on A.id = B.id
where B.id is null
order by A.id;
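The NULL difference mentioned above is worth seeing concretely. When the subquery's column can contain NULL, as in the question's table B, NOT IN returns no rows at all, while the LEFT JOIN / IS NULL form still works. A self-contained T-SQL sketch (the question's tables are simulated with table variables):

```sql
DECLARE @A TABLE (id INT);
DECLARE @B TABLE (id INT);
INSERT INTO @A VALUES (1), (2), (3), (4);
INSERT INTO @B VALUES (1), (2), (NULL), (NULL);

-- Returns NO rows: "id NOT IN (1, 2, NULL)" is never TRUE,
-- because "id <> NULL" evaluates to UNKNOWN.
SELECT id FROM @A WHERE id NOT IN (SELECT id FROM @B);

-- Returns 3 and 4, as intended.
SELECT A.id
FROM @A A
LEFT JOIN @B B ON B.id = A.id
WHERE B.id IS NULL;
```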

Is a scalar database function used in a join called once per distinct set of inputs or once per row?

If I have a sql statement like this:
select *
from tableA a
inner join tableB b on dbo.fn_something(a.ColX) = b.ColY
if you assume there are 5 rows in tableA with the same value for ColX, will dbo.fn_something() be called with that value 5 times or just once?
Clearly this is a trivial example, but I'm interested for the purposes of thinking about performance in a more complex scenario.
UPDATE
Thanks @DStanley. Following on from your answer, I investigated further: using SQL Profiler with the SP:StmtStarting event on the SQL below illustrates what happens, i.e. as you said, the function will be called once for each row in the join.
This has an extra join from the original question.
create table tableA
( id int )
create table tableB
( id_a int not null
, id_c int not null
)
create table tableC
( id int )
go
create function dbo.fn_something( @id int )
returns int
as
begin
return @id
end
go
-- add test data
-- 5 rows:
insert into tableA (id) values (1), (2), (3), (4), (5)
-- 5 rows:
insert into tableC (id) values (101), (102), (103), (104), (105)
-- 25 rows:
insert into tableB (id_a, id_c) select a.id, c.id from tableA a, tableC c
go
-- here dbo.fn_something() is called 25 times:
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c)
-- here dbo.fn_something() is called just 5 times,
-- as the 'b.id_c < 102' happens to be applied first.
-- That's likely to depend on whether SQL thinks it's
-- faster to evaluate the '<' or the function.
select *
from tableA a
inner join tableB b on a.id = b.id_a
inner join tableC c on c.id = dbo.fn_something(b.id_c) and b.id_c < 102
go
drop table tableA ;
drop table tableB;
drop table tableC;
drop function dbo.fn_something;
go
It will be called for each row in a. I do not know of any optimization that would call the function just once per unique input. If performance is an issue, you could create a temp table with the distinct input values and use those results in your join, but I would only do that if it was an issue - don't assume it's a problem and clutter your query unnecessarily.
If you declare your function as schema-bound, it can be run once for each unique input. This requires that the function be deterministic and always have the same output for a given input.
CREATE FUNCTION dbo.fn_something (@id INT)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN @id
END
GO

Updating and join on multiple rows, which row's value is used?

Let's say I have the following statement, and the inner join results in 3 rows where a.Id = b.Id, but each of the 3 rows has a different b.Value. Since only one row from tableA is being updated, which of the 3 values is used in the update?
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
INNER JOIN tableB as b
ON a.Id = b.Id
I don't think there are rules for this case, and you cannot depend on a particular outcome.
If you're after a specific row, say the latest one, you can use APPLY, like:
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
CROSS APPLY
(
select top 1 *
from tableB as b
where b.id = a.id
order by
DateColumn desc
) as b
Usually what you end up with in this scenario is the first row in the order of the table's physical index. In actual practice, you should treat this as non-deterministic and include something that narrows your result to one row.
Here is what I came up with using SQL Server 2008
--drop table #b
--drop table #a
select 1 as id, 2 as value
into #a
select 1 as id, 5 as value
into #b
insert into #b
select 1, 3
insert into #b
select 1, 6
select * from #a
select * from #b
UPDATE #a
SET #a.Value = #b.Value
FROM #a
INNER JOIN #b
ON #a.Id = #b.Id
It appears that it uses the top value of a basic select each time (row 1 of select * from #b). So it possibly depends on indexing. However, I would not rely on the implementation chosen by SQL Server, as it could change. Instead, I would suggest using the solution presented by Andomar to make sure you know which value you are going to choose.
In short, do not trust the default implementation; create your own. But this was an interesting academic question :)
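Another way to "create your own" determinism, besides the TOP 1 / ORDER BY APPLY shown above, is to collapse the multiple matching rows into a single well-defined value before joining. A sketch assuming the tableA/tableB Id and Value columns from the question:

```sql
UPDATE a
SET a.Value = b.PickedValue
FROM tableA AS a
INNER JOIN
(
    -- MAX is arbitrary here; use MIN, or TOP 1 with ORDER BY in an APPLY,
    -- to encode whichever rule actually makes sense for your data.
    SELECT Id, MAX(Value) AS PickedValue
    FROM tableB
    GROUP BY Id
) AS b ON a.Id = b.Id;
```

Because the derived table has exactly one row per Id, the result no longer depends on index order or plan shape.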
The best option in my case for updating multiple records is to use a MERGE query (supported from SQL Server 2008 onwards); with this query you have complete control over what you are updating.
You can also use the OUTPUT clause to do further processing.
Example: without OUTPUT clause (update only)
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10 ---- Select Multiple records
)
MERGE A
USING cteB
ON(A.Id = cteB.Id) -- Update condition
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1, -- Note: the update condition (A.Id = cteB.Id) can't appear here again.
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3;
Example: with OUTPUT clause
CREATE TABLE #TempOutPutTable
(
PkId INT NOT NULL,
Col1 VARCHAR(50),
Col2 VARCHAR(50)
)
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10
)
MERGE A
USING cteB
ON(A.Id = cteB.Id)
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1,
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3
OUTPUT
INSERTED.Id, cteB.Col1, INSERTED.Col2 INTO #TempOutPutTable;
--Do what ever you want with the data in temporary table
SELECT * FROM #TempOutPutTable; -- you can check here which records are updated.
Yes, I came up with a similar experiment to Justin Pihony's:
IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test ;
SELECT
1 AS Name, 0 AS value
INTO #test
IF OBJECT_ID('tempdb..#compare') IS NOT NULL DROP TABLE #compare ;
SELECT 1 AS name, 1 AS value
INTO #compare
INSERT INTO #compare
SELECT 1 AS name, 0 AS value;
SELECT * FROM #test
SELECT * FROM #compare
UPDATE t
SET t.value = c.value
FROM #test t
INNER JOIN #compare c
ON t.Name = c.name
It takes the topmost row in the comparison (right-side) table. You can reverse the #compare.value values to 0 and 1 and you'll get the reverse. I agree with the posters above... it's very strange that this operation does not throw an error message, as it is completely hidden that it IGNORES the secondary values.