Choosing the Best SQL Query

Hi, I am pretty new to MS SQL, so forgive me if I am asking something that is very obvious to more experienced people. I can write a query to fetch the same data in multiple ways. Now I have two SQL queries, Query 1 and Query 2, which look like the following:
(Query 1)
select column1, column2, column3
from
Table1 a
inner join
Table2 b on a.column1=b.column1
where Condition1 and condition2
EXCEPT
(select column1, column2, column3
from
Table1 a
inner join
Table2 b on a.column1=b.column1
where Condition3
)
(Query 2)
select column1, column2, column3
from
Table1 a
inner join
Table2 b on a.column1=b.column1
where Condition1 and condition2
And column1 Not in
(select column1
from
Table1 a
inner join
Table2 b on a.column1=b.column1
where Condition3
)
Both take a similar time, and their Estimated Subtree Costs differ only slightly. I am not sure which one is the better query, and why.

EXCEPT compares all (paired) columns of the two full selects and returns the distinct rows from the left result set that are not present in the right result set. NOT IN instead filters the rows of the outer query by checking one column against the values returned by the subquery in the WHERE clause. It does a similar job, but it does not remove duplicate rows from the result.
In the execution plan I analysed, the EXCEPT query was slower than the NOT IN query: the Distinct Sort operator that EXCEPT needs accounted for around 65% of the total estimated cost.
According to this link, EXCEPT can be rewritten using NOT EXISTS (and EXCEPT ALL can be rewritten using ROW_NUMBER and NOT EXISTS).
Refer to the link for more info.
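Here is a small sketch of the differences described above. It uses hypothetical tables left_rows and right_rows with made-up values. It is written in portable SQL rather than T-SQL, so it should also run in SQLite. It shows that EXCEPT removes duplicates, that NOT IN keeps them, and that NOT EXISTS plus DISTINCT reproduces EXCEPT. It also shows a pitfall to know about before choosing NOT IN: once the subquery returns a NULL, NOT IN returns no rows at all.

```sql
-- Hypothetical sample data: 'a' appears twice on the left.
CREATE TABLE left_rows  (column1 TEXT);
CREATE TABLE right_rows (column1 TEXT);
INSERT INTO left_rows  VALUES ('a'), ('a'), ('b'), ('c');
INSERT INTO right_rows VALUES ('b');

-- EXCEPT: set difference, duplicates removed -> a, c
SELECT column1 FROM left_rows
EXCEPT
SELECT column1 FROM right_rows;

-- NOT IN: row-by-row filter, duplicates kept -> a, a, c
SELECT column1 FROM left_rows
WHERE column1 NOT IN (SELECT column1 FROM right_rows);

-- NOT EXISTS plus DISTINCT gives the same rows as EXCEPT here -> a, c
SELECT DISTINCT l.column1
FROM left_rows AS l
WHERE NOT EXISTS (SELECT 1 FROM right_rows AS r WHERE r.column1 = l.column1);

-- Pitfall: once the subquery returns a NULL, "x NOT IN (..., NULL)" is never TRUE,
-- so this query now returns no rows, while EXCEPT and NOT EXISTS still return a, c.
INSERT INTO right_rows VALUES (NULL);
SELECT column1 FROM left_rows
WHERE column1 NOT IN (SELECT column1 FROM right_rows);
```

This NULL behaviour is one reason NOT EXISTS is often preferred over NOT IN when the subquery column is nullable.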

The second one seems to have a slight edge over the first one.
The subquery in the second one fetches only one column, i.e. column1.
If that column is indexed, the SQL engine can evaluate the subquery precisely and quickly through the index.

What if you modify the WHERE condition like below?
select column1, column2, column3
from
Table1 a
inner join
Table2 b on a.column1=b.column1
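where Condition1 and condition2 and not Condition3
Note that this is only equivalent if each row's own values decide Condition3. The original queries also drop rows whose column1 (Query 2), or whose column1, column2, column3 combination (Query 1), matches some other row that satisfies Condition3. In addition, rows where Condition3 evaluates to NULL are dropped by NOT Condition3.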

Related

Right Join vs where a value exists in another table

Without realizing it, I've switched to preferring the first block of code. I am curious whether it is best practice, or more efficient, to use the first block of code over the second, or vice versa.
In my opinion, the first is more readable and concise, since all the columns are from one table.
SELECT Column2, Column3, Column4
FROM Table1
WHERE Column1 in (SELECT Column1 FROM Table2)
vs
SELECT A.Column2, A.Column3, A.Column4
FROM Table1 A
RIGHT JOIN Table2 B ON A.Column1 = B.Column1
Just hoping for clarification on best practices/efficiency of each statement and if there's an accepted form.
Your two queries don't do the same thing.
Your first one
SELECT Column2, Column3, Column4
FROM Table1
WHERE Column1 in (SELECT Column1 FROM Table2)
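is called a semi-join. It works like an inner join whose result set has no columns from the second table, except that each Table1 row is returned at most once, no matter how many Table2 rows it matches. You have pointed out that this form is easier for you to read and reason about. (I agree.) Here is the other way of writing it, as a join. It returns the same rows only when each Table1 row matches at most one Table2 row (for example, when Table2.Column1 is unique); otherwise the join repeats Table1 rows. When that holds, modern query planners typically satisfy either form the same way.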
SELECT Table1.Column2, Table1.Column3, Table1.Column4
FROM Table1
INNER JOIN Table2 ON Table1.Column1 = Table2.Column1
Your second query is this. (By the way, RIGHT JOINs are far less common than LEFT JOINs in production code; many people have to stop and think twice when reading a RIGHT JOIN.)
SELECT A.Column2, A.Column3, A.Column4
FROM Table1 A
RIGHT JOIN Table2 B ON A.Column1 = B.Column1
This will produce result set rows for every row in Table2, whether or not they match rows in Table1; where there is no match, the Table1 columns come back as NULL. An inner join delivers only the rows that match the ON condition in both joined tables, and that's what you want.
A left join produces at least one row for every row in Table1, even if the row doesn't match. The same holds, mutatis mutandis, for right joins.
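To see why the two queries in the question differ, here is a sketch on hypothetical data (made-up values, portable SQL). Table2 contains a duplicate key and a key with no match in Table1. The RIGHT JOIN from the question is written as the equivalent Table2 LEFT JOIN Table1, because not every engine supports RIGHT JOIN (SQLite only added it in version 3.39).

```sql
-- Hypothetical data: key 1 appears twice in Table2, key 9 has no Table1 row.
CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT);
CREATE TABLE Table2 (Column1 INTEGER);
INSERT INTO Table1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO Table2 VALUES (1), (1), (2), (9);

-- Semi-join with IN: each Table1 row at most once -> one, two
SELECT Column2 FROM Table1
WHERE Column1 IN (SELECT Column1 FROM Table2);

-- Semi-join with EXISTS: same result -> one, two
SELECT t1.Column2 FROM Table1 AS t1
WHERE EXISTS (SELECT 1 FROM Table2 AS t2 WHERE t2.Column1 = t1.Column1);

-- INNER JOIN: one row per matching pair -> one, one, two
SELECT t1.Column2 FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Column1 = t2.Column1;

-- "Table1 RIGHT JOIN Table2" written as "Table2 LEFT JOIN Table1":
-- every Table2 row survives -> one, one, two, NULL
SELECT t1.Column2 FROM Table2 AS t2
LEFT JOIN Table1 AS t1 ON t1.Column1 = t2.Column1;
```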

How to reuse sub-query when the number of reuses is not the complete list of columns in main query?

So the question is:
select
column1,
column2,
(select column_x from anotherTable with conditionsDerived) as column3,
(select column_y from anotherTable with conditionsDerived) as column4
from mainTable
where conditions;
In the above example, the same subquery is used twice, for column3 and column4. I believe it takes twice the time, and I want to avoid that. I know how to handle it if column1 and column2 were not there. The important point is that conditionsDerived depends on the current row of mainTable, so it is not a standalone query: it uses at least one column of mainTable.
I assume that your queries are really correlated subqueries. If that is the case, then you can use a "lateral join". In SQL Server this is accomplished using outer apply:
select t.column1, t.column2,
an.column_x as column3, an.column_y as column4
from mainTable t outer apply
(select column_x, column_y
from anothertable an
where . . .
) an
where conditions;
APPLY is like having a correlated subquery in the FROM clause, but it is better: the subquery can return multiple columns and/or multiple rows.
If the query is not correlated, you can still use APPLY, but you could also use a CROSS JOIN.
You can use a WITH clause, which could improve performance and make the code more readable.
These articles can help you:
https://oracle-base.com/articles/misc/with-clause
http://modern-sql.com/feature/with
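One way to apply the WITH-clause suggestion is to define the lookup once and LEFT JOIN to it, so that both columns come from a single pass. This is only a sketch. The column names id and main_id are hypothetical, and the sketch assumes the correlation is a simple key match with at most one anotherTable row per mainTable row. If several rows can match, the join would multiply rows, whereas the scalar subqueries would fail (SQL Server raises an error) or pick one. In that case, the OUTER APPLY rewrite is the safer choice.

```sql
-- Hypothetical schema: anotherTable.main_id points at mainTable.id,
-- with at most one anotherTable row per mainTable row.
CREATE TABLE mainTable    (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT);
CREATE TABLE anotherTable (main_id INTEGER, column_x TEXT, column_y TEXT);
INSERT INTO mainTable    VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
INSERT INTO anotherTable VALUES (1, 'x1', 'y1');

-- Original shape: the same correlated lookup written twice.
SELECT m.column1, m.column2,
       (SELECT an.column_x FROM anotherTable AS an WHERE an.main_id = m.id) AS column3,
       (SELECT an.column_y FROM anotherTable AS an WHERE an.main_id = m.id) AS column4
FROM mainTable AS m;

-- Rewrite: the lookup is defined once in a WITH clause and joined once.
-- LEFT JOIN keeps mainTable rows without a match (NULLs), like the subqueries do.
WITH lookup_rows AS (
    SELECT main_id, column_x, column_y
    FROM anotherTable
    -- filters that do not depend on mainTable would go here
)
SELECT m.column1, m.column2,
       l.column_x AS column3,
       l.column_y AS column4
FROM mainTable AS m
LEFT JOIN lookup_rows AS l ON l.main_id = m.id;
```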
Use a standard join query:
select table1.id,
table1.column_1,
table1.column_2,
table2.column_3,
table2.column_4
from table1
inner join table2
on table1.id = table2.id
where table2.field_x = table2.conditions
and table1.field_y = table1.conditions;

SQL Server Query Performance Issue: Need Replacement of NOT EXISTS

Could someone optimize the performance of the general SQL query below?
select fn.column1
from dbo.function1(input1) as fn
where (not exists (select 1 from table1 t1
where fn.column1 = t1.column1)
and not exists (select 1 from table2 t2
where fn.column1 = t2.column1))
For the above query, consider the approximate row counts given below:
select fn.column1 from dbo.function1(input1) as fn -- returns 64000 records in 2 seconds
table1 (column1) -- returns 3000 records in 1 second
table2 (column1) -- returns 2000 records in 1 second
So, if I run each SELECT statement on its own, it pulls and displays the records in 1 or 2 seconds. But if I run the full query, it takes more than a minute to display the 64000 - (3000 + 2000) = 59000 records.
I tried using EXCEPT like this:
select fn.column1
from dbo.function1(input1) as fn
except
(select column1 from dbo.table1 union select column1 from dbo.table2)
Nothing improves the performance: it still takes a minute to display the 59000 records. The same happens with NOT IN.
I also noticed that if I use UNION instead of EXCEPT in the above query, it returns 59K records in 2 seconds.
UPDATED:
The function (a bit complex) contains the pseudocode below:
select column1, column2,...column6
from dbo.table1
inner join dbo.table2
inner join ....
inner join dbo.table6
inner join dbo.innerfunction1
where <Condition 1>
UNION ALL
select column1, column2,...column6
from dbo.table1
inner join dbo.table2
inner join ...
inner join dbo.table4
inner join dbo.innerfunction2
where (condition 2)
Assume that each of the two inner functions contains a single-table SELECT statement.
My question is: if I select the column from the function alone, it displays 64K records in 1 second. But if the whole query is executed, it takes more than a minute.
[Please note: this query needs to be used in a function.]
Could anyone help me improve this?
Kindly let me know if you need more details or clarification.
Regards,
Viswa V.
It's a bit hard to optimize without data to play with; a fiddle would be good. Nonetheless, here is an approach that may work.
Create a temp table, index it, then do the EXCEPT as follows:
SELECT
fn.column1
INTO
#temp
FROM
dbo.function1(input1) AS fn
CREATE NONCLUSTERED INDEX [temp_index] ON #temp
(
column1 ASC
)
SELECT
column1
FROM
#temp AS t
EXCEPT
(
SELECT
column1
FROM
dbo.table1
UNION
SELECT
column1
FROM
dbo.table2
)
I would be interested in the result.
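The same materialise-then-index idea can be sketched in portable SQL. SQLite has no table-valued functions, so a hypothetical table function1_output stands in for dbo.function1(input1), and all values are made up. Note one caveat, given the note in the question: SQL Server does not allow #temp tables inside user-defined functions, so a function would need a table variable instead.

```sql
-- Hypothetical stand-in for dbo.function1(input1), plus the two exclusion tables.
CREATE TABLE function1_output (column1 INTEGER);
CREATE TABLE table1 (column1 INTEGER);
CREATE TABLE table2 (column1 INTEGER);
INSERT INTO function1_output VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO table1 VALUES (2), (4);
INSERT INTO table2 VALUES (4), (6);

-- Step 1: materialise the function result once.
CREATE TEMP TABLE fn_result AS
SELECT column1 FROM function1_output;

-- Step 2: index the column used for matching.
CREATE INDEX temp_index ON fn_result (column1);

-- Step 3: remove keys found in either table. Compound operators are applied
-- left to right, so two EXCEPTs equal "EXCEPT (table1 UNION table2)" without
-- the parenthesised compound, which not every engine accepts.
SELECT column1 FROM fn_result
EXCEPT
SELECT column1 FROM table1
EXCEPT
SELECT column1 FROM table2;   -- returns 1, 3, 5
```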

Are Columns Not Selected in SQL Views Executed?

I wasn't able to come up with the right keywords to search for the answer for this, so apologies if it was answered already.
Consider the following SQL view:
CREATE VIEW View1 AS
SELECT Column1
,Column2
,(SELECT SUM(Column3) FROM Table2 WHERE Table2.ID = Table1.ID) AS Column3 -- Subquery (view columns need a name)
FROM Table1
If I run the following query, will the subquery be executed or does SQL Server optimise the query?
SELECT Column1 FROM View1
I'm looking at this from a performance point of view, say, if the view has quite a few subqueries (aggregations can take a long time if the inner select refers to a large table).
I'm using SQL Server 2008 R2, but I'm interested to know if the answer differs for 2012 or maybe MySQL.
Thanks.
As has been said, this varies depending on your DBMS (version and provider); to know for sure, check the execution plan. For SQL Server 2008, the plans show that the subquery is not executed.
In the plan where Column3 is not selected, the query simply reads from Table1; in the plan that includes Column3, Table2 is queried as well.
In SQL Server 2008 R2 it is not executed.
In SQL Server 2012 it is not executed.
In MySQL it is executed, and both queries generate the same plan.
To elaborate further, it will also depend on your exact query, as well as your DBMS. For example:
CREATE VIEW View2
AS
SELECT t.ID, t.Column1, t.Column2, t2.Column3
FROM Table1 t
LEFT JOIN
( SELECT ID, Column3 = SUM(Column3)
FROM Table2
GROUP BY ID
) t2
ON t2.ID = t.ID
GO
SELECT Column1, Column2
FROM View2;
SELECT Column1, Column2, Column3
FROM View2;
In this case you get similar results to the correlated subquery. The plan shows only a select from Table1 if Column3 is not selected. Because it is a LEFT JOIN to a derived table with at most one row per ID (thanks to the GROUP BY), the optimiser knows that t2 can neither remove nor duplicate Table1 rows, and since none of its columns are used, it does not bother with it. If you change the LEFT JOIN to an INNER JOIN, though, e.g.
CREATE VIEW View3
AS
SELECT t.ID, t.Column1, t.Column2, t2.Column3
FROM Table1 t
INNER JOIN
( SELECT ID, Column3 = SUM(Column3)
FROM Table2
GROUP BY ID
) t2
ON t2.ID = t.ID
GO
SELECT Column1, Column2
FROM View3;
SELECT Column1, Column2, Column3
FROM View3;
The query plans for these two queries show that, because the aggregate column is not used in the first query, the optimiser essentially changes the view to this:
SELECT t.ID, t.Column1, t.Column2
FROM Table1 t
INNER JOIN
( SELECT DISTINCT ID
FROM Table2
) t2
ON t2.ID = t.ID;
As seen by the appearance of the Distinct Sort on table2 and the removal of the Stream Aggregate.
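The LEFT JOIN versus INNER JOIN behaviour above can also be checked on the result sets, without looking at plans. This sketch uses hypothetical data and portable SQL. The T-SQL alias form Column3 = SUM(Column3) is written as SUM(Column3) AS Column3. ID 3 has no Table2 rows, which is exactly why the INNER JOIN view still has to read Table2.

```sql
-- Hypothetical data: ID 1 has two Table2 rows, ID 3 has none.
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT);
CREATE TABLE Table2 (ID INTEGER, Column3 INTEGER);
INSERT INTO Table1 VALUES (1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f');
INSERT INTO Table2 VALUES (1, 10), (1, 5), (2, 7);

CREATE VIEW View2 AS
SELECT t.ID, t.Column1, t.Column2, t2.Column3
FROM Table1 t
LEFT JOIN (SELECT ID, SUM(Column3) AS Column3 FROM Table2 GROUP BY ID) t2
  ON t2.ID = t.ID;

CREATE VIEW View3 AS
SELECT t.ID, t.Column1, t.Column2, t2.Column3
FROM Table1 t
INNER JOIN (SELECT ID, SUM(Column3) AS Column3 FROM Table2 GROUP BY ID) t2
  ON t2.ID = t.ID;

-- LEFT JOIN to one row per ID: same rows as Table1 alone (3 rows).
SELECT Column1, Column2 FROM View2;

-- INNER JOIN still drops ID 3 (2 rows), so Table2 must be read...
SELECT Column1, Column2 FROM View3;

-- ...and the DISTINCT rewrite gives the same 2 rows without aggregating.
SELECT t.Column1, t.Column2
FROM Table1 t
INNER JOIN (SELECT DISTINCT ID FROM Table2) t2 ON t2.ID = t.ID;
```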
So to summarise, it depends.
The view is just a definition, like a temporary table in a query.
First the query behind the view will be executed and then your selection on the view. So yes the subquery will be executed. If you don't want this you should create a new view without the subquery.

Problems while trying to make a query with variables in the conditions (stored procedure)

I'm having a problem. I'm trying to write a query. I remember doing something like this in the past, but today this query returns nothing: no error, no data, just nothing. The query is something like this:
SELECT field1, @variableX:=field2
FROM table
WHERE
(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=@variableX AND fieldC=0)>0 AND
(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=@variableX AND fieldC=4)=0;
I also tried this query, but it didn't work either (it also gave no error):
SELECT field1, @variableX:=field2,
@variableY:=(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=@variableX AND fieldC=0),
@variableZ:=(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=@variableX AND fieldC=4)
FROM table
WHERE @variableY>0 AND @variableZ=0;
As you can see, in the 1st query I'm trying to use a variable in the conditions. In the 2nd query I'm trying to create some variables and evaluate them in the conditions. After the 2nd query runs, @variableY=1 and @variableZ=0, but I don't know why the query returns an empty data set.
What could be wrong here? Any comment or suggestion is welcome! Thanks!
Bye!
You don't need variables, subqueries, or COUNT to solve this problem.
SELECT DISTINCT t1.field1, t1.field2
FROM mytable t1
INNER JOIN mytable2 t2 ON t1.field2 = t2.fieldB AND t2.fieldC = 0
LEFT OUTER JOIN mytable2 t3 ON t1.field2 = t3.fieldB AND t3.fieldC = 4
WHERE t3.fieldB IS NULL
You wanted nonzero matches for the t2 case, which is true if the inner join is satisfied.
And you wanted zero matches for the t3 case, which is true if the outer join is not satisfied (and therefore t3.* would be NULL).
@ircmaxell is correct that select-list expressions are evaluated only after the row passes the conditions in the WHERE clause. This is good, because if you have a costly expression in your select-list, but the WHERE clause is going to filter out 99% of rows, it would be wasteful to evaluate the costly select-list expression for all those rows, only to discard them.
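Here is a sketch of the join rewrite on hypothetical data (portable SQL, values made up). It shows why DISTINCT is needed: field2 = 1 has two fieldC = 0 rows, so the inner join alone would return that row twice.

```sql
-- Hypothetical data.
CREATE TABLE mytable  (field1 TEXT, field2 INTEGER);
CREATE TABLE mytable2 (fieldA INTEGER, fieldB INTEGER, fieldC INTEGER);
INSERT INTO mytable VALUES ('keep', 1), ('has_4', 2), ('no_0', 3);
INSERT INTO mytable2 VALUES
  (10, 1, 0), (11, 1, 0),  -- field2 = 1: two fieldC = 0 rows, no fieldC = 4 row
  (12, 2, 0), (13, 2, 4),  -- field2 = 2: has a fieldC = 4 row, so it is excluded
  (14, 3, 4);              -- field2 = 3: no fieldC = 0 row, so it is excluded

-- INNER JOIN t2: at least one fieldC = 0 match.
-- LEFT OUTER JOIN t3 + IS NULL: no fieldC = 4 match.
-- DISTINCT collapses the two fieldC = 0 matches for field2 = 1.
SELECT DISTINCT t1.field1, t1.field2
FROM mytable t1
INNER JOIN mytable2 t2 ON t1.field2 = t2.fieldB AND t2.fieldC = 0
LEFT OUTER JOIN mytable2 t3 ON t1.field2 = t3.fieldB AND t3.fieldC = 4
WHERE t3.fieldB IS NULL;
-- Returns a single row: ('keep', 1).
```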
IIRC, select-list expressions are only evaluated after the WHERE clause has been satisfied, so @variableX will always be empty during the entire evaluation of the WHERE clause. For the same query (if I understand you correctly), you can do:
SELECT field1, field2
FROM table AS a
WHERE
(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=a.field2 AND fieldC=0)>0 AND
(SELECT COUNT(fieldA) FROM table2 WHERE fieldB=a.field2 AND fieldC=4)=0;
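COUNT(...) > 0 only asks whether a matching row exists, and COUNT(...) = 0 asks whether none does. The same filter can therefore also be written with EXISTS and NOT EXISTS, which can let the engine stop at the first match instead of counting. Here is a sketch on hypothetical data (portable SQL, table names borrowed from the join answer). It assumes fieldA is never NULL, because COUNT(fieldA) skips NULLs while EXISTS does not.

```sql
-- Hypothetical data; fieldA is NOT NULL, so COUNT(fieldA) and EXISTS agree.
CREATE TABLE mytable  (field1 TEXT, field2 INTEGER);
CREATE TABLE mytable2 (fieldA INTEGER NOT NULL, fieldB INTEGER, fieldC INTEGER);
INSERT INTO mytable  VALUES ('keep', 1), ('has_4', 2), ('no_0', 3);
INSERT INTO mytable2 VALUES (10, 1, 0), (11, 1, 0), (12, 2, 0), (13, 2, 4), (14, 3, 4);

-- Correlated COUNT version, as in the answer above.
SELECT field1, field2
FROM mytable AS a
WHERE (SELECT COUNT(fieldA) FROM mytable2 WHERE fieldB = a.field2 AND fieldC = 0) > 0
  AND (SELECT COUNT(fieldA) FROM mytable2 WHERE fieldB = a.field2 AND fieldC = 4) = 0;

-- EXISTS / NOT EXISTS version: same rows, no counting needed.
SELECT field1, field2
FROM mytable AS a
WHERE EXISTS (SELECT 1 FROM mytable2 WHERE fieldB = a.field2 AND fieldC = 0)
  AND NOT EXISTS (SELECT 1 FROM mytable2 WHERE fieldB = a.field2 AND fieldC = 4);
-- Both return only ('keep', 1).
```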