Best way to tune NOT EXISTS in SQL queries

Best way to tune NOT EXISTS in SQL queries - sql

I am trying to tune SQLs which have NOT EXISTS clause in the queries.My database is Netezza.I tried replacing NOT EXISTS with NOT IN and looked at the query plans.Both are looking similar in execution times.Can someone help me regarding this?I am trying to tune some SQL queries.Thanks in advance.
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
WHERE NOT EXISTS (
SELECT *
FROM DEV_AM_EDS_1..AM_STATION
WHERE D1.STN_ID = STN_ID
)
GROUP BY ETL_PRCS_DT;

You can try a JOIN:
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
LEFT JOIN DEV_AM_EDS_1..AM_STATION TAB2 ON D1.STN_ID = TAB2.STN_ID
WHERE TAB2.STN_ID IS NULL
Try to compare the execution plans. The JOIN might produce the same you already have.

You can try a join, but you sometimes need to be careful. If the join key is not unique in the second table, then you might end up with multiple rows. The following query takes care of this:
SELECT ETL_PRCS_DT,
COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
left outer join
(
select distinct STN_ID
from DEV_AM_EDS_1..AM_STATION ams
) ams
on d1.STN_ID = ams.STN_ID
WHERE ams.STN_ID is NULL

Related

Prevent duplicate record when inner join query in SQL

I used the inner join command to get the data from two tables.
But, when I run the SQL query.
I got the same record duplicated 48 times.
The SQL query I created is below
SELECT
ABS_LIMIT.B1_NAME, ABS_LIMIT.B2_NAME, ABS_LIMIT.B3_NAME, ABS_LIMIT.ELEM_NAME
FROM
ABS_LIMIT
INNER JOIN
RTU_SCAN ON RTU+SCAN.B1_NAME = ABS_LIMIT.B1_NAME
WHERE
ABS_LIMIT.B3_NAME LIKE 'AMP%';
Does anyone have any idea how to remove the duplicate from the query result?

You never SELECT any columns from RTU_SCAN so you can use EXISTS rather than an INNER JOIN:
SELECT a.B1_NAME,
a.B2_NAME,
a.B3_NAME,
a.ELEM_NAME
FROM ABS_LIMIT a
WHERE EXISTS (SELECT 1 FROM RTU_SCAN r WHERE r.B1_NAME = a.B1_NAME)
AND a.B3_NAME LIKE 'AMP%';
Then, if there are duplicates in RTU_SCAN they will not propagate duplicate rows in the output.
Alternatively, you could use DISTINCT to remove duplicates:
SELECT DISTINCT
a.B1_NAME,
a.B2_NAME,
a.B3_NAME,
a.ELEM_NAME
FROM ABS_LIMIT a
INNER JOIN RTU_SCAN r
ON r.B1_NAME = a.B1_NAME
AND a.B3_NAME LIKE 'AMP%';
However, it will probably be less efficient to generate duplicates and then filter them out using DISTINCT compared to using EXISTS and not generating the duplicates in the first place.

Fastest way to count from a subquery

I have the following query to return a list of current employees and the number of 'corrections' they have. This is working correctly but is very slow.
I was previously not using a subquery, instead opting for a count (from...) as an aggregate subselect but I have read that a subquery should be much faster. Changing to the code to the below did improve performance but not anywhere near what I was expecting.
SELECT DISTINCT
tblStaff.StaffID, CorrectionsOut.Count AS CorrectionsAssigned
FROM tblStaff
LEFT JOIN tblMeetings ON tblMeetings.StaffID = tblStaff.StaffID
JOIN tblTasks ON tblTasks.TaskID = tblMeetings.TaskID
--Get Corrections Issued
LEFT JOIN(
SELECT
COUNT(DISTINCT tblMeetings.TaskID) AS Count, tblMeetings.StaffID
FROM tblRegister
JOIN tblMeetings ON tblRegister.MeetingID = tblMeetings.MeetingID
WHERE tblRegister.FDescription IS NOT NULL
AND tblRegister.CorrectionOutDate IS NULL
GROUP BY tblMeetings.StaffID
) AS CorrectionsOut ON CorrectionsOut.StaffID = tblStaff.StaffID
WHERE tblStaff.CurrentEmployee = 1
I need an open vendor solution as we are transitioning from SQL Server to Postgres. Note this is a simplified example of the query where there are quite few counts. My current query time without the counts is less than half a second, but with the counts, is approx 20 seconds, if it runs at all without locking or otherwise failing.

I would get rid of the joins that you are not using which probably makes the SELECT DISTINCT unnecessary as well:
SELECT s.StaffID, co.Count AS CorrectionsAssigned
FROM tblStaff s LEFT JOIN
(SELECT COUNT(DISTINCT m.TaskID) AS Count, m.StaffID
FROM tblRegister r
tblMeetings m
ON r.MeetingID = m.MeetingID
WHERE r.FDescription IS NOT NULL AND
r.CorrectionOutDate IS NULL
GROUP BY m.StaffID
) co
ON co.StaffID = s.StaffID
WHERE s.CurrentEmployee = 1;
Getting rid of the SELECT DISTINCT and the duplicate rows added by the tasks should help performance.
For additional benefit, you would want to be sure you have indexes on the JOIN keys, and perhaps on the filtering criteria.

Execute SQL query based on results of a condition

My SQL knowledge is somewhat limited. I have 2 Select statements that returns separate sets of results. Since the tables being queried are different sets of tables for each Select statement I would like to know if it is possible to write it such that it picks which query to execute based on the result returned from a Select Statement on a completely different table, for example:
Select C.ManBtchNum from Table C
If OITM.ManBtchNum = 'Y' then
Select * from Table A
else
Select * from Table B
Apologies for using psuedocode but its the shortest way I can explain this.
I'm not sure if a Case expression can be used here. Any advice would be helpful. Thanks
I have tried both a Case as well as a Union but I'm not getting the results needed. Granted I might be doing something wrong considering my limited knowledge

As an example,
select t1.Val, t2.Val, t3.Val,
case
when(t1.Val = Something)Then (T2.Val)Else(t3.Val)
END
from tablex t1
inner join tabley t2 on t1.commonfield = t2.commonfield
inner join tablez t3 ON t1.commonfield = t3.commonfield

SQL: Optimization problem, has rows?

I got a query with five joins on some rather large tables (largest table is 10 mil. records), and I want to know if rows exists. So far I've done this to check if rows exists:
SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...
Using this query, in a stored procedure takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?
I got indexes on the fields that I'm joining on and the fields in the WHERE clause.
Any ideas?

switch to EXISTS predicate. In general I have found it to be faster than selecting top 1 etc.
So you could write like this IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 .. do your stuff

Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).
In MSSQL, you can use see a diagram of the execution path of any query you submit.
In Oracle and MySQL you can use the EXPLAIN keyword to get details about how the query is working.
But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.

4 options
Try COUNT(*) in place of TOP 1 tbl.id
An index per column may not be good enough: you may need to use composite indexes
Are you on SQL Server 2005? If som, you can find missing indexes. Or try the database tuning advisor
Also, it's possible that you don't need 5 joins.
Assuming parent-child-grandchild etc, then grandchild rows can't exist without the parent rows (assuming you have foreign keys)
So your query could become
SELECT TOP 1
tbl.Id --or count(*)
FROM
grandchildtable tbl
INNER JOIN
anothertable ON ... = ...
WHERE
tbl.xxx = ...
Try EXISTS.
For either for 5 tables or for assumed heirarchy
SELECT TOP 1 --or count(*)
tbl.Id
FROM
grandchildtable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
tbl.Id
FROM
mytable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
AND
EXISTS (SELECT *
FROM
yetanothertable T3
WHERE
tbl.key = T3.key /* AND T3 condition*/)

Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.
Select top 1 tbl.id
From
(
Select top 1 * from
table tbl1
Where Key = Key
) tbl1
inner join ...
After that you will likely need to provide more of the query to understand how it works.

Maybe you could offload/cache this fact-finding mission. Like if it doesn't need to be done dynamically or at runtime, just cache the result into a much smaller table and then query that. Also, make sure all the tables you're querying to have the appropriate clustered index. Granted you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.
Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.

Use the maximun row table first in every join and if more than one condition use
in where then sequence of the where is condition is important use the condition
which give you maximum rows.
use filters very carefully for optimizing Query.

optimize SQL query

What more can I do to optimize this query?
SELECT * FROM
(SELECT `item`.itemID, COUNT(`votes`.itemID) AS `votes`,
`item`.title, `item`.itemTypeID, `item`.
submitDate, `item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin' ,
`myItems`.userID as userIDFav, `myItems`.deleted as myDeleted
FROM (votes `votes` RIGHT OUTER JOIN item `item`
ON (`votes`.itemID = `item`.itemID))
INNER JOIN
users `users`
ON (`users`.userID = `item`.userID)
LEFT OUTER JOIN
myItems `myItems`
ON (`myItems`.itemID = `item`.itemID)
WHERE (`item`.deleted = 0)
GROUP BY `item`.itemID,
`votes`.itemID,
`item`.title,
`item`.itemTypeID,
`item`.submitDate,
`item`.deleted,
`item`.ItemCat,
`item`.counter,
`item`.userID,
`users`.name,
`myItems`.deleted,
`myItems`.userID
ORDER BY `item`.itemID DESC) as myTable
where myTable.userIDFav = 3 or myTable.userIDFav is null
limit 0, 20
I'm using MySQL
Thanks

What does the analyzer say for this query? Without knowledge about how many rows there are in the table you cant tell any optimization. So run the analyzer and you'll see what parts costs what.

Of course, as #theomega said, look at the execution plan.
But I'd also suggest to try and "clean up" your statement. (I don't know which one is faster - that depends on your table sizes.) Usually, I'd try to start with a clean statement and start optimizing from there. But typically, a clean statement makes it easier for the optimizer to come up with a good execution plan.
So here are some observations about your statement that might make things slow:
a couple of outer joins (makes it hard for the optimzer to figure out an index to use)
a group by
a lot of columns to group by
As far as I understand your SQL, this statement should do most of what yours is doing:
SELECT `item`.itemID, `item`.title, `item`.itemTypeID, `item`.
submitDate, `item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin'
FROM (item `item` INNER JOIN users `users`
ON (`users`.userID = `item`.userID)
WHERE
Of course, this misses the info from the tables you outer joined, I'd suggest to try to add the required columns via a subselect:
SELECT `item`.itemID,
(SELECT count (itemID)
FROM votes v
WHERE v.itemID = 'item'.itemID) as 'votes', <etc.>
This way, you can get rid of one outer join and the group by. The outer join is replaced by the subselect, so there is a trade-off which may be bad for the "cleaner" statement.
Depending on the cardinality between item and myItems, you can do the same or you'd have to stick with the outer join (but no need to reintroduce the group by).
Hope this helps.

Some quick semi-random thoughts:
Are your itemID and userID columns indexed?
What happens if you add "EXPLAIN " to the start of the query and run it? Does it use indexes? Are they sensible?
DO you need to run the whole inner query and filter on it, or could you put move the where myTable.userIDFav = 3 or myTable.userIDFav is null part into the inner query?

You do seem to have too many fields in the Group By list, since one of them is itemID, I suspect that you could use an inner SELECT to preform the grouping and an outer SELECT to return the set of fields desired.

Can't you add the where clause myTable.userIDFav = 3 or myTable.userIDFav is null to WHERE (item.deleted = 0)?
Regards
Lieven

Look at the way your query is built. You join a lot of stuff, then limit the output to 20 rows. You should have the outer join on items and myitems, since your conditions only apply to these two tables, limit the output to the first 20 rows, then join and aggregate. Here you are performing a lot of work that is going to be discarded.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Best way to tune NOT EXISTS in SQL queries - sql

Related

Prevent duplicate record when inner join query in SQL

Fastest way to count from a subquery

Execute SQL query based on results of a condition

SQL: Optimization problem, has rows?

optimize SQL query

Categories

Resources