SQL Server: stored procedure is slow when running two LEFT JOINs from one table

I have a stored procedure that runs a query to get some data, a couple of rows from fairly small tables (6 to 20 rows each), with two LEFT JOINs against the same table, but it is acting slow, taking up to 300 ms.
How can I optimize this stored procedure?
SELECT
m.MobileNotificationID,
m.[Message] AS text,
m.TypeId AS typeId ,
m.MobileNotificationID AS recordId ,
0 badge ,
m.DeviceID,
ISNULL(users.DeviceToken, subscribers.DeviceToken) DeviceToken,
ISNULL(users.DeviceTypeID, subscribers.DeviceTypeID) DeviceTypeID,
m.Notes,
isSent = 0
--, m.SubscriberID, m.UserID
FROM
MobileNotification m
LEFT JOIN
Device users ON m.userId = users.UserID
AND users.DeviceID = m.DeviceID
LEFT JOIN
Device subscribers ON m.SubscriberID = subscribers.SubscriberId
AND subscribers.DeviceID = m.DeviceID
WHERE
IsSent = 0
AND m.DateCreated <= (SELECT GETDATE())
AND (0 = 0 OR ISNULL(users.DeviceTypeID, subscribers.DeviceTypeID) = 0)
AND (ISNULL(users.DeviceToken, '') <> '' OR
ISNULL(subscribers.DeviceToken, '') <> '')
ORDER BY
m.DateCreated DESC

A few pieces of advice:
ISNULL checks in the WHERE clause make queries much slower, because they stop the optimizer from using an index on those columns; try to avoid them.
To significantly improve speed, create an index on the columns you filter on, such as IsSent and DateCreated, as well as on any columns you group by.
Also give every table a clustered index on its ID column (see the sketch after the link below).
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15
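For illustration, a minimal sketch of that advice (the index names are made up, and it assumes IsSent and DateCreated live on MobileNotification, as the query implies):
CREATE NONCLUSTERED INDEX IX_MobileNotification_IsSent_DateCreated
ON MobileNotification (IsSent, DateCreated);
-- only if MobileNotificationID is not already the clustered primary key:
CREATE CLUSTERED INDEX IX_MobileNotification_ID
ON MobileNotification (MobileNotificationID);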
Try to avoid two LEFT JOINs on the same table if possible. In your case I think you can merge the two join conditions into one, as sketched below.
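A possible shape for that merge (only a sketch; note that if a notification matches both a user row and a subscriber row in Device, this returns two rows where the original ISNULL pattern picked one, so verify the semantics first):
LEFT JOIN Device d ON d.DeviceID = m.DeviceID
AND (d.UserID = m.userId OR d.SubscriberId = m.SubscriberID)
The ISNULL(users.DeviceToken, subscribers.DeviceToken) style expressions in the SELECT list would then collapse to plain d.DeviceToken references.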
And finally, from my experience: sometimes it is a lot faster to perform two queries.
Suppose you select fields from only one big table: in the first query, select just the IDs; then, in the second query, select all the string fields and other calculations, filtering on the IDs from the first query (see the sketch below).
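A sketch of that two-step pattern applied here (assuming you key on MobileNotificationID):
-- Step 1: grab only the IDs
SELECT m.MobileNotificationID
FROM MobileNotification m
WHERE m.IsSent = 0 AND m.DateCreated <= GETDATE();
-- Step 2: fetch the wide columns only for those IDs
SELECT m.MobileNotificationID, m.[Message], m.Notes
FROM MobileNotification m
WHERE m.MobileNotificationID IN (1, 2, 3); -- the IDs returned by step 1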
good luck

Related

SQL aggregation updates for some but not others

I am running this query, which should take the sum of an amount from a table and, if it is <= 0, update the status of a different table from Active to Deactive. The query updates some values but not others. I have isolated it to one observation (123456789) with 3 payments that total 0 where it does not work. What could be happening here? I am using a SQL query in Microsoft Access. Thank you.
UPDATE tbl_MASTER INNER JOIN tbl_Payments ON tbl_MASTER.DeviceID = tbl_Payments.DeviceID SET tbl_MASTER.ActiveDeactive = "DeActive"
WHERE tbl_Payments.Amount=(SELECT SUM(tbl_Payments.Amount) <= 0 FROM tbl_Payments) AND tbl__MASTER = '123456789';
Your query doesn't really make a lot of sense, to be honest. Where you have tbl_Payments.Amount=(SELECT SUM(tbl_Payments.Amount) <= 0 FROM tbl_Payments), that sub-query will just be summing up the "Amount" of every record in the table, regardless of which DeviceID. Plus, you're looking for one record in tbl_Payments table where the Amount = the sum of all of the Amounts in tbl_Payments??
I'd suggest that your query probably needs to be something more like this:
UPDATE tbl_MASTER SET tbl_MASTER.ActiveDeactive = "DeActive"
WHERE (SELECT SUM(tbl_Payments.Amount) FROM tbl_Payments WHERE tbl_Payments.DeviceID = tbl_MASTER.DeviceID) <= 0 AND tbl_MASTER.DeviceID = '123456789';
Currently, the subquery does not correlate specific IDs to the outer query, and the <= 0 comparison sits inside the subquery's SELECT clause. Consider switching to an IN clause with the conditional logic in a HAVING clause, and use table aliases to distinguish the same-named tables.
UPDATE tbl_MASTER AS m
INNER JOIN tbl_Payments AS p
ON m.DeviceID = p.DeviceID
SET m.ActiveDeactive = 'DeActive'
WHERE m.DeviceID IN (
SELECT sub_p.DeviceID
FROM tbl_Payments AS sub_p
GROUP BY sub_p.DeviceID
HAVING SUM(sub_p.Amount) <= 0
)
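Before running the UPDATE, it can be worth sanity-checking which devices the HAVING condition catches; a quick check, assuming the same table and column names:
SELECT DeviceID, SUM(Amount) AS TotalAmount
FROM tbl_Payments
GROUP BY DeviceID
HAVING SUM(Amount) <= 0;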

Is it possible to improve the performance of query with distinct and multiple joins?

There is the following query:
SELECT DISTINCT ID, ACCOUNT,
CASE
WHEN p.GeneralLevel = '1' THEN '1'
WHEN p.Level3 IS NULL THEN '2'
WHEN p.Level4 IS NULL THEN '3'
WHEN p.Level5 IS NULL THEN '4'
WHEN p.Level6 IS NULL THEN '5'
WHEN p.Level7 IS NULL THEN '6'
WHEN p.Level8 IS NULL THEN '7'
ELSE '8'
END AS LEVEL,
CASE
WHEN c.codeValueDescription IS NULL THEN p.Level2
ELSE c.codeValueDescription
END AS L2_CODE,
CASE
WHEN d.codeValueDescription IS NULL THEN p.Level3
ELSE d.codeValueDescription
END AS L3_CODE,
CASE
WHEN j.codeValueDescription IS NULL THEN p.Level4
ELSE j.codeValueDescription
END AS L4_CODE,
CASE
WHEN f.codeValueDescription IS NULL THEN p.Level5
ELSE f.codeValueDescription
END AS L5_CODE,
CASE
WHEN g.codeValueDescription IS NULL THEN p.Level6
ELSE g.codeValueDescription
END AS L6_CODE,
CASE
WHEN h.codeValueDescription IS NULL THEN p.Level7
ELSE h.codeValueDescription
END AS L7_CODE,
p.Level8
FROM generic p
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '2') c ON p.Level2 = c.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') d ON p.Level3 = d.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '4') j ON p.Level4 = j.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '5') f ON p.Level5 = f.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') g ON p.Level6 = g.codeValue -- yes, code is 3 again
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') h ON p.Level7 = h.codeValue -- and yes, again code 3 here
Some columns of the table 'generic' (excluding dates and other columns that are not important here):
ID INTEGER NOT NULL,
ACCOUNT VARCHAR(50) NOT NULL,
GeneralLevel1 VARCHAR(50),
Level2 VARCHAR(50),
Level3 VARCHAR(50),
Level4 VARCHAR(50),
Level5 VARCHAR(50),
Level6 VARCHAR(50),
Level7 VARCHAR(50),
Level8 VARCHAR(50)
Simple data:
ID,ACCOUNT_ID,LEVEL_1,LEVEL_2,...LEVEL_8
id1,ACCOUNT_ID1,GENERAL,null,...null
id1,ACCOUNT_ID2,GENERAL,A,...null
id1,ACCOUNT_ID2,GENERAL,B,...null
id2,ACCOUNT_ID1,GENERAL,null,...null
id2,ACCOUNT_ID2,GENERAL,A,...null
id2,ACCOUNT_ID3,GENERAL,B,...H
The current query runs for more than 1 s and usually returns between 100 and 1000 records; I want to improve its performance. The idea is to get rid of these LEFT JOINs and somehow rewrite the query to make it faster.
Maybe there are ways to improve this query to fetch data a bit faster? I hope I've provided enough information here. The database is custom, a NoSQL giant under the hood, but the syntax of our database bridge is very similar to MySQL. Unfortunately, I cannot provide the EXECUTION PLAN of this query because it is processed on the server side, which then generates SQL that I cannot access.
You're doing key/value lookups from your codes table. Your query contains several of these LEFT JOIN patterns.
FROM generic p
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '2') c ON p.Level2 = c.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') d ON p.Level3 = d.codeValue
These LEFT JOINs can be refactored to eliminate the subqueries. This refactoring may signal your intent to your SQL system more clearly. The result looks like this.
FROM generic p
LEFT JOIN codes c ON p.Level2 = c.codeValue AND c.code = '2'
LEFT JOIN codes d ON p.Level3 = d.codeValue AND d.code = '3'
If your SQL system allows indexes, a covering index like this on your codes table will help speed up your key/value lookup.
ALTER TABLE codes ADD INDEX (codeValue, code, codeValueDescription)
Your SELECT clause contains a lot of this sort of thing:
CASE
WHEN c.codeValueDescription IS NULL THEN p.Level2
ELSE c.codeValueDescription
END AS L2_CODE,
CASE
WHEN d.codeValueDescription IS NULL THEN p.Level3
ELSE d.codeValueDescription
END AS L3_CODE
It probably doesn't help much, but this can be simplified by rewriting it as
COALESCE(c.codeValueDescription, p.Level2) AS L2_CODE,
COALESCE(d.codeValueDescription, p.Level3) AS L3_CODE
What happens if you eliminate your DISTINCT qualifier? It probably takes some processing time. If your generic.ID column is the primary key, DISTINCT does you no good at all: those column values don't repeat. (Most modern SQL query planners detect that case and skip the deduplication step, but we don't know how modern your query planner is.)
Your query contains no overall WHERE clause so it necessarily must handle every row in your generic table. And, if that table is large your result set will be large. As I'm sure you know, scanning entire large tables takes time and resources.
All that being said, a millisecond per row for a query like this through a SQL bridge isn't smoking-gun-horrible performance. You may have to live with it. The alternative might be to apply the codes to your data in your application program: slurp the entire codes table then write some application logic to do your CASE / WHEN / THEN or COALESCE work. In other words, move the LEFT JOIN operations to your app. If your SQL bridge is fast at handling dirt-simple SELECT * FROM generic single table queries this will help a lot.
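If you do go that route, the SQL side reduces to two dirt-simple queries; a sketch, using the column list from the question:
SELECT code, codeValue, codeValueDescription FROM codes;
SELECT ID, ACCOUNT, GeneralLevel1, Level2, Level3, Level4, Level5, Level6, Level7, Level8 FROM generic;
Your application would then build a (code, codeValue) -> codeValueDescription map from the first result and apply the COALESCE logic while iterating over the second.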

Best way to compare two sets of data w/ SQL

What I have is a query that grabs a set of data. This query is run at a certain time. Then, 30 minutes later, another query (same syntax) runs and grabs that same set of data. Finally, a third query (the query in question) compares both sets of data. The records it pulls out are the ones where: "FEDVIP_Active" was FALSE in the first data set and TRUE in the second data set, OR "UniqueID" didn't exist in the first data set, does exist in the second, and FEDVIP_Active is TRUE. I'm questioning the performance of the comparison query below: it times out after 30 minutes. Is there anything you can see that I shouldn't be doing, in order to be as efficient as possible? The two identical-ish data sets I'm comparing have around a million records each.
First query that grabs the initial set of data:
select Unique_ID, First_Name, FEDVIP_Active, Email_Primary
from Master_Subscribers_Prospects
Second query is exactly the same as the first.
Then, the third query below compares the data:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b
on 1 = 1
where a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0 or
(b.Unique_ID not in (select Unique_ID from Master_Subscribers_Prospects_1) and b.FEDVIP_Active = 1)
If I understand correctly, you want all records from the second data set where the corresponding unique id in the first data set is not active (either by not existing or by having the flag set to not active).
I would suggest exists:
select a.*
from Master_Subscribers_Prospects_2 a
where a.FEDVIP_Active = 1 and
not exists (select 1
from Master_Subscribers_Prospects_1 b
where b.Unique_ID = a.Unique_ID and
b.FEDVIP_Active = 1
);
For performance, you want an index on Master_Subscribers_Prospects_1(Unique_ID, FEDVIP_Active).
An inner join on 1 = 1 is a disguised cross join, and the number of rows a cross join produces can grow rapidly: it is the product of the row counts of the two relations involved. For performance you want to keep intermediate results as small as possible.
Also, EXISTS often performs better than IN when the subquery returns a large number of rows.
But I think you don't need IN or EXISTS at all.
Assuming unique_id identifies a record and is not null, you can start from the second table and LEFT JOIN the first one on matching unique_ids. Then the first table's unique_id in the join result is NULL exactly when the first table has no record for that unique_id, so you can check for that.
SELECT b.fedvip_active,
b.unique_id,
b.first_name,
b.email_primary
FROM master_subscribers_prospects_2 b
LEFT JOIN master_subscribers_prospects_1 a
ON b.unique_id = a.unique_id
WHERE (a.fedvip_active = 1
AND b.fedvip_active = 0)
OR (a.unique_id IS NULL
AND b.fedvip_active = 1);
For that query indexes on master_subscribers_prospects_1 (unique_id, fedvip_active) and master_subscribers_prospects_2 (unique_id, fedvip_active) might also help to speed things up.
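In DDL form, that would be something like this (the index names are invented for illustration):
CREATE INDEX ix_prospects_1_uid_active
ON master_subscribers_prospects_1 (unique_id, fedvip_active);
CREATE INDEX ix_prospects_2_uid_active
ON master_subscribers_prospects_2 (unique_id, fedvip_active);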
Doing an inner select in the WHERE clause is usually bad for performance.
Here is the same version with a LEFT JOIN that might work for you.
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b on 1 = 1
left join Master_Subscribers_Prospects_1 sa on sa.Unique_ID = b.Unique_ID
where (a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0) or
(sa.Unique_ID is null and b.FEDVIP_Active = 1)

SQL Join taking too much time to run

The query shown below is taking almost 2 hours to run and I want to reduce its execution time. Any help would be really appreciated.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the NOT IN clause with NOT EXISTS and with a LEFT JOIN, but had no luck with the execution time.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
The PRODUCTS table has 432,837 records and the STORE table has almost the same number. I create this table in the stored procedure itself and then drop it at the end of the procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use a nonclustered index on this table, or what else can I do to reduce the execution time? Thanks in advance.
First, you don't need top 1 or distinct in exists and in subqueries. But this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y' and
srg.Range_Event_Id = 45655 and
NOT EXISTS (Select 1
From Last_Authorised_Range lar
where lar.Range_Event_Id = srg.Range_Event_Id)
Do note that I removed the quotes around 45655. I presume this column is actually a number. If so, don't confuse yourself and the optimizer by using a string for the comparison.
Then, try indexes. I think the best indexes are the following (see the DDL sketch after this list):
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
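As concrete DDL, that could look like this (the index names are invented; treat this as a sketch, not definitive tuning):
CREATE INDEX ix_store_event_match
ON STORE (Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id);
CREATE INDEX ix_products_grp_event
ON PRODUCTS (Store_Range_Grp_Id, Range_Event_Id);
CREATE INDEX ix_lar_event
ON Last_Authorised_Range (Range_Event_Id);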
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.

Why is Selecting From Table Variable Far Slower than List of Integers

I have a pretty big MSSQL stored procedure in which I need to conditionally check for certain IDs:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (1,2,3,4,5)
I wanted to conditionally check the SomeID field, so I did the following:
declare @AwesomeIDs table (ID int)
if @enteredText = 'This'
INSERT INTO @AwesomeIDs
VALUES(1),(2),(3)
if @enteredText = 'That'
INSERT INTO @AwesomeIDs
VALUES(4),(5)
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from @AwesomeIDs)
Nothing else has changed, yet I can't even get the latter query to grab 5 records. The top query returns 5000 records in less than 3 seconds. Why is selecting from a table variable so drastically slower?
Two other possible options you can consider
Option 1
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where
( b.SomeID IN (1,2,3) AND @enteredText = 'This')
OR
( b.SomeID IN (4,5) AND @enteredText = 'That')
Option 2
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where EXISTS (Select 1
from @AwesomeIDs
WHERE b.SomeID = ID)
Mind you, for table variables SQL Server always assumes there is only ONE row in the table (except in SQL Server 2014, where the assumption is 100 rows), and that can affect the estimated and actual plans. But 1 row against 3 is not really a deal breaker.
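If the row estimate ever does become a problem with bigger ID lists, a commonly used workaround is to add OPTION (RECOMPILE), which compiles the statement using the table variable's actual row count; a sketch against the query above:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from @AwesomeIDs)
OPTION (RECOMPILE)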