Is it possible to improve the performance of query with distinct and multiple joins?


I have the following query:
SELECT DISTINCT ID, ACCOUNT,
CASE
WHEN p.GeneralLevel = '1' THEN '1'
WHEN p.Level3 IS NULL THEN '2'
WHEN p.Level4 IS NULL THEN '3'
WHEN p.Level5 IS NULL THEN '4'
WHEN p.Level6 IS NULL THEN '5'
WHEN p.Level7 IS NULL THEN '6'
WHEN p.Level8 IS NULL THEN '7'
ELSE '8'
END AS LEVEL,
CASE
WHEN c.codeValueDescription IS NULL THEN p.Level2
ELSE c.codeValueDescription
END AS L2_CODE,
CASE
WHEN d.codeValueDescription IS NULL THEN p.Level3
ELSE d.codeValueDescription
END AS L3_CODE,
CASE
WHEN j.codeValueDescription IS NULL THEN p.Level4
ELSE j.codeValueDescription
END AS L4_CODE,
CASE
WHEN f.codeValueDescription IS NULL THEN p.Level5
ELSE f.codeValueDescription
END AS L5_CODE,
CASE
WHEN g.codeValueDescription IS NULL THEN p.Level6
ELSE g.codeValueDescription
END AS L6_CODE,
CASE
WHEN h.codeValueDescription IS NULL THEN p.Level7
ELSE h.codeValueDescription
END AS L7_CODE,
p.Level8
FROM generic p
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '2') c ON p.Level2 = c.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') d ON p.Level3 = d.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '4') j ON p.Level4 = j.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '5') f ON p.Level5 = f.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') g ON p.Level6 = g.codeValue -- yes, code is 3 again
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') h ON p.Level7 = h.codeValue -- and yes, again code 3 here
Some columns of the table 'generic' (excluded dates and other non-important columns for us):
ID INTEGER NOT NULL,
ACCOUNT VARCHAR(50) NOT NULL,
GeneralLevel1 VARCHAR(50),
Level2 VARCHAR(50),
Level3 VARCHAR(50),
Level4 VARCHAR(50),
Level5 VARCHAR(50),
Level6 VARCHAR(50),
Level7 VARCHAR(50),
Level8 VARCHAR(50)
Sample data:
ID,ACCOUNT_ID,LEVEL_1,LEVEL_2,...LEVEL_8
id1,ACCOUNT_ID1,GENERAL,null,...null
id1,ACCOUNT_ID2,GENERAL,A,...null
id1,ACCOUNT_ID2,GENERAL,B,...null
id2,ACCOUNT_ID1,GENERAL,null,...null
id2,ACCOUNT_ID2,GENERAL,A,...null
id2,ACCOUNT_ID3,GENERAL,B,...H
The query currently takes more than 1 s and usually returns between 100 and 1000 records. I want to improve its performance. My idea is to get rid of these LEFT JOINs and somehow rewrite the query.
Are there ways to make this query fetch data a bit faster? I hope I've provided enough information. The database is custom, with a NoSQL giant under the hood, but the syntax of our database bridge is very similar to MySQL. Unfortunately, I cannot provide the execution plan, because the query is processed on the server side, which then generates SQL that I have no access to.

You're doing key/value lookups from your codes tables. Your query contains several of these LEFT JOIN patterns.
FROM generic p
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '2') c ON p.Level2 = c.codeValue
LEFT JOIN
(SELECT codeValue, codeValueDescription
FROM codes
WHERE code = '3') d ON p.Level3 = d.codeValue
These LEFT JOINs can be refactored to eliminate the subqueries. This refactoring may signal your intent to your SQL system more clearly. The result looks like this.
FROM generic p
LEFT JOIN codes c ON p.Level2 = c.codeValue AND c.code = '2'
LEFT JOIN codes d ON p.Level3 = d.codeValue AND d.code = '3'
If your SQL system allows indexes, a covering index like this on your codes table will help speed up your key/value lookup.
ALTER TABLE codes ADD INDEX (codeValue, code, codeValueDescription)
Your SELECT clause contains a lot of this sort of thing:
CASE
WHEN c.codeValueDescription IS NULL THEN p.Level2
ELSE c.codeValueDescription
END AS L2_CODE,
CASE
WHEN d.codeValueDescription IS NULL THEN p.Level3
ELSE d.codeValueDescription
END AS L3_CODE
It probably doesn't help much, but this can be simplified by rewriting it as
COALESCE(c.codeValueDescription, p.Level2) AS L2_CODE,
COALESCE(d.codeValueDescription, p.Level3) AS L3_CODE
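The refactored joins, the covering index, and the COALESCE calls can be tried together on a toy data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for your database bridge, the sample rows are invented, and the index name idx_codes_lookup is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE generic (ID INTEGER, Level2 TEXT, Level3 TEXT);
    CREATE TABLE codes (code TEXT, codeValue TEXT, codeValueDescription TEXT);
    -- Same column order as the suggested covering index (name is hypothetical).
    CREATE INDEX idx_codes_lookup ON codes (codeValue, code, codeValueDescription);
    -- Invented sample rows.
    INSERT INTO generic VALUES (1, 'A', 'X'), (2, 'B', NULL), (3, 'Z', 'X');
    INSERT INTO codes VALUES ('2', 'A', 'Alpha'), ('2', 'B', 'Beta'), ('3', 'X', 'Xray');
""")

# Plain LEFT JOINs with the code filter in the ON clause, plus COALESCE
# to fall back to the raw level value when no description exists.
rows = conn.execute("""
    SELECT p.ID,
           COALESCE(c.codeValueDescription, p.Level2) AS L2_CODE,
           COALESCE(d.codeValueDescription, p.Level3) AS L3_CODE
    FROM generic p
    LEFT JOIN codes c ON p.Level2 = c.codeValue AND c.code = '2'
    LEFT JOIN codes d ON p.Level3 = d.codeValue AND d.code = '3'
    ORDER BY p.ID
""").fetchall()

print(rows)
```

Row 3 keeps its raw 'Z' because no code-2 description exists for it, which is the same fallback the original CASE expressions produced.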
What happens if you eliminate your DISTINCT qualifier? It probably takes some processing time. If your generic.ID column is the primary key, DISTINCT does you no good at all: those column values don't repeat. (Most modern SQL query planners detect that case and skip the deduplication step, but we don't know how modern your query planner is.)
Your query contains no overall WHERE clause so it necessarily must handle every row in your generic table. And, if that table is large your result set will be large. As I'm sure you know, scanning entire large tables takes time and resources.
All that being said, a millisecond per row for a query like this through a SQL bridge isn't smoking-gun-horrible performance. You may have to live with it. The alternative might be to apply the codes to your data in your application program: slurp the entire codes table then write some application logic to do your CASE / WHEN / THEN or COALESCE work. In other words, move the LEFT JOIN operations to your app. If your SQL bridge is fast at handling dirt-simple SELECT * FROM generic single table queries this will help a lot.
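Moving the lookups into the application could look roughly like the sketch below. It assumes you can fetch the codes table as (code, codeValue, codeValueDescription) tuples and each generic row as a dict; the helper names build_lookup and describe are made up for illustration.

```python
def build_lookup(code_rows):
    """Map (code, codeValue) -> codeValueDescription from the rows of the codes table."""
    return {(code, value): desc for code, value, desc in code_rows}


# Which code group each Level column is looked up in, mirroring the original joins
# (Level6 and Level7 reuse code '3', as in the question).
LEVEL_TO_CODE = {
    "Level2": "2", "Level3": "3", "Level4": "4",
    "Level5": "5", "Level6": "3", "Level7": "3",
}


def describe(row, lookup):
    """Return a copy of row with each level replaced by its description when one exists."""
    out = dict(row)
    for column, code in LEVEL_TO_CODE.items():
        raw = row.get(column)
        desc = lookup.get((code, raw))
        # Same fallback as COALESCE(description, raw level value).
        out[column] = desc if desc is not None else raw
    return out


# Invented sample data standing in for SELECT * FROM codes / generic.
lookup = build_lookup([("2", "A", "Alpha"), ("3", "X", "Xray")])
result = describe({"ID": 1, "Level2": "A", "Level3": "Q", "Level6": "X"}, lookup)
print(result)
```

The codes table is read once and every row then costs a handful of dictionary lookups, so this only pays off if the bridge is fast at the single-table query.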

Related

Need to optimise select query

I have a query that does a select with joins from multiple tables that contains in total about 90 million rows. I only need data from the last 30 days. The problem is that when I run the select query the sql server throws a timeout while the query is running and new records are not created during this time frame. This query takes about 5 seconds to complete.
I would like to optimise this query so that it won't go through the entire tables looking at the datetime and will only search the latest entries.
Right now it seems that I would need to index datetime column. Please advise if I need to create indexes or if there is another way to optimise this query.
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
CASE
WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
END AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
INNER JOIN [table2] ON [table1].CustomerPersonID = [table2].ID
INNER JOIN [table3] ON [table2].Column9 = [table3].ID
INNER JOIN [table1Line] ON [table1].ID = [table1Line].table1ID
INNER JOIN [table4] ON table1Line.TaxID = [table4].ID
INNER JOIN [table5] ON [table1].CompanyID = [table5].ID
INNER JOIN table6 ON [table1].SalesChannelID = table6.ID
WHERE Column5 LIKE '%date%'
AND table6.id = 5
OR table6.id = 2
AND Column5 LIKE '%date%'
ORDER BY Column5 DESC;

First things first: each database behaves a little differently, because the optimizer chooses plans based on that database's own data, statistics, and indexes.
Version differences also play a part in the performance of the server.
That aside, here are a few things to do to optimize this query.
When writing joins, I put the joined table first in the ON clause and compare it against the table already specified. This is a readability convention; the optimizer treats both orders the same.
For example, t2 is checked against t1:
select t1.name, t2.car
from customers as t1
left join purchases as t2
on t2.customerid = t1.customerid
The next thing I see is the LIKE condition in the WHERE clause.
It treats the stored date as text.
I would recommend processing the date as a datetime instead of a string datatype.
I would include that in the code below, but I'm not sure what the format of your text looks like.
'%date%' means "contains date".
Because the pattern starts with a wildcard, the server cannot seek in an index; it has to test every row and look for a match at every character position, from left to right.
So if your date text is 20200130, it compares the pattern at position 1, then at position 2, then at position 3, and so on.
This significantly increases the processing time.
I also see that the date condition appears twice instead of once, probably by accident.
I would recommend:
WHERE LTRIM(RTRIM(Column5)) LIKE 'date'
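If Column5 is really a datetime column (the SELECT list casts it to date, which hints at that), an alternative is a half-open range comparison on the bare column, which an index on that column can serve. The sketch below uses Python's sqlite3 with ISO-formatted text dates as a stand-in; the table, the index name, and the fixed "today" are all invented for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, created TEXT)")
# Hypothetical index on the date column so the range filter can use it.
conn.execute("CREATE INDEX idx_invoice_created ON invoice (created)")
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [
    (1, "2019-11-15 08:00:00"),
    (2, "2020-01-05 12:30:00"),
    (3, "2020-01-30 23:59:59"),
])

today = date(2020, 1, 30)                          # fixed so the example is repeatable
start = (today - timedelta(days=30)).isoformat()   # inclusive lower bound
end = (today + timedelta(days=1)).isoformat()      # exclusive upper bound

# No function wraps the column and no leading wildcard is involved.
recent_ids = [r[0] for r in conn.execute(
    "SELECT id FROM invoice WHERE created >= ? AND created < ? ORDER BY created DESC",
    (start, end),
)]
print(recent_ids)
```

In SQL Server the same idea would use DATEADD and GETDATE for the bounds; check the execution plan to confirm the index is actually used.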
As for the INNER JOINs, I would not use them.
Use a LEFT JOIN, and then in the WHERE clause make sure the joined data has no NULL values.
This makes the LEFT JOIN return the same rows as the INNER JOIN. Whether it runs any faster depends on the optimizer, so compare the execution plans of both forms.
For instance, the first join would look like this:
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
WHERE table2.id IS NOT NULL
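I see a problem in the WHERE clause. AND binds more tightly than OR, so once the duplicated date condition is removed, this no longer filters both channels by date:
AND table6.id = 5
OR table6.id = 2
This should be:
AND (table6.id = 5 OR table6.id = 2)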
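The effect of the missing parentheses is easy to reproduce. Here is a small sketch using Python's sqlite3 with an invented invoices table, where is_recent stands in for the date condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER, channel_id INTEGER, is_recent INTEGER);
    INSERT INTO invoices VALUES (1, 5, 1), (2, 2, 1), (3, 2, 0), (4, 5, 0);
""")

# Parsed as (is_recent = 1 AND channel_id = 5) OR channel_id = 2,
# so every channel-2 row passes regardless of the date filter.
unparenthesized = [r[0] for r in conn.execute(
    "SELECT id FROM invoices "
    "WHERE is_recent = 1 AND channel_id = 5 OR channel_id = 2 ORDER BY id")]

# The parentheses apply the date filter to both channels.
parenthesized = [r[0] for r in conn.execute(
    "SELECT id FROM invoices "
    "WHERE is_recent = 1 AND (channel_id = 5 OR channel_id = 2) ORDER BY id")]

print(unparenthesized, parenthesized)
```

Row 3 is not recent, yet it slips through the unparenthesized version.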
So here is what should be an optimized version of your code:
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
(CASE WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
ELSE '' END ) AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
LEFT JOIN [table3] ON [table3].ID = [table2].Column9
LEFT JOIN [table1Line] ON [table1Line].table1ID = [table1].ID
LEFT JOIN [table4] ON [table4].ID = table1Line.TaxID
LEFT JOIN [table5] ON [table5].ID = [table1].CompanyID
LEFT JOIN [table6] ON table6.ID = [table1].SalesChannelID
WHERE table2.ID IS NOT null
AND table3.ID IS NOT null
AND table1Line.ID IS NOT null
AND table4.ID IS NOT null
AND table5.ID IS NOT null
AND table6.ID IS NOT null
AND LTRIM(RTRIM(Column5)) LIKE 'date'
AND (table6.id = 5 OR table6.id = 2)
ORDER BY Column5 DESC;

SQL Server : stored procedure is slow when running two left join from one table

I have a stored procedure that runs a query to get a couple of rows from tables that are not that big. The query has two left joins to the same table, but it is slow, taking up to 300 ms with 6 to 20 rows in each table.
How can I optimize this stored procedure?
SELECT
m.MobileNotificationID,
m.[Message] AS text,
m.TypeId AS typeId ,
m.MobileNotificationID AS recordId ,
0 badge ,
m.DeviceID,
ISNULL(users.DeviceToken, subscribers.DeviceToken) DeviceToken,
ISNULL(users.DeviceTypeID, subscribers.DeviceTypeID) DeviceTypeID,
m.Notes,
isSent = 0
--, m.SubscriberID, m.UserID
FROM
MobileNotification m
LEFT JOIN
Device users ON m.userId = users.UserID
AND users.DeviceID = m.DeviceID
LEFT JOIN
Device subscribers ON m.SubscriberID = subscribers.SubscriberId
AND subscribers.DeviceID = m.DeviceID
WHERE
IsSent = 0
AND m.DateCreated <= (SELECT GETDATE())
AND (0 = 0 OR ISNULL(users.DeviceTypeID, subscribers.DeviceTypeID) = 0)
AND (ISNULL(users.DeviceToken, '') <> '' OR
ISNULL(subscribers.DeviceToken, '') <> '')
ORDER BY
m.DateCreated DESC

A few pieces of advice:
Wrapping a column in ISNULL inside a WHERE or ON condition usually keeps the server from using an index on that column, so try to avoid it.
To significantly improve speed, create an index on the columns that you filter on, like "IsSent" and "DateCreated", as well as on columns that you group by.
Also give every table a clustered index on its id column.
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15
Try to avoid left joining the same table twice if possible. In your case I think you can merge the conditions into one join.
And finally, from my experience, it is sometimes a lot faster to perform two queries:
Suppose you select fields from only one big table: first select just the IDs in the first query, then in the second query select all the string fields and other calculations, filtered by those IDs.
Good luck.

SQL Join taking too much time to run

This query shown below is taking almost 2 hrs to run and I want to reduce the execution time of this query. Any help would be really helpful for me.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the Not IN clause by Not Exists and Left join but no luck in runtime execution.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
Product table has 432837 records and the Store table also has almost the same number of records. This table I am creating in the stored procedure itself and then dropping it in the end in the stored procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use non clustered index on this table or what all can I do to lessen the execution time? Thanks in advance

First, you don't need top 1 or distinct in exists and in subqueries. But this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y' and
srg.Range_Event_Id = 45655 and
NOT EXISTS (Select 1
From Last_Authorised_Range lar
where lar.Range_Event_Id = srg.Range_Event_Id)
Do note that I removed the quotes around 45655. I presume this column is actually a number. If so, don't confuse yourself and the optimizer by using a string for the comparison.
Then, try indexes. I think the best indexes are:
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.
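To check the rewritten query and the three suggested indexes together, here is a small sketch using Python's sqlite3 as a stand-in for SQL Server. The index names and the sample rows are invented, and only the columns the query touches are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (Range_Event_Id INTEGER, Store_Range_Grp_Id INTEGER);
    CREATE TABLE store (Range_Event_Id INTEGER, Match_Flag TEXT,
                        Orig_Store_Range_Grp_ID INTEGER, LAR_Range_Event_Id INTEGER);
    CREATE TABLE Last_Authorised_Range (Range_Event_Id INTEGER);

    -- The three suggested indexes (names are hypothetical).
    CREATE INDEX idx_store ON store
        (Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id);
    CREATE INDEX idx_products ON products (Store_Range_Grp_Id, Range_Event_Id);
    CREATE INDEX idx_lar ON Last_Authorised_Range (Range_Event_Id);

    -- One matching product/store pair for range event 45655.
    INSERT INTO products VALUES (100, 7);
    INSERT INTO store VALUES (45655, 'Y', 7, 100);
""")

QUERY = """
    SELECT EXISTS (
        SELECT 1
        FROM products prd
        JOIN store srg
          ON prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
         AND prd.Range_Event_Id = srg.LAR_Range_Event_Id
        WHERE srg.Match_Flag = 'Y'
          AND srg.Range_Event_Id = 45655
          AND NOT EXISTS (SELECT 1
                          FROM Last_Authorised_Range lar
                          WHERE lar.Range_Event_Id = srg.Range_Event_Id)
    )
"""

# No authorised range yet, so the outer EXISTS is true.
before = conn.execute(QUERY).fetchone()[0]

# Once the range event is authorised, NOT EXISTS filters the row out.
conn.execute("INSERT INTO Last_Authorised_Range VALUES (45655)")
after = conn.execute(QUERY).fetchone()[0]

print(before, after)
```

This only confirms that the logic behaves as intended; the timing benefit of the indexes has to be measured on the real tables.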

Creating a view in SQL with a case statement in the select

I know something must be wrong with my syntax but I can't seem to figure it out.
I want to populate this column prop_and_cas_dtl_prdct_desc from either expir_prop_and_cas_dtl_prdct_cd or ren_prop_and_cas_dtl_prdct_cd depending on the value in type_indicator but before it goes into prop_and_cas_dtl_prdct_desc it should look up the prop_and_cas_dtl_prdct_desc from pc_ref_detail_product_cd and select the one corresponding to its expir_prop_and_cas_dtl_prdct_cd or ren_prop_and_cas_dtl_prdct_cd.
I apologize for the terrible indenting, I know it is difficult to read but this is the best way I know how to put it.
select
,ren_prop_and_cas_dtl_prdct_cd
...
,p_and_c_cd
,case when type_indicator in ('R','C') then
select prop_and_cas_dtl_prdct_desc
from pc_ref_detail_product_cd a inner join op_pif_coverage_rpc_new b
on a.prop_and_cas_dtl_prdct_cd = b.expir_prop_and_cas_dtl_prdct_cd
else when type_indicator in ('N','O') then
select prop_and_cas_dtl_prdct_desc
from pc_ref_detail_product_cd a inner join op_pif_coverage_rpc_new b
on a.prop_and_cas_dtl_prdct_cd = b.ren_prop_and_cas_dtl_prdct_cd
else NULL
END
AS prop_and_cas_dtl_prdct_desc
FROM dbo.op_pif_coverage_rpc_new
Here is the code I used to create my reference table
create table pc_ref_detail_product_cd(
prop_and_cas_dtl_prdct_cd char(2),
prop_and_cas_dtl_prdct_desc char(30)
)
insert into pc_ref_detail_product_cd (prop_and_cas_dtl_prdct_cd, prop_and_cas_dtl_prdct_desc)
values ('01', 'CORE'),
('02', 'FORECLOSED'),
('04', 'TRUST'),
('06', 'MORTGAGE HOLDERS E&O'),
('07', 'SECURITY INTEREST E&O')

If you need to select a column from different tables depending on the value in another column, you need to include all of the tables in the query with the appropriate JOINs, and then use a CASE expression like so:
SELECT CASE WHEN a.MyColumn = 0 THEN b.SomeColumn
WHEN a.MyColumn = 1 THEN a.SomeColumn
END AS SomeColumn
FROM MyTableA AS a
JOIN MyTableB AS b
ON a.ID = b.ID
Instead of putting a SELECT statement inside the CASE expression, you just select the column from either table that you need for each particular case.
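Applied to a lookup table like the one in the question, the pattern means joining the reference table once per code column and letting CASE pick the description. A sketch using Python's sqlite3, with shortened, hypothetical table and column names (ref_product, coverage, expir_cd, ren_cd) and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref_product (cd TEXT, descr TEXT);
    CREATE TABLE coverage (id INTEGER, type_indicator TEXT, expir_cd TEXT, ren_cd TEXT);
    INSERT INTO ref_product VALUES ('01', 'CORE'), ('02', 'FORECLOSED');
    INSERT INTO coverage VALUES (1, 'R', '01', '02'), (2, 'N', '01', '02'), (3, 'X', '01', '02');
""")

# The reference table is joined twice, once per code column; CASE then picks
# the description that applies to each row's type_indicator.
rows = conn.execute("""
    SELECT cov.id,
           CASE WHEN cov.type_indicator IN ('R', 'C') THEN e.descr
                WHEN cov.type_indicator IN ('N', 'O') THEN r.descr
           END AS product_desc
    FROM coverage cov
    LEFT JOIN ref_product e ON e.cd = cov.expir_cd
    LEFT JOIN ref_product r ON r.cd = cov.ren_cd
    ORDER BY cov.id
""").fetchall()

print(rows)
```

A CASE with no ELSE returns NULL when nothing matches, so the explicit else NULL from the question is optional.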

I got it figured out. Here is the SQL I used. Sorry if it wasn't apparent what I wanted to do from my horrible code in the original question.
select
,ren_prop_and_cas_dtl_prdct_cd
...
,p_and_c_cd
,case when type_indicator in ('R','C') then
(select prop_and_cas_dtl_prdct_desc
from pc_ref_detail_product_cd a where expir_prop_and_cas_dtl_prdct_cd = prop_and_cas_dtl_prdct_cd)
when type_indicator in ('N','O') then
(select prop_and_cas_dtl_prdct_desc
from pc_ref_detail_product_cd a where ren_prop_and_cas_dtl_prdct_cd = prop_and_cas_dtl_prdct_cd)
else NULL
END
AS prop_and_cas_dtl_prdct_desc
FROM dbo.op_pif_coverage_rpc_new

SQL Query + more efficient alternative

I have a query which involves 2 tables, 'Coupons' and 'CouponUsedLog', in SQL Server. The query below obtains some information from these 2 tables for statistical study. While my query works and returns the desired results, I feel that it can be written in a more efficient way. Can someone please advise if there's a better way to rewrite this? Am I using too many unnecessary variables and joins? Thanks.
DECLARE @CouponISSUED int=null
DECLARE @CouponUSED int=null
DECLARE @CouponAVAILABLE int=null
DECLARE @CouponEXPIRED int=null
DECLARE @CouponLastUsed Date=null
--Total CouponIssued
SET @CouponISSUED =
(
select count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
DeletedBy is null and
DeletedOn is null and
Card_AutoID in (Select AutoID
from Card
where MemberID = 'Mem001')
)
--Total CouponUsed
SET @CouponUSED =
(
select count(*)
from couponusedlog CU Left Join
Coupon C on CU.Coupon_AutoID = C.autoid
where CU.VoidedBy is null and
CU.VoidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem001')
)
SET @CouponAVAILABLE = @CouponISSUED - @CouponUSED
--Expired Coupons
SET @CouponEXPIRED =
(
select Count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
deletedBy is null and
deletedOn is null and
Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002') and
CONVERT (date, getdate()) > C.expirydate
)
--Last Used On
SET @CouponLastUsed =
(
select CONVERT(varchar(10),
Max(CU.AddedOn), 103) AS [DD/MM/YYYY]
from couponusedlog CU Left Join
coupon C on CU.Coupon_AutoID = C.autoid
where CU.voidedBy is null and
CU.voidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002')
)
Select @CouponISSUED As Coupon_Issued,
@CouponUSED As Coupon_Used,
@CouponAVAILABLE As Coupon_Available,
@CouponEXPIRED As Coupon_Expired,
@CouponLastUsed As Last_Coupon_UsedOn

In general, if you're just looking for counts of things, particularly against nearly the same data set, it's better to do it in a single query than in four separate queries.
This query combines what you need into a single query by converting your WHERE clauses into SUMs of CASE expressions. The MAX of the date is just a normal aggregate you can compute alongside a count or a sum.
SELECT COUNT(*) couponissued,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL THEN 1
ELSE 0
END) AS couponused,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL
AND Getdate() > c.expirydate THEN 1
ELSE 0
END) AS couponex,
MAX(cu.addedon) couponlastused
FROM [couponusedlog] cu
LEFT JOIN [Coupon] c
ON ( cu.coupon_autoid = c.autoid )
WHERE cu.voidedby IS NULL
AND cu.voidedon IS NULL
AND ( c.card_autoid IN (SELECT [AutoID]
FROM [Card]
WHERE memberid = 'Mem001') )
You can then convert that into a Common Table Expression to do your subtraction and formatting.
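A minimal sketch of that last step, using Python's sqlite3 as a stand-in for SQL Server. It is deliberately simplified and not a drop-in replacement for the query above: it starts from the coupon table, assumes each coupon is logged at most once, leaves out the voided/deleted filters and the member filter, and uses a fixed reference date in place of Getdate(). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coupon (autoid INTEGER, expirydate TEXT);
    CREATE TABLE couponusedlog (coupon_autoid INTEGER, addedon TEXT);
    INSERT INTO coupon VALUES (1, '2030-01-01'), (2, '2030-01-01'), (3, '2000-01-01');
    INSERT INTO couponusedlog VALUES (1, '2020-03-01'), (3, '2019-12-25');
""")

# The CTE computes all aggregates in one pass with conditional sums;
# the outer SELECT then does the subtraction.
row = conn.execute("""
    WITH stats AS (
        SELECT COUNT(*) AS issued,
               SUM(CASE WHEN cu.coupon_autoid IS NOT NULL THEN 1 ELSE 0 END) AS used,
               SUM(CASE WHEN c.expirydate < :today THEN 1 ELSE 0 END) AS expired,
               MAX(cu.addedon) AS last_used
        FROM coupon c
        LEFT JOIN couponusedlog cu ON cu.coupon_autoid = c.autoid
    )
    SELECT issued, used, issued - used AS available, expired, last_used
    FROM stats
""", {"today": "2020-06-01"}).fetchone()

print(row)
```

If a coupon can appear in the log more than once, COUNT(*) over the join overcounts issued coupons, and COUNT(DISTINCT c.autoid) would be needed instead.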

Are you asking this question out of a proactive desire to be as efficient as possible, or because of an actual performance issue you would like to correct? You can make this more efficient at the cost of having code that is harder to manage. If the performance is okay right now, I would highly recommend you leave it, because the next person to come along will be able to understand it just fine. If you make one huge, efficient but garbled SQL statement out of it, then when you or anyone else wants to update something about it, it's going to take you 3 times longer as you try to re-figure out what the heck you were thinking when you wrote it.
