How do I optimize this SQL query that takes forever?

I'm trying to optimize this query as it takes 30 seconds to execute:
SELECT
TOP 100 table1.*
FROM
table1 (NOLOCK)
INNER JOIN
DB1..table2 (NOLOCK)
ON DB1..table2.id = DB1..table1.id
INNER JOIN
DB1..table2_batch (NOLOCK)
ON DB1..table2_batch.table2_batch_id = DB1..table2.table2_batch_id
INNER JOIN
DB2..table4 (NOLOCK)
ON CASE
WHEN CHARINDEX(':',
table2_batch.reference_number,
3) > 3 THEN SUBSTRING(table2_batch.reference_number,
3,
CHARINDEX(':',
table2_batch.reference_number,
3) -3 )
ELSE
RIGHT(table2_batch.reference_number,
LEN(table2_batch.reference_number) -2)
end = Cast(table4.PurchaseOrderID as int)
INNER JOIN
DB2..table3 (NOLOCK)
ON DB2..table3.key = DB2..table4.key
WHERE
table1.id IS NOT NULL
AND (
table1.id!=''
OR table1.id IS NULL
)
AND DB2..table3.AccountTypeID != 30000
AND CHARINDEX('O', DB1..table2_batch.reference_number) = 1
AND table1_id NOT IN (
select
lim.table1_id
from
link_table1_message as lim (nolock)
inner join
table1 as i (nolock)
on i.table1_id = lim.table1_id
where
lim.message_id >= 90
)
ORDER BY
last_hit DESC

Based on this comment:
The bottleneck is on the DB2..table4 join. When I take that part out it does the query really fast ... but I really need that join in there to make sure the proper condition is met for this scenario.
It's a little bit of a guess because we don't know how many rows are involved, nor what your schema looks like, but making a few assumptions, this could possibly help:
INNER JOIN
DB2..table4 (NOLOCK)
ON CASE
WHEN CHARINDEX(':',
table2_batch.reference_number,
3) > 3 THEN SUBSTRING(table2_batch.reference_number,
3,
CHARINDEX(':',
table2_batch.reference_number,
3) -3 )
ELSE
RIGHT(table2_batch.reference_number,
LEN(table2_batch.reference_number) -2)
end = Cast(table4.PurchaseOrderID as int)
That's a lot of logic to put in a join condition, I find.
table2_batch has a reference_number that encodes a value you need to extract and then join against table4.PurchaseOrderID; I'd start by selecting that first:
with cteTable2 as (
    select
        table2_batch_id
        ,reference_number
        ,case when charindex(':',reference_number,3) > 3
            then substring(reference_number,3,charindex(':',reference_number,3) - 3)
            else right(reference_number,len(reference_number) - 2)
        end as PurchaseOrderID
    from table2_batch
)
And then you can treat cteTable2 as a full-fledged table in the select statement that follows, so instead of joining with table2_batch, you join with cteTable2:
inner join DB2..table4 (NOLOCK)
on cteTable2.PurchaseOrderID = cast(table4.PurchaseOrderID as int)
...and is that cast really needed?
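Before moving the expression into a CTE, it can help to check what it actually extracts. The following is a minimal sketch in T-SQL; the two sample reference numbers are made up for illustration (the question does not show the real format), and they start with 'O' only because the WHERE clause requires that.

```sql
-- Hypothetical sample values; the real reference_number format is an assumption.
SELECT v.reference_number,
       CASE WHEN CHARINDEX(':', v.reference_number, 3) > 3
            -- a ':' was found after position 3: take the text between position 3 and the ':'
            THEN SUBSTRING(v.reference_number, 3, CHARINDEX(':', v.reference_number, 3) - 3)
            -- no ':' found: drop the first two characters and keep the rest
            ELSE RIGHT(v.reference_number, LEN(v.reference_number) - 2)
       END AS PurchaseOrderID
FROM (VALUES ('OX12345:7'), ('OX678')) AS v(reference_number);
-- 'OX12345:7' -> '12345'
-- 'OX678'     -> '678'
```

Note that the result is a string, so comparing it with an int column still involves a conversion on one side of the join.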

Related

Need to optimise select query

I have a query that does a select with joins across multiple tables containing about 90 million rows in total. I only need data from the last 30 days. The problem is that while the select is running, SQL Server throws timeouts and new records cannot be created during that time. The query takes about 5 seconds to complete.
I would like to optimise this query so that it doesn't go through the entire tables checking the datetime and only searches the latest entries.
Right now it seems that I would need to index the datetime column. Please advise whether I need to create indexes or whether there is another way to optimise this query.
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
CASE
WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
END AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
INNER JOIN [table2] ON [table1].CustomerPersonID = [table2].ID
INNER JOIN [table3] ON [table2].Column9 = [table3].ID
INNER JOIN [table1Line] ON [table1].ID = [table1Line].table1ID
INNER JOIN [table4] ON table1Line.TaxID = Tax.ID
INNER JOIN [table5] ON [table1].CompanyID = Company.ID
INNER JOIN table6 ON [table1].SalesChannelID = table6.ID
WHERE Column5 LIKE '%date%'
AND table6.id = 5
OR table6.id = 2
AND Column5 LIKE '%date%'
ORDER BY Column5 DESC;
First things first: every database behaves a little differently, because the optimizer builds its plans from that database's own statistics and data distribution.
Version differences also play a part in the performance of the server.
Beyond that, here are a few things to do to optimize this query.
When working with joins, I put the joined table first in the ON clause and compare it against the table that was already specified. This is a readability convention and does not change the result.
For example, t2 checks against t1:
select t1.name, t2.car
from customers as t1
left join purchases as t2
on t2.customerid = t1.customerid
The next thing I see is the LIKE condition in the WHERE clause.
Judging by your example, the date is stored as text.
I would recommend storing and comparing the date as a datetime instead of a string data type.
I would include that in the code below, but I'm not sure what format your text dates are in.
'%date%' means "contains date".
Because of the leading wildcard, the pattern can match at any position in the string, so every value has to be scanned from left to right.
So if your date text is 20200130, the engine tries the pattern at position 1, then position 2, then position 3, and so on, for every row.
This significantly increases the time the query takes, and it prevents an index on the column from being used to seek to the matching rows.
I also see that the date condition accidentally appears twice instead of once.
I would recommend doing:
WHERE LTRIM(RTRIM(Column5)) LIKE 'date'
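If the column can be stored as a real datetime, the "last 30 days" requirement from the question can be written as a plain range comparison, which is the kind of predicate an index can support. This is only a sketch: it assumes Column5 lives on table1 and is (or has been converted to) a datetime column, and the index name is made up.

```sql
-- Assumption: [table1].Column5 is a datetime column. IX_table1_Column5 is a hypothetical name.
CREATE INDEX IX_table1_Column5 ON [table1] (Column5);

-- The column is compared directly, with no function wrapped around it,
-- so the optimizer is free to seek into the index for the recent rows.
SELECT [table1].Column1 AS InvoiceNo,
       [table1].Column5 AS InvoiceDate
FROM [table1]
WHERE [table1].Column5 >= DATEADD(day, -30, GETDATE())
ORDER BY [table1].Column5 DESC;
```

Whether the index is actually used depends on the data and the rest of the query, so check the execution plan.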
As for the inner joins, I would not use them.
Use a LEFT JOIN, and then in the WHERE clause make sure the joined data has no NULL values.
This makes the LEFT JOIN return the same rows as the INNER JOIN. Whether it actually runs faster depends on the optimizer, so compare the execution plans before relying on it.
For instance, the first join would look like this:
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
WHERE table2.id IS NOT NULL
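A small check that the two forms return the same rows, using table variables with made-up names and data (nothing here comes from the asker's schema):

```sql
-- Hypothetical tables and rows, for illustration only.
DECLARE @orders TABLE (order_id int, customer_id int);
DECLARE @customers TABLE (id int, name varchar(20));

INSERT INTO @orders VALUES (1, 10), (2, 20);   -- order 2 has no matching customer
INSERT INTO @customers VALUES (10, 'Ann');

-- Form 1: INNER JOIN
SELECT o.order_id, c.name
FROM @orders AS o
INNER JOIN @customers AS c ON c.id = o.customer_id;

-- Form 2: LEFT JOIN, then discard the rows that found no match
SELECT o.order_id, c.name
FROM @orders AS o
LEFT JOIN @customers AS c ON c.id = o.customer_id
WHERE c.id IS NOT NULL;

-- Both queries return the single row (1, 'Ann').
```

The IS NOT NULL test should be on a column that cannot be NULL in the joined table, such as its key; otherwise genuine matches could be filtered out.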
I see a problem in the WHERE clause:
AND table6.id = 5
OR table6.id = 2
AND binds more tightly than OR, so once the duplicated date condition is removed this should be:
AND (table6.id = 5 OR table6.id = 2)
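To see why the parentheses matter once the date condition appears only once, here is a small sketch with made-up rows. The column date_ok is a hypothetical stand-in for "the date filter matched".

```sql
-- Hypothetical rows: (sales channel id, 1 if the date filter matched, else 0)
SELECT v.id,
       v.date_ok,
       -- parsed as (date_ok = 1 AND id = 5) OR id = 2
       CASE WHEN v.date_ok = 1 AND v.id = 5 OR v.id = 2
            THEN 'kept' ELSE 'dropped' END AS without_parens,
       CASE WHEN v.date_ok = 1 AND (v.id = 5 OR v.id = 2)
            THEN 'kept' ELSE 'dropped' END AS with_parens
FROM (VALUES (5, 1), (2, 0), (2, 1), (5, 0)) AS v(id, date_ok);
-- (5, 1): kept    / kept
-- (2, 0): kept    / dropped   <- the unparenthesized form lets this row through
-- (2, 1): kept    / kept
-- (5, 0): dropped / dropped
```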
So here is what an optimized version of your code could look like:
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
(CASE WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
ELSE '' END ) AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
LEFT JOIN [table3] ON [table3].ID = [table2].Column9
LEFT JOIN [table1Line] ON [table1Line].table1ID = [table1].ID
LEFT JOIN [table4] ON [table4].ID = table1Line.TaxID
LEFT JOIN [table5] ON [table5].ID = [table1].CompanyID
LEFT JOIN [table6] ON table6.ID = [table1].SalesChannelID
WHERE table2.ID IS NOT null
AND table3.ID IS NOT null
AND table1Line.ID IS NOT null
AND table4.ID IS NOT null
AND table5.ID IS NOT null
AND table6.ID IS NOT null
AND LTRIM(RTRIM(Column5)) LIKE 'date'
AND (table6.id = 5 OR table6.id = 2)
ORDER BY Column5 DESC;

Query result not working with array type in PostgreSQL

I have two tables, each containing a column of array type, in PostgreSQL. The structure is as follows:
tbl_tour_packages
tbl_header_images
I have a query which contains several joins. The query works fine with the other joins and shows no error, but the values from tbl_header_images are missing.
The query is:
SELECT
t1.tour_id AS pid,
t1.tour_name AS title,
t1.tour_duration AS nights,
t1.tour_price_full AS price,
t1.discount AS discount,
t1.tour_seo_title AS seo,
t3.category AS category,
t4.image_names[1] AS image_url,
CASE WHEN max(s.state_name) IS NULL THEN NULL ELSE array_agg(s.state_name) END AS state,
CASE WHEN max(o.destination) IS NULL THEN NULL ELSE array_agg(o.destination) END AS destinations
FROM tbl_tour_packages t1
LEFT JOIN tbl_countries t2 ON t1.tour_country_iso = t2.iso
LEFT JOIN tbl_categories t3 on t1.tour_category_id = t3.id
LEFT JOIN tbl_header_images t4 ON t1.tour_id = t4.package_id
LEFT JOIN tbl_states AS s ON (t1.tour_state @> array[s.state_code])
LEFT JOIN tbl_destinations AS o ON (t1.tour_destination @> array[o.id])
WHERE t1.tour_status = 1
GROUP BY 1,7,8
ORDER BY view_count ASC LIMIT 6
I want to get the 'image_name' from tbl_header_images. Any quick help or suggestion will be appreciated.
Before the WHERE clause you should be able to do something like:
, unnest(image_names) _image_names
and then, in the select list, aggregate that back into an array:
array_agg(_image_names) AS image_names
I don't quite get the t4.image_names[1] AS image_url attempt, but I'm sure you can pick it up from here.
So the whole query would be something like this.
Edit: I've stripped the extra grouping.
SELECT
t1.tour_id AS pid,
t1.tour_name AS title,
t1.tour_duration AS nights,
t1.tour_price_full AS price,
t1.discount AS discount,
t1.tour_seo_title AS seo,
t3.category AS category,
(array_agg(_image_names))[1] AS image_url,
CASE WHEN max(s.state_name) IS NULL THEN NULL ELSE array_agg(s.state_name) END AS state,
CASE WHEN max(o.destination) IS NULL THEN NULL ELSE array_agg(o.destination) END AS destinations
FROM tbl_tour_packages t1
LEFT JOIN tbl_countries t2 ON t1.tour_country_iso = t2.iso
LEFT JOIN tbl_categories t3 on t1.tour_category_id = t3.id
LEFT JOIN tbl_header_images t4 ON t1.tour_id = t4.package_id
LEFT JOIN tbl_states AS s ON (t1.tour_state @> array[s.state_code])
LEFT JOIN tbl_destinations AS o ON (t1.tour_destination @> array[o.id])
, unnest(t4.image_names) AS _image_names
WHERE t1.tour_status = 1
GROUP BY 1,7
ORDER BY view_count ASC LIMIT 6
Alternatively, I'd go with a subselect:
SELECT t1.*,
(SELECT image_names[1] FROM tbl_header_images WHERE package_id = t1.tour_id) AS image_url
FROM t1, t2, t3
WHERE ...
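Written out against the tables from the question, the subselect variant might look roughly like the sketch below (PostgreSQL). It assumes view_count is a column of tbl_tour_packages, which the question does not state, and it leaves out the state and destination aggregates to keep the example short.

```sql
-- Sketch only: view_count being on tbl_tour_packages is an assumption.
SELECT t1.tour_id   AS pid,
       t1.tour_name AS title,
       (SELECT hi.image_names[1]          -- PostgreSQL arrays are 1-based by default
        FROM tbl_header_images AS hi
        WHERE hi.package_id = t1.tour_id
        LIMIT 1) AS image_url             -- LIMIT 1 guards against several header rows per package
FROM tbl_tour_packages AS t1
WHERE t1.tour_status = 1
ORDER BY t1.view_count ASC
LIMIT 6;
```

Because the image is fetched per row in the select list, packages without a header image still appear, with image_url set to NULL.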

Merge two rows with condition SQL View

I have a View which has a SQL Script as:
Select
a.iAssetId,
ac.eEventCode,
vm.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId,
CASE WHEN ac.eEventCode = 6 THEN vm.dtUTCDateTime ELSE NULL END as EnterTime,
CASE WHEN ac.eEventCode = 7 THEN vm.dtUTCDateTime ELSE NULL END as ExitTime,
CASE WHEN
a.iAssetId = Lead(a.iAssetId) OVER (ORDER BY a.iAssetId)
AND g.iGeofenceId = Lead(g.iGeofenceId) OVER (ORDER BY a.iAssetId)
AND ac.eEventCode != Lead(ac.eEventCode) OVER (ORDER BY a.iAssetId)
THEN DATEDIFF(minute, vm.dtUTCDateTime, Lead(vm.dtUTCDateTime) OVER (ORDER BY a.iAssetId)) ELSE NULL END as Test
From AssetCommunicationSummary ac
Inner join VehicleMonitoringLog vm on vm.iVehicleMonitoringId = ac.iVehicleMonitoringId
Inner Join Geofences g on g.iGeofenceId = vm.iGeofenceId
Inner Join Assets a on a.iAssetId = ac.iAssetId
Inner Join Categories c on c.iCategoryId = a.iCategoryId
Inner Join Sites s on s.iSiteId = c.iSiteId
Where ac.eEventCode = 6 OR ac.eEventCode = 7
Group by
a.iAssetId,
ac.eEventCode,
vm.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId
I have used LEAD to calculate the time difference in minutes between the current row and the following row, based on conditions.
I now need to merge the following row and the current row based on a condition.
Is there a way to do this?
The goal is to get EnterTime and ExitTime in the same row, with the time difference in the column next to them.
My result is like this:
If your eventcode is always going to be 6 and 7, then you can just join to that table twice using that clause in the join itself. I think I've got the rest of your schema joined up properly below, but if not, you can adjust it around to fit.
Select
a.iAssetId,
vmEnter.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId,
vmEnter.dtUTCDateTime as EnterTime,
vmExit.dtUTCDateTime as ExitTime,
DATEDIFF(minute, vmEnter.dtUTCDateTime, vmExit.dtUTCDateTime) as TimeDiffMinutes
From Sites s
Inner Join Categories c on s.iSiteId = c.iSiteId
Inner Join Assets a on c.iCategoryId = a.iCategoryId
Inner Join AssetCommunicationSummary acEnter on a.iAssetId = acEnter.iAssetId and acEnter.eEventCode = 6
Inner Join VehicleMonitoringLog vmEnter on vmEnter.iVehicleMonitoringId = acEnter.iVehicleMonitoringId
Inner Join AssetCommunicationSummary acExit on a.iAssetId = acExit.iAssetId and acExit.eEventCode = 7
Inner Join VehicleMonitoringLog vmExit on vmExit.iVehicleMonitoringId = acExit.iVehicleMonitoringId
Inner Join Geofences g on g.iGeofenceId = vmEnter.iGeofenceId
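The "join the same table twice" idea can be tried on a tiny data set first. The sketch below uses a table variable with made-up rows. Unlike the query above, it also matches the exit to the same geofence and requires it not to precede the enter event; both conditions are assumptions about what the data should look like.

```sql
-- Hypothetical event rows: code 6 = enter, code 7 = exit (as in the question).
DECLARE @events TABLE (iAssetId int, iGeofenceId int, eEventCode int, dtUTCDateTime datetime);

INSERT INTO @events VALUES
    (1, 10, 6, '2020-01-01T08:00:00'),
    (1, 10, 7, '2020-01-01T08:45:00'),
    (2, 10, 6, '2020-01-01T09:00:00');   -- asset 2 never exits

SELECT en.iAssetId,
       en.iGeofenceId,
       en.dtUTCDateTime AS EnterTime,
       ex.dtUTCDateTime AS ExitTime,
       DATEDIFF(minute, en.dtUTCDateTime, ex.dtUTCDateTime) AS MinutesInside
FROM @events AS en
INNER JOIN @events AS ex
        ON ex.iAssetId = en.iAssetId
       AND ex.iGeofenceId = en.iGeofenceId
       AND ex.eEventCode = 7
       AND ex.dtUTCDateTime >= en.dtUTCDateTime
WHERE en.eEventCode = 6;
-- Returns one row: asset 1, geofence 10, 08:00 -> 08:45, 45 minutes.
```

With several visits per asset and geofence, every enter row would pair with every later exit row, so real data needs an extra rule for picking the nearest exit.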
You can use this DDL to test and see the idea of what is going on. It's copy-and-paste ready. If you want to see a difference in the times, make sure you wait before you insert each record.
Create table testing
(
Id int ,
Enter DateTime,
Exitt DateTime,
Eventt int,
GeoCode int
)
insert into testing values (1, GETDATE(),null,6,10)
insert into testing values (1, null,GETDATE(),7,10)
insert into testing values (1, GETDATE(),null,6,11)
insert into testing values (1, null,GETDATE(),7,11)
insert into testing values (2, GETDATE(),null,6,10)
insert into testing values (2, null,GETDATE(),7,10)
create table #temp1
(
Id int, EnterTime datetime, GeoCode int
)
create table #temp2
(
Id int, ExitTime datetime, GeoCode int
)
insert into #temp1
Select Id, MAX(Enter), GeoCode from testing where Eventt = 6 group by Id,GeoCode
insert into #temp2
Select Id, MAX(Exitt),GeoCode from testing where Eventt = 7 group by Id,GeoCode
Select t1.Id, t1.EnterTime,t2.ExitTime, t1.GeoCode, DATEDIFF(ss,t1.EnterTime,t2.ExitTime)
from #temp1 t1
inner join #temp2 t2 on t2.Id = t1.Id
and t1.GeoCode = t2.GeoCode
This is basically pseudocode, so you're going to need to modify it, but everything you need is here.
I'm going to guess that eventcode = 6 marks the enter (intake) time.
If so, two of your data pairs don't make much sense, as the exit time is before the enter time.
The query below assumes that eventcode 6 is the enter time
and that the exit time should not be earlier than the enter time.
The query is based on the output you provided and not on the view's query.
If doing a select * on your view gives you that output, then replace vw_table with your view's name.
There are NULLs in the timedif column in the SQL Fiddle because
there was only one instance of assetid 2, and
assetid 4 and 6 have exit times that happened before their enter times.
SQLFIDDLE
select
v1.iAssetid,
v1.EnterTime,
v2.ExitTime,
datediff(minute, v1.Entertime, v2.Exittime) timedif
from vw_table v1
left join vw_table v2 on
v1.iAssetid= v2.iAssetid
and v1.sCategoryNamea = v2.sCategoryNamea
and v2.eEventcode = 7
and v2.dtUTCDatetime >= v1.dtUTCDatetime
where
v1.eEventcode = 6
You can merge two result sets by adding ROW_NUMBER to each of them and then joining on that number, like this:
SELECT DISTINCT tbl1.col1, tbl2.col2
FROM
(SELECT FirstName AS col1, ROW_NUMBER() OVER (ORDER BY FirstName) Number FROM dbo.UBUser) tbl1
INNER JOIN
(SELECT LastName AS col2, ROW_NUMBER() OVER (ORDER BY LastName) Number FROM dbo.UBUser) tbl2
ON tbl1.Number = tbl2.Number
This way you will be able to have EnterTime and ExitTime in the Same Row with Time Difference in the Column Next to it.
Try this
SELECT iAssetid,
iGeoFenceId,
iGeoFenceName,
sCategoryNamea,
iCategoryid,
sSiteName,
Max(EnterTime) As EnterTime,
Min(ExitTime) As ExitTime,
Datediff(minute, Max(EnterTime), Min(ExitTime)) As Timediff
FROM #vw_Table
GROUP BY iAssetid,
iGeoFenceId,
iGeoFenceName,
sCategoryNamea,
iCategoryid,
sSiteName

Is there a way to make this query more efficient performance wise?

This query takes a long time to run on an MS SQL 2008 database with 70 GB of data.
If I run the two WHERE clauses separately, it takes a lot less time.
EDIT - I need to change the 'select *' to 'delete' afterwards, so please keep that in mind when answering. Thanks :)
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
OR
ClientId in
(
select substring(ClientID,11,100)
from computers
)
Swapping IN for EXISTS will help.
Also, as per Gordon's answer: UNION can out-perform OR.
SELECT computers.*
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
UNION
SELECT *
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
You've now updated your question asking to make this a delete.
Well the good news is that instead of the "OR" you just make two DELETE statements:
DELETE computers
FROM computers
LEFT
JOIN policyassociations
ON policyassociations.entityid = computers.pk
WHERE (
computers.encryptionstatus = 0
OR computers.encryptionstatus IS NULL
)
AND (
policyassociations.entitytype <> 1
OR policyassociations.entitytype IS NULL
)
AND EXISTS (
SELECT name
FROM (
SELECT name
FROM computers
GROUP
BY name
HAVING Count(*) > 1
) As duplicate_computers
WHERE name = computers.name
)
;
DELETE c
FROM computers As c
WHERE EXISTS (
SELECT SubString(clientid, 11, 100)
FROM computers
WHERE SubString(clientid, 11, 100) = c.clientid
)
;
Some things I would look at:
1. Are indexes in place?
2. 'IN' can slow your query; try replacing it with joins.
3. You should use a column name rather than * inside count(), I guess 'Name' in this case.
4. Try selecting only the required data, by selecting particular columns.
Hope this helps!
OR can be poorly optimized sometimes. In this case, you can just split the query into two subqueries, and combine them using UNION:
select *
From computers
Where Name in
(
select T2.Name
from
(
select Name
from computers
group by Name
having COUNT(*) > 1
) T3
join computers T2 on T3.Name = T2.Name
left join policyassociations PA on T2.PK = PA.EntityId
where (T2.EncryptionStatus = 0 or T2.EncryptionStatus is NULL) and
(PA.EntityType <> 1 or PA.EntityType is NULL)
)
UNION
select *
From computers
WHERE ClientId in
(
select substring(ClientID,11,100)
from computers
);
You might also be able to improve performance by replacing the subqueries with explicit joins. However, this seems like the shortest route to better performance.
EDIT:
I think the version with joins is:
select c.*
From computers c left outer join
(select c.Name
from (select c.*, count(*) over (partition by Name) as cnt
from computers c
) c left join
policyassociations PA
on c.PK = PA.EntityId and PA.EntityType <> 1
where (c.EncryptionStatus = 0 or c.EncryptionStatus is NULL) and
c.cnt > 1
) cpa
on c.Name = cpa.Name left outer join
(select substring(ClientID, 11, 100) as name
from computers
) csub
on c.ClientId = csub.name
Where cpa.Name is not null or csub.Name is not null;
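The windowed count used above to find duplicated names can be checked in isolation. A minimal sketch with a table variable and made-up rows:

```sql
-- Hypothetical rows: 'pc-a' appears twice, 'pc-b' once.
DECLARE @computers TABLE (PK int, Name varchar(20));

INSERT INTO @computers VALUES (1, 'pc-a'), (2, 'pc-a'), (3, 'pc-b');

SELECT c.PK, c.Name
FROM (
    -- cnt holds, on every row, how many rows share that row's Name
    SELECT PK, Name, COUNT(*) OVER (PARTITION BY Name) AS cnt
    FROM @computers
) AS c
WHERE c.cnt > 1;
-- Returns PK 1 and PK 2 (both 'pc-a'); 'pc-b' is not duplicated.
```

Compared with GROUP BY ... HAVING COUNT(*) > 1 followed by a join back to the table, this reads the table once and keeps all columns of the duplicated rows available.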

Link tables based on column value

Is it possible to pull values from 2 different tables based on the value of a column? For example, I have a table with a boolean column that either returns 0 or 1 depending on what the end user selects in our program. 0 means that I should pull in the default values. 1 means to use the user's data.
If my table Table1 looked like this:
Case ID    Boolean
====================
1          0
2          1
3          1
4          0
5          0
Then I would need to pull the corresponding data for Case IDs 1, 4, and 5 from table Default and the corresponding data for Case IDs 2 and 3 from table UserDef. Then I would have to take these values, combine them, and reorder them by Case ID so I can preserve the order in the resulting table.
I am fairly inexperienced with SQL but I am trying to learn. Any help or suggestions are greatly appreciated. Thank you in advance for your help.
Something like this:
SELECT
t1.CaseID
,CASE WHEN t1.Boolean = 1 THEN ut.Col1 ELSE dt.Col1 END AS Col1
,CASE WHEN t1.Boolean = 1 THEN ut.Col2 ELSE dt.Col2 END AS Col2
FROM Table1 t1
LEFT JOIN DefaultTable dt ON dt.CaseID = t1.CaseID
LEFT JOIN UserDefTable ut ON ut.CaseID = t1.CaseID
ORDER BY t1.CaseID
You join on both tables and then use CASE in SELECT to choose from which one to display data.
Option B:
WITH CTE_Combo AS
(
SELECT 0 AS Boolean, * FROM [Default] --replace * with needed columns; DEFAULT is a reserved word, hence the brackets
UNION ALL
SELECT 1 AS Boolean, * FROM UserDef --replace * with needed columns
)
SELECT * FROM Table1 t
LEFT JOIN CTE_Combo c ON t.CaseID = c.CaseID AND t.Boolean = c.Boolean
ORDER BY t.CaseID
This might be even simpler: use a CTE to union both tables while adding an artificial flag column, and then join the CTE to your table on both the ID and the flag column.
SELECT t1.CaseID,
ISNULL(td.data, tu.data) userData -- pick data from table_default
-- if not null else from table_user
FROM table1 t1
LEFT JOIN table_default td ON t1.CaseID = td.CaseID -- left join with table_default
AND t1.Boolean = 0 -- when boolean = 0
LEFT JOIN table_user tu ON t1.CaseID = tu.CaseID -- left join with table_user
AND t1.Boolean = 1 -- when boolean = 1
ORDER BY t1.CaseID
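To try the flag-in-the-join approach on the data from the question, here is a self-contained sketch using table variables. The table variable names, the Col1 column, and the stored values are made up for illustration.

```sql
-- Flags taken from the question; everything else is hypothetical.
DECLARE @Table1       TABLE (CaseID int, Boolean bit);
DECLARE @DefaultTable TABLE (CaseID int, Col1 varchar(20));
DECLARE @UserDefTable TABLE (CaseID int, Col1 varchar(20));

INSERT INTO @Table1 VALUES (1, 0), (2, 1), (3, 1), (4, 0), (5, 0);
INSERT INTO @DefaultTable VALUES
    (1, 'default 1'), (2, 'default 2'), (3, 'default 3'), (4, 'default 4'), (5, 'default 5');
INSERT INTO @UserDefTable VALUES (2, 'user 2'), (3, 'user 3');

SELECT t1.CaseID,
       COALESCE(td.Col1, tu.Col1) AS Col1
FROM @Table1 AS t1
LEFT JOIN @DefaultTable AS td
       ON td.CaseID = t1.CaseID AND t1.Boolean = 0   -- defaults only when the flag is 0
LEFT JOIN @UserDefTable AS tu
       ON tu.CaseID = t1.CaseID AND t1.Boolean = 1   -- user data only when the flag is 1
ORDER BY t1.CaseID;
-- 1 default 1
-- 2 user 2
-- 3 user 3
-- 4 default 4
-- 5 default 5
```

Because each join only matches for one flag value, at most one of td.Col1 and tu.Col1 is non-NULL per row, so the order of the arguments to COALESCE does not affect the result here.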