Need to optimise select query - sql

I have a query that does a select with joins from multiple tables that contains in total about 90 million rows. I only need data from the last 30 days. The problem is that when I run the select query the sql server throws a timeout while the query is running and new records are not created during this time frame. This query takes about 5 seconds to complete.
I would like to optimise this query so that it wont go through the entire tables looking at the datetime and would only search from the latest entries.
Right now it seems that I would need to index datetime column. Please advise if I need to create indexes or if there is another way to optimise this query.
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
CASE
WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
END AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
INNER JOIN [table2] ON [table1].CustomerPersonID = [table2].ID
INNER JOIN [table3] ON [table2].Column9 = [table3].ID
INNER JOIN [table1Line] ON [table1].ID = [table1Line].table1ID
INNER JOIN [table4] ON table1Line.TaxID = Tax.ID
INNER JOIN [table5] ON [table1].CompanyID = Company.ID
INNER JOIN table6 ON [table1].SalesChannelID = table6.ID
WHERE Column5 LIKE '%date%'
AND table6.id = 5
OR table6.id = 2
AND Column5 LIKE '%date%'
ORDER BY Column5 DESC;

First things first, each database runs a little differently because the optimizer has been running and figuring out how the unique circumstances can be improved and continuously tries to make common things run better.
There's also versioning differences that also play a part is the performance of the server.
Besides that stuff, Here's a few things to do to optimize this query.
When working with Joins, Your Joined table comes first then compare against the already specified table.
For example t2 checks against t1:
select t1.name, t2.car
from customers as t1
left join purchases as t2
on t2.customerid = t1.customerid
The next thing I see is the Like condition in the Where part of the code.
The stored date that it's finding is stored as text in your example.
I would recommend processing the date as a datetime instead of a string type of datatype.
I would include that in the code below, but I'm not sure what the format looks like for your string of text.
%date% is the same thing as saying "Contains date".
This takes the date string, and tries to see if it matches in every position of characters from left to right.
So if your date text is 20200130, it will check to see if it matches 2date0200130, then tries 20date200130, then tries 202date00130, etc.
It will significantly increase the time it takes to process.
I also see that the date is being searched accidently two times instead of one.
I would recommend doing:
WHERE LTRIM(RTRIM(Column5)) LIKE 'date'
As for the Inner Joins, I would not use them.
Use the Left join, and then in the Where, I would make sure it had no Null values for that joined data.
This makes the Left Join work the same as the Inner Join and runs more optimally when you are running the query.
For Instance, the first Join would look like this:
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
WHERE table2.id IS NOT NULL
I see an error in the code in the Where statement:
AND table6.id = 5
OR tables6.id = 2
This should be:
AND (tables6.id = 5 OR tables6.id = 2)
So here should be an optimized version of your code:
SELECT [table1].Column1 AS InvoiceNo,
'ND' AS VATRegistrationNumber,
'ND' AS RegistrationNumber,
Column2 AS Country,
[table2].Column3 + ' ' + [table2].Column4 AS Name,
CAST([table1].Column5 AS date) AS InvoiceDate,
'SF' AS InvoiceType,
'' AS SpecialTaxation,
'' AS VATPointDate,
ROUND([table1Line].Column6, 2) AS TaxableValue,
(CASE WHEN [table1Line].Column7 = 9 THEN 'PVM2'
WHEN [table1Line].Column7 = 21 THEN 'PVM1'
WHEN [table1Line].Column7 = 0 THEN 'PVM14'
ELSE '' END ) AS TaxCode,
CAST([table1Line].Column7 AS int) AS TaxPercentage,
table1Line.Column8 - ROUND([table1Line].Column6, 2) AS Amount,
'' AS VATPointDate2,
[table1].Column1 AS InvoiceNo,
'' AS ReferenceNo,
'' AS ReferenceDate,
[table1].CustomerPersonID AS CustomerID
FROM [table1]
LEFT JOIN [table2] ON [table2].ID = [table1].CustomerPersonID
LEFT JOIN [table3] ON [table3].ID = [table2].Column9
LEFT JOIN [table1Line] ON [table1Line].table1ID = [table1].ID
LEFT JOIN [table4] ON [table4].ID = table1Line.TaxID
LEFT JOIN [table5] ON [table5].ID = [table1].CompanyID
LEFT JOIN [table6] ON table6.ID = [table1].SalesChannelID
WHERE table2.ID IS NOT null
AND table3.ID IS NOT null
AND table1Line.ID IS NOT null
AND table4.ID IS NOT null
AND table5.ID IS NOT null
AND table6.ID IS NOT null
AND LTRIM(RTRIM(Column5)) LIKE 'date'
AND (table6.id = 5 OR table6.id = 2)
ORDER BY Column5 DESC;

Related

How to join two tables having two common column values and unioning the rest

I'd like to combine Table A and Table B at the link below and end up with Table C. What is the best way to do this in SQL? I've thought about creating a composite key between the tables for LedgerID + Year doing an inner join and then unioning the left and right only data. I'm also curious how to avoid duplicating values across rows like Balance = 50.00 ending up in rows for Tires and Windshield.
Try a full outer join, joining on LedgerID and Year, using coalesce to show Table B's LedgerID/Year when Table A's is NULL:
SELECT
COALESCE(A.LedgerID, B.LedgerID) as LedgerID,
COALESCE(A.Year, B.Year) as Year,
A.Title,
A.Payment,
B.Balance
FROM "Table A" AS A
FULL OUTER JOIN "Table B" AS B ON (A.LedgerID=B.LedgerID AND A.Year=B.Year)
--Please try this Query. Since you have only reference LedgerId and Year, the balance will show 50 for both Tires & Windshield
; with cte_Ledger (LedgerId, [year])
AS
(
Select DISTINCT LedgerId, [year]
From tableA
UNION
Select DISTINCT LedgerId, [year]
From tableB
)
select t.LedgerId
, t.[year]
, t1.Title
, T1.Payments
, t2.Balance
FROM cte_Ledger t
left join tableA t1 on t.LedgerId = t1.LedgerId and t.[year] = t1.[year]
left join tableB t2 on t2.LedgerId = t.LedgerId and t2.[year] = t.[year]
I think so, Above Queries will not help to get expected result.
some misunderstanding is with requirement.
For ledgerid = 22 and Year = 2017, have 2 records in table-A and 1 with Table-B. But in expecting result, Balance 50(Record of table-B) is exists with matched first row of Table-A only. As per above all logic it will be with 2 Records where ledgerid = 22, Year = 2017 and Title with "Tires" & "Windshield".
If required same result as mentioned then need to use recursive CTE or ranking function with order of ID column.
Here is my solution after I loaded the tables, another nested case statement may be need to format out the zero on Ledger 24.
Select
[LedgerID],
[Year],
Case when PayRank = 1 then Title else '' end as Title,
Case when PayRank = 1 then convert(varchar(20),Payments) else '' end as
Payments,
Case when BalRank = 1 then convert(varchar(20),Balance) else '' end as
Balance
from(
SELECT
B.[LedgerID]
,B.[Year]
,Rank()Over(Partition by B.LedgerID,Payments order by
B.LedgerID,B.Year,Title) as PayRank
,isnull([Title],'') as Title
,isnull([Payments],0) as Payments
,Rank()Over(Partition by B.LedgerID,B.Year order by
B.LedgerID,B.Year,Payments) as BalRank
,Balance
FROM [TableB] B
left outer join [TableA] A
on A.LedgerID = B.LedgerID
) Query
order by LedgerID,Year

How do I optimize this sql query that takes forever?

I'm trying to optimize this query as it takes 30 seconds to execute:
SELECT
TOP 100 table1.*
FROM
table1 (NOLOCK)
INNER JOIN
DB1..table2 (NOLOCK)
ON DB1..table2.id = DB1..table1.id
INNER JOIN
DB1..table2_batch (NOLOCK)
ON DB1..table2_batch.table2_batch_id = DB1..table2.table2_batch_id
INNER JOIN
DB2..table4 (NOLOCK)
ON CASE
WHEN CHARINDEX(':',
table2_batch.reference_number,
3) > 3 THEN SUBSTRING(table2_batch.reference_number,
3,
CHARINDEX(':',
table2_batch.reference_number,
3) -3 )
ELSE
RIGHT(table2_batch.reference_number,
LEN(table2_batch.reference_number) -2)
end = Cast(table4.PurchaseOrderID as int) INNER JOIN
DB2..table3 (NOLOCK)
ON DB2..table3.key = DB2..table4.key
WHERE
table1.id IS NOT NULL
AND (
table1.id!=''
OR table1.id IS NULL
)
AND DB2..table3.AccountTypeID != 30000
AND CHARINDEX('O', DB1..table2_batch.reference_number) = 1
AND table1_id NOT IN (
select
lim.table1_id
from
link_table1_message as lim (nolock)
inner join
table1 as i (nolock)
on i.table1_id = lim.table1_id
where
lim.message_id >= 90
)
ORDER BY
last_hit DESC
Based on this comment:
The bottleneck is on the DB2..table4 join. When I take that part out it does the query really fast ... but I really need that join in there to make sure the proper condition is met for this scenario.
It's a little bit of a guess because we don't know how many rows are involved, nor what your schema looks like, but making a few assumptions, this could possibly help:
INNER JOIN
DB2..table4 (NOLOCK)
ON CASE
WHEN CHARINDEX(':',
table2_batch.reference_number,
3) > 3 THEN SUBSTRING(table2_batch.reference_number,
3,
CHARINDEX(':',
table2_batch.reference_number,
3) -3 )
ELSE
RIGHT(table2_batch.reference_number,
LEN(table2_batch.reference_number) -2)
end = Cast(table4.PurchaseOrderID as int)
That's a lot of logic to put in a join I find.
You have table2_batch that has a reference_number that's encoding a value that you'll end up needing to extract and use to join with table4.PurchaseOrderID; I'd start by selecting that first:
with cteTable2 as (
select
table2_batch_id
,reference_number
,case when charindex(':',reference_number,3) > 3
then substring(reference_number,3,charindex(':',reference_number,3) - 3)
else right(reference_number,len(reference_number) - 2)
end PurchaseOrderID
from table2_batch
)
And then you can treat cteTable2 as a full-fledged table in the select statement that follows, so instead of joining with table2_batch, you join with cteTable2:
inner join DB2..table4 (NOLOCK)
on cteTable2.PurchaseOrderID = cast(table4.PurchaseOrderID as int)
...and is that cast really needed?

Merge two rows with condition SQL View

I have a View which has a SQL Script as:
Select
a.iAssetId,
ac.eEventCode,
vm.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId,
CASE WHEN ac.eEventCode = 6 THEN vm.dtUTCDateTime ELSE NULL END as EnterTime,
CASE WHEN ac.eEventCode = 7 THEN vm.dtUTCDateTime ELSE NULL END as ExitTime,
CASE WHEN
a.iAssetId = Lead(a.iAssetId) OVER (ORDER BY a.iAssetId)
AND g.iGeofenceId = Lead(g.iGeofenceId) OVER (ORDER BY a.iAssetId)
AND ac.eEventCode != Lead(ac.eEventCode) OVER (ORDER BY a.iAssetId)
THEN DATEDIFF(minute, vm.dtUTCDateTime, Lead(vm.dtUTCDateTime) OVER (ORDER BY a.iAssetId)) ELSE NULL END as Test
From AssetCommunicationSummary ac
Inner join VehicleMonitoringLog vm on vm.iVehicleMonitoringId = ac.iVehicleMonitoringId
Inner Join Geofences g on g.iGeofenceId = vm.iGeofenceId
Inner Join Assets a on a.iAssetId = ac.iAssetId
Inner Join Categories c on c.iCategoryId = a.iCategoryId
Inner Join Sites s on s.iSiteId = c.iSiteId
Where ac.eEventCode = 6 OR ac.eEventCode = 7
Group by
a.iAssetId,
ac.eEventCode,
vm.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId
I have used Lead to calculate the Time differenc in minutes for leading rows based on conditions.
I need to now merge the leading Row and the Current Row based on Condition.
Is there a possible way to do this?
The goal is to get the EnterTime and ExitTime in the Same Row with Time Difference in the Column Next to it.
My result is like this:
If your eventcode is always going to be 6 and 7, then you can just join to that table twice using that clause in the join itself. I think I've got the rest of your schema joined up properly below, but if not, you can adjust it around to fit.
Select
a.iAssetId,
vmEnter.dtUTCDateTime,
g.iGeofenceId,
g.sGeofenceName,
c.sCategoryName,
c.iCategoryId,
s.sSiteName,
s.iSiteId,
vmEnter.dtUTCDateTime as EnterTime,
vmExit.dtUTCDateTime as ExitTime,
DATEDIFF(minute, vmEnter.dtUTCDateTime, vmExit.dtUTCDateTime) as ExitTime,
From Sites s
Inner Join Categories c on s.iSiteId = c.iSiteId
Inner Join Assets a on c.iCategoryId = a.iCategoryId
Inner Join AssetCommunicationSummary acEnter on a.iAssetId = acEnter.iAssetId and acEnter.eEventCode = 6
Inner Join VehicleMonitoringLog vmEnter on vmEnter.iVehicleMonitoringId = acEnter.iVehicleMonitoringId
Inner Join AssetCommunicationSummary acExit on a.iAssetId = acExit.iAssetId and acExit.eEventCode = 7
Inner Join VehicleMonitoringLog vmExit on vmExit.iVehicleMonitoringId = acExit.iVehicleMonitoringId
Inner Join Geofences g on g.iGeofenceId = vmEnter.iGeofenceId
You can use this ddl to test and see the idea of what is going on. It's copy and paste ready, if you want to see a difference in times, make sure you wait before you insert each record.
Create table testing
(
Id int ,
Enter DateTime,
Exitt DateTime,
Eventt int,
GeoCode int
)
insert into testing values (1, GETDATE(),null,6,10)
insert into testing values (1, null,GETDATE(),7,10)
insert into testing values (1, GETDATE(),null,6,11)
insert into testing values (1, null,GETDATE(),7,11)
insert into testing values (2, GETDATE(),null,6,10)
insert into testing values (2, null,GETDATE(),7,10)
create table #temp1
(
Id int, EnterTime datetime, GeoCode int
)
create table #temp2
(
Id int, ExitTime datetime, GeoCode int
)
insert into #temp1
Select Id, MAX(Enter), GeoCode from testing where Eventt = 6 group by Id,GeoCode
insert into #temp2
Select Id, MAX(Exitt),GeoCode from testing where Eventt = 7 group by Id,GeoCode
Select t1.Id, t1.EnterTime,t2.ExitTime, t1.GeoCode, DATEDIFF(ss,t1.EnterTime,t2.ExitTime)
from #temp1 t1
inner join #temp2 t2 on t2.Id = t1.Id
and t1.GeoCode = t2.GeoCode
This is basically pseudo code so your going to need to modify, but everything you need is here.
Im gonna guess that eventcode = 6 means thats the intake time
if so two of your data paris dont make much sense as the exit time is before the intake time,
The Query below only accounts for when amd if eventcode 6 = intake time
and the fact that exit time should be before entertime.
query is based on the output you provided and not the view query.
if doing a select * on your view table gives you that output then replace vw_table with yourviewstablename
There are Nulls in the timedif of sqlfiddle because
there was only one instance of assetid 2
assetid 4 and 6 have exit time that happened before entertimes
SQLFIDDLE
select
v1.iAssetid,
v1.EnterTime,
v2.ExitTime,
datediff(minute, v1.Entertime, v2.Exittime) timedif
from vw_table v1
left join vw_table v2 on
v1.iAssetid= v2.iAssetid
and v1.sCategoryNamea = v2.sCategoryNamea
and v2.eEventcode = 7
and v2.dtUTCDatetime >= v1.dtUTCDatetime
where
v1.eEventcode = 6
You can merge two result sets by adding Row_Number to them and then join on that. Like
SELECT DISTINCT tbl1.col1, tbl2.col2
FROM
(SELECT FirstName AS col1, ROW_NUMBER() OVER (ORDER BY FirstName) Number FROM dbo.UBUser) tbl1
INNER JOIN
(SELECT LastName AS col2, ROW_NUMBER() OVER (ORDER BY LastName) Number FROM dbo.UBUser) tbl2
ON tbl1.Number = tbl2.Number
This way you will be able to have EnterTime and ExitTime in the Same Row with Time Difference in the Column Next to it.
Try this
SELECT iAssetid,
iGeoFenceId,
iGeoFenceName,
sCategoryNamea,
iCategoryid,
sSiteName,
Max(EnterTime) As EnterTime,
Min(ExitTime) As ExitTime,
Datediff(minute, Max(EnterTime), Min(ExitTime)) As Timediff
FROM #vw_Table
GROUP BY iAssetid,
iGeoFenceId,
iGeoFenceName,
sCategoryNamea,
iCategoryid,
sSiteName

SQL: Want to alter the conditions on a join depending on values in table

I have a table called Member_Id which has a column in it called Member_ID_Type. The select statement below returns the value of another column, id_value from the same table. The join on the tables in the select statement is on the universal id column. There may be several entries in that table with this same universal id.
I want to adjust the select statement so that it will return the id_values for entries that have member_id_type equal to '7'. However if this is null then I want to return records that have member_id_type equal to '1'
So previously I had a condition on the join (commented out below) but that just returned records that had member_id_type equal to '7' and otherwise returned null.
I think I may have to use a case statement here but I'm not 100% sure how to use it in this scenario
SELECT TOP 1 cm.Contact_Relation_Gid,
mc.Universal_ID,
mi.ID_Value,
cm.First_Name,
cm.Last_Name,
cm.Middle_Name,
cm.Name_Suffix,
cm.Email_Address,
cm.Disability_Type_PKID,
cm.Race_Type_PKID,
cm.Citizenship_Type_PKID,
cm.Marital_Status_Type_PKID,
cm.Actual_SSN,
cm.Birth_Date,
cm.Gender,
mc.Person_Code,
mc.Relationship_Code,
mc.Member_Coverage_PKID,
sc.Subscriber_Coverage_PKID,
FROM Contact_Member cm (NOLOCK)
INNER JOIN Member_Coverage mc (NOLOCK)
ON cm.contact_relation_gid = mc.contact_relation_gid
AND mc.Record_Status = 'A'
INNER JOIN Subscriber_Coverage sc (NOLOCK)
ON mc.Subscriber_Coverage_PKID = sc.Subscriber_Coverage_PKID
AND mc.Record_Status = 'A'
LEFT outer JOIN Member_ID mi ON mi.Universal_ID = cm.Contact_Gid
--AND mi.Member_ID_Type_PKID='7'
WHERE cm.Contact_Relation_Gid = #Contact_Relation_Gid
AND cm.Record_Status = 'A'
Join them both, and use one if the other is not present:
select bt.name
, coalesce(eav1.value, eav2.value) as Value1OrValue2
from BaseTable bt
left join EavTable eav1
on eav1.id = bt.id
and eav1.type = 1
left join EavTable eav2
on eav2.id = bt.id
and eav2.type = 2
This query assumes that there is never more than one record with the same ID and Type.

Consolidating results on one row SQL

I would like to consolidate a one to many relationship that outputs on different rows to a single row.
(select rate_value1
FROM xgenca_enquiry_event
INNER JOIN xgenca_enquiry_iso_code_translation
ON
xgenca_enquiry_event_rate.rate_code_id
= xgenca_enquiry_iso_code_translation.id
where xgenca_enquiry_event_rate.event_id = xgenca_enquiry_event.id
and ISO_code = 'PDIV') as PDIVrate,
(select rate_value1
FROM xgenca_enquiry_event
INNER JOIN xgenca_enquiry_iso_code_translation
ON
xgenca_enquiry_event_rate.rate_code_id
= xgenca_enquiry_iso_code_translation.id
where xgenca_enquiry_event_rate.event_id = xgenca_enquiry_event.id
and ISO_code = 'TAXR') as TAXrate
PDIVrate TAXrate
NULL 10.0000000
0.0059120 NULL
I would like the results on one row. Any help would be greatly appreciated.
Thanks.
You can use an aggregate function to perform this:
select
max(case when ISO_code = 'PDIV' then rate_value1 end) PDIVRate,
max(case when ISO_code = 'TAXR' then rate_value1 end) TAXRate
FROM xgenca_enquiry_event_rate r
INNER JOIN xgenca_enquiry_iso_code_translation t
ON r.rate_code_id = t.id
INNER JOIN xgenca_enquiry_event e
ON r.event_id = e.id
It looks like you are joining three tables are are identical in the queries. This consolidates this into a single query using joins.
Look here:
Can I Comma Delimit Multiple Rows Into One Column?
Simulate Oracle's LISTAGG() in SQL Server using STUFF:
SELECT Column1,
stuff((
SELECT ', ' + Column2
FROM tableName as t1
where t1.Column1 = t2.Column1
FOR XML PATH('')
), 1, 2, '')
FROM tableName as t2
GROUP BY Column1
/
Copied from here: https://github.com/jOOQ/jOOQ/issues/1277