SQL query takes very long to execute

This SQL statement left joins two tables, both with approx. 10,000 rows (table1 = 20 columns, table2 = 50+ columns), and it takes 60+ seconds to execute. Is there a way to make it faster?
SELECT
    t.*, k.*
FROM
    table1 AS t
LEFT JOIN
    table2 AS k ON t.key_Table1 = k.Key_Table2
WHERE
    t.Time = (SELECT MAX(t2.Time)
              FROM table1 AS t2
              WHERE t2.key2_Table1 = t.key2_Table1)
ORDER BY
    t.Time;
The ideal execution time would be < 5 seconds, since an Excel query does it in 8 seconds, and it is very surprising that an Excel query would be faster than a SQL Server Express query.
Execution plan: (screenshot omitted)

You can also rewrite your query in a better way:
select *
from table2 as k
join (
    select *,
           row_number() over (partition by key2_Table1 order by Time desc) as rn
    from table1
) t
    on t.rn = 1
   and t.key_Table1 = k.Key_Table2
but you need indexes on the key2_Table1, Time, and key_Table1 columns of table1, and on Key_Table2 of table2, if you don't already have them.
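A sketch of the supporting indexes (the index names are placeholders, adjust to your schema):

-- covers the ROW_NUMBER() partition/order and carries the join key
CREATE INDEX IX_table1_key2_Time
    ON table1 (key2_Table1, Time DESC)
    INCLUDE (key_Table1);
-- covers the join into table2
CREATE INDEX IX_table2_Key
    ON table2 (Key_Table2);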
Another improvement would be to select only the columns you need instead of SELECT *.

The optimizer is deciding that a merge join is best, but if both tables have 10,000 rows and they aren't joined on indexed columns, forcing the optimizer out of the way and telling it to hash join may improve performance.
The syntax would be to change LEFT JOIN to LEFT HASH JOIN
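Applied to the query from the question, only the join keyword changes:

SELECT t.*, k.*
FROM table1 AS t
LEFT HASH JOIN table2 AS k
    ON t.key_Table1 = k.Key_Table2
WHERE t.Time = (SELECT MAX(t2.Time)
                FROM table1 AS t2
                WHERE t2.key2_Table1 = t.key2_Table1)
ORDER BY t.Time;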
https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008/ms191426(v=sql.100)
https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver15
https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-join?view=sql-server-ver15

I would recommend rewriting the query using outer apply:
SELECT t.*, k.*
FROM table1 t OUTER APPLY
     (SELECT TOP (1) k.*
      FROM table2 k
      WHERE t.key_Table1 = k.Key_Table2
      ORDER BY k.Time DESC
     ) k
ORDER BY t.Time;
And for this query, you want an index on table2(Key_Table2, time desc).
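For example (the index name is just a placeholder):

CREATE INDEX IX_table2_Key_Table2_Time
    ON table2 (Key_Table2, Time DESC);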

Related

Get the latest entry time using SQL - if your result returns two different times, should I use CROSS or OUTER APPLY?

So I want to use DATEDIFF for two tables that I'm joining. The problem is that if I filter by a unique value, it returns two rows of results. For example:
select *
from [internalaudit]..ReprocessTracker with (nolock)
where packageID = '1983446'
It returns two rows, because it was repackaged twice, by two different workers.
User RepackageTime
KimVilder 2021-06-10
DanielaS 2021-06-05
I want to use the latest RepackageTime of that unique packageID and then do a DATEDIFF against another time column when I join to a different table.
Is there a way to filter so I can get the latest RepackageTime entry?
There are numerous ways you can accomplish this, if I understand your goal correctly - proper example data and table definitions would help here.
One way is using APPLY and selecting the max date for each packageId:
select DateDiff(datepart, t.datecolumn, r.RepackageTime)...
from othertable t
cross apply (
    select Max(RepackageTime) as RepackageTime
    from internalaudit.dbo.ReprocessTracker r
    where r.packageId = t.packageId
) r
select *
from Othertable t1
join (
    select *
    from [internalaudit]..ReprocessTracker t2
    where packageID = '1983446'
    limit 1
) t2
    on t1.id = t2.id
If you are using SQL Server, you should use TOP 1 instead of LIMIT 1.
Also, unless you have a solid reason to use the NOLOCK hint, avoid using it.
Also, to generalize the query above (a derived table cannot reference the outer table, so APPLY is needed here):
select *
from Othertable t1
cross apply (
    select top (1) *
    from [internalaudit]..ReprocessTracker t2
    where t2.packageID = t1.packageID
    order by t2.RepackageTime desc
) t2

SQL Server using order by clause significantly improves select performance

I am executing the following query directly in SQL Server:
SELECT *
FROM TableA
LEFT JOIN TableB
    ON TableB.field1 = TableA.field1
LEFT JOIN TableC
    ON TableC.field2 = TableA.field2
LEFT JOIN TableD
    ON TableD.field3 = TableA.field3
LEFT JOIN TableE
    ON TableE.field4 = TableA.field4
LEFT JOIN TableF
    ON TableF.field5 = TableA.field5
LEFT JOIN
    (SELECT *
     FROM
         (SELECT
              Id1, Id2,
              UpdateDate,
              ROW_NUMBER() OVER (PARTITION BY Id1, Id2
                                 ORDER BY UpdateDate DESC) AS RN
          FROM TableG) AS G
     WHERE G.RN = 1) TableH
    ON TableA.Id1 = TableH.Id1
   AND TableA.Id2 = TableH.Id2
For point of reference, Tables A-F are about 1,000 rows each, and Table G is about 10,000 rows.
For a particular input, this query takes about 1 minute to run.
I then add a
ORDER BY Id1 ASC
at the end of the statement, and now it takes about 6 seconds to run. How can adding a sort significantly improve performance like this?
Run a showplan on both versions of your query.
Probably what's happening is that the sort forces a different query plan, one that uses a join strategy that is more efficient for your particular data (probably in-memory) but has a higher estimated cost.
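A minimal way to compare the two versions (run each query where the comments are):

SET STATISTICS PROFILE ON;   -- returns the actual plan as an extra result set
SET STATISTICS TIME ON;      -- reports compile and execution times
-- run the query without ORDER BY here
-- run the query with ORDER BY Id1 ASC here
SET STATISTICS PROFILE OFF;
SET STATISTICS TIME OFF;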
After examining the execution plan, it seems that the issue was the JOIN between Table A and Table G. Initially the optimizer was trying to use a nested loop join, which was very inefficient for tables of that size. Adding the ORDER BY clause nudged the optimizer toward a merge join instead, which was much faster. Thanks for the answers!
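If you would rather not rely on the ORDER BY to steer the plan, a join hint can request the merge join explicitly. A sketch showing only the problematic join (note that a join hint also fixes the join order for that query):

SELECT *
FROM TableA
LEFT MERGE JOIN
    (SELECT Id1, Id2, UpdateDate
     FROM (SELECT Id1, Id2, UpdateDate,
                  ROW_NUMBER() OVER (PARTITION BY Id1, Id2
                                     ORDER BY UpdateDate DESC) AS RN
           FROM TableG) AS G
     WHERE G.RN = 1) AS TableH
    ON TableA.Id1 = TableH.Id1
   AND TableA.Id2 = TableH.Id2;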

Which is better for performance: selecting all the columns, or selecting only the required columns when performing a join?

I have been asked to do performance tuning of a SQL Server query which has many joins in it.
For example:
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
There are almost 25 columns in vw_BILLABLE_CENSUS_R, but we want to use only 3 of them. So I wanted to know: instead of selecting all the columns from the view or table, what if I select only the required columns and then perform the join, like this:
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
So will this improve performance or not?
The important part is the columns you are actually using in the outermost SELECT, not the ones you are selecting to join. The SQL Server engine is smart enough to realize that it does not need to retrieve all columns from the referenced table (or view) if they are not used.
So the following 2 queries should yield the exact same query execution plan:
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
The difference shows up when you actually use the selected columns (in a WHERE condition or by actually retrieving the values), as here:
SELECT
A.SomeColumn,
X.* -- * has all X columns
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn,
X.* -- * has only X's SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
I would rather use this approach:
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
than this
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
Since in this case:
you make your query simpler,
you do not have to rely on the query optimizer's smartness and expect that it will eliminate unnecessary columns and rows,
finally, you can select as many columns in the outer SELECT as necessary without using derived-table techniques.
In some cases derived tables are welcome, e.g. when you want to eliminate duplicates in a table you want to join on the fly, but, imho, not in your case.
It depends on how many records are stored, but generally it will improve performance.
In this case, read @LukStorms' comments; I think he is right.

Different way of writing this SQL query with partition

Hi, I have the below query in Teradata. I have a row-number partition, and from it I want the rows with rn = 1. Teradata doesn't let me use the row number as a filter in the same query. I know that I can put the query below into a subquery with WHERE rn = 1 and it gives me what I need, but this snippet needs to go into a larger query and I want to simplify it if possible.
Is there a different way of doing this so I get a table with 2 columns - one row per customer with the corresponding fc_id for the latest eff_to_dt?
select cust_grp_id, fc_id, row_number() over (partition by cust_grp_id order by eff_to_dt desc) as rn
from table1
Have you considered using the QUALIFY clause in your query?
SELECT cust_grp_id
, fc_id
FROM table1
QUALIFY ROW_NUMBER()
OVER (PARTITION BY cust_grp_id
ORDER BY eff_to_dt desc)
= 1;
Calculate the MAX eff_to_dt for each cust_grp_id and then join the result to the main table:
SELECT T1.cust_grp_id,
       T1.fc_id,
       T1.eff_to_dt
FROM Table1 AS T1
JOIN
    (SELECT cust_grp_id,
            MAX(eff_to_dt) AS max_eff_to_dt
     FROM Table1
     GROUP BY cust_grp_id) AS T2
    ON T2.cust_grp_id = T1.cust_grp_id
   AND T2.max_eff_to_dt = T1.eff_to_dt
You can use a pair of JOINs to accomplish the same thing:
INNER JOIN My_Table T1 ON <some criteria>
LEFT OUTER JOIN My_Table T2 ON <some criteria> AND T2.eff_to_date > T1.eff_to_date
WHERE
T2.my_id IS NULL
You'll need to sort out the specific criteria for your larger query, but this is effectively JOINing all of the rows (T1), but then excluding any where a later row exists. In the WHERE clause you eliminate these by checking for a NULL value in a column that is NOT NULL (in this case I just assumed some ID value). The only way that would happen is if the LEFT OUTER JOIN on T2 failed to find a match - i.e. no rows later than the one that you want exist.
Also, whether or not the JOIN to T1 is LEFT OUTER or INNER is up to your specific requirements.
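Applied to the question's table, a minimal standalone sketch of that pattern (column names taken from the question; the INNER JOIN back into your larger query is omitted):

SELECT T1.cust_grp_id, T1.fc_id
FROM table1 T1
LEFT OUTER JOIN table1 T2
    ON T2.cust_grp_id = T1.cust_grp_id
   AND T2.eff_to_dt > T1.eff_to_dt
WHERE T2.cust_grp_id IS NULL;   -- no later row exists, so T1 holds the latest eff_to_dt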

LIMIT and JOIN order of actions

I have a query which includes a LIMIT on the main table and a JOIN.
My question is: which comes first? Does the query find the x rows of the LIMIT and then JOIN to those rows, or does it first do the JOIN on all the rows and only after that apply the LIMIT?
LIMIT applies to the query in which it appears. It is applied AFTER the JOINs in that query, but if the derived table is then JOINed to other tables, that/those JOIN(s) come after.
e.g.
SELECT ..
FROM (SELECT ..
      FROM TABLE1 T1
      JOIN TABLE2 T2 ON ..
      LIMIT 10) X
JOIN OTHERTABLE Y ON ..
LIMIT 20;
The JOIN between T1 and T2 occurs first.
LIMIT 10 is applied to the result of the previous step, so only 10 records from this derived table will be used in the outer query.
LIMIT 20 is applied to the result of the JOIN between X and Y.
Although LIMIT is a keyword specific to PostgreSQL, MySQL, and SQLite, the TOP keyword and processing in SQL Server work the same way.
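A rough SQL Server equivalent of the sketch above, with TOP in place of LIMIT (the .. placeholders still stand in for real columns, ON conditions, and ORDER BY clauses):

SELECT TOP (20) ..
FROM (SELECT TOP (10) ..
      FROM TABLE1 T1
      JOIN TABLE2 T2 ON ..
      ORDER BY ..) X
JOIN OTHERTABLE Y ON ..;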
It first does the JOIN on all the rows and only after that applies the LIMIT.