SQL Query Performance Issues Using Subquery

I am having issues with my query run time. I want the query to automatically pull the max ID for a column, because the table is indexed on that column. If I punch in the number manually, it runs in seconds, but I want the query to be more dynamic if possible.
I've tried placing the subquery in different places with no luck.
SELECT *
FROM TABLE A
JOIN TABLE B
    ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID
    AND B.ACTV_FLG = 1
WHERE A.WK_END_THU_ID_NU >= (SELECT MAX(WK_END_THU_ID_NU) FROM TABLE A)
    AND A.WK_END_THU_END_YR_NU = YEAR(GETDATE())
    AND A.LGCY_NATL_STR_NU IN (7731)
    AND B.SLD_MENU_ITM_ID = 4314
I just want this to run faster. Maybe there is a different approach I should be taking?

I would move the subquery to the FROM clause and change the WHERE clause to only refer to A:
SELECT *
FROM A JOIN
     (SELECT MAX(WK_END_THU_ID_NU) as max_wet
      FROM A
     ) am
     ON A.WK_END_THU_ID_NU = am.max_wet JOIN
     B
     ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID AND
        B.ACTV_FLG = 1
WHERE A.WK_END_THU_END_YR_NU = YEAR(GETDATE()) AND
      A.LGCY_NATL_STR_NU IN (7731) AND
      A.SLD_MENU_ITM_ID = 4314; -- same as B.SLD_MENU_ITM_ID via the join
Then you want indexes. I'm pretty sure you want indexes on:
A(SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU)
B(SLD_MENU_ITM_ID, ACTV_FLG)
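In SQL Server DDL that would be something like the following (the index names are placeholders of my own):
CREATE INDEX IX_A_menu_item ON A (SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU);
CREATE INDEX IX_B_menu_item ON B (SLD_MENU_ITM_ID, ACTV_FLG);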
I will note that moving the subquery to the FROM clause probably does not affect performance, because SQL Server is smart enough to only execute it once. However, I prefer table references in the FROM clause when reasonable. I don't think a window function would actually help in this case.

Tuning Oracle Query for slow select

I'm working on an Oracle query that does a select on a huge table, but the joins with other tables seem to cost a lot of processing time.
I'm looking for tips on how to improve this query.
I'm attaching a version of the query and its explain plan.
Query
SELECT
    l.gl_date,
    l.REST_OF_TABLES,
    (
        SELECT
            MAX(tt.task_id)
        FROM
            bbb.jeg_pa_tasks tt
        WHERE
            l.project_id = tt.project_id
            AND l.task_number = tt.task_number
    ) task_id
FROM
    aaa.jeg_labor_history l,
    bbb.jeg_pa_projects_all p
WHERE
    p.org_id = 2165
    AND l.project_id = p.project_id
    AND p.project_status_code = '1000'
Something to mention:
This query takes data from Oracle to send it to a SQL Server database, so I need it to be this big; I can't narrow the scope of the query.
The purpose is to set it up as a SQL Server job with SSIS so that it runs periodically.
One obvious suggestion is not to use a subquery in the SELECT clause.
Instead, you can join against a pre-aggregated derived table:
SELECT
    l.gl_date,
    l.REST_OF_TABLES,
    t.task_id
FROM
    aaa.jeg_labor_history l
    JOIN bbb.jeg_pa_projects_all p
        ON (l.project_id = p.project_id)
    LEFT JOIN (SELECT
                   tt.project_id,
                   tt.task_number,
                   MAX(tt.task_id) task_id
               FROM
                   bbb.jeg_pa_tasks tt
               GROUP BY tt.project_id, tt.task_number) t
        ON (l.project_id = t.project_id
            AND l.task_number = t.task_number)
WHERE
    p.org_id = 2165
    AND p.project_status_code = '1000';
Cheers!!
Since I don't know exactly how many rows this query returns or how many rows the table/view has,
I can only offer a few simple tips that may help query performance:
Check indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement (see the sketch after this list).
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner joins instead of outer joins where possible.
Your view contains a lot of data, so you can also break it down and pull only the information you need from it.
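For the query above, for instance, supporting indexes might look like this (Oracle syntax; the index names are placeholders, and the best column order depends on your data):
CREATE INDEX jeg_pa_tasks_ix1 ON bbb.jeg_pa_tasks (project_id, task_number, task_id);
CREATE INDEX jeg_pa_projects_ix1 ON bbb.jeg_pa_projects_all (org_id, project_status_code, project_id);
CREATE INDEX jeg_labor_history_ix1 ON aaa.jeg_labor_history (project_id, task_number);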

Convert from JOIN on ROWID in Netezza to Redshift

I'm converting ETL queries written for Netezza to Redshift. I'm facing some issues with ROWID, because it's not supported in Redshift. As a workaround, I have tried using, in the predicates, the key columns from which the ROWID is generated, but I'm confused about which columns to use when there are multiple join operations. So can anyone help me convert the query? I even tried using the ROW_NUMBER() OVER () function, but that doesn't work either, because the generated row ids won't be unique across all rows.
Here are the queries from netezza:
Query #1
CREATE TEMP TABLE TMPRY_DELTA_UPD_1000 AS
SELECT
    nvl(PT.HOST_CRRNCY_SRRGT_KEY, -1) as HOST_CRRNCY_SRRGT_KEY,
    delta1.ROWID ROW_ID
FROM TMPRY_POS_TX_1000 PT
LEFT JOIN TMPRY_TX_CSTMR_1000 TC ON PT.TX_SRRGT_KEY = TC.TX_SRRGT_KEY AND PT.UPDT_TMSTMP > '2017-01-01'
    AND PT.INS_TMSTMP < '2017-01-01' AND PT.DVSN_NBR = 70
JOIN INS_EDW_CP.DM_TX_LINE_FCT delta1 ON PT.TX_SRRGT_KEY = delta1.TX_SRRGT_KEY
WHERE
    (
        delta1.HOST_CRRNCY_SRRGT_KEY <> PT.HOST_CRRNCY_SRRGT_KEY
        -- OR ... (additional comparisons omitted)
    )
    AND PT.DVSN_NBR = 70;
Query #2
UPDATE INS_EDW_CP..DM_TX_LINE_FCT base
SET
    base.HOST_CRRNCY_SRRGT_KEY = delta1.HOST_CRRNCY_SRRGT_KEY
    -- , ... (additional column assignments omitted)
FROM TMPRY_DELTA_UPD_1000 delta1
WHERE base.ROWID = delta1.ROW_ID;
How can I convert query #2?
Well, most of the time I have seen joins on ROWID, it was done as a performance optimization, but in some cases there genuinely is no unique combination of columns in the table.
Please talk to the people who own these data, run your own analysis of different key combinations, and then get back to us.
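If that analysis shows some column, say TX_SRRGT_KEY, does uniquely identify rows in DM_TX_LINE_FCT (an assumption to verify, not a given), you can stage that key instead of ROWID, and query #2 becomes a plain keyed UPDATE, which Redshift supports. A sketch (the TMPRY_TX_CSTMR_1000 left join and the trimmed predicates are omitted for brevity):
-- Query #1 stages the natural key instead of ROWID
-- (assumes TX_SRRGT_KEY is unique in DM_TX_LINE_FCT; verify before relying on it)
CREATE TEMP TABLE TMPRY_DELTA_UPD_1000 AS
SELECT
    nvl(PT.HOST_CRRNCY_SRRGT_KEY, -1) AS HOST_CRRNCY_SRRGT_KEY,
    delta1.TX_SRRGT_KEY
FROM TMPRY_POS_TX_1000 PT
JOIN INS_EDW_CP.DM_TX_LINE_FCT delta1
    ON PT.TX_SRRGT_KEY = delta1.TX_SRRGT_KEY
WHERE delta1.HOST_CRRNCY_SRRGT_KEY <> PT.HOST_CRRNCY_SRRGT_KEY
    AND PT.DVSN_NBR = 70;

-- Query #2 then joins on the staged key rather than ROWID
UPDATE INS_EDW_CP.DM_TX_LINE_FCT AS base
SET HOST_CRRNCY_SRRGT_KEY = delta1.HOST_CRRNCY_SRRGT_KEY
FROM TMPRY_DELTA_UPD_1000 delta1
WHERE base.TX_SRRGT_KEY = delta1.TX_SRRGT_KEY;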

Huge Performance Cost to using SQL Server ORDER BY clause?

What causes a query to take longer when it has an ORDER BY clause at the end?
If I run the query without the ORDER BY it takes a split second, but throw the ORDER BY on and it's MINUTES!
Is there a known reason for this?
SELECT top 100 a.UniqueID
,a.SomeID
,a.ContentID
,SortOrder
,b.ValueOfMine
INTO #ContentHistory
FROM widgetHistory.dbo.CustomerProductContent a WITH (NOLOCK)
LEFT JOIN widgetHistory.dbo.ProductContent b WITH (NOLOCK) ON a.ContentID = b.ContentID
LEFT JOIN widgetHistory.dbo.SomeThings k WITH (NOLOCK) ON a.SomeID = k.SomeID
LEFT JOIN widgetHistory.dbo.SubscriptionContents c WITH (NOLOCK) ON b.ContentID = c.ContentID
AND c.SubscriptionID = k.SubscriptionID
WHERE c.ContentStatus = 'GO'
ORDER BY UniqueID
It won't even complete, so I cannot view the execution plan.
Without the ORDER BY, SQL Server will give you the first 100 rows it computes as soon as it's done computing them.
With the ORDER BY, SQL Server must compute all rows, sort them, and only then can it give you the 100 rows you asked for.
As SQL is set-oriented, I think you would be better off creating your temporary table first and then using the ORDER BY when you query the result set from the temporary table. Tables by definition have no default ordering, so you are always better off using the ORDER BY clause when you actually query the data rather than when you are inserting it.
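A minimal sketch of that split, reusing the query above (whether it actually helps depends on how many rows the unordered query produces):
SELECT a.UniqueID
    ,a.SomeID
    ,a.ContentID
    ,SortOrder
    ,b.ValueOfMine
INTO #ContentHistory
FROM widgetHistory.dbo.CustomerProductContent a WITH (NOLOCK)
LEFT JOIN widgetHistory.dbo.ProductContent b WITH (NOLOCK) ON a.ContentID = b.ContentID
LEFT JOIN widgetHistory.dbo.SomeThings k WITH (NOLOCK) ON a.SomeID = k.SomeID
LEFT JOIN widgetHistory.dbo.SubscriptionContents c WITH (NOLOCK) ON b.ContentID = c.ContentID
    AND c.SubscriptionID = k.SubscriptionID
WHERE c.ContentStatus = 'GO';

-- sort (and take the top rows) only when reading the materialized result
SELECT TOP 100 *
FROM #ContentHistory
ORDER BY UniqueID;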

Optimize SQL Query having SUM and COUNT functions

I have the following query, which takes too long to retrieve around 70,000 records. I noticed that the execution time is proportional to the number of records retrieved. I need to optimize this query so that the execution time is not proportional to the number of records retrieved. Any ideas?
;WITH TT AS (
    SELECT TaskParts.[TaskPartID],
        PartCost,
        LabourCost,
        VendorPaidPartAmount,
        VendorPaidLabourAmount,
        ROW_NUMBER() OVER (ORDER BY [Employees].[EmpCode] ASC) AS RowNum
    FROM [TaskParts], [Tasks], [WorkOrders], [Employees], [Status], [Models], [SubAccounts]
    WHERE 1 = 1
        AND (TaskParts.TaskLineID = Tasks.TaskLineID)
        AND (Tasks.WorkOrderID = [WorkOrders].WorkOrderID)
        AND (Tasks.EmpID = [Employees].EmpID)
        AND (TaskParts.StatusID = [Status].StatusID)
        AND (Models.ModelID = Tasks.FailedModelID)
        AND (SubAccounts.SubAccountID = Tasks.SubAccountID)
        AND (SubAccounts.GLAccountID = 5)
)
SELECT
    COUNT(0),
    SUM(ISNULL(PartCost, 0)),
    SUM(ISNULL(LabourCost, 0)),
    SUM(ISNULL(VendorPaidPartAmount, 0)),
    SUM(ISNULL(VendorPaidLabourAmount, 0))
FROM TT
As Lieven noted, you can remove TD0, TD1 and TP1, as they are redundant.
You can also remove the ROW_NUMBER() column: it is not used, and windowing functions are relatively expensive.
It may also be possible to remove some of the tables from the TT CTE if they are not otherwise used; however, as the selected columns are not qualified with table names, it isn't possible to tell which tables those are.
Aside from that, your query's response time will always be proportional to the number of rows the CTE selects, because the RDBMS has to read each of those rows to calculate the aggregates.
Make sure that you have a supporting index for each foreign key. Also, although it is most probably not the issue in this case, the SQL Server optimizer works better with explicit inner joins.
And I don't see any reason why you need RowNum if you only need totals. Putting both suggestions together gives something like the sketch below.
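A sketch of that rewrite (same filters, explicit INNER JOINs, no CTE or ROW_NUMBER):
SELECT
    COUNT(0),
    SUM(ISNULL(PartCost, 0)),
    SUM(ISNULL(LabourCost, 0)),
    SUM(ISNULL(VendorPaidPartAmount, 0)),
    SUM(ISNULL(VendorPaidLabourAmount, 0))
FROM [TaskParts]
INNER JOIN [Tasks] ON TaskParts.TaskLineID = Tasks.TaskLineID
INNER JOIN [WorkOrders] ON Tasks.WorkOrderID = WorkOrders.WorkOrderID
INNER JOIN [Employees] ON Tasks.EmpID = Employees.EmpID
INNER JOIN [Status] ON TaskParts.StatusID = Status.StatusID
INNER JOIN [Models] ON Models.ModelID = Tasks.FailedModelID
INNER JOIN [SubAccounts] ON SubAccounts.SubAccountID = Tasks.SubAccountID
WHERE SubAccounts.GLAccountID = 5;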

SQL query optimisation with an inner join?

I'm trying to optimise my query; it has an inner join and a coalesce.
The joined table is simply a table with one integer field, and I've added a unique key.
For my WHERE clause I've created a key on the three fields.
But when I look at the plan, it still says it's using a table scan.
Where am I going wrong?
Here's my query:
select date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) as due
from billsndeposits a
inner join util_nums b
    on date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype)
        <= coalesce(a.enddate, date('2013-02-26'))
where not (intervaltype = 'once' or interval = 0)
    and factid = 1
order by due, pid;
Most likely your JOIN expression cannot use any index, so the engine falls back to a full table scan, computing date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) for every row.
BTW: that is a really weird join condition in itself. I suggest you find a better way to join billsndeposits to util_nums (if that join is actually needed).
I think I understand what you are trying to achieve, but this kind of join is a recipe for slow performance. Even if you removed the date computations and the coalesce (i.e., compared one date directly against another), it would still be slow compared to integer joins, even with an index. And because you are creating new dates on the fly, you cannot index them.
I suggest creating a temp table with two columns: (1) pid (or whatever id you use in billsndeposits) and (2) recurrence_dt.
Populate the new table using this query:
INSERT INTO temp
SELECT a.pid, date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype)
FROM billsndeposits a, util_nums b;
Then create an index on the recurrence_dt column and update statistics.
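A sketch of that index (the name is a placeholder of mine):
CREATE INDEX temp_recurrence_dt_ix ON temp (recurrence_dt);
Now your select statement can look like this: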
SELECT recurrence_dt
FROM temp t, billsndeposits a
WHERE t.pid = a.pid
AND recurrence_dt <= coalesce(a.enddate, date('2013-02-26'))
You can also add an exp_ts column to this new table and expire old rows afterwards.
I know this adds more work to your original query, but it is a guaranteed performance improvement and should fit naturally in a script that runs frequently.
Regards,
Edit
Another thing I would do is give enddate a default value of date('2013-02-26'), unless that would affect other code or not make business sense. That way you don't have to work with coalesce.
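If your engine supports changing a column default in place (DB2 and PostgreSQL do; SQLite would require recreating the table), the change might look like this, together with a one-off backfill of existing NULLs:
ALTER TABLE billsndeposits
    ALTER COLUMN enddate SET DEFAULT '2013-02-26';

UPDATE billsndeposits
SET enddate = '2013-02-26'
WHERE enddate IS NULL;
The join condition then simplifies to comparing directly against a.enddate, with no coalesce.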