I am trying to calculate a year-to-date total.
I tried this script; it returns the correct values, but it's too slow:
select
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization,
SUM(QTY) as Month_Total,
COALESCE(
(
select SUM(s2.QTY)
FROM stg.Fact_DC_AGG s2
where
s2.Sales_Organization = T.Sales_Organization
and s2.[Delivery_Year]=T.[Delivery_Year]
AND s2.Delivery_month<= T.Delivery_month
),0) as YTD_Total
from stg.Fact_DC_AGG T
group by
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization
ORDER BY
Sales_Organization,T.[Delivery_Year],
T.Delivery_month
I modified it to optimize it, but it returns wrong values with duplicates:
select
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization,
SUM(T.QTY) as Month_Total,
COALESCE(SUM(s2.QTY), 0) as YTD_Total
from stg.Fact_DC_AGG T
INNER JOIN stg.Fact_DC_AGG s2
ON
s2.Sales_Organization = T.Sales_Organization
and s2.[Delivery_Year]=T.[Delivery_Year]
AND s2.Delivery_month<= T.Delivery_month
group by
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization
ORDER BY
Sales_Organization,T.[Delivery_Year],
T.Delivery_month
How can I optimize the first query, or correct the second script?
To optimize the query, first remove the correlated sub-query and use a join instead. You should also generate an actual execution plan for the query and identify any missing indexes.
NOTE: Missing indexes can drag down your SQL Server performance, so be sure to review your actual query execution plans and identify the right indexes to add.
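In this particular case, a windowed SUM can replace both the correlated sub-query and the duplicating self-join; a sketch, assuming SQL Server 2012 or later:
select
    Delivery_month,
    [Delivery_Year],
    Sales_Organization,
    SUM(QTY) as Month_Total,
    SUM(SUM(QTY)) OVER (
        PARTITION BY Sales_Organization, [Delivery_Year]
        ORDER BY Delivery_month
        ROWS UNBOUNDED PRECEDING) as YTD_Total -- running total per organization and year
from stg.Fact_DC_AGG
group by Delivery_month, [Delivery_Year], Sales_Organization
ORDER BY Sales_Organization, [Delivery_Year], Delivery_month
This reads the table once instead of once per group, so it avoids both the slowness of the first script and the duplication of the second.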
For more information, please visit the following links:
1) SQL Server Basic Performance Tuning Tips and Tricks
2) Create Missing Index From the Actual Execution Plan
Related
I'm writing a query against what is currently a small table in development. In production, we expect it to grow quite large over the life of the table (the primary key is a number(10)).
My query does a selection for the top N rows of my table, filtered by specific criteria and ordered by date ascending. Essentially, we're assigning records, in bulk, to a specific user for processing. In my case, N will only be 10, 20, or 30.
I'm currently selecting my primary keys inside a subselect, using rownum to limit my results, like so:
SELECT log_number FROM (
SELECT
il2.log_number,
il2.final_date
FROM log il2
INNER JOIN agent A ON A.agent_id = il2.agent_id
INNER JOIN activity lat ON il2.activity_id = lat.activity_id
WHERE (p_criteria1 IS NULL OR A.criteria1 = p_criteria1)
AND lat.criteria2 = p_criteria2
AND lat.criteria3 = p_criteria3
AND il2.criteria3 = p_criteria4
AND il2.current_user IS NULL
GROUP BY il2.log_number, il2.final_date
ORDER BY il2.final_date ASC)
WHERE ROWNUM <= p_how_many;
Although I have a stopkey due to the rownum, I'm wondering if using an Oracle hint here (/*+ FIRST_ROWS(p_how_many) */) on the inner select will affect the query plan in the future. I'd like to know more about what the database does when this hint is specified; does it actually make a difference if you have to order the table? (Seems like it wouldn't.) Or does it only affect the select portion, after the access and join parts?
Looking at the explain plan now doesn't get me much as the table hasn't grown yet.
Thanks for your help!
Even with an ORDER BY, different execution plans could be selected when you limit the number of rows returned. It can be easier to select the top n rows by some order key, then sort those, than to sort the entire table then select the top n rows.
However, the GROUP BY is likely to restrict the benefit of this sort of optimization. Grouping (or a DISTINCT operation) generally prevents the optimizer from using a plan that can pipe individual rows into a STOPKEY operation.
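One caveat worth knowing: hint text is parsed as-is, so a PL/SQL parameter like p_how_many is not substituted into FIRST_ROWS(); the hint needs an integer literal. A sketch, abbreviated to show the hint placement:
SELECT log_number FROM (
    SELECT /*+ FIRST_ROWS(10) */ -- literal row count; p_how_many here would be ignored
        il2.log_number,
        il2.final_date
    FROM log il2
    -- joins, filters and GROUP BY as in the original query
    ORDER BY il2.final_date ASC)
WHERE ROWNUM <= p_how_many;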
I've inherited a SQL Server-based application, and it has a stored procedure that contains the following, but it hits a timeout. I believe I've isolated the issue to the SELECT MAX() part, but I can't figure out how to use alternatives such as ROW_NUMBER() OVER (PARTITION BY...
Anyone got any ideas?
Here's the "offending" code:
SELECT BData.*, B.*
FROM BData
INNER JOIN
(
SELECT MAX( BData.StatusTime ) AS MaxDate, BData.BID
FROM BData
GROUP BY BData.BID
) qryMaxDates
ON ( BData.BID = qryMaxDates.BID ) AND ( BData.StatusTime = qryMaxDates.MaxDate )
INNER JOIN BItems B ON B.InternalID = qryMaxDates.BID
WHERE B.ICID = 2
ORDER BY BData.StatusTime DESC;
Thanks in advance.
SQL performance problems are seldom addressed by rewriting the query; the optimizer already knows how to rewrite it anyway. The problem is almost always indexing. To compute MAX(StatusTime) ... GROUP BY BID efficiently, you need an index on BData(BID, StatusTime). For an efficient seek on WHERE B.ICID = 2, you need an index on BItems.ICID.
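For example (the index names here are my own invention):
CREATE INDEX IX_BData_BID_StatusTime ON dbo.BData(BID, StatusTime);
CREATE INDEX IX_BItems_ICID ON dbo.BItems(ICID);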
The query could also, probably, be expressed as a correlated APPLY, because it seems that is what's really desired:
SELECT D.*, B.*
FROM BItems B
CROSS APPLY
(
SELECT TOP(1) *
FROM BData
WHERE B.InternalID = BData.BID
ORDER BY StatusTime DESC
) AS D
WHERE B.ICID = 2
ORDER BY D.StatusTime DESC;
SQL Fiddle.
This is not semantically the same query as the OP's: the original would return multiple rows on a StatusTime collision. My guess, though, is that this is what's desired ('the most recent BData for this BItem').
Consider creating the following index:
CREATE INDEX LatestTime ON dbo.BData(BID, StatusTime DESC);
This will support a query with a CTE such as:
;WITH x AS
(
SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY BID ORDER BY StatusTime DESC)
FROM dbo.BData
)
SELECT * FROM x
INNER JOIN dbo.BItems AS bi
ON x.BID = bi.InternalID
WHERE x.rn = 1 AND bi.ICID = 2
ORDER BY x.StatusTime DESC;
Whether the query still gets efficiencies from any indexes on BItems is another issue, but this should at least make the aggregate a simpler operation (though it will still require a lookup to get the rest of the columns).
Another idea would be to stop using SELECT * from both tables and only select the columns you actually need. If you really need all of the columns from both tables (this is rare, especially with a join), then you'll want to have covering indexes on both sides to prevent lookups.
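For illustration, a covering index on the BData side could look like this (the INCLUDE list is hypothetical; it should name the non-key columns your SELECT actually uses):
CREATE INDEX IX_BData_Covering ON dbo.BData(BID, StatusTime DESC)
    INCLUDE (StatusText, OperatorID); -- hypothetical columns: replace with the ones you select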
I also suggest calling any identifier the same thing throughout the model. Why is the ID that links these tables called BID in one table and InternalID in another?
Also please always reference tables using their schema.
Bad habits to kick: avoiding the schema prefix
This may be a late response, but I recently ran into the same performance issue, where a simple query involving MAX() took more than an hour to execute.
After looking at the execution plan, it seems that in order to evaluate MAX(), every record meeting the WHERE clause condition must be fetched; in your case, that's every record in your table. In my experience, indexing BData.StatusTime alone will not speed this up: an index is useful for looking up particular records, but here every row still has to be compared.
In my case I didn't have a GROUP BY, so all I did was use an ORDER BY ... DESC clause with SELECT TOP 1, and the query went from over an hour to under five minutes. Perhaps you can do what Gordon Linoff suggested and use PARTITION BY. Hopefully your query speeds up too.
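A minimal sketch of that TOP 1 pattern (the @bid variable is hypothetical, standing in for whatever filter you need):
DECLARE @bid int = 42; -- hypothetical key value
SELECT TOP (1) *
FROM dbo.BData
WHERE BID = @bid
ORDER BY StatusTime DESC; -- newest row first, so TOP (1) returns the max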
Cheers!
The following is the version of your query using row_number():
SELECT bd.*, b.*
FROM (select bd.*, row_number() over (partition by bid order by statustime desc) as seqnum
from BData bd
) bd INNER JOIN
BItems b
ON b.InternalID = bd.BID and bd.seqnum = 1
WHERE b.ICID = 2
ORDER BY bd.StatusTime DESC;
If this is not faster, then it would be useful to see the query plans for your query and this query to figure out how to optimize them.
It depends entirely on what kind of data you have there. One alternative that may be faster is using CROSS APPLY instead of the MAX subquery, but more than likely it won't yield any faster results.
The best option would probably be to add an index on BID with an INCLUDE containing StatusTime, and, if possible, filtering to the InternalIDs that match BItems.ICID = 2.
[UNSOLVED] But I've moved on!
Thanks to everyone who provided answers and suggestions. Unfortunately I couldn't get any further with this, so I have given up trying for now.
It looks like the best solution is to rewrite the application to UPDATE the latest data into a different table; that way it's a really quick and simple SELECT to get the latest readings.
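Something like this is the idea (dbo.BLatest and the @-variables are hypothetical; the table would hold one row per BID and be maintained at write time):
MERGE dbo.BLatest AS tgt -- hypothetical "latest reading" table
USING (SELECT @BID AS BID, @StatusTime AS StatusTime) AS src
    ON tgt.BID = src.BID
WHEN MATCHED THEN
    UPDATE SET StatusTime = src.StatusTime
WHEN NOT MATCHED THEN
    INSERT (BID, StatusTime) VALUES (src.BID, src.StatusTime);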
Thanks again for the suggestions.
I have a massive query that has been working just fine. However, due to the number of records now in my database, the query is taking longer and longer for the stored procedure to complete.
I had a hard enough time getting the query to work in the first place, and I'm not confident I can either A) simplify the query or B) break it into smaller queries/stored procedures.
Can any expert help me out?
SELECT
r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate into #temp_Agent
FROM
CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID
INNER JOIN (
SELECT
p.resourceFirstName,
p.resourceLastName,
MAX(e.eventDateTime) MaxeventDateTime
FROM
CCX1.db_cra.dbo.Resource p
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail e
ON p.resourceID = e.agentID
where
e.eventDateTime > (GETDATE() - 1)
GROUP BY
p.resourceFirstName,
p.resourceLastName
) d
ON r.resourceFirstName = d.resourceFirstName
AND r.resourceLastName = d.resourceLastName
AND a.eventDateTime = d.MaxeventDateTime
AND r.active = 1
where
a.eventDateTime >= (GETDATE() - 7)
ORDER BY
r.resourceLastName,
r.resourceFirstName ASC
I can't give the correct answer having only the query to go on, but...
Consider putting an index on "eventDateTime".
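For example (the index name is my own; run this on the server hosting db_cra):
CREATE INDEX IX_AgentStateDetail_eventDateTime
    ON db_cra.dbo.AgentStateDetail (eventDateTime);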
It appears you are joining against a set of records from within the last day, which would make the 7-day filter in the outer query irrelevant. I don't have the ability to test, but maybe your query can be reduced to this? (pseudo-code below)
Also consider different solutions. Maybe partition a table based on the datetime. Maybe have a separate database for reporting using star schemas or cube design.
What is being done with the temporary table #temp_Agent?
declare @max datetime = (select max(eventDateTime)
                         from CCX1.db_cra.dbo.AgentStateDetail
                         where active = 1
                         and eventDateTime > getdate() - 1);
if (@max is null)
    return; -- exit: no records today
SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate
into #temp_Agent
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a ON r.resourceID = a.agentID
where r.active = 1
and a.eventDateTime = @max;
Without the full definition of your tables it is hard to troubleshoot why the query is slow, but here are a couple of tips that could help you improve the performance of the query:
Instead of using a temporary table such as #temp_Agent, it is preferable to create a local variable of type TABLE. You can achieve exactly the same result, but you can drastically increase performance because:
A local variable of type TABLE can be created with primary keys and indexes, which improves how SQL finds the information.
A local variable can be clustered, which also improves performance in certain scenarios, because the information is accessed directly from disk.
A temporary table requires that SQL resolve at runtime the types of the columns used to store the information obtained by the query.
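A sketch of that table-variable approach, with all column types assumed from the query above:
DECLARE @Agent TABLE (
    resourceFirstName nvarchar(50), -- all types here are assumptions
    resourceLastName  nvarchar(50),
    eventDateTime     datetime,
    eventType         char(1),
    reasonCode        varchar(5),
    extension         varchar(10),
    ciscoDate         datetime,
    -- assumes one row per agent per event time
    PRIMARY KEY (resourceLastName, resourceFirstName, eventDateTime)
);
-- populate with INSERT INTO @Agent SELECT ... (the same SELECT as above, without the INTO clause)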
If you need to store information in temporary tables, variables, etc., avoid storing unnecessary information in them. For example, if you only need two ID columns later in your process, avoid including extra columns that you can retrieve later.
If there is a bunch of information that you need to retrieve from multiple sources, consider using a view, which can also be indexed to improve retrieval of the information.
Avoid unnecessary sorting, grouping, conversions, and joins of strings. These particular operations can drastically degrade the performance of a query.
As an extra tip, you can make use of the SQL Server tools designed to help improve your database and objects:
Check the execution plan of your query (menu Query -> Include Actual Execution Plan, or Ctrl+M).
Run the SQL Server Database Engine Tuning Advisor to analyze a trace file (see SQL Server Profiler) and add extra indexes to improve database performance.
Check with SQL Server Profiler that your queries are not generating deadlocks on the tables you use for the information. It is good practice to use hints in your queries to avoid lock problems and other behaviors that you may want to avoid in certain scenarios.
See the attached links to better understand what I mean:
Understanding Execution Plan
Usage of Hints
Tuning options available in SQL Server
I hope the information helps.
Assuming this is SQL Server, try:
WITH CTE AS
(SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate,
RANK() OVER (PARTITION BY r.resourceFirstName, r.resourceLastName
ORDER BY a.eventDateTime DESC) RN
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID AND a.eventDateTime >= (GETDATE() - 1)
where r.active = 1)
SELECT resourceFirstName, resourceLastName, eventDateTime, eventType, reasonCode, extension, ciscoDate
into #temp_Agent
FROM CTE
WHERE RN=1
ORDER BY resourceLastName, resourceFirstName ASC
I have a performance issue on the following (example) select statement that returns the first row using a sub query:
SELECT ITEM_NUMBER,
PROJECT_NUMBER,
NVL((SELECT DISTINCT
FIRST_VALUE(L.LOCATION) OVER (ORDER BY L.SORT1, L.SORT2 DESC) LOCATION
FROM LOCATIONS L
WHERE L.ITEM_NUMBER=P.ITEM_NUMBER
AND L.PROJECT_NUMBER=P.PROJECT_NUMBER
),
P.PROJECT_NUMBER) LOCATION
FROM PROJECT P
The DISTINCT is causing the performance issue by performing a SORT and UNIQUE but I can't figure out an alternative.
I would however prefer something akin to the following but referencing within 2 select statements doesn't work:
SELECT ITEM_NUMBER,
PROJECT_NUMBER,
NVL((SELECT LOCATION
FROM (SELECT L.LOCATION LOCATION,
ROWNUM RN
FROM LOCATIONS L
WHERE L.ITEM_NUMBER=P.ITEM_NUMBER
AND L.PROJECT_NUMBER=P.PROJECT_NUMBER
ORDER BY L.SORT1, L.SORT2 DESC
) R
WHERE RN <=1
), P.PROJECT_NUMBER) LOCATION
FROM PROJECT P
Additionally:
- My permissions do not allow me to create a function.
- I am cycling through 10k to 100k records in the main query.
- The sub query could return 3 to 7 rows before limiting to 1 row.
Any assistance in improving the performance is appreciated.
It's difficult to understand without sample data and cardinalities, but does this get you what you want? A unique list of projects and items, with the first occurrence of a location?
SELECT
P.ITEM_NUMBER,
P.PROJECT_NUMBER,
MIN(L.LOCATION) KEEP (DENSE_RANK FIRST ORDER BY L.SORT1, L.SORT2 DESC) LOCATION
FROM
LOCATIONS L
INNER JOIN
PROJECT P
ON L.ITEM_NUMBER=P.ITEM_NUMBER
AND L.PROJECT_NUMBER=P.PROJECT_NUMBER
GROUP BY
P.ITEM_NUMBER,
P.PROJECT_NUMBER
I encountered a similar problem in the past, and while this is not the ultimate solution (in fact it might just be corner-cutting), the Oracle query optimizer can be adjusted with the OPTIMIZER_MODE init parameter.
Have a look at chapter 11.2.1 at http://docs.oracle.com/cd/B28359_01/server.111/b28274/optimops.htm#i38318
FIRST_ROWS
The optimizer uses a mix of cost and heuristics to find a best plan
for fast delivery of the first few rows. Note: Using heuristics
sometimes leads the query optimizer to generate a plan with a cost
that is significantly larger than the cost of a plan without applying
the heuristic. FIRST_ROWS is available for backward compatibility and
plan stability; use FIRST_ROWS_n instead.
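For example, at the session level (FIRST_ROWS_n accepts n = 1, 10, 100, or 1000):
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_10;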
Of course there are tons of other factors you should analyse, like your indexes, join efficiency, query plan, etc.
I have the following query, which takes too long to retrieve around 70,000 records. I noticed that the execution time is proportional to the number of records retrieved. I need to optimize this query so that the execution time is not proportional to the number of records retrieved. Any ideas?
;WITH TT AS (
SELECT TaskParts.[TaskPartID],
PartCost,
LabourCost,
VendorPaidPartAmount,
VendorPaidLabourAmount,
ROW_NUMBER() OVER (ORDER BY [Employees].[EmpCode] asc) AS RowNum
FROM [TaskParts], [Tasks], [WorkOrders], [Employees], [Status], [Models], [SubAccounts]
WHERE 1=1
AND (TaskParts.TaskLineID = Tasks.TaskLineID)
AND (Tasks.WorkOrderID = [WorkOrders].WorkOrderID)
AND (Tasks.EmpID = [Employees].EmpID)
AND (TaskParts.StatusID = [Status].StatusID)
AND (Models.ModelID = Tasks.FailedModelID)
AND (SubAccounts.SubAccountID = Tasks.SubAccountID)
AND (SubAccounts.GLAccountID = 5))
SELECT --*
COUNT(0)--,
SUM(ISNULL(PartCost,0)),
SUM(ISNULL(LabourCost,0)),
SUM(ISNULL(VendorPaidPartAmount,0)),
SUM(ISNULL(VendorPaidLabourAmount,0))
FROM TT
As Lieven noted, you can remove TD0, TD1 and TP1 as they are redundant.
You can also remove the row_number column, as that is not used and windowing functions are relatively expensive.
It may also be possible to remove some of the tables from the TT CTE if they are not used; however, as table names have not been included with each column selected, it isn't possible to tell which tables are not being used.
Aside from that, your query's response will always be proportional to the number of rows returned, because the RDBMS has to read each row returned to calculate the results.
Make sure that you have a supporting index for each foreign key. Also, it's most probably not the issue in this case, but MS SQL optimization works better with explicit inner joins.
Also, I don't see any reason why you need RowNum if you only need totals.
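A sketch combining both suggestions, with the comma joins rewritten as explicit INNER JOINs and the ROW_NUMBER column dropped (the aliases are my own):
;WITH TT AS (
    SELECT tp.PartCost,
           tp.LabourCost,
           tp.VendorPaidPartAmount,
           tp.VendorPaidLabourAmount
    FROM [TaskParts] tp
    INNER JOIN [Tasks] t        ON tp.TaskLineID = t.TaskLineID
    INNER JOIN [WorkOrders] w   ON t.WorkOrderID = w.WorkOrderID
    INNER JOIN [Employees] e    ON t.EmpID = e.EmpID
    INNER JOIN [Status] s       ON tp.StatusID = s.StatusID
    INNER JOIN [Models] m       ON m.ModelID = t.FailedModelID
    INNER JOIN [SubAccounts] sa ON sa.SubAccountID = t.SubAccountID
    WHERE sa.GLAccountID = 5
)
SELECT COUNT(0),
       SUM(ISNULL(PartCost, 0)),
       SUM(ISNULL(LabourCost, 0)),
       SUM(ISNULL(VendorPaidPartAmount, 0)),
       SUM(ISNULL(VendorPaidLabourAmount, 0))
FROM TT;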