When I run the query below, one of four outcomes occurs. Most frequently it runs for over 10 minutes before I kill it, but around 1% of the time it returns either two records, three records, or no records:
select *
from [master] as [p] (nolock)
inner join eddd as [e] (nolock)
on [p].[guid] = [e].pggg
inner join stppp as [s] (nolock)
on [e].[guid] = [s].evggg
inner join [edddex] as [x] (nolock)
on [e].[guid] = [x].[guid]
where [p].[guid] = '370ECB0F-6222-4D02-86DD-336BAAA49B81'
and ( CAST([s].rddd as date) = '20150821'
or CAST([s].rddd as date) = '20150821' )
and [s].[st111num] = 4
and [e].[ev111type] = 6
and [x].[l11105] = 6;
My friend argues that because removing this filter:
and ( CAST([s].rddd as date) = '20150821'
or CAST([s].rddd as date) = '20150821' )
causes the query to respond immediately, this proves it is not a blocking/locking issue.
With only SELECT access to the database, and no ability to audit or view system tables, how does one find out whether a specific query is returning different results due to locking?
The CAST([s].rddd expression is the performance problem. When you wrap a criteria field in a function (like CAST(), MAX(), etc.), you prevent the use of the index that would otherwise aid the execution of your query, and this causes a full table scan. Modify your query so that you avoid such functions, AND be sure your filters/matches involve the LEADING index fields: failing to use the leading index fields ALSO prevents the index(es) from being used.
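As a sketch of that rewrite (assuming [s].rddd is a datetime column), filter on a half-open date range so the raw column stays exposed to its index:

-- non-sargable: the function hides the column from the index
-- WHERE CAST([s].rddd as date) = '20150821'
-- sargable: a half-open range on the bare column can seek an index on rddd
WHERE [s].rddd >= '20150821'
  AND [s].rddd <  '20150822'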
For more insight, run an EXPLAIN PLAN (Showplan, or whatever your system provides) and it will tell you which indexes are used, and much more.
Also, SELECT * is a performance killer: it has to return ALL FIELDS, so even if the indexes can assist with the filtering, every block containing any of the needed records must be fully read. SELECT * is very rarely needed, so revisit that as well and trim it down to only the fields you need.
The target table FUR_FACT_T has around 9 million records, while the inner query returns around 50k records. How do I optimize the performance of this MERGE statement? I tried the parallel hint, but nothing seems to improve the performance:
MERGE --+ parallel(targ)
INTO FUR_FACT_T targ
USING ( WITH LATEST_DATA AS
(
SELECT ITEM_NO ,
ITEM_TYPE ,
CODE_SUP ,
TYPE_SUP ,
CODE_RU ,
TYPE_RU ,
PERCENT_SHARE,
VALID_FROM ,
VALID_TO ,
MAX(VALID_FROM) OVER (PARTITION BY ITEM_NO,ITEM_TYPE,CODE_SUP,TYPE_SUP,CODE_RU,TYPE_RU) AS MAX_VALID_FROM
FROM FUR_FACT_T
WHERE (ITEM_NO,ITEM_TYPE,CODE_SUP,TYPE_SUP,CODE_RU,TYPE_RU) IN
(SELECT DISTINCT ITEM_NO,ITEM_TYPE,SUP_CODE,SUP_TYPE,SUP_CODE_RAGREE,SUP_TYPE_RAGREE
FROM TEMP.FACT_ITEM_TEMP_T WHERE VER_DELETE_DATE IS NOT NULL
)
AND date '2023-01-11' >= VALID_FROM
AND date '2023-01-11' <= nvl(VALID_TO,DATE '9999-12-31')
)
SELECT p.ITEM_NO ,
p.ITEM_TYPE ,
p.CODE_SUP ,
p.TYPE_SUP ,
p.CODE_RU ,
p.TYPE_RU ,
p.NO_OF_SEATS,
p.PERCENT_SHARE,
p.VALID_FROM , -- referenced by the MERGE ON clause below
p.VALID_TO     -- referenced by the UPDATE below
FROM ( SELECT * FROM TEMP.FACT_ITEM_DATA_TEMP_T WHERE VER_DELETE_DATE IS NOT NULL) p
LEFT OUTER JOIN LATEST_DATA q
ON p.ITEM_NO = q.ITEM_NO
AND p.ITEM_TYPE = q.ITEM_TYPE
AND p.CODE_SUP = q.CODE_SUP
AND p.TYPE_SUP = q.TYPE_SUP
AND p.CODE_RU = q.CODE_RU
AND p.TYPE_RU = q.TYPE_RU
AND q.VALID_FROM = q.MAX_VALID_FROM
WHERE (p.ITEM_NO,p.ITEM_TYPE,p.CODE_SUP,p.TYPE_SUP,p.CODE_RU,p.TYPE_RU) IN
(SELECT DISTINCT ITEM_NO,ITEM_TYPE,SUP_CODE,SUP_TYPE,SUP_CODE_RAGREE,SUP_TYPE_RAGREE
FROM TEMP.FACT_ITEM_TEMP_T WHERE VER_DELETE_DATE IS NOT NULL
)
) src
ON ( targ.ITEM_NO = src.ITEM_NO
AND targ.ITEM_TYPE = src.ITEM_TYPE
AND targ.CODE_SUP = src.CODE_SUP
AND targ.TYPE_SUP = src.TYPE_SUP
AND targ.CODE_RU = src.CODE_RU
AND targ.TYPE_RU = src.TYPE_RU
AND targ.VALID_FROM = src.VALID_FROM)
WHEN MATCHED THEN
UPDATE SET
targ.PERCENT_SHARE = src.PERCENT_SHARE,
targ.VALID_TO = src.VALID_TO ;
Parallelism
The current query is probably not using any parallelism. The hint requests object-level parallelism, where parallelism is applied only to the specified object. It's usually better to use statement-level parallelism, which parallelizes every part of the query. Replace the hint --+ parallel(targ) with --+ parallel to enable statement-level parallelism.
But parallelism by default does not apply to modified tables, because modifying objects in parallel can be dangerous: the entire table will be locked, and the changes may be unrecoverable if there's a disaster. If you're willing to accept those risks, run a statement like this in your session:
ALTER SESSION ENABLE PARALLEL DML;
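With parallel DML enabled, a sketch of the revised statement (assuming you accept the trade-offs above):

MERGE --+ parallel
INTO FUR_FACT_T targ
USING ( /* same source query as in the question */ ) src
ON ( targ.ITEM_NO = src.ITEM_NO /* ... remaining join columns ... */ )
WHEN MATCHED THEN
UPDATE SET targ.PERCENT_SHARE = src.PERCENT_SHARE,
           targ.VALID_TO      = src.VALID_TO;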
Note that parallelism will only make your query work harder, not smarter. Whatever you do, you'll still want to generate an execution plan and check that it is sensible.
Generic execution plan advice
One of the first steps in performance tuning is finding out precisely what is slow. Since this appears to be a large data-warehouse query, you'll want to generate an execution plan that shows actual rows and times instead of mere estimates. Add that execution plan as text to the question and you may be able to get more detailed help.
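In Oracle, one way to capture actual rows and times is to run the statement once with the GATHER_PLAN_STATISTICS hint (e.g. MERGE /*+ gather_plan_statistics */ ...) and then, from the same session:

-- shows actual (not estimated) row counts and timings for the last statement run
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));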
I am having issues with my query's run time. I want the query to automatically pull the max ID for a column, because the table is indexed on that column. If I punch in the number manually, it runs in seconds, but I want the query to be more dynamic if possible.
I've tried placing the sub-query in different places with no luck
SELECT *
FROM TABLE A
JOIN TABLE B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID
AND B.ACTV_FLG = 1
WHERE A.WK_END_THU_ID_NU >= (SELECT DISTINCT MAX (WK_END_THU_ID_NU) FROM TABLE A)
AND A.WK_END_THU_END_YR_NU = YEAR(GETDATE())
AND A.LGCY_NATL_STR_NU IN (7731)
AND B.SLD_MENU_ITM_ID = 4314
I just want this to run faster. Maybe there is a different approach I should be taking?
I would move the subquery to the FROM clause and change the WHERE clause to only refer to A:
SELECT *
FROM A JOIN
(SELECT MAX(WK_END_THU_ID_NU) as max_wet
FROM A
) am
ON A.WK_END_THU_ID_NU = am.max_wet JOIN
B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID AND
B.ACTV_FLG = 1
WHERE A.WK_END_THU_END_YR_NU = YEAR(GETDATE()) AND
A.LGCY_NATL_STR_NU IN (7731) AND
A.SLD_MENU_ITM_ID = 4314; -- is the same as B
Then you want indexes. I'm pretty sure you want indexes on:
A(SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU)
B(SLD_MENU_ITM_ID, ACTV_FLG)
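Expressed as DDL (the index names are illustrative):

CREATE INDEX ix_A_item_week ON A (SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU);
CREATE INDEX ix_B_item_actv ON B (SLD_MENU_ITM_ID, ACTV_FLG);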
I will note that moving the subquery to the FROM clause probably does not affect performance, because SQL Server is smart enough to only execute it once. However, I prefer table references in the FROM clause when reasonable. I don't think a window function would actually help in this case.
I have something like this:
SELECT CompanyId
FROM Company
WHERE CompanyId not in
(SELECT CompanyId
FROM Company
WHERE (IsPublic = 0) and CompanyId NOT IN
(SELECT ShoppingLike.WhichId
FROM Company
INNER JOIN
ShoppingLike ON Company.CompanyId = ShoppingLike.UserId
WHERE (ShoppingLike.IsWaiting = 0) AND
(ShoppingLike.ShoppingScoreTypeId = 2) AND
(ShoppingLike.UserId = 75)
)
)
It has three SELECTs. I want to know how I could write it without three SELECTs, and which is faster on a million records: "select in select" or a LEFT JOIN?
My experience is from Oracle. There is never a single correct answer to optimising tricky queries; it's a collaboration between you and the optimiser. You need to check explain plans, and sometimes traces, often at each stage of writing the query, to find out what the optimiser is thinking. Having said that:
You could remove the outer SELECT by putting the entire contents of its subquery's WHERE clause inside a NOT(...). On the face of it, this will prevent the outer full scan of Company (or of its index on CompanyId). Try it, check the output is the same, and get timings; then remove it temporarily before trying the suggestions below. The NOT() may well cause the optimiser to stop considering an ANTI-JOIN against the ShoppingLike subquery, due to the implicit OR being created.
Ensure that CompanyId and WhichId are defined as NOT NULL columns. Without this (or the likes of an explicit CompanyId IS NOT NULL), ANTI-JOIN options are often discarded.
The innermost subquery is not correlated (it does not reference anything from its outer query), so it can be extracted and tuned separately. As a matter of style I'd swap the table names around the INNER JOIN, since you want ShoppingLike scanned first as it has all the filters against it. It won't make any difference, but it reads more easily and makes it possible to use a hint to scan the tables in the order specified. I would even question the need for the Company table in this subquery.
You've used NOT IN where the very similar NOT EXISTS sometimes gives the optimiser more or alternative options; a sketch follows.
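A sketch combining points 1 and 4 (untested; it assumes the NOT NULL conditions from point 2 hold, so NOT IN and NOT EXISTS agree, and it drops the questionable Company join from point 3):

SELECT c.CompanyId
FROM Company c
WHERE NOT ( c.IsPublic = 0
            AND NOT EXISTS (SELECT 1
                            FROM ShoppingLike sl
                            WHERE sl.UserId = 75
                              AND sl.IsWaiting = 0
                              AND sl.ShoppingScoreTypeId = 2
                              AND sl.WhichId = c.CompanyId) );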
All the above is just trial and error unless you start checking the explain plan. Oracle can, with a following wind, convert between LEFT JOIN and IN SELECT on its own. With 1M+ rows, the time invested will pay off.
Sure could use some optimization help here. I've got a stored procedure which takes approximately 1 minute, 18 seconds to run and it gets even worse when I run the asp.net page which hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
@parameter1 int = 5,
@parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WITH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob
Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 SELECT DISTINCT clauses in this query -- in a good design, DISTINCT clauses are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criterion -- function calls during the join prevent any decent query optimization. My guess is that you are using data that has internal meaning, and that you are parsing out that meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and create a computed column -- or even a plain copy of the full part-number field if the combined field is unusually complex. A sketch follows.
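For instance, a persisted computed column on the line table (the column and index names here are illustrative, and like the original join it assumes every key contains a '-'):

ALTER TABLE tbl_Order_Notes_by_line
ADD txt_Order_Only_by_line_c AS
    SUBSTRING(txt_Order_Key_by_line, 1,
              CHARINDEX('-', txt_Order_Key_by_line) - 1) PERSISTED;

CREATE INDEX ix_order_only ON tbl_Order_Notes_by_line (txt_Order_Only_by_line_c);

The join then becomes a plain equality on txt_Order_Only_by_line_c, which the optimizer can drive with the new index.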
This point is not a performance issue, but you have a long sub-query using multiple AND & OR clauses. I know the rules for operator precedence, and you may know the rules for operator precedence, but will the next guy? Will you remember them at 1:00 AM when stuff is broken?
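For instance, SQL's AND-binds-tighter-than-OR rule means the CS_Ready predicate above is actually grouped like this; writing the parentheses out makes it unmistakable:

WHERE (tbl_Order_Notes.bin_Customer_Service_Review = 'True'
       AND tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL
       AND tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
   OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True'
       AND tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL
       AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
            OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL))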
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer with CTEs -- I have had to recode CTE-based joins for performance reasons on several occasions. Creating an actual view equivalent to the CTE and using that instead can be a significant speedup. It may well depend on the version of SQL Server, but if you are running an older version I would definitely wonder about CTE optimization. -- This is not as important as the first 2 things I've mentioned; try to fix those first.
ADDED
I'm going to be harsh on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues and you like the syntax, they can be useful in at least limited usage; personally I don't recommend them for anything more than that -- and given that I consider them little more than syntactic sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the optimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating a #temptable and populating it. You can even add an index or two to your #temptable, and the optimizer can figure out how to use them too. A table variable is similar but, at least through SQL 2012, was no faster than a #temp table that I could tell -- supposedly new goodness in SQL Server 2014 helps this. A sketch of the #temptable approach follows.
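Here is that sketch for the Redate_Orders CTE above (the temp table and index names are illustrative):

SELECT DISTINCT
    SUBSTRING(txt_Order_Key_by_line, 1,
              CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
INTO #Redate_Orders
FROM tbl_Order_Notes_by_line
WHERE bin_Redate_Request_by_line = 'True';

CREATE INDEX ix_redate ON #Redate_Orders (Redate_Order_Number);

-- the main SELECT then joins to #Redate_Orders, which carries real statistics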
A CTE is really just a temporary view in disguise, which is why I suggested you can replace it with a real view to get better performance (and you often can), or populate a temp table and sometimes get even better performance.
I'm working on some upgrades to an internal web analytics system we provide for our clients (in the absence of a preferred vendor or Google Analytics), and I'm working on the following query:
select
path as EntryPage,
count(Path) as [Count]
from
(
/* Sub-query 1 */
select
pv2.path
from
pageviews pv2
inner join
(
/* Sub-query 2 */
select
pv1.sessionid,
min(pv1.created) as created
from
pageviews pv1
inner join Sessions s1 on pv1.SessionID = s1.SessionID
inner join Visitors v1 on s1.VisitorID = v1.VisitorID
where
pv1.Domain = isnull(@Domain, pv1.Domain) and
v1.Campaign = @Campaign
group by
pv1.sessionid
) t1 on pv2.sessionid = t1.sessionid and pv2.created = t1.created
) t2
group by
Path;
I've tested this query with 2 million rows in the PageViews table and it takes about 20 seconds to run. I'm noticing a clustered index scan twice in the execution plan, both times it hits the PageViews table. There is a clustered index on the Created column in that table.
The problem is that in both cases it appears to iterate over all 2 million rows, which I believe is the performance bottleneck. Is there anything I can do to prevent this, or am I pretty much maxed out as far as optimization goes?
For reference, the purpose of the query is to find the first page view for each session.
EDIT: After much frustration, despite the help received here, I could not make this query work. Therefore, I decided to simply store a reference to the entry page (and now exit page) in the sessions table, which allows me to do the following:
select
pv.Path,
count(*)
from
PageViews pv
inner join Sessions s on pv.SessionID = s.SessionID
and pv.PageViewID = s.ExitPage
inner join Visitors v on s.VisitorID = v.VisitorID
where
(
@Domain is null or
pv.Domain = @Domain
) and
v.Campaign = @Campaign
group by pv.Path;
This query runs in 3 seconds or less. Now I either have to update the entry/exit page in real time as the page views are recorded (the optimal solution) or run a batch update at some interval. Either way, it solves the problem, but not like I'd intended.
Edit Edit: Adding a missing index (after cleaning up from last night) reduced the query to mere milliseconds. Woo hoo!
For starters,
where pv1.Domain = isnull(@Domain, pv1.Domain)
won't SARG. You can't optimize a match on a function, as I remember.
I'm back. To answer your first question, you could probably just do a union on the two conditions, since they are obviously disjoint.
Actually, you're trying to cover both the case where you provide a domain, and where you don't. You want two queries. They may optimize entirely differently.
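A sketch of that split (each branch gets its own plan; untested):

IF @Domain IS NULL
    SELECT pv1.sessionid, MIN(pv1.created) AS created
    FROM pageviews pv1
    INNER JOIN Sessions s1 ON pv1.SessionID = s1.SessionID
    INNER JOIN Visitors v1 ON s1.VisitorID = v1.VisitorID
    WHERE v1.Campaign = @Campaign
    GROUP BY pv1.sessionid
ELSE
    SELECT pv1.sessionid, MIN(pv1.created) AS created
    FROM pageviews pv1
    INNER JOIN Sessions s1 ON pv1.SessionID = s1.SessionID
    INNER JOIN Visitors v1 ON s1.VisitorID = v1.VisitorID
    WHERE pv1.Domain = @Domain AND v1.Campaign = @Campaign
    GROUP BY pv1.sessionid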
What's the nature of the data in these tables? Do you find most of the data is inserted/deleted regularly?
Is that the full schema for the tables? The query plan shows different indexing..
Edit: Sorry, just read the last line of text. I'd suggest that if the tables are routinely cleared/inserted, you could think about ditching the clustered index and using the tables as heap tables... just a thought
Definitely should put non-clustered index(es) on Campaign and Domain, as John suggested
Your inner query (pv1) will require a nonclustered index on (Domain).
The second query (pv2) can already find the rows it needs due to the clustered index on Created, but pv1 might be returning so many rows that SQL Server decides that a table scan is quicker than all the locks it would need to take. As pv1 groups on SessionID (and hence has to order by SessionID), a nonclustered index on SessionID, Created, and including path should permit a MERGE join to occur. If not, you can force a merge join with "SELECT .. FROM pageviews pv2 INNER MERGE JOIN ..."
The two indexes described above would be:
CREATE NONCLUSTERED INDEX ncixcampaigndomain ON PageViews (Domain)
CREATE NONCLUSTERED INDEX ncixsessionidcreated ON PageViews(SessionID, Created) INCLUDE (path)
SELECT
sessionid,
MIN(created) AS created
FROM
pageviews pv
JOIN
visitors v ON pv.visitorid = v.visitorid
WHERE
v.campaign = @Campaign
GROUP BY
sessionid
so that gives you the sessions for a campaign. Now let's see what you're doing with that.
OK, this gets rid of your grouping:
SELECT
campaignid,
sessionid,
pv.path
FROM
pageviews pv
JOIN
visitors v ON pv.visitorid = v.visitorid
WHERE
v.campaign = @Campaign
AND NOT EXISTS (
SELECT 1 FROM pageviews pv0
WHERE pv0.sessionid = pv.sessionid
AND pv0.created < pv.created
)
To continue from doofledorf.
Try this:
where
(@Domain is null or pv1.Domain = @Domain) and
v1.Campaign = @Campaign
Ok, I have a couple of suggestions
Create this covering index:
create index idx2 on [PageViews]([SessionID], Domain, Created, Path)
If you can amend the Sessions table so that it stores the entry page, e.g. an EntryPageViewID column, you will be able to optimise this heavily; a sketch follows.
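A sketch of that amendment (EntryPageViewID is a hypothetical column, mirroring the ExitPage approach in the question's edit):

ALTER TABLE Sessions ADD EntryPageViewID int NULL;

-- entry-page counts then reduce to a simple join
SELECT pv.Path, COUNT(*) AS [Count]
FROM PageViews pv
INNER JOIN Sessions s ON pv.PageViewID = s.EntryPageViewID
INNER JOIN Visitors v ON s.VisitorID = v.VisitorID
WHERE v.Campaign = @Campaign
GROUP BY pv.Path;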