I'm trying to optimize the performance of the following update query:
UPDATE a
SET a.[qty] =
(
SELECT MAX(b.[qty])
FROM [TableA] AS b
WHERE b.[ID] = a.[ID]
AND b.[Date] = a.[Date]
AND b.[qty] <> 0
)
FROM [TableA] a
WHERE a.[qty] = 0
AND a.[status] = 'New'
It deals with a large table with over 200 million rows.
I've already tried creating an index on [qty, status], but it wasn't really helpful because of the index maintenance during the update.
In general it isn't easy to add indexes to this table, because there are a lot of other update/insert queries against it.
So I'm thinking about reorganizing this query somehow.
Any ideas?
TableA is a heap like this:
CREATE TABLE TableA (
    ID INTEGER NULL,
    qty INTEGER NULL,
    [Date] DATE NULL,
    status VARCHAR(50) NULL
);
Execution plan: https://www.brentozar.com/pastetheplan/?id=S1KLUWO15
It's difficult to answer without seeing execution plans and table definitions, but you can avoid the self-join by using an updatable CTE/derived table with a window function:
UPDATE a
SET
qty = a.maxQty
FROM (
SELECT *,
MAX(CASE WHEN a.qty <> 0 THEN a.qty END) OVER (PARTITION BY a.ID, a.Date) AS maxQty
FROM [TableA] a
) a
WHERE a.qty = 0
AND a.status = 'New';
To support this query, you will need the following index:
TableA (ID, Date) INCLUDE (qty, status)
The two key columns can be in either order, and if you make it the clustered index, the INCLUDE columns are covered automatically.
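For example, as a nonclustered index (the index name here is just an assumption):

CREATE NONCLUSTERED INDEX IX_TableA_ID_Date
    ON [TableA] (ID, [Date])
    INCLUDE (qty, status);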
I have two tables: Sale_Source (10,000 rows) and Sale_Target (1 billion rows). I have four queries that INSERT into and UPDATE Sale_Target with the data from Sale_Source.
The source table has a nonclustered index on id and Date.
The target table has indexes on id and Date, but I have disabled them to improve performance.
Queries
INSERT query #1:
INSERT INTO dbo.Sale_Target (id, Salevalue, Salestring, Date)
SELECT id, Salevalue, Salestring, Date
FROM dbo.Sale_Source s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Sale_Target t WHERE s.id = t.id)
UPDATE query #1 (when the dates are the same):
UPDATE t
SET t.Salevalue = s.Salevalue,
t.Salestring = s.Salestring
FROM dbo.Sale_Source s
JOIN dbo.Sale_Target t ON t.id = s.id AND s.Date = t.Date
WHERE t.Salevalue <> s.Salevalue OR t.Salestring <> s.Salestring
UPDATE query #2 (when the Date on Sale_Source > the Date on Sale_Target):
UPDATE t
SET t.Salevalue = s.Salevalue,
t.Salestring = s.Salestring
FROM dbo.Sale_Source s
JOIN dbo.Sale_Target t ON t.id = s.id AND s.Date > t.Date
UPDATE query #3 (when the Source id is null in the join, i.e. the Target row has no match in Source):
UPDATE t
SET t.Salevalue = null,
t.Salestring = null
FROM dbo.Sale_Target t
LEFT JOIN dbo.Sale_Source s ON t.id = s.id
WHERE s.id IS NULL
The four queries take 1 hour 30 minutes to finish, which is very slow given that the source table has only 10,000 rows. I think the problem is that each of the four queries has to join the source and target tables again, which costs a lot of time.
Therefore I have an idea:
I will create a query that saves the rows matched between the two tables (source and target) into a table temp_matched, and saves the non-matched rows (not matched to the target) into temp_nonmatched.
The problem I have with a MERGE query is that inside MERGE we can't save data into another table.
I would then use temp_nonmatched in the INSERT query, and in the UPDATE queries I would replace Sale_Source with temp_matched.
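For illustration, one way this split could be written without MERGE is a sketch like the following (the temp table names are the ones described above; everything else is an assumption):

-- Tag each source row with whether its id already exists in the target
SELECT s.id, s.Salevalue, s.Salestring, s.Date,
       CASE WHEN EXISTS (SELECT 1 FROM dbo.Sale_Target t WHERE t.id = s.id)
            THEN 1 ELSE 0 END AS IsMatched
INTO #Sale_Source_Split
FROM dbo.Sale_Source s;

-- Rows whose id already exists in Sale_Target (for the UPDATE queries)
SELECT id, Salevalue, Salestring, Date
INTO #temp_matched
FROM #Sale_Source_Split
WHERE IsMatched = 1;

-- Rows with no matching id in Sale_Target (for the INSERT query)
SELECT id, Salevalue, Salestring, Date
INTO #temp_nonmatched
FROM #Sale_Source_Split
WHERE IsMatched = 0;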
Do you have any idea how to do this, or can these four queries be optimized in another way?
Thank you.
Table Definition:
CREATE TABLE [dbo].[Sale_Target](
[id] [numeric](8, 0) NOT NULL,
[Salevalue] [numeric](8, 0) NULL,
[Salestring] [varchar](20) NULL,
[Date] [Datetime2](7) NULL
) ON [PRIMARY]
CREATE NONCLUSTERED COLUMNSTORE INDEX "NCCI_Sale_Target"
ON [dbo].[Sale_Target] (id,Date)
CREATE TABLE [dbo].[Sale_Source](
[id] [numeric](8, 0) NOT NULL,
[Salevalue] [numeric](8, 0) NULL,
[Salestring] [varchar](20) NULL,
[Date] [Datetime2](7) NULL
) ON [PRIMARY]
CREATE NONCLUSTERED COLUMNSTORE INDEX "NCCI_Sale_Source"
ON [dbo].[Sale_Source] (id,Date)
The target table has no index.
The first thing I would do is index the target table on id and Date.
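For example, a plain rowstore index along those lines might be (the index name is an assumption):

CREATE NONCLUSTERED INDEX IX_Sale_Target_id_Date
    ON dbo.Sale_Target (id, [Date]);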
The query shown below takes almost 2 hours to run, and I want to reduce its execution time. Any help would be really appreciated.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the NOT IN clause with NOT EXISTS and with a LEFT JOIN, but with no luck on the execution time.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
The PRODUCTS table has 432,837 records and the STORE table has almost the same number of records. I create this table in the stored procedure itself and drop it at the end of the procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use a nonclustered index on this table, or what else can I do to reduce the execution time? Thanks in advance.
First, you don't need TOP 1 or DISTINCT in EXISTS or IN subqueries, although this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
     STORE srg
     On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
        prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y' and
      srg.Range_Event_Id = 45655 and
      Not Exists (Select 1
                  From Last_Authorised_Range lar
                  Where lar.Range_Event_Id = srg.Range_Event_Id)
Do note that I removed the quotes around 45655. I presume this column is actually a number; if so, don't confuse yourself and the optimizer by comparing it to a string.
Then, try indexes. I think the best indexes are:
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
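Written out as CREATE INDEX statements (the index names are assumptions), those suggestions would look like:

Create Index IX_Store_RangeEvent
    On STORE (Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id);

Create Index IX_Products_GrpEvent
    On PRODUCTS (Store_Range_Grp_Id, Range_Event_Id);

Create Index IX_LAR_RangeEvent
    On Last_Authorised_Range (Range_Event_Id);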
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.
I am using the query below to update one column based on the specified conditions. It uses an INNER JOIN, but it takes more than 15 seconds to run even when there are no records to update (0 records).
UPDATE CONFIGURATION_LIST
SET DUPLICATE_SERIAL_NUM = 0
FROM CONFIGURATION_LIST
INNER JOIN (SELECT DISTINCT APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, COUNT(*) AS NB
FROM CONFIGURATION_LIST
WHERE
PLANT = '0067'
AND APPLIED_SERIAL_NUMBER IS NOT NULL
AND APPLIED_SERIAL_NUMBER !=''
AND DUPLICATE_SERIAL_NUM = 1
GROUP BY
APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER
HAVING
COUNT(*) = 1) T2 ON T2.APPLIED_SERIAL_NUMBER = CONFIGURATION_LIST.APPLIED_SERIAL_NUMBER
AND T2.APPLIED_MAT_CODE = CONFIGURATION_LIST.APPLIED_MAT_CODE
WHERE
CONFIGURATION_LIST.PLANT = '0067'
AND DUPLICATE_SERIAL_NUM = 1
There is an index on APPLIED_SERIAL_NUMBER and APPLIED_MAT_CODE, and fragmentation is also fine.
Could you please help me with the performance of the above query?
First, you don't need the DISTINCT when using GROUP BY. SQL Server probably ignores it, but it is a bad idea anyway:
UPDATE CONFIGURATION_LIST
SET DUPLICATE_SERIAL_NUM = 0
FROM CONFIGURATION_LIST INNER JOIN
(SELECT APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, COUNT(*) AS NB
FROM CONFIGURATION_LIST cl
WHERE cl.PLANT = '0067' AND
cl.APPLIED_SERIAL_NUMBER IS NOT NULL AND
cl.APPLIED_SERIAL_NUMBER <> '' AND
cl.DUPLICATE_SERIAL_NUM = 1
GROUP BY cl.APPLIED_MAT_CODE, cl.APPLIED_SERIAL_NUMBER
HAVING COUNT(*) = 1
) T2
ON T2.APPLIED_SERIAL_NUMBER = CONFIGURATION_LIST.APPLIED_SERIAL_NUMBER AND
T2.APPLIED_MAT_CODE = CONFIGURATION_LIST.APPLIED_MAT_CODE
WHERE CONFIGURATION_LIST.PLANT = '0067' AND
DUPLICATE_SERIAL_NUM = 1;
For this query, you want the following index: CONFIGURATION_LIST(PLANT, DUPLICATE_SERIAL_NUM, APPLIED_SERIAL_NUMBER, APPLIED_MAT_CODE).
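As a CREATE INDEX statement (the index name is an assumption):

CREATE INDEX IX_CONFIGURATION_LIST_DupCheck
    ON CONFIGURATION_LIST (PLANT, DUPLICATE_SERIAL_NUM, APPLIED_SERIAL_NUMBER, APPLIED_MAT_CODE);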
The HAVING COUNT(*) = 1 suggests that you might really want NOT EXISTS (which would normally be faster). But you don't really explain what the query is supposed to do; you only say that this code is slow.
It looks like you're checking the table for other rows with the same values and, if none exist, updating the duplicate column to zero. If your table has a unique key (an identity field or composite key), you could do something like this:
UPDATE C
SET C.DUPLICATE_SERIAL_NUM = 0
FROM
CONFIGURATION_LIST C
where
not exists (
select
1
FROM
CONFIGURATION_LIST C2
where
C2.APPLIED_SERIAL_NUMBER = C.APPLIED_SERIAL_NUMBER and
C2.APPLIED_MAT_CODE = C.APPLIED_MAT_CODE and
C2.UNIQUE_KEY_HERE != C.UNIQUE_KEY_HERE
) and
C.PLANT = '0067' and
C.DUPLICATE_SERIAL_NUM = 1
I would try with a SELECT first:
select APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, count(*) as n
from CONFIGURATION_LIST cl
where
cl.PLANT='0067' and
cl.APPLIED_SERIAL_NUMBER IS NOT NULL and
cl.APPLIED_SERIAL_NUMBER <> ''
group by APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER;
How many rows do you get with this and how long does it take?
If you could remove the DUPLICATE_SERIAL_NUM column from your table, this might become very simple. The name DUPLICATE_SERIAL_NUM suggests that you are searching for duplicates. Since you are already counting rows, you could introduce a simple table that holds the counts:
create table CLCOUNT (
    N int,
    C int,            /* or whatever type APPLIED_MAT_CODE is */
    S int,            /* or whatever type APPLIED_SERIAL_NUMBER is */
    PLANT char(20)    /* or whatever type PLANT is */
);
create unique index UX_CLCOUNT_CSP on CLCOUNT (C, S, PLANT);
create index IX_CLCOUNT_PN on CLCOUNT (PLANT, N);
insert into CLCOUNT (N, C, S, PLANT)
select count(*), cl.APPLIED_MAT_CODE, cl.APPLIED_SERIAL_NUMBER, cl.PLANT
from CONFIGURATION_LIST cl
where
    cl.PLANT = '0067' and
    cl.APPLIED_SERIAL_NUMBER IS NOT NULL and
    cl.APPLIED_SERIAL_NUMBER <> ''
group by cl.APPLIED_MAT_CODE, cl.APPLIED_SERIAL_NUMBER, cl.PLANT;
How long does this take?
Now you can simply select * from CLCOUNT where PLANT='0067' and N=1;
This is all far from perfect, but you should be able to analyze your queries (by looking at their execution plans) and find out why they take so long.
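For instance, one minimal way to measure a query while testing is to turn on the timing and I/O statistics (shown here only as an illustration):

SET STATISTICS IO, TIME ON;
select * from CLCOUNT where PLANT = '0067' and N = 1;
SET STATISTICS IO, TIME OFF;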
We have a data processing application that has two separate paths that should eventually produce similar results. We also have a database-backed monitoring service that compares and utilizes the results of this processing. At any point in time, either of the two paths may or may not have produced results for the operation, but I want to be able to query a view that tells me about any results that have been produced.
Here's a simplified example of the schema I started with:
create table LeftResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null,
primary key ( DateId, EntityId ) )
go
create table RightResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null,
primary key ( DateId, EntityId ) )
go
create view CombinedResults
as
select
DateId = isnull( l.DateId, r.DateId ),
EntityId = isnull( l.EntityId, r.EntityId ),
LeftValue = l.ProcessingValue,
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from LeftResult l
full outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
go
The problem with this is that SQL Server always chooses to scan the PK on LeftResult and RightResult rather than seek, even when queries to the view include DateId and EntityId as predicates. This seems to be due to the isnull() checks on the results. (I've even tried using index hints and FORCESEEK, but to no avail: the query plan still shows a scan.)
However, I can't simply replace the isnull() results, since either the left or right side could be missing from the join (because the associated process hasn't populated the table yet).
I don't particularly want to duplicate the MaxValue logic across all of the consumers of the view (in reality it's a considerably more complex calculation, but the same idea applies).
Is there a good strategy I can use to structure this view or queries against it so that the
query plan will utilize a seek rather than a scan?
Try using a left outer join for one of the tables, then UNION ALL those results with the rows from the other table that the join excluded.
Like:
select (...)
from LeftResult l
left outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
(...)
UNION ALL
select (...)
from RightResult r
left outer join LeftResult l
on l.DateId = r.DateId
and l.EntityId = r.EntityId
WHERE
l.dateid is null
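A sketch of the full view rewritten this way, reusing the column expressions from the original CombinedResults definition (whether the optimizer actually seeks will still depend on the predicates the outer query pushes into each branch):

create view CombinedResults
as
select
    DateId = l.DateId,
    EntityId = l.EntityId,
    LeftValue = l.ProcessingValue,
    RightValue = r.ProcessingValue,
    MaxValue = case
                   when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
                   then isnull( l.ProcessingValue, 0 )
                   else isnull( r.ProcessingValue, 0 )
               end
from LeftResult l
left outer join RightResult r
    on l.DateId = r.DateId
    and l.EntityId = r.EntityId
union all
select
    DateId = r.DateId,
    EntityId = r.EntityId,
    LeftValue = l.ProcessingValue,   -- always null in this branch
    RightValue = r.ProcessingValue,
    MaxValue = case
                   when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
                   then isnull( l.ProcessingValue, 0 )
                   else isnull( r.ProcessingValue, 0 )
               end
from RightResult r
left outer join LeftResult l
    on l.DateId = r.DateId
    and l.EntityId = r.EntityId
where l.DateId is null
go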
I have a query involving two tables, Coupons and CouponUsedLog, in SQL Server. The query below gathers some information from these two tables for statistics purposes. While the query works and returns the desired results, I feel it could be written more efficiently. Can someone please advise whether there is a better way to rewrite it? Am I using too many unnecessary variables and joins? Thanks.
DECLARE @CouponISSUED int = null
DECLARE @CouponUSED int = null
DECLARE @CouponAVAILABLE int = null
DECLARE @CouponEXPIRED int = null
DECLARE @CouponLastUsed Date = null
--Total CouponIssued
SET @CouponISSUED =
(
select count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
DeletedBy is null and
DeletedOn is null and
Card_AutoID in (Select AutoID
from Card
where MemberID = 'Mem001')
)
--Total CouponUsed
SET @CouponUSED =
(
select count(*)
from couponusedlog CU Left Join
     Coupon C on CU.Coupon_AutoID = C.autoid
where CU.VoidedBy is null and
CU.VoidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem001')
)
SET @CouponAVAILABLE = @CouponISSUED - @CouponUSED
--Expired Coupons
SET @CouponEXPIRED =
(
select Count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
deletedBy is null and
deletedOn is null and
Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002') and
CONVERT (date, getdate()) > C.expirydate
)
--Last Used On
SET @CouponLastUsed =
(
select CONVERT(varchar(10),
       Max(CU.AddedOn), 103) AS [DD/MM/YYYY]
from couponusedlog CU Left Join
coupon C on CU.Coupon_AutoID = C.autoid
where CU.voidedBy is null and
CU.voidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002')
)
Select @CouponISSUED As Coupon_Issued,
       @CouponUSED As Coupon_Used,
       @CouponAVAILABLE As Coupon_Available,
       @CouponEXPIRED As Coupon_Expired,
       @CouponLastUsed As Last_Coupon_UsedOn
In general, it's better to do things in a single query than in four separate queries if you're just looking for counts of things, particularly against nearly the same data set.
The query below combines what you need into a single statement by converting your WHERE clauses into SUMs of CASE expressions. The MAX of the date can simply be taken alongside the counts and sums.
SELECT COUNT(*) couponissued,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL THEN 1
ELSE 0
END) AS couponused,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL
AND Getdate() > c.expirydate THEN 1
ELSE 0
END) AS couponex,
MAX(cu.addedon) AS lastusedon
FROM [couponusedlog] cu
LEFT JOIN [Coupon] c
ON ( cu.coupon_autoid = c.autoid )
WHERE cu.voidedby IS NULL
AND cu.voidedon IS NULL
AND ( c.card_autoid IN (SELECT [AutoID]
FROM [Card]
WHERE memberid = 'Mem001') )
You can then wrap that in a common table expression to do your subtraction and date formatting.
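For example, a sketch of that wrapper (reusing the aggregate query above; the CTE name and column aliases are assumptions):

;WITH CouponStats AS
(
    SELECT COUNT(*) AS couponissued,
           SUM(CASE WHEN deletedby IS NULL
                     AND deletedon IS NULL THEN 1 ELSE 0 END) AS couponused,
           SUM(CASE WHEN deletedby IS NULL
                     AND deletedon IS NULL
                     AND GETDATE() > c.expirydate THEN 1 ELSE 0 END) AS couponexpired,
           MAX(cu.addedon) AS lastusedon
    FROM [couponusedlog] cu
    LEFT JOIN [Coupon] c ON cu.coupon_autoid = c.autoid
    WHERE cu.voidedby IS NULL
      AND cu.voidedon IS NULL
      AND c.card_autoid IN (SELECT [AutoID] FROM [Card] WHERE memberid = 'Mem001')
)
SELECT couponissued AS Coupon_Issued,
       couponused AS Coupon_Used,
       couponissued - couponused AS Coupon_Available,
       couponexpired AS Coupon_Expired,
       CONVERT(varchar(10), lastusedon, 103) AS Last_Coupon_UsedOn
FROM CouponStats;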
Are you asking this question out of a proactive desire to be as efficient as possible, or because of an actual performance issue you would like to correct? You can make this more efficient at the cost of having code that is harder to maintain. If the performance is okay right now, I would highly recommend you leave it, because the next person to come along will be able to understand it just fine. If you turn it into one huge, efficient but garbled SQL statement, then when you or anyone else wants to update something about it, it will take three times longer as you try to re-figure out what you were thinking when you wrote it.