Update statement taking 10 minutes with large data set

Update statement taking 10 minutes with large data set - sql

I have a SQL stored procedure in which one statement is taking 95% of the total time (10 minutes) to complete. #Records has approximately 133,000 rows and Records has approximately 12,000 rows.
-- Check Category 1 first
UPDATE #Records
SET Id = (SELECT TOP 1 Id FROM Records WHERE Cat1=#Records.Cat1)
WHERE Cat1 IS NOT NULL
I have tried adding a index to Cat1 in #Records, but the statement time did not improve.
CREATE CLUSTERED INDEX IDX_C_Records_Cat1 ON #Records(Cat1)
A similar statement that follows, takes only a fraction of the time
-- Check Category 2
UPDATE #Records
SET Id = (SELECT TOP 1 Id FROM Records WHERE Cat2=#Records.Cat2)
WHERE ID IS NULL
Any ideas on why this is happening or what I can do to make this statement more time effective?
Thanks in advance.
I am running this on Microsoft SQL Server 2005.

update with join maybe
update t
set t.ID = r.ID
FROM (Select Min(ID) as ID,Cat1 From Records group by cat1) r
INNER JOIN #Records t ON r.Cat1 = t.cat1
Where t.cat1 is not null

I would say your problem is probably that you are using a correlated subquery instead of a join. Joins work in sets, correlated subqueries run row-by-agonzing-row and are essentially cursors.

In my experience, when you are trying to update a high number of records, sometimes is faster to use a cursor and iterate throught records rather than use an update query.
Maybe this help in your case.

Related

UPDATE takes long time on SQL Server

I have tables "Products" 300,000 rows, and "Imported_Products" 4,000 rows. Also I have view "View_Imported_Products" which is based on "Imported_Products" to make it well-formed.
When I run UPDATE:
UPDATE Products SET DateDeleted = GETDATE()
WHERE Suppiler = 'Supplier1' AND SKU NOT IN (SELECT SKU FROM View_Imported_Products)
It takes a lot of time about 1 minute even if I run it second time and no rows update.
I have added non-clustered indexes on Products.SKU and View_Imported_Products.SKU, also I changed NOT IN to NOT EXISTS
UPDATE Products SET DateDeleted = GETDATE() FROM Products P
WHERE Supplier = 'Supplier1' AND NOT EXISTS (SELECT SKU FROM View_Imported_Products I WHERE P.SKU=I.SKU)
But it still takes about 16 seconds to run.
What I'm doing wrong, and how to improve that update to run it fast.
Appreciate any help.
Thank you
UPDATED
SELECT SKU FROM View_ImportedProducts - runs very fast, it takes 00:00:00 sec
Changed query to use LEFT JOIN, instead NOT EXISTS - doesn't help much
SELECT * FROM Products AS P
WHERE P.Supplier = 'Supplier1' AND DateDeleted IS NULL
AND
NOT EXISTS
(
SELECT
SKU
FROM View_ImportedProducts AS I
WHERE P.SKU = I.SKU
)
takes also long time to execute

Resolved it by added Non-clustered index to "Imported_Products".SKU field. My mistake was I added non-clustered index on "View_Imported_Products".SKU, instead of original table. Thank you all for help and replies!

If a lot of rows (tens of thousands) are being updated you are creating a big hit on the log. If this is so you want to update either 1000 or 10000 rows at a time and then commit. Your transaction will have a much smaller impact on the transaction log and will execute a lot faster.

Why not using joins
UPDATE Products SET DateDeleted = GETDATE() FROM Products P
Left join View_Imported_Products I On P.SKU=I.SKU
Where I.Sku is null
And you have to create non clustered indexes on p.sku and i.sku

Can this SQL Query be optimized to run faster?

I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT #count = COUNT(Name)
FROM Table1 t
WHERE t.Name = #name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90Million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times, the total time it takes the procedure to finish is 27 minutes.. I believe this is my biggest bottleneck because of the massive amount of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this it would be greatly appreciated! If it cannot be optimized then I guess im stuck with 27 min...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt to do the group by statement as suggested below but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, im not the best at SQL but it is starting to grow on me. ^^)

In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT #Count = COUNT(Name)
FROM dbo.Table1 WHERE IsExcluded = 0;
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(src.*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(src.*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure and whether your violate any of the restrictions but it is worth investigating IMHO.

You say this gets called around 40K times. WHy? Is it in a cursor? If so do you really need a cursor. Couldn't you put the values you want for #name in a temp table and index it and then join to it?
select t.name, count(t.name)
from table t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course if you need the count of all the names, it's even simpleer
select t.name, count(t.name)
from table t
NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t
group by t.name

NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT #count = COUNT(Name)
FROM Table1 t
WHERE t.Name = #name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?

Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, Namecount)
values Name, Count(name)
from table1
where name not in(select code from excludedcodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.

Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.

number of rows in big table

SELECT COUNT(*) FROM BigTable_1
Which way I should to use to get number of rows in table if I have more than 1 billion of rows?
UPDATE: For example, if we have 'a timeout problem' with the query above is there any way to optimize it? How to do it quicker?

If you need an exact count, you have to use COUNT (*)
If you are OK with a rough count, you can use a sum of rows in the partitions
SELECT SUM (Rows)
FROM sys.partitions
WHERE 1=1
And index_id IN (0, 1)
And OBJECT_ID = OBJECT_ID('Database.schema.Table');
If you want to be funny with your COUNT, you can do the following
select COUNT (1/0) from BigTable_1

A very fast ESTIMATE:
select count(*) from table
But don't execute! Highlight the code, press ctl-l to bring up the query plan. Then hover over the leftmost arrow. A yellow box appears with the estimated number of rows.
You can query system tables to get the same data, but that is harder to remember. This way is much more impressive to onlookers.
:)

You can use sys.dm_db_partition_stats.
select sum(row_count)
from sys.dm_db_partition_stats
where object_id = object_id('TableName') and index_id < 2

Depending on your concurrency, speed, and accuracy requirements, you can get an approximate answer with triggers. Create a table
CREATE TABLE TABLE_COUNTS(TABLE_NAME VARCHAR, R_COUNT BIGINT DEFAULT 0);
INSERT INTO TABLE_COUNTS('BigTable_1', 0);
(I'm going to leave out adding a key, etc., for brevity.)
Now set up triggers.
CREATE TRIGGER bt1count_1 AFTER INSERT ON BigTable_1 FOR EACH ROW
BEGIN
UPDATE TABLE_COUNTS SET R_COUNT=R_COUNT+1 WHERE TABLE_NAME='BigTable_1';
END;
A corresponding decrement trigger goes on DELETEs. Now instead of a COUNT, you query the TABLE_COUNT table. Your result will be a little off in the case of pending transactions, but you may be able to live with that. And the cost is amortized over all of the INSERT and DELETE operations; getting the row count when you need it is fast.

Try this:
select sum(P.rows) from sys.partitions P with (nolock)
join sys.tables T with (nolock) on P.object_id = T.object_id
where T.Name = 'Table_1' and index_id = 1
it should be a lot faster. Got it from here: SELECT COUNT(*) FOR BIG TABLE

Your query will get the number of rows regardless of the quantity. Try using the query you listed in your question.

There's only 1 [accurate] way to count the rows in a table: count(*). sp_spaceused or looking at the statistics won't necessarily give you the [a?] correct answer.

if you've got a primary key you should be able to do this:
select count(PrimaryKey) from table_1

Optimising CTE for recursive queries

I have a table with self join. You can think of the structure as standard table to represent organisational hierarchy. Eg table:-
MemberId
MemberName
RelatedMemberId
This table consists of 50000 sample records. I wrote CTE recursive query and it works absolutely fine. However the time it takes to process just 50000 records is round about 3 minutes on my machine (4GB Ram, 2.4 Ghz Core2Duo, 7200 RPM HDD).
How can I possibly improve the performance because 50000 is not so huge number. Over time it will keep on increasing. This is the query which is exactly what I have in my Stored Procedure. The query's purpose is to select all the members that come under a specific member. Eg. Under Owner of the company each and every person comes. For Manager, except Owner all of the records gets returned. I hope you understand the query's purpose.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Alter PROCEDURE spGetNonVirtualizedData
(
#MemberId int
)
AS
BEGIN
With MembersCTE As
(
Select parent.MemberId As MemberId, 0 as Level
From Members as parent Where IsNull(MemberId,0) = IsNull(#MemberId,0)
Union ALL
Select child.MemberId As MemberId , Level + 1 as Level
From Members as child
Inner Join MembersCTE on MembersCTE.MemberId = child.RelatedMemberId
)
Select Members.*
From MembersCTE
Inner Join Members On MembersCTE.MemberId = Members.MemberId
option(maxrecursion 0)
END
GO
As you can see to improve the performance, I have even made the Joins at the last step while selecting records so that all unnecessary records do not get inserted into temp table. If I made joins in my base step and recursive step of CTE (instead of Select at the last step) the query takes 20 minutes to execute!
MemberId is primary key in the table.
Thanks in advance :)

In your anchor condition you have Where IsNull(MemberId,0) = IsNull(#MemberId,0) I assume this is just because when you pass NULL as a parameter = doesn't work in terms of bringing back IS NULL values. This will cause a scan rather than a seek.
Use WHERE MemberId = #MemberId OR (#MemberId IS NULL AND MemberId IS NULL) instead which is sargable.
Also I'm assuming that you can't have an index on RelatedMemberId. If not you should add one
CREATE NONCLUSTERED INDEX ix_name ON Members(RelatedMemberId) INCLUDE (MemberId)
(though you can skip the included column bit if MemberId is the clustered index key as it will be included automatically)

Need to optimize a nested select statement

I've got the following SQL:
SELECT customfieldvalue.ISSUE
FROM customfieldvalue
WHERE customfieldvalue.STRINGVALUE
IN (SELECT customfieldvalue.STRINGVALUE
FROM customfieldvalue
WHERE customfieldvalue.CUSTOMFIELD = "10670"
GROUP BY customfieldvalue.STRINGVALUE
HAVING COUNT(*) > 1);
The inner nested select returns 3265 rows in 1.5secs on MySQL 5.0.77 when run on its own.
The customfieldvalue table contains 2286831 rows.
I want to return all values of the ISSUE column where the STRINGISSUE column value is not exclusive to that row and the CUSTOMFIELD column contains "10670".
When I try and run the query above, MySQL seems to be stuck. I've left it run for up to a minute, but I'm pretty sure the problem is my query.

Try something along these lines:
SELECT cfv1.ISSUE
COUNT(cfv2.STRINGVALUE) as indicator
FROM customfieldvalue cfv1
INNER JOIN customfieldvalue cfv2
ON cfv1.STRINGVALUE = cfv2.STRINGVALUE AND cfv2.CUSTOMFIELD = "10670"
GROUP BY cfv1.ISSUE
HAVING indicator > 1
This probably doesn't work on copy&paste as I haven't verified it, but in MySQL JOINs are often much much faster than subqueries, even orders of magnitude.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas