How can I efficiently do a database massive update?

How can I efficiently do a database massive update? - sql

I have a table with some duplicate entries. I have to discard all but one, and then update this latest one. I've tried with a temporary table and a while statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE #counter int
DECLARE #rowsCount int
SET #counter = 1
SELECT #rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT #rowsCount
WHILE #counter < #rowsCount
BEGIN
SELECT
#Code = tmpCode,
#Alpha3Code = tmpAlpha3Code,
#RelatedYear = tmpRelatedYear,
#OldValue = tmpPreviousValue,
#GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = #counter
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = #Code
AND Alpha3Code = #Alpha3Code
AND RelatedYear = #RelatedYear
AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = #OldValue, GrowthRate = #GrowthRate
WHERE
Code = #Code
AND Alpha3Code = #Alpha3Code
AND RelatedYear = #RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
SET #counter = #counter + 1
END
but it takes too long time, even if there are just 20000 - 30000 rows to process.
Does anyone has some suggestions in order to improve performance?
Thanks in advance!

WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END)
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1

Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates, but not the "original", you need a way of differentiating the duplicate records from each other. (The ROW_NUMBER() part of Quassnoi's answer)
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, there are two choices that come to my mind:
1. Add the identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-deduped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id

I don't understand used syntax perfectly enough to post an exact answer, but here's an approach.
Identify rows you want to preserve (eg. select value, ... from .. where ...)
Do the update logic while identifying (eg. select value + 1 ... from ... where ...)
Do insert select to a new table.
Drop the original, rename new to original, recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert select from the new)
Obviously this has a prety big overhead, but if you want to update/clear millions of rows, it will be the fastest way.

Related

Using ORDER BY command in an UPDATE SQL

I am looking at updating an item number column from a number of rows in a table (which match a particular Product ID & type) but I want to order the rows by numbers in a 'Seqnumber' column then the item number column start at 1 and count each row sequentially by 1.
Using the code below I have been able to select the product id and type I want and update the item number column with sequentially increasing values, however I don't know how I can order the required rows?
DECLARE #id nvarchar(6)
SET #id = 0
UPDATE Table1
SET #id = Table1.ItemNumber = #id + 1
WHERE Product = '5486' AND Table1.Type = 'C'
I know you can't use the ORDER BY command within the UPDATE command without using it as a sub-query. But I'm not sure how I should include this in the above code?
I think I need to incorporate the code below but not sure what the SELECT statement should be and when I try I can't get it to work?
DECALRE #id nvarchar(6)
SET #id = 0
UPDATE Table1
SET #id = TABLE1.ItemNumber = #id + 1
FROM (SELECT TOP (??)* FROM Table1
WHERE Product = '5486' AND Table1.Type ='C'
ORDER BY Table1.SeqNumber ASC

You should be able to build a CTE with the values you want from the table and just update the CTE
declare #Table1 table (Product varchar(4), [Type] varchar(1), SeqNumber int, ItemNumber int)
INSERT INTO #Table1 VALUES
('5486', 'C', 3, 0),
('5486', 'C', 2, 0);
with cte as (
select *,
ROW_NUMBER() OVER (ORDER BY SeqNumber) rn
from #Table1
where Product = '5486' and Type ='C'
)
Update cte
set ItemNumber = rn

This should work:
SET #id = 0;
UPDATE Table1
SET #id = Table1.ItemNumber = (#id := #id + 1)
WHERE Product = 5486 and Table1.Type = 'C'
ORDER BY Table1.seqnumber;
You cannot use ORDER BY with a JOIN, so you need to initialize the variable before the update.

UPDATE sales_data
SET ID = ID + 1
ORDER BY ID DESC;

How to combine the values of the same field from several rows into one string in a one-to-many select?

Imagine the following two tables:
create table MainTable (
MainId integer not null, -- This is the index
Data varchar(100) not null
)
create table OtherTable (
MainId integer not null, -- MainId, Name combined are the index.
Name varchar(100) not null,
Status tinyint not null
)
Now I want to select all the rows from MainTable, while combining all the rows that match each MainId from OtherTable into a single field in the result set.
Imagine the data:
MainTable:
1, 'Hi'
2, 'What'
OtherTable:
1, 'Fish', 1
1, 'Horse', 0
2, 'Fish', 0
I want a result set like this:
MainId, Data, Others
1, 'Hi', 'Fish=1,Horse=0'
2, 'What', 'Fish=0'
What is the most elegant way to do this?
(Don't worry about the comma being in front or at the end of the resulting string.)

There is no really elegant way to do this in Sybase. Here is one method, though:
select
mt.MainId,
mt.Data,
Others = stuff((
max(case when seqnum = 1 then ','+Name+'='+cast(status as varchar(255)) else '' end) +
max(case when seqnum = 2 then ','+Name+'='+cast(status as varchar(255)) else '' end) +
max(case when seqnum = 3 then ','+Name+'='+cast(status as varchar(255)) else '' end)
), 1, 1, '')
from MainTable mt
left outer join
(select
ot.*,
row_number() over (partition by MainId order by status desc) as seqnum
from OtherTable ot
) ot
on mt.MainId = ot.MainId
group by
mt.MainId, md.Data
That is, it enumerates the values in the second table. It then does conditional aggregation to get each value, using the stuff() function to handle the extra comma. The above works for the first three values. If you want more, then you need to add more clauses.

Well, here is how I implemented it in Sybase 13.x. This code has the advantage of not being limited to a number of Names.
create proc
as
declare
#MainId int,
#Name varchar(100),
#Status tinyint
create table #OtherTable (
MainId int not null,
CombStatus varchar(250) not null
)
declare OtherCursor cursor for
select
MainId, Name, Status
from
Others
open OtherCursor
fetch OtherCursor into #MainId, #Name, #Status
while (##sqlstatus = 0) begin -- run until there are no more
if exists (select 1 from #OtherTable where MainId = #MainId) begin
update #OtherTable
set CombStatus = CombStatus + ','+#Name+'='+convert(varchar, Status)
where
MainId = #MainId
end else begin
insert into #OtherTable (MainId, CombStatus)
select
MainId = #MainId,
CombStatus = #Name+'='+convert(varchar, Status)
end
fetch OtherCursor into #MainId, #Name, #Status
end
close OtherCursor
select
mt.MainId,
mt.Data,
ot.CombStatus
from
MainTable mt
left join #OtherTable ot
on mt.MainId = ot.MainId
But it does have the disadvantage of using a cursor and a working table, which can - at least with a lot of data - make the whole process slow.

Update int column in table with unique incrementing values

I am trying to populate any rows missing a value in their InterfaceID (INT) column with a unique value per row.
I'm trying to do this query:
UPDATE prices SET interfaceID = (SELECT ISNULL(MAX(interfaceID),0) + 1 FROM prices)
WHERE interfaceID IS null
I was hoping the the (SELECT ISNULL(MAX(interfaceID),0) + 1 FROM prices) would be evaluated for every row, but its only done once and so all my affected rows are getting the same value instead of different values.
Can this be done in a single query?

declare #i int = (SELECT ISNULL(MAX(interfaceID),0) + 1 FROM prices)
update prices
set interfaceID = #i , #i = #i + 1
where interfaceID is null
should do the work

DECLARE #IncrementValue int
SET #IncrementValue = 0
UPDATE Samples SET qty = #IncrementValue,#IncrementValue=#IncrementValue+1

simple query would be, just set a variable to some number you want.
then update the column you need by incrementing 1 from that number. for all the rows it'll update each row id by incrementing 1
SET #a = 50000835 ;
UPDATE `civicrm_contact` SET external_identifier = #a:=#a+1
WHERE external_identifier IS NULL;

For Postgres
ALTER TABLE table_name ADD field_name serial PRIMARY KEY
REFERENCE: https://www.tutorialspoint.com/postgresql/postgresql_using_autoincrement.htm

Assuming that you have a primary key for this table (you should have), as well as using a CTE or a WITH, it is also possible to use an update with a self-join to the same table:
UPDATE a
SET a.interfaceId = b.sequence
FROM prices a
INNER JOIN
(
SELECT ROW_NUMBER() OVER ( ORDER BY b.priceId ) + ( SELECT MAX( interfaceId ) + 1 FROM prices ) AS sequence, b.priceId
FROM prices b
WHERE b.interfaceId IS NULL
) b ON b.priceId = a.priceId
I have assumed that the primary key is price-id.
The derived table, alias b, is used to generated the sequence via the ROW_NUMBER() function together with the primary key column(s). For each row where the column interface-id is NULL, this will generate a row with a unique sequence value together with the primary key value.
It is possible to order the sequence in some other order rather than the primary key.
The sequence is offset by the current MAX interface-id + 1 via a sub-query. The MAX() function ignores NULL values.
The WHERE clause limits the update to those rows that are NULL.
The derived table is then joined to the same table, alias a, joining on the primary key column(s) with the column to be updated set to the generated sequence.

In oracle-based products you may use the following statement:
update table set interfaceID=RowNum where condition;

Try something like this:
with toupdate as (
select p.*,
(coalesce(max(interfaceid) over (), 0) +
row_number() over (order by (select NULL))
) as newInterfaceId
from prices
)
update p
set interfaceId = newInterfaceId
where interfaceId is NULL
This doesn't quite make them consecutive, but it does assign new higher ids. To make them consecutive, try this:
with toupdate as (
select p.*,
(coalesce(max(interfaceid) over (), 0) +
row_number() over (partition by interfaceId order by (select NULL))
) as newInterfaceId
from prices
)
update p
set interfaceId = newInterfaceId
where interfaceId is NULL

My table has 500 million records. The below code worked for me.
-- update rows using a CTE - Ervin Steckl
;WITH a AS(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rn, id
FROM accounts2
)
UPDATE a SET id=rn
OPTION (MAXDOP 1)
https://www.mssqltips.com/sqlservertip/1467/populate-a-sql-server-column-with-a-sequential-number-not-using-an-identity/

You can try :
DECLARE #counter int
SET #counter = 0
UPDATE [table]
SET [column] = #counter, #counter = #counter + 1```

SQL Server 2000 : generating and incrementing data from column conditionally without using CURSOR

:)
Is there any way to create an index, and incrementing with a given condition, but without CURSOR handling/usage
For example:
The condition in my case is that: "if the current color (this is the item to be checked) is the same as the last one: not increment, otherwise increment in one unit"
This must be in a SQL query with no CURSOR USAGE and of course a good time (work with ... 10000 rows at least)
Thanks in advance.
EDIT: I forgot to mention that NEW_INDEX Column doesn't exist. It must be generated with the with the query.
EDIT2: Is there a way that only make use of SELECT/INSERT/UPDATE statements? (not set, declare...)

Assume a table called Colors with fields ID, Color, and ColorIndex of types int, varchar, and int respectively. I also assume the OP means prev / after based on an ordering of the ID field in asc order.
You could do this without a cursor, and use a while loop...but it definately isn't set based:
DECLARE #MyID int
DECLARE #CurrentIndex int
DECLARE #CurrentColor varchar(50)
DECLARE #PreviousColor varchar(50)
SET #CurrentIndex = (SELECT 0)
SET #MyID = (SELECT TOP 1 ID FROM Colors ORDER BY ID ASC)
SET #CurrentColor = (SELECT '')
SET #PreviousColor = (SELECT Color FROM Colors WHERE ID = #MyID)
WHILE (#MyID IS NOT NULL)
BEGIN
IF (#CurrentColor <> #PreviousColor)
BEGIN
SET #PreviousColor = (SELECT Color FROM Colors WHERE ID = #MyID)
SET #CurrentIndex = (SELECT #CurrentIndex + 1)
UPDATE Colors SET ColorIndex = #CurrentIndex WHERE ID = #MyID
END
ELSE
BEGIN
UPDATE Colors SET ColorIndex = #CurrentIndex WHERE ID = #MyID
SET #PreviousColor = (SELECT Color FROM Colors WHERE ID = #MyID)
END
SET #MyID = (SELECT TOP 1 ID FROM Colors WHERE ID > #MyID ORDER BY ID ASC)
SET #CurrentColor = (SELECT Color FROM Colors WHERE ID = #MyID)
END
The result after execution:
Performance wasn't too shabby as long as ID and color are indexed. The plus side is it is a bit faster then using a regular old CURSOR and it's not as evil. Solution supports SQL 2000, 2005, and 2008 (being that you are using SQL 2000 which did not support CTEs).

declare #ID int,
#MaxID int,
#NewIndex int,
#PrevCol varchar(50)
select #ID = min(ID),
#MaxID = max(ID),
#PrevCol = '',
#NewIndex = 0
from YourTable
while #ID <= #MaxID
begin
select #NewIndex = case when Colour = #PrevCol
then #NewIndex
else #NewIndex + 1
end,
#PrevCol = Colour
from YourTable
where ID = #ID
update YourTable
set NewIndex = #NewIndex
where ID = #ID
set #ID = #ID + 1
end
https://data.stackexchange.com/stackoverflow/q/122958/

select
IDENTITY(int,1,1) as COUNTER
,c1.ID
into
#temp
from
CUSTOMERS c1
left outer join (
select
c1.ID, max(p.ID) as PRV_ID
from
CUSTOMERS c1,
(
select
ID
from
CUSTOMERS
) p
where
c1.ID > p.ID
group by
c1.ID
) k on k.ID = c1.ID
left outer join CUSTOMERS p on p.ID = k.PRV_ID
where
((c1.FAVOURITE_COLOUR < p.FAVOURITE_COLOUR)
or
(c1.FAVOURITE_COLOUR > p.FAVOURITE_COLOUR)
or
p.FAVOURITE_COLOUR is null)
update
CUSTOMERS
set
NEW_INDEX = i.COUNTER
--select *
from
CUSTOMERS
inner join (
select
c1.ID, max(t.COUNTER) as COUNTER
from
CUSTOMERS c1,
(
select
ID
,COUNTER
from
#temp
) t
where
c1.ID >= t.ID
group by
c1.ID
) i on i.ID = CUSTOMERS.ID
drop table #temp
select * from CUSTOMERS

Stored Procedure and output parameter from paging script (SQL Server 2008)

I have the below stored procedure and would like to only have one SQL statement. At the moment you can see there are two statements, one for the actual paging and one for a count of the total records which needs to be return to my app for paging.
However, the below is inefficient as I am getting the total rows from the first query:
COUNT(*) OVER(PARTITION BY 1) as TotalRows
How can I set TotalRows as my output parameter?
ALTER PROCEDURE [dbo].[Nop_LoadAllOptimized]
(
#PageSize int = null,
#PageNumber int = null,
#WarehouseCombinationID int = null,
#CategoryId int = null,
#OrderBy int = null,
#TotalRecords int = null OUTPUT
)
AS
BEGIN
WITH Paging AS (
SELECT rn = (ROW_NUMBER() OVER (
ORDER BY
CASE WHEN #OrderBy = 0 AND #CategoryID IS NOT NULL AND #CategoryID > 0
THEN pcm.DisplayOrder END ASC,
CASE WHEN #OrderBy = 0
THEN p.[Name] END ASC,
CASE WHEN #OrderBy = 5
THEN p.[Name] END ASC,
CASE WHEN #OrderBy = 10
THEN wpv.Price END ASC,
CASE WHEN #OrderBy = 15
THEN wpv.Price END DESC,
CASE WHEN #OrderBy = 20
THEN wpv.Price END DESC,
CASE WHEN #OrderBy = 25
THEN wpv.UnitPrice END ASC
)),COUNT(*) OVER(PARTITION BY 1) as TotalRows, p.*, pcm.DisplayOrder, wpv.Price, wpv.UnitPrice FROM Nop_Product p
INNER JOIN Nop_Product_Category_Mapping pcm ON p.ProductID=pcm.ProductID
INNER JOIN Nop_ProductVariant pv ON p.ProductID = pv.ProductID
INNER JOIN Nop_ProductVariant_Warehouse_Mapping wpv ON pv.ProductVariantID = wpv.ProductVariantID
WHERE pcm.CategoryID = #CategoryId
AND (wpv.Published = 1 AND pv.Published = 1 AND p.Published = 1 AND p.Deleted = 0 AND pv.Deleted = 0 and wpv.Deleted = 0)
AND wpv.WarehouseID IN (select WarehouseID from Nop_WarehouseCombination where UserWarehouseCombinationID = #WarehouseCombinationID)
)
SELECT TOP (#PageSize) * FROM Paging PG
WHERE PG.rn > (#PageNumber * #PageSize) - #PageSize
SELECT #TotalRecords = COUNT(p.ProductId) FROM Nop_Product p
INNER JOIN Nop_Product_Category_Mapping pcm ON p.ProductID=pcm.ProductID
INNER JOIN Nop_ProductVariant pv ON p.ProductID = pv.ProductID
INNER JOIN Nop_ProductVariant_Warehouse_Mapping wpv ON pv.ProductVariantID = wpv.ProductVariantID
WHERE pcm.CategoryID = #CategoryId
AND (wpv.Published = 1 AND pv.Published = 1 AND p.Published = 1 AND p.Deleted = 0 AND pv.Deleted = 0 and wpv.Deleted = 0)
AND wpv.WarehouseID IN (select WarehouseID from Nop_WarehouseCombination where UserWarehouseCombinationID = #WarehouseCombinationID)
END

I think I understand your issue here. Have you considered that the Count could be done BEFORE the CTE
and then passed in as value to the CTE as a variable.
i.e, set the value for #TotalRecords up front, pass it in, and so the CTE will use this count rather than executing the count a second time?
Does this make sense, or have I missed your point here.

no problem friend, highly possible i missed a trick here. However without the schema and data its tricky to test what I am suggesting. In the absence of someone giving a better answer, I've put this test script with data together to demo what I am talking about. If this isn't what you want then no problem. If it is just plain missing the point again, then I'll take that on the chin.
Declare #pagesize as int
Declare #PageNumber as int
Declare #TotalRowsOutputParm as int
SET #pagesize = 3
SET #PageNumber = 2;
--create some test data
DECLARE #SomeData table
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeValue] [nchar](10) NULL
)
INSERT INTO #SomeData VALUES ('TEST1')
INSERT INTO #SomeData VALUES ('TEST2')
INSERT INTO #SomeData VALUES ('TEST3')
INSERT INTO #SomeData VALUES ('TEST4')
INSERT INTO #SomeData VALUES ('TEST5')
INSERT INTO #SomeData VALUES ('TEST6')
INSERT INTO #SomeData VALUES ('TEST7')
INSERT INTO #SomeData VALUES ('TEST8')
INSERT INTO #SomeData VALUES ('TEST9')
INSERT INTO #SomeData VALUES ('TEST10');
--Get total count of all rows
Set #TotalRowsOutputParm = (SELECT COUNT(SomeValue) FROM #SomeData p) ;
WITH Paging AS
(
SELECT rn = (ROW_NUMBER() OVER (ORDER BY SomeValue ASC)),
#TotalRowsOutputParm as TotalRows, p.*
FROM [SomeData] p
)
SELECT TOP (#PageSize) * FROM Paging PG
WHERE PG.rn > (#PageNumber * #PageSize) - #PageSize
PRINT #TotalRowsOutputParm

I don't think you can do it without running the query twice if you want to assign it to a variable
however, can't you just add another column and do something like this instead?
;WITH Paging AS (select *,ROW_NUMBER() OVER(ORDER BY name) AS rn FROM sysobjects)
SELECT (SELECT MAX(rn) FROM Paging) AS TotalRecords,* FROM Paging
WHERE rn < 10
Or in your case
SELECT TOP (#PageSize) *,(SELECT MAX(PG.rn) FROM Paging) AS TotalRecords
FROM Paging PG
WHERE PG.rn > (#PageNumber * #PageSize) - #PageSize
Then from the front end grab that column

In the end I decided just to use two different SQL statements, one for count, one for select.
The "COUNT(*) OVER(PARTITION BY 1) as TotalRows" actually was pretty expensive and it turned out much quicker to just use two different statements.
Thank you everyone who helped with this question.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How can I efficiently do a database massive update? - sql

WITH q AS ( SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) FROM MCS_ImportedData_GenericData m WHERE PreviousValue <> 'INDEFINITO' ) DELETE FROM q WHERE rn > 1

Related

Using ORDER BY command in an UPDATE SQL

How to combine the values of the same field from several rows into one string in a one-to-many select?

Update int column in table with unique incrementing values

SQL Server 2000 : generating and incrementing data from column conditionally without using CURSOR

Stored Procedure and output parameter from paging script (SQL Server 2008)

Categories

Resources