SQL UPDATE row Number - sql

I have a table serviceClusters with an identity column ID (1590 values). Then I have another table, serviceClustersNew, with the columns ID, text and comment. In this table, I have some values for text and comment; the ID is always 1. Here is an example of the table:
[1, dummy1, hello1;
1, dummy2, hello2;
1, dummy3, hello3;
etc.]
What I want now for the values in the column ID is the continuing index of the table serviceClusters plus the current row number. In our case, this would be 1591, 1592 and 1593.
I tried to solve the problem like this: first I updated the column ID with the maximum value, then I tried to add the row number, but this doesn't work:
-- Update ID to the maximum value 1590
UPDATE serviceClustersNew
SET ID = (SELECT MAX(ID) FROM serviceClusters);
-- This command returns the correct values 1591, 1592 and 1593
SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber
FROM serviceClustersNew
-- But I'm not able to update the table with this command
UPDATE serviceClustersNew
SET ID = (SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber FROM
serviceClustersNew)
When I run the last command, I get the error "Syntax error: Ordered Analytical Functions are not allowed in subqueries.". Do you have any suggestions for how I could solve the problem? I know I could do it with a volatile table or by adding a column, but is there a way without creating a new table or altering the current table?

You have to rewrite it using UPDATE FROM, the syntax is just a bit bulky:
UPDATE serviceClustersNew
FROM
(
SELECT text_id,
(SELECT MAX(ID) FROM serviceClusters) +
ROW_NUMBER() OVER (ORDER BY Text_ID) AS newID
FROM serviceClustersNew
) AS src
SET ID = newID
WHERE serviceClustersNew.Text_ID = src.Text_ID

You are not dealing with a lot of data, so a correlated subquery can serve the same purpose:
UPDATE serviceClustersNew
SET ID = (select max(ID) from serviceClusters) +
(select count(*)
from serviceClustersNew scn2
where scn2.Text_Id <= serviceClustersNew.Text_Id
)
This assumes that the text_id is unique along the rows.
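To make the correlated-subquery idea concrete, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the asker's database, and the table layout (a Text_ID column holding the dummy1..dummy3 values) is an assumption based on the queries in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed layout: serviceClusters holds IDs 1..1590, serviceClustersNew has ID = 1 everywhere.
cur.execute("CREATE TABLE serviceClusters (ID INTEGER)")
cur.execute("CREATE TABLE serviceClustersNew (ID INTEGER, Text_ID TEXT, Comment TEXT)")
cur.executemany("INSERT INTO serviceClusters VALUES (?)", [(i,) for i in range(1, 1591)])
cur.executemany(
    "INSERT INTO serviceClustersNew VALUES (1, ?, ?)",
    [("dummy1", "hello1"), ("dummy2", "hello2"), ("dummy3", "hello3")],
)

# The count of rows with Text_ID <= the current Text_ID plays the role of ROW_NUMBER(),
# which only works if Text_ID is unique.
cur.execute("""
    UPDATE serviceClustersNew
    SET ID = (SELECT MAX(ID) FROM serviceClusters)
           + (SELECT COUNT(*)
              FROM serviceClustersNew AS scn2
              WHERE scn2.Text_ID <= serviceClustersNew.Text_ID)
""")

new_ids = [row[0] for row in cur.execute("SELECT ID FROM serviceClustersNew ORDER BY Text_ID")]
print(new_ids)
```

The subqueries only read columns that the UPDATE does not modify (Text_ID, and ID of the other table), so the result does not depend on the order in which rows are updated.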

Apparently you can update a base table through a CTE... had no idea. So, just change your last UPDATE statement to this, and you should be good. Just be sure to include any fields in the CTE that you desire to update.
;WITH cte_TEST AS
( SELECT
ID,
ID+ROW_NUMBER() OVER (ORDER BY TEXT_ID) AS RowNumber FROM serviceClustersNew)
UPDATE cte_TEST
SET cte_TEST.ID = cte_TEST.RowNumber
Source:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/ee06f451-c418-4bca-8288-010410e8cf14/update-table-using-rownumber-over

Related

Big query De-duplication query is not working properly

Can anyone tell me why the query below is not working properly? It is supposed to delete only the duplicate records and keep one of them (the latest record), but it deletes all the records instead of keeping one of the duplicates. Why is that?
delete
from
dev_rahul.page_content_insights
where
(sha_id,
etl_start_utc_dttm) in (
select
(a.sha_id,
a.etl_start_utc_dttm)
from
(
select
sha_id,
etl_start_utc_dttm,
ROW_NUMBER() over (Partition by sha_id
order by
etl_start_utc_dttm desc) as rn
from
dev_rahul.page_content_insights
where
(snapshot_dt) >= '2021-03-25' ) a
where
a.rn <> 1)
Query looks ok, though I don't use that syntax for cleaning up duplicates.
Can I confirm the following:
sha_id, etl_start_utc_dttm is your primary key?
You wish to keep sha_id and the latest row based on etl_start_utc_dttm field descending?
If so, try this two query pattern:
create or replace table dev_rahul.rows_not_to_delete as
SELECT col.* FROM (
SELECT ARRAY_AGG(pci ORDER BY etl_start_utc_dttm desc LIMIT 1)[OFFSET(0)] col
FROM dev_rahul.page_content_insights pci
where snapshot_dt >= '2021-03-25'
GROUP BY sha_id
);
delete dev_rahul.page_content_insights p
where not exists (select 1 from dev_rahul.rows_not_to_delete d
where p.sha_id = d.sha_id and p.etl_start_utc_dttm = d.etl_start_utc_dttm
) and snapshot_dt >= '2021-03-25';
You could do this in a single query by putting the first statement into a CTE.
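The two-query pattern can be sketched in a runnable form with Python's sqlite3 module. This is not BigQuery: SQLite has no ARRAY_AGG with LIMIT, so the sketch swaps in MAX(etl_start_utc_dttm) per sha_id to pick the row to keep, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE page_content_insights "
    "(sha_id TEXT, etl_start_utc_dttm TEXT, snapshot_dt TEXT)"
)
# Hypothetical sample data: three versions of 'a', one 'b', one old 'c'.
cur.executemany(
    "INSERT INTO page_content_insights VALUES (?, ?, ?)",
    [
        ("a", "2021-03-25 10:00:00", "2021-03-25"),
        ("a", "2021-03-26 10:00:00", "2021-03-26"),
        ("a", "2021-03-27 10:00:00", "2021-03-27"),
        ("b", "2021-03-25 09:00:00", "2021-03-25"),
        ("c", "2021-03-01 08:00:00", "2021-03-01"),  # outside the date filter
    ],
)

# Step 1: remember the latest row per sha_id inside the date window.
cur.execute("""
    CREATE TEMP TABLE rows_not_to_delete AS
    SELECT sha_id, MAX(etl_start_utc_dttm) AS etl_start_utc_dttm
    FROM page_content_insights
    WHERE snapshot_dt >= '2021-03-25'
    GROUP BY sha_id
""")

# Step 2: delete every row in the window that is not in the keep list.
cur.execute("""
    DELETE FROM page_content_insights
    WHERE snapshot_dt >= '2021-03-25'
      AND NOT EXISTS (
          SELECT 1
          FROM rows_not_to_delete AS d
          WHERE d.sha_id = page_content_insights.sha_id
            AND d.etl_start_utc_dttm = page_content_insights.etl_start_utc_dttm)
""")

remaining = cur.execute(
    "SELECT sha_id, etl_start_utc_dttm FROM page_content_insights ORDER BY sha_id"
).fetchall()
print(remaining)
```

Note that if two duplicates share exactly the same etl_start_utc_dttm, both match the keep list and both survive, so this variant relies on the timestamp being unique per sha_id.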

SQL Deduplication, populating the duplicates with its unique identifier

I want to be able to populate any duplicate items in a table with the unique identifier of the original. So, for example, in the table below:
PROJ00002492 should get a GlobalFamilyDupID of PROJ00002492 (itself, i.e. the ControlNumber column), and all of its duplicates should get the same value, PROJ00002492.
PROJ00005876 should get the value PROJ00005876 (itself, i.e. the ControlNumber column).
Code:
update mstr
SET
IsGlobalFamilyUnique = case when (rn > 1) then 0 else 1 end
from(
select
ControlNumber,
MD5hash,
IsglobalFamilyUnique,
GlobalFamilyDupID,
row_number() over (partition by [MD5Hash] order by ID asc) [RN]
from dbo.tblMaster
where NuixGuid = TopLvlGuid and IsGlobalFamilyUnique is null
)mstr
The above code works, but I can't think how to populate the GlobalFamilyDupID column. Will I have to do it in a separate query?
You can add the value you need into your SELECT using
FIRST_VALUE(ControlNumber) over (partition by [MD5Hash] order by ID)
Then use the value in your UPDATE
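As a rough illustration of how FIRST_VALUE and ROW_NUMBER work together here, the sketch below uses Python's sqlite3 module rather than SQL Server. The window functions are computed in a SELECT and then written back row by row, because SQLite cannot update through a derived table. The control numbers PROJ00003001 and PROJ00004100 and the hash values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE tblMaster (
        ID INTEGER PRIMARY KEY,
        ControlNumber TEXT,
        MD5Hash TEXT,
        IsGlobalFamilyUnique INTEGER,
        GlobalFamilyDupID TEXT)
""")
cur.executemany(
    "INSERT INTO tblMaster (ID, ControlNumber, MD5Hash) VALUES (?, ?, ?)",
    [
        (1, "PROJ00002492", "hashA"),
        (2, "PROJ00003001", "hashA"),  # duplicate of PROJ00002492
        (3, "PROJ00005876", "hashB"),
        (4, "PROJ00004100", "hashA"),  # duplicate of PROJ00002492
    ],
)

# Per hash: rn = 1 marks the original, dup_id is the ControlNumber of that original.
ranked = cur.execute("""
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY MD5Hash ORDER BY ID) AS rn,
           FIRST_VALUE(ControlNumber) OVER (PARTITION BY MD5Hash ORDER BY ID) AS dup_id
    FROM tblMaster
""").fetchall()

# Write both computed values back in a single pass.
cur.executemany(
    "UPDATE tblMaster SET IsGlobalFamilyUnique = ?, GlobalFamilyDupID = ? WHERE ID = ?",
    [(1 if rn == 1 else 0, dup_id, row_id) for row_id, rn, dup_id in ranked],
)

result = cur.execute(
    "SELECT ControlNumber, IsGlobalFamilyUnique, GlobalFamilyDupID FROM tblMaster ORDER BY ID"
).fetchall()
print(result)
```

In SQL Server the same two expressions can sit side by side in the derived table of the original UPDATE, so no separate query should be needed.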

Update int column in table with incrementing value

Hi, how can I update a column with incrementing values, starting from a certain value?
For example, if I have a table store with the sample data below:
ProID ProName
1 Pro1
2 Pro2
3 Pro3
etc ..
how can I update ProID so that it starts at a given value, for example 10, and increments for the remaining rows, so that it becomes:
ProID ProName
10 Pro1
11 Pro2
12 Pro3
etc ..
I am going to provide a more generic answer to this question. And, I'm going to assume that ProId has a unique index. So, the obvious solution:
update store
set ProID = ProID + 9;
is not guaranteed to work. It might generate duplicates (if there is already an id = 10). And it won't fill in gaps.
Unfortunately, I think you need to do this in two steps (when there is a unique index). The problem is duplicates as you are updating the table. If this works, then great:
with toupdate as (
select s.*, 9 + row_number() over (order by ProId) as new_ProId
from store s
)
update toupdate
set ProId = new_ProId;
However, you might need to do this:
with toupdate as (
select s.*, 9 + row_number() over (order by ProId) as new_ProId
from store s
)
update toupdate
set ProId = - new_ProId; -- ensure no duplicates by using a negative sign
update store
set ProId = - ProId; -- get rid of the negative sign
Having said all that, updating the primary key of a table is almost never the right thing to do. Gaps in the value are generally not a problem. You can use row_number() when you query the table to remove the gaps, if that is necessary for some reason.
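The negative-sign trick can be tried out with a small sketch in Python's sqlite3 module. SQLite cannot update through a CTE, so this version stores the old-to-new mapping in a temporary table first; the id_map table and its column names are made up for the example, and the UNIQUE constraint stands in for the unique index assumed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE store (ProID INTEGER NOT NULL UNIQUE, ProName TEXT)")
# IDs 1..12, so the new range 10..21 overlaps the existing values 10..12.
cur.executemany("INSERT INTO store VALUES (?, ?)", [(i, f"Pro{i}") for i in range(1, 13)])

# Hypothetical helper table: old ID -> new ID starting at 10.
cur.execute("""
    CREATE TEMP TABLE id_map AS
    SELECT ProID AS old_id, 9 + ROW_NUMBER() OVER (ORDER BY ProID) AS new_id
    FROM store
""")

# Step 1: write the new IDs as negative numbers, which cannot collide with existing positive IDs.
cur.execute("UPDATE store SET ProID = -(SELECT new_id FROM id_map WHERE old_id = store.ProID)")

# Step 2: all rows are negative now, so flipping the sign cannot hit a duplicate either.
cur.execute("UPDATE store SET ProID = -ProID")

final = cur.execute("SELECT ProID, ProName FROM store ORDER BY ProID").fetchall()
print(final[:3])
```

The two steps only matter when the new range overlaps the old one and the engine checks uniqueness row by row; otherwise a single update is enough.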
with cte as
( select ProID, row_number() over (order by ProID) as rn
from store
)
update cte set ProID = rn + 9

SQL Server CTE - SELECT after UPDATE using OUTPUT INSERTED

I've seen a few posts on using CTE (WITH) that I thought would address my issue but I can't seem to make it work for my specific use case. My use case is that I have a table with a series of records, and I need to pull some number of records AFTER a small update has been made to them.
i.e.
- retrieve records where a series of conditions are met
- update one or more columns in each of those records
- return the updated records
I know I can return the IDs of the records using the following:
WITH cte AS
( SELECT TOP 1 * FROM msg
WHERE guid = 'abcd'
AND active = 1
ORDER BY created DESC )
UPDATE cte SET active = 0
OUTPUT INSERTED.msg_id
WHERE guid = 'abcd'
That nicely returns the msg_id field. I tried wrapping all of that in a SELECT * FROM msg WHERE msg_id IN () query, but it fails.
Anyone have a suggestion? For reference, using SQL Server 2008 R2.
CREATE TABLE #t (msg_id int)
;
WITH cte AS
( SELECT TOP 1 * FROM msg
WHERE guid = 'abcd'
AND active = 1
ORDER BY created DESC )
UPDATE cte SET active = 0
OUTPUT INSERTED.msg_id INTO #t
WHERE guid = 'abcd'
SELECT *
FROM #t
You can select the data that you need by just adding all columns that you want. INSERTED contains all columns, not just the ones written to. You can also output columns from the cte alias. Example:
OUTPUT INSERTED.SomeOtherColumn, cte.SomeOtherColumn

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem and how can I avoid it ?
Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but it is more complex than I'd expect for this task.
You can update some views, but there are restrictions, and one is that the view must not contain analytic functions. See the SQL Language Reference on UPDATE and search for the first occurrence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and ( vp2.arrival_date < vp.arrival_date
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
)
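Here is a small runnable sketch of the count-based sequence with a tie-break on voyage_port_id, using Python's sqlite3 module instead of Oracle. The sample voyages and dates are made up; rows 2 and 3 arrive on the same day to exercise the tie-break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE voyage_port (
        voyage_port_id INTEGER PRIMARY KEY,
        voyage_id INTEGER,
        arrival_date TEXT,
        port_seq INTEGER)
""")
cur.executemany(
    "INSERT INTO voyage_port (voyage_port_id, voyage_id, arrival_date) VALUES (?, ?, ?)",
    [
        (1, 100, "2020-01-03"),
        (2, 100, "2020-01-01"),
        (3, 100, "2020-01-01"),  # same day as row 2, ordered after it by voyage_port_id
        (4, 200, "2020-02-01"),
    ],
)

# port_seq = number of rows in the same voyage that sort at or before the current row,
# ordering by arrival_date and then by voyage_port_id.
cur.execute("""
    UPDATE voyage_port
    SET port_seq = (
        SELECT COUNT(*)
        FROM voyage_port AS vp2
        WHERE vp2.voyage_id = voyage_port.voyage_id
          AND (vp2.arrival_date < voyage_port.arrival_date
               OR (vp2.arrival_date = voyage_port.arrival_date
                   AND vp2.voyage_port_id <= voyage_port.voyage_port_id)))
""")

seqs = cur.execute(
    "SELECT voyage_port_id, port_seq FROM voyage_port ORDER BY voyage_port_id"
).fetchall()
print(seqs)
```

The strict `<` on the date in the first branch is what keeps same-day rows from all receiving the same sequence number.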
Don't think you can update a derived table, I'd rewrite as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id
The first token after UPDATE should be the name of the table to update, followed by your columns to update. I'm not sure what you are trying to achieve with the select statement where it is, but you can't legally update the result set of that select.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this you must make sure only 1 row will be returned from the select !
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above