Update int column in table with incrementing value - sql

Hi, how can I update a column with incrementing values, starting from a certain value?
For example, if I have a table store with the sample data below:
ProID ProName
1 Pro1
2 Pro2
3 Pro3
etc ..
How can I update ProID so that it starts at a given value, for example 10, and increments for the remaining rows, so that the result is:
ProID ProName
10 Pro1
11 Pro2
12 Pro3
etc ..

I am going to provide a more generic answer to this question. And, I'm going to assume that ProId has a unique index. So, the obvious solution:
update store
set ProID = ProID + 9;
is not guaranteed to work. It might generate duplicates (if there is already an id = 10). And it won't fill in gaps.
Unfortunately, I think you need to do this in two steps (when there is a unique index). The problem is duplicates as you are updating the table. If this works, then great:
with toupdate as (
select s.*, 9 + row_number() over (order by ProId) as new_ProId
from store s
)
update toupdate
set ProId = new_ProId;
However, you might need to do this:
with toupdate as (
select s.*, 9 + row_number() over (order by ProId) as new_ProId
from store s
)
update toupdate
set ProId = - new_ProId; -- ensure no duplicates by using a negative sign
update store
set ProId = - ProId; -- get rid of the negative sign
Having said all that, updating the primary key of a table is almost never the right thing to do. Gaps in the value are generally not a problem. You can use row_number() when you query the table to remove the gaps, if that is necessary for some reason.
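As a rough sketch of that last suggestion (assuming SQL Server or another database with window functions, and using DisplayID as a made-up column alias), the renumbering can be computed at query time instead of being stored:

```sql
-- Leave the stored ProID values untouched and compute a gap-free number on read.
-- DisplayID is a hypothetical alias; 9 is the offset so that numbering starts at 10.
select 9 + row_number() over (order by ProID) as DisplayID,
       ProID,
       ProName
from store
order by ProID;
```

This keeps the primary key stable, so foreign keys and any external references to ProID are unaffected.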

with cte as
( select ProID, row_number() over (order by ProID) as rn
from store
)
update cte set ProID = rn + 9

Related

SQL Deduplication, populating the duplicates with its unique identifier

I want to be able to populate any duplicate items in a table with its unique identifier. So for example, in the table below:
PROJ00002492 should get a GlobalFamilyDupID of PROJ00002492 (itself, i.e. the ControlNumber column), and all of its duplicates should get the same value, PROJ00002492.
PROJ00005876 should get the value PROJ00005876 (itself, i.e. the ControlNumber column).
Code:
update mstr
SET
IsGlobalFamilyUnique = case when (rn > 1) then 0 else 1 end
from(
select
ControlNumber,
MD5hash,
IsglobalFamilyUnique,
GlobalFamilyDupID,
row_number() over (partition by [MD5Hash] order by ID asc) [RN]
from dbo.tblMaster
where NuixGuid = TopLvlGuid and IsGlobalFamilyUnique is null
)mstr
The above code works, but I can't think how to populate the GlobalFamilyDupID column. Will I have to do it in a separate query?
You can add the value you need to your SELECT using
FIRST_VALUE(ControlNumber) over (partition by [MD5Hash] order by ID)
Then use that value in your UPDATE.
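Put together, the suggestion might look like the sketch below. It assumes SQL Server 2012 or later (where FIRST_VALUE is available) and reuses the table and column names from the question; FirstControlNumber is a made-up alias.

```sql
-- Sketch only: one UPDATE sets both the uniqueness flag and the duplicate ID.
-- FirstControlNumber is a hypothetical alias for the first ControlNumber
-- (lowest ID) within each MD5Hash group.
update mstr
set IsGlobalFamilyUnique = case when rn > 1 then 0 else 1 end,
    GlobalFamilyDupID    = FirstControlNumber
from (
    select IsGlobalFamilyUnique,
           GlobalFamilyDupID,
           row_number() over (partition by MD5Hash order by ID asc) as rn,
           first_value(ControlNumber)
               over (partition by MD5Hash order by ID asc) as FirstControlNumber
    from dbo.tblMaster
    where NuixGuid = TopLvlGuid
      and IsGlobalFamilyUnique is null
) mstr;
```

Because both window functions use the same partition and ordering, the row with rn = 1 is also the row whose ControlNumber is copied to the rest of its group.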

Remove duplicate row based on select statement

I have two select statements that return duplicated data. What I'm trying to accomplish is to remove a duplicated leg, but I'm having a hard time getting to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see, it returns duplicate rows where the minutes and the ABID are the same but the InvID values are different. What I need to do is remove one of the InvIDs where the criteria match. It doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this select statement I want to remove an InvID where the UID is the same, then select the lowest Minutes. In this case, I would remove the following InvIDs: 2537676, 2537210
My goal is to remove those rows...
I could accomplish this by using a cursor to grab each InvID and remove it with a simple delete statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use exists to delete all duplicates except the one with the highest InvID, by deleting every row for which another row exists with the same values but a higher InvID:
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
I have faced similar problems with duplicate data. Someone told me to use partition by and other methods, but those caused performance issues.
However, I had a primary key in my table, which let me select one row from each set of duplicates and then delete it.
For example, in the first select statement, "minutes" and "ABID" are the criteria that define a duplicate, but "Invid" can be used to distinguish between the duplicate rows.
So you can use the query below to remove the duplicates.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique.
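Note that the max()-based delete above removes only one row per duplicate group each time it runs, so groups with three or more copies need repeated runs. A possible single-pass variant, assuming inv_id is unique and never NULL (the table and column names are taken from the answer above), keeps the lowest inv_id in each group and deletes the rest:

```sql
-- Sketch only: keep the row with the smallest inv_id per (minutes, abid) group
-- and delete every other row. Assumes inv_id is unique and NOT NULL.
delete from inv_i
where inv_id not in (
    select min(inv_id)
    from inv_i
    group by minutes, abid
);
```

Rows that have no duplicates are their own group minimum, so they are left untouched.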
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY InvID, [UID] ORDER BY [Minutes] ASC) rn
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
Replace the ORIGINAL_TABLE with your table name.
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;

SQL UPDATE row Number

I have a table serviceClusters with a column identity(1590 values). Then I have another table serviceClustersNew with the columns ID, text and comment. In this table, I have some values for text and comment, the ID is always 1. Here an example for the table:
[1, dummy1, hello1;
1, dummy2, hello2;
1, dummy3, hello3;
etc.]
What I want now for the values in the column ID is the continuing index of the table serviceClusters plus the current row number. In our case, this would be 1591, 1592 and 1593.
I tried to solve the problem like this: first I updated the column ID with the maximum value, then I tried to add the row number, but this doesn't work:
-- Update ID to the maximum value 1590
UPDATE serviceClustersNew
SET ID = (SELECT MAX(ID) FROM serviceClusters);
-- This command returns the correct values 1591, 1592 and 1593
SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber
FROM serviceClustersNew
-- But I'm not able to update the table with this command
UPDATE serviceClustersNew
SET ID = (SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber FROM
serviceClustersNew)
When I run the last command, I get the error "Syntax error: Ordered Analytical Functions are not allowed in subqueries.". Do you have any suggestions for how I could solve the problem? I know I could do it with a volatile table or by adding a column, but is there a way without creating a new table or altering the current table?
You have to rewrite it using UPDATE FROM, the syntax is just a bit bulky:
UPDATE serviceClustersNew
FROM
(
SELECT text_id,
(SELECT MAX(ID) FROM serviceClusters) +
ROW_NUMBER() OVER (ORDER BY Text_ID) AS newID
FROM serviceClustersNew
) AS src
SET ID = newID
WHERE serviceClustersNew.Text_ID = src.Text_ID
You are not dealing with a lot of data, so a correlated subquery can serve the same purpose:
UPDATE serviceClustersNew
SET ID = (select max(ID) from serviceClusters) +
(select count(*)
from serviceClustersNew scn2
where scn2.Text_Id <= serviceClustersNew.Text_Id
)
This assumes that Text_Id is unique across the rows.
Apparently you can update a base table through a CTE... had no idea. So, just change your last UPDATE statement to this, and you should be good. Just be sure to include any fields in the CTE that you desire to update.
;WITH cte_TEST AS
( SELECT
ID,
ID+ROW_NUMBER() OVER (ORDER BY TEXT_ID) AS RowNumber FROM serviceClustersNew)
UPDATE cte_TEST
SET cte_TEST.ID = cte_TEST.RowNumber
Source:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/ee06f451-c418-4bca-8288-010410e8cf14/update-table-using-rownumber-over

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem, and how can I avoid it?
Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but it is more complex than I'd expect for this task.
You can update some views, but there are restrictions, and one is that the view must not contain analytic functions. See the SQL Language Reference on UPDATE and search for the first occurrence of "analytic".
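Another option, which is a different technique from the nested subquery above, is MERGE. The sketch below assumes Oracle 10g or later, where MERGE can be used with only a WHEN MATCHED clause; the voyage_port_id tie-breaker in the ordering is my own addition, not part of the question.

```sql
-- Sketch only: compute the sequence once in the USING subquery and join it
-- back to the base table on the primary key.
-- The extra "voyage_port_id" in the ORDER BY is an assumed tie-breaker for
-- ports that share the same arrival_date within a voyage.
merge into voyage_port vp
using (
    select voyage_port_id,
           row_number() over (partition by voyage_id
                              order by arrival_date, voyage_port_id) as new_seq
    from voyage_port
) t
on (vp.voyage_port_id = t.voyage_port_id)
when matched then
    update set vp.port_seq = t.new_seq;
```

The analytic function runs once over the whole table here, and port_seq is not part of the ON clause, so it can be updated.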
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and ( vp2.arrival_date < vp.arrival_date
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
)
Don't think you can update a derived table, I'd rewrite as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id
The first token after the UPDATE should be the name of the table to update, followed by your columns to update. I'm not sure what you are trying to achieve with the select statement where it is, but you can't legally update the result set of the select.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this, you must make sure the select returns only one row!
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above

Get last item in a table - SQL

I have a History Table in SQL Server that basically tracks an item through a process. The item has some fixed fields that don't change throughout the process, but has a few other fields including status and Id which increment as the steps of the process increase.
Basically I want to retrieve the last step for each item given a Batch Reference. So if I do a
Select * from HistoryTable where BatchRef = @BatchRef
It will return all the steps for all the items in the batch - eg
Id Status BatchRef ItemCount
1 1 Batch001 100
1 2 Batch001 110
2 1 Batch001 60
2 2 Batch001 100
But what I really want is:
Id Status BatchRef ItemCount
1 2 Batch001 110
2 2 Batch001 100
Edit: Apologies - can't seem to get the TABLE tags to work with Markdown - followed the help to the letter, and it looks fine in the preview
Assuming you have an identity column in the table...
select
top 1 <fields>
from
HistoryTable
where
BatchRef = @BatchRef
order by
<IdentityColumn> DESC
It's kind of hard to make sense of your table design - I think SO ate your delimiters.
The basic way of handling this is to GROUP BY your fixed fields, and select a MAX (or MIN) for some unique value (a datetime usually works well). In your case, I think that the GROUP BY would be BatchRef and ItemCount, and Id will be your unique column.
Then, join back to the table to get all columns. Something like:
SELECT *
FROM HistoryTable
JOIN (
SELECT
MAX(Id) as Id,
BatchRef,
ItemCount
FROM HistoryTable
WHERE
BatchRef = @BatchRef
GROUP BY
BatchRef,
ItemCount
) as Latest ON
HistoryTable.Id = Latest.Id
Assuming the Item Ids are incrementally numbered:
--Declare a table variable to hold the last step for each item id
DECLARE @LastStepForEach TABLE (
Id int,
Status int,
BatchRef char(10),
ItemCount int)
--Loop counter
DECLARE @count INT;
SET @count = 0;
--Loop through all of the items
WHILE (@count < (SELECT MAX(Id) FROM HistoryTable WHERE BatchRef = @BatchRef))
BEGIN
SET @count = @count + 1;
INSERT INTO @LastStepForEach (Id, Status, BatchRef, ItemCount)
SELECT Id, Status, BatchRef, ItemCount
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
AND Status =
(
SELECT MAX(Status)
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
)
END
SELECT *
FROM @LastStepForEach
SELECT id, status, BatchRef, MAX(itemcount) AS maxItemcount
FROM HistoryTable GROUP BY id, status, BatchRef
HAVING status > 1
It's a bit hard to decipher your data the way WMD has formatted it, but you can pull off the sort of trick you need with common table expressions on SQL 2005:
with LastBatches as (
select BatchRef, max(Id) as Id
from HistoryTable
group by BatchRef
)
select *
from HistoryTable h
join LastBatches b on b.BatchRef = h.BatchRef and b.Id = h.Id
Or a subquery (assuming the group by in the subquery works - off the top of my head I don't recall):
select *
from HistoryTable h
join (
select BatchRef, max(Id) as Id
from HistoryTable
group by BatchRef
) b on b.BatchRef = h.BatchRef and b.Id = h.Id
Edit: I was assuming you wanted the last item for every batch. If you just need it for the one batch then the other answers (doing a top 1 and ordering descending) are the way to go.
As already suggested you probably want to reorder your query to sort it in the other direction so you actually fetch the first row. Then you'd probably want to use something like
SELECT TOP 1 ...
if you're using MSSQL 2k or earlier, or the SQL compliant variant
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber = n
for any other version (or for other database systems that support the standard notation), or
SELECT ... LIMIT 1 OFFSET 0
for some other variants without the standard SQL support.
See also this question for some additional discussion around selecting rows. Using the aggregate function max() might or might not be faster depending on whether calculating the value requires a table scan.
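Applied to the sample data in the question, the ROW_NUMBER approach could be sketched as follows. This assumes SQL Server 2005 or later and that "last step" means the highest Status for each Id within the batch; Ranked and rn are made-up names.

```sql
-- Sketch only: number the steps of each item from last to first,
-- then keep the first row of each item.
declare @BatchRef varchar(10);
set @BatchRef = 'Batch001';

with Ranked as (
    select Id, Status, BatchRef, ItemCount,
           row_number() over (partition by Id
                              order by Status desc) as rn
    from HistoryTable
    where BatchRef = @BatchRef
)
select Id, Status, BatchRef, ItemCount
from Ranked
where rn = 1
order by Id;
```

Partitioning by Id rather than grouping by ItemCount matters here, because ItemCount changes between steps of the same item.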