Assign an explicit version to existing rows of a table - SQL

I have a table where records are inserted and updated. On update, instead of modifying the row in place, a new row is inserted into the table. In order to track the updates for a given record, the table has a column called root_record_id which holds the id of the very first record in the update chain.
For example, consider the record table with the following rows:

id   root_record_id   other columns
1    1                ...
2    2                ...
3    1                ...
4    1                ...
5    2                ...
In this case, a record with id=1 was inserted, then updated to id=3 and then to id=4. Similarly, the record with id=2 was inserted and then updated to id=5.
I want to add a version column to this table, where version is incremented on each update and starts with 0.
id   root_record_id   version   other columns
1    1                0         ...
2    2                0         ...
3    1                1         ...
4    1                2         ...
5    2                1         ...
I tried writing queries using a GROUP BY clause on root_record_id but could not get the desired result.

If you are looking for the general sequence of how to add the column and then pre-fill the values, follow this fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=5a04b49fbda3883a9605f5482e252a1b
Add the version column allowing nulls:
ALTER TABLE Records ADD version int null;
Update the version according to your logic:
UPDATE Records
SET version = lkp.version
FROM Records r
INNER JOIN (
    SELECT Id,
           COUNT(root_record_id) OVER (PARTITION BY root_record_id ORDER BY id ASC) - 1 AS version
    FROM Records
) lkp ON r.Id = lkp.Id;
Alter the version column to NOT allow nulls:
ALTER TABLE Records ALTER COLUMN version int not null;
Finally, ensure that you increment the version column during new row inserts.
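For example, a minimal sketch of such an insert, assuming the new row belongs to an existing chain and that @rootRecordId is a hypothetical parameter identifying that chain (other columns are elided):

DECLARE @rootRecordId int = 1;   -- hypothetical parameter: the chain being updated

-- Insert the new row with the next version number for that chain.
INSERT INTO Records (root_record_id, version /*, other columns */)
SELECT @rootRecordId,
       MAX(r.version) + 1        -- current highest version in the chain, plus one
       /*, other column values */
FROM Records r
WHERE r.root_record_id = @rootRecordId;

Under concurrent writers you may also want a unique constraint on (root_record_id, version) so two sessions cannot insert the same version.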

DBFIDDLE
This query produces the version that you can use (in an update, or in a trigger):
SELECT
    id,
    root_record_id,
    RANK() OVER (PARTITION BY root_record_id ORDER BY id ASC) - 1 AS version
FROM table1
ORDER BY id;
Output:

id   root_record_id   version
1    1                0
2    2                0
3    1                1
4    1                2
5    2                1
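If you go the trigger route instead, here is a rough sketch only (the trigger name is made up; it assumes new rows are inserted with version left NULL and fills it in afterwards):

CREATE TRIGGER trg_table1_set_version ON table1
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- For each freshly inserted row, the version is the number of
    -- earlier rows that already exist in the same root_record_id chain.
    UPDATE t
    SET t.version = x.earlier_rows
    FROM table1 t
    JOIN inserted i ON i.id = t.id
    CROSS APPLY (
        SELECT COUNT(*) AS earlier_rows
        FROM table1 p
        WHERE p.root_record_id = i.root_record_id
          AND p.id < i.id
    ) x;
END;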

Related

Updating uniqueidentifier column with same value for rows with matching column value

I need a little help. I have this (simplified) table:
ID   Title        Subtype   RelatedUniqueID
1    My Title 1   1         NULL
2    My Title 2   1         NULL
3    My Title 3   2         NULL
4    My Title 4   2         NULL
5    My Title 5   2         NULL
6    My Title 6   3         NULL
What I am trying to accomplish is generating the same uniqueidentifier for all rows having the same subtype.
So result would be this:
ID   Title        Subtype   RelatedUniqueID
1    My Title 1   1         439753d3-9103-4d0e-9dd0-569dc71fd6a3
2    My Title 2   1         439753d3-9103-4d0e-9dd0-569dc71fd6a3
3    My Title 3   2         d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
4    My Title 4   2         d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
5    My Title 5   2         d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
6    My Title 6   3         055838c6-a814-4bd1-a859-63d4544bb449
Requirements
One query to update all rows at once
The actual table has many more rows with hundreds of subtypes, so manually building a query for each subtype is not an option
Using SQL Server 2017
Thanks for any assistance.
Because newid() is evaluated per row, you have to generate the values first, so this has to involve a temporary or permanent table to store the generated Subtype-to-GUID values.
So first you need to generate the GUID values per Subtype :
with subtypes as (
    select distinct subtype
    from t
)
select Subtype, NewId() as RelatedId
into #Id
from subtypes;
And then you can use an updatable CTE to apply these to your base table:
with r as (
    select t.*, id.RelatedId
    from #Id id
    join t on t.subtype = id.Subtype
)
update r
set RelatedUniqueID = RelatedId;
See example DB<>Fiddle
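A quick way to check the result afterwards (a sketch; t and the column names are as above) is to look for subtypes that still map to more than one GUID:

-- Should return no rows once every subtype shares a single RelatedUniqueID.
select subtype, count(distinct RelatedUniqueID) as distinct_ids
from t
group by subtype
having count(distinct RelatedUniqueID) > 1;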
You can use an updatable CTE with a window function to get this data:
with r as (
select t.*,
RelatedId = first_value(newid()) over (partition by t.Subtype order by ID rows unbounded preceding)
from t
)
update r
set relatedUniqueId = RelatedId;
db<>fiddle
I warn, though, that newid() is somewhat unpredictable in when it is evaluated, so don't try messing about with a joined update (unless you pre-save the IDs as @Stu has done).
For example, in this fiddle the IDs were calculated differently for every row.
I have found a single-query solution.
A pre-requirement for this to work is that RelatedUniqueID must already contain random values (e.g. set the column's default value to newid()).

UPDATE TestTable
SET ForeignUniqueID = TG.ForeignUniqueID
FROM TestTable TG
INNER JOIN TestTable ON TestTable.SubType = TG.SubType
Update
As Stu mentions in the comments, this solution might affect performance on large datasets. Please keep that in mind.

SQL Server ID reseed after deletion of a middle record

I have a table from which I delete records.
The problem is that when I delete a certain record, its ID goes away with it, so the ID sequence is no longer contiguous within the table.
What I want is a SQL Server procedure to renumber the records after one of them is deleted.
Example: the IDs are currently 1, 2, 3, 4, 5. If I delete record 2, I want the remaining records to end up with IDs 1, 2, 3, 4 and NOT 1, 3, 4, 5.
You don't want to do this. The id should be a field that has no meaning other than identifying a row. You might have other tables that refer to the id and they would break.
Instead, just recalculate a sequential value when you query the table:
select t.*, row_number() over (order by id) as seqnum
from t;
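If that presentation order is needed in more than one place, one option (just a sketch; vNumbered is a made-up name, t is the table from the query above) is to put the same expression behind a view:

create view vNumbered as
select t.*,
       row_number() over (order by id) as seqnum   -- gap-free display number
from t;

Querying select * from vNumbered then always shows a contiguous sequence, no matter how many rows have been deleted.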

Easiest way to update the IDs of rows in SQL Server?

The primary key ID values in this table are being used in our two systems, which were recently merged. However, a large number of items in one of the systems are pointing to the wrong ID values, so I need to update the ID (PK) values so that the 6 million existing items point to the correct rows.
I'd like to update the ID column as follows:
Old ID   New ID
1        5
2        6
3        7
4        1
5        2
6        3
7        4
Well, assuming it is not an IDENTITY column (an IDENTITY column can't be updated directly; you would have to re-insert the rows with IDENTITY_INSERT ON and delete the old ones), the following should work (see SQLFiddle for an example):
UPDATE MyTable
SET ID =
    CASE WHEN ID >= 4 THEN ID - 3
         ELSE ID + 4
    END;
Use an UPDATE query with a CASE expression:
Update tableName set PkId = Case PkId
When 1 then 5
When 2 then 6
When 3 then 7
When 4 then 1
When 5 then 2
When 6 then 3
When 7 then 4 End
Where PkId In (1,2,3,4,5,6,7)
If the values in your question are just a small subset of the values that need to be changed (do all 6 million need to change?), then you should create a mapping table that holds the old incorrect value and the new correct value, and use that (with a join) instead of the CASE expression:
Update t set PkId = m.NewPkId
From tablename t
Join mappingTable m
  On m.oldPkId = t.PkId
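A minimal sketch of building that mapping table, using the seven values from the question (the table and column names simply follow the answer above):

CREATE TABLE mappingTable (
    oldPkId int NOT NULL PRIMARY KEY,
    newPkId int NOT NULL
);

INSERT INTO mappingTable (oldPkId, newPkId) VALUES
    (1, 5), (2, 6), (3, 7), (4, 1),
    (5, 2), (6, 3), (7, 4);

-- In practice the table would be populated from whatever source
-- defines the correct mapping for all affected rows.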

Delete duplicate ID and value rows using SQL Server 2008 R2

In SQL Server 2008 R2 I added two rows with a duplicate ID and values to my table. When I try to delete one of the last two records I receive the following error:
"The row values updated or deleted either do not make the row unique or they alter multiple rows."
The data is:
7 ABC 6
7 ABC 6
7 ABC 6
8 XYZ 1
8 XYZ 1
8 XYZ 4
7 ABC 6
7 ABC 6
I need to delete the last two records:
7 ABC 6
7 ABC 6
I have been trying to delete the last two records using the "Edit Top 200 Rows" feature, but I get the error above.
Any help is appreciated. Thanks in advance:)
Since you have given no indication that there are other columns in the table, assuming your data is in three columns A, B, and C, you can delete two rows using:
;with t as (
select top(2) *
from tbl
where A = 7 and B = 'ABC' and C = 6
)
DELETE t;
This will arbitrarily match two rows based on the conditions, and delete them.
This is an outline of code I use to delete dups in tables that may have many dups.
/* I always put the rollback and commit up here in comments until I am sure I have
done what I wanted. */
BEGIN TRAN Jim1 -- ROLLBACK TRAN Jim1 -- COMMIT TRAN Jim1; DROP TABLE jt1.dbo.What_Jim_Deleted
/* This creates a table to put the deleted rows in just in case I'm really screwed up */
SELECT top 1 *, NULL dupflag
INTO jt1.dbo.What_Jim_Deleted --DROP TABLE jt1.dbo.What_Jim_Deleted
FROM jt1.dbo.tab1;
/* This removes the row without removing the table */
TRUNCATE TABLE jt1.dbo.What_Jim_Deleted;
/* the cte assigns a row number to each unique security for each day, dups will have a
rownumber > 1. The fields in the partition by are from the composite key for the
table (if one exists). This is the query that I ran to show them as dups:
SELECT compkey1, compkey2, compkey3, compkey4, COUNT(*)
FROM jt1.dbo.tab1
GROUP BY compkey1, compkey2, compkey3, compkey4
HAVING COUNT(*) > 1
ORDER BY 1 DESC
*/
with getthedups as
(SELECT *,
ROW_NUMBER() OVER
(partition by compkey1,compkey2, compkey3, compkey4
ORDER BY Timestamp desc) dupflag /*This can be anything that gives some order to the rows (even if order doesn't matter) */
FROM jt1.dbo.tab1)
/* This delete is deleting from the cte which cascades to the underlying table
The WHERE is part of the DELETE (even though it comes after the OUTPUT). The
OUTPUT clause takes all of the DELETED rows and inserts them into the "oh shit" table,
just in case.*/
DELETE
FROM getthedups
OUTPUT DELETED.* INTO jt1.dbo.What_Jim_Deleted
WHERE dupflag > 1
--Check the resulting tables here to ensure that you did what you think you did
/* If all has gone well then commit the tran and drop the "oh shit" table, or let it
hang around for a while. */

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table foo that records data rows. Each time a row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (not the replaced 1st row), while a query for rev=1 yields the rows with rev <= 1; in case of duplicated ids, the one with the higher revision number is chosen (records 1, 2 and 3).
I would prefer not to build the result iteratively.
To get only latest revisions:
SELECT * from t t1
WHERE t1.rev =
(SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index on age and then id so the view will be nice and fast and can also be used to look up rows by id. Although this key is effectively unique, it's temporarily non-unique while you're ageing the rows (during UPDATE SET age = age + 1), so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
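A minimal sketch of the pattern as a whole (table, view and column names here are placeholders, and the whole table is treated as one subset):

CREATE TABLE DataRows (
    id      int           NOT NULL,
    age     int           NOT NULL,
    payload nvarchar(100) NOT NULL
);

-- Clustered and intentionally non-unique: (age, id) is briefly duplicated while ageing.
CREATE CLUSTERED INDEX IX_DataRows_age_id ON DataRows (age, id);
GO

-- Readers only ever query the current version.
CREATE VIEW CurrentDataRows AS
SELECT id, payload
FROM DataRows
WHERE age = 0;
GO

-- Publish: the batch process has already loaded the new rows with age = -1;
-- one transaction promotes them to age = 0 and ages everything else.
BEGIN TRANSACTION;
UPDATE DataRows SET age = age + 1;
DELETE DataRows WHERE age > 3;   -- optional purge, keeping three old versions
COMMIT TRANSACTION;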
Rollback
Finally... let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then adding the age column and indexing, and then creating the view (with its age = 0 filter) under the same name as the original table.
This strategy may or may not work depending on the technology layers that depend on the original table, but in many cases swapping a view in for a table drops in just fine.
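A sketch of that migration (all object names here are hypothetical):

-- 1. Move the physical table out of the way.
EXEC sp_rename 'dbo.Orders', 'Orders_Versioned';
GO

-- 2. Add the age column; existing rows become the current version (age = 0).
ALTER TABLE dbo.Orders_Versioned ADD age int NOT NULL DEFAULT 0;
GO

-- 3. Re-create the original name as a view over the current version,
--    so existing queries keep working unchanged.
CREATE VIEW dbo.Orders AS
SELECT id, amount /* ...the original columns... */
FROM dbo.Orders_Versioned
WHERE age = 0;
GO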
Notes
I recommend naming the age column RowAge to indicate that this pattern is being used, since it makes clearer that it's a database-related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution, as your final transaction will update not just the current records but all past versions of the records in this subset (which could even be the entire table!), so you may end up locking the table.
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later.
Sample data:
DECLARE @foo TABLE (
    id int,
    rev int,
    field nvarchar(10)
)
INSERT @foo VALUES
    ( 1, 1, 'test1' ),
    ( 2, 1, 'fdsfs' ),
    ( 3, 1, 'jfds' ),
    ( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when @desiredRev = 2:
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1:
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
If you want all the latest revisions of each field, you can use
SELECT C.rev, C.field FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
SELECT foo.* FROM foo
LEFT JOIN foo AS later
  ON foo.id = later.id AND later.rev > foo.rev
WHERE later.id IS NULL;
This anti-join keeps only the rows for which no later revision of the same id exists.
How about this?
select id, max(rev) as rev from foo group by id
For querying a specific revision, e.g. revision 1:
select id, max(rev) as rev from foo where rev <= 1 group by id
(In SQL Server, field can't appear in the select list here without being aggregated or grouped; join back to foo on id and rev, as in the answers above, if you need it.)