SQL Server: custom record sort order in a table that allows deleting records

A SQL Server table with a custom sort order has the columns ID (PK, auto-increment), OrderNumber, Col1, Col2, ...
By default, an insert trigger copies the value from ID to OrderNumber, as suggested here.
Using some visual interface, the user can reorder records by incrementing or decrementing their OrderNumber values.
However, how should records that are deleted in the meantime be handled?
Example:
Say you add records with PK ID 1, 2, 3, 4, 5; OrderNumber receives the same values. Then you delete the records with ID=4 and ID=5. The next record will have ID=6, and OrderNumber will receive the same value. The gap of two missing OrderNumbers would force the user to decrement the record with ID=6 three times (i.e. press the button three times) to change its order.
Alternatively, one could insert select count(*) from table into OrderNumber, but that would allow duplicate values in the table once some old rows are deleted.
If one doesn't delete records but only "deactivates" them, they are still included in the sort order, just invisible to the user. At the moment a solution in Java is needed, but I think the issue is language-independent.
Is there a better approach to this?

I would simply modify the script that switches the OrderNumber values so that it works correctly without relying on the values being gap-free.
I don't know what arguments your script accepts and how it uses them, but the one that I've eventually come up with accepts the ID of the item to move and the number of positions to move by (a negative value means "toward the lower OrderNumber values", and a positive one implies the opposite direction).
The idea is as follows:
1) Look up the specified item's OrderNumber.
2) Rank all the items starting from OrderNumber in the direction determined by the second argument. The specified item thus receives the ranking of 1.
3) Pick the items with rankings from 1 to the absolute value of the second argument plus one. (I.e. the last item is the one where the specified item is being moved to.)
4) Join the resulting set with itself so that every row is joined with the next one and the last row is joined with the first one, and thus use one set of rows to update the other.
This is the query that implements the above, with comments explaining some tricky parts:
Edited: fixed an issue with incorrect reordering
/* these are the arguments of the query */
DECLARE @ID int, @JumpBy int;
SET @ID = ...
SET @JumpBy = ...
DECLARE @OrderNumber int;
/* Step #1: Get OrderNumber of the specified item */
SELECT @OrderNumber = OrderNumber FROM atable WHERE ID = @ID;
WITH ranked AS (
  /* Step #2: rank rows including the specified item and those that are sorted
     either before or after it (depending on the value of @JumpBy) */
  SELECT
    *,
    rnk = ROW_NUMBER() OVER (
      ORDER BY OrderNumber * SIGN(@JumpBy)
      /* this little "* SIGN(@JumpBy)" trick ensures that the
         top-ranked item will always be the one specified by @ID:
         * if we are selecting rows where OrderNumber >= @OrderNumber,
           the order will be by OrderNumber and @OrderNumber will be
           the smallest item (thus #1);
         * if we are selecting rows where OrderNumber <= @OrderNumber,
           the order becomes by -OrderNumber and @OrderNumber again
           becomes the top ranked item, because its negative counterpart,
           -@OrderNumber, will again be the smallest one
      */
    )
  FROM atable
  WHERE OrderNumber >= @OrderNumber AND @JumpBy > 0
     OR OrderNumber <= @OrderNumber AND @JumpBy < 0
),
affected AS (
  /* Step #3: select only rows that need be affected */
  SELECT *
  FROM ranked
  WHERE rnk BETWEEN 1 AND ABS(@JumpBy) + 1
)
/* Step #4: self-join and update */
UPDATE old
SET OrderNumber = new.OrderNumber
FROM affected old
  INNER JOIN affected new ON old.rnk = new.rnk % (ABS(@JumpBy) + 1) + 1
  /* if old.rnk = 1, the corresponding new.rnk is N,
     because 1 = N MOD N + 1 (N is ABS(@JumpBy)+1),
     for old.rnk = 2 the matching new.rnk is 1: 2 = 1 MOD N + 1,
     for 3, it's 2 etc.
     this condition could alternatively be written like this:
     new.rnk = (old.rnk + ABS(@JumpBy) - 1) % (ABS(@JumpBy) + 1) + 1
  */
Note: this assumes SQL Server 2005 or later.
One known issue with this solution is that it will not "move" rows correctly if the specified ID cannot be moved exactly by the specified number of positions (for instance, if you want to move the topmost row up by any number of positions, or the second row by two or more positions etc.).
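The four steps above can also be sketched outside the database. The following is a minimal in-memory illustration in Python, not the query itself: move_item and the order_by_id dictionary (ID mapped to OrderNumber) are hypothetical names, and the sketch assumes OrderNumber values are unique. Unlike the query, it simply moves the item as far as it can when fewer rows are available than requested.

```python
def move_item(order_by_id, item_id, jump_by):
    """Move item_id by jump_by positions; gaps in the order numbers are fine.

    order_by_id maps ID -> OrderNumber (a hypothetical stand-in for the table).
    Returns a new dict; only the affected rows receive different numbers.
    """
    if jump_by == 0:
        return dict(order_by_id)
    start = order_by_id[item_id]           # step 1: the item's OrderNumber
    sign = 1 if jump_by > 0 else -1
    # Step 2: rank the item and every row on the chosen side of it.
    # Multiplying by sign makes the item itself sort first in both directions.
    candidates = sorted(
        (order * sign, row_id)
        for row_id, order in order_by_id.items()
        if (order - start) * sign >= 0
    )
    # Step 3: keep the item plus the abs(jump_by) rows it jumps over.
    affected = [row_id for _, row_id in candidates[: abs(jump_by) + 1]]
    # Step 4: rotate the numbers: the item takes the last row's number and
    # every other affected row takes the number of the row ranked before it.
    old_numbers = [order_by_id[row_id] for row_id in affected]
    rotated = old_numbers[-1:] + old_numbers[:-1]
    result = dict(order_by_id)
    result.update(zip(affected, rotated))
    return result


orders = {1: 1, 2: 2, 3: 3, 6: 6}          # IDs 4 and 5 were deleted
print(move_item(orders, 6, -1))            # one step up despite the gap
```

Because the rows only exchange existing OrderNumber values, the set of numbers in the table does not change, which is why gaps left by deleted rows do not matter.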

OK, if I'm not mistaken, you want to defragment your OrderNumber values.
What if you use ROW_NUMBER() for this?
Example:
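/* meant to run inside an insert trigger, where the "inserted" pseudo-table exists */
;WITH calc_cte AS (
  SELECT
      ID
    , OrderNumber
    , RowNo = ROW_NUMBER() OVER (ORDER BY ID)
  FROM
    dbo.[Order]  /* ORDER is a reserved word, so the name must be bracketed */
)
UPDATE
  c
SET
  OrderNumber = c.RowNo
FROM
  calc_cte c
WHERE EXISTS (SELECT * FROM inserted i WHERE c.ID = i.ID)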

I didn't want to answer my own question, but I believe I have found a solution.
Insert query:
INSERT INTO table (OrderNumber, col1, col2)
VALUES ((select count(*)+1 from table),val1,val2)
Delete trigger:
CREATE TRIGGER Cleanup_After_Delete ON table
AFTER DELETE AS
BEGIN
  WITH rowtable AS (
    SELECT [ID], OrderNumber, rownum = ROW_NUMBER() OVER (ORDER BY OrderNumber ASC)
    FROM table
  )
  UPDATE rt SET OrderNumber = rt.rownum
  FROM rowtable rt
  /* MIN() keeps the subquery scalar when several rows are deleted at once */
  WHERE rt.OrderNumber >= (SELECT MIN(OrderNumber) FROM deleted)
END
The trigger fires after every delete and renumbers all OrderNumbers above the deleted one, so no gaps remain. This means that I can change the order of two records simply by swapping their OrderNumbers.
This is a working solution for my problem; however, this one is also a very good one, perhaps more useful for others.
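The same idea (append with count + 1, close the gap on delete) can be tried out with Python's built-in sqlite3 module. This is only a sketch under different assumptions than the T-SQL above: SQLite triggers fire once per deleted row and expose it as OLD, so the gap is closed by shifting the higher numbers down by one rather than by renumbering with ROW_NUMBER(). The table items and the helpers add_item and ordered are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    order_number INTEGER NOT NULL,
    name TEXT NOT NULL
);
-- Fires once per deleted row; OLD is that row.
CREATE TRIGGER cleanup_after_delete AFTER DELETE ON items
BEGIN
    UPDATE items SET order_number = order_number - 1
    WHERE order_number > OLD.order_number;
END;
""")


def add_item(name):
    # COUNT(*) + 1 is the next free position as long as the numbers stay dense.
    conn.execute(
        "INSERT INTO items (order_number, name) SELECT COUNT(*) + 1, ? FROM items",
        (name,),
    )


def ordered():
    return conn.execute(
        "SELECT name, order_number FROM items ORDER BY order_number"
    ).fetchall()


for item_name in ["a", "b", "c", "d", "e"]:
    add_item(item_name)
conn.execute("DELETE FROM items WHERE name = 'b'")  # trigger closes the gap
add_item("f")                                       # lands at position 5
print(ordered())
```

With the numbers kept dense, moving a record one step up or down is a plain swap of two OrderNumber values.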

Related

Update table by Preference and reorder table accordingly

I have an app in which dragging and dropping rows in a list reorders them and sets each row's preference number, 1-5 for example, with 1 being the highest priority. So if I have 5 records in the list, I can drag each row, and when it is dropped the list is reordered by preference. I can move row 5 to row 1, or row 2 to row 3, etc. This updates the preference number in the SQL table according to where the row is dropped.
This app is in real-time. When new rows are added to the table automatically, they have an initial preference of "0". This query will add the next number preference to the record, so if I have rows with preferences of 1-5, a new record comes in, then it's assigned a preference of 6:
with CTE as (
select Id, Preference, cp.maxpref, row_number() over(order by Id) rn
from [RadioQDB].[dbo].[Rad5]
cross apply (
select max(preference) maxpref
from [RadioQDB].[dbo].[Rad5] p
) cp
where preference = 0
)
update cte
set preference = maxpref + rn
where preference = 0
The issue I am having now arises when a record is removed from the list during an update (not by user drag and drop). Let's say you have records 1,2,3,4,5 in the list. If a record, say #2, is removed automatically during the table update, you end up with preferences 1,3,4,5. How can I move everything up and reorder the table accordingly by preference?
1 stays the same, 3 moves to 2, 4 moves to 3 and so forth.
Thank you.
To insert a new record with Preference=Max(Preference)+1 use the following query:
insert into Rad5 values(10,(select max(Preference)+1 from Rad5));
-- inserts a new record with id=10
To reorder the records according to the Preference after deleting a record try the following:
with cte as (
select id, Preference, row_number() over (order by Preference) as rn
from Rad5)
update cte set Preference=rn;
You can use a delete trigger on your table to run the update query automatically whenever a record is deleted. If you want to do so, use the following:
create trigger Rad5_Delete on Rad5
for delete
as
with cte as (
select id, Preference, row_number() over (order by Preference) as rn
from Rad5)
update cte set Preference=rn;
See a demo from db<>fiddle.
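For experimenting with the renumbering step outside SQL Server, here is a small sketch using Python's sqlite3 module. It is an illustration rather than the answer's query: the helper renumber_preferences is a hypothetical name, and Python's enumerate plays the role of row_number() over (order by Preference), with Id added as an assumed tie-breaker.

```python
import sqlite3


def renumber_preferences(conn):
    """Rewrite Preference as 1..n, following the current Preference order."""
    ids = [
        row[0]
        for row in conn.execute("SELECT Id FROM Rad5 ORDER BY Preference, Id")
    ]
    # enumerate(..., start=1) stands in for row_number() here.
    conn.executemany(
        "UPDATE Rad5 SET Preference = ? WHERE Id = ?",
        [(rn, row_id) for rn, row_id in enumerate(ids, start=1)],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rad5 (Id INTEGER PRIMARY KEY, Preference INTEGER NOT NULL)")
# Preference 2 is missing, as if that record had been removed.
conn.executemany(
    "INSERT INTO Rad5 VALUES (?, ?)", [(101, 1), (103, 3), (104, 4), (105, 5)]
)
renumber_preferences(conn)
rows = conn.execute("SELECT Id, Preference FROM Rad5 ORDER BY Preference").fetchall()
print(rows)
```

Rows that still carry the initial preference of 0 would sort first under this ordering, so in practice they would need to be numbered before the renumbering runs.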

How to update with incrementing value

I have a table in PostgreSQL that has an ID column that is supposed to be unique. However, a large number of the rows (around 3 million) currently have an ID of "1".
What I know:
The total number of rows
The current maximum value for the ID column
The number of rows with an (incorrect) ID of "1"
What I need is a query that will pull all the rows with an ID of "1" and assign them a new ID that increments automatically so that every row in the table will have a unique ID. I'd like it to start at the currentMaxId + 1 and assign each row the subsequent ID.
This is the closest I've gotten with a query:
UPDATE table_name
SET id = (
SELECT max(id) FROM table_name
) + 1
WHERE id = '1'
The problem with this is that the inner SELECT only runs the first time, thus setting the ID of the rows in question to the original max(id) + 1, not the new max(id) + 1 every time, giving me the same problem I'm trying to solve.
Any suggestions on how to tweak this query to achieve my desired result or an alternative method would be greatly appreciated!
You may do it step by step with a temporary sequence.
1) creation
create temporary sequence seq_upd;
2) set it to the proper initial value
select setval('seq_upd', (select max(id) from table_name));
3) update
update table_name set id=nextval('seq_upd') where id=1;
If you are going to work with a SEQUENCE, consider the serial pseudo data type for your id. Then you can just draw nextval() from the "owned" (not temporary) sequence, which will then be up to date automatically.
If you don't want that, you can fall back to using the ctid and row_number() for a one-time numbering:
UPDATE tbl t
SET id = x.max_id + u.rn
FROM (SELECT max(id) AS max_id FROM tbl) x
, (SELECT ctid, row_number() OVER (ORDER BY ctid) AS rn
FROM tbl WHERE id = 1) u
WHERE t.ctid = u.ctid;
Related answer on dba.SE:
numbering rows consecutively for a number of tables
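To see the one-time numbering in action without a PostgreSQL instance, the following sketch uses Python's sqlite3 module. It is an analogy, not the PostgreSQL statement: SQLite's implicit rowid stands in for ctid, and the table contents and the helper renumber_duplicates are made up for illustration.

```python
import sqlite3


def renumber_duplicates(conn, bad_id=1):
    """Give every row whose id equals bad_id a fresh id above the current maximum."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM tbl").fetchone()
    rowids = [
        row[0]
        for row in conn.execute(
            "SELECT rowid FROM tbl WHERE id = ? ORDER BY rowid", (bad_id,)
        )
    ]
    # max_id + 1, max_id + 2, ... in physical row order, like max_id + rn above.
    conn.executemany(
        "UPDATE tbl SET id = ? WHERE rowid = ?",
        [(max_id + rn, rowid) for rn, rowid in enumerate(rowids, start=1)],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, val TEXT)")  # id is not a key here
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)", [(1, "a"), (1, "b"), (7, "c"), (1, "d")]
)
renumber_duplicates(conn)
result = conn.execute("SELECT id, val FROM tbl ORDER BY rowid").fetchall()
print(result)
```

The maximum is read once before any row changes, which is the point the original UPDATE attempt was missing.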

Process SQL Table with no Unique Column

We have a table that keeps a log of internet usage inside our company. This table is filled by software we bought, and we cannot make any changes to the table. It does not have a unique key or index (to make writing data faster, as its developers say).
I need to read the data in this table to create real-time reports of internet usage by our users.
Currently I'm reading data from this table in chunks of 1000 records. My problem is keeping track of the last record I have read, so that I can read the next 1000 records.
What is the best possible solution to this problem?
By the way, earlier records may get deleted by the software as needed if the database file gets too big.
Depending on your version of SQL Server, you can use row_number(). Once the row_number() is assigned, then you can page through the records:
select *
from
(
select *,
row_number() over(order by id) rn
from yourtable
) src
where rn between 1 and 1000
Then when you want to get the next set of records, you could change the values in the WHERE clause to:
where rn between 1001 and 2000
Based on your comment that the data gets deleted, I would do the following.
First, insert the data into a temp table:
select *, row_number() over(order by id) rn
into #temp
from yourtable
Then you can select the data by row number in any block as needed.
select *
from #temp
where rn between 1 and 1000
This would also help:
declare @numRecords int = 1000 --Number of records needed per request
declare @requestCount int = 0 --Request number starting from 0 and increase 1 by 1
select top (@numRecords) *
from
(
  select *, row_number() over(order by id) rn
  from yourtable
) T
where rn > @requestCount*@numRecords
EDIT: As per comments
CREATE PROCEDURE [dbo].[select_myrecords]
  --Number of records needed per request
  @NumRecords int, --(= 1000 )
  --Datetime of the LAST RECORD of previous result-set or null for first request
  @LastDateTime datetime = null
AS
BEGIN
  select top (@NumRecords) *
  from yourtable
  where LOGTime < isnull(@LastDateTime, getdate())
  order by LOGTime desc
END
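The calling side of such a procedure can be sketched as a loop that remembers the LOGTime of the last row it received. The example below uses Python's sqlite3 module with a made-up usage_log table; fetch_page and the column names are hypothetical, and the first request simply starts from the newest row instead of comparing against the current time.

```python
import sqlite3


def fetch_page(conn, num_records, last_log_time=None):
    """Return up to num_records rows older than last_log_time, newest first."""
    if last_log_time is None:
        # First request: start from the newest row.
        return conn.execute(
            "SELECT log_time, user_name FROM usage_log "
            "ORDER BY log_time DESC LIMIT ?",
            (num_records,),
        ).fetchall()
    return conn.execute(
        "SELECT log_time, user_name FROM usage_log WHERE log_time < ? "
        "ORDER BY log_time DESC LIMIT ?",
        (last_log_time, num_records),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_log (log_time TEXT, user_name TEXT)")
conn.executemany(
    "INSERT INTO usage_log VALUES (?, ?)",
    [("2024-01-01 10:00:0%d" % i, "user%d" % i) for i in range(5)],
)

pages = []
last = None
while True:
    page = fetch_page(conn, 2, last)
    if not page:
        break
    pages.append(page)
    last = page[-1][0]  # LOGTime of the last record of this result set
print(pages)
```

One caveat of the strict "less than" comparison: if several rows share the LOGTime of the last row on a page, the ones that did not fit on that page are skipped, which matters for a table without any unique column.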
Without any index you cannot efficiently select the "last" records. The solution will not scale. You cannot use "real-time" and "repeated table scans of a big logging table" in the same sentence.
Actually, without any unique identification attribute for each row you cannot even determine what's new (proof: say, you had a table full of thousands of booleans. How would you determine which ones are new? They cannot be told apart! You cannot find out.). There must be something you can use, like a combination of DateTime, IP or so. Or, you can add an IDENTITY column which is likely to be transparent to the software you use.
Probably, the software you use will tolerate you creating an index on some ID or DateTime column as this is transparent to the software. It might create more load, so be sure to test it (my guess: you'll be fine).

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table that records data rows in foo. Each time the row is updated, a new row is inserted along with a revision number. The table looks like:
id  rev  field
1   1    test1
2   1    fsdfs
3   1    jfds
1   2    test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 yields the rows with rev <= 1 and, in case of duplicated ids, the one with the higher revision number is chosen (records 1, 2, 3).
I would prefer not to build the result iteratively.
To get only latest revisions:
SELECT * from foo t1
WHERE t1.rev =
  (SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.
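The correlated-subquery approach can be checked against the question's sample data with Python's sqlite3 module. This is a small sketch; the helper name as_of is made up, and the revision number is passed as a parameter instead of the literal 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)


def as_of(rev):
    """For each id, return the row with the highest revision not above rev."""
    return conn.execute(
        """
        SELECT id, rev, field FROM foo AS t1
        WHERE t1.rev = (SELECT MAX(t2.rev) FROM foo AS t2
                        WHERE t2.id = t1.id AND t2.rev <= ?)
        ORDER BY id
        """,
        (rev,),
    ).fetchall()


print(as_of(1))
print(as_of(2))
```

Passing a revision at least as large as the highest one in the table gives the "latest revisions" query as a special case.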
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique while you're aging the rows (during UPDATE SET age=age+1), so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column RowAge in order to indicate this pattern is being used, since it's clearer that it's a database-related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution, as your final transaction will update not just the current records but all past versions of the records in this subset (which could even be the entire table!), so you may end up locking the table.
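The aging cycle (stage at age = -1, publish with age + 1, roll back with age - 1) can be walked through with Python's sqlite3 module. This is a sketch under simplifying assumptions: the whole table is treated as one subset, and the names foo_data, foo, stage, publish, roll_back and current are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_data (id INTEGER, age INTEGER, field TEXT);
CREATE INDEX foo_age_id ON foo_data (age, id);  -- non-unique on purpose
CREATE VIEW foo AS SELECT id, field FROM foo_data WHERE age = 0;
""")


def stage(rows):
    # INSERT: new rows wait at age = -1, invisible to the view.
    conn.executemany("INSERT INTO foo_data VALUES (?, -1, ?)", rows)


def publish():
    # UPDATE: one transaction flips the view to the new rows and ages the old ones.
    with conn:
        conn.execute("UPDATE foo_data SET age = age + 1")


def roll_back():
    # Revert one version and discard the rows that were published by mistake.
    with conn:
        conn.execute("UPDATE foo_data SET age = age - 1")
        conn.execute("DELETE FROM foo_data WHERE age < 0")


def current():
    return conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()


stage([(1, "test1"), (2, "fsdfs")])
publish()
first_version = current()
stage([(1, "test2"), (2, "fsdfs")])
publish()
second_version = current()
roll_back()
after_rollback = current()
print(first_version, second_version, after_rollback)
```

Readers only ever query the view, so they see either the old version or the new one, never a mixture.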
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later.
Sample data:
DECLARE @foo TABLE (
  id int,
  rev int,
  field nvarchar(10)
)
INSERT @foo VALUES
  ( 1, 1, 'test1' ),
  ( 2, 1, 'fdsfs' ),
  ( 3, 1, 'jfds' ),
  ( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
  SELECT
    id,
    rev,
    field,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
  FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records and, within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with the highest rev) from each id group.
Output when @desiredRev = 2:
id          rev         field      rn
----------- ----------- ---------- --------------------
1           2           test2      1
2           1           fdsfs      1
3           1           jfds       1
Output when @desiredRev = 1:
id          rev         field      rn
----------- ----------- ---------- --------------------
1           1           test1      1
2           1           fdsfs      1
3           1           jfds       1
If you want the latest revision of each field, you can use
SELECT C.rev, C.field FROM (
  SELECT MAX(A.rev) AS rev, A.id
  FROM yourtable A
  GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev  field
1    fsdfs
1    jfds
2    test2
SELECT
  MaxRevs.id,
  revision.field
FROM
  (SELECT
     id,
     MAX(rev) AS MaxRev
   FROM revision
   GROUP BY id
  ) MaxRevs
  INNER JOIN revision
    ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev;
SELECT foo.* from foo
left join foo as later
  on foo.id=later.id and later.rev>foo.rev
where later.id is null;
How about this?
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id

shuffle values in an integer column so they are always unique and sequential

I have a table I'd like to sort with a "priority" column. This column needs to be reordered when the priority of a record is changed or records are removed. Think of it as an array. The values will be modified in a UI so I want them to remain whole numbers and represent the true position within the larger recordset. The priority column won't have NULLs.
id  priority
1   2
2   1
3   4
4   3
Now say I change the priority of id 4 to 2 or I insert or delete a row how do I get all priorities to reshuffle so there are no gaps or duplicates and the highest possible priority is always the number of rows?
The table has a "date_modified" field which is accurate to the second and updated on insert/update so if needed it is possible to know which record was modified last (to break a tie when 2 records have the same priority)
Assuming you have PostgreSQL 8.4 or later, you can use window functions.
UPDATE test_priority
SET priority = sub.new_priority
FROM (
  -- row_number() rather than rank(): tied rows would otherwise share a priority
  SELECT user_id, id, priority, row_number() OVER (ORDER BY priority, date_modified) new_priority
  FROM test_priority
  WHERE user_id = $1
) sub
WHERE test_priority.user_id = sub.user_id
  AND test_priority.id = sub.id
  AND test_priority.priority <> sub.new_priority
Deleting a row:
UPDATE tbl SET priority = priority - 1
WHERE priority > the_priority_of_what_you_deleted
Inserting a row (do this before the insert):
UPDATE tbl SET priority = priority + 1
WHERE priority >= the_priority_about_to_be_inserted
You can put this kind of logic into INSERT and/or DELETE triggers, if you want.
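The two shifting updates can be combined into a "move" operation as well. The sketch below uses Python's sqlite3 module and the sample data from the question; delete_row, insert_row and move_row are hypothetical helpers, and the table is assumed to have no unique constraint on priority, since the intermediate states briefly contain duplicates.

```python
import sqlite3


def delete_row(conn, row_id):
    (old,) = conn.execute("SELECT priority FROM tbl WHERE id = ?", (row_id,)).fetchone()
    conn.execute("DELETE FROM tbl WHERE id = ?", (row_id,))
    # Close the gap left behind by the deleted row.
    conn.execute("UPDATE tbl SET priority = priority - 1 WHERE priority > ?", (old,))


def insert_row(conn, row_id, priority):
    # Open a gap first, then insert into it.
    conn.execute("UPDATE tbl SET priority = priority + 1 WHERE priority >= ?", (priority,))
    conn.execute("INSERT INTO tbl (id, priority) VALUES (?, ?)", (row_id, priority))


def move_row(conn, row_id, new_priority):
    (old,) = conn.execute("SELECT priority FROM tbl WHERE id = ?", (row_id,)).fetchone()
    # Same two shifts as above, but the row itself stays in the table.
    conn.execute("UPDATE tbl SET priority = priority - 1 WHERE priority > ?", (old,))
    conn.execute(
        "UPDATE tbl SET priority = priority + 1 WHERE priority >= ? AND id <> ?",
        (new_priority, row_id),
    )
    conn.execute("UPDATE tbl SET priority = ? WHERE id = ?", (new_priority, row_id))


def snapshot(conn):
    return conn.execute("SELECT id, priority FROM tbl ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, priority INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 2), (2, 1), (3, 4), (4, 3)])

move_row(conn, 4, 2)      # change the priority of id 4 to 2
after_move = snapshot(conn)
delete_row(conn, 2)
after_delete = snapshot(conn)
insert_row(conn, 5, 1)
after_insert = snapshot(conn)
print(after_move, after_delete, after_insert)
```

After each operation the priorities are again exactly 1..n, so no separate renumbering pass is needed.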