Storing item positions (for ordering) in a database efficiently - sql

Scenario:
There is a database of movies a user owns. The movies are displayed on a page called "my-movies" and can be shown in whatever order the user desires, for example "Fight Club" in position #1, "Drive" in position #3, and so on.
The obvious solution is to store a position with each item, for example:
movieid, userid, position
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
Then, when outputting, the data is ordered by position. This works fine for output, but it has a problem when updating: when the position of one item changes, all the other positions need to be updated too, because positions are relative. If movie #3 moves to position #2, then movie #2 needs to be updated to position #3. If the database contains 10,000 movies and a movie is moved from position #1 to position #9999, that's almost 10,000 rows to be updated!
My only solution is to store the positioning separately: instead of an individual field for each item's position, there is one big data dump of positions (JSON, XML, whatever) that is read at run time and associated with each item. But that feels... inefficient, because the database can't be left to do the sorting.
My summarised question: what's the most efficient way of storing items' positions in a list that is friendly to both fetching and updating?

August 2022: Note that the below is flawed and doesn't work when moving a movie down the list. I've posted a new answer here which fixes this issue.
If you use a combination of the position and a timestamp that the user put a movie in a given position rather than trying to maintain the actual position, then you can achieve a fairly simple means of both SELECTing and UPDATEing the data. For example; a base set of data:
create table usermovies (userid int, movieid int, position int, positionsetdatetime datetime)
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 99, 1, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 98, 2, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 97, 3, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 96, 4, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 95, 5, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 94, 6, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 99, 1, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 98, 2, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 97, 3, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 96, 4, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 95, 5, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 94, 6, getutcdate())
If you query the user's movies using a query like this:
;with usermovieswithrank as (
select userid
, movieid
, dense_rank() over (partition by userid order by position asc, positionsetdatetime desc) as movierank
from usermovies
)
select * from usermovieswithrank where userid=123 order by userid, movierank asc
Then you'll get the expected result:
USERID MOVIEID MOVIERANK
123 99 1
123 98 2
123 97 3
123 96 4
123 95 5
123 94 6
To move one of the rankings of the movies we need to update the position and the positionsetdatetime columns. For example, if userid 123 moves movie 95 from rank 5 to rank 2 then we do this:
update usermovies set position=2, positionsetdatetime=getutcdate()
where userid=123 and movieid=95
Which results in this (using the SELECT query above following the update):
USERID MOVIEID MOVIERANK
123 99 1
123 95 2
123 98 3
123 97 4
123 96 5
123 94 6
Then if userid 123 moves movie 96 to rank 1:
update usermovies set position=1, positionsetdatetime=getutcdate()
where userid=123 and movieid=96
We get:
USERID MOVIEID MOVIERANK
123 96 1
123 99 2
123 95 3
123 98 4
123 97 5
123 94 6
Of course you'll end up with duplicate position column values within the usermovies table, but with this method you'll never show that column, you simply use it along with positionsetdatetime to determine a sorted rank for each user and the rank you determine is the real position.
If at some point you want the position column to properly reflect the movie rankings without reference to the positionsetdatetime you can use the movierank from the select query above to update the usermovies position column value, as it wouldn't actually affect the determined movie rankings.
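The limitation flagged in the August 2022 note can be reproduced with a minimal Python sketch. It assumes an in-memory dict in place of the usermovies table and a logical counter in place of positionsetdatetime; the names `fresh`, `ranked` and `move` are hypothetical helpers, not part of the answer.

```python
from itertools import count

clock = count(1)  # logical clock standing in for positionsetdatetime


def fresh():
    """Rows for one user: movieid -> (position, time the position was set)."""
    return {mid: (pos, next(clock))
            for pos, mid in enumerate([99, 98, 97, 96, 95, 94], start=1)}


def ranked(rows):
    """Order by position ascending, then by most recently set first."""
    return sorted(rows, key=lambda mid: (rows[mid][0], -rows[mid][1]))


def move(rows, movieid, new_position):
    """Mirror of the UPDATE: overwrite position and refresh the timestamp."""
    rows[movieid] = (new_position, next(clock))


up_rows = fresh()
move(up_rows, 95, 2)        # moving up: 95 lands at rank 2 as intended
up = ranked(up_rows)

down_rows = fresh()
move(down_rows, 98, 5)      # moving down: 98 was meant to land at rank 5
down = ranked(down_rows)
rank_of_98 = down.index(98) + 1
```

When a movie moves up, the newer timestamp puts it ahead of the row that already holds that position, which is what is wanted. When it moves down, the same tie-break still puts it ahead, so it ends up one rank too high.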

I've been struggling with what best to do in this situation and have come to the realisation that BY FAR the best solution is a list/array of the movies in the order you want them, e.g.:
userId, moviesOrder
1 : [4,3,9,1...]
Obviously you will serialise your array.
'that feels... inefficient'?
Consider a user with a list of 100 movies. Searching by position is one database query, a string-to-array conversion and then moviesOrder[index]. Possibly slower than a straight DB lookup, but still very, very fast.
OTOH, consider what happens if you change the order:
With a position stored in the DB you need up to 100 row changes, compared to an array splice. The linked list idea is interesting but doesn't work as presented, would break everything if a single element failed, and looks a hell of a lot slower too. Other ideas like leaving gaps or using floats are workable, although messy, and prone to failure at some point unless you periodically renumber (GC).
It seems like there should be a better way to do it in SQL, but there really isn't.
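As a rough illustration of the serialised-array idea, here is a minimal Python sketch. It assumes the moviesOrder column holds a JSON array of movie IDs; `move_item` is a hypothetical helper, and reading and writing the column is left out.

```python
import json


def move_item(serialized: str, movie_id: int, new_index: int) -> str:
    """Deserialise the stored order, splice the movie into place, re-serialise."""
    order = json.loads(serialized)
    order.remove(movie_id)             # take the movie out of its old slot
    order.insert(new_index, movie_id)  # put it back at the requested slot
    return json.dumps(order)


stored = json.dumps([4, 3, 9, 1])    # value of moviesOrder for one user
stored = move_item(stored, 1, 0)     # move movie 1 to the front
third_movie = json.loads(stored)[2]  # lookup by position is a plain index
```

A reorder is a single-row write however long the list is. The trade-off is that the database can no longer ORDER BY position on its own.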

Store the order linked-list style. Instead of saving the absolute position, save the ID of the previous item. That way a move only requires updating a few rows rather than the whole list.
movieid | userid | previousid
1 | 1 |
2 | 1 | 1
3 | 1 | 4
4 | 1 | 2
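To get the movies in order (this simple sort happens to work for the data above; in general the chain must be followed from the row with no previousid) ...
SELECT movieid FROM movies WHERE userid = 1 ORDER BY previousid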
-> 1, 2, 4, 3
To (say) move #4 up a space:
DECLARE @previousid int, @currentid int = 4
SET @previousid = (SELECT previousid FROM movies WHERE movieid = @currentid)
-- the movie that followed the current one now follows the preceding one
UPDATE movies SET previousid = @previousid WHERE previousid = @currentid
-- current movie's previous becomes its preceding's preceding
UPDATE movies SET previousid =
(SELECT previousid FROM movies WHERE movieid = @previousid)
WHERE movieid = @currentid
-- the preceding movie's previous becomes the current movie
UPDATE movies SET previousid = @currentid WHERE movieid = @previousid
That's 1 read + 3 writes, but it beats 10,000 writes.
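The pointer updates are easy to get wrong, so here is a minimal Python sketch of the same idea, assuming a dict of movieid -> previousid in place of the table (`ordered` and `move_up` are hypothetical helpers). It walks the chain from the head instead of sorting by previousid, and it repoints the item that followed the moved movie as well.

```python
def ordered(previous):
    """Walk the chain starting at the row whose previousid is None."""
    next_of = {prev: movie for movie, prev in previous.items()}
    result, current = [], next_of.get(None)
    while current is not None:
        result.append(current)
        current = next_of.get(current)
    return result


def move_up(previous, current):
    """Swap `current` with the item before it; touches at most three rows."""
    before = previous[current]
    if before is None:
        return  # already first, nothing to do
    follower = next((m for m, p in previous.items() if p == current), None)
    if follower is not None:
        previous[follower] = before      # follower now comes after `before`
    previous[current] = previous[before]  # current takes over before's predecessor
    previous[before] = current            # before now comes after current


previous = {1: None, 2: 1, 3: 4, 4: 2}  # same data as the table above
start = ordered(previous)
move_up(previous, 4)
after = ordered(previous)
naive_sort = sorted(previous, key=lambda m: (previous[m] is not None, previous[m] or 0))
```

After the move, sorting by previousid no longer matches the chain order, which is why reading the list back generally needs a chain walk (or a recursive query in SQL).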

Really interesting solutions here. Another possibility might be to store positions with some space, say multiples of 10 or 100.
ID NAME POSITION
7 A 100
9 B 200
13 C 300
15 D 400
21 F 500
This multiple of 100 can be applied to every new addition.
Then moving row C to position 1 means giving it a value 1 below the current first row's value (or, when moving elsewhere, 1 above the value of the row it should follow). Or even 50 below, so that the same trick remains possible in future.
ID NAME POSITION
7 A 100
9 B 200
13 C 50
15 D 400
21 F 500
This can be continued, and if there have been so many movements that no gap remains, all the rows are renumbered once again.
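A minimal Python sketch of the gap approach, assuming integer positions and a spacing of 100. `GAP`, `position_between` and `renumber` are hypothetical names; the midpoint rule is one possible way to pick the new value.

```python
GAP = 100


def position_between(before, after):
    """Pick an integer strictly between two neighbours, or None if no room is left.

    `before`/`after` are the neighbouring positions; None means list start/end.
    """
    low = before if before is not None else 0
    high = after if after is not None else low + 2 * GAP
    middle = (low + high) // 2
    return middle if low < middle < high else None


def renumber(rows):
    """Reset positions to multiples of GAP, preserving the current order."""
    ordered_ids = sorted(rows, key=rows.get)
    return {item_id: (i + 1) * GAP for i, item_id in enumerate(ordered_ids)}


rows = {"A": 100, "B": 200, "C": 300, "D": 400, "F": 500}
rows["C"] = position_between(None, rows["A"])  # move C to the front
no_room = position_between(50, 51)             # adjacent values: gap exhausted
rows = renumber(rows)                          # the occasional full reorder
```

Only the moved row is written until a gap runs out. At that point `position_between` returns None and the whole list is renumbered in one pass.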

ID NAME POSITION
7 A 1
9 B 2
13 C 3
15 D 4
21 F 5
Given the current scenario, if we want to move item D to position 2, we can take the interval between 2 (the position we want to move the item to) and 4 (the item's current position) and write a query that adds 1 to the position of every element inside this interval. In this case the steps are:
Search for items in the interval where position >= 2 AND position < 4, and add 1 to their position.
Set item D's position to 2.
This now gives:
A->1, B->3, C-> 4, D->2, F->5
If we instead want to move B to D's position, we need to do the opposite and subtract 1:
Search for items in the interval where position > 2 AND position <= 4 and subtract 1 from their position.
Set item B's position to 4.
When deleting an item from the table, we need to update every item whose position is greater than the position of the element being deleted.
And when creating an item, its position is equal to the COUNT of existing items + 1.
DISCLAIMER: If you have a really large number of items this solution may not be what you want, but for most cases it will do. Normally a user won't move an item from position 10000 to position 2, but if the user deletes item 1 then the query will subtract 1 from the 9999 remaining items. If this is your scenario then the linked-list solution is probably better for you, but then ordering will be more challenging because you need to go item by item to see what's next on the list.
Example queries
-- MOVE DOWN
UPDATE movie SET position = position-1 WHERE position <= 18 AND position > 13 AND id > 0;
UPDATE movie SET position = 18 WHERE id = 130;
-- MOVE UP
UPDATE movie SET position = position+1 WHERE position < 18 AND position >= 13 AND id > 0;
UPDATE movie SET position = 13 WHERE id = 130;
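The two cases can be folded into one helper. Below is a minimal runnable sketch using Python's built-in sqlite3 module with the A to F sample data; the `move` and `names` functions are hypothetical, and the table has no per-user column, so a real schema would also filter by user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [(7, "A", 1), (9, "B", 2), (13, "C", 3), (15, "D", 4), (21, "F", 5)],
)


def move(conn, item_id, new_pos):
    """Shift the rows between the old and new position, then place the item."""
    (old_pos,) = conn.execute(
        "SELECT position FROM movie WHERE id = ?", (item_id,)
    ).fetchone()
    with conn:  # both updates commit together, or roll back together
        if new_pos < old_pos:    # moving up: push the interval down by one
            conn.execute(
                "UPDATE movie SET position = position + 1 "
                "WHERE position >= ? AND position < ?", (new_pos, old_pos))
        elif new_pos > old_pos:  # moving down: pull the interval up by one
            conn.execute(
                "UPDATE movie SET position = position - 1 "
                "WHERE position > ? AND position <= ?", (old_pos, new_pos))
        conn.execute("UPDATE movie SET position = ? WHERE id = ?", (new_pos, item_id))


def names(conn):
    return [n for (n,) in conn.execute("SELECT name FROM movie ORDER BY position")]


move(conn, 15, 2)        # D from position 4 to position 2
after_up = names(conn)
move(conn, 9, 4)         # B (now at position 3) to position 4
after_down = names(conn)
```

Running both updates in one transaction keeps positions from being left half-shifted if the second statement fails.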

Further to my 2014 answer, I finally got back to this question and have built upon my previous approach to address its fatal flaw. I've come up with the following solution, using a SQL Server stored procedure to show the logic.
First, the table of movies:
CREATE TABLE [dbo].[usermovies]
([userid] [int], [movieid] [int], [position] [int], [subposition] [int])
And the test data. Note that when we load the data the initial movierank is set in the position column and the subposition is set to 0:
insert into usermovies (userid, movieid, position, subposition)
values (123, 99, 1, 0)
,(123, 98, 2, 0)
,(123, 97, 3, 0)
,(123, 96, 4, 0)
,(123, 95, 5, 0)
,(123, 94, 6, 0)
,(987, 99, 1, 0)
,(987, 98, 2, 0)
,(987, 97, 3, 0)
,(987, 96, 4, 0)
,(987, 95, 5, 0)
,(987, 94, 6, 0)
It is important to understand that the rank of each movie (movierank) is not determined from the position value, but instead from the rank of the row when the records are sorted by position and then by subposition. I created a view to provide the movierank:
CREATE OR ALTER VIEW vwUserMoviesWithRank
as
with userMoviesWithRanks as (
SELECT *
, dense_rank() over (partition by userid order by position asc, subposition asc) as movierank
FROM usermovies
)
SELECT * FROM userMoviesWithRanks
GO
Each user can only have one movie with a given position/subposition value, as this provides the unique movierank. Adding a unique clustered index to the table nicely enforces this rule and also, with sufficient data, would make for faster data access.
CREATE UNIQUE CLUSTERED INDEX [IX_usermovies]
ON [dbo].[usermovies] ([userid] ASC, [position] ASC, [subposition] ASC)
The below stored procedure performs the updates which allow a user's movie rankings to be changed. I've added comments to help explain the logic:
CREATE OR ALTER PROC proc_ChangeUserMovieRank
@userID int,
@movieID int,
@moveToRank int
as
DECLARE @moveFromRank int
DECLARE @movieIDAtNewRank int
DECLARE @positionAtNewRank int
DECLARE @subpositionAtNewRank int
IF @moveToRank<1 THROW 51000, '@moveToRank must be >= 1', 1;
BEGIN TRAN
-- Get the current rank of the movie being moved
SELECT @moveFromRank=movierank FROM vwUserMoviesWithRank WHERE userid=@userID and movieid=@movieID
IF @moveFromRank<>@moveToRank BEGIN
-- Get the position and subposition of the movie we need to shift down the list
-- if this move is decreasing the movie rank then we need to shift the movie at @moveToRank
-- if this move is increasing the movie rank then we need to shift the movie at @moveToRank+1, to accommodate the removal
SELECT @positionAtNewRank=position, @subpositionAtNewRank=subposition
FROM vwUserMoviesWithRank
WHERE userid=@userID and movierank=(@moveToRank + CASE WHEN @moveToRank>@moveFromRank THEN 1 ELSE 0 END)
IF @positionAtNewRank IS NULL BEGIN
-- No movie needs to be updated, so we're adding to the end of the list
-- Our destination is the position+1 of the highest ranked movie (with subposition=0)
SELECT @positionAtNewRank=max(p.position)+1, @subpositionAtNewRank=0
FROM vwUserMoviesWithRank p WHERE p.userid=@userID
END ELSE BEGIN
-- Move down (increase the subposition of) any movies with the same position value as the destination rank
UPDATE m
SET subposition=subposition+1
FROM usermovies m
WHERE userid=@userID AND position=@positionAtNewRank and subposition>=@subpositionAtNewRank
END
-- Finally move the movie to the new rank
UPDATE m
SET position=@positionAtNewRank, subposition=@subpositionAtNewRank
FROM usermovies m
WHERE m.userid=@userID AND m.movieid=@movieID
END
COMMIT TRAN
GO
Here's a test run using the test data above. The movies are listed using the following SELECT statement; I haven't repeated it each time below, for the sake of brevity. Here's our movie ranking at the beginning:
SELECT movieid, movierank FROM vwUserMoviesWithRank WHERE userid=123 ORDER BY movierank
movieid movierank
----------- --------------------
99 1
98 2
97 3
96 4
95 5
94 6
Let's move movie 98 to rank 5:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=98, @moveToRank=5
movieid movierank
----------- --------------------
99 1
97 2
96 3
95 4
98 5
94 6
Move movie 94 to rank 2:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=94, @moveToRank=2
movieid movierank
----------- --------------------
99 1
94 2
97 3
96 4
95 5
98 6
Move movie 95 to rank 1:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=95, @moveToRank=1
movieid movierank
----------- --------------------
95 1
99 2
94 3
97 4
96 5
98 6
Move movie 99 to rank 4:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=99, @moveToRank=4
movieid movierank
----------- --------------------
95 1
94 2
97 3
99 4
96 5
98 6
Move movie 97 to rank 6:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=97, @moveToRank=6
movieid movierank
----------- --------------------
95 1
94 2
99 3
96 4
98 5
97 6
Move movie 97 to rank 4:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=97, @moveToRank=4
movieid movierank
----------- --------------------
95 1
94 2
99 3
97 4
96 5
98 6
Move movie 95 to rank 4:
EXEC proc_ChangeUserMovieRank @userID=123, @movieID=95, @moveToRank=4
movieid movierank
----------- --------------------
94 1
99 2
97 3
95 4
96 5
98 6
Which all looks good I think.
Note that following these operations the position/subposition data now looks like this:
select * from vwUserMoviesWithRank WHERE userid=123 order by movierank
userid movieid position subposition movierank
----------- ----------- ----------- ----------- --------------------
123 94 3 0 1
123 99 4 0 2
123 97 4 1 3
123 95 4 2 4
123 96 4 3 5
123 98 6 0 6
The values are quite different from the determined movierank.
As the movie rankings change the position may become the same across a number of rows, such as position 4 above. When this happens more rows will need to be updated when the rankings change, so it is advisable to periodically reset the position and subposition to the movierank value:
UPDATE usermovies
SET position=vwUserMoviesWithRank.movierank, subposition=0
FROM vwUserMoviesWithRank
INNER JOIN usermovies on usermovies.userid=vwUserMoviesWithRank.userid AND usermovies.movieid=vwUserMoviesWithRank.movieid
WHERE usermovies.position<>vwUserMoviesWithRank.movierank OR usermovies.subposition<>0
This works very efficiently and will scale very well and I think it all works, let me know if you think otherwise and I'll take another look (and this time I won't wait 8 years to do so!)
And just to note that I tried to add a SQL Fiddle link here but it appears that they have no SQL Server hosts presently :-/

Related

Getting two different types of sums with only one row

I have a table that looks like this:
id code total
1 2 30
1 4 60
1 2 31
2 2 10
2 4 11
What I'd like to do, is basically get one row per id for the sum of records for code 2 and the sum of records for all codes for that id. So something like this:
id code2_total overall
1 61 121
2 10 21
I've tried the following:
select id
, abs(sum(total) over (partition by id)) as overall
, (select sum(total) from table where code = '2' group by id) as code2_total
from table limit 1
But I'm getting multiple items in the subquery error. How can I achieve something like this?
Use group by with a regular sum and a conditional sum (i.e. using a case expression).
declare @MyTable table (id int, code int, total int);
insert into @MyTable (id, code, total)
values
(1, 2, 30),
(1, 4, 60),
(1, 2, 31),
(2, 2, 10),
(2, 4, 11);
select id
, sum(case when code = 2 then total else 0 end) code2_total
, sum(total) overall
from @MyTable
group by id
order by id;
Returns
id code2_total overall
1 61 121
2 10 21
Note limit 1 is MySQL not SQL Server and doesn't help you here anyway.
Note also that providing the DDL+DML as I have shown here makes it much easier for people to assist.
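For anyone without a SQL Server instance to hand, the same conditional aggregation can be tried with Python's built-in sqlite3 module. This is a minimal sketch; the table name `t` is an assumption, and the CASE expression is the same idea as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, code INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, 30), (1, 4, 60), (1, 2, 31), (2, 2, 10), (2, 4, 11)],
)

rows = conn.execute(
    """
    SELECT id,
           SUM(CASE WHEN code = 2 THEN total ELSE 0 END) AS code2_total,  -- only code 2
           SUM(total)                                    AS overall       -- every code
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()
```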

SQL - Get the value from the next lowest ID

I'm working in SQL Server 2014 Management Studio.
Not really sure how to explain this but it's best if I just explain with an example.
So I've figured out how to get the next lowest ID, that is fairly simple. But once I get that row I need to take the value from it and apply it to the row with the next highest ID.
If I have 4 rows
ID value
-------------
10 50
30 200
20 75
25 100
I want to take the value of each row and apply it to the row with the next highest ID. So it should look like this.
ID value
-------------
10 null or 0
30 100
20 50
25 75
Since there is no row before 10 ID, that row should have a value of null or 0, doesn't matter. And the others should just follow the pattern of taking the value from the row with the next lowest ID.
You're looking for LAG():
Select Id, Lag(Value) Over (Order By Id) As Value
From YourTable;
Working demo:
Declare @YourTable Table
(
Id Int,
Value Int
);
Insert @YourTable
Values (10, 50), (30, 200), (20, 75), (25, 100);
Select Id, Lag(Value) Over (Order By Id) As Value
From @YourTable;
Results
Id Value
10 NULL
20 50
25 75
30 100
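To make what LAG() does concrete, here is a minimal Python sketch of the same computation on the sample rows. The variable names are hypothetical; it sorts by ID and pairs each row with the value of the row before it.

```python
rows = [(10, 50), (30, 200), (20, 75), (25, 100)]  # (Id, Value) from the question

by_id = sorted(rows)                                      # ORDER BY Id
previous_values = [None] + [value for _, value in by_id[:-1]]  # shifted by one row
lagged = [(row_id, prev) for (row_id, _), prev in zip(by_id, previous_values)]
```

The first row has no predecessor, so it gets None, just as LAG() returns NULL there.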

PostgreSQL Order by stepping numbers

I need to order records from a table by a column. The old system the customer was using manually selected level 1 items, then all the children of level 1 items for level 2, then so on and so forth through level 5. That is horrible IMHO, as it requires hundreds of queries and calls to the DB.
So in the new DB structure I'm trying to make it all one query to the DB if possible and have it order it correctly the first time. The customer wants it displayed to them this way so I have no choice but to figure out a way to order this way.
This is an example of the items and their level codes (1 being the single digit codes, 2 the 2 digit codes, 3 for 4 digit codes, 4 for 6 digit codes and level 5 for 8 digit codes):
It's supposed to order basically everything that starts with a 5 goes under Code 5. Everything that starts with a 51 goes under code 51. If you look at the column n_mad_id it links to the "Mother" ID of the code that is the mother of that code, so code 51's mother is code 5. Code 5101's mother is code 51. Code 5201's mother is code 52. And so on and so forth.
Then the n_nivel column is the level that the code belongs to. Each code has a level and a mother. The top level codes (i.e. 1, 2, 3, 4, 5) are all level 1 since they are only one digit.
I was hoping that there might be an easy ORDER BY way to do this. I've been playing with it for two days and can't seem to get it to obey.
The absolutely simplest way would be to cast the n_cod field to text and then order on that:
SELECT *
FROM mytable
WHERE left(n_cod::text, 1) = '5' -- optional
ORDER BY n_cod::text;
Not pretty, but functional.
You could consider changing your table definition to make n_cod of type char(8) because you do not use it as a number anyway (in the sense of performing calculations). That would make the query a lot faster.
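Why the text cast gives the wanted hierarchy can be seen in a short Python sketch. The code values below are assumptions modelled on the 5 / 51 / 5101 examples in the question, not the customer's real data.

```python
codes = [52, 5, 510101, 51, 5201, 5101]  # hypothetical n_cod values

# Comparing the codes as text sorts every code directly after its prefix
# (its "mother"), which is the display order asked for.
as_text = sorted(codes, key=str)

# Comparing them as numbers groups by length instead, which is not wanted.
as_numbers = sorted(codes)
```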
Interesting task. As I understand it, you want to get the result in an order like
n_id n_cod n_nivel n_mad_id
10 5 1 0
11 51 2 10
12 5101 3 11
14 510101 4 12
...
13 52 2 10
...
?
If so, this may do the trick:
with recursive
tt(n_id, n_mad_id, n_cod, x) as (
select t.n_id, t.n_mad_id, t.n_cod, array[t.n_id]
from yourtable t where t.n_mad_id = 0
union all
select t.n_id, t.n_mad_id, t.n_cod, x || t.n_id
from tt join yourtable t on t.n_mad_id = tt.n_id)
select * from tt order by x;
Here is my original test query:
create table t(id, parent) as values
(1, null),
(3, 1),
(7, 3),
(5, 3),
(6, 5),
(2, null),
(8, 2),
(4, 2);
with recursive
tt(id, parent, x) as (
select t.id, t.parent, array[t.id] from t where t.parent is null
union all
select t.id, t.parent, x || t.id from tt join t on t.parent = tt.id)
select * from tt order by x;
and its result:
id | parent | x
----+--------+-----------
1 | (null) | {1}
3 | 1 | {1,3}
5 | 3 | {1,3,5}
6 | 5 | {1,3,5,6}
7 | 3 | {1,3,7}
2 | (null) | {2}
4 | 2 | {2,4}
8 | 2 | {2,8}
(8 rows)
Read about recursive queries.

Creating sequentially increasing groups based on number change

How can I code this in oracle SQL?
I have the below data
Current Result
I want to generate a result that looks like the following:
Desired Result
So, I essentially want the group ID to increase as the row number changes back to 1. I am trying to use row_number, rank() and partition functions but it is not working properly. Desperate for help!
Thanks
EDIT (by Gordon):
The original question had the data in question. It is much, much better to have the values in the question as text than to refer to an image, so I'm adding it back in:
Code Row Number
214 1
214 2
210 1
210 2
210 3
214 1
I want to generate a result that looks like the following:
Code Row Number Group Id
214 1 1
214 2 1
210 1 2
210 2 2
210 3 2
214 1 3
In order to do what you want, you need a column that specifies the ordering of the rows in the table. Let me assume that you have an id or creation date or something similar.
If so, then what you want is simply a cumulative sum of the number of times that the second column is 1:
select t.*,
sum(case when RowNumber = 1 then 1 else 0 end) over (order by id) as GroupId
from t;
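The cumulative-sum idea can be checked quickly in Python. This is a minimal sketch that assumes the rows are already in the intended order, standing in for the id column the query orders by.

```python
from itertools import accumulate

# (Code, RowNumber) in the order the rows are meant to be read
rows = [(214, 1), (214, 2), (210, 1), (210, 2), (210, 3), (214, 1)]

# A running total of "this row starts a new run" flags is the group id.
group_ids = list(accumulate(1 if row_number == 1 else 0 for _, row_number in rows))
```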
It's still not clear which field is the ID, because if it's rownumber, as you said, it's not going to work the way you have it in the expected output.
create table test (id int , code int, rownumber int);
insert into test values (1,214,1);
insert into test values (2,214,2);
insert into test values (3,210,1);
insert into test values (4,210,2);
insert into test values (5,210,3);
insert into test values (6,214,1);
select s.code, sum(add_group) over (order by id) from (
select id, code, case when rownumber=1 then 1 else 0 end as add_group from test
order by id
) s
CODE SUM(ADD_GROUP)OVER(ORDERBYID)
1 214 1
2 214 1
3 210 2
4 210 2
5 210 2
6 214 3
BTW the answer of @Gordon Linoff works better and does exactly what you want, but you need to add an additional field for the ORDER BY.

How to track how many times a column changed its value?

I have a table called crewWork as follows :
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int )
After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor.
I could just write a scalar function for that, but I was wondering if there's any way to do that in t-SQL without having to write scalar functions.
Thanks
The data will look like this:
FloorNumber AptNumber WorkType simTime
1 1 12 10
1 1 12 25
1 1 13 35
1 1 13 47
1 2 12 52
1 2 12 59
1 2 13 68
1 1 14 75
1 4 12 79
1 4 12 89
1 4 13 92
1 4 14 105
1 3 12 115
1 3 13 129
1 3 14 138
2 1 12 142
2 1 12 150
2 1 14 168
2 1 14 171
2 3 12 180
2 3 13 190
2 3 13 200
2 3 14 205
3 3 14 216
3 4 12 228
3 4 12 231
3 4 14 249
3 4 13 260
3 1 12 280
3 1 13 295
2 1 14 315
2 2 12 328
2 2 14 346
I need the information for a report, I don't need to store it anywhere.
If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.
CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)
I explain the potential for wrong results in my comments on the accepted answer.
In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly, to produce correct results.
The accepted answer is clever but needs a small change to produce correct results (as demonstrated in the above db<>fiddle and described here):
Instead of doing this, as seen in the accepted answer: COUNT(DISTINCT AptGroup)...
You should do this: COUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...
DDL:
SELECT * INTO crewWork FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber AptNumber WorkType simTime
(1, 1, 12, 10 ),
-- (1, 1, 12, 25 ), -- original
(2, 1, 12, 25 ), -- new, changing FloorNumber 1->2->1
(1, 1, 13, 35 ),
(1, 1, 13, 47 ),
(1, 2, 12, 52 ),
(1, 2, 12, 59 ),
(1, 2, 13, 68 ),
(1, 1, 14, 75 ),
(1, 4, 12, 79 ),
-- (1, 4, 12, 89 ), -- original
(1, 1, 12, 89 ), -- new, changing AptNumber 4->1->4
(1, 4, 13, 92 ),
(1, 4, 14, 105 ),
(1, 3, 12, 115 ),
...
DML:
;
WITH groupedWithConcats as (SELECT
*,
CONCAT(AptGroup,'_', AptNumber) as AptCombo,
CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
-- SQL SERVER doesnt have TEMPORARY keyword; Postgres doesn't understand # for temp tables
-- INTO TEMPORARY groupedWithConcats
FROM
(
SELECT
-- the columns shown in Andriy's answer:
-- https://stackoverflow.com/a/8667477/1175496
ROW_NUMBER() OVER ( ORDER BY simTime) as RN,
-- AptNumber
AptNumber,
ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as RN_Apt,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as AptGroup,
-- FloorNumber
FloorNumber,
ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as RN_Floor,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as FloorGroup
FROM crewWork
) grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT
COUNT(DISTINCT AptCombo)-1 as CountAptChangesWithConcat_Correct,
COUNT(DISTINCT AptGroup)-1 as CountAptChangesWithoutConcat_Wrong,
COUNT(DISTINCT FloorCombo)-1 as CountFloorChangesWithConcat_Correct,
COUNT(DISTINCT FloorGroup)-1 as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;
ALTERNATIVE ANSWER
The accepted answer may eventually get updated to remove the mistake. If that happens I can remove my warning, but I still want to leave you with this alternative way to produce the answer.
My approach goes like this: "check the previous row; if the value differs between the previous row and the current row, then there is a change". SQL doesn't have an inherent idea of row order per se (at least not like Excel, for example).
Instead, SQL has window functions. You can use the window function ROW_NUMBER plus a self-JOIN technique, as seen here, to combine current-row values and previous-row values so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.
The intermediate table, showing the columns which have a value of 1 if there is a change and 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of the post...
DDL:
...same as above...
DML:
;
WITH rowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY simTime) as RN
FROM crewWork
)
,joinedOnItself AS (
SELECT
rowNumbered.*,
rowNumberedRowShift.FloorNumber as FloorShift,
rowNumberedRowShift.AptNumber as AptShift,
CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END as FloorChange,
CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END as AptChange
FROM rowNumbered
LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT
SUM(FloorChange) as FloorChanges,
SUM(AptChange) as AptChanges
FROM joinedOnItself;
Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).
floornumber aptnumber worktype simtime rn floorshift aptshift floorchange aptchange
1 1 12 10 1 (null) (null) 0 0
2 1 12 25 2 1 1 1 0
1 1 13 35 3 2 1 1 0
1 1 13 47 4 1 1 0 0
1 2 12 52 5 1 1 0 1
1 2 12 59 6 1 2 0 0
1 2 13 68 7 1 2 0 0
Note that instead of using the window function ROW_NUMBER and a JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:
Window functions allow access to data in the records right before and after the current record.
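The compare-with-previous-row logic that LAG would express can be sketched in a few lines of Python. This is only an illustration of the idea, using the first 16 rows of the question's original data as (FloorNumber, AptNumber, simTime); `count_changes` is a hypothetical helper.

```python
rows = [
    (1, 1, 10), (1, 1, 25), (1, 1, 35), (1, 1, 47),
    (1, 2, 52), (1, 2, 59), (1, 2, 68), (1, 1, 75),
    (1, 4, 79), (1, 4, 89), (1, 4, 92), (1, 4, 105),
    (1, 3, 115), (1, 3, 129), (1, 3, 138), (2, 1, 142),
]


def count_changes(values):
    """Count positions where a value differs from the one just before it."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


in_time_order = sorted(rows, key=lambda r: r[2])  # ORDER BY simTime
floor_changes = count_changes([floor for floor, _, _ in in_time_order])
apt_changes = count_changes([apt for _, apt, _ in in_time_order])
```

The first row has nothing before it, so it never counts as a change, matching the 0 values in the first row of the intermediate table.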
If I am not missing anything, you could use the following method to find the number of changes:
determine groups of sequential rows with identical values;
count those groups;
subtract 1.
Apply the method individually for AptNumber and for FloorNumber.
The groups could be determined like in this answer, only there isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:
;
WITH marked AS (
SELECT
FloorGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),
AptGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime)
FROM crewWork
)
SELECT
FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
AptChanges = COUNT(DISTINCT AptGroup) - 1
FROM marked
(I'm assuming here that the simTime column defines the timeline of changes.)
UPDATE
Below is a table that shows how the distinct groups are obtained for AptNumber.
AptNumber RN RN_Apt AptGroup (= RN - RN_Apt)
--------- -- ------ ---------
1 1 1 0
1 2 2 0
1 3 3 0
1 4 4 0
2 5 1 4
2 6 2 4
2 7 3 4
1 8 5 => 3
4 9 1 8
4 10 2 8
4 11 3 8
4 12 4 8
3 13 1 12
3 14 2 12
3 15 3 12
1 16 6 10
… … … …
Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.
Another pseudo-column, RN_Apt, contains values produced by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.
You can also see from the table that if we subtract RN_Apt from RN (it could be the other way round, it doesn't matter in this situation), we get the value that identifies every distinct group of same AptNumber values. You might as well call that value a group ID.
So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).
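The RN minus RN_Apt trick is easy to experiment with outside the database. The minimal Python sketch below uses a small hypothetical AptNumber sequence (not the question's data) and a hypothetical `group_ids` helper. It also shows that the difference on its own can coincide for two different apartment values, so counting distinct (value, difference) pairs is the safer variant.

```python
from collections import defaultdict


def group_ids(values):
    """Return (value, RN - RN_within_value) for each row, in row order."""
    seen = defaultdict(int)  # running ROW_NUMBER per distinct value
    result = []
    for rn, value in enumerate(values, start=1):
        seen[value] += 1
        result.append((value, rn - seen[value]))
    return result


apts = [1, 2, 1, 2]      # hypothetical AptNumber values in simTime order
pairs = group_ids(apts)

changes_by_diff_only = len({diff for _, diff in pairs}) - 1  # COUNT(DISTINCT AptGroup) - 1
changes_by_pair = len(set(pairs)) - 1                        # value and difference together
actual_changes = sum(1 for a, b in zip(apts, apts[1:]) if a != b)
```

In this sequence the second and third rows belong to different apartments but share the difference 1, so counting differences alone undercounts by one.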
Add an extra column, changecount:
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int ,changecount int)
Increment the changecount value on each update.
If you want a count for each field, add a corresponding changecount column for each one.
Assuming that each record represents a different change, you can find changes per floor by:
select FloorNumber, count(*)
from crewWork
group by FloorNumber
And changes per apartment (assuming AptNumber uniquely identifies apartment) by:
select AptNumber, count(*)
from crewWork
group by AptNumber
Or (assuming AptNumber and FloorNumber together uniquely identify an apartment) by:
select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber