How to get a correct row_number reset using row_number()? - sql

Here's my Fiddle.
Requirement: Every time Entry Form changes, then reset numbering on new_form_line_no
The last column New_form_line_no correctly resets as expected on Line_number=7.
But, also want it to reset on line #3 because Entry_form changes from PR to OM.
Where should I make my correction to get the following results?
20 1 1 R OM 1
20 2 1 N PR 1
20 3 2 R OM 1 --This should reset to 1
20 4 3 A OM 2
20 5 4 2 OM 3
20 6 5 P OM 4
20 7 47 S OL 1
20 8 48 A OL 2
20 9 49 T OL 3
20 10 50 2 OL 4
20 11 51 T OL 5
20 12 52 L OL 6
20 13 53 S OL 7
20 14 54 O OL 8

The problem here is that the first two records can appear in either order since they both have the same values in columns used in ORDER BY clause. If you don't mind that, or you have another way to determine which should be first (and you can alter the ORDER BY in analytic functions below), you can try following solution:
SELECT
data.*,
row_number() OVER (PARTITION BY entry_id, entry_form, same_as_prev_or_next
ORDER BY entry_seq) AS new_form_line_no_2
FROM (
SELECT
entry_id,
row_number() OVER (ORDER BY entry_id, entry_seq) AS line_number,
entry_seq,
entry_text,
entry_form,
CASE
WHEN lag(entry_form, 1, 0) OVER (PARTITION BY entry_id ORDER BY entry_seq) = entry_form
OR lead(entry_form, 1, 0) OVER (PARTITION BY entry_id ORDER BY entry_seq) = entry_form THEN 1
ELSE 0
END AS same_as_prev_or_next
FROM entrants
) data
ORDER BY entry_seq
;
It does not return what you expect, but it is due to the fact I mentioned - the order of the first two rows is inconclusive.
SQLFiddle: http://sqlfiddle.com/#!4/abb58/18

Related

calculate avg(value) for last 10 records postgresql

i have a tricky task,
lets assume we have table "Racings", and there we have columns TRACK, CAR, CIRCLE_TIME
here is an example how data could be look like:
id
track
car
circle_time
10
1
10
15
9
1
10
14
8
1
10
16
7
1
10
15
6
1
10
13
5
2
10
7
4
2
10
4
3
2
10
5
2
3
10
8
1
3
10
10
what i need, i to add one more coumn like avg3_circle_time which will show me an average time from last 3 circle_time from each track, example:
id
track
car
circle_time
avg3_circle_time
10
1
10
15
15
9
1
10
14
15
8
1
10
16
14.6
7
1
10
15
null
6
1
10
13
null
5
2
10
7
5.3
4
2
10
4
null
3
2
10
5
null
2
3
10
8
null
1
3
10
10
null
I know how it could works in oracle, you could use something like rowid, but in case of postgresql i don't know, i have a draft like .....avg(circle_time) OVER(PARTITION BY track,car.....) as avg3_circle_time..... help me to solve that task please
You can use window functions to calculate moving averages:
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
PARTITION BY track
ORDER BY id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
FROM t
ORDER BY track, id
Depending on your definition of previous three, the window could be ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
If you want only values when at least 3 circles available
select *
, case when lag(id, 2) over(partition by TRACK, CAR order by id) is not null then
avg(CIRCLE_TIME) over(partition by TRACK, CAR order by id rows between 2 preceding and current row) end a
from Racing
order by id desc;
db<>fiddle
Output
id track car circle_time a
10 1 10 15 15.0000000000000000
9 1 10 14 15.0000000000000000
8 1 10 16 14.6666666666666667
7 1 10 15 null
6 1 10 13 null
5 2 10 7 5.3333333333333333
4 2 10 4 null
3 2 10 5 null
2 3 10 8 null
1 3 10 10 null
Use LAED() then checking one of the next 2 rows is NULL or not. THEN sum of three values for calculating average.
-- PostgreSQL
SELECT *
, CASE WHEN next_circle_time IS NULL OR next_next_circle_time IS NULL
THEN NULL
ELSE ((t.circle_time + COALESCE(next_circle_time, 0) + COALESCE(next_next_circle_time, 0)) / 3 :: DECIMAL) :: DECIMAL(10, 1)
END avg_circle_time
FROM (SELECT *
, LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) next_circle_time
, LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) next_next_circle_time
FROM Racings) t
Another way Use AVG()
SELECT *
, CASE WHEN LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
OR LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
THEN NULL
ELSE AVG(circle_time) OVER (PARTITION BY track ORDER BY id DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END :: DECIMAL(10, 2) avg_circle_time
FROM Racings
Please check from url where both query exists https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f0cd868623725a1b92bf988cfb2deba3
Several of the posted answers end up repeating the window definition. You can avoid this with the window clause:
select *,
case when row_number() over(track_window) > 2
then trunc(avg(CIRCLE_TIME) over(track_window rows 2 preceding), 1)
end a
from Racing
window track_window as (partition by track order by id)
order by id desc
Note how, in this sample, track_window is defined once, then reused for both row_number and avg. In the latter case, the window clause is embellished with a frame as well (rows 2 preceding).

MSSQL choose top 2 ranked for each record

Hi I have a table that looks like this
StudentID
ParentID
Rank
1
11
1
1
15
5
1
16
6
2
21
1
2
22
2
3
31
1
3
37
7
3
38
8
4
41
1
4
42
2
So I want to pull only the top 2 ranks per student the out come will look like:
StudentID
ParentID
Rank
1
11
1
1
15
5
2
21
1
2
22
2
3
31
1
3
37
7
4
41
1
4
42
2
What would be the best way to do this? What makes it complicated is that every student has a parent ranked 1 but not every student has a parent ranked 2. What sql statement would I use to pull just the next ranked parent after 1?
The canonical solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by studentid order by rank) as seqnum
from t
) t
where seqnum <= 2;
Under some circumstances, it might be faster to use:
select t.*
from t
where t.rank in (select top (2) t2.rank
from t t2
where t2.studentid = t.studentid
order by rank asc
) ;
I think this might have better performance if there were few students that had lots of ranks per student and you had an index on studentid, rank.

Delete rows, which are duplicated and follow each other consequently

It's hard to formulate, so i'll just show an example and you are welcome to edit my question and title.
Suppose, i have a table
flag id value datetime
0 b 1 343 13
1 a 1 23 12
2 b 1 21 11
3 b 1 32 10
4 c 2 43 11
5 d 2 43 10
6 d 2 32 9
7 c 2 1 8
For each id i want to squeze the table by flag columns such that all duplicate flag values that follow each other collapse to one row with sum aggregation. Desired result:
flag id value
0 b 1 343
1 a 1 23
2 b 1 53
3 c 2 75
4 d 2 32
5 c 2 1
P.S: I found functions like CONDITIONAL_CHANGE_EVENT, which seem to be able to do that, but the examples of them in docs dont work for me
Use the differnece of row number approach to assign groups based on consecutive row flags being the same. Thereafter use a running sum.
select distinct id,flag,sum(value) over(partition by id,grp) as finalvalue
from (
select t.*,row_number() over(partition by id order by datetime)-row_number() over(partition by id,flag order by datetime) as grp
from tbl t
) t
Here's an approach which uses CONDITIONAL_CHANGE_EVENT:
select
flag,
id,
sum(value) value
from (
select
conditional_change_event(flag) over (order by datetime desc) part,
flag,
id,
value
from so
) t
group by part, flag, id
order by part;
The result is different from your desired result stated in the question because of order by datetime. Adding a separate column for the row number and sorting on that gives the correct result.

Add order within group and mark whether a row is the last in its group

I have a table in an SQL Server database on the following form, sorted according to id.
id group
1 10
17 10
24 10
2 20
16 20
72 20
104 20
8 30
9 30
I would like to select every row grouped according to the row group and add the following information to this table: the order (as sorted) within the group and whether the row is the last row in the group. In other words, something similar to this:
id group order last
1 10 1 0
17 10 2 0
24 10 3 1
2 20 1 0
16 20 2 0
72 20 3 0
104 20 4 1
8 30 1 0
9 30 2 1
I've tried fiddling around with ROW_NUMBER, but I'm not all that experienced with SQL Server and I can't get it to work. Does anyone have a suggestion?
Use ROW_NUMBER window function
select id,[group],
row_number()over(partition by [group] order by id) as [order],
case when row_number()over(partition by [group] order by id desc) = 1 then 1 else 0 end as Last
From yourtable

How to track how many times a column changed its value?

I have a table called crewWork as follows :
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int )
After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor.
I could just write a scalar function for that, but I was wondering if there's any way to do that in t-SQL without having to write scalar functions.
Thanks
The data will look like this:
FloorNumber AptNumber WorkType simTime
1 1 12 10
1 1 12 25
1 1 13 35
1 1 13 47
1 2 12 52
1 2 12 59
1 2 13 68
1 1 14 75
1 4 12 79
1 4 12 89
1 4 13 92
1 4 14 105
1 3 12 115
1 3 13 129
1 3 14 138
2 1 12 142
2 1 12 150
2 1 14 168
2 1 14 171
2 3 12 180
2 3 13 190
2 3 13 200
2 3 14 205
3 3 14 216
3 4 12 228
3 4 12 231
3 4 14 249
3 4 13 260
3 1 12 280
3 1 13 295
2 1 14 315
2 2 12 328
2 2 14 346
I need the information for a report, I don't need to store it anywhere.
If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.
CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)
I explain the potential for wrong results in my comments on the accepted answer.
In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly, to produce correct results.
The accepted answer is clever but needs a small change to produce correct results (as demonstrated in the above db<>fiddle and described here:
Instead of doing this as seen in the accepted answer COUNT(DISTINCT AptGroup)...
You should do thisCOUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...
DDL:
SELECT * INTO crewWork FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber AptNumber WorkType simTime
(1, 1, 12, 10 ),
-- (1, 1, 12, 25 ), -- original
(2, 1, 12, 25 ), -- new, changing FloorNumber 1->2->1
(1, 1, 13, 35 ),
(1, 1, 13, 47 ),
(1, 2, 12, 52 ),
(1, 2, 12, 59 ),
(1, 2, 13, 68 ),
(1, 1, 14, 75 ),
(1, 4, 12, 79 ),
-- (1, 4, 12, 89 ), -- original
(1, 1, 12, 89 ), -- new , changing AptNumber 4->1->4 ges)
(1, 4, 13, 92 ),
(1, 4, 14, 105 ),
(1, 3, 12, 115 ),
...
DML:
;
WITH groupedWithConcats as (SELECT
*,
CONCAT(AptGroup,'_', AptNumber) as AptCombo,
CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
-- SQL SERVER doesnt have TEMPORARY keyword; Postgres doesn't understand # for temp tables
-- INTO TEMPORARY groupedWithConcats
FROM
(
SELECT
-- the columns shown in Andriy's answer:
-- https://stackoverflow.com/a/8667477/1175496
ROW_NUMBER() OVER ( ORDER BY simTime) as RN,
-- AptNumber
AptNumber,
ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as RN_Apt,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as AptGroup,
-- FloorNumber
FloorNumber,
ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as RN_Floor,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as FloorGroup
FROM crewWork
) grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT
COUNT(DISTINCT AptCombo)-1 as CountAptChangesWithConcat_Correct,
COUNT(DISTINCT AptGroup)-1 as CountAptChangesWithoutConcat_Wrong,
COUNT(DISTINCT FloorCombo)-1 as CountFloorChangesWithConcat_Correct,
COUNT(DISTINCT FloorGroup)-1 as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;
ALTERNATIVE ANSWER
The accepted-answer may eventually get updated to remove the mistake. If that happens I can remove my warning but I still want leave you with this alternative way to produce the answer.
My approach goes like this: "check the previous row, if the value is different in previous row vs current row, then there is a change". SQL doesn't have idea or row order functions per se (at least not like in Excel for example; )
Instead, SQL has window functions. With SQL's window functions, you can use the window function RANK plus a self-JOIN technique as seen here to combine current row values and previous row values so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.
The intermediate table, showing the columns which has a value 1 if there is a change, 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of the post...
DDL:
...same as above...
DML:
;
WITH rowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY simTime) as RN
FROM crewWork
)
,joinedOnItself AS (
SELECT
rowNumbered.*,
rowNumberedRowShift.FloorNumber as FloorShift,
rowNumberedRowShift.AptNumber as AptShift,
CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END as FloorChange,
CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END as AptChange
FROM rowNumbered
LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT
SUM(FloorChange) as FloorChanges,
SUM(AptChange) as AptChanges
FROM joinedOnItself;
Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).
floornumber
aptnumber
worktype
simtime
rn
floorshift
aptshift
floorchange
aptchange
1
1
12
10
1
0
0
2
1
12
25
2
1
1
1
0
1
1
13
35
3
2
1
1
0
1
1
13
47
4
1
1
0
0
1
2
12
52
5
1
1
0
1
1
2
12
59
6
1
2
0
0
1
2
13
68
7
1
2
0
0
Note instead of using the window function RANK and JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:
Window functions allow access to data in the records right before and after the current record.
If I am not missing anything, you could use the following method to find the number of changes:
determine groups of sequential rows with identical values;
count those groups;
subtract 1.
Apply the method individually for AptNumber and for FloorNumber.
The groups could be determined like in this answer, only there's isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:
;
WITH marked AS (
SELECT
FloorGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),
AptGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime)
FROM crewWork
)
SELECT
FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
AptChanges = COUNT(DISTINCT AptGroup) - 1
FROM marked
(I'm assuming here that the simTime column defines the timeline of changes.)
UPDATE
Below is a table that shows how the distinct groups are obtained for AptNumber.
AptNumber RN RN_Apt AptGroup (= RN - RN_Apt)
--------- -- ------ ---------
1 1 1 0
1 2 2 0
1 3 3 0
1 4 4 0
2 5 1 4
2 6 2 4
2 7 3 4
1 8 5 => 3
4 9 1 8
4 10 2 8
4 11 3 8
4 12 4 8
3 13 1 12
3 14 2 12
3 15 3 12
1 16 6 10
… … … …
Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.
Another pseudo-column, RN_Apt contains values produces by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.
You can also see from the table that if we subtract RN from RN_Apt (could be the other way round, doesn't matter in this situation), we get the value that uniquely identifies every distinct group of same AptNumber values. You might as well call that value a group ID.
So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).
add an extra column changecount
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int ,changecount int)
increment changecount value for each updation
if want to know count for each field then add columns corresponding to it for changecount
Assuming that each record represents a different change, you can find changes per floor by:
select FloorNumber, count(*)
from crewWork
group by FloorNumber
And changes per apartment (assuming AptNumber uniquely identifies apartment) by:
select AptNumber, count(*)
from crewWork
group by AptNumber
Or (assuming AptNumber and FloorNumber together uniquely identifies apartment) by:
select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber