Count similar names block per independent partitions - sql

I have a dataframe that looks like this:
id name datetime
44 once 2022-11-22T15:41:00
44 once 2022-11-22T15:42:00
44 once 2022-11-22T15:43:00
44 twice 2022-11-22T15:44:00
44 once 2022-11-22T16:41:00
55 thrice 2022-11-22T17:44:00
55 thrice 2022-11-22T17:46:00
55 once 2022-11-22T17:47:00
55 once 2022-11-22T17:51:00
55 twice 2022-11-22T18:41:00
55 thrice 2022-11-22T18:51:00
My desired output is
id name datetime cnt
44 once 2022-11-22T15:41:00 3
44 once 2022-11-22T15:42:00 3
44 once 2022-11-22T15:43:00 3
44 twice 2022-11-22T15:44:00 1
44 once 2022-11-22T16:41:00 1
55 thrice 2022-11-22T17:44:00 2
55 thrice 2022-11-22T17:46:00 2
55 once 2022-11-22T17:47:00 2
55 once 2022-11-22T17:51:00 2
55 twice 2022-11-22T18:41:00 1
55 thrice 2022-11-22T18:51:00 1
where the new column, cnt, is the size of each block of consecutive rows (ordered by datetime within an id) that share the same name.
I attempted the problem by doing:
select
    id,
    name,
    datetime,
    row_number() over (partition by id order by datetime) rn1,
    row_number() over (partition by id, name order by name, datetime) rn2
from table
but it does not give the desired output.
I also looked at the solutions in SQL count consecutive days, but could not figure it out from the answers given there.

As noted in the question you linked to, this is a typical gaps-and-islands problem.
The solution is provided in the answers to that question, but I've applied it to your sample data here:
with gp as (
    select *,
        Row_Number() over(partition by id order by [datetime])
        - Row_Number() over(partition by id, name order by [datetime]) g
    from t
)
select id, name, [datetime],
    Count(*) over(partition by id, name, g) cnt
from gp;
See Demo DBFiddle
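To see why the difference of the two row numbers identifies each block, it can help to print the intermediate values. The following is a minimal sketch, not part of the original answer: it assumes Python's built-in sqlite3 module with a SQLite build that supports window functions (3.25 or later), and it loads the sample rows from the question into a table named t. The aliases rn1, rn2 and g are illustrative names only.

```python
import sqlite3

# Sample rows from the question: (id, name, datetime).
rows = [
    (44, "once", "2022-11-22T15:41:00"),
    (44, "once", "2022-11-22T15:42:00"),
    (44, "once", "2022-11-22T15:43:00"),
    (44, "twice", "2022-11-22T15:44:00"),
    (44, "once", "2022-11-22T16:41:00"),
    (55, "thrice", "2022-11-22T17:44:00"),
    (55, "thrice", "2022-11-22T17:46:00"),
    (55, "once", "2022-11-22T17:47:00"),
    (55, "once", "2022-11-22T17:51:00"),
    (55, "twice", "2022-11-22T18:41:00"),
    (55, "thrice", "2022-11-22T18:51:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, datetime TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# rn1 numbers rows within an id; rn2 numbers rows within (id, name).
# Inside one unbroken run of the same name both counters advance together,
# so rn1 - rn2 stays constant and can serve as the block key g.
query = """
WITH gp AS (
    SELECT id, name, datetime,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY datetime) AS rn2
    FROM t
)
SELECT id, name, datetime, rn1, rn2, rn1 - rn2 AS g,
       COUNT(*) OVER (PARTITION BY id, name, rn1 - rn2) AS cnt
FROM gp
ORDER BY id, datetime
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that g on its own is not unique across names (two different names can share the same difference), which is why the final count partitions by id, name and g together.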

Related

Get sum qty over specific range

I have the below table
substring(area,6,3) qty
101 10
103 15
102 11
104 30
105 25
107 17
108 23
106 48
I am looking to get the result below, where each new area covers a range of 4 consecutive areas and the qty is summed per range, without repeating an IIF for every range:
new_area(substring(area,6,3)) sum_qty
101-104 66
105-108 117
I don't know how to create the new area column so that I can get the summed qty.
Looking forward to your help.
Please also add an explanation so I can understand how the query runs.
I think this is what you are looking for.
We just use the window function row_number() to create the Grp
NOTE: If you have repeating values in AREA use dense_rank() instead of row_number()
Example
Select new_area = concat(min(area),'-',max(area))
      ,qty      = sum(qty)
From (
    Select area = substring(area,6,3)
          ,qty
          ,Grp  = (row_number() over (order by substring(area,6,3))-1) / 4
    From YourTable
) A
Group By Grp
Results
new_area qty
101-104 66
105-108 113 -- differs from the 117 in the question; 25 + 17 + 23 + 48 = 113
If you were to run the subquery on its own, you would see every row tagged with its Grp value: 0 for the first four areas and 1 for the next four.
Then it becomes a small matter to aggregate the data grouped by the created column Grp.
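The bucketing step can be checked in isolation. Below is a rough sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). As a simplification that is not in the original, the area column already holds the three-digit value, so the substring() call is dropped, and SQLite's || operator stands in for concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: area is stored as the already-extracted three-digit number.
conn.execute("CREATE TABLE YourTable (area INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(101, 10), (103, 15), (102, 11), (104, 30),
     (105, 25), (107, 17), (108, 23), (106, 48)],
)

# Step 1: number the rows by area, shift to start at 0, and integer-divide
# by 4 so that rows 0-3 fall in bucket 0 and rows 4-7 in bucket 1.
subquery = """
SELECT area, qty,
       (ROW_NUMBER() OVER (ORDER BY area) - 1) / 4 AS grp
FROM YourTable
ORDER BY area
"""
sub_rows = conn.execute(subquery).fetchall()
for row in sub_rows:
    print(row)

# Step 2: aggregate per bucket to build the range label and the summed qty.
grouped_query = """
SELECT MIN(area) || '-' || MAX(area) AS new_area, SUM(qty) AS qty
FROM (
    SELECT area, qty,
           (ROW_NUMBER() OVER (ORDER BY area) - 1) / 4 AS grp
    FROM YourTable
) a
GROUP BY grp
ORDER BY grp
"""
grouped = conn.execute(grouped_query).fetchall()
print(grouped)
```

The division relies on integer arithmetic; with a non-integer row number type the bucket values would need an explicit floor or cast.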

Can't get the cumulative sum(running total) within a group in SQL Server

I'm trying to get a running total within a group but my current code just gives me an aggregate sum.
For example, my data looks like this
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount
12542 1 Full A 1 12.5 40 500
12542 1 Full A 1 12.5 35 420
12542 2 Full A 1 10 40 400
12542 2 Full B 1.2 10 40 480
17842 1 Full A 1 11 27 297
17842 1 Full B 1.3 11 30 429
And what I want is a running total within the same ID, Shift Number, and Status. For example, I want something like this as my final result
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount Running_Tot
12542 1 Full A 1 12.5 40 500 500
12542 1 Full A 1 12.5 35 420 920
12542 2 Full A 1 10 40 400 400
12542 2 Full B 1.2 10 40 480 880
17842 1 Full A 1 11 27 297 297
17842 1 Full B 1.3 11 30 429 726
However, my current code just gives me the total sum within each group. For example, 920, 920 for row 1&2. Here's my code.
Select a.*,
    SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY ID, ShiftNum, Status) as Running_Tot
from table a
How do I fix my code to get the final result I want?
You need an ordering column that uniquely identifies each row. There is not an obvious one in your data, but you could use something like this:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY hours) as Running_Tot
Or:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status
ORDER BY (SELECT NULL)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as Running_Tot
The problem you are facing is because the ORDER BY keys have ties. The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note the RANGE. That means that all rows with ties are combined.
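The effect of the default RANGE frame on tied rows can be reproduced in a small experiment. This is only a sketch under a few assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, whose default window frame is also RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), a hypothetical table name shifts, and it sums Total_Amount because that is the column the expected Running_Tot values in the question (500, 920, ...) correspond to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts (id INTEGER, shiftnum INTEGER, status TEXT, total_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO shifts VALUES (?, ?, ?, ?)",
    [
        (12542, 1, "Full", 500),
        (12542, 1, "Full", 420),
        (12542, 2, "Full", 400),
        (12542, 2, "Full", 480),
        (17842, 1, "Full", 297),
        (17842, 1, "Full", 429),
    ],
)

# Ordering by a partition key means every row in a partition ties with
# every other row in that partition.
WINDOW = "PARTITION BY id, shiftnum, status ORDER BY status"


def running_totals(frame=""):
    """Return the sorted running totals per (id, shiftnum) for a given frame clause."""
    sql = f"""
        SELECT id, shiftnum,
               SUM(total_amount) OVER ({WINDOW} {frame}) AS running_tot
        FROM shifts
    """
    totals = {}
    for id_, shiftnum, running_tot in conn.execute(sql):
        totals.setdefault((id_, shiftnum), []).append(running_tot)
    return {key: sorted(values) for key, values in totals.items()}


# Default frame (RANGE): tied rows are peers, so each row gets the group total.
range_totals = running_totals()
# Explicit ROWS frame: the sum grows row by row, in an arbitrary order among ties.
rows_totals = running_totals("ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(range_totals)
print(rows_totals)
```

With the default frame both rows of the first group show 920; with the ROWS frame the group still ends at 920, but the first row shows only its own amount. Which of the tied rows comes first is not defined.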
Also note that there is no benefit to including the PARTITION BY keys in the ORDER BY (well, there is one exception in SQL Server: if you don't care about the ordering, including a key can be a handy shortcut). The ordering occurs within a partition.
If your rows can have exact duplicates, I would first suggest that you add a primary key. But, in the meantime, you could use:
with a as (
    select a.*,
        row_number() over (order by id, shiftnum, status) as seqnum
    from tablea a
)
Select a.*,
    SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY seqnum) as Running_Tot
from a;
The ordering will be arbitrary, but it will at least accumulate.

PostgreSQL - Show the last N rows for each sorted group

With respect to the following posts:
Retrieving the last record in each group - MySQL
Grouped LIMIT in PostgreSQL: show the first N rows for each group?
Postgresql limit by N groups
I wrote a query to find the latest 3 entries of the last 3 groups of log events partitioned by day with a maximum of 9 total entries and I managed to gather the following data from a postgresql log table:
The query I used to get them is the following:
SELECT
    *
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY created_at::date ORDER BY created_at::time DESC) AS row_number,
        DENSE_RANK() OVER (ORDER BY created_at::date DESC) AS group_number,
        l.*
    FROM
        logs l
    WHERE
        account_id = 1) subquery
WHERE
    subquery.row_number <= 3
    AND group_number <= 3
LIMIT 9;
However I'm missing one last step: The results are "grouped" by day in descending order (which is good) but within each group the ordering by time doesn't seem to work.
Effectively the expected order should be (displaying only each row's id):
| EXISTING ROW ORDERING | EXPECTED ROW ORDERING |
| --------------------- | --------------------- |
| 52                    | 56                    |
| 53                    | 53                    |
| 56                    | 52                    |
| 46                    | 48                    |
| 48                    | 47                    |
| 47                    | 46                    |
| 30                    | 30                    |
| 31                    | 31                    |
| 32                    | 32                    |
Any ideas? Thanks.
If you want the data in a particular order, then you need to have an order by. SQL tables and result sets represent unordered sets. The only exception is when the outermost query has an order by.
So:
order by created_at desc
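As a rough illustration of where that clause goes, here is a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later assumed). The rows are made up for the example, chosen so that ids do not follow time order; date() and time() stand in for the PostgreSQL ::date and ::time casts, and the aliases rn and grp replace row_number and group_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, account_id INTEGER, created_at TEXT)")
# Hypothetical log rows: four days, ids deliberately not in time order.
conn.executemany(
    "INSERT INTO logs VALUES (?, 1, ?)",
    [
        (50, "2022-01-03 09:00:00"), (52, "2022-01-03 10:00:00"),
        (53, "2022-01-03 11:00:00"), (56, "2022-01-03 12:00:00"),
        (46, "2022-01-02 08:00:00"), (47, "2022-01-02 09:00:00"),
        (48, "2022-01-02 10:00:00"),
        (29, "2022-01-01 20:00:00"), (32, "2022-01-01 21:00:00"),
        (31, "2022-01-01 22:00:00"), (30, "2022-01-01 23:00:00"),
        (20, "2021-12-31 10:00:00"),
    ],
)

# The ORDER BY inside each OVER clause only decides how rows are numbered.
# The order of the final result comes solely from the outermost ORDER BY,
# which must appear before LIMIT.
query = """
SELECT id, created_at
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY date(created_at)
                              ORDER BY time(created_at) DESC) AS rn,
           DENSE_RANK() OVER (ORDER BY date(created_at) DESC) AS grp,
           l.*
    FROM logs l
    WHERE account_id = 1
) subquery
WHERE rn <= 3 AND grp <= 3
ORDER BY created_at DESC
LIMIT 9
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)
```

Sorting on the full timestamp in descending order keeps the days grouped (newest day first) and orders the rows by time within each day in one step.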

Appropriate ranking

I have this kind of data:
contract nr startdate
1 12 01-01-2000
1 12 03-01-2000
1 22 07-01-2000
2 77 12-04-2001
2 78 17-04-2001
My simple goal here is to rank each number within a specific contract, taking into account the start date. The output should look like this:
contract nr startdate my_rank
1 12 01-01-2000 1
1 12 03-01-2000 1
1 22 07-01-2000 2
2 77 12-04-2001 1
2 78 17-04-2001 2
I have tried almost all possible combinations, but couldn't figure it out.
select dense_rank() over
(partition by contract order by nr) as my_rank,* from my_data
The above is close, but the problem is that in some cases a 1 is assigned to the most recent contract and in other cases to the least recent (?).
Any hint?
Your ranking is by nr.
If you want to rank by the start date, you need to incorporate that. But a single nr can have several start dates within a contract, so that requires an additional calculation:
select dense_rank() over (partition by contract order by min_startdate) as my_rank,
       d.*
from (select d.*,
             min(startdate) over (partition by contract, nr) as min_startdate
      from my_data d
     ) d;
I don't know if you want the min() or max() of the start date for your ordering.
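The two-step calculation can be tried against the sample rows. This is a minimal sketch with Python's built-in sqlite3 module (SQLite 3.25 or later assumed). One assumption to flag: the dates are rewritten from dd-mm-yyyy to ISO yyyy-mm-dd text so that they sort chronologically as strings; a real date column would not need this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (contract INTEGER, nr INTEGER, startdate TEXT)")
# Sample rows from the question, with dates converted to ISO format (assumption).
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?)",
    [
        (1, 12, "2000-01-01"),
        (1, 12, "2000-01-03"),
        (1, 22, "2000-01-07"),
        (2, 77, "2001-04-12"),
        (2, 78, "2001-04-17"),
    ],
)

# Inner query: give every row of the same (contract, nr) one shared date,
# the earliest start date of that nr.
# Outer query: dense_rank over that shared date, so rows of the same nr tie
# and the next nr gets the next rank without gaps.
query = """
SELECT d.contract, d.nr, d.startdate,
       DENSE_RANK() OVER (PARTITION BY contract ORDER BY min_startdate) AS my_rank
FROM (
    SELECT d.*,
           MIN(startdate) OVER (PARTITION BY contract, nr) AS min_startdate
    FROM my_data d
) d
ORDER BY d.contract, d.startdate
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Swapping MIN for MAX in the inner query ranks each nr by its latest start date instead.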

Max 3 values of every group

I sorted my data into something like this, and now I want to get the top 3 person_occurence values for every company, but I couldn't figure out how to do it.
person_occurence company person_id
67 company_1 110
66 company_2 176
64 company_3 100
64 company_3 196
63 company_4 127
62 company_1 150
61 company_5 120
60 company_3 140
59 company_5 154
59 company_5 162
59 company_4 194
58 company_4 109
58 company_3 128
58 company_1 156
I used this query to get the maximum for every company, but I can't get the top 3 person_occurence values:
SELECT max(person_occurence), company FROM table GROUP BY company;
Use window functions:
select t.*
from (select t.*,
             row_number() over (partition by company order by person_occurence desc) as seqnum
      from t
     ) t
where seqnum <= 3;
Now, this assumes that you want the top 3 regardless of ties -- that is, if 4 are tied with the same highest value, this returns three of them. Ties make things more difficult. You may want dense_rank() or rank() instead.
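The difference between the three ranking functions shows up on the sample data, where company_3 has a tie at the top. The sketch below is illustrative only: it uses Python's built-in sqlite3 module (SQLite 3.25 or later assumed), keeps the column spelling person_occurence from the question's table, and introduces a hypothetical helper top3() that plugs a ranking function name into the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person_occurence INTEGER, company TEXT, person_id INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (67, "company_1", 110), (66, "company_2", 176), (64, "company_3", 100),
        (64, "company_3", 196), (63, "company_4", 127), (62, "company_1", 150),
        (61, "company_5", 120), (60, "company_3", 140), (59, "company_5", 154),
        (59, "company_5", 162), (59, "company_4", 194), (58, "company_4", 109),
        (58, "company_3", 128), (58, "company_1", 156),
    ],
)


def top3(func):
    """Rows whose per-company rank is at most 3, using the given ranking function."""
    # Only these three fixed names are accepted, so the f-string is safe here.
    assert func in ("row_number", "rank", "dense_rank")
    sql = f"""
        SELECT company, person_occurence, person_id
        FROM (
            SELECT t.*,
                   {func}() OVER (PARTITION BY company
                                  ORDER BY person_occurence DESC) AS seqnum
            FROM t
        ) ranked
        WHERE seqnum <= 3
        ORDER BY company, person_occurence DESC
    """
    return conn.execute(sql).fetchall()


for func in ("row_number", "rank", "dense_rank"):
    company_3 = [occ for company, occ, _ in top3(func) if company == "company_3"]
    print(func, company_3)
```

On this data row_number() and rank() both return three rows for company_3 (64, 64, 60), while dense_rank() returns four (64, 64, 60, 58), because the two tied rows share rank 1 and the next distinct values get ranks 2 and 3.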
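You can use a correlated subquery, although as written this one returns only the row(s) holding the maximum person_occurence for each company rather than the top 3:
SELECT t1.*
FROM table t1
WHERE t1.person_occurence = (SELECT max(t2.person_occurence)
                             FROM table t2
                             WHERE t1.company = t2.company);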
If your DBMS supports analytic functions, you can also do:
select t.*
from (select t.*,
             rank() over (partition by company order by person_occurence desc) as seq
      from t
     ) t
where seq <= 3;