Count consecutive occurrences SQL - PostgreSQL - sql

I am trying to count the number of consecutive weeks an employee went to work. I have a table that records whether john or andy went to work in each week (I have every week of the year).
I am using PostgreSQL.
What I would like to know is the number of times each person went to work for x consecutive weeks.
So the desired output would read, for example, that Andy twice went to work for two consecutive weeks.
I feel like I am close. In Python I could probably use a for loop, but in PostgreSQL I am a bit lost.
Thanks!

We compute the length of each run of consecutive weeks worked per person, and then group by person and run length to count how often each length occurs.
select person
,consecutive_weeks
,count(*)/consecutive_weeks as times
from (
select person
,sum(case when "went to work?" = 1 then 1 end) over(partition by person, grp) as consecutive_weeks
from (
select *
,count(mrk) over(partition by person order by week) as grp
from (
select *
,case when "went to work?" <> lag("went to work?") over(partition by person order by week) then 1 end as mrk
from t
) t
) t
) t
where consecutive_weeks is not null
group by person, consecutive_weeks
order by person
person  consecutive_weeks  times
andy    2                  2
john    3                  1
john    2                  1
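For anyone who wants to try the idea locally, here is a minimal sketch of a different but common formulation of the same technique: among the weeks actually worked, week minus row_number() stays constant within a run of consecutive weeks. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL (window functions need SQLite 3.25 or later). The attendance table, its went_to_work column and the sample rows are assumptions made for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical sample data: (person, week, went_to_work).
ROWS = [
    ("andy", 1, 1), ("andy", 2, 1), ("andy", 3, 0), ("andy", 4, 1), ("andy", 5, 1),
    ("john", 1, 1), ("john", 2, 1), ("john", 3, 1), ("john", 4, 0),
    ("john", 5, 1), ("john", 6, 1),
]

QUERY = """
with marked as (
    -- within a run of consecutive worked weeks, week - row_number() is constant
    select person,
           week - row_number() over (partition by person order by week) as grp
    from attendance
    where went_to_work = 1
),
streaks as (
    -- one row per run, with its length
    select person, count(*) as consecutive_weeks
    from marked
    group by person, grp
)
select person, consecutive_weeks, count(*) as times
from streaks
group by person, consecutive_weeks
order by person, consecutive_weeks desc
"""


def count_streaks(rows):
    """Return (person, consecutive_weeks, times) tuples for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table attendance (person text, week integer, went_to_work integer)"
    )
    con.executemany("insert into attendance values (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


print(count_streaks(ROWS))
```

This variant assumes week is an integer that increases by one per calendar week; with dates or gaps in the numbering, the grouping expression would need adjusting.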

You can find groups of weeks where a person was present by assigning a running id to each row (the number of absences recorded for that person so far), counting the rows per id to get each run's length, and then counting how often each run length occurs:
with cte as (
select t3.person, t3.k, count(*) c from
(select t.*, (select sum((t1.person = t.person and t1.week <= t.week and t1.at_work = 0)::int) k from tbl t1)
from tbl t) t3
where t3.at_work != 0 group by t3.person, t3.k
)
select c.person, c.c, count(*) c1 from cte c group by c.person, c.c order by c1

Related

calculate 2 cumulative sums for 2 different groups

I have a table that looks like this:
id  position  value
5   senior    10000
6   senior    20000
8   senior    30000
9   junior    5000
4   junior    7000
3   junior    10000
It is sorted by position and value (asc) already. I want to calculate the number of seniors and juniors that can fit in a budget of 50,000 such that preference is given to seniors.
So for example, here 2 seniors (first and second) + 3 juniors can fit in the budget of 50,000.
id  position  value  cum_sum
5   senior    10000  10000
6   senior    20000  30000
8   senior    30000  60000   -- not possible because it is more than 50000
-----------------------------   -- so out of 50k, 30k is used for 2 seniors
9   junior    5000   5000
4   junior    7000   12000
1   junior    7000   19000   -- with the remaining 20k, these 3 juniors can also fit
3   junior    10000  29000
So the output should look like this:
juniors  seniors
3        2
How can I achieve this in SQL?
Here's one possible solution:
with seniorsCte as (
select id, position, value, total
from budget b
inner join (
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'senior'
) as s
on s.total <= b.amount
)
, juniorsCte as (
select j.id, j.position, j.value, j.total + r.seniorsTotal
from (
select coalesce(max(total), 0) seniorsTotal
, max(b.amount) - coalesce(max(total), 0) remainingAmount
from budget b
cross join seniorsCte
) as r
inner join (
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'junior'
) as j
on j.total <= r.remainingAmount
)
/* use this if you want the specific records
select *
from seniorsCte
union all
select *
from juniorsCte
*/
select (select count(1) from seniorsCte) seniors
, (select count(1) from juniorsCte) juniors
From your question I suspect you're familiar with window functions, but in case not: the query below returns all rows from the people table where the position is senior and adds a column, total, holding the cumulative total of value over the rows returned, starting with the lowest value and ascending. Ties on value are then ordered by id so that the behaviour is consistent when several rows share the same value (not strictly required if we're happy to take those in an arbitrary order).
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'senior'
I use the budget table just to hold a single row/value giving our cutoff; this avoids hardcoding the 50k value you mentioned, so we can easily amend it as required.
I've used common table expressions (CTEs) so that we can filter the juniors subquery based on the output of the seniors subquery (we only want those juniors that fit in the difference between the budget and the seniors' total), whilst still returning the results for juniors and seniors independently (if we wanted the actual rows rather than just counts, this lets us perform a union all between the two sets, as demonstrated in the commented-out code).
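To make the two-step idea easy to try, here is a minimal sketch, assuming SQLite (3.25 or later) via Python's sqlite3 module instead of the original database. It keeps the people table from the answer but passes the budget as a query parameter instead of reading it from a budget table; that substitution, the CTE names and the headcount helper are my own choices for illustration. The rows are the ones from the question's worked example, including id 1.

```python
import sqlite3

# Rows from the question's worked example: (id, position, value).
PEOPLE = [
    (5, "senior", 10000), (6, "senior", 20000), (8, "senior", 30000),
    (9, "junior", 5000), (4, "junior", 7000), (1, "junior", 7000),
    (3, "junior", 10000),
]

QUERY = """
with seniors as (
    select id, sum(value) over (order by value, id) as total
    from people
    where position = 'senior'
),
hired_seniors as (
    select id, total from seniors where total <= :budget
),
juniors as (
    select id, sum(value) over (order by value, id) as total
    from people
    where position = 'junior'
),
hired_juniors as (
    -- juniors only get what the hired seniors left over
    select id, total
    from juniors
    where total <= :budget - (select coalesce(max(total), 0) from hired_seniors)
)
select (select count(*) from hired_juniors) as juniors,
       (select count(*) from hired_seniors) as seniors
"""


def headcount(people, budget):
    """Return (juniors, seniors) that fit in the budget, seniors first."""
    con = sqlite3.connect(":memory:")
    con.execute("create table people (id integer, position text, value integer)")
    con.executemany("insert into people values (?, ?, ?)", people)
    return con.execute(QUERY, {"budget": budget}).fetchone()


print(headcount(PEOPLE, 50000))
```

With a budget of 50000 this returns 3 juniors and 2 seniors, matching the expected output in the question.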
For it to work, the sum has to be not only cumulative, but also selective. As mentioned in the comment, you can achieve that with a recursive CTE:
with recursive
ordered as --this will be fed to the actual recursive cte
( select *,
row_number() over (order by position desc,value asc)
from test_table)
,recursive_cte as
( select id,
position,
value,
value*(value<50000)::int as cum_sum,
value<50000 as is_hired,
2 as next_i
from ordered
where row_number=1
union
select o.id,
o.position,
o.value,
case when o.value+r.cum_sum<50000 then o.value+r.cum_sum else r.cum_sum end,
(o.value+r.cum_sum)<50000 as is_hired,
r.next_i+1 as next_i
from recursive_cte r,
ordered o
where o.row_number=next_i
)
select count(*) filter (where position='junior') as juniors,
count(*) filter (where position='senior') as seniors
from recursive_cte
where is_hired;
row_number() over () is a window function.
count(*) filter (where ...) is an aggregate filter. It's a more concise (and often faster) variant of the sum(case when expr then a else 0 end) or count(nullif(expr, false)) approach, for when you only wish to aggregate a specific subset of values. Here it just puts the counts in columns as in your expected result; the same could be done with select position, count(*) from recursive_cte where is_hired group by position, which returns them stacked in rows.
All it does is order your list according to your priorities in the first CTE, then go through it row by row in the second one, adding a row's value to the cumulative sum only if the total still stays below your limit/budget.
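The row-by-row walk described above can be sketched in a form that runs without a PostgreSQL server. This is only an illustration under some assumptions: it uses SQLite via Python's sqlite3 module, it materialises the ordered list in a temporary table (ordered, with a hypothetical rn column) before the recursive step, and it treats a total that exactly equals the budget as fitting (<= instead of the < used in the answer).

```python
import sqlite3

# Rows from the question's worked example: (id, position, value).
PEOPLE = [
    (5, "senior", 10000), (6, "senior", 20000), (8, "senior", 30000),
    (9, "junior", 5000), (4, "junior", 7000), (1, "junior", 7000),
    (3, "junior", 10000),
]

# 'senior' sorts after 'junior', so "position desc" puts seniors first.
ORDER_ROWS = """
create temp table ordered as
select position, value,
       row_number() over (order by position desc, value, id) as rn
from people
"""

WALK = """
with recursive walk as (
    select rn, position,
           case when value <= :budget then value else 0 end as cum_sum,
           value <= :budget as is_hired
    from ordered
    where rn = 1
    union all
    -- take the next row; add its value only if the total still fits
    select o.rn, o.position,
           case when w.cum_sum + o.value <= :budget
                then w.cum_sum + o.value else w.cum_sum end,
           w.cum_sum + o.value <= :budget
    from walk w
    join ordered o on o.rn = w.rn + 1
)
select coalesce(sum(is_hired and position = 'junior'), 0) as juniors,
       coalesce(sum(is_hired and position = 'senior'), 0) as seniors
from walk
"""


def headcount_recursive(people, budget):
    """Return (juniors, seniors) hired by walking the ordered list once."""
    con = sqlite3.connect(":memory:")
    con.execute("create table people (id integer, position text, value integer)")
    con.executemany("insert into people values (?, ?, ?)", people)
    con.execute(ORDER_ROWS)
    return con.execute(WALK, {"budget": budget}).fetchone()


print(headcount_recursive(PEOPLE, 50000))
```

Unlike a plain running total, this walk skips a person who does not fit and keeps trying the cheaper rows that follow, which is what makes the sum selective.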
PostgreSQL supports the window function SUM(col) OVER():
with cte as (
SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY id) AS cumulative_sum
FROM mytable
)
select position, count(1)
from cte
where cumulative_sum < 50000
group by position
Another way to do it, returning the results in one row:
with cte as (
SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY id) AS cumulative_sum
FROM mytable
),
cte2 as (
select position, count(1) as _count
from cte
where cumulative_sum < 50000
group by position
)
select
sum(case when position = 'junior' then _count else null end) juniors,
sum(case when position = 'senior' then _count else null end) seniors
from cte2
Here is an example using a running total:
select
count(case when chek_sum_jun > 0 and position = 'junior' then position else null end) chek_jun,
count(case when chek_sum_sen > 0 and position = 'senior' then position else null end) chek_sen
from (
select position,
20000 - sum(case when position = 'junior' then value else 0 end) over (partition by position order by value asc rows between unbounded preceding and current row ) chek_sum_jun,
50000 - sum(case when position = 'senior' then value else 0 end) over (partition by position order by value asc rows between unbounded preceding and current row ) chek_sum_sen
from test_table) x
demo : https://dbfiddle.uk/ZgOoSzF0

SQL How to get Account ids that had a purchase gap of 3 months

I have the following table. I need to find every ACCOUNT_ID that has had a gap of at least 3 months between purchases.
For example, ACCOUNT_ID 123 has the following. They have a gap of more than 3 months. How can I get this id?
I'm honestly super stuck here and have no idea what to do.
WITH LAST_PURCHASED AS (
SELECT
lp."ACCOUNT_ID",
MAX(lp."PO_DATE") AS "LAST_PUR_DATE",
SUM(lp."QTY") AS "TTL_QTY"
FROM A
AS lp
GROUP BY 1
)
SELECT
*
FROM A
INNER JOIN LAST_PURCHASED lp ON (A."ACCOUNT_ID" = lp."ACCOUNT_ID")
WHERE
lp."LAST_PURCHASED" >= DATEADD(MONTH,-3,A."PO_DATE")
LIMIT 10;
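You can use window functions as follows:
select t.* from
  (select t.*, max(diff) over (partition by t."ACCOUNT_ID") as max_diff from
    (select t.*,
            -- months from this purchase to the account's next one;
            -- DATEDIFF(month, start, end) assumes the same dialect as the
            -- DATEADD(MONTH, ...) call in the question
            datediff(month, t."PO_DATE",
                     lead(t."PO_DATE") over (partition by t."ACCOUNT_ID"
                                             order by t."PO_DATE")) as diff
     from A t) t ) t
where max_diff >= 3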
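As a small runnable illustration of the lead() approach, here is a sketch that uses SQLite via Python's sqlite3 module, since the original dialect's DATEDIFF is not available there. The purchases table, its lower-case column names and the sample rows are hypothetical. It also defines the gap slightly differently: a gap counts when the next purchase falls on or after the date three calendar months later, instead of counting month boundaries crossed.

```python
import sqlite3

# Hypothetical purchases: (account_id, po_date as ISO-8601 text).
PURCHASES = [
    (123, "2021-01-15"), (123, "2021-02-10"), (123, "2021-07-01"),
    (456, "2021-01-05"), (456, "2021-03-01"), (456, "2021-05-20"),
]

QUERY = """
select distinct account_id
from (
    select account_id,
           po_date,
           -- date of the same account's next purchase (null for the last one)
           lead(po_date) over (partition by account_id order by po_date) as next_date
    from purchases
)
where next_date >= date(po_date, '+3 months')
order by account_id
"""


def accounts_with_gap(purchases):
    """Return account ids with at least three months between two purchases."""
    con = sqlite3.connect(":memory:")
    con.execute("create table purchases (account_id integer, po_date text)")
    con.executemany("insert into purchases values (?, ?)", purchases)
    return [row[0] for row in con.execute(QUERY)]


print(accounts_with_gap(PURCHASES))
```

Comparing the dates as text works here only because ISO-8601 strings sort chronologically.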

Oracle ListaGG, Top 3 most frequent values, given in one column, grouped by ID

I have a problem with an SQL query. It could perhaps be done in "plain" SQL, but I am fairly sure I need some kind of group concatenation (and I can't use MySQL), so the dialect is Oracle, as it will run on an Oracle database. Let's say we have the following entity:
Table: Veterinarian visits
Visit_Id,
Animal_id,
Veterinarian_id,
Sickness_code
Let's say there are 100 visits (100 visit_id values) and each animal_id visits around 20 times.
I need to create a SELECT, grouped by Animal_id, with 3 columns:
animal_id
the second shows the number of flu visits for this particular animal (let's say flu is sickness_code = 5)
the third shows the top three sickness codes for each animal (the 3 most frequent codes for this particular animal_id)
How do I do it? The first and second columns are easy, but the third? I know that I need to use LISTAGG from Oracle, OVER PARTITION BY, COUNT and RANK. I tried to tie them together but it didn't work out as I expected :( What should this query look like?
Here is some sample data:
create table VET as
select
rownum+1 Visit_Id,
mod(rownum+1,5) Animal_id,
cast(NULL as number) Veterinarian_id,
trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <=100;
Query
Basically, the subqueries do the following:
aggregate the count and calculate the flu count (over all records of the animal)
calculate RANK (if you really need only 3 records, use ROW_NUMBER; see the discussion below)
filter the top 3 RANKs
LISTAGGregate the result
with agg as (
select Animal_id, Sickness_code, count(*) cnt,
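-- after grouping there is one row per (animal, code), so this is 1 if the animal
-- has any flu visit and 0 otherwise; use "then count(*)" to get the number of flu visits
sum(case when SICKNESS_CODE = 5 then 1 else 0 end) over (partition by animal_id) as cnt_flu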
from vet
group by Animal_id, Sickness_code
), agg2 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
from agg
), agg3 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
from agg2
where rnk <= 3
)
select
ANIMAL_ID, max(CNT_FLU) CNT_FLU,
LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
from agg3
group by ANIMAL_ID
order by 1;
gives
ANIMAL_ID CNT_FLU CNT_LTS
---------- ---------- ---------------------------------------------
0 1 6(5), 1(4), 9(3)
1 1 1(5), 3(4), 2(3), 8(3)
2 0 1(5), 10(3), 4(3), 6(3), 7(3)
3 1 5(4), 2(3), 4(3), 7(3)
4 1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2)
I intentionally show Sickness_code(count of visits) to demonstrate that the top 3 can have ties that you should handle.
Check the RANK function. Using ROW_NUMBER is not deterministic in this case, because rows with equal counts are numbered in an arbitrary order.
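To see the point about ties in action, here is a minimal sketch that compares RANK with ROW_NUMBER on a tiny hand-made data set. It is an illustration under assumptions, not the Oracle query itself: it runs on SQLite (3.25 or later) via Python's sqlite3 module, the rows are hypothetical, and the final list is built in Python instead of with LISTAGG. Adding sickness_code as a tie-breaker is one way to make ROW_NUMBER deterministic.

```python
import sqlite3

# Hypothetical visits: (animal_id, sickness_code).
VISITS = [
    (1, 5), (1, 5), (1, 5), (1, 2), (1, 2), (1, 7), (1, 7), (1, 9), (1, 9),
    (2, 1), (2, 1), (2, 3),
]

QUERY = """
with counts as (
    select animal_id, sickness_code, count(*) as cnt
    from vet
    group by animal_id, sickness_code
),
ranked as (
    select animal_id, sickness_code, cnt,
           -- tied counts share a rank, so rnk <= 3 can keep more than 3 codes
           rank() over (partition by animal_id order by cnt desc) as rnk,
           -- the extra sort key makes the numbering of ties reproducible
           row_number() over (partition by animal_id
                              order by cnt desc, sickness_code) as seqnum
    from counts
)
select animal_id, sickness_code, cnt, rnk, seqnum
from ranked
order by animal_id, seqnum
"""


def top_codes(visits, flu_code=5, n=3):
    """Return per-animal flu count and top-n codes by RANK and by ROW_NUMBER."""
    con = sqlite3.connect(":memory:")
    con.execute("create table vet (animal_id integer, sickness_code integer)")
    con.executemany("insert into vet values (?, ?)", visits)
    summary = {}
    for animal, code, cnt, rnk, seqnum in con.execute(QUERY):
        entry = summary.setdefault(
            animal, {"flu": 0, "by_rank": [], "by_row_number": []}
        )
        if code == flu_code:
            entry["flu"] = cnt
        if rnk <= n:
            entry["by_rank"].append(code)
        if seqnum <= n:
            entry["by_row_number"].append(code)
    return summary


print(top_codes(VISITS))
```

For animal 1 the three codes tied on two visits all share rank 2, so the RANK list has four entries while the ROW_NUMBER list is cut at three.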
I think the most natural way uses two levels of aggregation, along with a dash of window functions here and there:
select vas.animal,
sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal, sickness_code, count(*) as cnt,
row_number() over (partition by animal order by count(*) desc) as seqnum
from visits
group by animal, sickness_code
) vas
group by vas.animal;
This uses the fact that listagg() ignores NULL values.

Which row has the highest value?

I have a table of election results for multiple nominees and polls. I need to determine which nominee had the most votes for each poll.
Here's a sample of the data in the table:
PollID  NomineeID  Votes
1       1          108
1       2          145
1       3          4
2       1          10
2       2          41
2       3          0
I'd appreciate any suggestions or help anyone can offer me.
This will match the highest, and will also bring back ties.
select sd.*
from sampleData sd
inner join (
select PollID, max(votes) as MaxVotes
from sampleData
group by PollID
) x on
sd.PollID = x.PollID and
sd.Votes = x.MaxVotes
SELECT
t.NomineeID,
t.PollID
FROM
( SELECT
NomineeID,
PollID,
RANK() OVER (PARTITION BY i.PollID ORDER BY i.Votes DESC) AS Rank
FROM SampleData i) t
WHERE
t.Rank = 1
SELECT ABB2.PollID, ABB2.NomineeID, ABB2.Votes
FROM
SampleData AS ABB2
JOIN
(SELECT PollID, MAX(Votes) AS most_votes
FROM SampleData
GROUP BY PollID) AS ABB1 ON ABB1.PollID = ABB2.PollID AND ABB1.most_votes = ABB2.Votes
Please note: if 2 nominees are tied for the most votes in the same poll, this query returns both of them.
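select p.Pollid, p.Nomineeid, p.Votes from Poll_table p
where p.Votes = (
-- correlate on Pollid so each row is compared with its own poll's maximum
select max(p2.Votes) from Poll_table p2
where p2.Pollid = p.Pollid
);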
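Here is a short runnable check of the "highest per group, ties included" behaviour, sketched with RANK on SQLite (3.25 or later) through Python's sqlite3 module. The first six rows are the sample data from the question; poll 3 is a hypothetical addition so that a tie shows up in the result.

```python
import sqlite3

# (PollID, NomineeID, Votes); polls 1 and 2 come from the question.
VOTES = [
    (1, 1, 108), (1, 2, 145), (1, 3, 4),
    (2, 1, 10), (2, 2, 41), (2, 3, 0),
    (3, 1, 7), (3, 2, 7),  # hypothetical tie
]

QUERY = """
select PollID, NomineeID, Votes
from (
    select PollID, NomineeID, Votes,
           -- every nominee tied for the most votes in a poll gets rank 1
           rank() over (partition by PollID order by Votes desc) as rnk
    from SampleData
)
where rnk = 1
order by PollID, NomineeID
"""


def poll_winners(votes):
    """Return (PollID, NomineeID, Votes) for the top nominee(s) of each poll."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table SampleData (PollID integer, NomineeID integer, Votes integer)"
    )
    con.executemany("insert into SampleData values (?, ?, ?)", votes)
    return con.execute(QUERY).fetchall()


print(poll_winners(VOTES))
```

Swapping rank() for row_number() would return exactly one row per poll, at the cost of picking one of the tied nominees arbitrarily.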

grouping and aggregates with subqueries

I have a query that is designed to find the number of people who went to a hospital more than once. What I have works, but is there a way to do it without the subquery?
SELECT count(*) as counts, hospitals.hospitalname
FROM Patient INNER JOIN
hospitals ON Patient.hospitalnpi = hospitals.npi
WHERE (hospitals.hospitalname = 'X')
group by patientid, hospitalname
having count(patient.patientid) >1
order by count(*) desc
This always returns the correct number of rows (30), but not the number 30 itself. If I remove the group by patientid, I get the entire result set returned.
I solved this problem by doing
select COUNT(*),hospitalname
from
(
SELECT count(*) as counts,hospitals.hospitalname
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
group by patientid, hospitals.hospitalname
having count(patient.patientid) >1
) t
group by t.hospitalname
order by t.hospitalname desc
I feel that there has to be a more elegant solution than using subqueries all the time. How could this be improved?
sample data from first query
row #  revisits
1      2
2      2
3      2
4      2
sample data from second, working query
row #  hosp. name  revisitAggregate
1      x           30
2      y           15
3      z           5
Simple one-to-many relationship between patient and hospitals
It's super hacky, but here you are:
SELECT TOP 1
ROW_NUMBER() OVER (order by patient.patientid) as Count
FROM
Patient
INNER JOIN hospitals
ON Patient.hospitalnpi = hospitals.npi
WHERE
(hospitals.hospitalname = 'X')
GROUP BY
patientid,
hospitalname
HAVING
count(patient.patientid) >1
ORDER BY
Count desc
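select hospitalname, count(distinct patientid) as revisit_patients from (
SELECT Patient.patientid, hospitals.hospitalname,
-- number of visit rows this patient has at this hospital
count(*) over (partition by Patient.patientid,
hospitals.hospitalname) as counter
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
WHERE (hospitals.hospitalname = 'X')
) Z
where counter > 1
-- count each revisiting patient once, not once per visit row
group by hospitalname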