Assign incremental id based on number series in ordered SQL table

My table of interview candidates has three columns and looks like this (attempt is what I want to calculate):
candidate_id interview_stage stage_reached_at attempt <- want to calculate
1 1 2019-01-01 1
1 2 2019-01-02 1
1 3 2019-01-03 1
1 1 2019-11-01 2
1 2 2019-11-02 2
1 1 2021-01-01 3
1 2 2021-01-02 3
1 3 2021-01-03 3
1 4 2021-01-04 3
The table represents candidate_id 1, who has had 3 separate interview attempts at a company:
Made it to interview_stage 3 on the 1st attempt
Made it to interview_stage 2 on the 2nd attempt
Made it to interview_stage 4 on the 3rd attempt
Question: Can I somehow use the number series if I order by stage_reached_at? As soon as the interview_stage for a particular candidate_id is lower than in the row before, I know it's a new process.
I want to be able to group on candidate_id and process_grouping at the end of the day.
Thx in advance.

You can use lag() and then a cumulative sum (the is null check makes each candidate's first row start the count at 1):
select t.*,
       sum(case when prev_interview_stage is null or prev_interview_stage >= interview_stage
                then 1 else 0
           end) over (partition by candidate_id order by stage_reached_at) as attempt
from (select t.*,
             lag(interview_stage) over (partition by candidate_id order by stage_reached_at) as prev_interview_stage
      from t
     ) t;
Note: Your question specifically says "lower", but the >= above treats "lower than or equal to" the previous stage as a new attempt. If you really do mean strictly lower, change the >= to >.
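For reference, here is a self-contained version of the same idea with the sample rows inlined (Postgres-style VALUES syntax is assumed, since the question doesn't name a dialect); it reproduces the attempt column from the table above:
with t (candidate_id, interview_stage, stage_reached_at) as (
    values (1, 1, date '2019-01-01'), (1, 2, date '2019-01-02'), (1, 3, date '2019-01-03'),
           (1, 1, date '2019-11-01'), (1, 2, date '2019-11-02'),
           (1, 1, date '2021-01-01'), (1, 2, date '2021-01-02'),
           (1, 3, date '2021-01-03'), (1, 4, date '2021-01-04')
)
select t.*,
       -- a new attempt starts on the first row per candidate, or whenever the stage drops back (or repeats)
       sum(case when prev_interview_stage is null or prev_interview_stage >= interview_stage
                then 1 else 0
           end) over (partition by candidate_id order by stage_reached_at) as attempt
from (select t.*,
             lag(interview_stage) over (partition by candidate_id order by stage_reached_at) as prev_interview_stage
      from t
     ) t;
This returns attempt = 1, 1, 1, 2, 2, 3, 3, 3, 3, and you can then group on candidate_id and attempt for the per-process aggregation mentioned in the question.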

Related

SQL Snowflake, aggregate over window or something similar

I have a table below
days balance user_id wanted column
2022/08/01 10 1 1
2022/08/02 11 1 1
2022/08/03 10 1 1
2022/08/03 0 2 1
2022/08/05 3 2 2
2022/08/06 3 2 2
2022/08/07 3 3 3
2022/08/08 0 2 3
Since I'm new to SQL, I couldn't get the aggregation over window clauses right. What I mean is: I want to find the unique users that have balance > 0 per day.
thanks
update:
exact output wanted:
days unique users
2022/08/01 1
2022/08/02 1
2022/08/03 1
2022/08/05 2
2022/08/06 2
2022/08/07 3
2022/08/08 3
update: what if I want to accumulate the number of unique users over time, only counting new users [meaning: users who didn't appear before] and only where balance > 0?
Everyone's help is deeply appreciated :)
SELECT
    *,
    COUNT(DISTINCT CASE WHEN balance > 0 THEN USER_ID END) OVER (ORDER BY days)
FROM
    your_table
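If exactly one row per day is wanted (as in the output table above), an alternative sketch in portable SQL is to take each user's first day with a positive balance and keep a running total of newly appearing users; this reuses the your_table name from the answer and is not Snowflake-specific:
with first_positive as (
    -- first day on which each user had a positive balance
    select user_id, min(days) as first_day
    from your_table
    where balance > 0
    group by user_id
)
select d.days,
       -- new qualifying users per day, accumulated over time
       sum(count(f.user_id)) over (order by d.days) as unique_users
from (select distinct days from your_table) d
left join first_positive f
       on f.first_day = d.days
group by d.days
order by d.days;
For the sample data this yields 1, 1, 1, 2, 2, 3, 3 for the seven distinct days, matching the "exact output wanted" table.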

How to update one row between DateFrom and DateTo, when I have two changes in the same day?

#tmpFinalChangeSet collects data from 3 history tables: #tmpEmployeeHistory, #PayrollHistory and #ContractHistory. Basically, when I have a change recorded in one of those tables, I insert the PersonId and ChangeDate (which is DATE_FROM from the history tables) into #tmpFinalChangeSet. After the data collection, I have to complete the image of the lines in #tmpFinalChangeSet with updates from those 3 tables. For instance, each line in #tmpFinalChangeSet has a StatusId (from EmployeeHistory), GrossIncome (from PayrollHistory) and a ContractID (from ContractHistory). Each line from #tmpFinalChangeSet has to be updated with StatusId, GrossIncome and ContractId, so that we know, at each moment in time (ChangeDate), what info is associated with that PersonId.
I can expect 2 changes in the same day only in #EmployeeHistory; that's why I have the DaySeqNum column, to record which came first. If for a PersonId we have 2 changes in the same day, both of them go in #tmpFinalChangeSet, because we expect a different StatusId.
#EmployeeHistory
PersonId Date_From DateTo DaySeqNum StatusId
1 2018-06-01 2018-12-16 1 1
1 2018-12-17 2018-12-17 1 1
1 2018-12-17 2019-04-30 2 5
1 2019-05-01 2019-07-31 1 1
Expected result set:
Id PersonId ChangeDate DaySeqNum StatusId
1 1 2018-06-16 1 1
2 1 2018-12-17 1 1
3 1 2018-12-17 2 5
4 1 2019-01-01 1 5
5 1 2019-05-01 1 1
And I need to fill StatusId in #tmpFinalChangeSet. The Update below won't affect line 4.
UPDATE t SET
    t.StatusId = eh.StatusId
FROM #tmpFinalChangeset t
JOIN #tmpEmployementHistory eh ON EH.PERSON_ID = T.PersonId
    AND T.ChangeDate >= EH.DATE_FROM
    AND T.ChangeDate <= ISNULL(EH.DATE_TO, '99991231')
    AND EH.DaySeqNum = T.DaySeqNum
The result set:
Id PersonId ChangeDate DaySeqNum StatusId
1 1 2018-06-16 1 1
2 1 2018-12-17 1 1
3 1 2018-12-17 2 5
4 1 2019-01-01 1 NULL
5 1 2019-05-01 1 1
The and EH.DaySeqNum = T.DaySeqNum was added so I wouldn't get the same result for both rows with ChangeDate = '2018-12-17' - which would've been StatusId = 1 (the first line encountered).
I need to uniquely identify each ChangeDate between DateFrom and DateTo in #EmployeeHistory. But because sometimes I have 2 changes on the same date (e.g. '2018-12-17'), I can't just use a BETWEEN in the ON clause of the join. I need some sort of conditional JOIN statement that I can't figure out right now.
Hope it's a bit clearer now, apologies for the long text. :)
Later edit: My only solution was two separate update statements. Pretty much I use the statement above twice, the only difference being that the first time I don't have and EH.DaySeqNum = T.DaySeqNum. That works just fine, because I'll correct any line with DaySeqNum = 2 in the next update - the one which, on its own, would have left line 4 in my example out (see the sketch below).
I'm still looking for a better way to do this. (Tbh I was thinking of a second way: dateadd-ing 1 second to the ChangeDates with DaySeqNum = 2, which would help me uniquely identify each date.)
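For reference, a sketch of the two-pass workaround described in the edit above, keeping the table and column names from the UPDATE statement earlier in the post (an illustration of the described approach, not a tested solution):
-- Pass 1: match on the date range only. For dates with two changes this may pick
-- either history line, but it does populate line 4, which has no DaySeqNum = 1
-- match in the history table.
UPDATE t SET
    t.StatusId = eh.StatusId
FROM #tmpFinalChangeset t
JOIN #tmpEmployementHistory eh ON EH.PERSON_ID = T.PersonId
    AND T.ChangeDate >= EH.DATE_FROM
    AND T.ChangeDate <= ISNULL(EH.DATE_TO, '99991231');

-- Pass 2: re-run with the DaySeqNum condition, which deterministically corrects
-- the rows that have two changes on the same day and leaves everything else alone.
UPDATE t SET
    t.StatusId = eh.StatusId
FROM #tmpFinalChangeset t
JOIN #tmpEmployementHistory eh ON EH.PERSON_ID = T.PersonId
    AND T.ChangeDate >= EH.DATE_FROM
    AND T.ChangeDate <= ISNULL(EH.DATE_TO, '99991231')
    AND EH.DaySeqNum = T.DaySeqNum;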

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.
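For a quick check, here is a self-contained version with the sample rows inlined (Postgres-style VALUES syntax assumed; use MOD(num_of_days, 5) in dialects without the % operator):
with t (id, num_of_days) as (
    values (1, 0), (2, 0), (2, 8), (2, 9), (2, 10), (2, 15), (3, 10), (3, 20)
)
select t.*,
       -- start a new group when the id changes or num_of_days is divisible by 5
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t;
This returns row_num = 1, 2, 2, 2, 3, 4, 5, 6, matching the desired output.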

SQL - count value breaks in a column when specifying break parameters

This builds on a previous question of mine here.
I have a table which tracks service involvement (srvc_inv) for two individuals (name) over a period of time (day).
name day srvc_inv
Liam 1 1
Liam 2 0
Liam 3 1
Liam 4 0
Liam 5 0
Liam 6 1
Liam 7 0
Noel 1 0
Noel 2 0
Noel 3 1
Noel 4 0
Noel 5 1
Noel 6 1
Noel 7 1
My goal is to count the number of unique service involvements per individual. Previously, we accomplished this by counting breaks in service involvement 1's and 0's using a lag function:
select name, count(*)
from (select t.*,
             lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv
      from t
     ) t
where prev_srvc_inv = 0 and srvc_inv = 1
group by name;
However, I have just found out that breaks in service involvement can be defined differently depending on the program of interest. I.e. for some programs, a single day without service counts as a break, for example:
day srvc_inv
1 1
2 0
3 1
= 2 service episodes
but for other programs, only two or more consecutive days without service count as a break, for example:
day srvc_inv
1 1
2 0
3 1
= 1 service episode, but
day srvc_inv
1 1
2 0
3 0
4 1
5 0
= 2 service episodes
Using the table at the top of this post, let us assume we are analyzing a program that considers two consecutive days without service involvement to be a service break, and thus the start of a distinct service episode.
How would I modify the above query, or write a new query, that would allow me to specify the break number parameters?
My desired output is as follows:
name srvc_episodes
Liam 2
Noel 1
Thank you so so so much for any help anyone can offer on this!
Use a running sum rather than lag(). This gives you more flexibility:
select name, count(*)
from (select t.*,
             sum(srvc_inv) over (partition by name
                                 order by day
                                 rows between 2 preceding and 1 preceding
                                ) as sum_srvc_inv_2
      from t
     ) t
where (sum_srvc_inv_2 = 0 or sum_srvc_inv_2 is null) and srvc_inv = 1
group by name;
You would adjust the "2" in the window frame to match the gap length that defines a split.
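For instance, under the stricter one-day-break definition, the frame shrinks to just the previous day; a sketch of that variant, under the same assumptions as above:
select name, count(*)
from (select t.*,
             -- service involvement on the single preceding day only
             sum(srvc_inv) over (partition by name
                                 order by day
                                 rows between 1 preceding and 1 preceding
                                ) as sum_srvc_inv_1
      from t
     ) t
where (sum_srvc_inv_1 = 0 or sum_srvc_inv_1 is null) and srvc_inv = 1
group by name;
On the sample data this counts Liam 3 and Noel 2, the same result as the lag()-based query from the earlier question.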
Try this (a new episode starts when SRVC_INV = 1 and neither of the two previous days had service):
SELECT NAME,
       SUM(CASE WHEN SRVC_INV = 1
                 AND COALESCE(LAG1, 0) = 0
                 AND COALESCE(LAG2, 0) = 0
                THEN 1
                ELSE 0
           END) AS SERVICE_EPISODES
FROM (SELECT NAME, SRVC_INV,
             LAG(SRVC_INV, 1) OVER (PARTITION BY NAME ORDER BY DAY) AS LAG1,
             LAG(SRVC_INV, 2) OVER (PARTITION BY NAME ORDER BY DAY) AS LAG2
      FROM T) X
GROUP BY NAME
Cheers!!

Calculating number of trips without using a loop

I am currently working on Postgres and below is the question that I have.
We have a customer ID and the date when the person visited a property. Based on this I need to calculate the number of trips. Consecutive dates are considered one trip. E.g. if a person visits on one date, that is trip one; if he then visits on three consecutive days after a gap, those three days are counted as trip two.
Below is the input
ID Date
1 1-Jan
1 2-Jan
1 5-Jan
1 1-Jul
2 1-Jan
2 2-Feb
2 5-Feb
2 6-Feb
2 7-Feb
2 12-Feb
Expected output
ID Date Trip no
1 1-Jan 1
1 2-Jan 1
1 5-Jan 2
1 1-Jul 3
2 1-Jan 1
2 2-Feb 2
2 5-Feb 3
2 6-Feb 3
2 7-Feb 3
2 12-Feb 4
I am able to implement this successfully using a loop, but it runs very slowly given the volume of the data.
Can you please suggest a workaround that does not use a loop?
Subtract a sequence from the dates -- these will be constant for a particular trip. Then you can use dense_rank() for the numbering:
select t.*,
       dense_rank() over (partition by id order by grp) as trip_num
from (select t.*,
             (date - row_number() over (partition by id order by date) * interval '1 day'
             ) as grp
      from t
     ) t;
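To make this easy to try, here is a self-contained version with the sample rows inlined (a year is assumed, since the question only gives day and month, and the column is renamed visit_date to avoid the reserved word date):
with t (id, visit_date) as (
    values (1, date '2023-01-01'), (1, date '2023-01-02'), (1, date '2023-01-05'), (1, date '2023-07-01'),
           (2, date '2023-01-01'), (2, date '2023-02-02'), (2, date '2023-02-05'),
           (2, date '2023-02-06'), (2, date '2023-02-07'), (2, date '2023-02-12')
)
select t.*,
       dense_rank() over (partition by id order by grp) as trip_no
from (select t.*,
             -- consecutive dates minus a growing sequence collapse to the same value
             (visit_date - row_number() over (partition by id order by visit_date) * interval '1 day'
             ) as grp
      from t
     ) t
order by id, visit_date;
This produces trip_no = 1, 1, 2, 3 for ID 1 and 1, 2, 3, 3, 3, 4 for ID 2, matching the expected output.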