Get minimum per row based on value from another column - sql

Below is the base table of data:

date      customer_id  score1  score2
01/01/22  a            1       1
02/01/22  a            1       1
01/01/22  b            2       2
02/01/22  b            4       1
01/01/22  c            1       1
02/01/22  c            1       4
01/01/22  d            5       1
02/01/22  d            10      1
This is the result that I want to achieve, where I only pull through the rows where there has been a change in either score1 or score2 from the previous date. In this case, b has gone from 2 to 4 on score1 and from 2 to 1 on score2, c has gone from 1 to 4 on score2, and d has gone from 5 to 10 on score1.

date      customer_id  score1  score2
02/01/22  b            4       1
02/01/22  c            1       4
02/01/22  d            10      1
Unsure if there is a function to do this. Alternatively, would it be best to have two separate tables initially and use a join to achieve this? Using Presto SQL, if that helps.
Many thanks!

We can use the LAG() window function here:
WITH cte AS (
    SELECT t.*,
           -- previous score per customer; falls back to the current row's
           -- value when there is no earlier row
           LAG(score1, 1, score1) OVER (PARTITION BY customer_id
                                        ORDER BY date) AS lag_score_1,
           LAG(score2, 1, score2) OVER (PARTITION BY customer_id
                                        ORDER BY date) AS lag_score_2
    FROM yourTable t
)
SELECT date, customer_id, score1, score2
FROM cte
WHERE score1 <> lag_score_1 OR score2 <> lag_score_2
ORDER BY date;
This answer uses the 3 parameter version of LAG(). The second parameter specifies an offset of 1 row, while the third parameter specifies a default value in case there is no previous value. Here we use the current row's own score as that default, so the earliest record in each customer partition compares equal to itself and is filtered out.
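If your engine lacks the three-parameter form, a two-argument LAG() wrapped in COALESCE is a sketch of the same idea (assuming score1 and score2 are never NULL, since genuine NULLs would also be swallowed by the COALESCE):
WITH cte AS (
    SELECT t.*,
           -- NULL from LAG() means "no previous row"; substitute the
           -- current value so the first row never passes the <> filter
           COALESCE(LAG(score1) OVER (PARTITION BY customer_id
                                      ORDER BY date), score1) AS lag_score_1,
           COALESCE(LAG(score2) OVER (PARTITION BY customer_id
                                      ORDER BY date), score2) AS lag_score_2
    FROM yourTable t
)
SELECT date, customer_id, score1, score2
FROM cte
WHERE score1 <> lag_score_1 OR score2 <> lag_score_2
ORDER BY date;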


Finding adjacent column values from the last non-null value of a certain column in Snowflake (SQL) using partition by

Say I have the following table:

ID  T  R
1   2
1   3  Y
1   4  Y
1   5
1   6  Y
1   7
I would like to add a column which equals the value from column T based on the last non-null value from column R. This means the following:

ID  T  R  GOAL
1   2
1   3  Y
1   4  Y  3
1   5     4
1   6  Y  4
1   7     6
I do have many IDs, so I need to make use of the OVER (PARTITION BY ...) clause. Also, if possible, I would like to use a single statement, like
SELECT *
     , GOAL
FROM TABLE
so without any extra SELECT statement.
T is in ascending order, so just null it out according to R and take the maximum looking backward:
select *,
       -- highest T among preceding rows whose R is non-null
       max(case when R is not null then T end)
           over (
               partition by id
               order by T
               rows between unbounded preceding and 1 preceding
           ) as GOAL
from TBL
http://sqlfiddle.com/#!18/c927a5/5
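As a variation, Snowflake's LAG() accepts IGNORE NULLS, which reads closer to the problem statement; this is a sketch under the same table and column names, not a tested drop-in:
select *,
       -- most recent preceding T for which R was non-null
       lag(case when R is not null then T end) ignore nulls
           over (partition by id order by T) as GOAL
from TBL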

Assign incremental id based on number series in ordered sql table

My table of interview candidates has three columns and looks like this (attempt is what I want to calculate):

candidate_id  interview_stage  stage_reached_at  attempt <- want to calculate
1             1                2019-01-01        1
1             2                2019-01-02        1
1             3                2019-01-03        1
1             1                2019-11-01        2
1             2                2019-11-02        2
1             1                2021-01-01        3
1             2                2021-01-02        3
1             3                2021-01-03        3
1             4                2021-01-04        3
The table represents candidate_id 1 who has had 3 separate interview attempts at a company.
Made it to interview_stage 3 on the 1st attempt
Made it to interview_stage 2 on the 2nd attempt
Made it to interview_stage 4 on the 3rd attempt
Question: Can I somehow use the number series if I order by stage_reached_at? As soon as the next step for a particular candidate_id is lower than the row before, I know it's a new process.
I want to be able to group on candidate_id and process_grouping at the end of the day.
Thx in advance.
You can use lag() and then a cumulative sum:
select t.*,
       -- a new attempt starts whenever the stage fails to increase;
       -- the + 1 makes the numbering start at 1 rather than 0
       1 + sum(case when prev_interview_stage >= interview_stage
                    then 1 else 0 end)
           over (partition by candidate_id order by stage_reached_at) as attempt
from (select t.*,
             lag(interview_stage) over (partition by candidate_id
                                        order by stage_reached_at) as prev_interview_stage
      from t
     ) t;
Note: Your question specifically says "lower". I wonder, though, if you really mean "lower or equal to". If the latter, change the >= to >.
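Tracing candidate 1 shows the mechanics: the flag fires exactly when the stage fails to increase, and the running sum of those flags plus one is the attempt number:
interview_stage  prev_interview_stage  prev >= current?  attempt
1                NULL                  no                1
2                1                     no                1
3                2                     no                1
1                3                     yes               2
2                1                     no                2
1                2                     yes               3
2                1                     no                3
3                2                     no                3
4                3                     no                3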

Calculating number of trips without using a loop

I am currently working on Postgres, and below is my question.
We have a customer ID and the date when the person visited a property. Based on this, I need to calculate the number of trips. Consecutive dates count as one trip. E.g. if a person visits on one date, that is trip one; if they then visit on three consecutive days, those days together count as trip two.
Below is the input
ID Date
1 1-Jan
1 2-Jan
1 5-Jan
1 1-Jul
2 1-Jan
2 2-Feb
2 5-Feb
2 6-Feb
2 7-Feb
2 12-Feb
Expected output
ID Date Trip no
1 1-Jan 1
1 2-Jan 1
1 5-Jan 2
1 1-Jul 3
2 1-Jan 1
2 2-Feb 2
2 5-Feb 3
2 6-Feb 3
2 7-Feb 3
2 12-Feb 4
I am able to implement this successfully using a loop, but it runs very slowly given the volume of the data.
Can you please suggest a workaround that does not use a loop?
Subtract a sequence from the dates -- these will be constant for a particular trip. Then you can use dense_rank() for the numbering:
select t.*,
       dense_rank() over (partition by id order by grp) as trip_num
from (select t.*,
             -- within a run of consecutive dates, date minus the running
             -- row number is the same value for every row
             (date - row_number() over (partition by id order by date) * interval '1 day'
             ) as grp
      from t
     ) t;
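To see why this works, take ID 2: subtracting the running row number from each date collapses every run of consecutive days to a single constant, which dense_rank() then numbers:
Date    row_number  Date - row_number (days)  trip_num
1-Jan   1           31-Dec                    1
2-Feb   2           31-Jan                    2
5-Feb   3           2-Feb                     3
6-Feb   4           2-Feb                     3
7-Feb   5           2-Feb                     3
12-Feb  6           6-Feb                     4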

Delete rows, which are duplicated and follow each other consequently

It's hard to formulate, so I'll just show an example, and you are welcome to edit my question and title.
Suppose I have a table (the first column is the row index):

   flag  id  value  datetime
0  b     1   343    13
1  a     1   23     12
2  b     1   21     11
3  b     1   32     10
4  c     2   43     11
5  d     2   43     10
6  d     2   32     9
7  c     2   1      8
For each id I want to squeeze the table by the flag column such that all duplicate flag values that follow each other collapse to one row with sum aggregation. Desired result:

   flag  id  value
0  b     1   343
1  a     1   23
2  b     1   53
3  c     2   75
4  d     2   32
5  c     2   1
P.S.: I found functions like CONDITIONAL_CHANGE_EVENT, which seem to be able to do that, but the examples of them in the docs don't work for me.
Use the difference-of-row-numbers approach to assign a group to each run of consecutive rows with the same flag, then take a windowed sum:
select distinct id, flag,
       sum(value) over (partition by id, grp) as finalvalue
from (
    select t.*,
           -- overall position minus per-flag position is constant
           -- within each uninterrupted run of one flag
           row_number() over (partition by id order by datetime)
           - row_number() over (partition by id, flag order by datetime) as grp
    from tbl t
) t
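For id 1 ordered by datetime ascending, the two row numbers diverge exactly when the run of equal flags is interrupted:
flag  value  datetime  rn(id)  rn(id, flag)  grp
b     32     10        1       1             0
b     21     11        2       2             0
a     23     12        3       1             2
b     343    13        4       3             1
Summing value over (id, grp) then yields b: 53, a: 23, b: 343.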
Here's an approach which uses CONDITIONAL_CHANGE_EVENT:
select
    flag,
    id,
    sum(value) value
from (
    select
        -- increments every time flag changes along the datetime ordering
        conditional_change_event(flag) over (order by datetime desc) part,
        flag,
        id,
        value
    from so
) t
group by part, flag, id
order by part;
The result is different from your desired result stated in the question because of order by datetime. Adding a separate column for the row number and sorting on that gives the correct result.

Get average number of entries for each unique id within another unique id

I am using MS SQL, and I have a table like this:
dogid incident_id incident_flags
1 1 a
1 1 c
1 1 d
1 20 b
1 20 a
2 12 NA
2 14 a
2 14 b
I would like to find out the average number of incident_flags per incident_id for each dogid. So for instance, I would like this output to look like:
dogid av_flags
1 2.5
2 1
These were found by:
For dogid 1, we have an incident_id with 3 flags and an incident_id with 2 flags. Av(3, 2) = 2.5
For dogid 2, we have an incident_id with 0 flags (NA should be counted as 0; it will never occur with another incident_flag on the same incident_id) and an incident with 2 flags. Av(0, 2) = 1
incident_id is unique to each dogid (you will never get, say, incident_id 1 under both dogid 1 and another dogid). incident_flags will not be repeated for one incident (you can't have "a" twice under incident_id 1) but can be repeated across incidents, e.g. incident_flag "a" can appear for both incident 1 and incident 20.
How would I go about this?
Using aggregate functions:
SQL Fiddle
SELECT
    dogid,
    -- count the non-'NA' flag rows, divide by the number of distinct
    -- incidents; * 1.0 forces decimal rather than integer division
    av_flags = SUM(CASE WHEN incident_flags <> 'NA' THEN 1 ELSE 0 END) /
               (COUNT(DISTINCT incident_id) * 1.0)
FROM tbl
GROUP BY dogid
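As a quick check against the sample data: dogid 1 has 5 non-'NA' flag rows (a, c, d on incident 1; b, a on incident 20) across 2 distinct incidents, so 5 / 2.0 = 2.5; dogid 2 has 2 non-'NA' rows (a, b on incident 14) across 2 distinct incidents (12 and 14), so 2 / 2.0 = 1.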