Get most frequent value from a windowing function - sql

I have a SQL table that looks like:
user_id role date
1 1 2019-11-26 21:20:54.397+00
1 2 2019-11-27 22:46:28.923+00
2 1 2019-12-06 22:17:53.925+00
2 3 2019-12-13 00:12:28.006+00
3 1 2019-11-25 21:57:17.701+00
3 1 2019-12-06 20:48:28.314+00
3 1 2019-12-15 23:59:06.81+00
4 3 2019-12-04 15:26:10.639+00
4 3 2019-11-22 19:20:01.025+00
4 3 2019-11-25 12:38:53.169+00
For each row, I would like to get the user's most frequent role based on that user's past dates. The result should look like:
user_id role date most_frequent_role
1 1 2019-11-26 21:20:54.397+00 NULL
1 2 2019-11-27 22:46:28.923+00 1
2 1 2019-12-06 22:17:53.925+00 NULL
2 3 2019-12-13 00:12:28.006+00 1
3 1 2019-11-25 21:57:17.701+00 NULL
3 1 2019-12-06 20:48:28.314+00 1
3 1 2019-12-15 23:59:06.81+00 1
4 3 2019-12-04 15:26:10.639+00 NULL
4 3 2019-11-22 19:20:01.025+00 3
4 3 2019-11-25 12:38:53.169+00 3

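The following query reproduces the expected output for the sample data. Note that it takes MIN(role) per user rather than counting occurrences, so it is only guaranteed to match the most frequent role when that role is also the smallest one, as it happens to be here.
select test.user_id, test.role, test.role_date,
       case when test.role_date in
                 (select min(role_date) from test group by user_id) then NULL
            else t.role end as MOST_FREQUENT_ROLE
from (select user_id, min(role) as role from test group by user_id) t
join test on t.user_id = test.user_id
order by user_id, role_date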
Output
USER_ID ROLE ROLE_DATE MOST_FREQUENT_ROLE
1 1 26-NOV-19 -
1 2 27-NOV-19 1
2 1 06-DEC-19 -
2 3 13-DEC-19 1
3 1 25-NOV-19 -
3 1 06-DEC-19 1
3 1 15-DEC-19 1
4 3 22-NOV-19 -
4 3 25-NOV-19 3
4 3 04-DEC-19 3

If you strictly want to use window functions, try the following (like the query above, it relies on MIN(role) rather than a true frequency count):
SELECT user_id
,role
,date
,CASE WHEN date = MIN(date) OVER(PARTITION BY user_id ORDER BY date)
THEN NULL
ELSE MIN(role) OVER(PARTITION BY user_id) END MOST_FREQUENT_ROLE
FROM YOUR_TABLE;

Technically, what you are trying to calculate is the mode (this is a statistical term).
Postgres has a built-in mode() aggregate. Alas, it is an ordered-set aggregate and cannot be used as a window function, so it provides little help here.
I would recommend using a lateral join:
select t.*, m.role as most_frequent_role
from t left join lateral
     (select t2.role
      from t t2
      where t2.user_id = t.user_id and
            t2.date < t.date
      group by t2.role
      order by count(*) desc,
               max(t2.date) desc  -- in the event of ties, use the most recent
      limit 1
     ) m
     on 1=1
order by user_id, date;
Here is a db<>fiddle. Note that I added some rows to give an example of where the running mode changes.
This will not be particularly efficient but an index on (user_id, date, role) should help.
If you have just a handful of roles there are probably more efficient solutions. If that is the case and performance is an issue, ask a new question.
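The running-mode logic of the lateral join can be tried out locally. The following is a minimal sketch, assuming Python's bundled sqlite3 module and the sample rows from the question; SQLite has no LATERAL, so a correlated scalar subquery stands in for it, and the time zone suffix is dropped and milliseconds are padded so the text timestamps compare correctly. The alias `cur` is an illustrative name, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, role INTEGER, date TEXT)")
rows = [
    (1, 1, "2019-11-26 21:20:54.397"),
    (1, 2, "2019-11-27 22:46:28.923"),
    (2, 1, "2019-12-06 22:17:53.925"),
    (2, 3, "2019-12-13 00:12:28.006"),
    (3, 1, "2019-11-25 21:57:17.701"),
    (3, 1, "2019-12-06 20:48:28.314"),
    (3, 1, "2019-12-15 23:59:06.810"),
    (4, 3, "2019-12-04 15:26:10.639"),
    (4, 3, "2019-11-22 19:20:01.025"),
    (4, 3, "2019-11-25 12:38:53.169"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# For each row, count the roles on the same user's strictly earlier rows and
# keep the most frequent one; ties go to the role seen most recently.
query = """
SELECT cur.user_id, cur.role, cur.date,
       (SELECT t2.role
        FROM t AS t2
        WHERE t2.user_id = cur.user_id
          AND t2.date < cur.date
        GROUP BY t2.role
        ORDER BY COUNT(*) DESC, MAX(t2.date) DESC
        LIMIT 1) AS most_frequent_role
FROM t AS cur
ORDER BY cur.user_id, cur.date
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

The first row of each user has no earlier rows, so the subquery returns no row and the column is NULL (None in Python).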

Related

Retrieving last record in each group from database with order by

There is a table ticket that contains data as shown below:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:48:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:55:00.0
1 3 XYZ 2020-07-28 00:59:00.0
Expected result:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
At present, this is the query that I use:
WITH final AS (
SELECT p.*,
ROW_NUMBER() OVER(PARTITION BY p.id,p.group,p.impact
ORDER BY p.create_date desc, p.impact) AS rk
FROM ticket p
)
SELECT f.*
FROM final f
WHERE f.rk = 1
The result I am getting is:
Id Impact group create_date
-----------------------------------------
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
It seems that PARTITION BY takes precedence over the ORDER BY values. Is there another way to achieve the expected result? I am running these queries on Amazon Redshift.
You could use LEAD() to check if the Impact changes between rows, taking only the rows where the value will change.
WITH look_forward AS
(
    SELECT
        *,
        -- "group" is a reserved word, so it is quoted here
        LEAD(impact) OVER (PARTITION BY id, "group" ORDER BY create_date) AS lead_impact
    FROM
        ticket
)
SELECT
    *
FROM
    look_forward
WHERE
    lead_impact IS NULL
    OR lead_impact <> impact
You seem to want rows where id/impact/group change relative to the next row. A simple way is to look at the next create_date overall and the next create_date within the same id/impact/group combination. If the two are the same, the next row belongs to the same combination, so the current row is filtered out:
select t.*
from (select t.*,
             lead(create_date) over (order by create_date) as next_create_date,
             lead(create_date) over (partition by id, impact, "group" order by create_date) as next_create_date_img
      from ticket t
     ) t
where next_create_date_img is null or next_create_date_img <> next_create_date;
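Both LEAD() variants can be checked against the sample ticket rows. This is a minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later for window functions) rather than Redshift; the column is named `grp` here, a hypothetical rename that avoids quoting the reserved word `group`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER, impact INTEGER, grp TEXT, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO ticket VALUES (?, ?, ?, ?)",
    [
        (1, 3, "ABC", "2020-07-28 00:42:00.0"),
        (1, 2, "ABC", "2020-07-28 00:45:00.0"),
        (1, 3, "ABC", "2020-07-28 00:48:00.0"),
        (1, 3, "ABC", "2020-07-28 00:52:00.0"),
        (1, 3, "XYZ", "2020-07-28 00:55:00.0"),
        (1, 3, "XYZ", "2020-07-28 00:59:00.0"),
    ],
)

# Variant 1: keep a row when the next impact in the same id/grp differs.
lead_impact_query = """
WITH look_forward AS (
    SELECT id, impact, grp, create_date,
           LEAD(impact) OVER (PARTITION BY id, grp ORDER BY create_date) AS lead_impact
    FROM ticket
)
SELECT id, impact, grp, create_date
FROM look_forward
WHERE lead_impact IS NULL OR lead_impact <> impact
ORDER BY create_date
"""

# Variant 2: keep a row when the next row overall is not the next row
# of the same id/impact/grp combination.
lead_date_query = """
SELECT id, impact, grp, create_date
FROM (
    SELECT t.*,
           LEAD(create_date) OVER (ORDER BY create_date) AS next_create_date,
           LEAD(create_date) OVER (PARTITION BY id, impact, grp
                                   ORDER BY create_date) AS next_create_date_img
    FROM ticket AS t
) AS x
WHERE next_create_date_img IS NULL OR next_create_date_img <> next_create_date
ORDER BY create_date
"""

by_impact = conn.execute(lead_impact_query).fetchall()
by_date = conn.execute(lead_date_query).fetchall()

for row in by_impact:
    print(row)
```

On this sample the two variants keep the same four rows; they are not guaranteed to agree on other data, since they partition differently.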

Select only one row by more conditions

The query below is supported by PostgreSQL, but H2 cannot run it because of OVER (PARTITION BY ...). The question is how to select only the row with the latest created time for each distinct combination of values in two columns (ecid and psid).
Example:
id name created ecid psid
1 aa 2019-02-07 1 1
2 bb 2019-02-01 1 1
3 cc 2019-02-05 2 2
4 dd 2019-02-06 2 3
5 ee 2019-02-08 2 3
Result:
id name created ecid psid
1 aa 2019-02-07 1 1
3 cc 2019-02-05 2 2
5 ee 2019-02-08 2 3
SELECT s.*, MAX(s.created) OVER (PARTITION BY s.ecid, s.psid) AS latest FROM ...
WHERE latest = created
Use a correlated subquery:
select t1.*
from tablename t1
where t1.created = (select max(t2.created)
                    from tablename t2
                    where t1.ecid = t2.ecid and t1.psid = t2.psid)
Use NOT EXISTS:
select t.* from tablename t
where not exists (
select 1 from tablename
where ecid = t.ecid and psid = t.psid and created > t.created
)
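Neither alternative needs window functions, which is the point for H2. As a rough check, here is a minimal sketch that runs both against the sample rows using Python's sqlite3 module instead of H2; the table name `items` is a placeholder, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, name TEXT, created TEXT, ecid INTEGER, psid INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [
        (1, "aa", "2019-02-07", 1, 1),
        (2, "bb", "2019-02-01", 1, 1),
        (3, "cc", "2019-02-05", 2, 2),
        (4, "dd", "2019-02-06", 2, 3),
        (5, "ee", "2019-02-08", 2, 3),
    ],
)

# Keep a row when its created value equals the maximum of its (ecid, psid) pair.
correlated_query = """
SELECT t1.id, t1.name, t1.created, t1.ecid, t1.psid
FROM items AS t1
WHERE t1.created = (SELECT MAX(t2.created)
                    FROM items AS t2
                    WHERE t2.ecid = t1.ecid AND t2.psid = t1.psid)
ORDER BY t1.id
"""

# Keep a row when no other row of the same (ecid, psid) pair is newer.
not_exists_query = """
SELECT t.id, t.name, t.created, t.ecid, t.psid
FROM items AS t
WHERE NOT EXISTS (SELECT 1
                  FROM items AS o
                  WHERE o.ecid = t.ecid AND o.psid = t.psid
                    AND o.created > t.created)
ORDER BY t.id
"""

via_max = conn.execute(correlated_query).fetchall()
via_not_exists = conn.execute(not_exists_query).fetchall()

for row in via_max:
    print(row)
```

If two rows of the same pair share the latest created value, both forms return both rows, so a tie-breaker column would be needed to force a single row.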

Select unique field values with condition

I'm new to SQL and my problem is:
I have a table like
card shop time date
1 1 0000 20171001
2 2 0125 20171002
2 1 0344 20171002
3 3 0342 20171103
4 5 1334 20171104
4 4 1225 20171105
5 4 1452 20171106
I need to select one row per card (each card must appear only once) together with its shop, choosing the row with the minimum date and, within that date, the minimum time (date takes priority).
The result should look like this:
card shop time date
1 1 0000 20171001
2 2 0125 20171002
3 3 0342 20171103
4 5 1334 20171104
5 4 1452 20171106
Thank you in advance!
For SQL Server, you could use TOP 1 WITH TIES:
select top 1 with ties *
from yourTable
order by row_number() over (partition by card order by date asc, time asc)
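You can use subqueries with an aggregate function. Comparing the date alone would return both rows for card 2, which has two rows on its earliest date, so the time is checked as well:
select * from yourtable t
where t.date = (select min(t1.date) from yourtable t1
                where t1.card = t.card)
  and t.time = (select min(t2.time) from yourtable t2
                where t2.card = t.card and t2.date = t.date)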
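Filtering on the minimum date alone is not enough when a card has several rows on its earliest date (card 2 in the sample data), so time has to act as a tie-breaker. The sketch below, assuming Python's sqlite3 module and a placeholder table name `purchases`, contrasts a date-only filter with a ROW_NUMBER() ordering on (date, time), which is the portable counterpart of the WITH TIES trick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (card INTEGER, shop INTEGER, time TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 1, "0000", "20171001"),
        (2, 2, "0125", "20171002"),
        (2, 1, "0344", "20171002"),
        (3, 3, "0342", "20171103"),
        (4, 5, "1334", "20171104"),
        (4, 4, "1225", "20171105"),
        (5, 4, "1452", "20171106"),
    ],
)

# Date-only filter: every row on a card's earliest date qualifies.
date_only = conn.execute("""
    SELECT card, shop, time, date
    FROM purchases AS t
    WHERE t.date = (SELECT MIN(t1.date) FROM purchases AS t1 WHERE t1.card = t.card)
    ORDER BY card, time
""").fetchall()

# Rank rows per card by date first, then time, and keep only the first one.
first_per_card = conn.execute("""
    SELECT card, shop, time, date
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY card ORDER BY date, time) AS rn
        FROM purchases AS p
    ) AS ranked
    WHERE rn = 1
    ORDER BY card
""").fetchall()

print(len(date_only), "rows with the date-only filter")
for row in first_per_card:
    print(row)
```

The zero-padded text values sort the same way as the numbers they represent, which is what lets a plain ORDER BY on date and time work here.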

How to get minimum date by group id per client id

I am trying to select the GroupID and the minimum date per ClientID.
This is a sample data set:
ClientID GroupID DataDate
1 9 2016-05-01
2 8 2015-04-01
3 7 2016-07-05
1 6 2015-01-05
1 5 2014-11-12
2 4 2016-11-02
1 3 2013-02-14
2 2 2011-04-01
I wrote
SELECT
clientID, MIN(DataDate)
FROM sampleTable
GROUP BY clientID
But this query does not select GroupID. I need to include GroupID so that I can join to another table.
If I do:
Select
ClientID, GroupID, MIN(DataDate)
FROM sampleTable
GROUP BY ClientID, GroupID
It won't really return the minimum date per client.
Could you help me? How should I do this?
You can use ROW_NUMBER instead:
SELECT
    ClientID, GroupID, DataDate
FROM (
    SELECT *,
        rn = ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY DataDate)
    FROM sampleTable
) t
WHERE rn = 1
If you want to include ties, use RANK instead of ROW_NUMBER.
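The difference between ROW_NUMBER and RANK only shows up when two rows share a client's earliest date. The sketch below assumes Python's sqlite3 module (so the alias is written `... AS rn` rather than the T-SQL `rn = ...` form) and adds one hypothetical tied row that is not in the question's data; `earliest_per_client` is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampleTable (ClientID INTEGER, GroupID INTEGER, DataDate TEXT)")
conn.executemany(
    "INSERT INTO sampleTable VALUES (?, ?, ?)",
    [
        (1, 9, "2016-05-01"), (2, 8, "2015-04-01"), (3, 7, "2016-07-05"),
        (1, 6, "2015-01-05"), (1, 5, "2014-11-12"), (2, 4, "2016-11-02"),
        (1, 3, "2013-02-14"), (2, 2, "2011-04-01"),
    ],
)


def earliest_per_client(ranking_fn):
    """Return the rows ranked first per client by DataDate."""
    # Only these two fixed function names are ever interpolated into the SQL.
    if ranking_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError(ranking_fn)
    sql = f"""
        SELECT ClientID, GroupID, DataDate
        FROM (
            SELECT s.*,
                   {ranking_fn}() OVER (PARTITION BY ClientID ORDER BY DataDate) AS rn
            FROM sampleTable AS s
        ) AS t
        WHERE rn = 1
        ORDER BY ClientID, GroupID
    """
    return conn.execute(sql).fetchall()


no_ties = earliest_per_client("ROW_NUMBER")

# Hypothetical extra row (not in the question) that ties with client 3's earliest date.
conn.execute("INSERT INTO sampleTable VALUES (3, 10, '2016-07-05')")
with_row_number = earliest_per_client("ROW_NUMBER")
with_rank = earliest_per_client("RANK")

print(no_ties)
print(len(with_row_number), len(with_rank))
```

With the tie present, ROW_NUMBER still returns one row per client but picks between the tied rows arbitrarily, while RANK returns both.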
I hope I understood your question correctly.
You want to display the minimum date for each client ID.
If my table has data like this:
CID GID D1
1 9 03-06-2016
1 6 01-06-2017
1 5 01-06-2015
1 3 01-06-2014
2 4 01-06-2017
2 8 01-06-2014
3 5 03-06-2016
2 4 01-06-2011
Output :
CID GID D1
1 3 01-06-2014
2 4 01-06-2011
3 5 03-06-2016
This is what I think you can go with:
select cx.cid,cx.gid, cx.d1 from cli cx where cx.d1=(select min(c1.d1) from cli c1 where c1.cid=cx.cid)
group by cx.cid,cx.gid,cx.d1
order by cx.gid
Hope it helps.

How to do grouping by a date span?

Consider this table structure.
Key ID VISITDATE
1 1 2011-01-07
2 1 2011-01-09
3 2 2011-01-10
4 1 2011-01-12
5 3 2011-01-12
6 1 2011-01-15
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
11 1 2011-01-06
I need to get the ID, Key, and min(VisitDate) for each group of visits that fall within a 10-day span. If an ID has two visits within 10 days, only one row should appear in the result.
Result
KEY ID VISITDATE
11 1 2011-01-06
3 2 2011-01-10
5 3 2011-01-12
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
Can this be done without a self join? I have a query that self-joins the table on ID and checks the DATEDIFF. Is there a better solution? Can we use a recursive CTE here?
EDIT
I would prefer a solution that can use the index on the date column.
Yes a CTE would work nicely for this (everything with me is CTEs lately)...
;WITH TenDayVisits
AS (
SELECT
ID
,MIN(VisitDate) AS VisitDate
FROM Visits
GROUP BY ID
UNION ALL
SELECT
t.ID
,v.VisitDate
FROM Visits AS v
JOIN TenDayVisits AS t ON v.ID = t.ID
AND DATEDIFF(dd,t.Visitdate,v.VisitDate) > 10
)
SELECT
DISTINCT
v.[key]
,t.id
,t.VisitDate
FROM TenDayVisits as T
JOIN Visits AS v ON t.id = v.id
AND t.VisitDate = v.VisitDate
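To make the grouping rule concrete outside SQL, here is a small plain-Python sketch of one reading of it: walk the visits in date order and start a new span for an ID whenever a visit falls more than 10 days after the first visit of that ID's current span. This greedy interpretation is an assumption based on the expected result and on the `> 10` comparison above; it is not a translation of the CTE, and `first_visit_per_span` is an illustrative name.

```python
from datetime import date

# (Key, ID, VisitDate) rows from the question.
visits = [
    (1, 1, date(2011, 1, 7)),
    (2, 1, date(2011, 1, 9)),
    (3, 2, date(2011, 1, 10)),
    (4, 1, date(2011, 1, 12)),
    (5, 3, date(2011, 1, 12)),
    (6, 1, date(2011, 1, 15)),
    (7, 2, date(2011, 1, 21)),
    (9, 1, date(2011, 2, 28)),
    (10, 2, date(2011, 3, 21)),
    (11, 1, date(2011, 1, 6)),
]


def first_visit_per_span(rows, span_days=10):
    """Keep the first visit of each span, where a span restarts once a visit
    is more than span_days after the span's first visit for that ID."""
    span_start = {}  # ID -> date of the first visit in the current span
    kept = []
    for key, visit_id, visit_date in sorted(rows, key=lambda r: r[2]):
        start = span_start.get(visit_id)
        if start is None or (visit_date - start).days > span_days:
            span_start[visit_id] = visit_date
            kept.append((key, visit_id, visit_date))
    return kept


spans = first_visit_per_span(visits)
for key, visit_id, visit_date in spans:
    print(key, visit_id, visit_date.isoformat())
```

A single pass over rows sorted by date is also the access pattern that an index on the date column can serve.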