How to do grouping by a date span? - sql

Consider this table structure.
Key ID VISITDATE
1 1 2011-01-07
2 1 2011-01-09
3 2 2011-01-10
4 1 2011-01-12
5 3 2011-01-12
6 1 2011-01-15
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
11 1 2011-01-06
I need to get the ID, Key, and MIN(VisitDate) for each group of visits that fall within a 10-day span. If an ID has two visits within 10 days, only one row should appear in the result.
Result
KEY ID VISITDATE
11 1 2011-01-06
3 2 2011-01-10
5 3 2011-01-12
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
Can this be done without a self join? I have a query that self-joins the table on ID and checks the DATEDIFF. Is there a better solution? Can we use a recursive CTE here?
EDIT
I would prefer a solution that can use the index on the date column.

Yes, a CTE would work nicely for this (everything with me is CTEs lately)...
;WITH TenDayVisits
AS (
    SELECT
        ID
        ,MIN(VisitDate) AS VisitDate
    FROM Visits
    GROUP BY ID
    UNION ALL
    SELECT
        t.ID
        ,v.VisitDate
    FROM Visits AS v
    JOIN TenDayVisits AS t ON v.ID = t.ID
        AND DATEDIFF(dd,t.Visitdate,v.VisitDate) > 10
)
SELECT
    DISTINCT
    v.[key]
    ,t.id
    ,t.VisitDate
FROM TenDayVisits as T
JOIN Visits AS v ON t.id = v.id
    AND t.VisitDate = v.VisitDate
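One thing to watch with the recursive CTE above: its recursive member matches every visit more than 10 days after a span start, not only the next one. With visits on days 0, 11, and 15, it would also return day 15, although that visit is within 10 days of day 11. The sample data happens not to hit this case. As a cross-check, here is a minimal Python sketch of the span rule, assuming a visit opens a new span only when it is more than 10 days after the visit that opened the current span for the same ID. The function name `first_visit_per_span` is made up for illustration.

```python
from datetime import date

# Sample rows from the question: (key, id, visit_date).
visits = [
    (1, 1, date(2011, 1, 7)),
    (2, 1, date(2011, 1, 9)),
    (3, 2, date(2011, 1, 10)),
    (4, 1, date(2011, 1, 12)),
    (5, 3, date(2011, 1, 12)),
    (6, 1, date(2011, 1, 15)),
    (7, 2, date(2011, 1, 21)),
    (9, 1, date(2011, 2, 28)),
    (10, 2, date(2011, 3, 21)),
    (11, 1, date(2011, 1, 6)),
]


def first_visit_per_span(rows, span_days=10):
    """Return the first visit of each span, per ID (hypothetical helper).

    A visit opens a new span when it falls more than `span_days` days
    after the visit that opened the current span for the same ID.
    """
    result = []
    span_start = {}  # id -> date of the visit that opened the current span
    # Walk each ID's visits in date order.
    for key, id_, visit_date in sorted(rows, key=lambda r: (r[1], r[2])):
        start = span_start.get(id_)
        if start is None or (visit_date - start).days > span_days:
            span_start[id_] = visit_date
            result.append((key, id_, visit_date))
    # Present the result ordered by date, like the expected output.
    return sorted(result, key=lambda r: r[2])


spans = first_visit_per_span(visits)
for key, id_, visit_date in spans:
    print(key, id_, visit_date.isoformat())
```

On the question's data this prints the same six rows as the expected result. It is a single ordered pass per ID, which is the access pattern an index on (ID, VisitDate) would serve.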

Related

T-SQL How to configure Group by so that specific values would be correctly shown

My current T-SQL query provides the following results:
Query:
WITH CTE AS
(
SELECT SubscriberID, sum(valueMB) as ValuesMB
FROM dbo.InternetNetwork
GROUP BY SubscriberID
),
CTE2 AS (
SELECT ab.planID, a.SubscriberID, MAX(ValuesMB) as MaximumValue
FROM CTE AS a
left join
Subscriber as ab on a.SubscriberID= ab.SubscriberID
GROUP BY ab.planID, a.SubscriberID
)
select *
FROM CTE2 as b
ORDER BY b.MaximumValue desc
Output:
planID | SubscriberID | MaxValue
19 1555 97536.00
18 3528 97478.00
2 4029 93413.00
Query #2:
WITH CTE AS
(
SELECT SubscriberID, sum(valueMB) as ValuesMB
FROM dbo.InternetNetwork
GROUP BY SubscriberID
),
CTE2 AS(
SELECT ab.planID, MAX(ValuesMB) as MaximumValue
FROM CTE AS a
left join
Subscriber as ab on a.SubscriberID= ab.SubscriberID
GROUP BY ab.planID
)
SELECT pl.OperatorID, MAX(b.MaximumValue) as Super
FROM CTE2 as b
left join
Plan as pl on b.planID= pl.planID
GROUP BY pl.operatorID
ORDER BY pl.operatorID
Output #2:
OperatorID | Value
1 93413.00
2 86017.00
3 97536.00
I would also like to include the SubscriberID, but I can't figure out how. The only way I have found is to add it to the last SELECT and to the GROUP BY, which produces a messy, inaccurate result.
My desired output:
OperatorID | Value | SubscriberID
1 93413.00 4029
2 86017.00 164
3 97536.00 1544
internet network data:
SubscriberID ValuesMB
1 28
1 27
2 27
2 27
2 27
3 29
3 28
3 27
3 27
4 27
4 27
4 29
Subscriber Data:
SubscriberID PersonID PlanID
1 1 3
2 2 10
3 2 6
4 3 14
5 3 1
6 4 18
7 5 5
8 5 1
9 5 9
10 5 16
11 6 13
12 6 13
13 6 20
14 6 16
15 7 4
Plan data
PlanID OperatorID
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 2
9 2
10 2
11 2
12 3
13 3
14 3
15 3
16 3
17 3
18 3
19 3
20 3
The tables are related like this: InternetNetwork -> Subscriber -> Plan. InternetNetwork contains how much data each subscriber has used. Each subscriber has a plan associated with them. Each plan belongs to an operator; there are only three. I wish to list all three operators, each with the largest amount of data transferred by a subscriber on one of that operator's plans, and that subscriber's ID.
Window functions allow you to have fields in your select along with aggregate functions. You can do something like this:
;WITH CTE AS
(
    SELECT I.SubscriberID,
           S.PlanID,
           SUM(ValuesMB) OVER(PARTITION BY i.SubscriberID) as ValuesMB
    FROM dbo.InternetNetwork I
    JOIN Subscriber S
        ON I.SubscriberID = S.SubscriberID
),
CTE2 AS
(
    SELECT p.operatorID,
           a.SubscriberID,
           a.ValuesMB,
           ROW_NUMBER() OVER(PARTITION BY p.operatorID ORDER BY a.ValuesMB DESC) as rn
    FROM CTE a
    join [Plan] P
        on a.planID = P.planID
)
SELECT operatorID,
       ValuesMB,
       SubscriberID
FROM CTE2
where rn = 1
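To try the ROW_NUMBER pattern without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). It loads the first four subscribers from the sample data and only the plan rows they use. As a variation on the query above, it totals usage with GROUP BY first and then ranks subscribers within each operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InternetNetwork (SubscriberID INTEGER, ValuesMB REAL);
CREATE TABLE Subscriber (SubscriberID INTEGER, PlanID INTEGER);
CREATE TABLE [Plan] (PlanID INTEGER, OperatorID INTEGER);

-- Rows taken from the sample data in the question.
INSERT INTO InternetNetwork VALUES
    (1,28),(1,27),(2,27),(2,27),(2,27),
    (3,29),(3,28),(3,27),(3,27),(4,27),(4,27),(4,29);
INSERT INTO Subscriber VALUES (1,3),(2,10),(3,6),(4,14);
INSERT INTO [Plan] VALUES (3,2),(6,2),(10,2),(14,3);
""")

top_per_operator = conn.execute("""
WITH totals AS (
    -- One row per subscriber with total usage.
    SELECT SubscriberID, SUM(ValuesMB) AS ValuesMB
    FROM InternetNetwork
    GROUP BY SubscriberID
),
ranked AS (
    -- Rank subscribers inside each operator, heaviest user first.
    SELECT p.OperatorID, t.SubscriberID, t.ValuesMB,
           ROW_NUMBER() OVER (PARTITION BY p.OperatorID
                              ORDER BY t.ValuesMB DESC) AS rn
    FROM totals AS t
    JOIN Subscriber AS s ON s.SubscriberID = t.SubscriberID
    JOIN [Plan] AS p ON p.PlanID = s.PlanID
)
SELECT OperatorID, ValuesMB, SubscriberID
FROM ranked
WHERE rn = 1
ORDER BY OperatorID
""").fetchall()

for row in top_per_operator:
    print(row)
```

With this subset, operator 1 has no subscribers with usage, so only operators 2 and 3 appear. Aggregating before ranking keeps one row per subscriber in the ranking step, so ROW_NUMBER is never asked to choose between duplicate rows of the same subscriber.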

Get most frequent value from a windowing function

I have a SQL table that looks like:
user_id role date
1 1 2019-11-26 21:20:54.397+00
1 2 2019-11-27 22:46:28.923+00
2 1 2019-12-06 22:17:53.925+00
2 3 2019-12-13 00:12:28.006+00
3 1 2019-11-25 21:57:17.701+00
3 1 2019-12-06 20:48:28.314+00
3 1 2019-12-15 23:59:06.81+00
4 3 2019-12-04 15:26:10.639+00
4 3 2019-11-22 19:20:01.025+00
4 3 2019-11-25 12:38:53.169+00
I would like to get the most frequent role for each user, based on that user's earlier dates. The result should look like:
user_id role date most_frequent_role
1 1 2019-11-26 21:20:54.397+00 NULL
1 2 2019-11-27 22:46:28.923+00 1
2 1 2019-12-06 22:17:53.925+00 NULL
2 3 2019-12-13 00:12:28.006+00 1
3 1 2019-11-25 21:57:17.701+00 NULL
3 1 2019-12-06 20:48:28.314+00 1
3 1 2019-12-15 23:59:06.81+00 1
4 3 2019-12-04 15:26:10.639+00 NULL
4 3 2019-11-22 19:20:01.025+00 3
4 3 2019-11-25 12:38:53.169+00 3
The following query will work for you.
select test.user_id,test.role,test.role_date,
case when test.role_date in
(select min(role_date) from test group by user_id) then NULL
else t.role end as MOST_FREQUENT_ROLE
from
(select user_id,min(role) as role from test group by user_id
)t
join test on t.user_id=test.user_id
order by user_id,role_date
Output
USER_ID ROLE ROLE_DATE MOST_FREQUENT_ROLE
1 1 26-NOV-19 -
1 2 27-NOV-19 1
2 1 06-DEC-19 -
2 3 13-DEC-19 1
3 1 25-NOV-19 -
3 1 06-DEC-19 1
3 1 15-DEC-19 1
4 3 22-NOV-19 -
4 3 25-NOV-19 3
4 3 04-DEC-19 3
If you strictly want to use a window function, try the following:
SELECT user_id
,role
,date
,CASE WHEN date = MIN(date) OVER(PARTITION BY user_id ORDER BY date)
THEN NULL
ELSE MIN(role) OVER(PARTITION BY user_id) END MOST_FREQUENT_ROLE
FROM YOUR_TABLE;
Technically, what you are trying to calculate is the mode (this is a statistical term).
Postgres has a built-in mode() function. Alas, it does not work as you need as a window function, so it provides little help.
I would recommend using a lateral join:
select t.*, m.role
from t left join lateral
(select t2.role
from t t2
where t2.user_id = t.user_id and
t2.date < t.date
group by t2.role
order by count(*) desc,
max(date) desc -- in the event of ties, use the most recent
limit 1
) m
on 1=1
order by user_id, date;
Here is a db<>fiddle. Note that I added some rows to give an example of where the running mode changes.
This will not be particularly efficient but an index on (user_id, date, role) should help.
If you have just a handful of roles there are probably more efficient solutions. If that is the case and performance is an issue, ask a new question.
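The running mode from the lateral join can also be written as a correlated scalar subquery, which is convenient for experimenting in engines without LATERAL. The sketch below runs it in SQLite through Python's sqlite3 module. The table name `t` and column name `role_date` are assumptions, and user 5 is a made-up addition (not in the question) to show a case where the running mode changes and a tie is broken by the most recent role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, role INTEGER, role_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    # Rows from the question.
    (1, 1, "2019-11-26 21:20:54.397+00"),
    (1, 2, "2019-11-27 22:46:28.923+00"),
    (2, 1, "2019-12-06 22:17:53.925+00"),
    (2, 3, "2019-12-13 00:12:28.006+00"),
    (3, 1, "2019-11-25 21:57:17.701+00"),
    (3, 1, "2019-12-06 20:48:28.314+00"),
    (3, 1, "2019-12-15 23:59:06.81+00"),
    (4, 3, "2019-12-04 15:26:10.639+00"),
    (4, 3, "2019-11-22 19:20:01.025+00"),
    (4, 3, "2019-11-25 12:38:53.169+00"),
    # Hypothetical user, added only to show the running mode changing.
    (5, 1, "2019-11-01 00:00:00+00"),
    (5, 2, "2019-11-02 00:00:00+00"),
    (5, 2, "2019-11-03 00:00:00+00"),
    (5, 1, "2019-11-04 00:00:00+00"),
])

rows = conn.execute("""
SELECT t.user_id, t.role, t.role_date,
       -- Most frequent role among this user's strictly earlier rows;
       -- ties go to the role seen most recently. NULL when no earlier row.
       (SELECT t2.role
        FROM t AS t2
        WHERE t2.user_id = t.user_id
          AND t2.role_date < t.role_date
        GROUP BY t2.role
        ORDER BY COUNT(*) DESC, MAX(t2.role_date) DESC
        LIMIT 1) AS most_frequent_role
FROM t
ORDER BY t.user_id, t.role_date
""").fetchall()

for row in rows:
    print(row)
```

The timestamps are stored as text here, which only compares correctly because they all share one format and time zone. In Postgres a real timestamptz column would be the safer choice.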

filling running total over all month although its null

I have two tables: one with all periods, and a second with account, amount, and period.
I want to build a view that lists the cumulative amount, the period, and the account. If there is no fact for a period in my table, the period should still appear in the view with the last amount.
select distinct
account, b.periode,
SUM(amount) OVER (PARTITION BY account ORDER BY b.periode RANGE UNBOUNDED PRECEDING)
from
fakten a
full join
perioden b on a.periode = b.periode
order by b.periode
The result currently looks like this:
1 1 6
2 1 4
1 2 13
2 2 3
NULL 3 NULL
1 4 46
2 5 48
NULL 6 NULL
NULL 7 NULL
1 8 147
NULL 9 NULL
NULL 10 NULL
NULL 11 NULL
NULL 12 NULL
I need it like:
1 1 6
2 1 4
1 2 13
2 2 3
1 3 13
2 3 3
1 4 46
2 4 3
1 5 46
2 5 48
1 6 46
2 6 46
and so on...
Any ideas?
A full join is not the right approach. Instead, generate the rows you want using a cross join, then use a left join and group by to do the calculation.
select a.account, p.periode,
       SUM(SUM(f.amount)) OVER (PARTITION BY a.account ORDER BY p.periode)
from (select distinct account from fakten) a cross join -- you probably have an account table, use it
     perioden p left join
     fakten f
     on f.account = a.account and f.periode = p.periode
group by a.account, p.periode
order by a.account, p.periode;
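The cross join plus left join idea can be tried end to end with Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The rows below are made-up sample data, not the asker's real tables. The sketch splits the work into two steps: a CTE builds one row per account and period, and the outer query computes the running total over it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE perioden (periode INTEGER);
CREATE TABLE fakten (account INTEGER, periode INTEGER, amount INTEGER);

-- Hypothetical data: four periods, account 1 has no fact in period 3,
-- account 2 has a fact only in period 1.
INSERT INTO perioden VALUES (1),(2),(3),(4);
INSERT INTO fakten VALUES (1,1,6),(2,1,4),(1,2,7),(1,4,33);
""")

running = conn.execute("""
WITH grid AS (
    -- Every account paired with every period; amount is NULL
    -- where the account has no fact for that period.
    SELECT a.account, p.periode, SUM(f.amount) AS amount
    FROM (SELECT DISTINCT account FROM fakten) AS a
    CROSS JOIN perioden AS p
    LEFT JOIN fakten AS f
        ON f.account = a.account AND f.periode = p.periode
    GROUP BY a.account, p.periode
)
SELECT account, periode,
       SUM(amount) OVER (PARTITION BY account ORDER BY periode) AS running_total
FROM grid
ORDER BY account, periode
""").fetchall()

for row in running:
    print(row)
```

Because SUM ignores NULLs, a period without a fact repeats the previous running total, which is the "last amount" behaviour asked for. Periods before an account's first fact would show NULL rather than 0.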

How to get minimum date by group id per client id

I am trying to select the GroupID and the minimum date per ClientID.
This is a sample data set.
ClientID GroupID DataDate
1 9 2016-05-01
2 8 2015-04-01
3 7 2016-07-05
1 6 2015-01-05
1 5 2014-11-12
2 4 2016-11-02
1 3 2013-02-14
2 2 2011-04-01
I wrote
SELECT
clientID, MIN(DataDate)
FROM sampleTable
GROUP BY clientID
But in this query, I do not have GroupID selected. I need to include GroupID to join another table.
If I do:
Select
ClientID, GroupID, MIN(DataDate)
FROM sampleTable
GROUP BY ClientID, GroupID
It won't really get the minimum date per client.
Could you help me? How should I do this?
You can use ROW_NUMBER instead:
SELECT
    ClientID, GroupID, DataDate
FROM (
    SELECT *,
        rn = ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY DataDate)
    FROM sampleTable
) t
WHERE rn = 1
If you want to include ties, use RANK instead of ROW_NUMBER.
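To see the difference between ROW_NUMBER and RANK on the question's data, here is a sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer). The helper `earliest_rows` is a made-up name, and the tied row for client 3 is a hypothetical addition that is not in the original sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sampleTable (ClientID INTEGER, GroupID INTEGER, DataDate TEXT)"
)
conn.executemany("INSERT INTO sampleTable VALUES (?, ?, ?)", [
    (1, 9, "2016-05-01"), (2, 8, "2015-04-01"), (3, 7, "2016-07-05"),
    (1, 6, "2015-01-05"), (1, 5, "2014-11-12"), (2, 4, "2016-11-02"),
    (1, 3, "2013-02-14"), (2, 2, "2011-04-01"),
])


def earliest_rows(ranking_function):
    """Rows with the earliest DataDate per client (hypothetical helper)."""
    # The name is interpolated into the SQL, so accept only the two known values.
    if ranking_function not in ("ROW_NUMBER", "RANK"):
        raise ValueError("unsupported ranking function")
    sql = f"""
        SELECT ClientID, GroupID, DataDate
        FROM (
            SELECT *,
                   {ranking_function}() OVER (PARTITION BY ClientID
                                              ORDER BY DataDate) AS rn
            FROM sampleTable
        ) AS t
        WHERE rn = 1
        ORDER BY ClientID, GroupID
    """
    return conn.execute(sql).fetchall()


first = earliest_rows("ROW_NUMBER")
print(first)

# Hypothetical extra row: client 3 gets a second group on its earliest date.
conn.execute("INSERT INTO sampleTable VALUES (3, 10, '2016-07-05')")
tied = earliest_rows("RANK")
print(tied)
```

With the tie in place, RANK returns both groups for client 3, while ROW_NUMBER would return only one of them, and which one is not defined unless a tie-breaker column is added to the ORDER BY.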
I hope I understood your question correctly.
You want to display the minimum date for each client ID.
If my table has data like this:
CID GID D1
1 9 03-06-2016
1 6 01-06-2017
1 5 01-06-2015
1 3 01-06-2014
2 4 01-06-2017
2 8 01-06-2014
3 5 03-06-2016
2 4 01-06-2011
Output :
CID GID D1
1 3 01-06-2014
2 4 01-06-2011
3 5 03-06-2016
This is what I think you can go with:
select cx.cid,cx.gid, cx.d1 from cli cx where cx.d1=(select min(c1.d1) from cli c1 where c1.cid=cx.cid)
group by cx.cid,cx.gid,cx.d1
order by cx.gid
Hope it helps.

Query for max to_date for one user id?

I am getting some unexpected results from a SQL query.
Table data:
users:
id username
1 admin
2 x1
3 y1
4 z1
my_connections:
id user_id friend_user_id status
1 1 2 201
2 2 1 201
3 2 4 201
4 1 3 200
5 2 3 201
6 3 2 201
7 4 2 201
8 4 1 200
jobs:
id user_id company_name designation from_date to_date
1 1 A 1 2011-06-01 2011-07-30
2 1 B 11 2011-08-02 2014-01-20
3 2 c 12 2012-05-02 2014-01-20
4 3 D 13 2010-05-02 2014-01-20
5 4 E 11 2009-05-25 2014-01-01
Here is my query:
SELECT users.id,users.username,my_connections.user_id,my_connections.friend_user_id,my_connections.status,jobs.user_id,jobs.company_name,
jobs.designation,jobs.from_date,MAX(jobs.to_date)
FROM users
LEFT JOIN jobs ON jobs.user_id = users.id
LEFT JOIN my_connections ON my_connections.friend_user_id = users.id
WHERE my_connections.status = 201 AND users.id IN (1,3,4)
GROUP BY jobs.company_name
ORDER BY jobs.to_date DESC
And the results:
id username user_id friend_user_id status user_id company_name designs from_date to_date
3 .. 2 3 201 3 D .. 2010-05-02 2014-01-20
4 .. 2 4 201 4 E .. 2009-05-25 2014-01-01
1 .. 2 1 201 1 A .. 2011-08-02 2014-01-20
1 .. 2 1 201 1 B .. 2011-06-01 2011-07-30
In the result, I wanted one row per friend_user_id, with the maximum value of to_date. Instead I am getting multiple rows (if there are multiple rows in the jobs table).
How can I fix this query?
If you want unique results on the friend_user_id field, you must group by friend_user_id, which guarantees one row per friend_user_id. But I'm pretty sure you don't want that, because it may show incorrect data. I am still unsure how the query is working, because the GROUP BY contains only one field. You must group by all the raw fields in the select list and apply aggregate functions to the fields that are not in the GROUP BY clause, for example:
SELECT users.id,users.username,my_connections.user_id,my_connections.friend_user_id,my_connections.status,jobs.user_id,jobs.company_name,
jobs.designation,jobs.from_date,MAX(jobs.to_date)
FROM users
LEFT JOIN jobs ON jobs.user_id = users.id
LEFT JOIN my_connections ON my_connections.friend_user_id = users.id
WHERE my_connections.status = 201 AND users.id IN (1,3,4)
GROUP BY users.id,users.username,my_connections.user_id,my_connections.friend_user_id,my_connections.status,jobs.user_id,jobs.company_name,
jobs.designation,jobs.from_date
ORDER BY jobs.to_date DESC
In this query, all of the non-aggregated fields in the select clause are also in the group by clause. Any field not included in the group by clause must be wrapped in an aggregate function such as MAX(), AVG(), or SUM().
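Grouping by every selected column still returns one row per job, because company_name and from_date differ between a user's jobs. If the aim is one row per friend_user_id with that user's latest job, one possible alternative (not what the query above does) is to join only the job whose to_date equals the user's maximum. Here is a sketch with the question's data, run in SQLite through Python's sqlite3 module. It assumes to_date is stored as an ISO date string and that a user has at most one job with the latest to_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, username TEXT);
CREATE TABLE my_connections (id INTEGER, user_id INTEGER,
                             friend_user_id INTEGER, status INTEGER);
CREATE TABLE jobs (id INTEGER, user_id INTEGER, company_name TEXT,
                   designation INTEGER, from_date TEXT, to_date TEXT);

INSERT INTO users VALUES (1,'admin'),(2,'x1'),(3,'y1'),(4,'z1');
INSERT INTO my_connections VALUES
    (1,1,2,201),(2,2,1,201),(3,2,4,201),(4,1,3,200),
    (5,2,3,201),(6,3,2,201),(7,4,2,201),(8,4,1,200);
INSERT INTO jobs VALUES
    (1,1,'A',1,'2011-06-01','2011-07-30'),
    (2,1,'B',11,'2011-08-02','2014-01-20'),
    (3,2,'c',12,'2012-05-02','2014-01-20'),
    (4,3,'D',13,'2010-05-02','2014-01-20'),
    (5,4,'E',11,'2009-05-25','2014-01-01');
""")

latest = conn.execute("""
SELECT u.id, u.username, c.user_id, j.company_name, j.to_date
FROM users AS u
JOIN my_connections AS c
    ON c.friend_user_id = u.id AND c.status = 201
LEFT JOIN jobs AS j
    ON j.user_id = u.id
   -- Keep only the job with this user's latest to_date.
   AND j.to_date = (SELECT MAX(j2.to_date)
                    FROM jobs AS j2
                    WHERE j2.user_id = u.id)
WHERE u.id IN (1, 3, 4)
ORDER BY u.id
""").fetchall()

for row in latest:
    print(row)
```

Putting the status filter in the join condition rather than the WHERE clause is a deliberate choice here: the join to my_connections is an inner join anyway, and it avoids the original pattern where a WHERE filter silently turns a LEFT JOIN into an inner one.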