I have the following table
ID FromDate ToDate
1 2020-01-01 2020-12-31
1 2021-01-01 2021-12-31
1 2022-03-01 2022-12-31
If the gap between ToDate in one row and FromDate in the next row is less than 30 days, the two rows should be merged into a single row that keeps the first row's FromDate and the second row's ToDate.
Below is what I would expect to get:
ID FromDate ToDate
1 2020-01-01 2021-12-31
1 2022-03-01 2022-12-31
Any suggestions would be greatly appreciated
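The merge rule described above can be sketched in plain Python as a quick way to check the expected result. This is only an illustration of the logic, not a SQL answer. The function name `merge_ranges` and the `max_gap_days` parameter are made up for this example, and it assumes rows are compared per ID in chronological order.

```python
from datetime import date

def merge_ranges(rows, max_gap_days=30):
    """Merge consecutive (id, from_date, to_date) rows whose gap is under max_gap_days."""
    merged = []
    # Sorting puts rows of the same ID next to each other, in chronological order.
    for id_, start, end in sorted(rows):
        if merged:
            last_id, last_start, last_end = merged[-1]
            if last_id == id_ and (start - last_end).days < max_gap_days:
                # Close enough: extend the previous range instead of starting a new one.
                merged[-1] = (last_id, last_start, max(last_end, end))
                continue
        merged.append((id_, start, end))
    return merged

# Sample rows from the question.
rows = [
    (1, date(2020, 1, 1), date(2020, 12, 31)),
    (1, date(2021, 1, 1), date(2021, 12, 31)),
    (1, date(2022, 3, 1), date(2022, 12, 31)),
]
result = merge_ranges(rows)
```

On the sample rows the first two ranges are one day apart and get merged, while the third starts 60 days after the second ends and stays separate.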
I'm new to SSAS so be gentle!
I have (simplified):
A fact table that has an ID, start date, start datetime, end date, and end datetime.
A date dimension that has a granularity from Year to Calendar Date.
What I'd like to do is get the count of IDs per hour per date member/current member. However, I'm not exactly sure how to get there.
Fact Table Example
ID  Start Date  End Date    Start DateTime    End DateTime
1   2022-01-01  2022-01-04  2022-01-01 23:00  2022-01-04 05:33
53  2022-01-01  2022-01-07  2022-01-01 04:00  2022-01-07 12:05
Wanted results:
Date        Hour   Count
2022-01-02  00:00  1
2022-01-02  01:00  1
2022-01-02  02:00  1
2022-01-02  03:00  1
2022-01-02  04:00  2
2022-01-02  05:00  2
I expect I need an hour dimension that somehow links to the date dimension, plus some sort of measure that does a between comparison, but I'm not exactly sure how to go about this.
Any help is appreciated!
Edit: above tables may not be showing right for some reason. Looks great when I go to edit them...
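The "between comparison" mentioned above can be sketched outside SSAS in a few lines of Python. This is not MDX or a cube design, just an illustration of the counting rule, and it assumes that an ID counts toward an hour whenever its start/end interval overlaps that hour. The helper names `count_active` and `hourly_counts` are made up for this example.

```python
from datetime import datetime, timedelta

# Sample fact rows from the question: (ID, start datetime, end datetime).
facts = [
    (1, datetime(2022, 1, 1, 23, 0), datetime(2022, 1, 4, 5, 33)),
    (53, datetime(2022, 1, 1, 4, 0), datetime(2022, 1, 7, 12, 5)),
]

def count_active(facts, hour_start):
    """Count IDs whose [start, end] interval overlaps the hour beginning at hour_start."""
    hour_end = hour_start + timedelta(hours=1)
    return sum(1 for _, start, end in facts if start < hour_end and end >= hour_start)

def hourly_counts(facts, day):
    """Return {hour: count} for the 24 hour buckets of `day` (a datetime at midnight)."""
    return {h: count_active(facts, day + timedelta(hours=h)) for h in range(24)}

counts = hourly_counts(facts, datetime(2022, 1, 1))
```

With the two sample rows, the buckets for 2022-01-01 are empty before 04:00, hold one ID from 04:00 onward, and hold two IDs in the 23:00 bucket. In a cube, the same overlap test would presumably be expressed against an hour dimension rather than a Python loop.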
Giving up after a few hours of failed attempts.
My data is in the following format. create_date can never be later than event_date.
I need to calculate, on a rolling n-day basis (let's say 3), the sum of units where the create_date and event_date both fall within the same 3-day window. The data is illustrative: each event_date can have more than 500 different create_dates associated with it, and the number isn't constant. Some event_dates may be missing.
So for 2022-02-03, say, I only want to sum units where both the event_date and create_date values are between 2022-02-01 and 2022-02-03.
event_date  create_date  rowid  units
2022-02-01  2022-01-20   1      100
2022-02-01  2022-02-01   2      100
2022-02-02  2022-01-21   3      100
2022-02-02  2022-01-23   4      100
2022-02-02  2022-01-31   5      100
2022-02-02  2022-02-02   6      100
2022-02-03  2022-01-30   7      100
2022-02-03  2022-02-01   8      100
2022-02-03  2022-02-03   9      100
2022-02-05  2022-02-01   10     100
2022-02-05  2022-02-03   11     100
The output I need to get to is below. (In brackets I have added the rows that go into the calculation for each date, but my result only needs to include the numerical sum.) I tried calculating using either date, but neither returned the results I needed.
date        units
2022-02-01  100 (Row 2)
2022-02-02  300 (Rows 2, 5, 6)
2022-02-03  400 (Rows 2, 6, 8, 9)
2022-02-04  200 (Rows 6, 9)
2022-02-05  200 (Rows 9, 11)
In Python I solved this with a function that loops over the dates and filters a DataFrame for each one, but I am struggling to do the same in SQL.
Thank you!
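For reference, the windowing rule described in this question can be written down as a small plain-Python sketch, which is handy for checking any SQL result against the sample rows. It assumes an inclusive window of `window_days` calendar days ending on each date, and it walks every calendar day between the first and last event_date so that missing dates still get a row. The function name `rolling_units` is made up for this example.

```python
from datetime import date, timedelta

# Sample rows from the question: (event_date, create_date, rowid, units).
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 1, 100),
    (date(2022, 2, 1), date(2022, 2, 1), 2, 100),
    (date(2022, 2, 2), date(2022, 1, 21), 3, 100),
    (date(2022, 2, 2), date(2022, 1, 23), 4, 100),
    (date(2022, 2, 2), date(2022, 1, 31), 5, 100),
    (date(2022, 2, 2), date(2022, 2, 2), 6, 100),
    (date(2022, 2, 3), date(2022, 1, 30), 7, 100),
    (date(2022, 2, 3), date(2022, 2, 1), 8, 100),
    (date(2022, 2, 3), date(2022, 2, 3), 9, 100),
    (date(2022, 2, 5), date(2022, 2, 1), 10, 100),
    (date(2022, 2, 5), date(2022, 2, 3), 11, 100),
]

def rolling_units(rows, window_days=3):
    """Sum units per day where both dates fall inside the window ending on that day."""
    day = min(r[0] for r in rows)
    last = max(r[0] for r in rows)
    totals = {}
    while day <= last:
        window_start = day - timedelta(days=window_days - 1)
        totals[day] = sum(
            units
            for event_date, create_date, _, units in rows
            if window_start <= event_date <= day and window_start <= create_date <= day
        )
        day += timedelta(days=1)
    return totals

totals = rolling_units(rows)
```

On the sample rows this gives 100, 300, 400, 200 and 200 for 2022-02-01 through 2022-02-05. The 400 for 2022-02-03 comes from rows 2, 6, 8 and 9 at 100 units each.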
Consider the approach below:
    with events_dates as (
      select date from (
        select min(event_date) min_date, max(event_date) max_date
        from your_table
      ), unnest(generate_date_array(min_date, max_date)) date
    )
    select date, sum(units) as units, string_agg('' || rowid) rows_included
    from events_dates
    left join your_table
      on create_date between date - 2 and date
      and event_date between date - 2 and date
    group by date
if applied to sample data in your question - output is
Date 1      Date 2      Date 3      Date 4      LineCount  Month_Gap
2020-01-01  2019-10-01  2019-09-06              1
2020-01-01  2019-10-01  2019-09-13  2019-09-06  2          0
2020-01-01  2019-10-01  2019-08-13  2019-09-06  2          1
If the LineCount is 1, then Month_Gap should be the maximum month difference between (Date1 and Date3) and (Date2 and Date3). Date3 will always be in between Date1 and Date2.
In this case, the output should be the maximum month difference between (2020-01-01 and 2019-09-06) and (2019-10-01 and 2019-09-06), which is 3 months:
Date 1      Date 2      Date 3      Date 4      LineCount  Month_Gap
2020-01-01  2019-10-01  2019-09-06              1          3
2020-01-01  2019-10-01  2019-09-13  2019-09-06  2          0
2020-01-01  2019-10-01  2019-08-13  2019-09-06  2          1
I was trying something like this, but I'm not sure how to go about it:
    CASE WHEN LineCount = 1 THEN MAX(DATE_DIFF(.....)
which I guess won't work.
The pattern you should use is
    SELECT TIMESTAMPDIFF(MONTH, LEAST(date1,date2,date3,date4), GREATEST(date1,date2,date3,date4)) as `maximum_difference`;
This finds the earliest and the latest of the four date columns and returns the number of months between them.
    SELECT
      CASE WHEN LineCount = 1
           THEN GREATEST(DATE_DIFF('month', Date3, Date1),
                         DATE_DIFF('month', Date3, Date2))
      END AS Month_Gap
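To sanity-check the CASE/GREATEST idea against the first sample row, here is a minimal Python sketch. It assumes that a month difference means whole elapsed months (a partial month does not count), which may differ from how a particular SQL engine rounds. The helper names `months_between` and `month_gap` are made up for this example.

```python
from datetime import date

def months_between(earlier, later):
    """Whole months from `earlier` to `later`; a partial month does not count."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months

def month_gap(date1, date2, date3, line_count):
    """Greatest month difference to Date3 when LineCount is 1, otherwise None."""
    if line_count != 1:
        return None  # mirrors a CASE expression with no ELSE branch
    return max(months_between(date3, date1), months_between(date3, date2))

# First sample row from the question.
gap = month_gap(date(2020, 1, 1), date(2019, 10, 1), date(2019, 9, 6), 1)
```

For that row the two differences are 3 and 0 months, so the result is 3, matching the expected Month_Gap. Rows with a LineCount other than 1 are left as None here, because the question only defines the rule for LineCount = 1.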
I would like to transpose my table to see trends in the data. The data is formatted as such:
A UserId can occur multiple times because of different assessment periods. Let's say a user with ID 1 incurred some charges in January, February, and March. There are currently three rows that contain data from these periods respectively.
I would like to see everything as one row for each user ID, regardless of the number of periods (up to 12 months).
This would enable me to see and compare changes between assessment periods and attributes.
Current format:
UserId AssessmentDate Attribute1 Attribute2 Attribute3
1 2020-01-01 00:00:00.000 -01:00 20.13 123.11 405.00
1 2021-02-01 00:00:00.000 -01:00 1.03 78.93 11.34
1 2021-03-01 00:00:00.000 -01:00 15.03 310.10 23.15
2 2021-02-01 00:00:00.000 -01:00 14.31 41.30 63.20
2 2021-03-01 00:03:45.000 -01:00 0.05 3.50 1.30
Desired format:
UserId LastAssessmentDate Attribute1_M-2 Attribute2_M-1 ... Attribute3_M0
1 2021-03-01 00:00:00.000 -01:00 20.13 123.11 23.15
2 2021-03-01 00:03:45.000 -01:00 NULL 41.30 1.30
Either SQL or Pandas - both work for me. Thanks for the help!
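One way to read the desired format is sketched below in plain Python (no pandas). It assumes that M0 means each user's most recent assessment, M-1 the one before it, and so on, ranked by recency and not by calendar month. It emits every attribute for every period, and pads missing periods with None. The function name `to_wide` and its parameters are made up for this example.

```python
from collections import defaultdict

# Sample rows: (UserId, AssessmentDate, Attribute1, Attribute2, Attribute3).
# The timestamps share one format and offset, so sorting them as strings is chronological.
rows = [
    (1, "2020-01-01 00:00:00.000 -01:00", 20.13, 123.11, 405.00),
    (1, "2021-02-01 00:00:00.000 -01:00", 1.03, 78.93, 11.34),
    (1, "2021-03-01 00:00:00.000 -01:00", 15.03, 310.10, 23.15),
    (2, "2021-02-01 00:00:00.000 -01:00", 14.31, 41.30, 63.20),
    (2, "2021-03-01 00:03:45.000 -01:00", 0.05, 3.50, 1.30),
]

def to_wide(rows, attributes=("Attribute1", "Attribute2", "Attribute3"), periods=3):
    """Return {UserId: record} with one column per attribute and period."""
    by_user = defaultdict(list)
    for user_id, assessed, *values in rows:
        by_user[user_id].append((assessed, values))
    wide = {}
    for user_id, history in by_user.items():
        history.sort(key=lambda item: item[0], reverse=True)  # newest first
        record = {"LastAssessmentDate": history[0][0]}
        for k in range(periods):
            # Pad with None when the user has fewer than `periods` assessments.
            values = history[k][1] if k < len(history) else [None] * len(attributes)
            suffix = "M0" if k == 0 else f"M-{k}"
            for name, value in zip(attributes, values):
                record[f"{name}_{suffix}"] = value
        wide[user_id] = record
    return wide

wide = to_wide(rows)
```

For user 2 this yields None for Attribute1_M-2, 41.30 for Attribute2_M-1 and 1.30 for Attribute3_M0. Setting `periods=12` would cover the full range mentioned in the question.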
I have a table
ID Value Date
1 10 2017-10-02 02:50:04.480
2 20 2017-10-01 07:28:53.593
3 30 2017-09-30 23:59:59.000
4 40 2017-09-30 23:59:59.000
5 50 2017-09-30 02:36:07.520
I compare each Value with the Value from the previous date. However, I don't want a comparison across a month boundary, that is, between the first day of the current month and the last day of the previous month. For this table, I don't want to compare 2017-10-01 07:28:53.593 with 2017-09-30 23:59:59.000. How can this be done?
Result table for this example:
ID Value Date Diff
1 10 2017-10-02 02:50:04.480 10
2 20 2017-10-01 07:28:53.593 NULL
3 30 2017-09-30 23:59:59.000 10
4 40 2017-09-29 23:59:59.000 10
5 50 2017-09-28 02:36:07.520 NULL
You can use this:
    SELECT *,
           LEAD(Value) OVER (PARTITION BY DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]) ORDER BY ID) - Value AS Diff
    FROM MyTable
    ORDER BY ID
You can use a query like the one below:
    select *,
           diff = LEAD(Value) OVER (PARTITION BY Month(Date), Year(Date) ORDER BY Date desc) - Value
    from t
    order by id asc
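What the month-partitioned LEAD does can be checked with a small Python sketch. It is only an emulation of the window logic under the assumption that rows are ordered by date descending within each (year, month) group. It uses the dates from the result table, with fractional seconds dropped, and the function name `diff_within_month` is made up for this example.

```python
from datetime import datetime
from itertools import groupby

# Rows taken from the result table: (ID, Value, Date), fractional seconds dropped.
rows = [
    (1, 10, datetime(2017, 10, 2, 2, 50, 4)),
    (2, 20, datetime(2017, 10, 1, 7, 28, 53)),
    (3, 30, datetime(2017, 9, 30, 23, 59, 59)),
    (4, 40, datetime(2017, 9, 29, 23, 59, 59)),
    (5, 50, datetime(2017, 9, 28, 2, 36, 7)),
]

def diff_within_month(rows):
    """Emulate LEAD(Value) OVER (PARTITION BY year, month ORDER BY Date DESC) - Value."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    result = {}
    # After the sort, rows of the same month are adjacent, so groupby sees each month once.
    for _, group in groupby(ordered, key=lambda r: (r[2].year, r[2].month)):
        group = list(group)
        for current, following in zip(group, group[1:] + [None]):
            # The last row of each month has no following row, so its Diff stays None.
            result[current[0]] = None if following is None else following[1] - current[1]
    return result

diffs = diff_within_month(rows)
```

The result maps IDs 1 to 5 to 10, None, 10, 10 and None, which matches the Diff column in the question: the partition stops the comparison at the month boundary.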