SQL Partition Data by Date Range ignoring date gaps and weekends - sql

Thank you in advance for your patience, and help!
I am trying to partition my data in a way that displays date ranges.
IMAGE: Data Set - Current Results - Desired Results
In the image you can see my data set, the results I'm currently getting, and the results I would like to get.
Here is the code that produces my current results. I'm struggling to understand PARTITION.
Edit: I can bring Saturday and Sunday back in if the data needs all 365 consecutive days. I'm currently just removing them from the data source in the WHERE clause.
--DELETE TEMP TABLE USED TO STORE CONSECUTIVE ABSENCES
DROP TABLE IF EXISTS #StuConsecAtt;
--CREATE TEMP TABLE THAT STORES CONSECUTIVE ABSENCE DATE RANGES
CREATE TABLE #StuConsecAtt(
SIS_NUMBER INT,
ABS_FROM DATE,
ABS_TO DATE
);
--INSERT CONSECUTIVE ABSENCE DATA NEW TABLE
WITH stuAtt
AS (
SELECT *
,DATEADD(DAY, - ROW_NUMBER() OVER (
PARTITION BY SIS_NUMBER ORDER BY ABS_DATE
), ABS_DATE) AS grp
FROM #stuCalAtt
)
INSERT INTO #StuConsecAtt
(ABS_FROM, ABS_TO, SIS_NUMBER)
SELECT min(ABS_DATE) AS [From]
,max(ABS_DATE) AS [To]
-- ,[ABS_REASON]
,SIS_NUMBER
FROM stuATT
GROUP BY SIS_NUMBER
,grp
ORDER BY [From];
SELECT * FROM #StuConsecAtt
WHERE ABS_TO > ABS_FROM;
EDIT BELOW
DATA
Looking at the data, I'm trying to put consecutive days with ABSENT_DAY = Y in a single group. Below, 10/4 through 10/11 are consecutive (the weekend rows would have ABSENT_DAY = N, so I removed the weekends). Now, because 10/4 through 10/11 sit together in the dataset, all with ABSENT_DAY = Y, I would like to group them to get the range 10/4 - 10/11, just as the following range would be 10/18 - 10/19. The weekend gap is what causes the issue.
SIS_NUMBER CALENDAR_DATE WEEK_DAY ABS_DATE SCHOOL_DAY ABSENT_DAY
641861 2017-10-03 Tuesday NULL Y N
641861 2017-10-04 Wednesday 2017-10-04 Y Y
641861 2017-10-05 Thursday 2017-10-05 Y Y
641861 2017-10-06 Friday 2017-10-06 Y Y
641861 2017-10-09 Monday 2017-10-09 Y Y
641861 2017-10-10 Tuesday 2017-10-10 Y Y
641861 2017-10-11 Wednesday 2017-10-11 Y Y
641861 2017-10-12 Thursday NULL N N
641861 2017-10-13 Friday NULL N N
641861 2017-10-16 Monday NULL Y N
641861 2017-10-17 Tuesday NULL Y N
641861 2017-10-18 Wednesday 2017-10-18 Y Y
641861 2017-10-19 Thursday 2017-10-19 Y Y
CURRENT RESULTS
SIS_NUMBER FROM_DATE TO_DATE
641861 2017-10-04 2017-10-06
641861 2017-10-09 2017-10-11
641861 2017-10-18 2017-10-19
DESIRED RESULTS
SIS_NUMBER FROM_DATE TO_DATE
641861 2017-10-04 2017-10-11
641861 2017-10-18 2017-10-19
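The current query subtracts a row number from ABS_DATE, so any calendar gap, including a weekend, starts a new group. One possible approach is to number the school days first and subtract the absence row number from that position instead of from the date. The sketch below shows the idea in Python rather than T-SQL, only so the logic is easy to run. It assumes weekend rows are already filtered out as in the sample data, and that days with SCHOOL_DAY = N should not break a run. The names `rows` and `absence_ranges` are made up for illustration.

```python
from datetime import date
from itertools import groupby

# (calendar_date, school_day, absent_day) for student 641861, copied from the sample data.
rows = [
    (date(2017, 10, 3), "Y", "N"),
    (date(2017, 10, 4), "Y", "Y"),
    (date(2017, 10, 5), "Y", "Y"),
    (date(2017, 10, 6), "Y", "Y"),
    (date(2017, 10, 9), "Y", "Y"),
    (date(2017, 10, 10), "Y", "Y"),
    (date(2017, 10, 11), "Y", "Y"),
    (date(2017, 10, 12), "N", "N"),
    (date(2017, 10, 13), "N", "N"),
    (date(2017, 10, 16), "Y", "N"),
    (date(2017, 10, 17), "Y", "N"),
    (date(2017, 10, 18), "Y", "Y"),
    (date(2017, 10, 19), "Y", "Y"),
]


def absence_ranges(rows):
    """Return (first_day, last_day) for each run of absences on consecutive school days."""
    # Number the school days 1, 2, 3, ... so weekends and non-school days leave no gap.
    school_days = sorted(d for d, school, _ in rows if school == "Y")
    position = {d: i for i, d in enumerate(school_days, start=1)}
    absences = sorted(d for d, school, absent in rows if school == "Y" and absent == "Y")
    # Within one run, (school-day position - absence row number) stays constant.
    keyed = [(position[d] - n, d) for n, d in enumerate(absences, start=1)]
    ranges = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        ranges.append((days[0], days[-1]))
    return ranges


ranges = absence_ranges(rows)
```

With the sample rows this yields the two desired ranges, 2017-10-04 to 2017-10-11 and 2017-10-18 to 2017-10-19. In SQL the same idea would probably mean computing a ROW_NUMBER() over the school-day calendar and using the difference of the two row numbers as the group key.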

Related

How to break datetime in 12 hour chunks and use it for aggregation in Presto SQL?

I have been trying to break a datetime column into 12-hour chunks in Presto SQL, but without success.
Raw data table:
datetime Login
2022-05-08 07:10:00.000 1234
2022-05-09 23:20:00.000 5678
2022-05-09 06:20:00.000 5674
2022-05-08 09:20:00.000 8971
The output table should look like the one below. I need the count of logins in 12-hour chunks: the first chunk runs from 00:00:00.000 to 11:59:59.999 and the next from 12:00:00.000 to 23:59:59.999.
Output:
datetime count
2022-05-08 00:00:00.000 2
2022-05-08 12:00:00.000 0
2022-05-09 00:00:00.000 1
2022-05-09 12:00:00.000 1
This should work:
Extract the hour from the timestamp, then integer-divide it by 12. That gives 0 for times up to 11:59 and 1 for times up to 23:59. Multiply that back by 12 to get 0 or 12.
Then use DATE_ADD() with unit 'HOUR' to add that integer to the row's timestamp truncated to the day.
SELECT
DATE_ADD('HOUR',(HOUR(ts) / 12) * 12, TRUNC(ts,'DAY')) AS halfday
, SUM(login) AS count_login
FROM indata
GROUP BY
halfday
;
-- out halfday | count_login
-- out ---------------------+-------------
-- out 2022-05-08 00:00:00 | 15879
-- out 2022-05-08 12:00:00 | 5678
This query worked for me. It uses date_trunc('DAY', ts) instead of TRUNC(ts, 'DAY').
SELECT
DATE_ADD('HOUR',(HOUR(ts) / 12) * 12, date_trunc('DAY',ts)) AS halfday
, SUM(login) AS count_login
FROM indata
GROUP BY
halfday
;
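Note that the queries above sum the Login values and return only buckets that contain rows, while the question asks for a count per bucket, including buckets with 0 logins. As a rough illustration of that intended result, here is a small Python sketch of the same flooring arithmetic. The names `logins`, `half_day` and `count_by_half_day` are made up for the example, and it assumes buckets should be filled between the earliest and latest bucket seen.

```python
from collections import Counter
from datetime import datetime, timedelta

# (datetime, Login) rows copied from the question.
logins = [
    (datetime(2022, 5, 8, 7, 10), 1234),
    (datetime(2022, 5, 9, 23, 20), 5678),
    (datetime(2022, 5, 9, 6, 20), 5674),
    (datetime(2022, 5, 8, 9, 20), 8971),
]


def half_day(ts):
    """Floor a timestamp to 00:00 or 12:00 of the same day."""
    return ts.replace(hour=(ts.hour // 12) * 12, minute=0, second=0, microsecond=0)


def count_by_half_day(logins):
    """Count rows per 12-hour bucket, reporting 0 for empty buckets in between."""
    counts = Counter(half_day(ts) for ts, _ in logins)
    if not counts:
        return []
    bucket, last = min(counts), max(counts)
    result = []
    while bucket <= last:
        # Counter returns 0 for buckets that never appeared.
        result.append((bucket, counts[bucket]))
        bucket += timedelta(hours=12)
    return result


half_day_counts = count_by_half_day(logins)
```

In SQL, the zero rows would typically need a generated series of buckets to left join against; the grouping expression itself stays the same.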

aligning tables with different dates

I have two tables, called tblDaily and tblWeekly.
tblDaily contains daily data, and tblWeekly contains data that is stored every Friday.
Joining the daily table to the weekly table is easy when the date in the daily data is a Friday.
My question is: what is the best way to join when the date is not a Friday? For example, given the date 2018-05-09 (Wednesday), I would like to join it to the previous Friday (2018-05-04). What is the optimal way of doing this?
I read about calendar tables. Would that be the correct way to go? I'm not sure how that would work in this case.
tblDaily
date val
2018-04-30 2 'mon
2018-05-01 3 'tues
2018-05-02 3 'wed
2018-05-03 3 'thurs
2018-05-04 3 'fri
2018-05-07 2 'mon
2018-05-08 3 'tues
2018-05-09 3 'wed
2018-05-10 3 'thurs
2018-05-11 3 'fri
2018-05-14 3 'mon
tblWeekly
date val
2018-05-04 2 'fri
2018-05-11 3 'fri
This might work:
SELECT
[dailydate] = D.[date],
[dailyval] = D.[val],
[weeklydate] = W.[date],
[weeklyval] = W.[val]
FROM
[tblDaily] AS D
OUTER APPLY (SELECT TOP (1) _W.*
FROM [tblWeekly] AS _W
WHERE _W.[date] <= D.[date]
ORDER BY _W.[date] DESC) AS W;
This query produces the following results:
dailydate dailyval weeklydate weeklyval
2018-04-30 2 NULL NULL
2018-05-01 3 NULL NULL
2018-05-02 3 NULL NULL
2018-05-03 3 NULL NULL
2018-05-04 3 2018-05-04 2
2018-05-07 2 2018-05-04 2
2018-05-08 3 2018-05-04 2
2018-05-09 3 2018-05-04 2
2018-05-10 3 2018-05-04 2
2018-05-11 3 2018-05-11 3
2018-05-14 3 2018-05-11 3
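The OUTER APPLY above is an "as of" lookup: for each daily date, take the latest weekly row whose date is not after it. For anyone who wants to check the expected matches outside the database, the same lookup can be sketched in Python with a binary search. The function name `latest_weekly_on_or_before` is made up for this example, and it assumes the weekly rows are sorted by date.

```python
from bisect import bisect_right
from datetime import date

# (date, val) rows from tblWeekly, sorted by date.
weekly = [(date(2018, 5, 4), 2), (date(2018, 5, 11), 3)]


def latest_weekly_on_or_before(day, weekly):
    """Return the weekly row with the greatest date <= day, or None if there is none."""
    dates = [d for d, _ in weekly]
    i = bisect_right(dates, day)  # number of weekly dates that are <= day
    return weekly[i - 1] if i else None
```

For example, `latest_weekly_on_or_before(date(2018, 5, 9), weekly)` returns the 2018-05-04 row, and a date before the first Friday returns None, matching the NULL rows in the result above.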
Try something like this:
select * from tblDaily a join tblWeekly b on a.date1= dateadd(day,-5,b.date2)
Try this simple join:
select *
from tblDaily [d]
--join each daily row to the most recent Friday on or before its date
--(a Friday maps to itself, so it is matched only once);
--assumes SET DATEFIRST 7 (the us_english default), where datepart(weekday, ...) returns 6 for Friday
left join tblWeekly [w] on
[w].[date] = dateadd(day, -((datepart(weekday, [d].[date]) + 1) % 7), [d].[date])
Here is SQL fiddle.
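The weekday arithmetic behind this kind of join is easy to get wrong by one day, so it can help to check it in isolation. A minimal Python sketch follows; the helper name `friday_on_or_before` is made up here. Python's `date.weekday()` numbers Monday as 0, which differs from T-SQL's `datepart(weekday, ...)`, so the constants are not the same as in the query.

```python
from datetime import date, timedelta


def friday_on_or_before(day):
    """Return the most recent Friday that is not after `day`."""
    # date.weekday() numbers Monday as 0, so Friday is 4.
    days_since_friday = (day.weekday() - 4) % 7
    return day - timedelta(days=days_since_friday)
```

With this, 2018-05-09 (Wednesday) maps to 2018-05-04, and a Friday maps to itself.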

Compare values for consecutive dates of same month

I have a table
ID Value Date
1 10 2017-10-02 02:50:04.480
2 20 2017-10-01 07:28:53.593
3 30 2017-09-30 23:59:59.000
4 40 2017-09-30 23:59:59.000
5 50 2017-09-30 02:36:07.520
I compare each Value with the Value from the previous date, but I don't want to compare the first day of the current month with the last day of the previous month. In this table, for example, I don't need a result comparing 2017-10-01 07:28:53.593 with 2017-09-30 23:59:59.000. How can this be done?
Result table for this example:
ID Value Date Diff
1 10 2017-10-02 02:50:04.480 10
2 20 2017-10-01 07:28:53.593 NULL
3 30 2017-09-30 23:59:59.000 10
4 40 2017-09-29 23:59:59.000 10
5 50 2017-09-28 02:36:07.520 NULL
You can use this.
SELECT * ,
LEAD(Value) OVER( PARTITION BY DATEPART(YEAR,[Date]), DATEPART(MONTH,[Date]) ORDER BY ID ) - Value AS Diff
FROM MyTable
ORDER BY ID
you can use a query like below
select *,
diff=LEAD(Value) OVER( PARTITION BY Month(Date),Year(Date) ORDER BY Date desc)-Value
from t
order by id asc
see working demo
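Both answers rely on the same idea: partition the rows by year and month, so LEAD never looks across a month boundary. To make that behaviour concrete, here is a small Python sketch that mimics the first query (ordering by ID inside each month). The name `diff_within_month` is made up for the example, and the rows are copied from the question's input table.

```python
from collections import defaultdict
from datetime import datetime

# (ID, Value, Date) rows from the question's input table.
rows = [
    (1, 10, datetime(2017, 10, 2, 2, 50, 4, 480000)),
    (2, 20, datetime(2017, 10, 1, 7, 28, 53, 593000)),
    (3, 30, datetime(2017, 9, 30, 23, 59, 59)),
    (4, 40, datetime(2017, 9, 30, 23, 59, 59)),
    (5, 50, datetime(2017, 9, 30, 2, 36, 7, 520000)),
]


def diff_within_month(rows):
    """Map each ID to (next row's Value - this Value) within the same year and month."""
    partitions = defaultdict(list)
    for row in sorted(rows):  # tuples sort by ID first
        partitions[(row[2].year, row[2].month)].append(row)
    diffs = {}
    for part in partitions.values():
        # Pair each row with the following row in its partition; the last row gets None.
        for current, following in zip(part, part[1:] + [None]):
            diffs[current[0]] = None if following is None else following[1] - current[1]
    return diffs


diffs = diff_within_month(rows)
```

The result is 10 for IDs 1, 3 and 4 and None for IDs 2 and 5, which matches the Diff column in the question.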

How to select periods of time with empty data?

I want to find out all periods with empty data, given the following table my_table:
id day
29 2017-06-05
26 2017-06-05
30 2017-06-06
30 2017-06-06
21 2017-06-06
21 2017-07-01
29 2017-07-01
30 2017-07-20
The answer would be:
Empty_start Empty_end
2017-06-07 2017-06-30
2017-07-02 2017-07-19
It's important that the number of days in each month is taken into account. For example, in the first row the answer 2017-06-31 would be incorrect, because June has only 30 days.
How can I write this query in Hive?
You can use lag() or lead():
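-- one row per distinct day, paired with the next distinct day;
-- a gap exists wherever the next day is not simply day + 1
select date_add(day, 1) as empty_start, date_add(next_day, -1) as empty_end
from (select day,
             lead(day) over (order by day) as next_day
      from my_table
      group by day
     ) t
where next_day <> date_add(day, 1);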
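The logic of that query can be checked against the sample data with a short Python sketch: take the distinct days in order, pair each with the next one, and report a gap wherever they are more than one day apart. The names `days` and `empty_periods` are made up for this example. Date arithmetic handles month lengths, so the first gap ends on 2017-06-30.

```python
from datetime import date, timedelta

# The "day" column from my_table, duplicates included.
days = [
    date(2017, 6, 5), date(2017, 6, 5),
    date(2017, 6, 6), date(2017, 6, 6), date(2017, 6, 6),
    date(2017, 7, 1), date(2017, 7, 1),
    date(2017, 7, 20),
]


def empty_periods(days):
    """Return (empty_start, empty_end) for every gap between consecutive distinct days."""
    distinct = sorted(set(days))
    one_day = timedelta(days=1)
    return [
        (day + one_day, next_day - one_day)
        for day, next_day in zip(distinct, distinct[1:])
        if next_day - day > one_day
    ]


gaps = empty_periods(days)
```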

Computation of period Start date

I have a table that hold the start date and the end date of a financial period.
CHARGE_PERIOD_ID START_DATE END_DATE
13 2013-03-31 00:00:00.000 2013-04-27 00:00:00.000
14 2013-04-28 00:00:00.000 2013-05-25 00:00:00.000
15 2013-05-26 00:00:00.000 2013-06-29 00:00:00.000
16 2013-06-30 00:00:00.000 2013-07-27 00:00:00.000
17 2013-07-28 00:00:00.000 2013-08-24 00:00:00.000
18 2013-08-25 00:00:00.000 2013-09-28 00:00:00.000
19 2013-09-29 00:00:00.000 2013-10-26 00:00:00.000
20 2013-10-27 00:00:00.000 2013-11-23 00:00:00.000
21 2013-11-24 00:00:00.000 2013-12-28 00:00:00.000
22 2013-12-29 00:00:00.000 2014-01-25 00:00:00.000
23 2014-01-26 00:00:00.000 2014-02-22 00:00:00.000
24 2014-02-23 00:00:00.000 2014-03-29 00:00:00.000
The user of a report wants the current financial year split into 12 periods and wants to feed two parameters into the report, a year and a period number, which will go into my SQL. So something like @year=2014 @period=1 will be received. I have to write some SQL to go to this table and set a period start date of 31/03/2014 and a period end date of 27/04/2014.
So in pseudocode:
Look up period 1 for 2014 and return a period start date of 31/03/2014 and a period end date of 27/04/2014.
@PERIOD_START_DATE = select the first period that starts in March for the given year. All financial periods start in March.
@PERIOD_END_DATE = select the corresponding END_DATE from the table.
The question is how to begin coding this, or what my design approach should be. Should I create a function that calculates this, or should I use a CTE and add a column that holds the period number in the way they want?
Thinking about it more, I think I need a mapping table. So the real question is: can I do this without a mapping table?
DECLARE @Year INT
DECLARE @Period INT
SET @Year = 2013
SET @Period = 1
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY
CASE WHEN MONTH([START_DATE])<3 THEN YEAR([START_DATE]) -1 ELSE YEAR([START_DATE]) END
ORDER BY
CASE WHEN MONTH([START_DATE])<3 THEN YEAR([START_DATE]) - 1 ELSE YEAR([START_DATE]) END
,CASE WHEN MONTH([START_DATE])<3 THEN MONTH([START_DATE]) + 12 ELSE MONTH([START_DATE]) END
) AS RN
FROM Periods
)
SELECT * FROM CTE
WHERE RN = @Period
AND CASE WHEN MONTH([START_DATE])<3 THEN YEAR([START_DATE]) -1 ELSE YEAR([START_DATE]) END = @Year
SQLFiddle DEMO
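The CASE expressions in the query treat periods that start in January or February as months 13 and 14 of the previous financial year, then number the periods within each year. To see that numbering on the sample rows, here is a small Python sketch of the same rule. The names `periods`, `fiscal_key` and `find_period` are made up for the example, and it assumes, as the query does, that a financial year begins with the period starting in March.

```python
from datetime import date

# (CHARGE_PERIOD_ID, START_DATE, END_DATE) rows copied from the question.
periods = [
    (13, date(2013, 3, 31), date(2013, 4, 27)),
    (14, date(2013, 4, 28), date(2013, 5, 25)),
    (15, date(2013, 5, 26), date(2013, 6, 29)),
    (16, date(2013, 6, 30), date(2013, 7, 27)),
    (17, date(2013, 7, 28), date(2013, 8, 24)),
    (18, date(2013, 8, 25), date(2013, 9, 28)),
    (19, date(2013, 9, 29), date(2013, 10, 26)),
    (20, date(2013, 10, 27), date(2013, 11, 23)),
    (21, date(2013, 11, 24), date(2013, 12, 28)),
    (22, date(2013, 12, 29), date(2014, 1, 25)),
    (23, date(2014, 1, 26), date(2014, 2, 22)),
    (24, date(2014, 2, 23), date(2014, 3, 29)),
]


def fiscal_key(start):
    """Return (financial year, month order); Jan and Feb count as months 13 and 14 of the prior year."""
    if start.month < 3:
        return (start.year - 1, start.month + 12)
    return (start.year, start.month)


def find_period(periods, year, period):
    """Return the row that is the period-th period of the given financial year, or None."""
    in_year = sorted(
        (p for p in periods if fiscal_key(p[1])[0] == year),
        key=lambda p: fiscal_key(p[1]),
    )
    return in_year[period - 1] if 1 <= period <= len(in_year) else None
```

With the sample rows, `find_period(periods, 2013, 1)` returns period 13 (2013-03-31 to 2013-04-27) and `find_period(periods, 2013, 12)` returns period 24.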