selecting a date n days ago excluding weekends - sql

I have a table with daily dates starting from 31st December 1999 up to 31st December 2050, excluding weekends.
Say I am given a particular date; for this example let's use 2019-03-14. I want to pick the date that was 30 days earlier (the number of days needs to be flexible, as it won't always be 30), ignoring weekends, which in this case would be 2019-02-01.
How can I do this?
I wrote the query below, and it does list the 30 days up to and including the specified date.
select top 30 Date
from DateDimension
where IsWeekend = 0 and Date <= '2019-03-14'
order by Date desc
So I thought I could use the query below to get the correct answer of 2019-02-01
;with ds as
(
select top 30 Date
from DateDimension
where IsWeekend = 0 and Date <= '2019-03-14'
)
select min(Date) from ds
However, this doesn't work: it returns the first date in my table, 1999-12-31. For reference, these are the 30 dates listed by the first query:
2019-03-14
2019-03-13
2019-03-12
2019-03-11
2019-03-08
2019-03-07
2019-03-06
2019-03-05
2019-03-04
2019-03-01
2019-02-28
2019-02-27
2019-02-26
2019-02-25
2019-02-22
2019-02-21
2019-02-20
2019-02-19
2019-02-18
2019-02-15
2019-02-14
2019-02-13
2019-02-12
2019-02-11
2019-02-08
2019-02-07
2019-02-06
2019-02-05
2019-02-04
2019-02-01

TOP is meaningless without an ORDER BY (the rows it returns are arbitrary), so you could do something like:
;with ds as
(
select top 30 Date
from DateDimension
where IsWeekend = 0 and Date <= '2019-03-14'
order by Date DESC
)
select min(Date) from ds;
Even better would be to use the ANSI syntax instead of TOP. To get the 30th row, skip the first 29 rows and fetch the next one (in general, OFFSET n - 1):
select Date
from DateDimension
where IsWeekend = 0 and Date <= '2019-03-14'
order by Date DESC
OFFSET 29 ROWS FETCH NEXT 1 ROW ONLY;
DISCLAIMER - code not tested since you did not provide DDL and sample data
HTH
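As a rough, self-contained illustration of the offset idea, here is a sketch that uses Python's built-in sqlite3 module rather than SQL Server, so it uses LIMIT ... OFFSET instead of TOP or OFFSET ... FETCH. The helper name nth_previous_weekday, the date range loaded into the table, and the way IsWeekend is populated are assumptions made for the demo, not part of the original schema.

```python
import sqlite3
from datetime import date, timedelta


def nth_previous_weekday(conn, anchor, n):
    """Return the n-th weekday counting back from anchor (anchor itself is row 1)."""
    row = conn.execute(
        "SELECT Date FROM DateDimension "
        "WHERE IsWeekend = 0 AND Date <= ? "
        "ORDER BY Date DESC "
        "LIMIT 1 OFFSET ?",
        (anchor, n - 1),  # skip n - 1 rows, so the row returned is the n-th
    ).fetchone()
    return row[0] if row else None


# Demo table (assumed layout): one row per calendar day with a weekend flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DateDimension (Date TEXT PRIMARY KEY, IsWeekend INTEGER)")
day = date(2019, 1, 1)
while day <= date(2019, 3, 31):
    conn.execute(
        "INSERT INTO DateDimension VALUES (?, ?)",
        (day.isoformat(), int(day.weekday() >= 5)),  # Saturday=5, Sunday=6
    )
    day += timedelta(days=1)

print(nth_previous_weekday(conn, "2019-03-14", 30))  # 2019-02-01
```

Passing n as a parameter keeps the number of days flexible, which is what the question asks for.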

Related

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format - event_date can never be higher than create_date.
I'd need to calculate on a rolling n-day basis (let's say 3) the sum of units where the create_date and event_date were within the same 3-day window. The data is illustrative but each event_date can have over 500+ different create_dates associated with it and the number isn't constant. There is a possibility of event_dates missing.
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
event_date create_date rowid units
2022-02-01 2022-01-20 1 100
2022-02-01 2022-02-01 2 100
2022-02-02 2022-01-21 3 100
2022-02-02 2022-01-23 4 100
2022-02-02 2022-01-31 5 100
2022-02-02 2022-02-02 6 100
2022-02-03 2022-01-30 7 100
2022-02-03 2022-02-01 8 100
2022-02-03 2022-02-03 9 100
2022-02-05 2022-02-01 10 100
2022-02-05 2022-02-03 11 100
The output I'd need to get to is below (I added in brackets the rows I'd need to include in the calculation for each date, but my result would only need to include the numerical sum). I tried calculating using either date, but neither returned the results I needed.
date units
2022-02-01 100 (Row 2)
2022-02-02 300 (Row 2,5,6)
2022-02-03 400 (Row 2,6,8,9)
2022-02-04 200 (Row 6,9)
2022-02-05 200 (Row 9,11)
In Python I solved the above with a function that loops over the dates and filters a dataframe for each one, but I am struggling to do the same in SQL.
Thank you!
Consider below approach
with events_dates as (
select date from (
select min(event_date) min_date, max(event_date) max_date
from your_table
), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg('' || rowid) rows_included
from events_dates
left join your_table
on create_date between date - 2 and date
and event_date between date - 2 and date
group by date
If applied to the sample data in your question, the output has one row per date, with the summed units and the list of rows included.
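To make the join condition easier to check by hand, here is a minimal Python sketch of the same rule: for each calendar day between the first and last event_date, sum the units of rows whose event_date and create_date both fall in the window ending on that day. The function name rolling_units and the tuple layout are assumptions for illustration. Unlike the left join, which yields NULL for a day with no matching rows, this sketch yields 0.

```python
from datetime import date, timedelta

# Sample rows from the question: (event_date, create_date, units), rowid 1..11.
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 100),
    (date(2022, 2, 1), date(2022, 2, 1), 100),
    (date(2022, 2, 2), date(2022, 1, 21), 100),
    (date(2022, 2, 2), date(2022, 1, 23), 100),
    (date(2022, 2, 2), date(2022, 1, 31), 100),
    (date(2022, 2, 2), date(2022, 2, 2), 100),
    (date(2022, 2, 3), date(2022, 1, 30), 100),
    (date(2022, 2, 3), date(2022, 2, 1), 100),
    (date(2022, 2, 3), date(2022, 2, 3), 100),
    (date(2022, 2, 5), date(2022, 2, 1), 100),
    (date(2022, 2, 5), date(2022, 2, 3), 100),
]


def rolling_units(rows, window_days=3):
    """Sum units per day where both dates lie in the window ending on that day."""
    first = min(event for event, _, _ in rows)
    last = max(event for event, _, _ in rows)
    totals = {}
    day = first
    while day <= last:  # walk every calendar day, including days with no events
        start = day - timedelta(days=window_days - 1)
        totals[day] = sum(
            units
            for event, create, units in rows
            if start <= event <= day and start <= create <= day
        )
        day += timedelta(days=1)
    return totals


for day, units in rolling_units(rows).items():
    print(day, units)
```

Walking every calendar day plays the role of generate_date_array in the query: 2022-02-04 gets a row even though no event falls on it.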

Rolling Sum for Last 12 Months in SQL

I'm trying to get the rolling sum for the past 12 months (Oct 2019-Sept 2020, etc.). So far, I've figured out how to get the current year total (which I also want); however, I'm stuck on a true 12-month rolling sum.
SELECT DATEADD(MONTH, DATEDIFF(Month, 0, ENTRY_DATE), 0) AS Payout_Month, SUM(PRINCIPAL_AMT) Payout_amt,
SUM(SUM(PRINCIPAL_AMT)) OVER (PARTITION BY YEAR(ENTRY_DATE) ORDER BY MIN(ENTRY_DATE)) as YearRollingSum,
SUM(SUM(PRINCIPAL_AMT)) OVER (PARTITION BY Year(ENTRY_DATE)
ORDER BY MIN(ENTRY_DATE)
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
) AS TwelveMonthRollingSum
FROM ACCOUNTHISTORY
WHERE LEFT(TOKEN_STRING, 4) LIKE '%Py%'
AND FOCUS_TELLER_ID = 6056
AND PRINCIPAL_AMT > 0 AND PRINCIPAL_AMT < 25
AND ENTRY_DATE >= '01/01/2019'
GROUP BY DATEADD(MONTH, DATEDIFF(Month, 0, ENTRY_DATE), 0), YEAR(ENTRY_DATE)
Order BY DATEADD(MONTH, DATEDIFF(Month, 0, ENTRY_DATE), 0)
here's what my current output looks like
Payout_Month Payout_amt YearRollingSum TwelveMonthRollingSum
2019-01-01 00:00:00.000 5696.50 5696.50 NULL
2019-02-01 00:00:00.000 11205.60 16902.10 5696.50
2019-03-01 00:00:00.000 23341.50 40243.60 16902.10
2019-04-01 00:00:00.000 25592.80 65836.40 40243.60
2019-05-01 00:00:00.000 28148.30 93984.70 65836.40
2019-06-01 00:00:00.000 27190.90 121175.60 93984.70
2019-07-01 00:00:00.000 25079.80 146255.40 121175.60
2019-08-01 00:00:00.000 30206.90 176462.30 146255.40
2019-09-01 00:00:00.000 28000.80 204463.10 176462.30
2019-10-01 00:00:00.000 29076.60 233539.70 204463.10
2019-11-01 00:00:00.000 29001.30 262541.00 233539.70
2019-12-01 00:00:00.000 28366.00 290907.00 262541.00
2020-01-01 00:00:00.000 32062.40 32062.40 NULL
2020-02-01 00:00:00.000 28526.70 60589.10 32062.40
2020-03-01 00:00:00.000 29056.50 89645.60 60589.10
2020-04-01 00:00:00.000 28016.00 117661.60 89645.60
2020-05-01 00:00:00.000 25173.30 142834.90 117661.60
2020-06-01 00:00:00.000 27646.10 170481.00 142834.90
2020-07-01 00:00:00.000 36083.70 206564.70 170481.00
2020-08-01 00:00:00.000 34872.20 241436.90 206564.70
2020-09-01 00:00:00.000 35727.10 277164.00 241436.90
2020-10-01 00:00:00.000 34030.80 311194.80 277164.00
As you can see, the last column resets at the beginning of the year. Any ideas?
Basically, you want to remove the partition by clause from the rolling 12 month sum. I would also suggest a few optimizations to the query:
select
x.payout_month,
sum(ah.principal_amt) payout_amt,
sum(sum(ah.principal_amt)) over (
partition by year(x.payout_month)
order by x.payout_month
) as yearrollingsum,
sum(sum(ah.principal_amt)) over (
order by x.payout_month
rows between 12 preceding and 1 preceding
) as twelvemonthrollingsum
from accounthistory ah
cross apply (values (datefromparts(year(ah.entry_date), month(ah.entry_date), 1))) x(payout_month)
where
left(ah.token_string, 4) like '%Py%'
and ah.focus_teller_id = 6056
and ah.principal_amt > 0 and ah.principal_amt < 25
and ah.entry_date >= '20190101'
group by x.payout_month
order by x.payout_month
The main change is that the payout_month is computed only once, in a lateral join, using datefromparts(). You can then use it all over the query, and consistently in the order by clauses of the window functions.
Note that your strategy will fail to produce proper results if you ever have a month without any sale: the rows clause of the window function counts rows rather than months, so the frame will reach back past the intended 12-month window, which is not what you want. If that's something that may happen, then an alternative is a subquery, or another lateral join.
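The missing-month caveat can be seen with a small Python sketch. It compares a row-based frame, which mimics rows between 12 preceding and 1 preceding, with a calendar-based frame that only accepts the 12 months before the current one. The helper names and the flat amount of 10 per month are assumptions for the demo; June 2019 is deliberately left out.

```python
def month_index(year, month):
    """Map a (year, month) pair to a single increasing integer."""
    return year * 12 + (month - 1)


def rows_frame_sum(monthly, key, n=12):
    """Sum the n rows before key: counts rows, not calendar months."""
    keys = sorted(monthly)
    pos = keys.index(key)
    return sum(monthly[k] for k in keys[max(0, pos - n):pos])


def calendar_frame_sum(monthly, key, n=12):
    """Sum only rows whose month lies in the n calendar months before key."""
    target = month_index(*key)
    return sum(
        amount
        for k, amount in monthly.items()
        if target - n <= month_index(*k) < target
    )


# Monthly totals for 2019-01 .. 2020-03, with 2019-06 missing (no sales).
monthly = {}
for idx in range(month_index(2019, 1), month_index(2020, 3) + 1):
    year, month = divmod(idx, 12)
    month += 1
    if (year, month) != (2019, 6):
        monthly[(year, month)] = 10

print(rows_frame_sum(monthly, (2020, 3)))      # 120: reaches back to 2019-02
print(calendar_frame_sum(monthly, (2020, 3)))  # 110: only 2019-03 .. 2020-02
```

The row-based frame silently pulls in an extra, older month to make up its 12 rows, which is the failure mode described above.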

Impala: Split single row into multiple rows based on Date and time

I want to split a single row into multiple rows based on time.
SrNo Employee StartDate EndDate
---------------------------------------------------------------------------
1 emp1 30/03/2020 09:00:00 31/03/2020 07:15:00
2 emp2 01/04/2020 09:00:00 02/04/2020 08:00:00
Expected output is below:
SrNo Employee StartDate EndDate
---------------------------------------------------------------------------
1 emp1 30/03/2020 09:00:00 30/03/2020 11:59:00
1 emp1 31/03/2020 00:00:00 31/03/2020 07:15:00
2 emp2 01/04/2020 09:00:00 01/04/2020 11:59:00
2 emp2 02/04/2020 00:00:00 02/04/2020 08:00:00
A day runs from 00:00 (midnight) to 00:00 the next day. When the EndDate falls after midnight, the row should be split in two: the first row ends at 30/03/2020 11:59:00 and the next row starts at 31/03/2020 00:00:00.
Please help me get this solved.
This would be a good spot for a recursive CTE, but unfortunately Hive does not support those. Here is another approach, which uses a derived table of numbers to split the periods:
select
t.SrNo,
t.Employee,
greatest(t.startDate, date_add(to_date(t.startDate), x.n)) startDate,
least(t.endDate, date_add(to_date(t.startDate), x.n + 1)) endDate
from mytable t
inner join (select 0 n union all select 1 union all select 2) x
on date_add(to_date(t.startDate), x.n) <= t.endDate
You can expand the subquery to handle more possible periods per row.
Also note that this generates half-open intervals, where the end of the preceding interval is equal to the start of the next one (while in your resultset there is a one-minute gap). The logic is that the interval is inclusive on its lower bound and exclusive on its upper bound (that way, you make sure not to leave any gap).
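The half-open splitting logic can be sketched outside SQL as well. The Python function below, whose name split_at_midnight is made up for this illustration, cuts an interval at every midnight it crosses, so each piece ends exactly where the next one starts. It is a sketch of the idea, not a translation of the query.

```python
from datetime import datetime, time, timedelta


def split_at_midnight(start, end):
    """Split [start, end) into half-open pieces that never cross midnight."""
    pieces = []
    current = start
    while current < end:
        next_midnight = datetime.combine(current.date() + timedelta(days=1), time.min)
        piece_end = min(end, next_midnight)  # same role as least() in the query
        pieces.append((current, piece_end))
        current = piece_end  # the next piece starts where this one ended
    return pieces


# First sample row from the question.
for piece_start, piece_end in split_at_midnight(
    datetime(2020, 3, 30, 9, 0), datetime(2020, 3, 31, 7, 15)
):
    print(piece_start, piece_end)
```

Because the loop runs until the end is reached, it handles any number of days, whereas the derived table of numbers in the query has to be sized by hand.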

Compare values for consecutive dates of same month

I have a table
ID Value Date
1 10 2017-10-02 02:50:04.480
2 20 2017-10-01 07:28:53.593
3 30 2017-09-30 23:59:59.000
4 40 2017-09-30 23:59:59.000
5 50 2017-09-30 02:36:07.520
I compare each Value with the value for the previous date. However, I don't want to compare the first day of the current month with the last day of the previous month. For this table, I don't need the comparison between 2017-10-01 07:28:53.593 and 2017-09-30 23:59:59.000. How can this be done?
Result table for this example:
ID Value Date Diff
1 10 2017-10-02 02:50:04.480 10
2 20 2017-10-01 07:28:53.593 NULL
3 30 2017-09-30 23:59:59.000 10
4 40 2017-09-29 23:59:59.000 10
5 50 2017-09-28 02:36:07.520 NULL
You can use this.
SELECT * ,
LEAD(Value) OVER( PARTITION BY DATEPART(YEAR,[Date]), DATEPART(MONTH,[Date]) ORDER BY ID ) - Value AS Diff
FROM MyTable
ORDER BY ID
you can use a query like below
select *,
diff=LEAD(Value) OVER( PARTITION BY Month(Date),Year(Date) ORDER BY Date desc)-Value
from t
order by id asc
see working demo
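For readers who want to trace what partitioning by year and month does to LEAD, here is a small Python sketch of the same rule. It uses the dates from the result table in the question; the function name month_bounded_diffs and the tuple layout are assumptions for illustration.

```python
from datetime import datetime

# (ID, Value, Date), using the dates shown in the question's result table.
rows = [
    (1, 10, datetime(2017, 10, 2, 2, 50, 4)),
    (2, 20, datetime(2017, 10, 1, 7, 28, 53)),
    (3, 30, datetime(2017, 9, 30, 23, 59, 59)),
    (4, 40, datetime(2017, 9, 29, 23, 59, 59)),
    (5, 50, datetime(2017, 9, 28, 2, 36, 7)),
]


def month_bounded_diffs(rows):
    """Diff against the next-older row, but only within the same year and month."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)  # newest first
    diffs = {}
    for current, older in zip(ordered, ordered[1:] + [None]):
        same_month = older is not None and (
            (older[2].year, older[2].month) == (current[2].year, current[2].month)
        )
        # Like LEAD over a month partition: None when the month has no older row.
        diffs[current[0]] = older[1] - current[1] if same_month else None
    return diffs


print(month_bounded_diffs(rows))  # {1: 10, 2: None, 3: 10, 4: 10, 5: None}
```

Rows that share exactly the same timestamp have no defined order here, just as in the SQL versions that order by Date, so a tie-breaker such as ID would be needed for such data.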

How to select periods of time with empty data?

I want to find out all periods with empty data, given the following table my_table:
id day
29 2017-06-05
26 2017-06-05
30 2017-06-06
30 2017-06-06
21 2017-06-06
21 2017-07-01
29 2017-07-01
30 2017-07-20
The answer would be:
Empty_start Empty_end
2017-06-07 2017-06-30
2017-07-02 2017-07-19
It's important that the number of days in each month is taken into account. For example, in the first row the answer 2017-06-31 would be incorrect.
How can I write this query in Hive?
You can use lag() or lead():
select date_add(day, 1) as empty_start, date_add(next_day, -1) as empty_end
from (select day,
lead(day) over (order by day) as next_day
from my_table
group by day
) t
where next_day <> date_add(day, 1);
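The same gap-finding idea, pairing each distinct day with the next one and keeping the pairs that are more than one day apart, can be sketched in Python. The function name empty_periods is an assumption; using real date arithmetic means month lengths are handled automatically, so the first gap ends on 2017-06-30.

```python
from datetime import date, timedelta

# The day column from the question, duplicates included.
days = [
    date(2017, 6, 5), date(2017, 6, 5),
    date(2017, 6, 6), date(2017, 6, 6), date(2017, 6, 6),
    date(2017, 7, 1), date(2017, 7, 1),
    date(2017, 7, 20),
]


def empty_periods(days):
    """Return (empty_start, empty_end) for every gap between consecutive days."""
    distinct = sorted(set(days))  # same role as GROUP BY day
    one_day = timedelta(days=1)
    return [
        (current + one_day, following - one_day)
        for current, following in zip(distinct, distinct[1:])  # like lead()
        if following - current > one_day
    ]


for empty_start, empty_end in empty_periods(days):
    print(empty_start, empty_end)
```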