Period and Quarter Sequence - sql

I'm trying to find a way to build a sequence for date periods and quarters (not sure if this is the correct term).
Basically, this will help people navigate dates by week, period, and quarter once I join this to our sales data. For example, if I just want the sales from last week, I could use WHERE WeekSequence = -1... Another example: a manager wants the sales data for the past quarter, so I could use WHERE QuarterSequence = -1... something like that.
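A minimal sketch of that intended usage, assuming the week table below is called WeekCalendar and the sales live in a hypothetical Sales(SaleDate, Amount) table (all of these names are placeholders, not my actual schema):

-- Placeholder names: WeekCalendar, Sales, SaleDate, Amount are illustrative only.
SELECT SUM(s.Amount) AS LastWeekSales
FROM Sales s
JOIN WeekCalendar w
  ON s.SaleDate BETWEEN w.WeekStartDate AND w.WeekEndDate
WHERE w.WeekSequence = -1;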
My current table:
WeekStartDate  WeekEndDate  CurrentWeek  Period  Quarter  WeekSequence
-------------  -----------  -----------  ------  -------  ------------
2020-08-03     2020-08-09   0            2       1        -5
2020-08-10     2020-08-16   0            2       1        -4
2020-08-17     2020-08-23   0            2       1        -3
2020-08-24     2020-08-30   0            2       1        -2
2020-08-31     2020-09-06   0            2       1        -1
2020-09-07     2020-09-13   1            3       1         0
2020-09-14     2020-09-20   0            3       1         1
2020-09-21     2020-09-27   0            3       1         2
2020-09-28     2020-10-04   0            3       1         3
2020-10-05     2020-10-11   0            4       2         4
2020-10-12     2020-10-18   0            4       2         5
What I want it to look like (highlighted):

If I understand correctly, just use window functions:
select t.*,
       (period -
        max(case when currentweek = 1 then period end) over ()
       ) as periodsequence,
       (quarter -
        max(case when currentweek = 1 then quarter end) over ()
       ) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
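For example, a minimal sketch of such a view, assuming the week table above is named WeekCalendar (a placeholder name):

-- WeekCalendar is a stand-in for the actual table name.
create view WeekCalendarWithSequences as
select t.*,
       (period -
        max(case when currentweek = 1 then period end) over ()
       ) as periodsequence,
       (quarter -
        max(case when currentweek = 1 then quarter end) over ()
       ) as quartersequence
from WeekCalendar t;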

Related

How to count consecutive days in a table where days are duplicated "PostgreSQL"

Hello, I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id | user_id | day | ground_id | created_at
---+---------+-----+-----------+--------------------
 1 |       1 |   1 |         1 | 2023-01-24 10:00:00
 2 |       1 |   2 |         1 | 2023-01-25 10:00:00
 3 |       1 |   3 |         1 | 2023-01-26 10:00:00
 4 |       1 |   4 |         1 | 2023-01-27 10:00:00
 5 |       1 |   5 |         1 | 2023-01-28 10:00:00
The closest I could get is with this query, which only works if the user has trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
      FROM logs
      WHERE created_at >= '2023-01-24 00:00:00'
        AND user_id = 1) x
GROUP BY grp
This query returns a count of 5 consecutive days, which is correct.
However, my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id | user_id | day | ground_id | created_at
---+---------+-----+-----------+--------------------
 1 |       1 |   1 |         1 | 2023-01-24 10:00:00
 2 |       1 |   2 |         1 | 2023-01-25 10:00:00
 3 |       1 |   3 |         1 | 2023-01-26 10:00:00
 4 |       1 |   3 |         2 | 2023-01-26 10:00:00
 5 |       1 |   4 |         1 | 2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect; I would expect 4, because the user has trained on the following days in a row (1, 2, 3, 4).
Thank you for reading.
Select only the distinct data of interest first:
SELECT min(created_at) start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (SELECT DISTINCT day, created_at
            FROM logs
            WHERE created_at >= '2023-01-24 00:00:00'
              AND user_id = 1) t
     ) x
GROUP BY grp
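Since the question asks for the highest count, one way (a sketch built on the same idea, not part of the answer above) is to keep only the longest streak:

-- Keep only the longest run of consecutive days (sketch based on the query above).
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
      FROM (SELECT DISTINCT day
            FROM logs
            WHERE created_at >= '2023-01-24 00:00:00'
              AND user_id = 1) t
     ) x
GROUP BY grp
ORDER BY days_in_row DESC
LIMIT 1;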

filter data based on month start and month end

Given a dataframe with a date column in this format:
Date Group
2020-05-18 1
2020-06-22 1
2019-07-11 1
2018-03-01 1
2021-01-21 2
2021-05-05 2
2021-09-11 2
And two strings:
Start = 2020-05 (indicating month start)
End = 2021-09 (indicating month end)
I want to filter out the data so that only the dates that fall within the start and end date are available in the dataframe.
Expected output:
Date Group
2020-05-18 1
2020-06-22 1
2021-01-21 2
2021-05-05 2
2021-09-11 2
import pandas as pd
import numpy as np

# Creating dummy data
d = {'dt': ['2020-05-18',
            '2020-06-22',
            '2019-07-11',
            '2018-03-01',
            '2021-01-21',
            '2021-05-05',
            '2021-09-11'],
     'group': [1, 1, 1, 1, 2, 2, 2]}
dt_df = pd.DataFrame(data=d)
dt_df

dt_df['dt'] = pd.to_datetime(dt_df['dt'])
dt_df
Initial input:
0 2020-05-18
1 2020-06-22
2 2019-07-11
3 2018-03-01
4 2021-01-21
5 2021-05-05
6 2021-09-11
Name: dt, dtype: datetime64[ns]
Start = '2020-05'
End = '2021-09'
Start = pd.to_datetime(Start)
End = pd.to_datetime(End)
End = End + pd.DateOffset(months=1)  # advance End to the start of the next month (month-unit timedelta64 is ambiguous and rejected by pandas)
Use loc to select only the dates between the Start and End timestamps.
dt_df.loc[(dt_df['dt'] - Start >= np.timedelta64(0,'D')) & (dt_df['dt'] - End <= np.timedelta64(0, 'D'))]
Output:
dt group
0 2020-05-18 1
1 2020-06-22 1
4 2021-01-21 2
5 2021-05-05 2
6 2021-09-11 2

How do I use a SQL window function to sum rows with a condition

Assume this is my table:
id  start_date  event_date  sales
----------------------------------
1   2020-09-09  2020-08-30  27.9
1   2020-09-09  2020-09-01  15
1   2020-09-09  2020-09-05  25
1   2020-09-09  2020-09-06  20.75
2   2020-09-09  2020-01-30  5
2   2020-09-09  2020-08-01  12
I'm trying to use a window function to sum the sales whose event_date falls within the 7 days before the start_date for each id, so the output I'm trying to reach looks like this...
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0 <---- this is not within 7 days of start_date
1 2020-09-09 2020-09-05 25 25 <---- this is within 7 days of start_date
1 2020-09-09 2020-09-06 20.75 40.75 <---- this is within 7 days of start_date
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 12
This is what I've tried so far, but the problem is that it seems to sum over the 7 days before each event_date rather than before the start_date.
SELECT
    id,
    start_date,
    event_date,
    sales,
    CASE WHEN event_date >= DATE_ADD(start_date, -7)
         THEN SUM(sales) OVER (PARTITION BY id ORDER BY event_date
                               RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW)
         ELSE 0 END AS sales_7_days
FROM
    sample_df
ORDER BY
    id,
    start_date,
    event_date
So the query above produces the result below (which I don't want, because the window sum is anchored to event_date rather than start_date):
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0
1 2020-09-09 2020-09-05 25 67.9
1 2020-09-09 2020-09-06 20.75 60.75
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 17
Does anybody have any tips here?
where I want to sum sales in event_date for 7 days prior to the start date for each id
Because the start date is constant for each id, this is a constant. You can calculate it as:
select s.*,
       sum(case when event_date <= start_date
                 and event_date >= start_date - interval 7 day
                then sales
           end) over (partition by id)
from sample_df s;
Your results suggest, though, that you really want a cumulative sum based on the event_date. That's fine, but a different question. The answer for that is to tweak the SQL:
select s.*,
       sum(case when event_date <= start_date
                 and event_date >= start_date - interval 7 day
                then sales
           end) over (partition by id order by event_date)
from sample_df s;
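If you want the 0 shown in the expected output rather than NULL for rows outside the 7-day window, one option (a small sketch, not part of the answer above) is to wrap the sum in COALESCE:

-- Same cumulative variant, with COALESCE so out-of-range rows show 0 instead of NULL.
select s.*,
       coalesce(sum(case when event_date <= start_date
                          and event_date >= start_date - interval 7 day
                         then sales
                    end) over (partition by id order by event_date), 0) as sales_7_days
from sample_df s;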

Get all dates for all date ranges in table using SQL Server

I have a table dbo.WorkSchedules(Id, From, To) where I store date ranges for work schedules. I want to create a view that contains every date for every row of WorkSchedules, so that I end up with one view holding all dates for all schedules.
On the web I only found solutions for a single row, i.e. two parameters, start and end. My issue is different: I have multiple rows, each with its own start and end of range.
Example:
WorkSchedules
Id | From | To
---+------------+-----------
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
Desired result
1 | 2018-01-01
2 | 2018-01-02
3 | 2018-01-03
4 | 2018-01-04
5 | 2018-01-05
6 | 2018-01-08
7 | 2018-01-09
8 | 2018-01-10
9 | 2018-01-11
10| 2018-01-12
If you are regularly dealing with "jobs" and "schedules" then I propose that you need a permanent calendar table (a table where each row is a unique date). You can create rows for dates dynamically but why do this many times when you can do it once and just re-use?
A calendar table, even of several decades, isn't "big" and when indexed they can be very fast as well. You can also store information about holidays and/or fiscal periods etc.
There are many scripts available to produce these tables, here's an answer with 2 scripts on this site: https://stackoverflow.com/a/5635628/2067753
Assuming you use the second (more comprehensive) script, then you can exclude weekends, or other conditions such as holidays, from query results.
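Purely as a rough illustration (the linked scripts are far more complete), a bare-bones calendar table for a fixed range could be generated along these lines; the table and column names here are assumptions, not the linked scripts:

-- Minimal sketch: a one-year calendar table (the linked scripts add holidays,
-- fiscal periods, KindOfDay, etc.). Table and column names are illustrative.
;WITH d AS (
    SELECT CAST('2018-01-01' AS date) AS [Date]
    UNION ALL
    SELECT DATEADD(day, 1, [Date]) FROM d WHERE [Date] < '2018-12-31'
)
SELECT [Date],
       YEAR([Date])              AS [Year],
       DATEPART(quarter, [Date]) AS [Quarter],
       MONTH([Date])             AS [Month],
       DATENAME(weekday, [Date]) AS [Weekday]
INTO dbo.Calendar
FROM d
OPTION (MAXRECURSION 366);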
Once you have a permanent Calendar table this style of query may be used:
CREATE TABLE WorkSchedules(
Id INTEGER NOT NULL PRIMARY KEY
,[From] DATE NOT NULL
,[To] DATE NOT NULL
);
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (1,'2018-01-01','2018-01-05');
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (2,'2018-01-12','2018-01-12');
with range as (
select min(ws.[From]) as dt_from, max(ws.[To]) dt_to
from WorkSchedules as ws
)
select c.*
from calendar as c
inner join range on c.date between range.dt_from and range.dt_to
where c.KindOfDay = 'BANKDAY'
order by c.date
and the result looks like this (note: "New Year's Day" has been excluded)
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- -------------
1 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
2 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
3 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
4 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
5 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
6 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
7 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
8 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
9 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Without the where clause the full range is:
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- ----------------
1 01.01.2018 00:00:00 2018 1 1 1 1 1 1 2018 1 1 HOLIDAY New Year's Day
2 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
3 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
4 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
5 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
6 06.01.2018 00:00:00 2018 1 1 1 6 6 6 2018 1 1 SATURDAY NULL
7 07.01.2018 00:00:00 2018 1 1 1 7 7 7 2018 1 1 SUNDAY NULL
8 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
9 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
10 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
11 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
12 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
and weekends and holidays may be excluded using the column KindOfDay
See this as a demonstration (with build of calendar table) here: http://rextester.com/CTSW63441
OK, I worked this out for you, assuming that you meant 2018-01-08 as the From date in the second row.
/*WorkSchedules
Id| From | To
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
*/
--DROP TABLE #WorkSchedules;
CREATE TABLE #WorkSchedules (
ID int,
[DateFrom] DATE,
[DateTo] DATE
)
INSERT INTO #WorkSchedules
SELECT 1, '2018-01-01', '2018-01-05'
UNION
SELECT 2, '2018-01-08', '2018-01-12'
;WITH CTEDATELIMITS AS (
    SELECT [DateFrom], [DateTo]
    FROM #WorkSchedules
)
,CTEDATES AS (
    SELECT [DateFrom] AS [DateResult]
    FROM CTEDATELIMITS
    UNION ALL
    SELECT DATEADD(Day, 1, [DateResult])
    FROM CTEDATES
    JOIN CTEDATELIMITS ON CTEDATES.[DateResult] >= CTEDATELIMITS.[DateFrom]
                      AND CTEDATES.[DateResult] < CTEDATELIMITS.[DateTo]
)
SELECT [DateResult] FROM CTEDATES
ORDER BY [DateResult]
You would use a recursive CTE:
with dates as (
      select from, to, from as date
      from WorkSchedules
      union all
      select from, to, dateadd(day, 1, date)
      from dates
      where date < to
     )
select row_number() over (order by date), date
from dates;
Note that from and to are reserved words in SQL. They are lousy names for identifiers. I have not escaped them because I assume they are not the actual names of the columns.
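If the columns really are named From and To, you would have to escape them, for example with square brackets in SQL Server; a sketch of the same CTE with escaped names:

-- Same recursive CTE, with the reserved words escaped (SQL Server bracket quoting).
with dates as (
      select [From], [To], [From] as dt
      from WorkSchedules
      union all
      select [From], [To], dateadd(day, 1, dt)
      from dates
      where dt < [To]
     )
select row_number() over (order by dt) as Id, dt as [Date]
from dates;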

find nonbreaking period with condition

There are per-day quotas for hotels in a table. How do I get, for each date, the number of consecutive days the hotel is available starting from that date?
q_id  q_hotel  q_date      q_value
 1    1        2013-02-01  1
 2    1        2013-02-02  1
 3    1        2013-02-03  1
 4    1        2013-02-04  0
 5    1        2013-02-05  2
 6    1        2013-02-06  3
 7    1        2013-02-07  3
 8    1        2013-02-08  2
 9    1        2013-02-09  0
10    1        2013-02-10  0
11    1        2013-02-11  1
12    1        2013-02-12  1
Wanted output:
q_hotel  q_date      days_available
1        2013-02-01  3
1        2013-02-02  2
1        2013-02-03  1
1        2013-02-04  0
1        2013-02-05  4
1        2013-02-06  3
1        2013-02-07  2
1        2013-02-08  1
1        2013-02-09  0
1        2013-02-10  0
1        2013-02-11  2
1        2013-02-12  1
For now I can only get the number of days when a zero-quota day exists after the date in question: I find the closest unavailable day and calculate the difference between the dates.
http://sqlfiddle.com/#!12/1a64c/14
select q_hotel
      ,q_date
      ,(select extract(day from (min(B.q_date) - A.q_date))
        from Table1 B
        where B.q_date > A.q_date
          and B.q_value = 0 and A.q_value <> 0)
from Table1 A
But there is a problem when there is no closing zero-quota date.
Here is a solution:
select
a.q_date
, a.q_hotel
, case
when
a.q_value = 0
then
0
else
(
select
extract
( day from
min ( b.q_date ) - a.q_date + interval '1 day'
)
from table1 b
where b.q_date >= a.q_date
and b.q_hotel = a.q_hotel
and not exists
(
select 1
from table1 c
where c.q_date = b.q_date + interval '1 day'
and b.q_hotel = a.q_hotel
and q_value <> 0
)
)
end as days_available
from table1 a
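As an alternative sketch (not from the answer above), the same result can be computed with window functions using a gaps-and-islands approach, assuming exactly one row per hotel per day with no missing dates:

-- Alternative sketch: gaps-and-islands with window functions.
-- grp is a running count of zero-quota days, so it stays constant within each
-- unbroken run of available days; within that run we count the days from the
-- current date to the end of the run.
select q_hotel,
       q_date,
       case when q_value = 0 then 0
            else count(*) over (partition by q_hotel, grp
                                order by q_date
                                rows between current row and unbounded following)
       end as days_available
from (select t.*,
             sum(case when q_value = 0 then 1 else 0 end)
               over (partition by q_hotel order by q_date) as grp
      from table1 t) s
order by q_hotel, q_date;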