How to match the closest date in SQL (Redshift)? - sql

For example, my table A is work_schedule:
Employee_id | Week_start | Work_schedule
A | 2021-01-03 | Day shift
A | 2021-01-10 | Day shift
A | 2021-01-17 | Night shift
B | 2020-12-27 | Day shift
B | 2021-01-03 | Day shift
Table B is employee_history:
Employee_id | Calendar_date | Tenure
A | 2020-12-20 | 0
A | 2020-12-21 | 1
A | --- | 2-30
A | 2021-01-19 | 31
A | 2021-01-20 | 32
B | 2020-12-15 | 0
B | 2020-12-16 | 1
B | ---
Employees can choose a work schedule 2 weeks ahead, and I want to fetch the tenure at the snapshot date (14 days before week_start). For employee A, the date 14 days earlier always matches a calendar_date. Employee B, however, started less than 2 weeks before the first week_start, so there is no exact match; in that case I want the calendar_date closest to the 2-week date.
The ideal output is:
Employee_id | Week_start | Work_schedule | Calendar_date (to calculate tenure) | Tenure (at 2 weeks ago)
A | 2021-01-03 | Day shift | 2020-12-20 | 0
A | 2021-01-10 | Day shift | 2020-12-27 | 7
A | 2021-01-17 | Night shift | 2021-01-03 | 14
B | 2020-12-27 | Day shift | 2020-12-15 | 0
B | 2021-01-03 | Day shift | 2020-12-20 | 5
To fetch the closest date for one record, I can use
order by abs(datediff(day, (week_start - 14), calendar_date)) asc
limit 1
For example, fetch ‘2020-12-15’ as the closest date to ‘2020-12-13’.
select employee_id, calendar_date, tenure
from employee_history h
where employee_id = 'B'
order by abs(datediff(day, ('2020-12-27'::date - 14), calendar_date)) asc
limit 1
But I have more than one employee in this situation. How can I get the closest calendar_date for all those who have no exact match at 2 weeks?
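One way to think about the per-employee version is "rank the history rows by distance to the target date within each (employee, week_start) pair and keep the first". In SQL that is usually written with ROW_NUMBER() partitioned by employee and week_start; the sketch below only models the same logic in Python so it can be checked against the sample rows. The `daily_history` helper and the assumption that tenure equals days since the first calendar_date are mine, since the question elides most of the history rows.

```python
from datetime import date, timedelta

# Hypothetical stand-in for employee_history: one row per day, with tenure
# ASSUMED to be the number of days since the employee's first calendar_date.
def daily_history(employee_id, first_day, days):
    return [(employee_id, first_day + timedelta(days=n), n) for n in range(days)]

employee_history = (
    daily_history("A", date(2020, 12, 20), 20)
    + daily_history("B", date(2020, 12, 15), 20)
)

work_schedule = [
    ("A", date(2021, 1, 3), "Day shift"),
    ("A", date(2021, 1, 10), "Day shift"),
    ("A", date(2021, 1, 17), "Night shift"),
    ("B", date(2020, 12, 27), "Day shift"),
    ("B", date(2021, 1, 3), "Day shift"),
]

def closest_snapshot(schedule, history, days_back=14):
    """For every schedule row, pick the history row nearest to week_start - days_back."""
    matched = []
    for employee_id, week_start, shift in schedule:
        target = week_start - timedelta(days=days_back)
        candidates = [row for row in history if row[0] == employee_id]
        # Smallest absolute gap wins; ties go to the earlier calendar_date.
        _, calendar_date, tenure = min(
            candidates, key=lambda row: (abs((row[1] - target).days), row[1])
        )
        matched.append((employee_id, week_start, shift, calendar_date, tenure))
    return matched

for row in closest_snapshot(work_schedule, employee_history):
    print(row)
```

An exact match has a gap of 0, so it always wins, and employees without one fall back to their nearest date; that means the same ordering covers both cases without a separate branch.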

Related

How to get last N week data in different year

I need to get the last 6 weeks of data from a table. Right now the logic I use is this:
WEEK([date column]) BETWEEN WEEK(NOW()) - 6 AND WEEK(NOW())
It runs as I want, but January is near and I realize that this query will not work as it is. When I run the query as of 15th January 2022, I only get data from 1st January to 15th January.
TGL | MINGGU_KE
2022-01-01 | 1
2022-01-02 | 2
2022-01-03 | 2
2022-01-04 | 2
2022-01-05 | 2
2022-01-06 | 2
2022-01-07 | 2
2022-01-08 | 2
2022-01-09 | 3
2022-01-10 | 3
2022-01-11 | 3
2022-01-12 | 3
2022-01-13 | 3
2022-01-14 | 3
2022-01-15 | 3
Can I get the last 6 weeks data including last year?
This is my dbfiddle: https://dbfiddle.uk/o9BeAFJF
You can round the dates to the first day of the week using ROUND, TRUNC or THIS_WEEK
WITH
SEARCH_WEEK (TGL) AS (
    VALUES date '2020-12-01'
    UNION ALL
    SELECT tgl + 1 DAY FROM SEARCH_WEEK WHERE tgl < CURRENT date
),
BASE_DATE (base_date) AS (
    VALUES date '2022-01-15'
),
OPTIONS (OPTION, OPTION_BASE_DATE) AS (
    SELECT OPTION, option_base_date FROM base_date CROSS JOIN LATERAL (
        VALUES
            ('ROUND D', ROUND(base_date, 'D')),
            ('ROUND IW', ROUND(base_date, 'IW')),
            ('ROUND W', ROUND(base_date, 'W')),
            ('ROUND WW', ROUND(base_date, 'WW')),
            ('TRUNC D', TRUNC(base_date, 'D')),
            ('TRUNC IW', TRUNC(base_date, 'IW')),
            ('TRUNC W', TRUNC(base_date, 'W')),
            ('TRUNC WW', TRUNC(base_date, 'WW')),
            ('THIS_WEEK', THIS_WEEK(base_date)),
            ('THIS_WEEK + 1 DAY', THIS_WEEK(base_date) + 1 DAY)
    ) a (OPTION, OPTION_BASE_DATE)
)
SELECT
    OPTION,
    MIN(TGL) BEGIN,
    max(tgl) END,
    dayname(MIN(TGL)) day_BEGIN,
    dayname(max(tgl)) day_end,
    days_between(max(tgl), min(tgl)) + 1 duration_in_days
FROM
    SEARCH_WEEK
    CROSS JOIN options
WHERE
    TGL BETWEEN option_base_date - 35 DAYS AND option_base_date + 6 DAYS
GROUP BY OPTION
OPTION | BEGIN | END | DAY_BEGIN | DAY_END | DURATION_IN_DAYS
ROUND D | 2021-12-12 | 2022-01-22 | Sunday | Saturday | 42
ROUND IW | 2021-12-13 | 2022-01-23 | Monday | Sunday | 42
ROUND W | 2021-12-11 | 2022-01-21 | Saturday | Friday | 42
ROUND WW | 2021-12-11 | 2022-01-21 | Saturday | Friday | 42
THIS_WEEK | 2021-12-05 | 2022-01-15 | Sunday | Saturday | 42
THIS_WEEK + 1 DAY | 2021-12-06 | 2022-01-16 | Monday | Sunday | 42
TRUNC D | 2021-12-05 | 2022-01-15 | Sunday | Saturday | 42
TRUNC IW | 2021-12-06 | 2022-01-16 | Monday | Sunday | 42
TRUNC W | 2021-12-11 | 2022-01-21 | Saturday | Friday | 42
TRUNC WW | 2021-12-11 | 2022-01-21 | Saturday | Friday | 42
You can use DATEADD to get the date six weeks before today and filter from there, like this:
Select * from tableName
where [dateColumn] between dateadd(WEEK,-6,getdate()) and getdate()
You can use DATEADD to get the last 6 weeks of data as follows:
Select * from [TableName] where [DateColumn] between
DATEADD(WEEK,-6,GETDATE()) and GETDATE();
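The reason the week-number comparison breaks is that the week number restarts in January, so WEEK(NOW()) - 6 no longer points at the previous year's weeks. Comparing the dates themselves sidesteps that. Below is a small Python sketch of the same idea, not tied to any SQL dialect; `all_days` is made-up daily data spanning the year boundary and `last_n_weeks` is a hypothetical helper name.

```python
from datetime import date, timedelta

def last_n_weeks(rows, today, weeks=6):
    """Keep dates in the inclusive window [today - weeks, today]."""
    start = today - timedelta(weeks=weeks)
    return [d for d in rows if start <= d <= today]

today = date(2022, 1, 15)

# Hypothetical daily data that spans the year boundary.
all_days = [date(2021, 11, 1) + timedelta(days=n) for n in range(90)]

recent = last_n_weeks(all_days, today)
print(recent[0], recent[-1], len(recent))
```

Because the window is defined by a date subtraction, December rows are kept when the query runs in January. If the window should start on the first day of a week instead, the start date can be truncated to the week first, as in the TRUNC/THIS_WEEK answer above.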

SQL: Calculate duration based on dates and parameters (Change Log)

I have a dataset that is like a ticketing-system change log. I am trying to calculate something like an SLA time across the records: how long a specific ticket sat with Group 2 before it was either resolved or moved to another group to resolve.
The data looks like so:
ID | field | value | start | end
1 | assignment_group | Group 1 | 2022-03-21 08:00:00 | 2022-03-21 08:05:00
1 | incident_state | Work in Progress | 2022-03-21 08:05:00 | 2022-03-21 08:30:00
1 | assignment_group | Group 2 | 2022-03-21 08:35:00 | 2022-03-21 08:50:00
1 | assigned_to | User 1 | 2022-03-21 08:50:00 | 2022-03-21 08:51:00
1 | incident_state | Work in Progress | 2022-03-21 09:00:00 | 2022-03-21 09:30:00
1 | incident_state | Resolved | 2022-03-21 09:30:00 | 2022-03-21 09:31:00
2 | assignment_group | Group 2 | 2022-01-21 11:30:00 | 2022-01-21 11:35:00
2 | assigned_to | User 1 | 2022-01-21 11:35:00 | 2022-01-21 11:37:00
2 | incident_state | Work in Progress | 2022-01-21 11:40:00 | 2022-01-21 11:55:00
2 | assignment_group | Group 3 | 2022-01-21 11:58:00 | 2022-01-21 12:00:00
2 | assigned_to | User 2 | 2022-01-21 12:05:00 | 2022-01-21 12:06:00
2 | incident_state | Resolved | 2022-01-21 12:10:00 | 2022-01-21 12:07:00
The issue I am having is calculating the duration based on the start time the ticket was assigned to a specific group and the end time of when either the ticket was resolved by that group or moved to another group to resolve. For example, I am only interested in Group 2, the duration for ticket 1 for when the ticket was sitting with Group 2 till the ticket was resolved by Group 2 is 2022-03-21 08:35:00 to 2022-03-21 09:31:00, so duration is 1 hour and 1 minute. But for Ticket 2, the ticket sat with Group 2 from 2022-01-21 11:30:00 till it was transferred to another group to resolve at 2022-01-21 11:58:00.
My code currently looks like this. I join two tables to pull in the ticket information and the ticket state changes (one row every time an action is taken on a ticket), which leaves me with the table above. I suspect I need a LEAD function, but I can't figure out how to get the correct end time for the correct record (when incident_state = 'Resolved' OR assignment_group = another group):
WITH incidents as
(
    SELECT number, sys_id AS SYS_ID_INCIDENT
    FROM tables.ServiceIncidents
),
changes as
(
    SELECT id, start, field, field_value, value, `end`
    FROM tables.IncidentInstances
),
incident_changes as
(
    -- one row per change, with its own duration and its position within the ticket
    SELECT *,
        TIMESTAMP_DIFF(changes.`end`, changes.start, MINUTE) as Duration,
        row_number() over (partition by number order by start) as RN
    FROM incidents
    LEFT JOIN changes
    ON (incidents.SYS_ID_INCIDENT = changes.id)
),
IAMtickets as
(
    -- keep only tickets that were assigned to Group 2 at some point
    SELECT i.number, i.SYS_ID_INCIDENT, start, field, value, `end`, Duration, RN
    FROM incident_changes i
    INNER JOIN
    (SELECT DISTINCT number FROM incident_changes WHERE value = 'Group 2') r
    ON i.number = r.number
),
cte as
(
    -- value of the next change on the same ticket
    SELECT *, lead(value) over (partition by IAMtickets.number order by start) as next_value
    FROM IAMtickets
)
SELECT * FROM cte
I want the output to be something like this:
ID | Assigned to | Duration | Outcome
1 | User 1 | 1 hour 1 minute | Resolved
2 | User 1 | 28 minutes | Transferred
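A plain LEAD only looks one row ahead, while the row that closes the Group 2 period can be several changes later. The logic is closer to "from the Group 2 assignment, scan forward to the first row that is either a Resolved state or an assignment to a different group". The Python sketch below models that scan on the sample rows; it is not BigQuery code, and `group_durations` is a hypothetical helper name. It assumes a resolution ends at that row's end time and a transfer ends at the new group's start time, as described in the question.

```python
from datetime import datetime

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# (ID, field, value, start, end) copied from the sample data.
changes = [
    (1, "assignment_group", "Group 1", ts("2022-03-21 08:00:00"), ts("2022-03-21 08:05:00")),
    (1, "incident_state", "Work in Progress", ts("2022-03-21 08:05:00"), ts("2022-03-21 08:30:00")),
    (1, "assignment_group", "Group 2", ts("2022-03-21 08:35:00"), ts("2022-03-21 08:50:00")),
    (1, "assigned_to", "User 1", ts("2022-03-21 08:50:00"), ts("2022-03-21 08:51:00")),
    (1, "incident_state", "Work in Progress", ts("2022-03-21 09:00:00"), ts("2022-03-21 09:30:00")),
    (1, "incident_state", "Resolved", ts("2022-03-21 09:30:00"), ts("2022-03-21 09:31:00")),
    (2, "assignment_group", "Group 2", ts("2022-01-21 11:30:00"), ts("2022-01-21 11:35:00")),
    (2, "assigned_to", "User 1", ts("2022-01-21 11:35:00"), ts("2022-01-21 11:37:00")),
    (2, "incident_state", "Work in Progress", ts("2022-01-21 11:40:00"), ts("2022-01-21 11:55:00")),
    (2, "assignment_group", "Group 3", ts("2022-01-21 11:58:00"), ts("2022-01-21 12:00:00")),
    (2, "assigned_to", "User 2", ts("2022-01-21 12:05:00"), ts("2022-01-21 12:06:00")),
    (2, "incident_state", "Resolved", ts("2022-01-21 12:10:00"), ts("2022-01-21 12:07:00")),
]

def group_durations(rows, group="Group 2"):
    """Return (ticket, assigned_to, duration, outcome) for each stay with `group`."""
    by_ticket = {}
    for row in sorted(rows, key=lambda r: (r[0], r[3])):
        by_ticket.setdefault(row[0], []).append(row)

    results = []
    for ticket_id, ticket_rows in by_ticket.items():
        for i, (_, field, value, start, _) in enumerate(ticket_rows):
            if field != "assignment_group" or value != group:
                continue
            assigned_to = None
            # Scan forward until the ticket is resolved or handed to another group.
            for _, next_field, next_value, next_start, next_end in ticket_rows[i + 1:]:
                if next_field == "assigned_to" and assigned_to is None:
                    assigned_to = next_value
                elif next_field == "incident_state" and next_value == "Resolved":
                    results.append((ticket_id, assigned_to, next_end - start, "Resolved"))
                    break
                elif next_field == "assignment_group" and next_value != group:
                    results.append((ticket_id, assigned_to, next_start - start, "Transferred"))
                    break
    return results

for row in group_durations(changes):
    print(row)
```

On the sample rows, ticket 2 comes out at 28 minutes (11:30 to 11:58). For ticket 1 the span from 08:35 to 09:31 works out to 56 minutes. In SQL the same forward scan is often expressed as a MIN over the start/end times of the qualifying later rows per ticket, rather than a single LEAD.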

SQL BigQuery: How to populate dates from rows cycle_base and cycle_interval

I'm having trouble populating dates with variable cycle_base (day of week) and cycle_interval (days) columns in Google BigQuery SQL.
The idea is to populate a date array for 2022 for each product where the dates fall within the valid_from and valid_to dates and where the dates are generated with the respective cycle_interval
A snippet from my data looks like this:
cycle_base | valid_from | valid_to | cycle_interval | product
2016-09-19 | 2020-04-20 | 2022-12-31 | 7 | A
2018-12-17 | 2020-01-27 | 2022-12-31 | 28 | B
2019-12-30 | 2020-01-27 | 2022-12-31 | 56 | C
I tried generating a date array and then joining those dates on the DAYOFWEEK, which of course only works for rows with an interval of 7 days. But I can't seem to find a way to achieve the above with the other intervals.
Edit
Expected data for 2022:
The cycle_base represents the day of the week. Edit: the date also represents the starting point from which the intervals are counted.
Product A starts on Monday, every week.
Product B starts on Monday, every 4 weeks.
Product C starts on Monday, every 8 weeks.
date | product | dayofweek | cycle_interval
2022-01-03 | A | monday | 7
2022-01-10 | A | monday | 7
... | ... | ... | ...
2022-01-03 | B | monday | 28
2022-01-17 | B | monday | 28
... | ... | ... | ...
2022-01-03 | C | monday | 56
2022-02-21 | C | monday | 56
... | ... | ... | ...
Hope someone can point me in the right direction :)
Thanks in advance,
Glenn
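If cycle_base is treated as the anchor of each cycle, the dates are simply cycle_base plus whole multiples of cycle_interval, clipped to the valid range and to 2022. Here is a Python sketch of that arithmetic on the sample rows; `cycle_dates` is a hypothetical helper name, and the "anchor" interpretation is an assumption based on the edit in the question. In BigQuery the same series could probably be built with GENERATE_DATE_ARRAY using an INTERVAL of cycle_interval days and then filtered to the valid range.

```python
from datetime import date, timedelta

cycles = [
    # (cycle_base, valid_from, valid_to, cycle_interval in days, product)
    (date(2016, 9, 19), date(2020, 4, 20), date(2022, 12, 31), 7, "A"),
    (date(2018, 12, 17), date(2020, 1, 27), date(2022, 12, 31), 28, "B"),
    (date(2019, 12, 30), date(2020, 1, 27), date(2022, 12, 31), 56, "C"),
]

def cycle_dates(cycle_base, valid_from, valid_to, interval_days, year=2022):
    """Dates of the form cycle_base + k * interval_days inside the valid range and year."""
    lo = max(valid_from, date(year, 1, 1))
    hi = min(valid_to, date(year, 12, 31))
    # Whole intervals to skip so the first date is >= lo (ceiling division).
    steps = max(0, -(-(lo - cycle_base).days // interval_days))
    current = cycle_base + timedelta(days=steps * interval_days)
    dates = []
    while current <= hi:
        dates.append(current)
        current += timedelta(days=interval_days)
    return dates

for cycle_base, valid_from, valid_to, interval_days, product in cycles:
    dates = cycle_dates(cycle_base, valid_from, valid_to, interval_days)
    print(product, dates[0], len(dates))
```

Under that assumption, product A starts on 2022-01-03, B on 2022-01-10 and C on 2022-02-21, since only those dates are a whole number of intervals away from each cycle_base.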

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format; create_date can never be later than event_date.
I need to calculate, on a rolling n-day basis (let's say 3), the sum of units where the create_date and event_date both fall within the same 3-day window. The data is illustrative, but each event_date can have 500+ different create_dates associated with it, and that number isn't constant. Some event_dates may be missing.
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
event_date | create_date | rowid | units
2022-02-01 | 2022-01-20 | 1 | 100
2022-02-01 | 2022-02-01 | 2 | 100
2022-02-02 | 2022-01-21 | 3 | 100
2022-02-02 | 2022-01-23 | 4 | 100
2022-02-02 | 2022-01-31 | 5 | 100
2022-02-02 | 2022-02-02 | 6 | 100
2022-02-03 | 2022-01-30 | 7 | 100
2022-02-03 | 2022-02-01 | 8 | 100
2022-02-03 | 2022-02-03 | 9 | 100
2022-02-05 | 2022-02-01 | 10 | 100
2022-02-05 | 2022-02-03 | 11 | 100
The output I need is below (in brackets I added the rows that should be included in the calculation for each date, but my result only needs the numerical sum). I tried calculating with either date on its own, but neither returned the results I needed.
date | units
2022-02-01 | 100 (Row 2)
2022-02-02 | 300 (Row 2,5,6)
2022-02-03 | 400 (Row 2,6,8,9)
2022-02-04 | 200 (Row 6,9)
2022-02-05 | 200 (Row 9,11)
In Python I solved above with a definition that looped through filtering a dataframe for each date but I am struggling to do the same in SQL.
Thank you!
Consider below approach
with events_dates as (
  select date from (
    select min(event_date) min_date, max(event_date) max_date
    from your_table
  ), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg('' || rowid) rows_included
from events_dates
left join your_table
  on create_date between date - 2 and date
  and event_date between date - 2 and date
group by date
if applied to sample data in your question - output is
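As a cross-check of the join condition, the same rule (both dates inside the 3-day window ending on each calendar date) can be replayed in Python on the sample rows. This is only a sketch of the logic, not a replacement for the query; `rolling_units` is a hypothetical helper name.

```python
from datetime import date, timedelta

# (event_date, create_date, rowid, units) copied from the sample data.
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 1, 100),
    (date(2022, 2, 1), date(2022, 2, 1), 2, 100),
    (date(2022, 2, 2), date(2022, 1, 21), 3, 100),
    (date(2022, 2, 2), date(2022, 1, 23), 4, 100),
    (date(2022, 2, 2), date(2022, 1, 31), 5, 100),
    (date(2022, 2, 2), date(2022, 2, 2), 6, 100),
    (date(2022, 2, 3), date(2022, 1, 30), 7, 100),
    (date(2022, 2, 3), date(2022, 2, 1), 8, 100),
    (date(2022, 2, 3), date(2022, 2, 3), 9, 100),
    (date(2022, 2, 5), date(2022, 2, 1), 10, 100),
    (date(2022, 2, 5), date(2022, 2, 3), 11, 100),
]

def rolling_units(rows, window_days=3):
    """Sum units per calendar date where both dates fall in the trailing window."""
    first = min(event for event, _, _, _ in rows)
    last = max(event for event, _, _, _ in rows)
    totals = {}
    day = first
    while day <= last:  # walk every calendar date, including missing event_dates
        window_start = day - timedelta(days=window_days - 1)
        totals[day] = sum(
            units
            for event, created, _, units in rows
            if window_start <= event <= day and window_start <= created <= day
        )
        day += timedelta(days=1)
    return totals

for day, units in rolling_units(rows).items():
    print(day, units)
```

Walking every date between the first and last event_date is what generate_date_array does in the query, which is why 2022-02-04 appears even though no event has that date. For 2022-02-03 the four qualifying rows (2, 6, 8, 9) add up to 400.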

SQL How Many Employees Are Working, Group By Hour

I have a table, a timetable, with check-in and check-out times of the employees:
ID | Date | Check-in | Check-out
1 | 1-1-2011 | 11:00 | 18:00
2 | 1-1-2011 | 11:00 | 19:00
3 | 1-1-2011 | 16:00 | 18:30
4 | 1-1-2011 | 17:00 | 20:00
Now I want to know how many employees are working, every (half) hour.
The result I want to see:
Hour | Count
11 | 2
12 | 2
13 | 2
14 | 2
15 | 2
16 | 3
17 | 3
18 | 2,5
19 | 1
Read every 'Hour' as 'until the next full hour', e.g. 11 -> 11:00 - 12:00.
Any ideas?
Build an additional table, called Hours, containing the following data:
h
00:00
00:30
01:00
...
23:30
then, run
-- h < [Check_out]: someone who checks out at 18:00 is not counted in the 18:00 slot
Select h as 'hour', count(ID) as 'count'
from timetable, hours
where [Check_in] <= h and h < [Check_out]
group by h
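The half-hour table gives one count per 30-minute slot, so a fractional value such as 2,5 for an hour comes from averaging the two slots in that hour. The Python sketch below models that on the sample rows; `staff_per_hour` is a hypothetical helper name, and a person is assumed to be present from check-in up to, but not including, check-out.

```python
# (ID, check-in, check-out) copied from the sample timetable (all on 1-1-2011).
timetable = [
    (1, "11:00", "18:00"),
    (2, "11:00", "19:00"),
    (3, "16:00", "18:30"),
    (4, "17:00", "20:00"),
]

def to_minutes(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def staff_per_hour(timetable, slot_minutes=30):
    """Average head count per hour, sampled at the start of every slot in that hour."""
    slots_per_hour = 60 // slot_minutes
    result = {}
    for hour in range(24):
        total = 0
        for k in range(slots_per_hour):
            slot = hour * 60 + k * slot_minutes
            # Present if checked in at or before the slot and not yet checked out.
            total += sum(
                1
                for _, check_in, check_out in timetable
                if to_minutes(check_in) <= slot < to_minutes(check_out)
            )
        if total:
            result[hour] = total / slots_per_hour
    return result

for hour, count in staff_per_hour(timetable).items():
    print(hour, count)
```

With the sample rows, hour 18 comes out as 2.5 because employee 3 leaves at 18:30. Hour 17 comes out as 4 under this rule, since all four employees are checked in between 17:00 and 18:00.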