Iterate through results of one table as input into another - SQL

Hope you could please help me. I have looked at a few questions on here that are similar in nature, but I have not been able to find what I am looking for, so hopefully this isn't a duplicate question.
I am trying to build a process whereby customers get billed for the number of business days they were renting a product.
I have one table which shows the start date and the end date for a particular rental.
And my organisation provides a Calendar service where if you feed it the start date, end date and calendar, it will return the number of business days.
My problem is that the start dates and end dates from the first table need to be passed as arguments in the query for the second table and I am struggling to find an efficient way to do this.
Table Customer_Data (c.200k rows):
| FirstName | LastName | StartDate | EndDate |
|:-----:|:-----:|:------:|:------|
| John | Doe | 02-02-2022 | 09-02-2022 |
CalendarService: requires 3 arguments - a start date, an end date and a calendar (e.g. LDN or NYK).
I have the following code:
SELECT
*
FROM CalendarService
WHERE Calendar = 'LDN'
and StartDate in (select startdate from customer_data)
and endDate in (select endDate from customer_data)
My issue is that this returns over 5m rows. Because the two IN conditions are evaluated independently, every start date in customer_data is effectively paired with every end date, giving me all the possible business-day combinations instead of only the start/end pairs that actually appear in the first table.
Is there a more efficient way to pass each row in customer_data through the calendar_service?
I am using a proprietary data platform for my organisation that uses an adapted SQL variant, so I need to keep solutions generic. The platform also accepts Python and R, but I don't know either of them.

Here is an example inline table-valued function. I'm not sure what you are pulling out of the CalendarService table, so this leaves it open to add further fields to the output, as opposed to using a single-valued scalar function. You will still need some way of determining the Calendar value more dynamically. I'm not sure what your options are for that, so I have set it to a static value in this example. It seems to me that it should be tracked on your Customer_Data table, but that's up to you and your options.
IF OBJECT_ID('dbo.fnGetBusinessDaysFromDateRange') IS NOT NULL
DROP FUNCTION dbo.fnGetBusinessDaysFromDateRange
GO
CREATE FUNCTION dbo.fnGetBusinessDaysFromDateRange
(
@StartDate DATETIME, @EndDate DATETIME, @CalendarType VARCHAR(8)
)
RETURNS TABLE
AS
RETURN (
SELECT
CalendarService.DaysValue as BusinessDays,
CalendarService.SomeValue
FROM CalendarService
WHERE CalendarService.Calendar = @CalendarType
AND CalendarService.StartDate = @StartDate
AND CalendarService.EndDate = @EndDate
)
GO
--Usage:
SELECT
CD.*,
CalendarBusinessDays.BusinessDays,
CalendarBusinessDays.SomeValue
FROM Customer_Data CD
CROSS APPLY dbo.fnGetBusinessDaysFromDateRange(CD.StartDate, CD.EndDate, 'NYK') as CalendarBusinessDays
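Because an inline table-valued function is expanded into the calling query, the CROSS APPLY above should behave like an ordinary inner join on calendar, start date and end date, which a generic SQL dialect can also express. The sketch below uses Python's built-in sqlite3 module to contrast the question's independent IN filters with a join that matches each rental's own start/end pair. The tables calendar_service and customer_data, their columns, and the sample dates are hypothetical stand-ins (the business-days value is left out), not the real CalendarService.

```python
import sqlite3
from itertools import combinations

# Hypothetical sample dates; the stand-in service holds every start < end pair.
DATES = ["2022-02-01", "2022-02-02", "2022-02-07", "2022-02-09", "2022-02-11"]

def setup():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE calendar_service (calendar TEXT, start_date TEXT, end_date TEXT)")
    # combinations() on a sorted list yields (earlier, later) pairs only
    con.executemany("INSERT INTO calendar_service VALUES ('LDN', ?, ?)",
                    list(combinations(DATES, 2)))
    con.execute("CREATE TABLE customer_data (first_name TEXT, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO customer_data VALUES (?, ?, ?)", [
        ("John", "2022-02-02", "2022-02-09"),
        ("Jane", "2022-02-07", "2022-02-11"),
    ])
    return con

# The question's approach: start and end dates are filtered independently.
IN_FILTERS = """
SELECT COUNT(*) FROM calendar_service
WHERE calendar = 'LDN'
  AND start_date IN (SELECT start_date FROM customer_data)
  AND end_date IN (SELECT end_date FROM customer_data)
"""

# Join on both columns so each rental only meets its own start/end pair.
PAIRED_JOIN = """
SELECT cd.first_name, cs.start_date, cs.end_date
FROM customer_data cd
JOIN calendar_service cs
  ON cs.calendar = 'LDN'
 AND cs.start_date = cd.start_date
 AND cs.end_date = cd.end_date
ORDER BY cd.first_name
"""

con = setup()
in_count = con.execute(IN_FILTERS).fetchone()[0]
paired_rows = con.execute(PAIRED_JOIN).fetchall()
```

With two rentals, the IN version returns 4 rows (both start dates crossed with both end dates), while the join returns exactly one row per rental. The IN version's row count grows with the number of distinct start/end combinations rather than with the number of rentals.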

Related

Adding x work days onto a date in SQL Server?

I'm not sure whether there is a simple way to do this.
I have a field called receipt_date in my data table and I wish to add 10 working days to this (with bank holidays).
I'm not sure whether there is a query I could use to join from my original table onto this one to calculate the date 10 working days later. I've tried a few subqueries but couldn't get it right, or perhaps it's not possible. I wondered whether there was a way to take the 10th row after the receipt date to get the calendar date, if I only include 'Y' in the WHERE clause?
Any help appreciated.
This makes several assumptions about your data, because you haven't given us any. One method, however, would be to create a function (I use an inline table-valued function here) that returns the relevant row from your calendar table. Note that this assumes the number of days is never negative, and that if you provide a date that isn't a working day, day 0 is the next working day. For example, adding zero working days to 2021-09-05 returns 2021-09-06, and adding 3 returns 2021-09-09. If that isn't what you want, this should be more than enough for you to get there yourself.
CREATE FUNCTION dbo.AddWorkingDays (@Days int, @Date date)
RETURNS TABLE AS
RETURN
WITH Dates AS(
SELECT CalendarDate,
WorkingDay
FROM dbo.CalendarTable
WHERE CalendarDate >= @Date)
SELECT CalendarDate
FROM Dates
WHERE WorkingDay = 1
ORDER BY CalendarDate
OFFSET @Days ROWS FETCH NEXT 1 ROW ONLY;
GO
--Using the function
SELECT YT.DateColumn,
AWD.CalendarDate AS AddedWorkingDays
FROM dbo.YourTable YT
CROSS APPLY dbo.AddWorkingDays(10,YT.DateColumn) AWD;
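The same "skip N working days with OFFSET" idea can be tried outside SQL Server. This is a minimal sketch using Python's sqlite3 module. It assumes a hypothetical calendar_table in which weekends are the only non-working days (no bank holidays are loaded). SQLite has no OFFSET ... FETCH NEXT 1 ROW ONLY, so it uses the equivalent LIMIT 1 OFFSET ?.

```python
import sqlite3
from datetime import date, timedelta

def build_calendar(con, first, last):
    """Hypothetical calendar table: Mon-Fri are working days, weekends are not."""
    con.execute("CREATE TABLE calendar_table (calendar_date TEXT PRIMARY KEY, working_day INTEGER)")
    rows = []
    d = first
    while d <= last:
        rows.append((d.isoformat(), 1 if d.weekday() < 5 else 0))  # Monday=0 ... Sunday=6
        d += timedelta(days=1)
    con.executemany("INSERT INTO calendar_table VALUES (?, ?)", rows)

# Working days on or after the start date, skip `days` of them, take the next one.
ADD_WORKING_DAYS = """
SELECT calendar_date
FROM calendar_table
WHERE calendar_date >= ? AND working_day = 1
ORDER BY calendar_date
LIMIT 1 OFFSET ?
"""

def add_working_days(con, days, start):
    row = con.execute(ADD_WORKING_DAYS, (start, days)).fetchone()
    return row[0] if row else None  # None if the calendar table runs out

con = sqlite3.connect(":memory:")
build_calendar(con, date(2021, 9, 1), date(2021, 12, 31))
```

Bank holidays would be handled by setting working_day = 0 on those dates in the calendar table; the query itself does not change.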

SQL: Dynamic Join Based on Row Value

Context:
I am working with some complicated schema and have got many CTEs and joins to get to this point. This is a watered-down version and completely different source data and example to illustrate my point (data anonymity). Hopefully it provides enough of a snapshot.
Data Overview:
I have a service which generates a production forecast looking ahead 30 days. The forecast is generated for each facility, for each shift (morning/afternoon). Each forecast produced covers all shifts (morning/afternoon/evening) so they share a common generation_id but different forecast_profile_key.
What I am trying to do: I want to find the SUM of the forecast error for a given forecast generation constrained by a dynamic date range based on whether the date is a weekday or weekend. The SUM must be grouped only on similar IDs.
Basically, the temp table provides one record per facility per date per shift with the forecast error. I want to SUM the historical error dynamically for a facility/shift/date based on whether the date is a weekday or weekend, and only SUM the error where the IDs match up (hope that makes sense!).
Specifics: I want to find the SUM grouped by 'week_part_grouping', 'forecast_profile_key', 'forecast_profile_id' and 'forecast_generation_id'. The part I am struggling with is that I only want to SUM the error dynamically based on date: (a) if the date is a weekday, I want to SUM the error from up to the 5 most recent days in a 7-day lookback period, or (b) if the date is a weekend, I want to SUM the error from up to the 3 most recent days in a 16-day lookback period.
Ideally, the result would have an extra column, 'total_forecast_error_in_lookback_range'.
Specific examples:
For 'facility_a', '2020-11-22' is a weekend. The lookback range is 16 days, so any date between '2020-11-21' and '2020-11-05' is eligible. The 3 most recent dates would be '2020-11-21', '2020-11-15' and '2020-11-14'. Therefore, the sum of error would be 2000+3250+1050.
For 'facility_a', '2020-11-20' is a weekday. The lookback range is 7 days, so any date between '2020-11-19' and '2020-11-13' is eligible. The 5 most recent weekdays would be '2020-11-19' through '2020-11-16', plus '2020-11-13'.
For 'facility_b', notice there is a change in the 'forecast_generation_id'. So, the error for '2020-11-20' would only be 4565.
What I have tried: I'll confess to not being quite sure how to break down this portion. I did consider a CASE statement on the week_part but then got into a nested mess. I considered using a RANK window function but didn't make much progress, as I was unsure how to implement the dynamic lookback component. I also thought about using LISTAGG to get all the dates and doing a REGEXP wildcard lookup, but that would be very slow.
I am seeking pointers how to go about achieving this in SQL. I don't know if I am missing something from my toolkit here to go about breaking this down into something I can implement.
DROP TABLE IF EXISTS seventh__error_calc;
create temporary table seventh__error_calc
(
facility_name varchar,
shift varchar,
date_actuals date,
week_part_grouping varchar,
forecast_profile_key varchar,
forecast_profile_id varchar,
forecast_generation_id varchar,
count_dates_in_forecast bigint,
forecast_error bigint
);
Insert into seventh__error_calc
VALUES
('facility_a','morning','2020-11-22','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1000'),
('facility_a','morning','2020-11-21','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2000'),
('facility_a','morning','2020-11-20','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','3000'),
('facility_a','morning','2020-11-19','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2500'),
('facility_a','morning','2020-11-18','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1200'),
('facility_a','morning','2020-11-17','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','5000'),
('facility_a','morning','2020-11-16','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','4400'),
('facility_a','morning','2020-11-15','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','3250'),
('facility_a','morning','2020-11-14','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1050'),
('facility_a','morning','2020-11-13','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-12','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-11','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-10','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-09','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-08','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_b','morning','2020-11-22','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','3400'),
('facility_b','morning','2020-11-21','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','2800'),
('facility_b','morning','2020-11-20','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','3687'),
('facility_b','morning','2020-11-19','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','4565'),
('facility_b','morning','2020-11-18','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','1262'),
('facility_b','morning','2020-11-17','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','8765'),
('facility_b','morning','2020-11-16','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','5678'),
('facility_b','morning','2020-11-15','weekend','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','2893'),
('facility_b','morning','2020-11-14','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','1928'),
('facility_b','morning','2020-11-13','weekday','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','4736')
;
SELECT *
FROM seventh__error_calc
This achieved what I was trying to do. There were two learning points here.
Self Joins. I've never used one before but can now see why they are powerful!
Using a CASE statement in the WHERE clause.
Hope this might help someone else some day!
select facility_name,
forecast_profile_key,
forecast_profile_id,
shift,
date_actuals,
week_part_grouping,
forecast_generation_id,
sum(forecast_error) forecast_err_calc
from (
select rank() over (partition by forecast_profile_id, forecast_profile_key, facility_name, a.date_actuals order by b.date_actuals desc) rnk,
a.facility_name, a.forecast_profile_key, a.forecast_profile_id, a.shift, a.date_actuals, a.week_part_grouping, a.forecast_generation_id, b.forecast_error
from seventh__error_calc a
join seventh__error_calc b
using (facility_name, forecast_profile_key, forecast_profile_id, week_part_grouping, forecast_generation_id)
where case when a.week_part_grouping = 'weekend' then b.date_actuals between a.date_actuals - 16 and a.date_actuals
when a.week_part_grouping = 'weekday' then b.date_actuals between a.date_actuals - 7 and a.date_actuals
end
) src
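where case when week_part_grouping = 'weekend' then rnk < 4
when week_part_grouping = 'weekday' then rnk < 6
end
group by facility_name, forecast_profile_key, forecast_profile_id, shift,
date_actuals, week_part_grouping, forecast_generation_id;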
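The self-join, rank and conditional lookback can also be sketched in a form that runs anywhere with Python's sqlite3 module (window functions need SQLite 3.25 or later). This is a simplified, hypothetical version of seventh__error_calc: columns facility, d, part, gen and err, holding a subset of the sample rows, with julianday() doing the date arithmetic. One deliberate difference from the query above is that the lookback starts 1 day back rather than 0, so a date's own error is excluded. That is what the worked examples in the question assume; for facility_a on 2020-11-22 it gives 2000+3250+1050.

```python
import sqlite3

# (facility, date, week_part, generation_id, error): subset of the question's sample data
ROWS = [
    ("facility_a", "2020-11-22", "weekend", "6809dea6", 1000),
    ("facility_a", "2020-11-21", "weekend", "6809dea6", 2000),
    ("facility_a", "2020-11-20", "weekday", "6809dea6", 3000),
    ("facility_a", "2020-11-19", "weekday", "6809dea6", 2500),
    ("facility_a", "2020-11-18", "weekday", "6809dea6", 1200),
    ("facility_a", "2020-11-17", "weekday", "6809dea6", 5000),
    ("facility_a", "2020-11-16", "weekday", "6809dea6", 4400),
    ("facility_a", "2020-11-15", "weekend", "6809dea6", 3250),
    ("facility_a", "2020-11-14", "weekend", "6809dea6", 1050),
    ("facility_a", "2020-11-13", "weekday", "6809dea6", 2450),
    ("facility_b", "2020-11-20", "weekday", "6809dea6", 3687),
    ("facility_b", "2020-11-19", "weekday", "6809dea6", 4565),
    ("facility_b", "2020-11-18", "weekday", "7252fzw5", 1262),
]

QUERY = """
SELECT facility, d, SUM(err) AS total_forecast_error_in_lookback_range
FROM (
    SELECT a.facility, a.d, a.part, b.err,
           RANK() OVER (PARTITION BY a.facility, a.part, a.gen, a.d
                        ORDER BY b.d DESC) AS rnk
    FROM errors a
    JOIN errors b                     -- self join: b holds the candidate history rows
      ON b.facility = a.facility
     AND b.part = a.part
     AND b.gen = a.gen                -- only sum rows from the same generation
     -- lookback window: 1..16 days for weekends, 1..7 days for weekdays
     AND julianday(a.d) - julianday(b.d)
         BETWEEN 1 AND CASE a.part WHEN 'weekend' THEN 16 ELSE 7 END
) src
WHERE rnk <= CASE part WHEN 'weekend' THEN 3 ELSE 5 END
GROUP BY facility, d
"""

def lookback_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE errors (facility TEXT, d TEXT, part TEXT, gen TEXT, err INTEGER)")
    con.executemany("INSERT INTO errors VALUES (?, ?, ?, ?, ?)", rows)
    result = {(f, d): total for f, d, total in con.execute(QUERY)}
    con.close()
    return result

totals = lookback_totals(ROWS)
```

Changing `BETWEEN 1` to `BETWEEN 0` restores the behaviour of the query above, where each date's own row is ranked first and counts towards its total. Dates with no eligible history drop out because of the inner join.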

Calculating working time with overlapping events (SQL)

I have found similar queries on StackOverflow (e.g. Finding simultaneous events in a database between times), but nothing that matches exactly what I am after as far as I can tell, so I thought it OK to add this as a new question.
I have a table that logs jobs (or "Activities"), with a start/end time for the job. I need to calculate working time (you can disregard non-working days, break times etc. as I have that covered). The complication is an individual can work on simultaneous jobs, overlapping at different points (the assumption is equal effort on simultaneous jobs), and the working time needs to reflect that. Minute accuracy is all that is required, not to the second.
Based on other suggestions I have this query, implemented as a table-valued function. For each minute an activity is running, it checks whether any other activities are running in the same minute for the same person and splits the time accordingly. It works, but is very inefficient, taking over a minute to execute. Any ideas how I can do this more efficiently?
I'm running SQL Server 2005. By the way, I have done the obvious things, such as adding indexes on foreign keys.
CREATE FUNCTION [dbo].[WorkActivity_WorkTimeCalculations] (@StartDate smalldatetime, @EndDate smalldatetime)
RETURNS @retActivity TABLE
(
ActivityID bigint PRIMARY KEY NOT NULL,
WorkMins decimal NOT NULL
)
/********************************************************************
Summary: Calculates the WORKING time on each activity running in a given date/time range
Remarks: Takes into account staff working simultaneously on jobs
(evenly distributes working time across simultaneous jobs)
Input Params: @StartDate - the start of the period to calculate
@EndDate - the end of the period to calculate
Output Params:
Returns: Recordset of activities and associated working time (minutes)
********************************************************************/
AS
BEGIN
-- any work activities still running use the overall end date as the activity's end date for the purpose of calculating
-- simultaneous jobs running
-- POPULATE A TEMP TABLE WITH EVERY MINUTE IN THE DATE RANGE
DECLARE @Minutes TABLE (MinuteDateTime smalldatetime NOT NULL)
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(minute,1,myDate)
FROM cte
WHERE DATEADD(minute,1,myDate) <= @EndDate
)
INSERT INTO @Minutes (MinuteDateTime)
SELECT myDate FROM cte
OPTION (MAXRECURSION 0)
-- POPULATE A TEMP TABLE WITH WORKLOAD PER EMPLOYEE PER MINUTE
DECLARE @JobsRunningByStaff TABLE (StaffID smallint NOT NULL, MinuteDateTime smalldatetime NOT NULL, JobsRunning decimal NOT NULL)
INSERT INTO @JobsRunningByStaff (StaffID, MinuteDateTime, JobsRunning)
SELECT wka_StaffID, MinuteDateTime, COUNT(DISTINCT wka_ItemID) JobsRunning
FROM dbo.WorkActivities
INNER JOIN @Minutes ON (MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
GROUP BY wka_StaffID, MinuteDateTime
-- FINALLY MAKE THE CALCULATIONS FOR EACH ACTIVITY
INSERT INTO @retActivity
SELECT wka_ActivityID, SUM(1/JobsRunning) AS WorkMins
FROM dbo.WorkActivities
INNER JOIN @JobsRunningByStaff ON (wka_StaffID = StaffID AND MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
GROUP BY wka_ActivityID
RETURN
END
Some example data (sorry for the poor formatting!)...
Source Data from WorkActivities table:
| ACTIVITY ID | START TIME | END TIME | STAFF ID |
|:-----:|:-----:|:-----:|:-----:|
| 1 | 03/03/2016 10:30 | 03/03/2016 10:50 | 1 |
| 2 | 03/03/2016 10:40 | 03/03/2016 11:00 | 1 |
And the desired results for a function call of SELECT * FROM dbo.WorkActivity_WorkTimeCalculations ('03-Mar-2016 10:30','03-Mar-2016 11:30'):
| ACTIVITY ID | WORKMINS |
|:-----:|:-----:|
| 1 | 25 |
| 2 | 15 |
So the results take into account that between 10:40 and 10:50 two jobs are running simultaneously, and each gets 5 minutes of working time over that period.
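For comparison, the equal-split rule can be computed without a per-minute grid. The set of running jobs only changes at start and end times, so splitting each interval between consecutive boundaries is enough. This sweep-over-boundaries technique is swapped in here as a Python illustration, not a drop-in replacement for the function. It assumes open activities already have their end time replaced by the range end (as the ISNULL does), counts activities rather than distinct wka_ItemID values, and does not clip to the requested range. The name work_minutes is hypothetical. On the sample data it gives 15 minutes for each activity: activity 1 runs 10 minutes alone plus half of the shared 10 minutes.

```python
from collections import defaultdict
from datetime import datetime

def work_minutes(activities):
    """activities: list of (activity_id, start, end, staff_id) with datetime start/end.
    Each elapsed interval is split equally across the activities a staff member has open."""
    by_staff = defaultdict(list)
    for act_id, start, end, staff in activities:
        by_staff[staff].append((act_id, start, end))

    result = defaultdict(float)
    for acts in by_staff.values():
        # every start/end time is a point where the set of running jobs can change
        points = sorted({t for _, s, e in acts for t in (s, e)})
        for lo, hi in zip(points, points[1:]):
            running = [a for a, s, e in acts if s <= lo and e >= hi]
            if running:
                share = (hi - lo).total_seconds() / 60 / len(running)
                for a in running:
                    result[a] += share
    return dict(result)

sample = [
    (1, datetime(2016, 3, 3, 10, 30), datetime(2016, 3, 3, 10, 50), 1),
    (2, datetime(2016, 3, 3, 10, 40), datetime(2016, 3, 3, 11, 0), 1),
]
minutes = work_minutes(sample)
```

The work per staff member depends on the number of activities, not on the length of the date range in minutes, which is where the per-minute approach spends most of its time.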
As suggested by posters, indexing made a significant difference - creating an index with wka_StartTime and wka_EndTime sorted it.
(sorry, couldn't see how to mark the comments made by others as an answer!)

SQL query with week days

I would like to know the best way to create a report grouped by each of the last 7 days, when I don't have data for every day. For example:
08/01/10 | 0
08/02/10 | 5
08/03/10 | 6
08/04/10 | 10
08/05/10 | 0
08/06/10 | 11
08/07/10 | 1
Is the only option to create a dummy table with those days and join against it?
Thank you.
Try something like this
WITH LastDays (calc_date)
AS
(SELECT DATEADD(DAY, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP) - 6, 0)
UNION ALL
SELECT DATEADD(DAY, 1, calc_date)
FROM LastDays
WHERE DATEADD(DAY, 1, calc_date) < CURRENT_TIMESTAMP)
SELECT ...
FROM LastDays l LEFT JOIN (YourQuery) t ON (l.calc_date = t.YourDateColumn);
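Here is the same recursive-CTE-plus-LEFT-JOIN pattern as a runnable sketch using Python's sqlite3 module. The sales table and its rows are hypothetical, chosen to reproduce the totals in the question. "Today" is passed in as a fixed date so the result is reproducible, and SQLite's date() modifiers stand in for DATEADD. COALESCE turns the days with no matching rows into 0.

```python
import sqlite3

LAST_7_DAYS = """
WITH RECURSIVE last_days(calc_date) AS (
    SELECT date(:today, '-6 days')              -- first of the 7 days
    UNION ALL
    SELECT date(calc_date, '+1 day') FROM last_days
    WHERE calc_date < :today                    -- stop once today is generated
)
SELECT l.calc_date, COALESCE(t.total, 0) AS total
FROM last_days l
LEFT JOIN (
    SELECT sale_date, SUM(amount) AS total FROM sales GROUP BY sale_date
) t ON t.sale_date = l.calc_date
ORDER BY l.calc_date
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")  # hypothetical table
con.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2010-07-30", 99),  # outside the 7-day window, so it is ignored
    ("2010-08-02", 5), ("2010-08-03", 6), ("2010-08-04", 10),
    ("2010-08-06", 11), ("2010-08-07", 1),
])

def last_7_days(con, today):
    return con.execute(LAST_7_DAYS, {"today": today}).fetchall()

rows = last_7_days(con, "2010-08-07")
```

Swapping the generated dates for a permanent calendar table only changes the first part of the query; the LEFT JOIN and COALESCE stay the same.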
Many people will suggest methods for dynamically creating a range of dates that you can then join against. This will certainly work but in my experience a calendar table is the way to go. This will make the SQL trivial and generic at the cost of maintaining the calendar table.
At some point in the future someone will come along and ask for another report that excludes weekends. You then have to make your dynamic days generation account for weekends. Then someone will ask for working-days excluding public-holidays at which point you have no choice but to create a calendar table.
I would suggest you bite the bullet and create a calendar table to join against. Pre-populate it with every date, and if you want to think ahead, add columns for "Working Day" and maybe even week number, if your company uses a non-standard week number for reporting.
You don't mention the specific SQL dialect (please do for a more detailed answer), but most SQL implementations have a function for the current date (GETDATE() in SQL Server, for instance). You could take that date, subtract x (7) days and build your WHERE clause like that.
Then you could GROUP BY the day-part of that date.
Select the last 7 transactions, left join them with your query, and then group by the date column. Hope this helps.

3rd <day_of_week> of the Month - MySQL

I'm working on a recurrence application for events. I have a date range of, say, January 1 2010 to December 31 2011. I want to return all of the 3rd Thursdays (arbitrary) of each month, efficiently. I could do this pretty trivially in code; the caveat is that it must be done in a stored procedure. Ultimately I'd want something like:
CALL return_dates(event_id);
That event_id has a start_date of 1/1/2010 and end_date of 12/31/2011. Result set would be something like:
1/20/2010
2/14/2010
3/17/2010
4/16/2010
5/18/2010
etc.
I'm just curious what the most efficient method of doing this would be, considering I might end up with a very large result set in my actual usage.
One idea that comes to mind - you can create a table and store the dates you're interested in there.
Ok, I haven't tested it, but I think the most efficient way of doing it is via a tally table which is a useful thing to have in the db anyway:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[num_seq]') AND type in (N'U'))
DROP TABLE [dbo].[num_seq];
SELECT TOP 100000 IDENTITY(int,1,1) AS n
INTO num_seq
FROM MASTER..spt_values a, MASTER..spt_values b;
CREATE UNIQUE CLUSTERED INDEX idx_1 ON num_seq(n);
You can then use this to build up the date range between the two dates. It's fast because it just uses the index (in fact, I'm led to believe it's often faster than a loop).
create procedure getDates
@eventId int
AS
begin
declare @startdate datetime
declare @enddate datetime
-- get the event's start and end dates
select @startdate=startdate,
@enddate=enddate
from events where eventId=@eventId
-- one candidate date per tally number, from the start date up to and including the end date
-- note: DATEPART(dw, ...) numbering depends on the SET DATEFIRST setting
select
DATEADD(day, tally.n - 1, @startdate) AS [date]
from
dbo.num_seq tally
where
tally.n <= DATEDIFF(day, @startdate, @enddate) + 1 and
DATEPART(dd, DATEADD(day, tally.n - 1, @startdate)) between 15 and 21 and
DATEPART(dw, DATEADD(day, tally.n - 1, @startdate)) = '<day>'
end
Aside from getting the start and end dates, the key observation is that the third occurrence of any weekday in a month must fall between the 15th and the 21st inclusive.
Each day name appears exactly once in that seven-day range, so we can locate it straight away.
If you wanted the second occurrence of a day name, just modify the range appropriately (8th to 14th) or use a parameter to calculate it.
It constructs a date table using the start date, and then adds days on (via the list of numbers in the tally table) until it reaches the end date.
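The day-of-month window rule is easy to check outside the database. This Python sketch is an illustration, not the MySQL procedure the question asks for, and the function name nth_weekdays is hypothetical. It walks the dates one day at a time, as the tally-table join does, and keeps the days that fall in the n-th seven-day window of their month (days 7(n-1)+1 to 7n) and match the requested weekday. Python numbers weekdays Monday=0, so Thursday is 3.

```python
from datetime import date, timedelta

def nth_weekdays(start, end, weekday, n=3):
    """Return the n-th occurrence of `weekday` (Monday=0 ... Sunday=6) in each month
    between start and end inclusive, using the day-of-month window rule."""
    lo_day, hi_day = 7 * (n - 1) + 1, 7 * n   # n=3 gives the 15th to the 21st
    result = []
    d = start
    while d <= end:  # one candidate per day, like one row per tally number
        if lo_day <= d.day <= hi_day and d.weekday() == weekday:
            result.append(d)
        d += timedelta(days=1)
    return result

third_thursdays = nth_weekdays(date(2010, 1, 1), date(2011, 12, 31), weekday=3)
```

Because each weekday occurs exactly once in any run of seven consecutive days, the window test never matches twice in the same month, so no extra de-duplication is needed.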
Hope it helps!