Query start and stop dates between two date fields - sql

I have a start order date field and a stop order date field. I need to check the database to see whether the start and stop orders fall outside all of the pay period start and end fields. Say 01-aug-10 to 14-aug-10, 15-aug-10 to 28-aug-10, and 29-aug-10 to 11-sep-10 are all of the pay periods in one month. The start order is 01-aug-10 and the end order is 14-aug-10. Yet when I run SQL that says (where end order not between pay period start and pay period end), the 01-aug-10 to 14-aug-10 order still shows up. What I need is: if any pay period matches, stop looking, move on to the next record's start order and stop order, and repeat the search until we reach the end of the records that meet the search requirements.
I am looking to search by month and by year to keep the query responsive, because the database is quite large. My query seems to do the between check only once and then shows all of the data that does fit between the pay period start and stop, and that is the data I do not want to see!

What dbms?
So you have a table like this?
CREATE TABLE something
(
pay_period_start date NOT NULL,
pay_period_end date NOT NULL,
CONSTRAINT something_pkey PRIMARY KEY (pay_period_start),
CONSTRAINT something_pay_period_end_key UNIQUE (pay_period_end),
CONSTRAINT something_check CHECK (pay_period_end > pay_period_start)
);
insert into something values ('2010-08-01', '2010-08-14');
insert into something values ('2010-08-15', '2010-08-28');
insert into something values ('2010-08-29', '2010-09-11');
Then I can run this query. ('2010-08-14' is the value of your stop order column or end order column or something like that.)
select * from something
where '2010-08-14' not between pay_period_start and pay_period_end
order by pay_period_start;
and I get
2010-08-15;2010-08-28
2010-08-29;2010-09-11
For pairs of dates, use the OVERLAPS operator. This query
select * from something
where
(date '2010-08-01', date '2010-08-14') overlaps
(pay_period_start, pay_period_end)
order by pay_period_start;
returns
2010-08-01;2010-08-14
To exclude rows where start order and end order exactly match a pay period, use something like this:
select * from something
where (
(date '2010-08-01', date '2010-08-14') overlaps
(pay_period_start, pay_period_end) and
(date '2010-08-01' <> pay_period_start) and
(date '2010-08-14' <> pay_period_end)
)
order by pay_period_start;
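A NOT BETWEEN test is evaluated against one pay period row at a time, so an order that fits one period still passes the test against every other period. One way to express "no pay period contains this order" is NOT EXISTS. The following is a minimal sketch run through Python's sqlite3 module, assuming "outside" means the order is not wholly contained in any single pay period; the orders table and its column names are hypothetical, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE something (
        pay_period_start TEXT NOT NULL PRIMARY KEY,
        pay_period_end   TEXT NOT NULL UNIQUE
    );
    INSERT INTO something VALUES ('2010-08-01', '2010-08-14');
    INSERT INTO something VALUES ('2010-08-15', '2010-08-28');
    INSERT INTO something VALUES ('2010-08-29', '2010-09-11');

    -- Hypothetical orders table; the question does not show its schema.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        start_order TEXT NOT NULL,
        stop_order  TEXT NOT NULL
    );
    INSERT INTO orders VALUES (1, '2010-08-01', '2010-08-14');
    INSERT INTO orders VALUES (2, '2010-08-10', '2010-08-20');
    INSERT INTO orders VALUES (3, '2010-09-12', '2010-09-20');
""")

# Keep only the orders for which no pay period contains both dates.
outside = [row[0] for row in conn.execute("""
    SELECT o.order_id
    FROM orders o
    WHERE NOT EXISTS (
        SELECT 1
        FROM something p
        WHERE o.start_order >= p.pay_period_start
          AND o.stop_order  <= p.pay_period_end
    )
    ORDER BY o.order_id
""")]
print(outside)
```

Order 1 matches the first pay period exactly and is dropped; orders 2 and 3 do not fit inside any period and are kept.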

Related

SQL: Dynamic Join Based on Row Value

Context:
I am working with some complicated schema and have got many CTEs and joins to get to this point. This is a watered-down version and completely different source data and example to illustrate my point (data anonymity). Hopefully it provides enough of a snapshot.
Data Overview:
I have a service which generates a production forecast looking ahead 30 days. The forecast is generated for each facility, for each shift (morning/afternoon). Each forecast produced covers all shifts (morning/afternoon/evening) so they share a common generation_id but different forecast_profile_key.
What I am trying to do: I want to find the SUM of the forecast error for a given forecast generation constrained by a dynamic date range based on whether the date is a weekday or weekend. The SUM must be grouped only on similar IDs.
Basically, the temp table provides one record per facility per date per shift with the forecast error. I want to SUM the historical error dynamically for a facility/shift/date based on whether the date is weekday/weekend, and only SUM the error where the IDs match up.. (hope that makes sense!!)
Specifics: I want to find the SUM grouped by 'week_part_grouping', 'forecast_profile_key', 'forecast_profile' and 'forecast_generation_id'. The part I am struggling with is that I only want to SUM the error dynamically based on date: (a) if the date is a weekday, I want to SUM the error from up to the 5 recent-most days in a 7 day look back period, or (b) if the date is a weekend, I want to SUM the error from up to the 3 recent-most days in a 16 day look back period.
Ideally, having an extra column for 'total_forecast_error_in_lookback_range'.
Specific examples:
For 'facility_a', '2020-11-22' is a weekend. The lookback range is 16 days, so any date between '2020-11-21' and '2020-11-05' is eligible. The 3 recent-most dates would be '2020-11-21', '2020-11-15' and '2020-11-14'. Therefore, the sum of error would be 2000+3250+1050.
For 'facility_a', '2020-11-20' is a weekday. The lookback range is 7 days, so any date between '2020-11-19' and '2020-11-13' is eligible. That would work out to be '2020-11-19' through '2020-11-16', plus '2020-11-13'.
For 'facility_b', notice there is a change in the 'forecast_generation_id'. So, the error for '2020-11-20' would only be 4565.
What I have tried: I'll confess to not being quite sure how to break down this portion. I did consider a case statement on the week_part but then got into a nested mess. I considered using a RANK windowed function but I didn't make much progress as I was unsure how to implement the dynamic lookback component. I then also thought about doing some LISTAGG to get all the dates and do a REGEXP wildcard lookup but that would be very slow.
I am seeking pointers on how to go about achieving this in SQL. I don't know if I am missing something from my toolkit here to go about breaking this down into something I can implement.
DROP TABLE IF EXISTS seventh__error_calc;
create temporary table seventh__error_calc
(
facility_name varchar,
shift varchar,
date_actuals date,
week_part_grouping varchar,
forecast_profile_key varchar,
forecast_profile_id varchar,
forecast_generation_id varchar,
count_dates_in_forecast bigint,
forecast_error bigint
);
Insert into seventh__error_calc
VALUES
('facility_a','morning','2020-11-22','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1000'),
('facility_a','morning','2020-11-21','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2000'),
('facility_a','morning','2020-11-20','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','3000'),
('facility_a','morning','2020-11-19','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2500'),
('facility_a','morning','2020-11-18','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1200'),
('facility_a','morning','2020-11-17','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','5000'),
('facility_a','morning','2020-11-16','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','4400'),
('facility_a','morning','2020-11-15','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','3250'),
('facility_a','morning','2020-11-14','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','1050'),
('facility_a','morning','2020-11-13','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-12','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-11','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-10','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-09','weekday','facility_a_morning_Mon_Fri','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_a','morning','2020-11-08','weekend','facility_a_morning_Sat_Sun','Profile#facility_a#dfc3989b#b6e5386a','6809dea6','8','2450'),
('facility_b','morning','2020-11-22','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','3400'),
('facility_b','morning','2020-11-21','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','2800'),
('facility_b','morning','2020-11-20','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','3687'),
('facility_b','morning','2020-11-19','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','6809dea6','8','4565'),
('facility_b','morning','2020-11-18','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','1262'),
('facility_b','morning','2020-11-17','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','8765'),
('facility_b','morning','2020-11-16','weekday','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','5678'),
('facility_b','morning','2020-11-15','weekend','facility_b_morning_Mon_Fri','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','2893'),
('facility_b','morning','2020-11-14','weekend','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','1928'),
('facility_b','morning','2020-11-13','weekday','facility_b_morning_Sat_Sun','Profile#facility_b#dfc3989b#b6e5386a','7252fzw5','8','4736')
;
SELECT *
FROM seventh__error_calc
This achieved what I was trying to do. There were two learning points here.
Self Joins. I've never used one before but can now see why they are powerful!
Using a CASE statement in the WHERE clause.
Hope this might help someone else some day!
select facility_name,
forecast_profile_key,
forecast_profile_id,
shift,
date_actuals,
week_part_grouping,
forecast_generation_id,
sum(forecast_error) forecast_err_calc
from (
select rank() over (partition by forecast_profile_id, forecast_profile_key, facility_name, a.date_actuals order by b.date_actuals desc) rnk,
a.facility_name, a.forecast_profile_key, a.forecast_profile_id, a.shift, a.date_actuals, a.week_part_grouping, a.forecast_generation_id, b.forecast_error
from seventh__error_calc a
join seventh__error_calc b
using (facility_name, forecast_profile_key, forecast_profile_id, week_part_grouping, forecast_generation_id)
where case when a.week_part_grouping = 'weekend' then b.date_actuals between a.date_actuals - 16 and a.date_actuals
when a.week_part_grouping = 'weekday' then b.date_actuals between a.date_actuals - 7 and a.date_actuals
end
) src
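where case when week_part_grouping = 'weekend' then rnk < 4
when week_part_grouping = 'weekday' then rnk < 6
end
group by facility_name, forecast_profile_key, forecast_profile_id, shift, date_actuals, week_part_grouping, forecast_generation_id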
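To sanity-check the numbers, the lookback rule can be sketched in plain Python against the facility_a rows from the sample data. This is only an illustration under stated assumptions: it follows the rule as worded in the question (the target date itself is excluded from the window), it covers a single facility and shift, and it ignores forecast_generation_id because all facility_a rows share one. The names LOOKBACK and lookback_error are made up for this example.

```python
from datetime import date, timedelta

# (date_actuals, week_part_grouping, forecast_error) for facility_a / morning,
# copied from the sample INSERT statements in the question.
rows = [
    (date(2020, 11, 22), "weekend", 1000),
    (date(2020, 11, 21), "weekend", 2000),
    (date(2020, 11, 20), "weekday", 3000),
    (date(2020, 11, 19), "weekday", 2500),
    (date(2020, 11, 18), "weekday", 1200),
    (date(2020, 11, 17), "weekday", 5000),
    (date(2020, 11, 16), "weekday", 4400),
    (date(2020, 11, 15), "weekend", 3250),
    (date(2020, 11, 14), "weekend", 1050),
    (date(2020, 11, 13), "weekday", 2450),
    (date(2020, 11, 12), "weekday", 2450),
    (date(2020, 11, 11), "weekday", 2450),
    (date(2020, 11, 10), "weekday", 2450),
    (date(2020, 11, 9), "weekday", 2450),
    (date(2020, 11, 8), "weekend", 2450),
]

# week part -> (lookback window in days, maximum number of rows to sum)
LOOKBACK = {"weekend": (16, 3), "weekday": (7, 5)}


def lookback_error(rows, target, week_part):
    """Sum the most recent errors of the same week part before `target`."""
    days, limit = LOOKBACK[week_part]
    earliest = target - timedelta(days=days)
    eligible = sorted(
        ((d, err) for d, part, err in rows
         if part == week_part and earliest <= d < target),
        reverse=True,  # most recent first
    )
    return sum(err for _, err in eligible[:limit])


weekend_total = lookback_error(rows, date(2020, 11, 22), "weekend")
weekday_total = lookback_error(rows, date(2020, 11, 20), "weekday")
print(weekend_total, weekday_total)
```

For '2020-11-22' this sums 2000 + 3250 + 1050, matching the worked example in the question.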

PostgreSQL: Query for tstzrange that contains last instant of a quarter

Given a PostgreSQL table that is supposed to contain rows with continuous, non-overlapping valid_range ranges such as:
CREATE TABLE tracking (
id INT PRIMARY KEY,
valid_range TSTZRANGE NOT NULL,
EXCLUDE USING gist (valid_range WITH &&)
);
INSERT INTO tracking (id, valid_range) VALUES
(1, '["2017-03-01 13:00", "2017-03-31 14:00")'),
(2, '["2017-03-31 14:00", "2017-04-01 00:00")'),
(3, '["2017-04-01 00:00",)');
That creates a table that contains:
id | valid_range
----+-----------------------------------------------------
1 | ["2017-03-01 13:00:00-07","2017-03-31 14:00:00-06")
2 | ["2017-03-31 14:00:00-06","2017-04-01 00:00:00-06")
3 | ["2017-04-01 00:00:00-06",)
I need to query for the row that was the valid row at the end of a given quarter, where I'm defining "at the end of a quarter" as "the instant in time right before the date changed to be the first day of the new quarter." In the above example, querying for the end of Q1 2017 (Q1 ends at the end of 2017-03-31, and Q2 begins 2017-04-01), I want my query to return only the row with ID 2.
What is the best way to express this condition in PostgreSQL?
SELECT * FROM tracking WHERE valid_range @> TIMESTAMPTZ '2017-03-31' is wrong because it returns the row that contains midnight on 2017-03-31, which is ID 1.
valid_range @> TIMESTAMPTZ '2017-04-01' is also wrong because it skips over the row that was actually valid right at the end of the quarter (ID 2) and instead returns the row with ID 3, which is the row that starts the new quarter.
I'm trying to avoid using something like ...ORDER BY valid_range DESC LIMIT 1 in the query.
Note that the end of the ranges must always be exclusive, I cannot change that.
The best answer I've come up with so far is
SELECT
*
FROM
tracking
WHERE
lower(valid_range) < '2017-04-01'
AND upper(valid_range) >= '2017-04-01'
This seems like the moral equivalent of saying "I want to reverse the inclusivity/exclusivity of the bounds on this TSTZRANGE column for this query" which makes me think I'm missing a better way of doing this. I wouldn't be surprised if it also negates the benefits of typical indexes on a range column.
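The difference between ordinary containment on a half-open range and the "valid just before the boundary" condition can be illustrated outside the database. The sketch below models the three sample rows as (lower, upper) pairs in Python, assuming naive datetimes (time zones are ignored for simplicity) and None for an unbounded upper end; the function names are invented for this example.

```python
from datetime import datetime

# id -> (lower, upper); lower is inclusive, upper is exclusive, None = unbounded.
ranges = {
    1: (datetime(2017, 3, 1, 13), datetime(2017, 3, 31, 14)),
    2: (datetime(2017, 3, 31, 14), datetime(2017, 4, 1)),
    3: (datetime(2017, 4, 1), None),
}


def contains(rng, t):
    """Ordinary containment for a half-open range [lower, upper)."""
    lower, upper = rng
    return lower <= t and (upper is None or t < upper)


def valid_just_before(rng, t):
    """True if the range covers the instant immediately before t."""
    lower, upper = rng
    return lower < t and (upper is None or upper >= t)


boundary = datetime(2017, 4, 1)
containing_ids = [i for i, r in ranges.items() if contains(r, boundary)]
end_of_quarter_ids = [i for i, r in ranges.items() if valid_just_before(r, boundary)]
print(containing_ids, end_of_quarter_ids)
```

Plain containment at the boundary picks the row that starts the new quarter (ID 3), while the shifted comparison picks the row that was valid right up to the boundary (ID 2).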
You can use the <@ operator to check whether a value is within a range:
SELECT *
FROM tracking
WHERE to_timestamp('2017-04-01','YYYY-MM-DD')::TIMESTAMP WITH TIME ZONE <@ valid_range;

SQL Aggregation with only one table

So this problem has been bugging me a little for the last week or so. I'm working with a database which hasn't exactly been designed in a way that I like and I'm having to do a lot of work-arounds to get the queries to function in a way I would like.
Essentially, I'm trying to remove duplicate entries that are created as a side effect of a previous entry. For the sake of argument, say that a customer places an order or issues a job (this only occurs once), but as a result of the interactions a series of other rows are created to represent sub-orders or jobs. All duplicate records should have the same finish time, so what I'm trying to create is a query which will return the record which has the earliest start time and ignore all other records which have the same finish time. All this occurs within the same table.
Something like:
select starttime
, endtime
, description
, entrynumber
from table
where starttime = min
and endtime = endtime
Probably what you want is something like this:
;WITH OrderedTable AS
(
Select ROW_NUMBER() OVER (PARTITION BY endtime ORDER BY starttime) as rn, starttime, endtime, description, entrynumber
From Table
)
Select starttime, endtime, description, entrynumber
FROM OrderedTable
WHERE rn=1
What this does is group all the rows with the same end time, ordered by start time and give them an additional "row number" column starting at 1 and increasing. If you filter by rn = 1, you get only the earliest start time rows, ignoring the rest.
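The same "keep the earliest start per end time" logic can be sketched in plain Python, which may help when checking the query's result by hand. The sample rows and the function name are hypothetical; timestamps are ISO strings so that string comparison matches chronological order.

```python
# (starttime, endtime, description, entrynumber); illustrative data only.
rows = [
    ("2021-01-01 09:00", "2021-01-01 12:00", "customer order", 1),
    ("2021-01-01 09:30", "2021-01-01 12:00", "sub-job", 2),
    ("2021-01-01 10:00", "2021-01-01 12:00", "sub-job", 3),
    ("2021-01-02 08:00", "2021-01-02 09:00", "customer order", 4),
    ("2021-01-02 08:15", "2021-01-02 09:00", "sub-job", 5),
]


def earliest_per_endtime(rows):
    """Return one row per endtime: the one with the smallest starttime."""
    best = {}
    for row in rows:
        starttime, endtime = row[0], row[1]
        if endtime not in best or starttime < best[endtime][0]:
            best[endtime] = row
    # Sort by endtime so the output order is predictable.
    return sorted(best.values(), key=lambda r: r[1])


kept = earliest_per_endtime(rows)
kept_entrynumbers = [r[3] for r in kept]
print(kept_entrynumbers)
```

Each group of rows sharing an end time collapses to its first-started row, which is what the rn = 1 filter does.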

SQL find nearest date without going over, or return the oldest record

I have a view in SQL Server with prices of items over time. My users will be passing a date variable and I want to return the closest record without going over, or if no such record exists return the oldest record present. For example, with the data below, if the user passes April for item A it will return the March record and for item B it will return the June record.
I've tried a lot of variations with Union All and Order by but keep getting a variety of errors. Is there a way to write this using a Case Statement?
example:
case when min(Month)>Input Date then min(Month)
else max(Month) where Month <= Input Date?
Sincere apologies for attaching sample dataset as an image, I couldn't get it to format right otherwise.
Sample Dataset
You can use SELECT TOP (1) with ORDER BY date DESC, filtered by item and by a date comparison, to get the latest record. ORDER BY sorts the records by date, so you get the latest one, either from this month (if it exists) or from an earlier month.
Here's a rough outline of a query (without more of your table it's hard to be exact):
WITH CTE AS
(
SELECT
ITEM,
PRICE,
ACTUAL_DATE,
-- Oldest date on record for the item (the fallback).
MIN(ACTUAL_DATE) OVER (PARTITION BY ITEM) AS MIN_DATE,
-- Latest date that does not go past the input date, or NULL if there is none.
-- @INPUT_DATE is an assumed name for the date variable the user passes in.
MAX(CASE WHEN ACTUAL_DATE <= @INPUT_DATE THEN ACTUAL_DATE END) OVER (PARTITION BY ITEM) AS MATCHED_DATE
FROM TABLE
)
SELECT
CTE.ITEM,
CTE.PRICE,
CTE.ACTUAL_DATE AS MOSTLY_MATCHED_DATE
FROM CTE
WHERE CTE.ACTUAL_DATE = COALESCE(CTE.MATCHED_DATE, CTE.MIN_DATE)
The idea is that in a common table expression, window functions with PARTITION BY identify the key dates for each item, record by record, and the outer query then keeps either your matched record or your default (oldest) record.
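The selection rule itself (latest record not after the input date, otherwise the oldest record) is small enough to sketch in Python. The data below is invented to mirror the example in the question, where April returns the March record for item A and the June record for item B; the year, prices, and function name are assumptions.

```python
from datetime import date

# item -> list of (price_date, price); illustrative values only.
prices = {
    "A": [(date(2021, 1, 1), 10.0), (date(2021, 3, 1), 12.0), (date(2021, 5, 1), 13.0)],
    "B": [(date(2021, 6, 1), 20.0), (date(2021, 7, 1), 21.0)],
}


def price_as_of(records, input_date):
    """Latest record on or before input_date, else the oldest record."""
    not_after = [r for r in records if r[0] <= input_date]
    if not_after:
        return max(not_after, key=lambda r: r[0])
    return min(records, key=lambda r: r[0])


april = date(2021, 4, 15)
result = {item: price_as_of(records, april) for item, records in prices.items()}
print(result)
```

Item A has records before April, so the March one wins; item B has none, so it falls back to its oldest record.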

MySQL to get the count of rows that fall on a date for each day of a month

I have a table that contains a list of community events with columns for the days the event starts and ends. If the end date is 0 then the event occurs only on the start day. I have a query that returns the number of events happening on any given day:
SELECT COUNT(*) FROM p_community e WHERE
(TO_DAYS(e.date_ends)=0 AND DATE(e.date_starts)=DATE('2009-05-13')) OR
(DATE('2009-05-13')>=DATE(e.date_starts) AND DATE('2009-05-13')<=DATE(e.date_ends))
I just sub in any date I want to test for "2009-05-13".
I need to be able to fetch this data for every day in an entire month. I could just run the query against each day one at a time, but I'd rather run one query that can give me the entire month at once. Does anyone have any suggestions on how I might do that?
And no, I can't use a stored procedure.
Try:
SELECT COUNT(*), DATE(dtCreatedAt) FROM table WHERE DATE(dtCreatedAt) >= DATE('2009-05-01') AND DATE(dtCreatedAt) <= DATE('2009-05-31') GROUP BY DATE(dtCreatedAt);
This would get the count for each day in May 2009.
UPDATED: Now works on a range of dates spanning months/years.
Unfortunately, MySQL lacks a way to generate a rowset of given number of rows.
You can create a helper table:
CREATE TABLE t_day (day INT NOT NULL PRIMARY KEY);
INSERT
INTO t_day (day)
VALUES (1),
(2),
…,
(31)
and use it in a JOIN:
SELECT day, COUNT(*)
FROM t_day
JOIN p_community e
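-- '2009-05-01' is an assumed first day of the month being counted.
ON DATE('2009-05-01') + INTERVAL (t_day.day - 1) DAY BETWEEN DATE(e.date_starts) AND IF(DATE(e.date_ends), DATE(e.date_ends), DATE(e.date_starts))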
GROUP BY
day
Or you may use an ugly subquery:
SELECT day, COUNT(*)
FROM (
SELECT 1 AS day
UNION ALL
SELECT 2 AS day
…
UNION ALL
SELECT 31 AS day
) t_day
JOIN p_community e
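-- '2009-05-01' is an assumed first day of the month being counted.
ON DATE('2009-05-01') + INTERVAL (t_day.day - 1) DAY BETWEEN DATE(e.date_starts) AND IF(DATE(e.date_ends), DATE(e.date_ends), DATE(e.date_starts))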
GROUP BY
day
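The join against a table of days amounts to testing every day of the month against every event's date range. A minimal Python sketch of that counting logic follows, assuming events are (start, end) pairs where end is None for single-day events (standing in for the zero end date in the question); the sample events and the function name are hypothetical.

```python
import calendar
from datetime import date

# (date_starts, date_ends); None means the event lasts only its start day.
events = [
    (date(2009, 5, 13), None),
    (date(2009, 5, 10), date(2009, 5, 14)),
    (date(2009, 4, 28), date(2009, 5, 2)),
]


def events_per_day(events, year, month):
    """Map each day of the month to the number of events running that day."""
    days_in_month = calendar.monthrange(year, month)[1]
    counts = {}
    for d in range(1, days_in_month + 1):
        day = date(year, month, d)
        counts[day] = sum(
            1 for start, end in events if start <= day <= (end or start)
        )
    return counts


counts = events_per_day(events, 2009, 5)
print(counts[date(2009, 5, 13)])
```

Unlike the inner join, this also reports days with zero events, which a LEFT JOIN from the day table would give in SQL.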