Trying to exclude a subset within a few days of a report - sql

I am trying to exclude trades of a certain typology if they run for less than 4 days, but the filter I'm using is very slow to return results. It also doesn't capture trades booked on the 10th of May and expiring on the 13th of June, which it should. If I amend it to include these, it also includes trades within the same month that run for less than 4 days. Can someone help me make it more efficient and capture what I want? I'm using Oracle SQL Developer.
and ( DC.M_TYPOLOGY <> 'Repo BD' OR (DC.M_TYPOLOGY ='Repo BD' and
(((to_char( DC.M_OPTMAT , 'YYYY')- to_char (DC.M_TRADEDATE, 'YYYY'))= 0
and (to_char( DC.M_OPTMAT , 'MM')- to_char (DC.M_TRADEDATE, 'MM') < 1))
and (to_char( DC.M_OPTMAT , 'DD')- to_char (DC.M_TRADEDATE, 'DD') > 4))))

All that conversion is unnecessary (and wrong - what happens when trades span year end?). We can do arithmetic with Oracle dates so this is the same:
( DC.M_TYPOLOGY <> 'Repo BD'
Note: this answer was posted before the recent edit to the question. I'm leaving it in place, but I will revise it once the OP has clarified the rules they want to enforce.
Of course, this may not speed your query up. Performance problems can arise from many different causes and there simply isn't enough detail provided.
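To see why the component-wise test misses the 10 May to 13 June trade, here is a small sketch in Python (not Oracle SQL) that mirrors both approaches. The function names are made up for illustration, the year 2019 is arbitrary, and the "more than 4 days" threshold is taken from the question's filter.

```python
from datetime import date

def componentwise_keep(trade_date, maturity):
    """Mirror the question's filter: same year, same month, day-of-month gap > 4."""
    return (
        maturity.year - trade_date.year == 0
        and maturity.month - trade_date.month < 1
        and maturity.day - trade_date.day > 4
    )

def date_arithmetic_keep(trade_date, maturity):
    """Subtract whole dates, which is what Oracle does for date - date."""
    return (maturity - trade_date).days > 4

booked, expires = date(2019, 5, 10), date(2019, 6, 13)
print(componentwise_keep(booked, expires))    # False: the month test rejects it
print(date_arithmetic_keep(booked, expires))  # True: 34 days apart
```

The same thing happens for a trade booked on 30 December that matures on 10 January: the year test rejects it even though the dates are 11 days apart.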


SQL: Dynamic Join Based on Row Value

I am working with some complicated schema and have got many CTEs and joins to get to this point. This is a watered-down version and completely different source data and example to illustrate my point (data anonymity). Hopefully it provides enough of a snapshot.
Data Overview:
I have a service which generates a production forecast looking ahead 30 days. The forecast is generated for each facility, for each shift (morning/afternoon). Each forecast produced covers all shifts (morning/afternoon/evening) so they share a common generation_id but different forecast_profile_key.
What I am trying to do: I want to find the SUM of the forecast error for a given forecast generation constrained by a dynamic date range based on whether the date is a weekday or weekend. The SUM must be grouped only on similar IDs.
Basically, the temp table provides one record per facility per date per shift with the forecast error. I want to SUM the historical error dynamically for a facility/shift/date based on whether the date is weekday/weekend, and only SUM the error where the IDs match up.. (hope that makes sense!!)
Specifics: I want to find the SUM grouped by 'week_part_grouping', 'forecast_profile_key', 'forecast_profile' and 'forecast_generation_id'. The part I am struggling with is that I only want to SUM the error dynamically based on date: (a) if the date is a weekday, I want to SUM the error from up to the 5 recent-most days in a 7 day look back period, or (b) if the date is a weekend, I want to SUM the error from up to the 3 recent-most days in a 16 day look back period.
Ideally, having an extra column for 'total_forecast_error_in_lookback_range'.
Specific examples:
For 'facility_a', '2020-11-22' is a weekend. The lookback range is 16 days, so any date between '2020-11-21' and '2020-11-05' is eligible. The 3 recent-most dates would be '2020-11-21', '2020-11-15' and '2020-11-14'. Therefore, the sum of error would be 2000+3250+1050.
For 'facility_a', '2020-11-20' is a weekday. The lookback range is 7 days, so any date between '2020-11-19' and '2020-11-13' is eligible. That would work out to be '2020-11-19' through '2020-11-16', plus '2020-11-13'.
For 'facility_b', notice there is a change in the 'forecast_generation_id'. So, the error for '2020-11-20' would only be 4565.
What I have tried: I'll confess to not being quite sure how to break down this portion. I did consider a case statement on the week_part but then got into a nested mess. I considered using a RANK windowed function but I didn't make much progress as was unsure how to implement the dynamic lookback component. I then also thought about doing some LISTAGG to get all the dates and do a REGEXP wildcard lookup but that would be very slow..
I am seeking pointers how to go about achieving this in SQL. I don't know if I am missing something from my toolkit here to go about breaking this down into something I can implement.
DROP TABLE IF EXISTS seventh__error_calc;
create temporary table seventh__error_calc (
    facility_name varchar,
    shift varchar,
    date_actuals date,
    week_part_grouping varchar,
    forecast_profile_key varchar,
    forecast_profile_id varchar,
    forecast_generation_id varchar,
    count_dates_in_forecast bigint,
    forecast_error bigint
);
Insert into seventh__error_calc
FROM seventh__error_calc
This achieved what I was trying to do. There were two learning points here.
Self Joins. I've never used one before but can now see why they are powerful!
Using a CASE statement in the WHERE clause.
Hope this might help someone else some day!
select facility_name,
sum(forecast_error) forecast_err_calc
from (
select rank() over (partition by forecast_profile_id, forecast_profile_key, facility_name, a.date_actuals order by b.date_actuals desc) rnk,
a.facility_name, a.forecast_profile_key, a.forecast_profile_id, a.shift, a.date_actuals, a.week_part_grouping, a.forecast_generation_id, b.forecast_error
from seventh__error_calc a
join seventh__error_calc b
using (facility_name, forecast_profile_key, forecast_profile_id, week_part_grouping, forecast_generation_id)
where case when a.week_part_grouping = 'weekend' then b.date_actuals between a.date_actuals - 16 and a.date_actuals
when a.week_part_grouping = 'weekday' then b.date_actuals between a.date_actuals - 7 and a.date_actuals
end
) src
where case when week_part_grouping = 'weekend' then rnk < 4
when week_part_grouping = 'weekday' then rnk < 6
end
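For anyone who wants to check the logic outside the database, the self join plus rank can be sketched in plain Python. This is only an illustration under assumptions: the row layout, the key names 'k1'/'g1', the 500 and 999 error values and the 2020-11-08 row are invented; the three other error values come from the question's facility_a example. Unlike the BETWEEN in the query above, this sketch excludes the target date itself, which is how the question's examples read.

```python
from datetime import date, timedelta

# Hypothetical rows: (facility, profile_key, generation_id, week_part, date, error).
ROWS = [
    ("facility_a", "k1", "g1", "weekend", date(2020, 11, 22), 500),
    ("facility_a", "k1", "g1", "weekend", date(2020, 11, 21), 2000),
    ("facility_a", "k1", "g1", "weekend", date(2020, 11, 15), 3250),
    ("facility_a", "k1", "g1", "weekend", date(2020, 11, 14), 1050),
    ("facility_a", "k1", "g1", "weekend", date(2020, 11, 8), 999),
]

# week_part -> (lookback window in days, maximum number of rows to sum)
RULES = {"weekend": (16, 3), "weekday": (7, 5)}

def lookback_error(target, rows=ROWS):
    """Sum the error of the most recent earlier rows that share the target's ids."""
    facility, key, generation, part, day, _ = target
    days_back, limit = RULES[part]
    # The "self join": earlier rows with the same ids and week part, inside the window.
    matches = [
        r for r in rows
        if r[:4] == (facility, key, generation, part)
        and day - timedelta(days=days_back) <= r[4] < day
    ]
    # The "rank": newest first, then keep only the top N.
    matches.sort(key=lambda r: r[4], reverse=True)
    return sum(r[5] for r in matches[:limit])

print(lookback_error(ROWS[0]))  # 2000 + 3250 + 1050 = 6300
```

The 2020-11-08 row falls inside the 16-day window but is dropped by the "top 3" cap, which is the job the rnk filter does in the SQL.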

If statement in a WHERE clause for between two dates

I have a script that counts the number of doses a client has had between their start date and 180 days out.
Now I am trying to have some form of an IF (or CASE) statement in the WHERE clause so that the count covers either the first date to 180 days out or, if that 180 days runs past 6/30/20, just the start date to 6/30/20.
In my research I couldn't find anything about using an IF/ELSE (or CASE) with dates in the WHERE clause.
This is my current script in SQL Server
DATEADD(DAY,180,MIN(take_on_date)) AS Days_180,
COUNT(t.dose_number) AS Doses
, display_id
FROM factMedHist
GROUP BY Clinic, display_id
) AS m
INNER JOIN factMedHist AS t
ON t.Clinic = m.Clinic
AND t.display_id = m.display_id
WHERE t.take_on_date
BETWEEN m.FirstDate AND DATEADD(DAY,180,m.FirstDate)
GROUP BY t.Clinic, t.display_id,m.FirstDate
So "start date" = "FirstDate" = "min(TAKE_ON_DATE)". And "Client" = "display_id" but you group and join on the tuple <display_id, Clinic>. I see many struggles in the future based on this unfortunately common issue. Consistent terminology is important.
So here is one take on your issue. A bit verbose to demonstrate what it does. Note also the provision of an MCVE - something that you should provide to encourage others to help. It is a bit of effort that you should not expect others to take on just to solve your issues.
You were on the correct path with CASE - but lost it when you started thinking of it as a control-of-flow construct as it is in most other languages. You can compute the startdate and enddate for each client (clinic, display_id)
with cte as (select *,
min(takedate) over (partition by display_id, clinic order by takedate) as startdate,
dateadd(day, 180, min(takedate) over (partition by display_id, clinic order by takedate)) as enddate
from #medhist
You were doing that - but the problem is that you need to reference that end date in the where clause to filter the rows as desired. Like this:
where takedate <= case when enddate <= '20200630' then enddate else '20200630' end
Fiddle here. Notice that the first take date is irrelevant in the where clause. This is one way to achieve your result. Another obvious approach is to use a conditional sum. That would be good practice if you wanted to increase your knowledge. Depending on your situation one might be more efficient than the other. CTEs are just syntactic sugar but do allow the building of logic in a piece-by-piece approach - something I find can be very helpful in developing a completed tsql statement.
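The capped end date is easier to see outside SQL. Below is a minimal Python sketch of the same rule, assuming one client's take dates are already collected in a list; the function name and the sample dates are invented for illustration.

```python
from datetime import date, timedelta

CUTOFF = date(2020, 6, 30)  # fixed upper limit from the question

def count_doses(take_dates, window_days=180, cutoff=CUTOFF):
    """Count doses from the first take date to min(first + window, cutoff)."""
    start = min(take_dates)
    # Same decision as the CASE expression: use the earlier of the two end dates.
    end = min(start + timedelta(days=window_days), cutoff)
    return sum(1 for d in take_dates if start <= d <= end)

# Hypothetical clients, invented for illustration.
early = [date(2019, 10, 1), date(2020, 1, 15), date(2020, 5, 1)]   # window ends before the cutoff
late = [date(2020, 3, 1), date(2020, 6, 30), date(2020, 7, 15)]    # window is cut at 2020-06-30
print(count_doses(early), count_doses(late))
```

Seen this way, the CASE in the where clause is just "the lesser of two dates", which is why it works as an expression rather than as control flow.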
WHERE t.take_on_date BETWEEN m.FirstDate AND DATEADD(DAY,180,m.FirstDate)
OR t.take_on_date > DATEADD(DAY, 180, '20200630')

Why do I get different results when filtering a date period with date_part versus comparing against a date directly?

I'm trying to count the distinct values of a column in a table.
I have the logic worked out and tried to write it in 2 ways,
but I get different results from the two queries.
Can anyone help clarify this for me? I don't know whether my code or my thinking is wrong.
select count(distinct membership_id) from members_membership m
where date_part(year,m.membership_expires)>=2019
and date_part(month,m.membership_expires)>=7
and date_part(day,m.membership_expires)>=1
and date_part(year,m.membership_creationdate)<=2019
and date_part(month,m.membership_creationdate)<=7
and date_part(day,m.membership_creationdate)<=1
select count(distinct membership_id) from members_membership m
where m.membership_expires>='2019-07-01'
and m.membership_creationdate<='2019-07-01'
I actually think that this is the query you intend to run:
SELECT
    COUNT(DISTINCT membership_id)
FROM members_membership m
WHERE
    m.membership_expires >= '2019-07-01' AND
    m.membership_creationdate < '2019-07-01';
It doesn't make sense for a membership to expire at the same moment it gets created, so if it expires on midnight of 1st-July 2019, then it should have been created strictly before that point in time.
That being said, the problem with the first query is that, e.g., the restriction on the month being on or before July would apply to every year, not just 2019. It is difficult to write a date inequality using the year, month, and day terms separately. For this reason, the second version you used is preferable. It is also sargable, meaning that an index on membership_expires or membership_creationdate can be used.
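A quick way to convince yourself is to replay the two styles of comparison on a single value. This is a Python sketch rather than SQL; the helper name and the sample expiry date are made up.

```python
from datetime import date

def parts_on_or_after(d, year, month, day):
    """Component-wise test, as in the first query: every part must pass on its own."""
    return d.year >= year and d.month >= month and d.day >= day

expires = date(2020, 1, 15)  # hypothetical membership expiry, clearly after 2019-07-01

print(parts_on_or_after(expires, 2019, 7, 1))  # False: month 1 is not >= 7
print(expires >= date(2019, 7, 1))             # True: the whole date is compared
```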
There is an issue with the first query:
select count(distinct membership_id) from members_membership m
where date_part(year,m.membership_expires)>=2019
and date_part(month,m.membership_expires)>=7
and date_part(day,m.membership_expires)>=1
and date_part(year,m.membership_creationdate)<=2019
and date_part(month,m.membership_creationdate)<=7
and date_part(day,m.membership_creationdate)<=1; -- do you think that any day is less than 1??
-- this condition is satisfied only by the first day of a month, but I think you need all the dates before 01-Jul-2019
The condition and date_part(day,m.membership_creationdate)<=1 is the culprit.
Even membership_creationdate = 15-jan-1901 will not satisfy the above condition.
Always compare date columns against date values directly to avoid this type of issue. (Your second query is perfectly fine.)
The reason could be due to a time component.
The proper comparison for the first query is:
select count(distinct membership_id)
from members_membership m
where m.membership_expires >= '2019-07-01' and
m.membership_creationdate < '2019-07-02'
--------------------------------^ not <= ---^ next day
This logic should work regardless of whether or not the "date" has a time component.
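The effect of a time component can be shown with a few lines of Python. This is a sketch only: the helper names and the 10:30 timestamp are invented, and it assumes the date literal is compared as midnight at the start of that day.

```python
from datetime import date, datetime, time, timedelta

def on_or_before_naive(ts, day):
    """'<= day' style: the date is treated as midnight at the start of the day."""
    return ts <= datetime.combine(day, time.min)

def on_or_before(ts, day):
    """'< next day' style: covers every moment of the given day."""
    return ts < datetime.combine(day + timedelta(days=1), time.min)

created = datetime(2019, 7, 1, 10, 30)  # hypothetical creation timestamp
print(on_or_before_naive(created, date(2019, 7, 1)))  # False: 10:30 is after midnight
print(on_or_before(created, date(2019, 7, 1)))        # True
```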

PostgreSQL group by epoch month

I have a crazy number of rows that use epoch time.
id, customerid, orderid, uxtime
On my desk right now is building an admin page that allows others to quickly wade through this humongous pile of rows.
They want to be able to choose a year and a month and get that month's list.
That means: choosing 2016 and April should return all ids from April 2016.
This should be doable in one smart, cool SQL statement. That is where you come in. I am working on it and making some progress, but I am pretty sure all of you are much quicker than me. :)
To convert "April 2016" to a unix epoch use make_date() and extract()
extract(epoch from make_date(2016,4,1))
You also need an upper bound for a where clause which would typically be the first of the next month:
extract(epoch from make_date(2016,4,1) + interval '1' month)
So your SQL statement would be something like this:
select ...
from ...
where uxtime >= extract(epoch from make_date(2016,4,1))
and uxtime < extract(epoch from make_date(2016,4,1) + interval '1' month);
A slightly shorter way of writing it would be:
select ...
from ...
where to_char(to_timestamp(uxtime), 'yyyy-mm') = '2016-04'
The above, however, will be a lot slower than the first solution because it can't make use of an index on uxtime.
You could create an index on to_char(to_timestamp(uxtime), 'yyyy-mm') if you really prefer that solution to speed up the query.
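If the admin page computes the bounds in application code instead, the half-open range from the first solution can be sketched as follows. This is Python, not PostgreSQL, and it assumes uxtime holds seconds since 1970-01-01 UTC; the function name is made up.

```python
from datetime import datetime, timezone

def month_epoch_range(year, month):
    """Return (start, end) epoch seconds for a month; end is exclusive."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month, rolling over the year after December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())

print(month_epoch_range(2016, 4))  # (1459468800, 1462060800)
```

The two numbers can then be passed as parameters to a query of the form uxtime >= start and uxtime < end, which can still use an index on uxtime.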

Calculating working days including holidays between dates without a calendar table in oracle SQL

Okay, so I've done quite a lot of reading on the possibility of emulating the networkdays function of excel in sql, and have come to the conclusion that by far the easiest solution is to have a calendar table which will flag working days or non working days. However, due to circumstances out of my control, we don't have access to such a luxury and it's unlikely that we will any time in the near future.
Currently I have managed to bodge together what is undoubtedly a horribly inefficient query in SQL that does work - the catch is, it will only work for a single client record at a time.
= 'Sunday ' THEN 0
= 'Saturday ' THEN 0
IN ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
END)-1 AS Week_Day
Basically, I'm wondering if a) I should stop wasting my time on this or b) is it possible to get this to work for multiple clients? Any pointers appreciated thanks!
Edit: Further clarification - I already work out timescales using excel, but it would be ideal if we could do it in the report as the report in question is something that we would like end users to be able to run without any further manipulation.
MarkBannister's answer works perfectly albeit slowly (though I had expected as much given it's not the preferred solution) - the challenge now lies in me integrating this into an existing report!
calendar_cte as (select
to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'day') in ('sunday ','saturday ') then 0 when to_date('01-01-2000')+level-1 in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011','01-01-2012','02-01-2012') then 0 else 1 end working_day
from dual
connect by level <= 1825 + sysdate - to_date('01-01-2000') )
sum(c.working_day)-1 AS Week_Day
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
and a.ASM_END_DATE >= '01/01/2012'
There are a few ways to do this. Perhaps the simplest might be to create a CTE that produces a virtual calendar table, based on Oracle's connect by syntax, and then join it to the Assesments table, like so:
with calendar_cte as (
select to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'Day')
in ('Sunday ','Saturday ') then 0
when to_date('01-01-2000')+level-1
in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
then 0
else 1
end working_day
from dual
connect by level <= 36525 + sysdate - to_date('01-01-2000') )
sum(c.working_day) AS Week_Day
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
a.ASM_END_DATE >= '01/01/2012' -- and a.ASM_ID = 'A00000'
This will produce a virtual table populated with dates from 01 January 2000 to 36525 days (roughly 100 years) after the current date, with all weekends marked as non-working days and all days specified in the second in clause (i.e. up to 27 December 2011) also marked as non-working days.
The drawback of this method (or any method where the holiday dates are hardcoded into the query) is that each time new holiday dates are defined, every single query that uses this approach will have to have those dates added.
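The idea behind the generated calendar can be tried out in a few lines of Python. This is only a sketch of the technique, not the Oracle query: the function names are invented, and the holiday set holds just two dates copied from the list above, so a real run would need the full list.

```python
from datetime import date, timedelta

# Two dates copied from the answer's holiday list; a real list would be longer.
HOLIDAYS = {date(2011, 12, 26), date(2011, 12, 27)}

def build_calendar(first, last, holidays=HOLIDAYS):
    """Map every date in [first, last] to 1 (working day) or 0 (weekend or holiday)."""
    calendar = {}
    day = first
    while day <= last:
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
        calendar[day] = 0 if is_weekend or day in holidays else 1
        day += timedelta(days=1)
    return calendar

def working_days(start, end, calendar):
    """Equivalent of joining on calendar_date between start and end, then summing."""
    return sum(flag for day, flag in calendar.items() if start <= day <= end)

cal = build_calendar(date(2011, 12, 1), date(2011, 12, 31))
print(working_days(date(2011, 12, 23), date(2011, 12, 30), cal))  # 4
```

Friday 23 December to Friday 30 December 2011 spans eight dates, of which two are weekend days and two are listed holidays, leaving four working days.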
If you can't use a calendar table in Oracle, you might be better off exporting to Excel. Brute force always works.
Networkdays() "returns the number of whole working days between start_date and end_date. Working days exclude weekends and any dates identified in holidays."
Excluding weekends seems fairly straightforward. Every 7-day period will contain two weekend days. You'll just need to take some care with the leftover days.
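The "two weekend days per 7-day period, plus leftovers" arithmetic might look like this in Python. It is a sketch with an invented function name; it counts both endpoints and ignores holidays entirely.

```python
from datetime import date

def weekdays_between(start, end):
    """Count Monday-to-Friday dates in [start, end] without walking every day."""
    if end < start:
        return 0
    total_days = (end - start).days + 1
    full_weeks, leftover = divmod(total_days, 7)
    count = full_weeks * 5  # every full week contributes exactly five weekdays
    # The leftover days begin on the same weekday as start.
    first = start.weekday()  # Monday is 0, Sunday is 6
    for offset in range(leftover):
        if (first + offset) % 7 < 5:
            count += 1
    return count

print(weekdays_between(date(2011, 12, 23), date(2011, 12, 30)))  # 6
```

Holidays would still have to be subtracted separately, which is the part that needs stored or passed-in dates.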
Holidays are a different story. You have to either store them or pass them as an argument. If you could store them, you'd store them in a calendar table, and your problem would be over. But you can't do that.
So you're looking at passing them as an argument. Off the top of my head--and I haven't had any tea yet this morning--I'd consider a common table expression or a wrapper for a stored procedure.