Can't access CTE via inner join SQL Server

I know I'm missing something obvious but it's not so obvious to me!
I've got a table valued function that produces a nice interval range of dates given a start, end, interval (thanks to another SO answer!).
I've another TVF that produces the latest part transaction given a date.
However, I was after being able to produce the last parts transaction in a series of dates lying between the start and end dates given. So, given March to May and an interval of say, 2 days, I'd get a sort of time series between the two.
However, I've hit a wall now with CTEs and was trying to avoid resorting to procedural/cursor-style looping to do this.
This is the code:
WITH datesTbl(DateValue)
AS (SELECT DateValue
FROM [dbo].[DateRange]('2016-03-18', '2016-04-27', 1))
SELECT *
FROM datesTbl dr
INNER JOIN dbo.MoveDateDiff(dr.Datevalue, DATEADD(day, 1, dr.DateValue), 14792) pm
ON DATEDIFF(Day, dr.dateValue, pm.MovementDate) <= 1;
I know I have other conceptual errors in the underlying TVFs; here, however, I want to get past the fact that I can't seem to reference the CTE's columns in the function call on the right-hand side of the INNER JOIN (there is no syntax error after the ON clause!).
Any guidance would be gratefully received!

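When a table-valued function takes its arguments from columns of another row source in the FROM clause, use APPLY rather than JOIN. A join cannot pass dr.DateValue into the function call, whereas CROSS APPLY evaluates the function once per row of datesTbl: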
WITH datesTbl(DateValue) as (
SELECT DateValue
FROM [dbo].[DateRange]('2016-03-18', '2016-04-27', 1)
)
SELECT *
FROM datesTbl dr CROSS APPLY
dbo.MoveDateDiff(dr.Datevalue, DATEADD(day, 1, dr.DateValue), 14792) pm
WHERE DATEDIFF(Day, dr.dateValue, pm.MovementDate) <= 1;
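To make the per-row behaviour of CROSS APPLY concrete, here is a rough Python sketch of its semantics. Everything in it is an assumption for illustration: `date_range` and `move_date_diff` are hypothetical stand-ins for the two TVFs, and `MOVEMENTS` is made-up data, since the real function bodies are not shown in the question.

```python
from datetime import date, timedelta

def date_range(start, end, step_days):
    """Hypothetical stand-in for dbo.DateRange: dates from start to end inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

# Made-up movement rows (part_id, movement_date) for illustration only.
MOVEMENTS = [
    (14792, date(2016, 3, 18)),
    (14792, date(2016, 3, 20)),
    (99999, date(2016, 3, 19)),
]

def move_date_diff(start, end, part_id):
    """Hypothetical stand-in TVF: movements of one part with start <= date < end."""
    return [(pid, d) for pid, d in MOVEMENTS if pid == part_id and start <= d < end]

def cross_apply(outer_rows, tvf):
    """Call tvf once per outer row and emit one combined row per result.

    Outer rows for which tvf returns nothing are dropped, which is what
    distinguishes CROSS APPLY from OUTER APPLY.
    """
    for row in outer_rows:
        for inner in tvf(row):
            yield (row, *inner)

rows = list(cross_apply(
    date_range(date(2016, 3, 18), date(2016, 3, 21), 1),
    lambda d: move_date_diff(d, d + timedelta(days=1), 14792),
))

for row in rows:
    print(row)
```

The key point is that the function call sees the current outer row, which a plain join never allows.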

Related

If statement in a WHERE clause for between two dates

I have a script that counts the number of doses a client has had between their start date and 180 days out.
Now I am trying to put some form of IF (or CASE) statement in the WHERE clause so that the count covers either the first date to 180 days out or, if that 180-day window runs past 6/30/20, just the start date to 6/30/20.
In my research I couldn't find anything about using IF/ELSE (or CASE) with dates in the WHERE clause.
This is my current script in SQL Server:
SELECT
t.clinic,
t.display_id,
m.FirstDate,
DATEADD(DAY,180,MIN(take_on_date)) AS Days_180,
COUNT(t.dose_number) AS Doses
FROM (SELECT CLINIC
, display_id
, MIN(TAKE_ON_DATE) AS FirstDate
FROM factMedHist
GROUP BY Clinic, display_id
) AS m
INNER JOIN factMedHist AS t
ON t.Clinic = m.Clinic
AND t.display_id = m.display_id
WHERE t.take_on_date
BETWEEN m.FirstDate AND DATEADD(DAY,180,m.FirstDate)
GROUP BY t.Clinic, t.display_id,m.FirstDate
So "start date" = "FirstDate" = "min(TAKE_ON_DATE)". And "Client" = "display_id" but you group and join on the tuple <display_id, Clinic>. I see many struggles in the future based on this unfortunately common issue. Consistent terminology is important.
So here is one take on your issue. A bit verbose to demonstrate what it does. Note also the provision of a MVCE - something that you should provide to encourage others to help. It is a bit of effort you should not expect others to take on just to solve your issues.
You were on the correct path with CASE - but lost it when you started thinking of it as a control-of-flow construct as it is in most other languages. You can compute the startdate and enddate for each client (clinic, display_id)
with cte as (select *,
min(takedate) over (partition by display_id, clinic order by takedate) as startdate,
dateadd(day, 180, min(takedate) over (partition by display_id, clinic order by takedate)) as enddate
from #medhist
)
You were doing that, but the problem is that you need to reference that end date in the WHERE clause to filter the rows as desired. Like this:
where takedate <= case when enddate <= '20200630' then enddate else '20200630' end
Fiddle here. Notice that the first take date is irrelevant in the where clause. This is one way to achieve your result. Another obvious approach is to use a conditional sum. That would be good practice if you wanted to increase your knowledge. Depending on your situation one might be more efficient than the other. CTEs are just syntactic sugar but do allow the building of logic in a piece-by-piece approach - something I find can be very helpful in developing a completed tsql statement.
WHERE t.take_on_date BETWEEN m.FirstDate
AND CASE WHEN DATEADD(DAY, 180, m.FirstDate) > '20200630'
THEN '20200630' ELSE DATEADD(DAY, 180, m.FirstDate) END
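The idea of capping the end date at the cutoff, as in the CASE expression above, can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the date arithmetic is SQLite's `date(x, '+180 days')` instead of DATEADD. The table name `medhist`, its columns, and the sample rows are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medhist (clinic TEXT, display_id INTEGER, takedate TEXT);
INSERT INTO medhist VALUES
  ('A', 1, '2019-06-01'),  -- first dose; 180 days later is 2019-11-28
  ('A', 1, '2019-11-28'),  -- exactly day 180, counted
  ('A', 1, '2019-12-15'),  -- past day 180, not counted
  ('A', 2, '2020-03-01'),  -- first dose; 180 days later is 2020-08-28
  ('A', 2, '2020-06-30'),  -- on the cutoff, counted
  ('A', 2, '2020-07-15');  -- within 180 days but past the cutoff, not counted
""")

rows = conn.execute("""
    WITH firsts AS (
        SELECT clinic, display_id,
               MIN(takedate) AS startdate,
               date(MIN(takedate), '+180 days') AS enddate
        FROM medhist
        GROUP BY clinic, display_id
    )
    SELECT t.clinic, t.display_id, COUNT(*) AS doses
    FROM medhist t
    JOIN firsts f ON f.clinic = t.clinic AND f.display_id = t.display_id
    -- Upper bound is whichever comes first: day 180 or the fixed cutoff.
    WHERE t.takedate <= CASE WHEN f.enddate <= '2020-06-30'
                             THEN f.enddate ELSE '2020-06-30' END
    GROUP BY t.clinic, t.display_id
    ORDER BY t.clinic, t.display_id
""").fetchall()

print(rows)
```

ISO-formatted date strings compare correctly as text, which is why plain `<=` works here; in SQL Server the same comparison would be done on real date values.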

Two queries returning different results when they should be equivalent?

Our dataset is fundamentally joining a set of dates (weeks from the current week into the past) to a set of sections based on whether those sections started on or before and ended on or after that week. While originally this query gave us the results we expected, this week it began providing us incorrect results. After a bunch of tinkering, we discovered that if we changed the query to a LEFT JOIN and then filtered the query using a WHERE clause, it gave us correct results again.
What's the difference? Why does one work and the other doesn't? (Bonus points: why did the original query work for weeks before suddenly experiencing this error?) Performing the same inner join on Redshift delivers correct results, so it seems to be a Snowflake nuance that we don't understand.
Original query:
WITH week_list AS
(
SELECT DATEADD(week, -4, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
UNION ALL
SELECT DATEADD(week, 1, week_value)
FROM week_list
WHERE DATEADD(week, 1, week_value) < CURRENT_DATE()
),
active_sections_per_week AS
(
SELECT
wl.week_value, s.id section_id
FROM week_list wl
JOIN schema.sections s ON wl.week_value >= DATE_TRUNC(week, s.starts_at)
AND wl.week_value <= DATE_TRUNC(week, s.ends_at)
)
SELECT
aspw.week_value,
COUNT(DISTINCT aspw.section_id) count_sections
FROM
active_sections_per_week aspw
GROUP BY 1
ORDER BY 1 DESC
Results: One row, dated 2019-12-30 (4 weeks ago). No data for the past three weeks.
Note: If you adjust the DATEADD in the first CTE, whatever is the first date returned will always seem to join successfully. This behavior started only within the last week--previously, this query provided the expected number of rows (in other words, the number of weeks specified in that first DATEADD).
"Fixed" query:
WITH week_list AS
(
SELECT DATEADD(week, -4, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
UNION ALL
SELECT DATEADD(week, 1, week_value)
FROM week_list
WHERE DATEADD(week, 1, week_value) < CURRENT_DATE()
),
active_sections_per_week AS
(
SELECT wl.week_value, s.id section_id
FROM week_list wl
LEFT JOIN schema.sections s ON wl.week_value >= DATE_TRUNC(week, s.starts_at)
AND wl.week_value <= DATE_TRUNC(week, s.ends_at)
WHERE s.id IS NOT NULL
)
SELECT aspw.week_value, COUNT(DISTINCT aspw.section_id) count_sections
FROM active_sections_per_week aspw
GROUP BY 1
ORDER BY 1 DESC
Results: returns four rows, weeks dated 2019-12-30 to 2020-01-20, with appropriate section counts.
This is a recursive CTE on "week_list". Redshift does not support recursive CTEs.
Snowflake does support recursive CTEs, which would explain the difference in behavior.
It's hard to test this without the underlying data. If you're getting correct results in Redshift, then chances are you do not need or want a recursive CTE. You can modify it so that "week_list" does not reference itself.
As for why it worked before, it's likely the table state and recursive CTE worked only under special cases. When CURRENT_DATE() advanced, it took it out of that special case. Also, the inner join and left outer join where s.id IS NOT NULL would be equivalent if not in a recursive CTE.
You can read more about recursive CTEs here:
https://docs.snowflake.net/manuals/user-guide/queries-cte.html#recursive-ctes-and-hierarchical-data
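As a sanity check on the claim that the two join forms should agree once recursion is out of the picture, here is a small sketch using Python's built-in sqlite3 module. It is not Snowflake, and the tables, column names (`start_week`, `end_week`) and rows are invented for the demonstration; the week list is a plain table instead of a recursive CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE week_list (week_value TEXT);
INSERT INTO week_list VALUES
  ('2019-12-30'), ('2020-01-06'), ('2020-01-13'), ('2020-01-20');
CREATE TABLE sections (id INTEGER, start_week TEXT, end_week TEXT);
INSERT INTO sections VALUES
  (1, '2019-12-30', '2020-01-06'),
  (2, '2020-01-06', '2020-01-20'),
  (3, '2020-02-03', '2020-02-10');
""")

JOIN_CONDITION = "wl.week_value >= s.start_week AND wl.week_value <= s.end_week"

# Plain inner join.
inner_rows = conn.execute(f"""
    SELECT wl.week_value, s.id
    FROM week_list wl
    JOIN sections s ON {JOIN_CONDITION}
    ORDER BY 1, 2
""").fetchall()

# Left join, then discard the weeks that matched no section.
left_rows = conn.execute(f"""
    SELECT wl.week_value, s.id
    FROM week_list wl
    LEFT JOIN sections s ON {JOIN_CONDITION}
    WHERE s.id IS NOT NULL
    ORDER BY 1, 2
""").fetchall()

print(inner_rows == left_rows, len(inner_rows))
```

Both queries return the same five (week, section) pairs here, which is the behaviour one would normally expect from any engine.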
The recursive CTE can be avoided if the -4 weeks is a constant, with this code:
WITH week_list AS (
SELECT DATEADD(week, column1, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
FROM VALUES (-4),(-3),(-2),(-1),(0)
)
With the JOIN, Snowflake will move the filters higher in the execution stack, and you might have found a bug. With the LEFT JOIN, even though it has an equivalent WHERE clause, it is most likely avoiding the aggressive, broken optimization.
There was a software release last night for us, but we are on an Enterprise account, so you might have been upgraded 2 days prior. This release had a number of bugs that impacted us, and we had it rolled back (for us).
Thank you for all of the feedback! The good news is you all helped me get to a solution that I think I am satisfied with. I have also followed up with Snowflake so they can investigate this behavior and see if it was user error on my part due to not understanding how recursive CTEs process, or whether it is possibly a bug introduced in a recent release.
Here's what I found: while recursion works for the use case I was applying it to (generating a list of dates based on CURRENT_DATE), it is not strictly necessary. Since we want a list of dates, I could just as easily generate a table and use the row numbers to perform the DATEADD adjustments.
It looks like this:
SELECT DATEADD(week, '-' || ROW_NUMBER() OVER (ORDER BY NULL),
DATEADD(week, 1, DATE_TRUNC(week, CURRENT_DATE()))) AS week_value
FROM table (generator(rowcount => 200))
One of the big benefits to this approach is I am no longer limited by the MAX_RECURSIONS setting in Snowflake (which is set to 100 by default). Since I am using this data to create graphs of activity over time, having 200 values gives me more than three years of history rather than just shy of 2 years of history. I also don't have to contact my Snowflake rep if I want to expand it.
Changing the week_list CTE to this non-recursive approach seems to fix whatever issue was causing the INNER JOIN to perform incorrectly. We still don't understand why the recursive CTE seemed to work for several weeks and then suddenly started misbehaving, but if Snowflake can shed light on that via our support ticket, I will double back here to provide an update. Thank you all for your help and guidance!
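For readers who want to see what the non-recursive week list amounts to, here is a minimal Python sketch of the same arithmetic: take the start of the current week and step back one week per row number. The function name `week_list` is hypothetical, and it assumes weeks start on Monday, which matches the dates quoted in the question but depends on session settings in a real warehouse.

```python
from datetime import date, timedelta

def week_list(today, count):
    """Return `count` week-start dates, newest first, beginning with this week.

    Assumes Monday is the first day of the week.
    """
    current_week = today - timedelta(days=today.weekday())
    return [current_week - timedelta(weeks=n) for n in range(count)]

weeks = week_list(date(2020, 1, 22), 200)
print(weeks[0], weeks[3], len(weeks))
```

Because each value is computed directly from its row number, no recursion limit applies, whatever the count.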

SQL Server : INSERT INTO within a SELECT

In SSRS 2008, my query is in the form
WITH CTE ( *that makes a list of dates, unfortunately only starting from today, not from whatever past date the report demands*)
SELECT
*a bunch of columns*
FROM
asimpletable p1
CROSS APPLY
(*CTE*) dates1
LEFT OUTER JOIN
(*demand list subquery with several selects pulling from several tables*) d1 ON (*simple.PartID AND CTE.demandDate*)
LEFT OUTER JOIN
(*supply list subquery with several selects pulling from several tables*) s1 ON (*yada*)
*..and 4 other joins to subqueries*
WHERE
*enough conditions that I don't want to copy it 6 times for 6 different INSERT INTO statements*
Because the CTE can only start from today, when I display all the selected rows by date, any that are dated in the past are lumped together out of order.
So I want to make a temp table to cross apply and feed in all the dates I need. I can see how to do it with 6 INSERT INTO, copying the joined subqueries, but not only would it look ugly, it would be worse to keep all the subqueries in sync.
I liked the sound of this, but can't see how to return the selects nor how to apply it to several subqueries https://stackoverflow.com/a/1101516/5612361.
Reason the CTE idea only starts from today:
SET @CTEStartDate = cast(getdate() as date);
SET @CTEEndDate = cast(dateadd(day,100,getdate()) as date);
WITH Dates(eaDate) AS (
SELECT @CTEStartDate AS eaDate --Select First day in range
UNION ALL
SELECT DATEADD(DAY, 1, eaDate) FROM Dates WHERE eaDate < @CTEEndDate --Add a record for every day in the range
)
When I try to use
SET @CTEStartDate = cast(dateadd(day,-30,getdate()) as date);
I get
For more information about this error navigate to the report server on
the local server machine, or enable remote errors
---------------------------- Cannot read the next data row for the dataset DataSet1. (rsErrorReadingNextDataRow)
---------------------------- An error has occurred during report processing. (rsProcessingAborted)
As soon as I take out that dateadd it resolves without a hitch.
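One possible explanation, offered as an assumption rather than a confirmed diagnosis: SQL Server stops a recursive CTE after 100 levels of recursion unless the statement carries an OPTION (MAXRECURSION n) hint. A range from today to today + 100 needs exactly 100 recursive steps, while starting 30 days earlier needs 130. The Python sketch below only does that arithmetic; the helper name and the example date are made up.

```python
from datetime import date, timedelta

DEFAULT_MAXRECURSION = 100  # SQL Server's default limit for recursive CTEs

def recursion_levels(start, end):
    """Recursive steps a one-day-at-a-time date CTE takes after its anchor row."""
    return max((end - start).days, 0)

today = date(2024, 1, 1)  # arbitrary example date
from_today = recursion_levels(today, today + timedelta(days=100))
from_past = recursion_levels(today - timedelta(days=30), today + timedelta(days=100))

print(from_today, from_today <= DEFAULT_MAXRECURSION)  # fits within the default
print(from_past, from_past <= DEFAULT_MAXRECURSION)    # exceeds the default
```

If this is indeed the cause, raising the limit with OPTION (MAXRECURSION ...) on the final statement, or generating the dates without recursion, would be the things to try.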

counting date and time for historical reporting

I am currently working on a query that will be used in conjunction with SharePoint to run reports. I have a query that I know will work with Oracle, but the company I am working for is running SQL Server 2005.
What the report will do is give the person the ability to select any date and time, and give the count for that specific operation. The problem is that there are large gaps in the time stamps (because it takes a little while for the product to get to the next operation). The date type is varchar, so I used substrings to parse out the year, month, day, and time. I have sample data available.
The people looking at the reports want the ability to say at this time and day how many units went through this operation.
I know this is confusing; let me know if you need any clarification.
Here is the oracle syntax
SELECT T3.PAYMENT_DATE, T3."Hr", T3."Min",
(SELECT COUNT(*)
FROM INVOICE_ARCHIVE T4
WHERE TO_NUMBER(TO_CHAR(T4.PAYMENT_DATE, 'MM')) <= T3."Hr"
AND TO_NUMBER(TO_CHAR(T4.PAYMENT_DATE, 'DD')) <= T3."Min") AS "NUM"
FROM(SELECT T1.PAYMENT_DATE, T2."Hr", T2."Min"
FROM (SELECT (FLOOR((LEVEL + 359)/60)) AS "Hr",
MOD((LEVEL + 359), 60) AS "Min"
FROM dual CONNECT BY LEVEL <= 961) T2, INVOICE_ARCHIVE T1
ORDER BY T1.PAYMENT_DATE, T2."Hr", T2."Min") T3
The answer to your question is the datepart() function in SQL Server. This will allow you to extract minutes and hours from dates.
The harder part is the "connect by level" portion. How is this being used? You might need to use recursive CTEs to handle this.
With the little hint from spencer, the following may suffice for your query:
SELECT T3.PAYMENT_DATE, T3."Hr", T3."Min",
(SELECT COUNT(*)
FROM INVOICE_ARCHIVE T4
WHERE datepart(month, T4.PAYMENT_DATE) <= T3."Hr" AND
datepart(day, T4.PAYMENT_DATE) <= T3."Min"
) AS "NUM"
FROM (SELECT T1.PAYMENT_DATE, T2."Hr", T2."Min"
FROM (SELECT top 961 (FLOOR((LEVEL + 359)/60)) AS "Hr",
(LEVEL + 359) % 60 AS "Min"
FROM (select top 961 row_number() over (order by (select NULL)) as level
from invoice_archive
) t
) T2 cross join
INVOICE_ARCHIVE T1
) T3
ORDER BY T3.PAYMENT_DATE, T3."Hr", T3."Min"
I made the following changes:
Changed the date arithmetic to use datepart() instead of to_char() .
Replaced the method for getting a list of numbers, by using row_number() instead of connect by level
Made the cross join explicit
Moved the order by to the outer query, since neither SQL Server nor Oracle guarantee the results of an order by in a subquery (and SQL Server does not allow it unless you have a "TOP" query)
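To see what the 961 generated numbers are used for, here is a short Python sketch of the same arithmetic: each LEVEL value from 1 to 961 is shifted by 359 and split into an hour and a minute. The function name is hypothetical; the formula is taken directly from the query.

```python
def level_to_hour_minute(level):
    """Map a generated row number to (hour, minute) as the query does."""
    total_minutes = level + 359
    return total_minutes // 60, total_minutes % 60

slots = [level_to_hour_minute(level) for level in range(1, 962)]

print(slots[0], slots[1], slots[-1], len(slots))
```

So the number list simply enumerates every minute from 06:00 through 22:00 inclusive, which is why any table with at least 961 rows can serve as the row source.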

Calculating working days including holidays between dates without a calendar table in oracle SQL

Okay, so I've done quite a lot of reading on the possibility of emulating the NETWORKDAYS function of Excel in SQL, and have come to the conclusion that by far the easiest solution is to have a calendar table which flags working and non-working days. However, due to circumstances outside my control, we don't have access to such a luxury and it's unlikely that we will any time in the near future.
Currently I have managed to bodge together what is undoubtedly a horribly inefficient query in SQL that does work. The catch is that it only works for a single client record at a time.
SELECT O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE,
sum(CASE
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Sunday ' THEN 0
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Saturday ' THEN 0
WHEN O_ASSESSMENTS.ASM_START_DATE + rownum - 1
IN ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
THEN 0
ELSE 1
END)-1 AS Week_Day
From O_ASSESSMENTS,
ALL_OBJECTS
WHERE O_ASSESSMENTS.ASM_QSA_ID IN ('TYPE1')
AND O_ASSESSMENTS.ASM_END_DATE >= '01/01/2012'
AND O_ASSESSMENTS.ASM_ID = 'A00000'
AND ROWNUM <= O_ASSESSMENTS.ASM_END_DATE-O_ASSESSMENTS.ASM_START_DATE+1
GROUP BY
O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE
Basically, I'm wondering if a) I should stop wasting my time on this or b) is it possible to get this to work for multiple clients? Any pointers appreciated thanks!
Edit: Further clarification - I already work out timescales using excel, but it would be ideal if we could do it in the report as the report in question is something that we would like end users to be able to run without any further manipulation.
Edit:
MarkBannister's answer works perfectly albeit slowly (though I had expected as much given it's not the preferred solution) - the challenge now lies in me integrating this into an existing report!
with
calendar_cte as (select
to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'day') in ('sunday ','saturday ') then 0 when to_date('01-01-2000')+level-1 in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011','01-01-2012','02-01-2012') then 0 else 1 end working_day
from dual
connect by level <= 1825 + sysdate - to_date('01-01-2000') )
SELECT
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day)-1 AS Week_Day
From
O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1')
and a.ASM_END_DATE >= '01/01/2012'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
There are a few ways to do this. Perhaps the simplest is to create a CTE that produces a virtual calendar table, based on Oracle's CONNECT BY syntax, and then join it to the assessments table, like so:
with calendar_cte as (
select to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'fmDay')
in ('Sunday','Saturday') then 0
when to_date('01-01-2000')+level-1
in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
then 0
else 1
end working_day
from dual
connect by level <= 36525 + sysdate - to_date('01-01-2000') )
SELECT a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day) AS Week_Day
From O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1') and
a.ASM_END_DATE >= '01/01/2012' -- and a.ASM_ID = 'A00000'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
This will produce a virtual table populated with dates from 01 January 2000 to roughly 100 years (36525 days) after the current date, with all weekends marked as non-working days and all days listed in the second IN clause (i.e. up to 27 December 2011) also marked as non-working days.
The drawback of this method (or any method where the holiday dates are hardcoded into the query) is that each time new holiday dates are defined, every single query that uses this approach will have to have those dates added.
If you can't use a calendar table in Oracle, you might be better off exporting to Excel. Brute force always works.
Networkdays() "returns the number of whole working days between start_date and end_date. Working days exclude weekends and any dates identified in holidays."
Excluding weekends seems fairly straightforward. Every 7-day period will contain two weekend days. You'll just need to take some care with the leftover days.
Holidays are a different story. You have to either store them or pass them as an argument. If you could store them, you'd store them in a calendar table, and your problem would be over. But you can't do that.
So you're looking at passing them as an argument. Off the top of my head--and I haven't had any tea yet this morning--I'd consider a common table expression or a wrapper for a stored procedure.
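The weekend arithmetic described above (five working days per full week, plus some care with the leftover days) and the idea of passing holidays as an argument can be sketched in Python as follows. This is an illustration of the counting logic only, not Oracle code; the function name `working_days` is hypothetical, and it assumes a Monday to Friday working week with both end dates included.

```python
from datetime import date

def working_days(start, end, holidays=()):
    """Count Monday-Friday days in [start, end], minus weekday holidays in range."""
    if end < start:
        return 0
    total_days = (end - start).days + 1
    full_weeks, leftover = divmod(total_days, 7)
    count = full_weeks * 5  # every full 7-day block contains 5 weekdays
    # The leftover days begin on the same weekday as `start`.
    for offset in range(leftover):
        if (start.weekday() + offset) % 7 < 5:
            count += 1
    # Holidays that fall on a weekend were never counted, so skip them.
    count -= sum(1 for h in set(holidays)
                 if start <= h <= end and h.weekday() < 5)
    return count

holidays = [date(2011, 12, 26), date(2011, 12, 27),
            date(2012, 1, 1), date(2012, 1, 2)]
without_holidays = working_days(date(2011, 12, 19), date(2012, 1, 2))
with_holidays = working_days(date(2011, 12, 19), date(2012, 1, 2), holidays)

print(without_holidays, with_holidays)
```

The holiday list still has to come from somewhere, so this removes the need for a per-day calendar but not the maintenance burden mentioned earlier.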