Snowflake SQL Query Left Join Issue

Snowflake SQL Query Left Join Issue - sql

For one of our code, left join is not behaving properly in snowflake. Require your help if you can find solution around the same.
We have a sample data setup like mentioned below with basicc table join.
CREATE TABLE patient_test(pid INT);
INSERT INTO patient_test (pid) VALUES (100);
CREATE TABLE pateint_entry_test (pid INT,DateAdded DATETIME);
INSERT INTO pateint_entry_test (pid, DateAdded) VALUES (100, '2020-07-13');
Now look below code where I am just giveing you sample sub query that we are using with other query set. Where our motivation was to get date entry for each patient based on given start/end date.
WITh patient_cte AS(
SELECT * FROM patient_test
)
,
dates AS(
SELECT DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
)
,
cte2 (date) as (
SELECT TO_DATE(START_DATE) FROM dates
UNION ALL
SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
),
cte3 AS (
select * from patient_cte
cross join cte2
)
SELECT cte3.pid as p_pid,
pateint_entry_test.pid as p_entry_pid,
pateint_entry_test.DateAdded,
cte3."DATE" ,
IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS CALCULATEDDATEMEASURED
FROM cte3
LEFT JOIN pateint_entry_test ON
cte3.pid = pateint_entry_test.pid AND
cte3."DATE" = TO_DATE(pateint_entry_test.DateAdded)
Output of the query gives result as below.
Where you can see CALCULATEDDATEMEASURED for Row number 2 to 7 are coming as 2020-07-06 00:00:00. But as DAETADDED is coming for null then it should come proper date based on DATE column value ( Based on this condition IFNULL( pateint_entry_test.DateAdded, cte3."DATE"))
Expecting below output from the query
Not sure what is wrong, but its not behaving as expected. Aprreciate your help on this. Thank you.

I'm not sure if this is a bug, but it's due to type coercion based on the way you wrote your query. Here's your query with the TO_DATE logic applied in the IFNULL statement the same as it is in the join (along with a COALESCE to show that it produces the same result as IFNULL):
WITh patient_cte AS(
SELECT * FROM patient_test
)
,
dates AS(
SELECT DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
)
,
cte2 (date) as (
SELECT TO_DATE(START_DATE) FROM dates
UNION ALL
SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
),
cte3 AS (
select * from patient_cte
cross join cte2
)
SELECT cte3.pid as p_pid,
pateint_entry_test.pid as p_entry_pid,
pateint_entry_test.DateAdded,
cte3."DATE",
IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS ORIGINAL_ERROR,
IFNULL( to_date(pateint_entry_test.DateAdded), cte3."DATE") AS CALCULATEDDATEMEASURED,
coalesce(to_date(pateint_entry_test.DateAdded), cte3."DATE") as from_coalesce
FROM cte3
LEFT JOIN pateint_entry_test
ON cte3.pid = pateint_entry_test.pid
AND cte3."DATE" = to_date(pateint_entry_test.DateAdded);
Running this in Snowflake produces this:

Related

SQL fill next date (month) with loop

I have input table, and need to add missing dates, but not to max, but up to next available month.
so I need to use loop.
SET #mindate = '2021.01'
SET #maxdate = CAST( GETDATE() AS Date ) --date today as max date
while
begin
if #mindate => #maxdate
begin
break
end
set #mindate = #mindate + 1
end
then i can get 1+.. but it does not stop to 7 month
so i totally got stuck with writing loop.
Data table :
could anybody help on code? as most examples are with joins, to data tables, or to one max value.

Paul, I'm assuming that you forgot to specify the month in your mock data.
I hope the code below may help you understand how non-trivial is what you are trying to accomplish :-) Kudos for your will to get rid of loops.
To make it better, I propose a denormalization (CAUTION!):
create another column price_valid_until
the latest prices records will have price_valid_until = '21000101' (aka, far away in the future)
when registering a new price, update the previous with new price_valid_from - 1 day
Here's the solution, with a pretty complex, but efficient query (http://sqlfiddle.com/#!18/4ab23/4)
create table price_history(
SKU varchar(255),
price_valid_from date,
price decimal(16, 2)
)
insert into price_history
values
('a', '20210101', 10),
('a', '20210107', 12),
('b', '20210102', 4),
('b', '20210110', 2),
('b', '20210214', 5);
-- This fiddler won't let me initialize and reference:
--
-- declare
-- #from_date date,
-- #to_date date;
--
-- select
-- #from_date = min(date_from),
-- #to_date = max(date_from)
-- from price_history
with
date_range as(
select
min(price_valid_from) as from_date,
--
eomonth(
max(price_valid_from)
) as to_date
from price_history
),
--
all_dates as(
select from_date as date_in_range
from date_range
-- ----------
union all
-- ----------
select dateadd(day, 1, date_in_range)
from all_dates
where
date_in_range < (
select to_date
from date_range
)
),
--
price_history_boundaries as(
select
ph.SKU,
ph.price,
--
ph.price_valid_from,
-- The latest price, so far, is valid until 01/01/2100
coalesce(
dateadd(
day,
-1,
min(ph_next.price_valid_from)
),
'21000101'
) as price_valid_until
from
price_history ph
left outer join price_history ph_next
on(
ph_next.SKU = ph.SKU
and ph_next.price_valid_from > ph.price_valid_from
)
group by ph.SKU, ph.price_valid_from, ph.price
)
select
phb.SKU,
ad.date_in_range,
phb.price
from
all_dates ad
inner join price_history_boundaries phb
on(
phb.price_valid_from <= ad.date_in_range
and phb.price_valid_until >= ad.date_in_range
)
order by phb.SKU, ad.date_in_range

You can easily achieve your desired result by creating list of dates from which to join to. Here I've used a recursive CTE to create a range of dates, adding 1 month per iteration up to the current date.
It's then a simple matter of joining to your source data, here lead() is handy for limiting the joined rows. Also, assuming SQL Server from the usage of Getdate:
declare #start date=(select Min([date]) from sourcetable);
with m as (
select 1 num, #start [Date]
union all
select num+1 , DateAdd(month,1,m.[date])
from m
where DateAdd(month,1,m.[date]) <= GetDate()
), t as (
select *, Lead([date],1,GetDate()) over (order by [date]) NextDate
from sourcetable
)
select m.[Date], t.sku, t.price
from m
join t on m.[date] >= t.[date] and m.[date] < t.nextdate
See Working Fiddle

SQL Union as Subquery to create Date Ranges from Start Date

I have three tabels, each of them has a date column (the date column is an INT field and needs to stay that way). I need a UNION accross all three tables so that I get the list of unique dates in accending order like this:
20040602
20051215
20060628
20100224
20100228
20100422
20100512
20100615
Then I need to add a column to the result of the query where I subtract one from each date and place it one row above as the end date. Basically I need to generate the end date from the start date somehow and this is what I got so far (not working):
With Query1 As (
Select date_one As StartDate
From table_one
Union
Select date_two As StartDate
From table_two
Union
Select date_three e As StartDate
From table_three
Order By Date Asc
)
Select Query1.StartDate - 1 As EndDate
From Query1
Thanks a lot for your help!

Building on your existing union cte, we can use lead() in the outer query to get the start_date of the next record, and withdraw 1 from it.
with q as (
select date_one start_date from table_one
union select date_two from table_two
union select date_three from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date
If the datatype the original columns are numeric, then you need to do some casting before applying date functions:
with q as (
select cast(cast(date_one as varchar(8)) as date) start_date from table_one
union select cast(cast(date_two as varchar(8)) as date) from table_two
union select cast(cast(date_three as varchar(8)) as date) from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date

generate_series() equivalent in snowflake

I'm trying to find the snowflake equivalent of generate_series() (the PostgreSQL syntax).
SELECT generate_series(timestamp '2017-11-01', CURRENT_DATE, '1 day')

Just wanted to expand on Marcin Zukowski's comment to say that these gaps started to show up almost immediately after using a date range generated this way in a JOIN.
We ultimately ended up doing this instead!
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date
from table (generator(rowcount => 90))

I had a similar problem and found an approach, which avoids the issue of a generator requiring a constant value by using a session variable in addition to the already great answers here. This is closest to the requirement of the OP to my mind.
-- set parameter to be used as generator "constant" including the start day
set num_days = (Select datediff(day, TO_DATE('2017-11-01','YYYY-MM-DD'), current_date()+1));
-- use parameter in bcrowell's answer now
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date
from table (generator(rowcount => ($num_days)));
-- clean up previously set variable
unset num_days;

WITH RECURSIVE rec_cte AS (
-- start date
SELECT '2017-11-01'::DATE as dt
UNION ALL
SELECT DATEADD('day',1,dt) as dt
FROM rec_cte
-- end date (inclusive)
WHERE dt < current_date()
)
SELECT * FROM rec_cte

Adding this answer for completitude, in case you have an initial and last date:
select -1 + row_number() over(order by 0) i, start_date + i generated_date
from (select '2020-01-01'::date start_date, '2020-01-15'::date end_date)
join table(generator(rowcount => 10000 )) x
qualify i < 1 + end_date - start_date

I found the generator function in Snowflake quite limiting for all but the simplest use cases. For example, it was not clear how to take a single row specification, explode it into a table of dates and join it back to the original spec table.
Here is an alternative that uses recursive CTEs.
-- A 2 row table that contains "specs" for a date range
create local temp table date_spec as
select 1 as id, '2022-04-01'::date as start_date, current_date() as end_date
union all
select 2, '2022-03-01', '2032-03-30'
;
with explode_date(id, date, next_date, end_date) as (
select
id
, start_date as date -- start_date is the first date
, date + 1 as next_date -- next_date is the date of for the subsequent row in the recursive cte
, end_date
from date_spec
union all
select
ds.id
, ed.next_date -- the current_date is the value of next_date from above
, ed.next_date + 1
, ds.end_date
from date_spec ds
join explode_date ed
on ed.id = ds.id
where ed.date <= ed.end_date -- keep running until you hit the end_date
)
select * from explode_date
order by id, date desc
;

This is how I was able to generate a series of dates in Snowflake. I set row count to 1095 to get 3 years worth of dates, you can of course change that to whatever suits your use case
select
dateadd(day, '-' || seq4(), current_date()) as dte
from
table
(generator(rowcount => 1095))
Originally found here
EDIT: This solution is not correct. seq4 does not guarantee a sequence without gaps. Please follow other answers, not this one. Thanks #Marcin Zukowski for pointing that out.

Find missing date as compare to calendar

I am explain problem in short.
select distinct DATE from #Table where DATE >='2016-01-01'
Output :
Date
2016-11-23
2016-11-22
2016-11-21
2016-11-19
2016-11-18
Now i need to find out missing date a compare to our calender dates from year '2016'
i.e. Here date '2016-11-20' is missing.
I want list of missing dates.
Thanks for reading this. Have nice day.

You need to generate dates and you have to find missing ones. Below with recursive cte i have done it
;WITH CTE AS
(
SELECT CONVERT(DATE,'2016-01-01') AS DATE1
UNION ALL
SELECT DATEADD(DD,1,DATE1) FROM CTE WHERE DATE1<'2016-12-31'
)
SELECT DATE1 MISSING_ONE FROM CTE
EXCEPT
SELECT * FROM #TABLE1
option(maxrecursion 0)

Using CTE and get all dates in CTE table then compare with your table.
CREATE TABLE #yourTable(_Values DATE)
INSERT INTO #yourTable(_Values)
SELECT '2016-11-23' UNION ALL
SELECT '2016-11-22' UNION ALL
SELECT '2016-11-21' UNION ALL
SELECT '2016-11-19' UNION ALL
SELECT '2016-11-18'
DECLARE #DATE DATE = '2016-11-01'
;WITH CTEYear (_Date) AS
(
SELECT #DATE
UNION ALL
SELECT DATEADD(DAY,1,_Date)
FROM CTEYear
WHERE _Date < EOMONTH(#DATE,0)
)
SELECT * FROM CTEYear
WHERE NOT EXISTS(SELECT 1 FROM #yourTable WHERE _Date = _Values)
OPTION(maxrecursion 0)

You need to generate the dates and then find the missing ones. A recursive CTE is one way to generate a handful of dates. Another way is to use master..spt_values as a list of numbers:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
),
d as (
select dateadd(day, n.n, cast('2016-01-01' as date)) as dte
from n
where n <= 365
)
select d.date
from d left join
#table t
on d.dte = t.date
where t.date is null;
If you are happy enough with ranges of missing dates, you don't need a list of dates at all:
select date, (datediff(day, date, next_date) - 1) as num_missing
from (select t.*, lead(t.date) over (order by t.date) as next_date
from #table t
where t.date >= '2016-01-01'
) t
where next_date <> dateadd(day, 1, date);

Calculating per day in SQL

I have an sql table like that:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
So what I need is to get min and max date for each id and dif in days between them. It is kind of easy. Using SQLlite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky one: how can I receive the price per day during this difference period. I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I create a cte as you mentioned with calendar but dont know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
p.s. If it will be in sqllite syntax-would be awesome!

You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.date,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;

One method is using a tally table.
To build a list of dates and join that with the table.
The date stamps in the DD.MM.YY format are first changed to the YYYY-MM-DD date format.
To make it possible to actually use them as a date in the SQL.
At the final select they are formatted back to the DD.MM.YY format.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was totally inspired by the excellent answer from Gordon Linoff.
And a recursive SQL is probably more performant for this anyway.
(He should get the 15 points for the accepted answer).
The difference in this version is that the datestamps are first formatted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Snowflake SQL Query Left Join Issue - sql

Related

SQL fill next date (month) with loop

SQL Union as Subquery to create Date Ranges from Start Date

generate_series() equivalent in snowflake

Find missing date as compare to calendar

Calculating per day in SQL

Categories

Resources