How to get DATE_DIFF in decimal - sql

My friends are migrating from Netezza to BigQuery. In Netezza "month_between" function gives them back a decimal result. Meanwhile in BQ date_diff is always an integer. Is there a way to get fractional output in BQ?
(their logic)

You could write an UDF:
CREATE TEMP FUNCTION months_between_impl(date_1 DATE, date_2 DATE) AS (
CASE
WHEN date_1 = date_2
THEN 0
WHEN EXTRACT(DAY FROM DATE_ADD(date_1, INTERVAL 1 DAY)) = 1
AND EXTRACT(DAY FROM DATE_ADD(date_2, INTERVAL 1 DAY)) = 1
THEN DATE_DIFF(date_1,date_2, MONTH)
WHEN EXTRACT(DAY FROM date_1) = 1
AND EXTRACT(DAY FROM DATE_ADD(date_2, INTERVAL 1 DAY)) = 1
THEN DATE_DIFF(DATE_ADD(date_1, INTERVAL -1 DAY), date_2, MONTH) + 1/31
ELSE DATE_DIFF(DATE_ADD(date_1, INTERVAL -1 DAY), date_2, MONTH) - 1 + EXTRACT(DAY FROM DATE_ADD(date_1, INTERVAL -1 DAY)) / 31 + (31 - EXTRACT(DAY FROM date_2) + 1) / 31
END
);
CREATE TEMP FUNCTION months_between(date_1 DATE, date_2 DATE) AS (
TRUNC(months_between_impl(date_1, date_2),9)
);
WITH
t AS (
SELECT DATE("2005-02-02") AS from_date, DATE("2005-01-01") AS to_date, "1.032258064516129" AS Expected
UNION ALL
SELECT DATE("2007-03-15"), DATE("2007-02-20"), "0.838709677419354"
UNION ALL
SELECT DATE("2008-03-29"), DATE("2008-02-29"), "1.0"
UNION ALL
SELECT DATE("2008-03-31"), DATE("2008-02-29"), "1.0"
UNION ALL
SELECT DATE("2005-11-29"), DATE("2006-03-01"), "-3.096774194"
UNION ALL
SELECT DATE("1993-07-01"), DATE("1993-03-31"), "3.03225806"
UNION ALL
SELECT DATE("2005-03-31"), DATE("2005-01-01"), "2.967741935"
UNION ALL
SELECT DATE("2008-03-30"), DATE("2008-02-29"), "1.032258064516129"
)
SELECT
from_date, to_date, expected, months_between(from_date, to_date) months_Between
FROM t;
added by Mikhail
Below is real run on Netezza showing that above UDF actually returns totally correct result (as for some reason the numbers in expected column are not what really Netezza returns - rather correct numbers are under result column - which as I mentioned exactly what Felipe's UDF produces)

Related

How to only add business days to a date in BigQuery?

For a given date I want to add business days to it. For example, if today is 10-17-2022 and I have a field that is 8 business days. How can I add 8 business days to 10-17-2022 which would be 10-27-2022.
Current Data:
BUSINESS_DAYS
Date
8
10-11-2022
10
10-13-2022
9
10-12-2022
Desired Output Data
BUSINESS_DAYS
Date
FINAL_DATE
8
10-11-2022
10-21-2022
10
10-13-2022
10-27-2022
9
10-12-2022
10-25-2022
As you can see we are skipping all weekends. We can ignore holidays for now.
Update:
Using
The suggest logic I got the following answer. I changed the names up.
I used:
DATE_ADD(A.PO_SENT_DATE , INTERVAL
(CAST(PREDICTED_LEAD_TIME AS INT64)
+ (date_diff(A.PO_SENT_DATE , DATE_ADD(A.PO_SENT_DATE , INTERVAL CAST(PREDICTED_LEAD_TIME AS INT64) DAY), week)* 2))
DAY) as FINAL_DATE
Update2: Using the following:
DATE_ADD(`Date`, INTERVAL
(BUSINESS_DAYS
+ (date_diff( DATE_ADD(`Date`, INTERVAL BUSINESS_DAYS DAY),`Date`, week) * 2))
DAY) as FINAL_DATE
There are instances where the result falls on the weekend. See screenshot below. 10-22-2022 falls on a Saturday.
Consider below simple solution
select *,
( select day
from unnest(generate_date_array(date, date + (div(business_days, 5) + 1) * 7)) day
where not extract(dayofweek from day) in (1, 7)
qualify row_number() over(order by day) = business_days + 1
) final_date
from your_table
if applied to sample data in your question
with your_table as (
select 8 business_days, date '2022-10-11' date union all
select 10, '2022-10-13' union all
select 9, '2022-10-12'
)
output is
The solution from #mikhailberlyant is really really cool, and very innovative. However if you have a lot of rows in your table and value of "business_days" column varies a lot, query will be less efficient especially for larger "business_days" values as implementation needs to generate entire range of array for each row, unnest it, and then do manipulation in that array.
This might help you do calculation without any array business:
select day, add_days as add_business_days,
DATE_ADD(day, INTERVAL cast(add_days +2*ceil((add_days -(5-(
(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else EXTRACT(DAYOFWEEK FROM day) end)
-1)))/5)+(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else 0 end) as int64) DAY) as final_day
from
(select parse_date('%Y-%m-%d', "2022-10-11") as day, 8 as add_days)

create table with dates - sql

I have a query that can create a table with dates like below:
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (last_day(sysdate)::date - seq.num)::date as "Date"
from seq;
How could this be changed to generate only dates
Thanks
demo:db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
generate_series(first_day_of_month, last_day_of_month, interval '1 day')::date
FROM dates
date_trunc() truncates a type date (or timestamp) to a certain date part. date_trunc('month', ...) removes all parts but year and month. All other parts are set to their lowest possible values. So, the day part is set to 1. That's why you get the first day of month with this.
adding a month returns the first of the next month, subtracting a day from this results in the last day of the current month.
Finally you can generate a date series with start and end date using the generate_series() function
Edit: Redshift does not support generate_series() with type date and timestamp but with integer. So, we need to create an integer series instead and adding the results to the first of the month:
db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
first_day_of_month::date + gs
FROM
dates,
generate_series(
date_part('day', first_day_of_month)::int - 1,
date_part('day', last_day_of_month)::int - 1
) as gs
This answers the original version of the question.
You would use generate_series():
select gs.dte
from generate_series(date_trunc('month', now()::date),
date_trunc('month', now()::date) + interval '1 month' - interval '1 day',
interval '1 day'
) gs(dte);
Here is a db<>fiddle.

Create column with true/false for previous month

I have a column with dates in the following format:
yyyy/m/d
How can I create a calculated column that has as values true for every entry that is on the previous month and false for the rest? I've tried as below, but i don't get the correct output.
IF(Date = DATE_SUB(DATE_TRUNC(Date, MONTH), INTERVAL 1 MONTH), TRUE, FALSE) AS PreviousMonth
Below is for BigQuery Standard SQL
If your date column is of STRING data type with format yyyy/m/d (as it is stated in original question) - you can use below
#standardSQL
SELECT *,
DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH) =
DATE_TRUNC(PARSE_DATE('%Y/%m/%d', date), MONTH) AS PreviousMonth
FROM `project.dataset.table`
If your data column is of DATE data type - you should use below
#standardSQL
SELECT *,
DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH) =
DATE_TRUNC(date, MONTH) AS PreviousMonth
FROM `project.dataset.table`
You can try this alternative option:
WITH querytable AS(
SELECT '2020/06/29' AS input_date
UNION ALL
SELECT '2020/07/29' AS input_date
UNION ALL
SELECT '2020/08/30' AS input_date )
SELECT
PrevMonth,
CurrMonth,
IF(PrevMonth = CurrMonth, TRUE, FALSE) true_false
FROM (
SELECT EXTRACT(MONTH FROM PARSE_DATE('%Y/%m/%d', input_date)) AS PrevMonth,
EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)) AS CurrMonth
FROM
querytable)

Need to split datetime ranges by each day

I have a table that needs to be split on the basis of datetime
Input Table
ID| Start | End
--------------------------------------------
A | 2019-03-04 23:18:04| 2019-03-04 23:21:25
--------------------------------------------
A | 2019-03-04 23:45:05| 2019-03-05 00:15:14
--------------------------------------------
Required Output
ID| Start | End
--------------------------------------------
A | 2019-03-04 23:18:04| 2019-03-04 23:21:25
--------------------------------------------
A | 2019-03-04 23:45:05| 2019-03-04 23:59:59
--------------------------------------------
A | 2019-03-05 00:00:00| 2019-03-05 00:15:14
--------------------------------------------
Thanks!!
Try this below code. This will only work if the start and end date fall in two consecutive day. Not if the start and end date difference is more than 1 day.
MSSQL:
SELECT ID,[Start],[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) = 0
UNION ALL
SELECT ID,[Start], CAST(CAST(CAST([Start] AS DATE) AS VARCHAR(MAX)) +' 23:59:59' AS DATETIME)
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0
UNION ALL
SELECT ID,CAST(CAST([End] AS DATE) AS DATETIME),[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0
ORDER BY 1,2,3
PostgreSQL:
SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) = 0
UNION ALL
SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP(CONCAT(CAST(CAST (startDate AS DATE) AS VARCHAR) ,
' 23:59:59') , 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) > 0
UNION ALL
SELECT ID,
TO_TIMESTAMP(CAST(CAST (endDate AS DATE) AS VARCHAR) ,
'YYYY-MM-DD HH24:MI:SS') ,
TO_TIMESTAMP(endDate,'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) > 0;
PostgreSQL Demo Here
demo:db<>fiddle
This works even when range crosses more than one day
WITH cte AS (
SELECT
id,
start_time,
end_time,
gs,
lag(gs) over (PARTITION BY id ORDER BY gs) -- 2
FROM
a
LEFT JOIN LATERAL
generate_series(start_time::date + 1, end_time::date, interval '1 day') gs --1
ON TRUE
)
SELECT -- 3
id,
COALESCE(lag, start_time) AS start_time,
gs - interval '1 second'
FROM
cte
WHERE gs IS NOT NULL
UNION
SELECT DISTINCT ON (id) -- 4
id,
CASE WHEN start_time::date = end_time::date THEN start_time ELSE end_time::date END, -- 5
end_time
FROM
cte
CTE: the generate_series function generates one row per day new day. So, there is no value if there is no date change
CTE: the lag() window function allows to move the current date value into the next row (the current end is the next start)
With this data set you can calculate the new start and end values. If there is no gs value: There is no date change. This ignored at this point. For all cases with date changes: If there is no lag value, it is the beginning (so it cannot got a previous value). In this case, the normal start_time is taken, otherwise it is a new day which takes the date break time. The end_time is taken with the last second of the day (interval - '1 second')
The second part: Because of the date breaks there is always one additional record which need to be unioned. The last record is from the beginning of the end_time (so cast to date). The CASE clause combines this step with the case of no date change which has been ignored so far. So if start_time and end_time are at the same date, here the original start_time is taken.
Unfortunately, Redshift doesn't have a convenient way to generate a series of numbers. If you table is big enough, you can use it to generate numbers. "Big enough" means that the number of rows is greater than the longest span. Perhaps another table would work, if not this one.
Once you have that, you can use this logic:
with n as (
select row_number() over () - 1 as n
from t
)
select t.id,
greatest(t.s, date_trunc('day', t.s) + n.n * interval '1 day') as s,
least(t.e, date_trunc('day', t.s) + (n.n + 1) * interval '1 day' - interval '1 second') as e
from t join
n
on t.e >= date_trunc('day', t.s) + n.n * interval '1 day';
Here is a db<>fiddle. It uses an old version of Postgres, but not quite old enough for Redshift.
Simulate loop for interval generation using recursive CTE, i.e. take range from start to midnight in seed row, take another day in subsequent rows etc.
with recursive input as (
select 'A' as id, timestamp '2019-03-04 23:18:04' as s, timestamp '2019-03-04 23:21:25' as e union
select 'A' as id, timestamp '2019-03-04 23:45:05' as s, timestamp '2019-03-05 00:15:14' as e union
select 'B' as id, timestamp '2019-03-06 23:45:05' as s, timestamp '2019-03-08 00:15:14' as e union
select 'C' as id, timestamp '2019-03-10 23:45:05' as s, timestamp '2019-03-15 00:15:14' as e
), generate_id as (
select row_number() over () as unique_id, * from input
), rec (unique_id, id, s, e) as (
select unique_id, id, s, least(e, s::date::timestamp + interval '1 day')
from generate_id seed
union
select remaining.unique_id, remaining.id, previous.e, least(remaining.e, previous.e::date::timestamp + interval '1 day')
from rec as previous
join generate_id remaining on previous.unique_id = remaining.unique_id and previous.e < remaining.e
)
select id, s, e from rec
order by id,s,e
Note:
your id column appears not to be unique, so I added custom unique_id column. If id was unique, CTE generate_id was unnecessary. Uniqueness is unavoidable for recursive query to work.
close-open range is better for representation of such data, rather than close-close range. So end time in my query returns 00:00:00, not 23:59:59. If it's not suitable for you, modify query as an exercise.
UPDATE: query works on Postgres. OP originally tagged question postgres, then changed tag to redshift.

BigQuery Where Date is Less Than or Equal to 3 Days Minus Current Date

I'm trying to create a query to only return data where date is minus 3 days from the current date. I've tried:
date <= DATE_ADD(CURRENT_DATE(), -3, 'DAY')
But this returns Error: Expected INTERVAL expression
See WHERE clause in below example
#standardSQL
WITH yourTable AS (
SELECT i, date
FROM UNNEST(GENERATE_DATE_ARRAY('2017-04-15', '2017-04-28')) AS date WITH OFFSET AS i
)
SELECT *
FROM yourTable
WHERE date <= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
-- ORDER BY date
Btw, in case if you are still with Legacy SQL - see below example
#legacySQL
SELECT *
FROM -- yourTable
(SELECT 1 AS id, DATE('2017-04-20') AS date),
(SELECT 2 AS id, DATE('2017-04-21') AS date),
(SELECT 3 AS id, DATE('2017-04-22') AS date),
(SELECT 4 AS id, DATE('2017-04-23') AS date),
(SELECT 5 AS id, DATE('2017-04-24') AS date),
(SELECT 6 AS id, DATE('2017-04-25') AS date)
WHERE TIMESTAMP(date) <= DATE_ADD(TIMESTAMP(CURRENT_DATE()), -3, 'DAY')
-- ORDER BY date
This works with a string formatted date.
DATE(TIMESTAMP(date)) <= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
Just tested this and seems to work.
I added this :
and DATE(TIMESTAMP(datevalue)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 21 DAY)
and managed to get all records greater than last 21 days worth. Only thing I changed from #ericbrownaustin 's code was changed the 'date' in the first piece of code in the second set of parenthesis.