Generating multiple rows from one row in SQL based on a column - sql

I have a table with these five columns:
The ID is the PI here. BEGIN_WINDOW and END_WINDOW are TIMESTAMP columns. The DURATION_DAYS_RUP is calculated by dividing DURATION_HRS by 24 and rounding up.
What I'm trying to do is based on the DURATION_DAYS_RUP, I need to create multiple rows.
If that value is 1, then the row created is just the same row with same ID, BEGIN_WINDOW and END_WINDOW.
If it's greater than 1, for ex. 2, two rows would be created - one row where the ID is the same, BEGIN_WINDOW is the value from the original row, and END_WINDOW is 24 hrs + BEGIN_WINDOW and the second row would be the same ID, BEGIN_WINDOW is the END_WINDOW of that first row, and END_WINOW is this row's BEGIN_WINDOW + 24 hours.
See the example below:
I've researched a lot but can't seem to find the trick to doing this. If anyone has an idea, would be greatly appreciated!

You could use Teradata's EXPAND ON syntax:
SELECT x.ID, BEGIN(pd) as BEGIN_WINDOW, BEGIN(pd) + INTERVAL '24' HOUR as END_WINDOW
FROM mytable x
EXPAND ON PERIOD(x.BEGIN_WINDOW, x.END_WINDOW) AS pd
BY INTERVAL '24' HOUR;

You can use a recursive query:
with recursive cte (id, begin_window, end_window, duration_days_rup) as (
select
id,
begin_window,
case when duration_days_rup = 1 then end_window else begin_window + interval '1' day end,
duration_days_rup - 1
from mytable
union all
select
id,
begin_window + interval '1' day,
case when duration_days_rup = 1 then end_window else end_window + interval '1' day end,
duration_days_rup - 1
from cte
where duration_days_rup > 0
)
select id, begin_window, end from cte
Looking at your query, I doubt that you really need the duray_days_rup column, which is derived information. We could use straight date comparisons. I think the logic you want is:
with recursive cte (id, begin_window, end_window, real_end_window) as (
select
id,
begin_window,
least(end_window, begin_window + interval '1' day),
end_window
from mytable
union all
select
id,
begin_window + interval '1' day,
least(real_end_window, end_window + interval '1' day),
real_end_window
from cte
where begin_window + interval '1' day > real_end_window
)
select id, begin_window, end from cte

Related

SQL Query to group dates and includes different dates in the aggregation

I have a table with two columns, dates and number of searches in each date. What I want to do is group by the dates, and find the sum of number of searches for each date.
The trick is that for each group, I also want to include the number of searches for the date exactly the following week, and the number of searches for the date exactly the previous week.
So If I have
Date
Searches
2/3/2023
2
2/10/2023
4
2/17/2023
1
2/24/2023
5
I want the output for the 2/10/2023 and 2/17/2023 groups to be
Date
Sum
2/10/2023
7
2/17/2023
10
How can I write a query for this?
You can use a correlated query for this:
select date, (
select sum(searches)
from t as x
where x.date between t.date - interval '7 day' and t.date + interval '7 day'
) as sum_win
from t
Replace interval 'x day' with the appropriate date add function for your RDBMS.
If your RDBMS supports interval in window functions then a much better solution would be:
select date, sum(searches) over (
order by date
range between interval '7 day' preceding and interval '7 day' following
) as sum_win
from t
Assuming weekly rows
CREATE TABLE Table1
([Dates] date, [Searches] int)
;
INSERT INTO Table1
([Dates], [Searches])
VALUES
('2023-02-03 00:00:00', 2),
('2023-02-10 00:00:00', 4),
('2023-02-17 00:00:00', 1),
('2023-02-24 00:00:00', 5)
;
;with cte as (
select dates
, searches
+ lead(searches) over(order by dates)
+ lag(searches) over(order by dates) as sum_searches
from table1)
select * from cte
where sum_searches is not null;
dates
sum_searches
2023-02-10
7
2023-02-17
10
fiddle

How to only add business days to a date in BigQuery?

For a given date I want to add business days to it. For example, if today is 10-17-2022 and I have a field that is 8 business days. How can I add 8 business days to 10-17-2022 which would be 10-27-2022.
Current Data:
BUSINESS_DAYS
Date
8
10-11-2022
10
10-13-2022
9
10-12-2022
Desired Output Data
BUSINESS_DAYS
Date
FINAL_DATE
8
10-11-2022
10-21-2022
10
10-13-2022
10-27-2022
9
10-12-2022
10-25-2022
As you can see we are skipping all weekends. We can ignore holidays for now.
Update:
Using
The suggest logic I got the following answer. I changed the names up.
I used:
DATE_ADD(A.PO_SENT_DATE , INTERVAL
(CAST(PREDICTED_LEAD_TIME AS INT64)
+ (date_diff(A.PO_SENT_DATE , DATE_ADD(A.PO_SENT_DATE , INTERVAL CAST(PREDICTED_LEAD_TIME AS INT64) DAY), week)* 2))
DAY) as FINAL_DATE
Update2: Using the following:
DATE_ADD(`Date`, INTERVAL
(BUSINESS_DAYS
+ (date_diff( DATE_ADD(`Date`, INTERVAL BUSINESS_DAYS DAY),`Date`, week) * 2))
DAY) as FINAL_DATE
There are instances where the result falls on the weekend. See screenshot below. 10-22-2022 falls on a Saturday.
Consider below simple solution
select *,
( select day
from unnest(generate_date_array(date, date + (div(business_days, 5) + 1) * 7)) day
where not extract(dayofweek from day) in (1, 7)
qualify row_number() over(order by day) = business_days + 1
) final_date
from your_table
if applied to sample data in your question
with your_table as (
select 8 business_days, date '2022-10-11' date union all
select 10, '2022-10-13' union all
select 9, '2022-10-12'
)
output is
The solution from #mikhailberlyant is really really cool, and very innovative. However if you have a lot of rows in your table and value of "business_days" column varies a lot, query will be less efficient especially for larger "business_days" values as implementation needs to generate entire range of array for each row, unnest it, and then do manipulation in that array.
This might help you do calculation without any array business:
select day, add_days as add_business_days,
DATE_ADD(day, INTERVAL cast(add_days +2*ceil((add_days -(5-(
(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else EXTRACT(DAYOFWEEK FROM day) end)
-1)))/5)+(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else 0 end) as int64) DAY) as final_day
from
(select parse_date('%Y-%m-%d', "2022-10-11") as day, 8 as add_days)

create table with dates - sql

I have a query that can create a table with dates like below:
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (last_day(sysdate)::date - seq.num)::date as "Date"
from seq;
How could this be changed to generate only dates
Thanks
demo:db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
generate_series(first_day_of_month, last_day_of_month, interval '1 day')::date
FROM dates
date_trunc() truncates a type date (or timestamp) to a certain date part. date_trunc('month', ...) removes all parts but year and month. All other parts are set to their lowest possible values. So, the day part is set to 1. That's why you get the first day of month with this.
adding a month returns the first of the next month, subtracting a day from this results in the last day of the current month.
Finally you can generate a date series with start and end date using the generate_series() function
Edit: Redshift does not support generate_series() with type date and timestamp but with integer. So, we need to create an integer series instead and adding the results to the first of the month:
db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
first_day_of_month::date + gs
FROM
dates,
generate_series(
date_part('day', first_day_of_month)::int - 1,
date_part('day', last_day_of_month)::int - 1
) as gs
This answers the original version of the question.
You would use generate_series():
select gs.dte
from generate_series(date_trunc('month', now()::date),
date_trunc('month', now()::date) + interval '1 month' - interval '1 day',
interval '1 day'
) gs(dte);
Here is a db<>fiddle.

Transform days into hours in postgres

I have Hours,Minute,Second, and days as results in the Postgres query. I want to convert everything into hours.
Example
Row 1 result: 19:53:45
Row 2 result: 1 day 05:41:58
Now I want to convert days into hours like below
Row 1 result:19:53:45
Row 2 result: 29:41:58
Can someone help me how to do it in the postgres sql?
cast(col as interval hour to minute) should work, according to Standard SQL.
Anyway, this seems to work:
col - extract(day from col) * interval '1' day -- remove the days
+ extract(day from col) * interval '24' hour -- and add back as hours
See fiddle
Presumably, you want the result as a string, because times are limited to 24 hours. You can construct it as:
select *,
(case when ar[1] like '%day%'
then (split_part(col, ' ', 1)::int * 24 + split_part(ar[1], ' ', 3)::int)::text ||
right(col, 6)
else col
end)
from (values ('19:53:45'), ('1 day 05:41:58')) v(col) cross join lateral
regexp_split_to_array(col, ':') a(ar);
You can also do this without a:
select *,
(case when col like '%day%'
then (split_part(col, ' ', 1)::int * 24 + (regexp_match(col, ' ([0-9]+):'))[1]::int)::text ||
right(col, 6)
else col
end)
from (values ('19:53:45'), ('1 day 05:41:58')) v(col) ;

Need to split datetime ranges by each day

I have a table that needs to be split on the basis of datetime
Input Table
ID| Start | End
--------------------------------------------
A | 2019-03-04 23:18:04| 2019-03-04 23:21:25
--------------------------------------------
A | 2019-03-04 23:45:05| 2019-03-05 00:15:14
--------------------------------------------
Required Output
ID| Start | End
--------------------------------------------
A | 2019-03-04 23:18:04| 2019-03-04 23:21:25
--------------------------------------------
A | 2019-03-04 23:45:05| 2019-03-04 23:59:59
--------------------------------------------
A | 2019-03-05 00:00:00| 2019-03-05 00:15:14
--------------------------------------------
Thanks!!
Try this below code. This will only work if the start and end date fall in two consecutive day. Not if the start and end date difference is more than 1 day.
MSSQL:
SELECT ID,[Start],[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) = 0
UNION ALL
SELECT ID,[Start], CAST(CAST(CAST([Start] AS DATE) AS VARCHAR(MAX)) +' 23:59:59' AS DATETIME)
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0
UNION ALL
SELECT ID,CAST(CAST([End] AS DATE) AS DATETIME),[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0
ORDER BY 1,2,3
PostgreSQL:
SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) = 0
UNION ALL
SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP(CONCAT(CAST(CAST (startDate AS DATE) AS VARCHAR) ,
' 23:59:59') , 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) > 0
UNION ALL
SELECT ID,
TO_TIMESTAMP(CAST(CAST (endDate AS DATE) AS VARCHAR) ,
'YYYY-MM-DD HH24:MI:SS') ,
TO_TIMESTAMP(endDate,'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) -
DATE_PART('day',startDate::date) > 0;
PostgreSQL Demo Here
demo:db<>fiddle
This works even when range crosses more than one day
WITH cte AS (
SELECT
id,
start_time,
end_time,
gs,
lag(gs) over (PARTITION BY id ORDER BY gs) -- 2
FROM
a
LEFT JOIN LATERAL
generate_series(start_time::date + 1, end_time::date, interval '1 day') gs --1
ON TRUE
)
SELECT -- 3
id,
COALESCE(lag, start_time) AS start_time,
gs - interval '1 second'
FROM
cte
WHERE gs IS NOT NULL
UNION
SELECT DISTINCT ON (id) -- 4
id,
CASE WHEN start_time::date = end_time::date THEN start_time ELSE end_time::date END, -- 5
end_time
FROM
cte
CTE: the generate_series function generates one row per day new day. So, there is no value if there is no date change
CTE: the lag() window function allows to move the current date value into the next row (the current end is the next start)
With this data set you can calculate the new start and end values. If there is no gs value: There is no date change. This ignored at this point. For all cases with date changes: If there is no lag value, it is the beginning (so it cannot got a previous value). In this case, the normal start_time is taken, otherwise it is a new day which takes the date break time. The end_time is taken with the last second of the day (interval - '1 second')
The second part: Because of the date breaks there is always one additional record which need to be unioned. The last record is from the beginning of the end_time (so cast to date). The CASE clause combines this step with the case of no date change which has been ignored so far. So if start_time and end_time are at the same date, here the original start_time is taken.
Unfortunately, Redshift doesn't have a convenient way to generate a series of numbers. If you table is big enough, you can use it to generate numbers. "Big enough" means that the number of rows is greater than the longest span. Perhaps another table would work, if not this one.
Once you have that, you can use this logic:
with n as (
select row_number() over () - 1 as n
from t
)
select t.id,
greatest(t.s, date_trunc('day', t.s) + n.n * interval '1 day') as s,
least(t.e, date_trunc('day', t.s) + (n.n + 1) * interval '1 day' - interval '1 second') as e
from t join
n
on t.e >= date_trunc('day', t.s) + n.n * interval '1 day';
Here is a db<>fiddle. It uses an old version of Postgres, but not quite old enough for Redshift.
Simulate loop for interval generation using recursive CTE, i.e. take range from start to midnight in seed row, take another day in subsequent rows etc.
with recursive input as (
select 'A' as id, timestamp '2019-03-04 23:18:04' as s, timestamp '2019-03-04 23:21:25' as e union
select 'A' as id, timestamp '2019-03-04 23:45:05' as s, timestamp '2019-03-05 00:15:14' as e union
select 'B' as id, timestamp '2019-03-06 23:45:05' as s, timestamp '2019-03-08 00:15:14' as e union
select 'C' as id, timestamp '2019-03-10 23:45:05' as s, timestamp '2019-03-15 00:15:14' as e
), generate_id as (
select row_number() over () as unique_id, * from input
), rec (unique_id, id, s, e) as (
select unique_id, id, s, least(e, s::date::timestamp + interval '1 day')
from generate_id seed
union
select remaining.unique_id, remaining.id, previous.e, least(remaining.e, previous.e::date::timestamp + interval '1 day')
from rec as previous
join generate_id remaining on previous.unique_id = remaining.unique_id and previous.e < remaining.e
)
select id, s, e from rec
order by id,s,e
Note:
your id column appears not to be unique, so I added custom unique_id column. If id was unique, CTE generate_id was unnecessary. Uniqueness is unavoidable for recursive query to work.
close-open range is better for representation of such data, rather than close-close range. So end time in my query returns 00:00:00, not 23:59:59. If it's not suitable for you, modify query as an exercise.
UPDATE: query works on Postgres. OP originally tagged question postgres, then changed tag to redshift.