How to do a two-week date_trunc in SQL (Vertica)?

I need to map each timestamp to the start of its two-week interval. That is, the same as date_trunc('week', event_time), but as if date_trunc('2 weeks', event_time) existed. So I'd have a timestamp column and, next to it, a column holding its two-week date_trunc.
I tried date_trunc('week', event_time) + '1 week'::interval and date_trunc('week', event_time) + 7, but both just shift the weekly value by a fixed offset.
Does anyone know how to do this?

This answer assumes that ISO week #1 and #2 should map to 2-week #1, weeks 3 and 4 map to 2-week #2 etc. We can try using floor and division here:
SELECT
event_time,
FLOOR((WEEK(event_time) + 1) / 2) AS two_week
FROM yourTable;
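One caveat with pairing week numbers is that the buckets restart every year. Here is a minimal Python sketch (not Vertica SQL) of the same (week + 1) / 2 pairing, assuming ISO week numbering as this answer does; two_week_bucket is a hypothetical helper name:

```python
from datetime import date

def two_week_bucket(d):
    """Return (ISO year, bucket), pairing ISO weeks 1-2, 3-4, and so on."""
    iso = d.isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    return iso_year, (iso_week + 1) // 2

print(two_week_bucket(date(2021, 1, 4)))    # ISO week 1  -> (2021, 1)
print(two_week_bucket(date(2021, 1, 11)))   # ISO week 2  -> (2021, 1)
print(two_week_bucket(date(2021, 1, 18)))   # ISO week 3  -> (2021, 2)
print(two_week_bucket(date(2020, 12, 28)))  # ISO week 53 -> (2020, 27)
```

ISO week 53 of 2020 lands alone in bucket 27, so that bucket covers one week rather than two. Keeping the year next to the bucket number at least stops different years from colliding.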

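You can use a case expression (odd weeks start a two-week bucket, even weeks roll back one week):
select (case when mod(week(event_time), 2) = 1
then date_trunc('week', event_time)
else date_trunc('week', event_time) - interval '1 week'
end) as two_week_start
from yourTable;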

So my final code is (with help from @Tim Biegeleisen):
select distinct
event_date
, min(event_date) over (partition by two_week) as two_week
from (
select distinct
event_time::date as event_date
, FLOOR(week((DATE_TRUNC('week', event_time) - 1)) / 2) AS two_week
from MyTable
) t

Try one of my favourite Vertica-specific functions on date / time: TIME_SLICE() .
SELECT
dt
, TIME_SLICE(dt,24*7*2,'HOUR') AS hebdo
FROM dt;
The largest unit is the hour, but you can multiply it out: 24*7*2 hours is two weeks.
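The fixed-slice idea can also be sketched outside the database. The Python below (not Vertica SQL) floors each timestamp to a 14-day bucket counted from an anchor date. The anchor ANCHOR (Monday 2018-01-01) and the helper name two_week_trunc are assumptions chosen for illustration; this sketch makes no claim about how TIME_SLICE aligns its own slices.

```python
from datetime import date, datetime, timedelta

# Assumed anchor: any Monday works; every bucket starts a multiple of 14 days from it.
ANCHOR = date(2018, 1, 1)

def two_week_trunc(ts):
    """Floor a timestamp to the first day of its 14-day bucket."""
    days_since_anchor = (ts.date() - ANCHOR).days
    # Floor division also rounds down for dates before the anchor.
    return ANCHOR + timedelta(days=(days_since_anchor // 14) * 14)

print(two_week_trunc(datetime(2018, 1, 14, 23, 59)))  # 2018-01-01
print(two_week_trunc(datetime(2018, 1, 15, 0, 0)))    # 2018-01-15
print(two_week_trunc(datetime(2017, 12, 31, 12, 0)))  # 2017-12-18
```

Unlike pairing week numbers, buckets built this way are always 14 days long and do not restart at year boundaries.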

Related

SQL Query to group dates and includes different dates in the aggregation

I have a table with two columns, dates and number of searches in each date. What I want to do is group by the dates, and find the sum of number of searches for each date.
The trick is that for each group, I also want to include the number of searches for the date exactly the following week, and the number of searches for the date exactly the previous week.
So if I have
Date        Searches
2/3/2023    2
2/10/2023   4
2/17/2023   1
2/24/2023   5
I want the output for the 2/10/2023 and 2/17/2023 groups to be
Date        Sum
2/10/2023   7
2/17/2023   10
How can I write a query for this?
You can use a correlated query for this:
select date, (
select sum(searches)
from t as x
where x.date between t.date - interval '7 day' and t.date + interval '7 day'
) as sum_win
from t
Replace interval 'x day' with the appropriate date add function for your RDBMS.
If your RDBMS supports interval in window functions then a much better solution would be:
select date, sum(searches) over (
order by date
range between interval '7 day' preceding and interval '7 day' following
) as sum_win
from t
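As a quick cross-check of the ±7 day window on the sample data from the question, here is a small Python sketch (not SQL); window_sum is a hypothetical helper name:

```python
from datetime import date, timedelta

# Sample rows from the question: date -> number of searches.
rows = {
    date(2023, 2, 3): 2,
    date(2023, 2, 10): 4,
    date(2023, 2, 17): 1,
    date(2023, 2, 24): 5,
}

def window_sum(rows, day, days=7):
    """Sum searches for every row dated within +/- `days` of `day`, inclusive."""
    lo, hi = day - timedelta(days=days), day + timedelta(days=days)
    return sum(n for d, n in rows.items() if lo <= d <= hi)

sums = {d: window_sum(rows, d) for d in rows}
for d, total in sorted(sums.items()):
    print(d, total)
```

The middle rows give 7 and 10, as the question expects. The first and last rows still get a partial sum of 6 each, so they would need filtering out if only complete three-week groups are wanted. With daily rather than weekly rows, an inclusive range like this would also pick up every day in between, not just the dates exactly one week away.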
Assuming weekly rows
CREATE TABLE Table1
([Dates] date, [Searches] int)
;
INSERT INTO Table1
([Dates], [Searches])
VALUES
('2023-02-03 00:00:00', 2),
('2023-02-10 00:00:00', 4),
('2023-02-17 00:00:00', 1),
('2023-02-24 00:00:00', 5)
;
;with cte as (
select dates
, searches
+ lead(searches) over(order by dates)
+ lag(searches) over(order by dates) as sum_searches
from table1)
select * from cte
where sum_searches is not null;
dates        sum_searches
2023-02-10   7
2023-02-17   10

How to only add business days to a date in BigQuery?

For a given date, I want to add a number of business days to it. For example, if today is 10-17-2022 and I have a field holding 8 business days, how can I add 8 business days to 10-17-2022 to get 10-27-2022?
Current Data:
BUSINESS_DAYS   Date
8               10-11-2022
10              10-13-2022
9               10-12-2022
Desired Output Data
BUSINESS_DAYS   Date         FINAL_DATE
8               10-11-2022   10-21-2022
10              10-13-2022   10-27-2022
9               10-12-2022   10-25-2022
As you can see we are skipping all weekends. We can ignore holidays for now.
Update: Using the suggested logic (with the names changed), I used:
DATE_ADD(A.PO_SENT_DATE , INTERVAL
(CAST(PREDICTED_LEAD_TIME AS INT64)
+ (date_diff(A.PO_SENT_DATE , DATE_ADD(A.PO_SENT_DATE , INTERVAL CAST(PREDICTED_LEAD_TIME AS INT64) DAY), week)* 2))
DAY) as FINAL_DATE
Update2: Using the following:
DATE_ADD(`Date`, INTERVAL
(BUSINESS_DAYS
+ (date_diff( DATE_ADD(`Date`, INTERVAL BUSINESS_DAYS DAY),`Date`, week) * 2))
DAY) as FINAL_DATE
There are instances where the result falls on a weekend. For example, 10-22-2022 falls on a Saturday.
Consider the simple solution below:
select *,
( select day
from unnest(generate_date_array(date, date + (div(business_days, 5) + 1) * 7)) day
where not extract(dayofweek from day) in (1, 7)
qualify row_number() over(order by day) = business_days + 1
) final_date
from your_table
if applied to sample data in your question
with your_table as (
select 8 business_days, date '2022-10-11' date union all
select 10, '2022-10-13' union all
select 9, '2022-10-12'
)
The output matches the FINAL_DATE column of the desired output in the question.
The solution from @mikhailberlyant is really cool and very innovative. However, if your table has many rows and the values in the business_days column vary a lot, the query becomes less efficient, especially for larger business_days values: for each row it has to generate the entire date array, unnest it, and then filter and number its elements.
This might help you do calculation without any array business:
select day, add_days as add_business_days,
DATE_ADD(day, INTERVAL cast(add_days +2*ceil((add_days -(5-(
(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else EXTRACT(DAYOFWEEK FROM day) end)
-1)))/5)+(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else 0 end) as int64) DAY) as final_day
from
(select parse_date('%Y-%m-%d', "2022-10-11") as day, 8 as add_days)
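To sanity-check any of the SQL variants above against the sample data, a plain day-by-day reference is handy. The Python sketch below (not BigQuery SQL) skips Saturdays and Sundays and ignores holidays, as the question allows; add_business_days is a hypothetical helper name:

```python
from datetime import date, timedelta

def add_business_days(start, n):
    """Step forward one calendar day at a time, counting only Mon-Fri."""
    day = start
    remaining = n
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return day

print(add_business_days(date(2022, 10, 11), 8))   # 2022-10-21
print(add_business_days(date(2022, 10, 13), 10))  # 2022-10-27
print(add_business_days(date(2022, 10, 12), 9))   # 2022-10-25
```

This loop is only meant as a reference to compare results against; it is not a replacement for a set-based query on a large table.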

How can I use UNNEST with GENERATE_DATE_ARRAY to provide an end-of-month index?

I want to create a fake index for my data, so that, for example, a single order is repeated for every date in the array created below.
select
*
from
database.data,
UNNEST(GENERATE_DATE_ARRAY(
'2014-01-01',
(SELECT
MAX(Order_Date)
FROM
database.data), INTERVAL 1 MONTH)) AS month
However, this creates an index of the 1st of each month. How can I change it so that it's the end of every month, e.g. 2014-01-31 and onwards at a 1-month interval?
You can use date arithmetic: add one month to the first of the month, then subtract one day.
select d.*, date_sub(date_add(dt, interval 1 month), interval 1 day) as month_end
from database.data d
cross join unnest(
generate_date_array('2014-01-01', (select max(order_date) from database.data), interval 1 month)
) as dt
As of 10/14/2020, a new function, LAST_DAY, was released to do this in one step:
SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day
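For checking the expected index values outside BigQuery, here is a minimal Python sketch that lists the last day of every month in a range. The helper name month_ends and the sample end date are assumptions for illustration:

```python
import calendar
from datetime import date

def month_ends(start, end):
    """Last day of every month from start's month through end's month."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        result.append(date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result

print(month_ends(date(2014, 1, 1), date(2014, 3, 15)))
```

For a range from 2014-01-01 to a latest order date of 2014-03-15, this yields 2014-01-31, 2014-02-28 and 2014-03-31.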

Difference of datetime column in SQL

I have a table of 20,000 records. Each record has a datetime field. I want to select all records where the gap between one record and the subsequent record is more than one hour (the condition applies to the datetime field).
Can anyone give me the SQL for this?
regards
KAM
ANSI SQL supports the lead() function. However, date/time functions vary by database. The following is the logic you want, although the exact syntax varies, depending on the database:
select t.*
from (select t.*,
lead(datetimefield) over (order by datetimefield) as next_datetimefield
from t
) t
where datetimefield + interval '1 hour' < next_datetimefield;
Note: In Teradata, the where would be:
where datetimefield + interval '1' hour < next_datetimefield;
This can also be done with a subquery, which should work on all DBMSs. As Gordon said, date/time functions differ from one to the next.
SELECT t.* FROM YourTable t
WHERE t.DateCol + interval '1 hour' < (SELECT min(s.DateCol) FROM YourTable s
WHERE t.ID = s.ID AND s.DateCol > t.DateCol)
You can replace this:
t.DateCol + interval '1 hour'
with one of these, depending on your DBMS:
DATE_ADD(t.DateCol, INTERVAL 1 hour) -- MySQL
DATEADD(hour, 1, t.DateCol) -- SQL Server
Although Teradata doesn't support Standard SQL's LEAD it's easy to rewrite:
select tab.*,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
qualify
ts < next_ts - interval '1' hour
If you don't need to show the next timestamp:
select *
from tab
qualify
ts < min(ts) over (order by ts rows between 1 following and 1 following) - interval '1' hour
QUALIFY is a Teradata extension, but really nice to have. It filters on window function results, much as HAVING filters after GROUP BY.
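The lead-style comparison used in these answers can be sketched in Python (not SQL) to check expected results on a small sample: sort the timestamps, pair each one with its successor, and keep those followed by a gap of more than one hour. The helper name rows_before_long_gap and the sample values are assumptions for illustration:

```python
from datetime import datetime, timedelta

def rows_before_long_gap(timestamps, gap=timedelta(hours=1)):
    """Return each timestamp whose next timestamp is more than `gap` later."""
    ordered = sorted(timestamps)
    # zip pairs every element with its successor, like LEAD() over an ordered set.
    return [cur for cur, nxt in zip(ordered, ordered[1:]) if nxt - cur > gap]

sample = [
    datetime(2016, 1, 1, 9, 0),
    datetime(2016, 1, 1, 9, 30),
    datetime(2016, 1, 1, 11, 0),
    datetime(2016, 1, 1, 12, 0),
]
print(rows_before_long_gap(sample))  # only the 09:30 row is followed by a gap over one hour
```

As in the SQL versions, a gap of exactly one hour does not qualify, and the last row never matches because it has no successor.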

Redshift: Running query using GETDATE() at specified list of times

So, I have a query that uses GETDATE() in WHERE and HAVING clauses:
SELECT GETDATE(), COUNT(*) FROM (
SELECT 1 FROM events
WHERE (event_time > (GETDATE() - interval '25 hours'))
GROUP BY id
HAVING MAX(event_time) BETWEEN (GETDATE() - interval '25 hours') AND (GETDATE() - interval '24 hours')
)
I'm basically trying to find the number of unique ids that have their latest event_time between 25 and 24 hours ago with respect to the current time.
The problem: I have another table, query_dts, with a single column of timestamps. Instead of running the above query against the current time with GETDATE(), I need to run it against the timestamp in every row of query_dts. Any ideas?
Note: I'm not really storing query_dts anywhere. I've created it like this:
WITH query_dts AS (
SELECT (
DATEADD(hour,-(row_number() over (order by true)), getdate())
) as n
FROM events LIMIT 48
),
which I got from here
How about avoiding the generator altogether and instead just splitting the intervals:
SELECT
dateadd(hour, -distance, getdate()),
count(0) AS event_count
FROM (
SELECT
id,
datediff(hour, max(event_time), getdate()) AS distance
FROM events
WHERE event_time > getdate() - INTERVAL '2 days'
GROUP BY id) AS events_with_distance
GROUP BY distance;
You can use a JOIN to combine the two queries. Then you just need to substitute the values for your date expression. I think this is the logic:
WITH query_dts AS (
SELECT DATEADD(hour, -(row_number() over (order by true)), getdate()) as n
FROM events
LIMIT 48
)
SELECT i.n, COUNT(*)
FROM (SELECT d.n, e.id
FROM events e JOIN
query_dts d
ON e.event_time > d.n - interval '25 hours' AND e.event_time <= d.n
GROUP BY d.n, e.id
HAVING MAX(e.event_time) BETWEEN d.n - interval '25 hours' AND d.n - interval '24 hours'
) i
GROUP BY i.n;
Here's what I ended up doing:
WITH max_time_table AS
(
SELECT id, max(event_time) AS max_time
FROM events
WHERE (event_time > GETDATE() - interval '74 hours')
GROUP BY id
),
query_dts AS
(
SELECT (DATEADD(hour,-(row_number() over (ORDER BY TRUE) - 1), getdate()) ) AS n
FROM events LIMIT 48
)
SELECT query_dts.n, COUNT(*)
FROM max_time_table JOIN query_dts
ON max_time_table.max_time BETWEEN (query_dts.n - interval '25 hours') AND (query_dts.n - interval '24 hours')
GROUP BY query_dts.n
ORDER BY query_dts.n DESC
Here, I selected 74 hours because I wanted 48 hours ago + 25 hours ago = 73 hours ago.
The problem is that this isn't a general-purpose way of doing this. It's a highly specific solution for this particular problem. Can someone think of a more general way of running a query dependent on GETDATE() using a column of dates in another table?