Calculate time gaps in SQL (Vertica)

I have the following data set:
(StartTime,EndTime)
(2,2.30)
(3,4.30)
(5,6)
I need to fill in the time gaps in between. What I need:
(2,2.30)
(2.30,3)
(3,4.30)
(4.30,5)
(5,6)
How can I achieve this in SQL? I am using a Vertica database.

You'll basically need to use LEAD to generate the rows in between. Vertica also has time series gap-filling and interpolation (GFI), but it doesn't fit this problem well because you are dealing with ranges of time, not static slices.
CREATE TABLE mytable (StartTime interval, EndTime interval);

INSERT INTO mytable VALUES (INTERVAL '2 minutes', INTERVAL '2.5 minutes');
INSERT INTO mytable VALUES (INTERVAL '3 minutes', INTERVAL '4.5 minutes');
INSERT INTO mytable VALUES (INTERVAL '5 minutes', INTERVAL '6 minutes');

SELECT StartTime, EndTime
FROM mytable
UNION ALL
SELECT EndTime, LEAD(StartTime, 1) OVER (ORDER BY StartTime)
FROM mytable
ORDER BY 1;
You'll end up with a record at the end indicating an open range, which you can filter out if you don't want it. You can also add another union branch to get a similar record before the first range, if you want that.
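As a sanity check, the UNION ALL + LEAD pattern can be sketched with SQLite from Python (window functions need SQLite >= 3.25); plain numbers stand in for Vertica's INTERVAL values:

```python
import sqlite3

# Minimal sketch of the UNION ALL + LEAD approach; numeric minutes
# stand in for Vertica's INTERVAL type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (StartTime REAL, EndTime REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(2, 2.5), (3, 4.5), (5, 6)])

rows = conn.execute("""
    SELECT StartTime, EndTime FROM mytable
    UNION ALL
    SELECT EndTime, LEAD(StartTime, 1) OVER (ORDER BY StartTime)
    FROM mytable
    ORDER BY 1
""").fetchall()

# The last row is the open-ended range (6.0, None); filter it if unwanted.
print(rows)
```
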

Related

SQL query to group dates and include different dates in the aggregation

I have a table with two columns, dates and number of searches in each date. What I want to do is group by the dates, and find the sum of number of searches for each date.
The trick is that for each group, I also want to include the number of searches for the date exactly the following week, and the number of searches for the date exactly the previous week.
So if I have:

Date        Searches
2/3/2023    2
2/10/2023   4
2/17/2023   1
2/24/2023   5
I want the output for the 2/10/2023 and 2/17/2023 groups to be:

Date        Sum
2/10/2023   7
2/17/2023   10
How can I write a query for this?
You can use a correlated query for this:
select date, (
select sum(searches)
from t as x
where x.date between t.date - interval '7 day' and t.date + interval '7 day'
) as sum_win
from t
Replace interval 'x day' with the appropriate date add function for your RDBMS.
If your RDBMS supports interval in window functions then a much better solution would be:
select date, sum(searches) over (
order by date
range between interval '7 day' preceding and interval '7 day' following
) as sum_win
from t
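A runnable sketch of the correlated-subquery approach, using SQLite via Python with ISO date strings standing in for a date column:

```python
import sqlite3

# Sketch of the correlated-subquery approach; ISO date strings let the
# +/- 7 day window use SQLite's date() modifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, searches INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("2023-02-03", 2), ("2023-02-10", 4),
    ("2023-02-17", 1), ("2023-02-24", 5),
])

rows = conn.execute("""
    SELECT date, (
        SELECT SUM(searches)
        FROM t AS x
        WHERE x.date BETWEEN date(t.date, '-7 day') AND date(t.date, '+7 day')
    ) AS sum_win
    FROM t
    ORDER BY date
""").fetchall()
print(rows)
```

The edge rows (2/3 and 2/24) only have one neighbor inside the window, so their sums include fewer terms.
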
Assuming weekly rows
CREATE TABLE Table1 ([Dates] date, [Searches] int);

INSERT INTO Table1 ([Dates], [Searches])
VALUES
('2023-02-03 00:00:00', 2),
('2023-02-10 00:00:00', 4),
('2023-02-17 00:00:00', 1),
('2023-02-24 00:00:00', 5);

with cte as (
    select dates,
           searches
           + lead(searches) over(order by dates)
           + lag(searches) over(order by dates) as sum_searches
    from table1
)
select * from cte
where sum_searches is not null;
dates        sum_searches
2023-02-10   7
2023-02-17   10
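The CTE above can be exercised end-to-end with SQLite from Python (LEAD/LAG need SQLite >= 3.25):

```python
import sqlite3

# Sketch of the lead/lag rewrite: adding NULL (the missing neighbor at
# either edge) yields NULL, which the outer WHERE filters out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dates TEXT, searches INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("2023-02-03", 2), ("2023-02-10", 4),
    ("2023-02-17", 1), ("2023-02-24", 5),
])

rows = conn.execute("""
    WITH cte AS (
        SELECT dates,
               searches
               + LEAD(searches) OVER (ORDER BY dates)
               + LAG(searches) OVER (ORDER BY dates) AS sum_searches
        FROM table1
    )
    SELECT * FROM cte WHERE sum_searches IS NOT NULL
""").fetchall()
print(rows)
```
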

How to extract year, month or day from BIGINT epoch timestamp when inserting in Redshift

I would like to extract the month, year or day from a BIGINT epoch timestamp such as 1543258003796 in the Redshift SQL environment when inserting data into a table.
I have a table like this:
CREATE TABLE time_table (
    month_date character varying(20),
    year_date character varying(20)
);
Now I want to populate the table time_table with data from another table ts_table that has a column with timestamp as BIGINT type:
INSERT INTO time_table (month_date, year_date)
SELECT EXTRACT(month from ts.timestamp) as month_date,
EXTRACT(year from ts.timestamp) as year_date
FROM ts_table ts;
It raises an error because ts.timestamp is a BIGINT. Should I first cast the BIGINT into something else? Or is there another function to perform this action? I tried several things but I am still not able to find a solution.
I assume that these BIGINT values are epoch timestamps. (Note that your sample value 1543258003796 looks like epoch milliseconds, so divide by 1000 first if that's what you have.) You first need to convert this to a timestamp - for example like so:
select timestamp 'epoch' + t.timestamp * interval '1 second' AS timest
from ts_table t;
Now this isn't what you want yet, but it gets you into a timestamp data type and opens up the useful functions available to you.
Step 2 is to EXTRACT the year and month from this. Putting these together you get:
WITH ts_conv as (
    select timestamp 'epoch' + t.timestamp * interval '1 second' AS timest
    from ts_table t
)
SELECT EXTRACT(month from ts.timest) as month_date,
       EXTRACT(year from ts.timest) as year_date
FROM ts_conv ts;
And this of course can be inside your INSERT statement.
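For comparison, the same conversion outside SQL; the sample value 1543258003796 is epoch milliseconds, hence the division by 1000:

```python
from datetime import datetime, timezone

# Convert an epoch-milliseconds BIGINT to a timestamp, then pull out
# the pieces that EXTRACT(month/year ...) would return.
ts_bigint = 1543258003796
dt = datetime.fromtimestamp(ts_bigint / 1000, tz=timezone.utc)

month_date, year_date = dt.month, dt.year
print(year_date, month_date)
```
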

How to use generate_series to get the sum of values in a weekly interval

I'm having trouble using generate_series in a weekly interval. I have two examples here, one is in a monthly interval and it is working as expected. It is returning each month and the sum of the facts.sends values. I'm trying to do the same exact thing in a weekly interval, but the values are not being added correctly.
Monthly interval (Working):
https://www.db-fiddle.com/f/a9SbDBpa9SMGxM3bk8fMAD/0
Weekly interval (Not working): https://www.db-fiddle.com/f/tNxRbCxvgwswoaN7esDk5w/2
You should generate a series that starts on Monday.
WITH range_values AS (
    SELECT date_trunc('week', min(fact_date)) as minval,
           date_trunc('week', max(fact_date)) as maxval
    FROM facts
),
week_range AS (
    SELECT generate_series(date_trunc('week', '2022-05-01'::date), now(), '1 week') as week
    FROM range_values
),
grouped_facts AS (
    SELECT date_trunc('week', fact_date) as week,
           sends
    FROM facts
    WHERE fact_date >= '2022-05-20'
)
SELECT week_range.week,
       COALESCE(sum(sends)::integer, 0) AS total_sends
FROM week_range
LEFT OUTER JOIN grouped_facts ON week_range.week = grouped_facts.week
GROUP BY 1
ORDER BY 1;
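What date_trunc('week', ...) does can be sketched in Python: map every date to its week's Monday, then sum per bucket (the sample rows below are made up):

```python
from datetime import date, timedelta
from collections import defaultdict

def week_start(d: date) -> date:
    """Monday of d's week - the analogue of date_trunc('week', d)."""
    return d - timedelta(days=d.weekday())

# Hypothetical facts rows: (fact_date, sends)
facts = [(date(2022, 5, 20), 3), (date(2022, 5, 21), 2), (date(2022, 5, 25), 4)]

totals = defaultdict(int)
for fact_date, sends in facts:
    totals[week_start(fact_date)] += sends
print(dict(totals))
```
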

SQL Aggregation Join and Subquery Optimisation

I am trying to compute aggregate values by time period from two relations (buys and uses) and join them so that I can get the results in one report, including a ratio between them. I am using PostgreSQL. The end report required is: dateTime, u.sum, b.sum, b.sum/u.sum
The following query works but scales very poorly with larger table sizes.
SELECT b2.datetime AS dateTime, b2.sum AS BUY_VOLUME, u1.sum AS USE_VOLUME,
       CASE u1.sum
           WHEN 0 THEN 0
           ELSE (b2.sum / u1.sum)
       END AS buyToUseRatio
FROM (SELECT SUM(b.total / 100.0) AS sum,
             date_trunc('week', (b.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
      FROM buys AS b
      WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
      GROUP BY datetime) AS b2
INNER JOIN (SELECT SUM(u.amount) / 100.00 AS sum,
                   date_trunc('week', (u.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
            FROM uses AS u
            WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
            GROUP BY datetime) AS u1
    ON b2.datetime = u1.datetime
ORDER BY b2.datetime ASC;
I was wondering if anyone could help me with an alternative query that gets the required end result but executes faster.
I appreciate any help on this :-) My junior-level SQL is a little rusty and I can't think of another way of doing this without creating indexes. Thanks in advance.
At least, these indexes can help your query:
create index idx_buys_datetime on buys(datetime);
create index idx_uses_datetime on uses(datetime);
Your query seems fine. However, you could use a full join (instead of an inner join) to keep rows where at least one of your tables has data. You could even use generate_series() to always return a full year of results, even when neither table has data, but I'm not sure if that's what you need. Some other things can also be written more simply; your query could look like this:
select dt, buy_volume, use_volume, buy_volume / nullif(use_volume, 0.0) buy_to_use_ratio
from (select sum(total / 100.0) buy_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from buys
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) b
full join (select sum(amount) / 100.0 use_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from uses
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) u using (dt)
order by 1
http://rextester.com/YVASV92568
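A minimal sketch of the NULLIF guard used in the ratio column: dividing by NULLIF(use_volume, 0.0) yields NULL instead of raising a division-by-zero error.

```python
# Python analogue of buy_volume / nullif(use_volume, 0.0):
# None plays the role of SQL NULL.
def buy_to_use_ratio(buy_volume: float, use_volume: float):
    return None if use_volume == 0 else buy_volume / use_volume

print(buy_to_use_ratio(10.0, 4.0))  # 2.5
print(buy_to_use_ratio(10.0, 0.0))  # None
```
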
So the answer depends on how large your tables are, but if it were me, I would create one or two "summary" tables based on your query and keep them updated (run a batch job once a day to refresh them, or once an hour with only the data that has changed recently).
Then I would query those summary tables instead, which would be much faster.
If, however, your tables are very small, just keep going the way you are and experiment with indexes until you get acceptable timings.

Difference of datetime column in SQL

I have a table of 20,000 records. Each record has a datetime field. I want to select all records where the gap between one record and the subsequent record is more than one hour (the condition applies to the datetime field).
Can anyone give me the SQL code for this?
ANSI SQL supports the lead() function. However, date/time functions vary by database. The following is the logic you want, although the exact syntax varies depending on the database:
select t.*
from (select t.*,
lead(datetimefield) over (order by datetimefield) as next_datetimefield
from t
) t
where datetimefield + interval '1 hour' < next_datetimefield;
Note: In Teradata, the where would be:
where datetimefield + interval '1' hour < next_datetimefield;
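A runnable sketch of this LEAD approach with SQLite from Python, where julianday() differences stand in for + interval '1 hour' (one hour is 1/24 of a day):

```python
import sqlite3

# Sketch of the LEAD-based gap check; the last row's LEAD is NULL,
# so the NULL comparison quietly excludes it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [
    ("2024-01-01 10:00:00",),
    ("2024-01-01 10:30:00",),   # 90-minute gap to the next row -> qualifies
    ("2024-01-01 12:00:00",),
])

rows = conn.execute("""
    SELECT dt FROM (
        SELECT dt, LEAD(dt) OVER (ORDER BY dt) AS next_dt
        FROM t
    )
    WHERE julianday(next_dt) - julianday(dt) > 1.0 / 24
""").fetchall()
print(rows)
```
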
This can also be done with a subquery, which should work on all DBMSs. As Gordon said, date/time functions differ in each one.
SELECT t.*
FROM YourTable t
WHERE t.DateCol + interval '1 hour' < (SELECT min(s.DateCol)
                                       FROM YourTable s
                                       WHERE t.ID = s.ID
                                         AND s.DateCol > t.DateCol)
You can replace this:
t.DateCol + interval '1 hour'
with one of these so it works on almost every DBMS:
DATE_ADD(t.DateCol, INTERVAL 1 HOUR)  -- MySQL
DATEADD(hour, 1, t.DateCol)           -- SQL Server
Although Teradata doesn't support Standard SQL's LEAD, it's easy to rewrite:
select tab.*,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
qualify
ts < next_ts - interval '1' hour
If you don't need to show the next timestamp:
select *
from tab
qualify
ts < min(ts) over (order by ts rows between 1 following and 1 following) - interval '1' hour
QUALIFY is a Teradata extension, but a really nice one to have; it filters on window functions the way HAVING filters on GROUP BY aggregates.
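The MIN-over-a-one-row-frame trick can be checked with SQLite from Python, emulating QUALIFY with a wrapping subquery:

```python
import sqlite3

# MIN(ts) over a frame of exactly "1 FOLLOWING" behaves like LEAD(ts);
# the subquery plays the role of Teradata's QUALIFY clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ts TEXT)")
conn.executemany("INSERT INTO tab VALUES (?)", [
    ("2024-01-01 10:00:00",),
    ("2024-01-01 10:30:00",),   # next row is 90 minutes later -> qualifies
    ("2024-01-01 12:00:00",),
])

rows = conn.execute("""
    SELECT ts FROM (
        SELECT ts,
               MIN(ts) OVER (ORDER BY ts
                             ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_ts
        FROM tab
    )
    WHERE julianday(ts) < julianday(next_ts) - 1.0 / 24
""").fetchall()
print(rows)
```
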