Difference of datetime column in SQL

I have a table of 20,000 records, and each record has a datetime field. I want to select all records where the gap between one record and the subsequent record is more than one hour (the condition is to be applied to the datetime field).
Can anyone give me the SQL for this?
Regards,
KAM

ANSI SQL supports the lead() function, but date/time functions vary by database. The following is the logic you want, although the exact syntax depends on the database:
select t.*
from (select t.*,
             lead(datetimefield) over (order by datetimefield) as next_datetimefield
      from t
     ) t
where datetimefield + interval '1 hour' < next_datetimefield;
Note: In Teradata, the where would be:
where datetimefield + interval '1' hour < next_datetimefield;
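For example, on SQL Server (2012 or later, where LEAD() is available) roughly the same query could be written with DATEADD(); this is just a sketch reusing the t / datetimefield names from above:
select *
from (select t.*,
             lead(datetimefield) over (order by datetimefield) as next_datetimefield
      from t
     ) t
where dateadd(hour, 1, datetimefield) < next_datetimefield;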

This can also be done with a subquery, which should work on all DBMSs. As Gordon said, the date/time functions are different in each one.
SELECT t.*
FROM YourTable t
WHERE t.DateCol + interval '1 hour' < (SELECT min(s.DateCol)
                                       FROM YourTable s
                                       WHERE t.ID = s.ID AND s.DateCol > t.DateCol)
You can replace this:
t.DateCol + interval '1 hour'
With one of these, so it will work on almost every DBMS (the first is MySQL syntax, the second SQL Server):
DATE_ADD(t.DateCol, INTERVAL 1 HOUR)
DATEADD(hour, 1, t.DateCol)
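For instance, with the MySQL variant the whole query would read roughly like this (a sketch, keeping the same placeholder YourTable / DateCol / ID names as above):
SELECT t.*
FROM YourTable t
WHERE DATE_ADD(t.DateCol, INTERVAL 1 HOUR) < (SELECT MIN(s.DateCol)
                                              FROM YourTable s
                                              WHERE t.ID = s.ID AND s.DateCol > t.DateCol)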

Although Teradata doesn't support Standard SQL's LEAD, it's easy to rewrite:
select tab.*,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
qualify
ts < next_ts - interval '1' hour
If you don't need to show the next timestamp:
select *
from tab
qualify
ts < min(ts) over (order by ts rows between 1 following and 1 following) - interval '1' hour
QUALIFY is a Teradata extension, but a really nice one to have; it is similar to HAVING after GROUP BY, only applied to the result of window functions.
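On a database without QUALIFY, roughly the same result can be obtained by pushing the window function into a derived table; a sketch reusing the tab / ts names from above (adjust the interval literal to your dialect):
select *
from (select tab.*,
             min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
      from tab
     ) t
where ts < next_ts - interval '1' hour;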

Related

Postgres Interval not working in where subquery

I have the following code:
Select * from table
where to_date <= ( select max(to_date)
FROM table)
and to_date >= (select (max(to_date)::date - interval '6 months')::date as to_date
FROM table)
Basically, I am trying to look at all the results between the max date and 6 months in the past, and I tried doing that by writing two subqueries.
I seem to get null. Oddly enough, if I take the date that
(select (max(to_date)::date - interval '6 months')::date
returns and paste it in as >= 'yyyy-mm-dd', the query works fine. It is weird, as both subqueries actually produce dates, and I have no idea why it behaves this way.
You don't need both comparisons:
select *
from table
where to_date >= (select (max(to_date)::date - interval '6 months')::date as to_date
from table
) ;
This is assuming that the table reference is the same in both the inner and outer query.
I can't really think of a reason why this wouldn't work, but you can rewrite the query to run only a single subquery, which is also more efficient:
select t.*
from the_table t
cross join (
    select max(to_date) as max_date
    from the_table
) mt
where t.to_date <= mt.max_date
  and t.to_date >= mt.max_date - interval '6 months'
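Another option (just a sketch, with the same assumed the_table / to_date names) is a window function instead of the cross join, which computes the maximum in the same scan over the table; the upper-bound comparison then becomes redundant, because no row can exceed the overall maximum:
select *
from (select t.*,
             max(to_date) over () as max_date
      from the_table t
     ) t
where to_date >= max_date - interval '6 months';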

Syntax error in redshift sql query with subqueries

I'm quite new to SQL in general and haven't dealt with Redshift before. I'm trying to run a query which works perfectly in PostgreSQL, but I get a syntax error in Redshift. The query is:
SELECT
test.table_1.user_id as user_id,
test.table_1.timestamp as start_session,
test.table_1.step_3 :: timestamp + interval '1 hour' as end_session,
test.table_1.step_3 :: timestamp + interval '1 hour' - test.table_1.timestamp :: timestamp as session_duration
FROM (SELECT *,
min(case when page = 'second_page' then timestamp end) OVER (partition by user_id order by timestamp desc rows between unbounded preceding and unbounded following) as step_2,
min(case when page = 'third_page' then timestamp end) OVER (partition by user_id order by timestamp desc rows between unbounded preceding and unbounded following) as step_3
FROM test.table_1) test.table_1
WHERE
test.table_1.page = 'first_page' AND
step_2 > test.table_1.timestamp AND
step_3 > step_2 AND
step_3 :: timestamp - step_2 :: timestamp < '1 hour' AND
step_2 :: timestamp - test.table_1.timestamp :: timestamp < '1 hour'
ORDER BY
user_id,start_session
The error is Error running query: syntax error at or near "." LINE 11: FROM test.vimbox_pages) test.vimbox_pages ^ in line FROM test.table_1) test.table_1
I don't understand what's wrong there.
With this query I'm trying to get a list of user sessions, based on users reading the pages in a certain order.
I'll be thankful for any help!
Aliases are identifiers and need to follow the rules for identifiers, so a qualified name such as test.table_1 cannot be used as a table alias. You can also simplify your query in other ways:
SELECT t.user_id, t.timestamp as start_session,
       (t.step_3::timestamp + interval '1 hour') as end_session,
       (t.step_3::timestamp + interval '1 hour' - t.timestamp::timestamp) as session_duration
FROM (SELECT t.*,
             MIN(CASE WHEN page = 'second_page' THEN timestamp END) OVER (PARTITION BY user_id) as step_2,
             MIN(CASE WHEN page = 'third_page' THEN timestamp END) OVER (PARTITION BY user_id) as step_3
      FROM test.table_1 t
     ) t
WHERE t.page = 'first_page' AND
      step_2 > t.timestamp AND
      step_3 > step_2 AND
      step_3::timestamp < step_2::timestamp + interval '1 hour' AND
      step_2::timestamp < t.timestamp::timestamp + interval '1 hour'
ORDER BY user_id, start_session;
Notes:
Your windowing clause is unnecessarily complex. No ORDER BY is necessary if you want the entire window range.
The conversions to timestamp should be unnecessary, given the names of the columns. But I have left them in.
t.user_id as user_id is redundant. The column name is going to be user_id anyway.
I don't ever see spaces around ::. Of course they are allowed, but the type conversion has very high precedence and is typically written without spaces.
I prefer comparing timestamps to timestamps, rather than converting to intervals. Strange things can happen with intervals.
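To illustrate the last point: the original query compares a timestamp difference (an interval) against the string literal '1 hour', which relies on an implicit cast. Comparing timestamps directly sidesteps the interval arithmetic; the two predicates below express the same condition:
step_3::timestamp - step_2::timestamp < '1 hour'
step_3::timestamp < step_2::timestamp + interval '1 hour'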

How to make a two-week date_trunc in SQL (Vertica)?

I need to map each timestamp to its date_trunc with a two-week interval. That is, the same as date_trunc('week', event_time), but as if it were date_trunc('2 weeks', event_time). So I'd have a timestamp column and its two-week date_trunc column.
I tried date_trunc('week', event_time) + '1 week'::interval and date_trunc('week', event_time) + 7, but those just offset my event_date.
Does anyone know how to fix it?
This answer assumes that ISO week #1 and #2 should map to 2-week #1, weeks 3 and 4 map to 2-week #2 etc. We can try using floor and division here:
SELECT
event_time,
FLOOR((WEEK(event_time) + 1) / 2) AS two_week
FROM yourTable;
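As a quick sanity check of the mapping (weeks 1 and 2 should land in two-week group 1, weeks 3 and 4 in group 2), the expression can be evaluated against literal week numbers; a small sketch:
SELECT wk, FLOOR((wk + 1) / 2) AS two_week
FROM (SELECT 1 AS wk UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) w;
-- wk 1 -> 1, wk 2 -> 1, wk 3 -> 2, wk 4 -> 2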
You can use a case expression:
select (case when mod(week(event_time), 2) = 1
             then date_trunc('week', event_time)
             else date_trunc('week', event_time) - interval '1 week'
        end)
So my final code actually is (with help from @Tim Biegeleisen):
select distinct
    event_date
    , min(event_date) over (partition by two_week) as two_week
from (
    select distinct
        event_time::date as event_date
        , FLOOR(week((DATE_TRUNC('week', event_time) - 1)) / 2) AS two_week
    from MyTable
) t
Try one of my favourite Vertica-specific functions on date / time: TIME_SLICE() .
SELECT
dt
, TIME_SLICE(dt,24*7*2,'HOUR') AS hebdo
FROM dt;
The biggest unit it accepts is the hour, but you can multiply it, as shown above (24*7*2 hours = two weeks).
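If what you ultimately want is the two-week bucket per row of your own table, the same call is simply applied to your timestamp column; a sketch reusing the MyTable / event_time names from the question:
SELECT event_time,
       TIME_SLICE(event_time, 24*7*2, 'HOUR') AS two_week
FROM MyTable;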

SQL Aggregation Join and Subquery Optimisation

I am trying to get aggregate values by time periods of two relations (buys and uses) and join them so that I can get the results in one report and also draw a ratio on them. I am using PostgreSQL. The end report required is: dateTime, u.sum, b.sum, b.sum/u.sum
The following query works but scales very poorly with larger table sizes.
SELECT b2.datetime AS dateTime, b2.sum AS BUY_VOLUME, u1.sum AS USE_VOLUME,
       CASE u1.sum
           WHEN 0 THEN 0
           ELSE (b2.sum / u1.sum)
       END AS buyToUseRatio
FROM (SELECT SUM(b.total / 100.0) AS sum,
             date_trunc('week', (b.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
      FROM buys AS b
      WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
      GROUP BY datetime) AS b2
INNER JOIN (SELECT SUM(u.amount) / 100.00 AS sum,
                   date_trunc('week', (u.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
            FROM uses AS u
            WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
            GROUP BY datetime) AS u1 ON b2.datetime = u1.datetime
ORDER BY b2.datetime ASC;
I was wondering if anyone could help me by providing an alternative query that would get the end result required and is faster to execute.
I appreciate any help on this :-) My junior level SQL is a little rusty and I can't think of another way of doing this without creating indexes. Thanks in advance.
At the very least, these indexes can help your query:
create index idx_buys_datetime on buys(datetime);
create index idx_uses_datetime on uses(datetime);
Your query seems fine. However, you could use a full join (instead of an inner join) to keep all rows where at least one of your tables has data. You could even use generate_series() to always have one year of results, even when there is no data in either table, but I'm not sure if that's what you need. Also, some other things can be written more simply; your query could look like this:
select dt, buy_volume, use_volume, buy_volume / nullif(use_volume, 0.0) buy_to_use_ratio
from (select sum(total / 100.0) buy_volume,
             date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
      from buys
      where datetime > date_trunc('month', current_timestamp - interval '1 year')
      group by 2) b
full join (select sum(amount) / 100.0 use_volume,
                  date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
           from uses
           where datetime > date_trunc('month', current_timestamp - interval '1 year')
           group by 2) u using (dt)
order by 1
http://rextester.com/YVASV92568
So the answer depends on how large your tables are, but if it were me, I would create one or two new "summary" tables based on your query and make sure to keep them updated (run a batch job once a day, or once an hour, to refresh them with the data that has changed recently).
Then you would be able to query those summary tables much faster.
If, however, your tables are very small, then just keep going the way you are and play around with indexes until you get timings that are acceptable.
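One minimal way to sketch that summary-table idea in PostgreSQL is a materialized view refreshed by the batch job (the weekly_buy_volume name is just an example; a matching view would be needed for uses):
CREATE MATERIALIZED VIEW weekly_buy_volume AS
SELECT date_trunc('week', (datetime + interval '1 day')) - interval '1 day' AS dt,
       sum(total / 100.0) AS buy_volume
FROM buys
GROUP BY 1;
-- run periodically (e.g. hourly or nightly) to pick up new rows
REFRESH MATERIALIZED VIEW weekly_buy_volume;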

SQL date check within set number of days

I'm trying to find out how many clients viewed a property within 14 days of May 20, 2004, either before or after. I'm not really sure at all how to go about this.
I'm assuming I need to group it and use a HAVING?
EDIT: I am using Oracle now
select count(*)
from VIEWING
WHERE CLAUSE?
For a one-time query with that specific date:
select count(*) clients
from yourtable
where yourdatefield >= {d'2004-05-06'}
and yourdatefield < {d'2004-06-08'}
You might want to consult a calendar to see if those dates are correct.
Edit #1: since you are using Oracle, you can use:
select count(*) TotalClients
from yourtable
where dt >= (to_date('2004-05-20', 'yyyy-mm-dd') - INTERVAL '14' DAY)
and dt <= (to_date('2004-05-20', 'yyyy-mm-dd') + INTERVAL '14' DAY)
See SQL Fiddle with Demo
Based on some of your previous questions you were using MySQL.
If you are using MySQL then you can use the DATE_ADD() function to get the date range and then use count(*) to return all records from those dates:
select count(*) TotalClients
from yourtable
where dt >= date_add(str_to_date('2004-05-20', '%Y-%m-%d'), INTERVAL -14 day)
and dt <= date_add(str_to_date('2004-05-20', '%Y-%m-%d'), INTERVAL 14 day)