Difference between Timestamp is 15 minutes - sql

This is the CREATED_TIME 2012-07-17 00:00:22 and this is the corresponding timestamp 1342508427000. Here the timestamp is 5 seconds later than the CREATED_TIME. I need to handle the scenario below.
Currently I have a query in which I am joining on created_time and timestamp like this:
ON (UNIX_TIMESTAMP(testingtable1.created_time) = (prod_and_ts.timestamps / 1000))
So in the above case the rows will not match, because the timestamp is 5 seconds later than created_time. But I need them to match whenever the difference between the two, in either direction, is within 15 minutes.
So I wrote the JOIN condition below. I am not sure whether this is the right way to do it:
ON ((UNIX_TIMESTAMP(testingtable1.created_time) - (prod_and_ts.timestamps / 1000)) / 60* 1000 <= 15)
How can I write the ON clause so that rows match whenever the difference between the timestamps is within 15 minutes?

I'd prefer the date and time functions (specifically designed for this purpose!) instead of doing all these kinds of calculations with timestamps. You wouldn't believe how much trouble this can cause. Make sure you read and understand this and this
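To make the intended comparison concrete, here is a minimal Python sketch of a "within 15 minutes, in either direction" check. It is not the asker's SQL. Note that the attempted condition computes (difference / 60) * 1000 and takes no absolute value, so any negative difference passes regardless of size. The function name within_window and the epoch second used for created_time are assumptions for illustration (chosen so that it falls 5 seconds before the timestamp, as the question states).

```python
from datetime import datetime, timedelta, timezone

FIFTEEN_MINUTES = timedelta(minutes=15)


def within_window(created_time: datetime, timestamp_ms: int,
                  window: timedelta = FIFTEEN_MINUTES) -> bool:
    """Return True when the two instants are at most `window` apart, either way."""
    # The second column stores milliseconds since the epoch, so scale to seconds.
    event_time = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    # abs() makes the check symmetric: created_time may be earlier or later.
    return abs(event_time - created_time) <= window


# Assumed value: created_time taken as 5 seconds before the question's timestamp.
created = datetime.fromtimestamp(1342508422, tz=timezone.utc)
matches = within_window(created, 1342508427000)
```

Using date/time types and a named interval keeps the units (seconds vs. milliseconds vs. minutes) out of the comparison itself, which is where the original expression went wrong.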

Related

SQL: a time-series variant of the "every nth row" problem

I have a table of time-series data, with the columns:
sensor_number (integer primary key)
signal_strength (integer)
signal_time (timestamp)
Each sensor creates 20-30 rows per minute. I need a query that returns, for a given sensor, 1 row per minute (or every 2 minutes, 3 minutes, etc.). A pure SQL approach is to use a window function with a partition on an expression that rounds the timestamp appropriately (date_trunc() works for the 1-minute case; otherwise I have to do some messy casting). The problem is that the expression blocks the ability to use the index. With 5B rows, that's a killer.
The best alternative I can come up with is a user-defined function that uses a cursor to step through the table in index key order (sensor_number, signal_time) and outputs a row every time the timestamp crosses a minute boundary. That's still slow, though. Is there a pure SQL approach that will accomplish this AND utilize the index?
I think if you're returning enough rows, scanning the whole range of rows that match the sensor_number will just be the best plan. The signal_time portion of the index may simply not be helpful at that point, because the database needs to read so many rows anyway.
However, if your time interval is big enough / the number of rows you're returning is small enough, it might be more efficient to hit the index separately for each row you're returning. Something like this (using an interval of 3 minutes and a sensor number of 5 as an example):
WITH range AS (
    SELECT
        max(signal_time) as max_time,
        min(signal_time) as min_time
    FROM timeseries
    WHERE sensor_number = 5
)
SELECT sample.*
FROM range
JOIN generate_series(min_time, max_time, interval '3 minutes') timestamp ON true
JOIN LATERAL (
    SELECT *
    FROM timeseries
    WHERE sensor_number = 5
      AND signal_time >= timestamp
      AND signal_time < timestamp + interval '3 minutes'
    LIMIT 1
) sample ON true;
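The idea behind the query above (one index probe per output row rather than one scan over every row) can be sketched outside the database. The Python below is an illustration only: a sorted list stands in for the (sensor_number, signal_time) index, bisect_left stands in for the index lookup, and the function name sample_per_interval is hypothetical. Unlike LIMIT 1 without an ORDER BY, this sketch always picks the earliest row in each bucket.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def sample_per_interval(rows, step):
    """Return the first row of each `step`-wide bucket.

    `rows` is a list of (signal_time, signal_strength) tuples for ONE sensor,
    already sorted by signal_time (the order an index scan would produce).
    """
    if not rows:
        return []
    times = [t for t, _ in rows]
    samples = []
    bucket_start, last = times[0], times[-1]
    while bucket_start <= last:
        bucket_end = bucket_start + step
        # One binary-search probe per bucket, like one index lookup per row returned.
        i = bisect_left(times, bucket_start)
        if i < len(times) and times[i] < bucket_end:
            samples.append(rows[i])
        bucket_start = bucket_end
    return samples


base = datetime(2024, 1, 1)
readings = [(base + timedelta(seconds=s), s) for s in (0, 20, 70, 200, 400)]
picked = sample_per_interval(readings, timedelta(minutes=3))
```

The cost is proportional to the number of buckets times a logarithmic lookup, which is why this only wins when the interval is large relative to the row density.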

extracting HOUR from an interval in spark sql

I was wondering how to properly extract the number of hours between two given timestamp objects.
For instance, when the following SQL query gets executed:
select x, extract(HOUR FROM x) as result
from
(select (TIMESTAMP'2021-01-22T05:00:00' - TIMESTAMP'2021-01-01T09:00:00') as x)
The result value is 20, while I'd expect it to be 500.
This seems odd to me, since the value of x itself shows the interval I expect.
Can anyone please explain what I'm doing wrong, and perhaps suggest another way to write the query so that it returns the desired result?
Thanks in advance!
I think you have to do the maths with this one as datediff in SparkSQL only supports days. This worked for me:
SELECT (unix_timestamp(to_timestamp('2021-01-22T05:00:00') ) - unix_timestamp(to_timestamp('2021-01-01T09:00:00'))) / 60 / 60 diffInHours
My results (in Synapse Notebook, not Databricks but I expect it to be the same):
The unix_timestamp function converts the timestamp to a Unix timestamp (in seconds), and then you can apply arithmetic to it. Subtracting one from the other gives the number of seconds between the two timestamps. Divide by 60 for the number of minutes between the two dates, and by 60 again for the number of hours between the two dates.
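As a cross-check of the arithmetic, here is a small Python sketch (not Spark code) using the two timestamps from the question. The interval is 20 days and 20 hours, so the hour component on its own is 20, while the total number of hours is 500. That is consistent with the 20 that the question reports from extract(HOUR FROM x).

```python
from datetime import datetime

start = datetime.fromisoformat("2021-01-01T09:00:00")
end = datetime.fromisoformat("2021-01-22T05:00:00")
delta = end - start  # 20 days, 20:00:00

# Hour component only: what is left over after whole days are removed.
hours_field = delta.seconds // 3600

# Total elapsed hours: seconds / 60 / 60, the same arithmetic as the SQL above.
total_hours = delta.total_seconds() / 60 / 60

print(delta.days, hours_field, total_hours)
```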

SQL Server adding two time columns in a single table and putting result into a third column

I have a table containing two time columns like this:
Time1 Time2
07:34:33 08:22:44
I want to add the times in these two columns and put the result of the addition into a third column, maybe Time3.
Any help would be appreciated. Thanks
If the value you expect as the result is 15:57:17 then you can get it by calculating for instance the number of seconds from midnight for Time1 and add that value to Time2:
select dateadd(second,datediff(second,0,time1),time2) as Time3
from your_table
I'm not sure how meaningful it is to add two discrete time values together, though, unless they are meant to represent durations. In that case the time data type might not be the best choice: it is meant for time-of-day data, only has a range of 00:00:00.0000000 through 23:59:59.9999999, and an addition could overflow (and hence wrap around).
If the result you want isn't 15:57:17 then you should clarify the question and add the desired output.
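The same "seconds since midnight of Time1, added to Time2" idea can be sketched in Python to show both the expected result and the wrap-around. This is an illustration, not SQL Server behaviour verified on a server; the function name add_times and the anchor date are arbitrary choices for the sketch.

```python
from datetime import date, datetime, time, timedelta


def add_times(t1: time, t2: time) -> time:
    """Treat t1 as a duration since midnight and add it to the time of day t2."""
    since_midnight = timedelta(hours=t1.hour, minutes=t1.minute,
                               seconds=t1.second, microseconds=t1.microsecond)
    # A time of day has no date, so anchor it on an arbitrary day to do the addition.
    anchor = datetime.combine(date(2000, 1, 1), t2)
    # Taking .time() drops the date again, so sums past 24:00:00 wrap around.
    return (anchor + since_midnight).time()


time3 = add_times(time(7, 34, 33), time(8, 22, 44))
wrapped = add_times(time(20, 0, 0), time(8, 0, 0))
print(time3, wrapped)
```

The second call shows the overflow concern: 20 hours plus 8 hours comes back as 04:00:00, with the extra day silently lost.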
The engine doesn't understand addition of two time values, because it thinks you can't add two times of day. You get:
Msg 8117, Level 16, State 1, Line 8
Operand data type time is invalid for add operator.
If these are elapsed times, not times of day, you could take them apart with DATEPART, but in SQL Server 2008 you will have to use a CONVERT to put the value back together, plus have all the gymnastics to do base 60 addition.
If you have the option, it would be best to store the time values as NUMERIC with a positive scale, where the unit of measure is hours, and then break them down when finally reporting them. Something like this:
DECLARE
@r NUMERIC(7, 5);
SET @r = 8.856;
SELECT FLOOR(@r) AS Hours, FLOOR(60 * (@r - FLOOR(@r))) AS Minutes, 60 * ((60 * @r) - FLOOR(60 * @r)) AS Seconds
Returns
Hours Minutes Seconds
8 51 21.60000
There is an advantage to writing a user-defined function to do this, to eliminate the repeated 60 * @r calculations.
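For comparison, the decimal-hours breakdown can be written as one small function, which also avoids repeating the multiplication by 60. This is a Python sketch rather than a SQL Server user-defined function; the name split_hours is hypothetical, and Decimal is used so that the example value 8.856 is not subject to binary floating-point rounding. It assumes a non-negative input.

```python
from decimal import Decimal


def split_hours(value: Decimal):
    """Split a non-negative number of hours into (hours, minutes, seconds)."""
    hours = int(value)                      # whole hours
    minutes_total = (value - hours) * 60    # fractional hour expressed in minutes
    minutes = int(minutes_total)            # whole minutes
    seconds = (minutes_total - minutes) * 60  # fractional minute expressed in seconds
    return hours, minutes, seconds


parts = split_hours(Decimal("8.856"))
print(parts)
```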

sqlalchemy select by date column only x newest days

Suppose I have a table MyTable with a column some_date (date type, of course) and I want to select the newest 3 months of data (or x days).
What is the best way to achieve this?
Please notice that the date should not be measured from today but rather from the date range in the table (which might be older than today).
I need to find the maximum date and compare it to each row: if the difference is less than x days, return it.
All of this should be done with sqlalchemy and without loading the entire table.
What is the best way of doing it? Must I have a subquery to find the maximum date? How do I select the last X days?
Any help is appreciated.
EDIT:
The following query works in Oracle but seems inefficient (is max calculated for each row?) and I don't think that it'll work for all dialects:
select * from my_table where (select max(some_date) from my_table) - some_date < 10
You can do this in a single query and without resorting to creating datediff.
Here is an example I used for getting everything in the past day:
from datetime import datetime, timedelta

one_day = timedelta(hours=24)
one_day_ago = datetime.now() - one_day
Message.query.filter(Message.created > one_day_ago).all()
You can adapt the timedelta to whatever time range you are interested in.
UPDATE
Upon re-reading your question, it looks like I failed to take into account the fact that you want to compare two dates which are both in the database, rather than comparing against today's date. I'm pretty sure that this sort of behavior is going to be database specific. In Postgres, you can use straightforward arithmetic.
Operations with DATEs
1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference
DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19
You may add or subtract an INTEGER to a DATE to produce another DATE
DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'
You're probably using timestamps if you are storing dates in Postgres. Doing math with timestamps produces an interval object. SQLAlchemy works with timedeltas as a representation of intervals. So you could do something like:
one_day = timedelta(hours=24)
Model.query.join(ModelB, Model.created - ModelB.created < one_day)
I haven't tested this exactly, but I've done things like this and they have worked.
I ended up doing two selects: one to get the max date and another to get the data.
Using the datediff recipe from this thread, I added a datediff function and used the query q = session.query(MyTable).filter(datediff(max_date, some_date) < 10)
I still don't think this is the best way, but until someone proves me wrong, it will have to do...
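The two-select approach can also be written without a datediff function, by computing the cutoff date on the client and comparing the column to it directly. Since max - some_date < x is the same condition as some_date > max - x, the filter becomes a plain range comparison. The sketch below uses Python's standard-library sqlite3 module instead of SQLAlchemy, purely so that it is self-contained; the table and column names come from the question, while the function name newest_rows and the sample data are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta


def newest_rows(conn, days):
    """Return rows whose some_date is within `days` days of the newest date in the table."""
    # First select: the newest date present in the table (not today's date).
    (max_date_text,) = conn.execute("SELECT MAX(some_date) FROM my_table").fetchone()
    if max_date_text is None:  # empty table
        return []
    cutoff = date.fromisoformat(max_date_text) - timedelta(days=days)
    # Second select: a plain comparison against the precomputed cutoff.
    return conn.execute(
        "SELECT some_date FROM my_table WHERE some_date > ? ORDER BY some_date",
        (cutoff.isoformat(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (some_date TEXT)")  # ISO-8601 text dates
conn.executemany("INSERT INTO my_table VALUES (?)",
                 [("2020-01-01",), ("2020-01-20",), ("2020-01-25",), ("2020-01-31",)])
recent = newest_rows(conn, 10)
print(recent)
```

Because the cutoff is a constant by the time the second query runs, the comparison does not depend on dialect-specific date subtraction.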

SQL Select statement Where time is *:00

I'm attempting to make a filtered table based off an existing table. The current table has a row for every minute of every hour of 24 days, for each location (tmc).
I want to filter this table into another table that has just 1 row per hour for each of the 24 days, for each location (tmc).
Here is the SQL statement that I thought would have done it...
SELECT
Time_Format(t.time, '%H:00') as time, ROUND(AVG(t.avg), 0) as avg,
tmc, Date, Date_Time FROM traffic t
GROUP BY time, tmc, Date
The problem is I still get 247,000 rows affected... and according to simple math I should only have:
Locations (TMCS): 14
Hours in a day: 24
Days tracked: 24
Total = 14 * 24 * 24 = 12,096
My original table has 477,277 rows
When I make a new table off this query I get right around 247,000 rows, which makes no sense, so my query must be wrong.
The reason I used this method instead of a WHERE clause is that I wanted to find the average speed (avg) per hour. This is not mandatory, so I'd be fine with using a WHERE clause on time, but I just don't know how to do this based off *:00.
Any help would be much appreciated.
Fix the GROUP BY so it's standard, rather than relying on the nonstandard MySQL extension:
SELECT
Time_Format(t.time, '%H:00') as time,
ROUND(AVG(t.avg), 0) as avg,
tmc, Date, Date_Time
FROM traffic t
GROUP BY
Time_Format(t.time, '%H:00'), tmc, Date, Date_Time
Run this with SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; to see the errors that other RDBMSs would give you, and to make MySQL enforce standard GROUP BY behaviour.
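To see the effect of grouping on the truncated-hour expression itself, here is a small self-contained sketch using Python's sqlite3 module (not MySQL). The column names obs_date, obs_time and speed are stand-ins for the question's Date, time and avg, and SQLite's strftime plays the role of Time_Format. Date_Time is left out of both the SELECT list and the GROUP BY on the assumption that it differs for every minute, in which case grouping by it would give one group per original row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (tmc TEXT, obs_date TEXT, obs_time TEXT, speed REAL)")
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", [
    ("A", "2012-01-01", "08:00:00", 50),
    ("A", "2012-01-01", "08:30:00", 60),
    ("A", "2012-01-01", "09:15:00", 40),
    ("B", "2012-01-01", "08:05:00", 30),
])

# Group by the hour expression, not by the raw per-minute time column.
hourly = conn.execute("""
    SELECT tmc, obs_date, strftime('%H:00', obs_time) AS hour,
           ROUND(AVG(speed), 0) AS avg_speed
    FROM traffic
    GROUP BY tmc, obs_date, strftime('%H:00', obs_time)
    ORDER BY tmc, obs_date, hour
""").fetchall()
print(hourly)
```

With every selected column either grouped or aggregated, the row count is at most locations x days x hours, which matches the 14 * 24 * 24 estimate in the question.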