TSQL adjustable time interval - sql

I have a TSQL query that is returning a list of variable names and their values at a point in time. Currently it is truncating the datetime column to give me a minute-by-minute result set.
It would be incredibly useful to me to be able to specify whatever interval of data I want. Every x seconds, every x minutes, or every x hours.
I cannot GROUP BY because I do not want to aggregate the selected values.
Here is my current query:
SELECT time, var_name, value
FROM (
SELECT time, var_name, value, ROW_NUMBER() over (partition by var_id, convert(varchar(16), time, 121) order by time desc) as seqnum
FROM var_values vv
JOIN var_names vn ON vn.id = vv.tag_id
WHERE ( var_id = 1 OR var_id = 2)
AND time >= '2013-06-04 00:00:00' AND time < '2013-06-04 16:20:17'
) k
WHERE seqnum = 1
ORDER BY time;
And the result set:
2013-06-04 00:20:52.847 Random.Boolean 0
2013-06-04 00:20:52.850 Random.Int1 76
2013-06-04 00:21:52.893 Random.Boolean 1
2013-06-04 00:21:52.897 Random.Int1 46
2013-06-04 00:22:52.920 Random.Boolean 1
2013-06-04 00:22:52.927 Random.Int1 120
Also just to be complete, I want to retain the ability to modify the WHERE clause to choose which var_id's I want in my result set.

You should be able to partition by the Unix timestamp divided by your required interval in seconds:
ROW_NUMBER() OVER (PARTITION BY var_id, DATEDIFF(SECOND, {d '1970-01-01'}, time) / 60 -- 60 seconds
ORDER BY time DESC) AS seqnum
The integer division gives the same result for every row within the same 60-second window, which puts all rows in that interval into the same partition.
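Plugged into the original query, with the interval pulled out into a variable so it is easy to change (the @IntervalSeconds name is my own, not part of your schema), a sketch might look like:

```sql
-- Sketch: keep the most recent row per variable per interval.
-- @IntervalSeconds is a made-up name; set it to 30, 300, 3600, ...
DECLARE @IntervalSeconds int = 300;  -- every 5 minutes

SELECT time, var_name, value
FROM (
    SELECT time, var_name, value,
           ROW_NUMBER() OVER (
               PARTITION BY var_id,
                            DATEDIFF(SECOND, {d '1970-01-01'}, time) / @IntervalSeconds
               ORDER BY time DESC) AS seqnum
    FROM var_values vv
    JOIN var_names vn ON vn.id = vv.tag_id
    WHERE var_id IN (1, 2)
      AND time >= '2013-06-04 00:00:00' AND time < '2013-06-04 16:20:17'
) k
WHERE seqnum = 1
ORDER BY time;
```

One caveat: DATEDIFF(SECOND, ...) returns an int, so the epoch trick overflows for dates roughly 68 years past the anchor; with a 1970 anchor that is fine for this data.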

Related

Snowflake SQL Time Breakdown

I have a table with a timestamp for when an incident occurred and the downtime associated with that timestamp (in minutes). I want to break down this table by minute using Time_slice and show the minute associated with each slice. For example:
Time Duration
11:34 4.5
11:40 2
to:
time Duration
11:34 1
11:35 1
11:36 1
11:37 1
11:38 0.5
11:39 1
11:40 1
How can I accomplish this?
If you are fine with the same minute being listed multiple times when the input time + duration ranges overlap, then you can do this:
WITH big_list_of_numbers AS (
SELECT
ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
DATEADD('minute', r.rn, t.time) AS time,
IFF(t.duration - r.rn < 1, t.duration - r.rn, 1) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
ON r.rn < t.duration
ORDER BY 1
If you want the total per minute, you can put a grouping on it like:
WITH big_list_of_numbers AS (
SELECT
ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
DATEADD('minute', r.rn, t.time) AS time,
SUM(IFF(t.duration - r.rn < 1, t.duration - r.rn, 1)) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
ON r.rn < t.duration
GROUP BY 1
ORDER BY 1
The GENERATOR needs a fixed input, so just use a huge number; it's not that expensive. Also, the SEQx() functions can (and do) have gaps in them, so for data where you need continuous values (like this example) SEQx() needs to be fed into a ROW_NUMBER() to force gap-free allocation of numbers.
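A minimal way to see the difference (Snowflake syntax; the column aliases are my own):

```sql
-- SEQ4() can skip values when the query runs in parallel;
-- ROW_NUMBER() over it always yields a dense 0,1,2,... sequence.
SELECT SEQ4() AS may_have_gaps,
       ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS gap_free
FROM TABLE(GENERATOR(ROWCOUNT => 10));
```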

SQL Server Return 4 rows for every hour in a day

I have a query that returns all messages from a device within a day (simplified):
SELECT date, value
FROM Messages
WHERE date between '04/01/2018 00:00:00' AND '04/01/2018 23:59:59'
ORDER BY date asc
The problem is that it returns too many rows: at least 1 row per minute (1440 rows in a day), and I have to plot that in a chart.
How could I return the first row of every quarter hour, so I get 4 rows for every hour of the day?
Expected result:
date value
2018-01-04 05:00:00.000 || 5,52
2018-01-04 05:15:00.000 || 5,48
2018-01-04 05:30:00.000 || 5,35
2018-01-04 05:45:00.000 || 5,42
You can do it with the modulus operator (%) as follows:
SELECT date, value
FROM Messages
WHERE date between '04/01/2018 00:00:00' AND '04/01/2018 23:59:59' and (datepart(minute,date) % 15) = 0
ORDER BY date asc;
This query returns only the rows whose minute part divides evenly by 15 (the quarter hours). I think this may solve your problem.
Note: I did not filter on seconds because, as your question describes, data is added once per minute.
In case you have more than one row in one minute or rows do not exactly match hour:minute pattern, you can use following:
SELECT * INTO tab FROM (VALUES
('2018-01-01 05:00:01', 1),
('2018-01-01 05:10', 2),
('2018-01-01 05:20', 3),
('2018-01-01 05:28', 4),
('2018-01-01 05:31', 5)
) T(Date,Value)
SELECT Date,Value FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CAST(Date AS DATE),
DATEPART(HOUR,Date),
DATEPART(MINUTE,Date)/15
ORDER BY Date) RowNum FROM tab
) T WHERE RowNum=1
It returns:
Date Value
---- -----
2018-01-01 05:00:01 1
2018-01-01 05:20 3
2018-01-01 05:31 5
You could simply use a "like" condition, relying on the implicit conversion of the datetime to a string:
and (date like '%00:00.000' or date like '%15:00.000' or date like '%30:00.000' or date like '%45:00.000')
Use a modulo function on the minute part of the date
select *
from mytable T1
where datepart(minute, T1.date)%15 = 0
I would start by getting a row_number partitioned by 15 minute intervals.
SELECT Truncdate, value FROM (
SELECT date
, value
, dateadd(minute, datediff(minute, 0, date) / 15 * 15, 0) AS TruncDate
, row_number() OVER (PARTITION BY dateadd(minute, datediff(minute, 0, date) / 15 * 15, 0) ORDER BY date) as RowNum
FROM messages
) x
WHERE x.rownum = 1
You can change Truncdate to date in the outer select if you want to see the actual first datetime in that 15 minute block instead of the block-rounded date.
Edit: I didn't actually read the question closely. This approach has the advantage that it will still get the first value in each block even if it occurs after the minute the block starts, which I now notice isn't a requirement for your solution.

Postgres where clause over two columns

Database - I am working on in Postgres 9.6.5
I am analyzing the data from US Airport Authority (RITA) about the flights arrival and departures.
This link (http://stat-computing.org/dataexpo/2009/the-data.html) lists all the columns in the table.
The table has following 29 columns
No Name Description
1 Year 1987-2008
2 Month 1-12
3 DayofMonth 1-31
4 DayOfWeek 1 (Monday) - 7 (Sunday)
5 DepTime actual departure time (local, hhmm)
6 CRSDepTime scheduled departure time (local, hhmm)
7 ArrTime actual arrival time (local, hhmm)
8 CRSArrTime scheduled arrival time (local, hhmm)
9 UniqueCarrier unique carrier code
10 FlightNum flight number
11 TailNum plane tail number
12 ActualElapsedTime in minutes
13 CRSElapsedTime in minutes
14 AirTime in minutes
15 ArrDelay arrival delay, in minutes
16 DepDelay departure delay, in minutes
17 Origin origin IATA airport code
18 Dest destination IATA airport code
19 Distance in miles
20 TaxiIn taxi in time, in minutes
21 TaxiOut taxi out time in minutes
22 Cancelled was the flight cancelled?
23 CancellationCode reason for cancellation (A = carrier, B = weather, C = NAS, D = security)
24 Diverted 1 = yes, 0 = no
25 CarrierDelay in minutes
26 WeatherDelay in minutes
27 NASDelay in minutes
28 SecurityDelay in minutes
29 LateAircraftDelay in minutes
There are about a million rows for each year.
I am trying to count flights per airport where the delay is more than 15 minutes, to find the busiest airports.
The column depdelay has the delay time.
The column origin is the origin code for the airport.
All the data has been loaded into a table called 'ontime'
I am forming the query as follows in stages.
Select airports where delay is more than 15 minutes:
select origin,year,count(*) as depdelay_count from ontime
where
depdelay > 15
group by year,origin
order by depdelay_count desc;
Now I wish to pull out only the top 10 airports per year - which I am doing as follows
select x.origin,x.year from (with subquery as (
select origin,year,count(*) as depdelay_count from ontime
where
depdelay > 15
group by year,origin
order by depdelay_count desc
)
select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 10;
Now that I have the top 10 airports by depdelay - I wish to get a count of the total flights out of these airports.
select origin,count(*) from ontime where origin in
(select x.origin from (with subquery as (
select origin,year,count(*) as depdelay_count from ontime
where
depdelay > 15
group by year,origin
order by depdelay_count desc
)
select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2)
group by origin
order by origin;
If I modify the Step 3 query by adding the year to the WHERE clause
---- <YEAR> will be any value from 1987 to 2008
select origin,count(*) from ontime where year = <YEAR> and origin in
(select x.origin from (with subquery as (
select origin,year,count(*) as depdelay_count from ontime
where
depdelay > 15
group by year,origin
order by depdelay_count desc
)
select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2)
group by origin
order by origin;
But I have to do this manually for all years from 1987 to 2008 which I want to avoid.
Please can you help refine the query so that I can select the data for all the years without having to select each year manually.
I find CTEs in the middle of queries to be confusing. You can basically do this with one CTE/subquery:
with oy as (
select origin, year, count(*) as numflights,
sum( (depdelay > 15)::int ) as depdelay_count,
row_number() over (partition by year order by sum( (depdelay > 15)::int ) desc) as seqnum
from ontime
group by origin, year
)
select oy.*
from oy
where seqnum <= 10;
Note the use of conditional aggregation and using window functions with aggregation functions.
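To finish the original Step 3 (total flights out of each year's top airports), the same CTE can be reused with one more aggregation; a sketch, repeating the oy CTE so it stands alone:

```sql
-- Sketch: total flights per year across each year's top-10 delay airports,
-- reusing the conditional-aggregation CTE from the answer above.
with oy as (
    select origin, year, count(*) as numflights,
           row_number() over (partition by year
                              order by sum( (depdelay > 15)::int ) desc) as seqnum
    from ontime
    group by origin, year
)
select year, sum(numflights) as total_flights
from oy
where seqnum <= 10
group by year
order by year;
```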

Query aggregated data with a given sampling time

Suppose my raw data is:
Timestamp High Low Volume
10:24.22345 100 99 10
10:24.23345 110 97 20
10:24.33455 97 89 40
10:25.33455 60 40 50
10:25.93455 40 20 60
With a sample time of 1 second, the output data should be as following (add additional column):
Timestamp High Low Volume Count
10:24 110 89 70 3
10:25 60 20 110 2
The sampling unit varies: 1 second, 5 sec, 1 minute, 1 hour, 1 day, ...
How can I query the sampled data quickly in the PostgreSQL database with Rails?
I want to fill all the intervals, but I am getting this error:
ERROR: JOIN/USING types bigint and timestamp without time zone cannot be matched
SQL
SELECT
t.high,
t.low
FROM
(
SELECT generate_series(
date_trunc('second', min(ticktime)) ,
date_trunc('second', max(ticktime)) ,
interval '1 sec'
) FROM czces AS g (time)
LEFT JOIN
(
SELECT
date_trunc('second', ticktime) AS time ,
max(last_price) OVER w AS high ,
min(last_price) OVER w AS low
FROM czces
WHERE product_type ='TA' AND contract_month = '2014-08-01 00:00:00'::TIMESTAMP
WINDOW w AS (
PARTITION BY date_trunc('second', ticktime)
ORDER BY ticktime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
) t USING (time)
ORDER BY 1
) AS t ;
Simply use date_trunc() before you aggregate. Works for basic time units 1 second, 1 minute, 1 hour, 1 day - but not for 5 sec. Arbitrary intervals are slightly more complex, see link below!
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;
If there are no rows for a sample point, you get no row in the result. If you need one row for every sample point:
SELECT g.timestamp, t.high, t.low, t.volume, t.ct
FROM (SELECT generate_series(date_trunc('second', min(timestamp))
,date_trunc('second', max(timestamp))
,interval '1 sec') AS g (timestamp) -- or minute ...
LEFT JOIN (
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
) t USING (timestamp)
ORDER BY 1;
The LEFT JOIN is essential.
For arbitrary intervals:
Best way to count records by arbitrary time intervals in Rails+Postgres
Retrieve aggregates for arbitrary time intervals
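The gist of the arbitrary-interval technique from those links: floor the epoch to a multiple of the step, then convert back. A sketch with a 5-second step (table and column names follow the answer above):

```sql
-- Sketch: bucket rows into arbitrary 5-second intervals
-- by flooring the epoch to a multiple of the step.
SELECT to_timestamp(floor(extract(epoch FROM timestamp) / 5) * 5) AS ts_bucket
     , max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
```

Changing the constant 5 changes the grid; date_trunc() is no longer needed at all with this form.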
Aside: Don't use timestamp as column name. It's a basic type name and a reserved word in standard SQL. It's also misleading for data that's not actually a timestamp.

PostgreSQL sum of intervals

In my database I have rows like:
date , value
16:13:00, 500
16:17:00, 700
16:20:00, 0
Now I want to do "special sum" over value from 16:00:00 to 17:00:00. So until 16:13 we assume that we have 0.
So special sum would look like (I'll omit seconds):
...
0 + -- (16:12)
500 + -- (16:13)
500 + -- (16:14)
500 + -- (16:15)
500 + -- (16:16)
700 + -- (16:17)
700 + -- (16:18)
700 + -- (16:19)
0 + -- (16:20)
...
So I have in database only changes of value and when this change occurs. And I want to sum over the whole hour. Result of this should be 4100.
What is the optimal way of doing that kind of sum in sql with PostgreSQL?
Best
You could at first select only the hour of your timestamp and then group by this hour:
SELECT
sum(s.value),
s.hour
FROM
(SELECT
value,
EXTRACT(HOUR FROM time) as hour
FROM la_table) as s
GROUP BY s.hour
This way you would just get values from 15:00:00 to 15:59:59 of course.
SQLFiddle for playing: http://sqlfiddle.com/#!1/d6ad1/1
If I've understood you correctly, you are simply looking for totals per hour?
SELECT EXTRACT(hour FROM "date") hr,
SUM(value) total
FROM yourtable
GROUP BY hr
ORDER BY hr;
If I understand, you wish to select every entry whose occurrence overlaps a time period. In this example, any row with a time range overlapping 10:00 to 11:00 is selected, whether it starts before or during the period, and whether it ends during or after the period.
select * from table
where table.start_time < end_of_period and table.end_time > start_of_period
select * from table
where (start_time < '2017-05-16 11:00:00') and (end_time > '2017-05-16 10:00:00')
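None of the answers above reproduces the minute-by-minute "special sum" of 4100 from the question (they total the raw rows instead of the step function). A sketch that does, using generate_series plus a LATERAL lookup of the most recent value at each minute (the la_table name and date/value columns follow the first answer; the dummy calendar date exists only because generate_series needs timestamps):

```sql
-- Sketch: walk 16:00..16:59 one minute at a time; for each minute take the
-- latest value at or before it (0 if none yet), then sum the 60 samples.
SELECT sum(coalesce(v.value, 0)) AS special_sum
FROM   generate_series(timestamp '2000-01-01 16:00',
                       timestamp '2000-01-01 16:59',
                       interval '1 minute') AS g(minute)
LEFT   JOIN LATERAL (
    SELECT value
    FROM   la_table
    WHERE  la_table.date <= g.minute::time
    ORDER  BY la_table.date DESC
    LIMIT  1
) v ON true;
```

For the sample data (500 at 16:13, 700 at 16:17, 0 at 16:20) this counts 500 for four minutes and 700 for three, matching the expected 4100.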