Query with Interval in Oracle (using JFreeChart) - sql

I am assembling a query to show an experiment in JFreeChart. The query works fine, but the display in JFreeChart does not. The chart sorts the intervals as strings (an interval like 60<TA<=120 comes last in the chart when it should be second). Here is an example using five intervals of 60 minutes each (TA is a numeric field and means Time Average):
SELECT INTERVAL, COUNT(*) TOTAL FROM (SELECT CASE WHEN TA>0 AND TA<=60.00 THEN '0<TA<=60.00' WHEN TA>60.00 AND TA<=120.00 THEN '60.00<TA<=120.00' WHEN TA>120.00 AND TA<=180.00 THEN '120.00<TA<=180.00' WHEN TA>180.00 AND TA<=240.00 THEN '180.00<TA<=240.00' WHEN TA>240.00 THEN '240.00<TA' END INTERVAL, TA FROM MP) GROUP BY INTERVAL HAVING INTERVAL IS NOT NULL ORDER BY INTERVAL
How can I display the intervals in the correct order without changing my query too much? It is assembled on the fly depending on the user's choice.

If the INTERVAL column will always start with a valid number followed by <, you can convert its first value to a numeric for sorting:
SELECT INTERVAL, COUNT(*) TOTAL
FROM (
SELECT
CASE
WHEN TA>0 AND TA<=60.00 THEN '0<TA<=60.00'
WHEN TA>60.00 AND TA<=120.00 THEN '60.00<TA<=120.00'
WHEN TA>120.00 AND TA<=180.00 THEN '120.00<TA<=180.00'
WHEN TA>180.00 AND TA<=240.00 THEN '180.00<TA<=240.00'
WHEN TA>240.00 THEN '240.00<TA'
END INTERVAL,
TA
FROM MP
)
WHERE INTERVAL IS NOT NULL
GROUP BY INTERVAL
ORDER BY TO_NUMBER(SUBSTR(INTERVAL, 1, INSTR(INTERVAL, '<') - 1));
Also, I've changed the HAVING Interval IS NOT NULL to WHERE Interval IS NOT NULL because HAVING is for aggregated values such as COUNT(*), not for grouping values like INTERVAL.
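The sorting problem and the fix can be reproduced outside the database. The following Python sketch is only an illustration (the helper name leading_number is made up here); it mirrors what TO_NUMBER(SUBSTR(INTERVAL, 1, INSTR(INTERVAL, '<') - 1)) does in the ORDER BY:

```python
labels = ["0<TA<=60.00", "60.00<TA<=120.00", "120.00<TA<=180.00",
          "180.00<TA<=240.00", "240.00<TA"]

def leading_number(label):
    """Return the number in front of the first '<' of an interval label."""
    return float(label[:label.index("<")])

# Plain string ordering, which is what ORDER BY INTERVAL does
as_strings = sorted(labels)
# Ordering by the numeric prefix, which is what the rewritten ORDER BY does
as_numbers = sorted(labels, key=leading_number)
```

With string ordering the label starting with 60.00 ends up last, because the character '6' sorts after '0', '1' and '2'; ordering by the numeric prefix restores the intended sequence.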
Addendum: If the number of intervals will vary, the query below may work out better for you. It calculates the interval text just like the query above, but it can handle any number of intervals. The first CASE condition handles values outside the range (n<TA); the second condition handles values within the range (m<TA<=n).
I've pointed out the values you'll need to set for each query.
SELECT Interval, COUNT(*) FROM (
SELECT
TA,
CASE
WHEN TA > Parm_IntSize * Parm_IntCount THEN
TO_CHAR(Parm_IntSize * Parm_IntCount) || '<TA'
ELSE
TO_CHAR((CEIL(TA / Parm_IntSize) - 1) * Parm_IntSize)
|| '<TA<='
|| TO_CHAR(CEIL(TA / Parm_IntSize) * Parm_IntSize) -- CEIL keeps a boundary value such as 60 in 0<TA<=60
END AS Interval
FROM MP
CROSS JOIN (
SELECT
60 AS Parm_IntSize, -- Specify interval size here
4 AS Parm_IntCount -- Specify number of intervals here
FROM DUAL
) Parms
)
WHERE Interval IS NOT NULL
GROUP BY Interval
ORDER BY TO_NUMBER(SUBSTR(Interval, 1, INSTR(Interval, '<') - 1))
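The label arithmetic of the parameterized query can be checked with a small Python sketch. It assumes positive TA values and an inclusive upper bound (m<TA<=n), as in the first query; the function name interval_label and its defaults are hypothetical and only mirror Parm_IntSize and Parm_IntCount:

```python
import math

def interval_label(ta, size=60, count=4):
    """Return the label of the bucket that contains ta (ta assumed > 0)."""
    if ta > size * count:
        # Values beyond the last bounded interval
        return f"{size * count}<TA"
    # Rounding up makes a boundary value (e.g. 60) fall into the lower bucket
    upper = math.ceil(ta / size) * size
    return f"{upper - size}<TA<={upper}"
```

For example, interval_label(60) gives 0<TA<=60 and interval_label(61) gives 60<TA<=120, which matches the hand-written CASE expression.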

Related

How to compute window function for each nth row in Presto?

I am working with a table that contains timeseries data, with a row for each minute for each user.
I want to compute some aggregate functions on a rolling window of N calendar days.
This is achieved via
SELECT
SOME_AGGREGATE_FUN(col) OVER (
PARTITION BY user_id
ORDER BY timestamp
ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
) as my_col
FROM my_table
However, I am only interested in the result of this at a daily scale.
i.e. I want the window to be computed only at 00:00:00, but I want the window itself to contain all the minute-by-minute data to be passed into my aggregate function.
Right now I am doing this:
WITH agg_results AS (
SELECT
user_id,
timestamp_col, -- needed by the outer WHERE clause
SOME_AGGREGATE_FUN(col) OVER (
PARTITION BY user_id
ORDER BY timestamp_col
ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
) AS my_col
FROM my_table
)
SELECT * FROM agg_results
WHERE
timestamp_col = DATE_TRUNC('day', "timestamp_col")
This works in theory, but it does 60 * 24 times more computations than necessary, resulting in the query being super slow.
Essentially, I am trying to find a way to make the right window bound skip rows based on a condition. Or, if it is simpler to implement, for every nth row (as I have a constant number of rows for each day).
I don't think that's possible with window functions. You could switch to a subquery instead, assuming that your aggregate function works as a regular aggregate function too (that is, without an OVER() clause):
select
timestamp_col,
(
select some_aggregate_fun(t1.col)
from my_table t1
where
t1.user_id = t.user_id
and t1.timestamp_col >= t.timestamp_col - interval '1' day
and t1.timestamp_col <= t.timestamp_col
)
from my_table t
where timestamp_col = date_trunc('day', timestamp_col)
I am unsure that this would perform better than your original query though; you might need to assess that against your actual dataset.
You can change interval '1' day to the actual interval you want to use.
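The idea behind the correlated subquery (aggregate only at midnight rows, but over all minute rows inside the window) can be sketched in Python. This is a minimal illustration for a single user, not how Presto executes it; the function name and the (timestamp, value) row layout are assumptions:

```python
from datetime import datetime, timedelta

def daily_window_aggregates(rows, window=timedelta(days=1), agg=sum):
    """rows: list of (timestamp, value) pairs for one user.

    Returns {midnight: agg(values with midnight - window <= ts <= midnight)}.
    """
    result = {}
    for ts, _ in rows:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        if ts != midnight:
            continue  # only rows at 00:00:00 produce an output row
        result[ts] = agg(v for t, v in rows if ts - window <= t <= ts)
    return result
```

The aggregate is evaluated once per day instead of once per minute, which is the saving the question asks for; whether the database realizes that saving depends on its planner and the available indexes.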

examine if one time series column of table has two adjacent time points which have interval larger than certain length

I am doing data preprocessing on a table that contains a time series column.
toy example Table A
timestamp value
12:30:24 1
12:32:21 3
12:33:21 4
The timestamp column is ordered and always increases.
Is it possible to define a function or something else that returns true when the table has two adjacent time points whose interval is larger than a certain length, and false otherwise?
I am using PostgreSQL, thank you.
SQL Fiddle
select bool_or(bigger_than) as bigger_than
from (
select
time - lag(time) over (order by time)
>
interval '1 minute' as bigger_than
from table_a
) s;
bigger_than
-------------
t
bool_or returns true if at least one of the input values is true.
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Your sample data shows a time value. But it works the same for a timestamp
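The same check, comparing each row with its predecessor as lag() does, can be written as a short Python sketch. The function name is hypothetical and the sample values are the ones from the question:

```python
from datetime import datetime, timedelta

def has_gap_larger_than(timestamps, limit):
    """True if any two adjacent timestamps are more than `limit` apart."""
    ordered = sorted(timestamps)
    # zip pairs every timestamp with its successor, like lag() over (order by ...)
    return any(later - earlier > limit
               for earlier, later in zip(ordered, ordered[1:]))

sample = [datetime(2000, 1, 1, 12, 30, 24),
          datetime(2000, 1, 1, 12, 32, 21),
          datetime(2000, 1, 1, 12, 33, 21)]
```

With the sample data the largest gap is 1 minute 57 seconds, so a limit of one minute yields True and a limit of two minutes yields False.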
Something like this:
select count(*) > 0
from (
select timestamp,
lag(timestamp) over (order by timestamp) as prev_ts
from table_a
) t
where timestamp - prev_ts > interval '1' minute;
It calculates the difference between a timestamp and its "previous" timestamp. The order of the rows is defined by the timestamp column. The outer query then counts the rows where the difference is larger than 1 minute; the result is true if there is at least one such row.
lag() is a window function. More details on those can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html

calculating average with grouping based on time intervals

In a Postgres table I store the speed of an object at 10-second intervals. The values are not available for every 10 seconds during the day, so it could be that there is no row for today 16:39:40.
What would the query look like to get a relation containing the average of the speed for 1-minute (or 30-second or n-second) intervals for a given day, assuming the non-existing rows mean a speed of 0?
speed_table
id (int, pk)
ts (timestamp)
speed (numeric)
I've built this query but am getting stuck on some important parts:
SELECT
date_trunc('minute', ts) AS truncated,
avg(speed)
FROM speed_table AS t
WHERE ts >= '2014-06-21 00:00:00'
AND ts <= '2014-06-21 23:59:59'
AND condition2 = 'something'
GROUP BY date_trunc('minute', ts)
ORDER BY truncated
How can I change the interval to something other than the result of the date_trunc function, e.g. 5 minutes or 30 seconds?
How can I add the not available rows for the remaining of the day?
Simple and fast solution for this particular example:
SELECT date_trunc('minute', ts) AS minute
, sum(speed)/6 AS avg_speed
FROM speed_table AS t
WHERE ts >= '2014-06-21 0:0'
AND ts < '2014-06-22 0:0' -- exclude dangling corner case
AND condition2 = 'something'
GROUP BY 1
ORDER BY 1;
You need to factor in missing rows as "0 speed". Since a minute has 6 samples, just sum and divide by 6. Missing rows evaluate to 0 implicitly.
This returns no row for minutes with no rows at all. The avg_speed for such missing result rows is implicitly 0.
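Why dividing the sum by 6 differs from a plain average can be seen in a tiny Python sketch. It assumes exactly 6 expected samples per minute (one every 10 seconds); the function name is made up for illustration:

```python
def minute_average(recorded_speeds, expected_samples=6):
    """Average over one minute, counting every missing sample as speed 0."""
    return sum(recorded_speeds) / expected_samples

recorded = [30, 30, 30]  # only three of the six samples exist
plain_avg = sum(recorded) / len(recorded)
zero_filled_avg = minute_average(recorded)
```

Here the plain average is 30, while the zero-filled average is 15, which is the value the question asks for.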
General query for arbitrary intervals
Works for any interval listed in the manual for date_trunc():
SELECT date_trunc('minute', g.ts) AS ts_start
, avg(COALESCE(speed, 0)) AS avg_speed
FROM (SELECT generate_series('2014-06-21 0:0'::timestamp
, '2014-06-22 0:0'::timestamp
, '10 sec'::interval) AS ts) g
LEFT JOIN speed_table t USING (ts)
WHERE (t.condition2 = 'something' OR
t.condition2 IS NULL) -- depends on actual condition!
AND g.ts <> '2014-06-22 0:0'::timestamp -- exclude dangling corner case
GROUP BY 1
ORDER BY 1;
The problematic part is the additional unknown condition. You would need to define that. And decide whether missing rows supplied by generate_series should pass the test or not (which can be tricky!).
I let them pass in my example (and all other rows with NULL values).
Compare:
PostgreSQL: running count of rows for a query 'by minute'
Arbitrary intervals:
Truncate timestamp to arbitrary intervals
For completely arbitrary intervals consider @Clodoaldo's math based on epoch values or use the often overlooked function width_bucket(). Example:
Aggregating (x,y) coordinate point clouds in PostgreSQL
Since you did not post any sample data I could not test this, so it may contain errors. Point them out, including the error message, so I can fix them.
select
to_timestamp(
(extract(epoch from ts.ts)::integer / (60 * 2)) * (60 * 2) -- qualified: both joined tables have a ts column
) as truncated,
avg(coalesce(speed, 0)) as avg_speed
from
generate_series (
'2014-06-21 00:00:00'::timestamp,
'2014-06-22'::timestamp - interval '1 second',
'10 seconds'
) ts (ts)
left join
speed_table t on ts.ts = t.ts and condition2 = 'something'
group by 1
order by 1
The example groups by 2-minute intervals: the number of seconds since 1970-01-01 00:00:00 (the epoch) is integer-divided by 120 (60 * 2) and then multiplied by 120 again. To group by 5 minutes, use 60 * 5 instead; to group by 30 seconds, use 30.
The generate_series in the example generates timestamps at 10-second intervals. It is left outer joined to the speed table so it fills the gaps. When the speed is null, coalesce returns 0.
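The epoch arithmetic can be tried in isolation. The Python sketch below assumes timezone-aware UTC timestamps and a hypothetical helper name; it integer-divides the epoch seconds by the bucket length and multiplies back, as the to_timestamp(...) expression does:

```python
from datetime import datetime, timezone

def truncate_to_bucket(ts, bucket_seconds):
    """Round ts down to the start of its bucket of bucket_seconds seconds."""
    epoch = int(ts.timestamp())
    start = epoch // bucket_seconds * bucket_seconds
    return datetime.fromtimestamp(start, tz=timezone.utc)

ts = datetime(2014, 6, 21, 16, 39, 40, tzinfo=timezone.utc)
two_minutes = truncate_to_bucket(ts, 60 * 2)   # 16:38:00
five_minutes = truncate_to_bucket(ts, 60 * 5)  # 16:35:00
half_minute = truncate_to_bucket(ts, 30)       # 16:39:30
```

Only the bucket length in seconds changes between the three calls, which is why this approach works for intervals that date_trunc() does not offer.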

LEFT JOIN using subquery not returning null results as "0"

I'm writing a query that searches a database of parking tickets and counts how many have been issued on each quarter hour.
Two moving parts here. The subquery generates 15-minute increment timestamps using generate_series and stores it as "timestep." The outer query then joins using another timestamp statement that rounds everything to the lowest quarter hour.
Right now, it's not returning null results as "0," which is what I'm trying to accomplish by the join.
Essentially, I'm looking for output like this:
12:00: 0
12:15: 10
12:45: 5
13:00: 0
...and so on.
SELECT count(*), timer.timestep
FROM violations
left join (SELECT (hourstep || ':' || minutestep)::time AS timestep
from generate_series(0,23) AS hourstep,
generate_series(0,59, 15) AS minutestep)
AS timer
ON timer.timestep =
(extract(hour from violations."InfractionTime")
|| ':' ||
((extract(minute FROM violations."InfractionTime")::int / 15)*15))::time
GROUP BY timer.timestep
I think the problem is that you had the VIOLATIONS table as the left-side table, so it was the basis of joining to your time step result set. If no record in violations, it didn't care if there was an entry in your time step result.
I reversed so your LEFT-SIDE table was that of the time-step so you ALWAYS get all 15 minute intervals... then LEFT JOIN to the violations. If no records in violations, it should keep the time slot, but have zero as you are looking for.
SELECT
timer.timestep,
count(violations."InfractionTime") -- counts only matched rows, so empty slots give 0
FROM
( SELECT
(hourstep || ':' || minutestep)::time AS timestep
from
generate_series(0,23) AS hourstep,
generate_series(0,59, 15) AS minutestep) AS timer
LEFT JOIN violations
ON timer.timestep =
(extract(hour from violations."InfractionTime")
|| ':' ||
((extract(minute FROM violations."InfractionTime")::int / 15)*15))::time
group by
timer.timestep
The switcheroo with your LEFT JOIN has been cleared up already. In addition I would suggest this simpler and faster query:
SELECT step::time AS timestep, count(v."InfractionTime") AS ct -- count(*) would report 1 for empty slots
FROM generate_series('2000-1-1 00:00'::timestamp
, '2000-1-1 23:45'::timestamp
, '15 min'::interval ) AS t(step)
LEFT JOIN violations v
ON v."InfractionTime"::time >= t.step::time
AND v."InfractionTime"::time < t.step::time + interval '15 min'
GROUP BY 1
ORDER BY 1
This uses the second form of generate_series() that works with timestamps directly. Not with time, though, so I take a staging day and cast the result to time (very cheap!)
The rewritten query uses sargable expressions for the join, therefore, a plain index on "InfractionTime" can be utilized to great effect - if you query a smaller sample of the table. As long as you query the whole table, Postgres will use a sequential scan anyways.
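The effect of making the time steps the driving side of the join can be sketched in Python: every quarter hour is listed first, and slots without violations keep a count of 0. The function name and the use of plain time objects are assumptions for illustration:

```python
from collections import Counter
from datetime import time

def quarter_hour_counts(infraction_times):
    """Return (slot, count) for all 96 quarter hours, including empty ones."""
    slots = [time(h, m) for h in range(24) for m in (0, 15, 30, 45)]
    # Round every infraction time down to the start of its quarter hour
    counts = Counter(time(t.hour, t.minute // 15 * 15) for t in infraction_times)
    # Counter returns 0 for slots that never occurred
    return [(slot, counts[slot]) for slot in slots]
```

Iterating over the generated slots rather than over the violations is the Python counterpart of putting the generate_series result on the left side of the LEFT JOIN.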

Group by data intervals

I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.
If the user requests the data on 15 minute intervals (within a 24 hour period given start and end date), is it possible with a single query to get the data I require or would I have to write a stored procedure/cursor to do this? Users may then request 5 minute intervals data etc.
I will most likely be using Postgres but are there other NOSQL options which would be better?
Any ideas?
WITH t AS (
SELECT ts, (random()*100)::int AS bandwidth
FROM generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
)
SELECT date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,count(*) AS rows_in_timeslice -- optional
,sum(bandwidth) AS sum_bandwidth
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND ts < '2012-09-03 00:00:00+02'::timestamptz -- careful with borders
GROUP BY 1, 2
ORDER BY 1, 2;
The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part, you work with your table instead.)
Here is a very similar solution for a very similar question - with detailed explanation how this particular aggregation works:
date_trunc 5 minute interval in PostgreSQL
Here is a similar solution for a similar question concerning running sums - with detailed explanation and links for the various functions used:
PostgreSQL: running count of rows for a query 'by minute'
Additional question in comment
WITH -- same as above ...
SELECT DISTINCT ON (1,2)
date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,bandwidth AS bandwith_sample_at_min15
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz
AND ts < '2012-09-03 00:00:00+02'::timestamptz
ORDER BY 1, 2, ts DESC;
Retrieves one un-aggregated sample per 15 minute interval - from the last available row in the window. This will be the 15th minute if the row is not missing. Crucial parts are DISTINCT ON and ORDER BY.
More information about the used technique here:
Select first row in each GROUP BY group?
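What DISTINCT ON (1,2) combined with ORDER BY 1, 2, ts DESC selects can be imitated in a few lines of Python. This is only a sketch with a hypothetical function name; rows are assumed to be (timestamp, bandwidth) pairs:

```python
from datetime import datetime

def last_sample_per_slot(rows, width=15):
    """Return {(hour_start, slot_number): bandwidth of the latest row in the slot}."""
    latest = {}
    for ts, bandwidth in sorted(rows):
        key = (ts.replace(minute=0, second=0, microsecond=0), ts.minute // width)
        latest[key] = bandwidth  # later rows overwrite earlier ones
    return latest
```

Because the rows are processed in ascending time order, the value that remains for each slot is the one from the last available row, just as the descending sort does for DISTINCT ON.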
select
date_trunc('hour', d) +
(((extract(minute from d)::integer / 5 * 5)::text) || ' minute')::interval
as "from",
date_trunc('hour', d) +
((((extract(minute from d)::integer / 5 + 1) * 5)::text) || ' minute')::interval
- '1 second'::interval
as "to",
sum(random() * 1000) as bandwidth
from
generate_series('2012-01-01', '2012-01-31', '1 minute'::interval) s(d)
group by 1, 2
order by 1, 2
;
That is for 5-minute ranges. For 15-minute ranges, divide and multiply by 15 instead of 5.
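The "from" and "to" expressions can be checked with a Python sketch that takes the range width as a parameter. The function name is hypothetical; the arithmetic follows the query (truncate to the hour, add whole multiples of the width, subtract one second for the upper bound):

```python
from datetime import datetime, timedelta

def minute_range(d, width=5):
    """Return (start, end) of the width-minute range that contains d."""
    hour = d.replace(minute=0, second=0, microsecond=0)
    slot = d.minute // width
    start = hour + timedelta(minutes=slot * width)
    end = hour + timedelta(minutes=(slot + 1) * width) - timedelta(seconds=1)
    return start, end
```

For 10:07 this gives 10:05:00 to 10:09:59 with the default width and 10:00:00 to 10:14:59 with width=15.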