How to SELECT fields dynamically (LOOP) in PL/SQL?

I'm at work and need to SUM a specific value by hour. Let's say a shift is 10 hours; I want to be able to loop and do the following pseudo-code in PL/SQL:
For (int i=0; i<10; i++) {
SELECT SUM( DB.Sales ) AS Hour#i
FROM DB
WHERE DB.sale_Time BETWEEN
DB.Shift_Start_Time + i AND
DB.Shift_Start_Time + (i+1)
}
It should return a table with 11 columns: one for the start time of the shift, and each of the other 10 columns sums the sales of one hour of the 10-hour shift, named after the hour it represents.
Basically I want to use it for other purposes (my work has nothing to do with sales), where "i" can be as big as 1000, so I'm looking for a generic solution to my problem.
Again, I am using PL/SQL.
Help will be very appreciated!
Thank you

You can do this using the analytic functions in Oracle. First, I aggregate the sales data by hour, since that seems to be the unit you want, and then sum up the hours using the analytic functions with a windowing clause:
select db.thehour,
       sum(db.sales) over (order by db.thehour
                           range between current row and <i> following) as running_sales -- <i> is the window-size placeholder
from (select trunc(sale_time, 'HH') as thehour, sum(sales) as sales
      from DB
      group by trunc(sale_time, 'HH')
     ) db
Note that this assumes that there are sales for every hour.
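If some hours genuinely have no sales, one way around that is to generate the hour list first and outer-join the aggregated data, so empty hours show up as zero. A minimal sketch, assuming the same DB table, a 10-hour shift, and (for brevity) totals summed across all shifts:
SELECT h.hr AS shift_hour,
       NVL(SUM(db.sales), 0) AS hourly_sales
FROM (SELECT LEVEL - 1 AS hr FROM DUAL CONNECT BY LEVEL <= 10) h
LEFT JOIN db
  ON db.sale_time >= db.shift_start_time + h.hr / 24
 AND db.sale_time <  db.shift_start_time + (h.hr + 1) / 24
GROUP BY h.hr
ORDER BY h.hr;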

You can do this with a cursor, looping over each element. The key is to group by the hour you're interested in.
BEGIN
  FOR x IN (
    SELECT TRUNC( DB.sale_Time, 'HH' ) AS start_time,
           SUM( db.sales ) AS hour_sales -- the cursor FOR loop supplies record x, so no INTO clause
    FROM db
    GROUP BY TRUNC( DB.sale_Time, 'HH' )
  ) LOOP
    /* Do loop stuff here, e.g. use x.start_time and x.hour_sales */
  END LOOP;
END;

I eventually gave up on my original requirements and found a nice and simple way to loop through the hours using:
SELECT
db.shift_start_time,
T1.n AS Hour,
SUM(db.sales) AS Hourly_Sales
FROM
db,
(SELECT LEVEL AS n FROM DUAL CONNECT BY LEVEL <= 10) T1
WHERE
db.sale_time BETWEEN db.shift_start_time+(T1.n - 1)/24 AND db.shift_start_time + (T1.n)/24
GROUP BY
db.shift_start_time,
T1.n
ORDER BY
db.shift_start_time,
T1.n
It also solved my problem with shifts that don't start on a round hour: if a shift starts at 9:45 it works as expected instead of rounding to 9:00 or 10:00.
The only issue I have with this solution is that there is a separate row for each hour (see the PIVOT sketch below for one way around that). Nevertheless, it's still the best solution yet.
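If you do want one column per hour, as in the original ask, Oracle's PIVOT clause can fold the rows back into columns. A sketch, assuming Oracle 11g or later and the same tables as above; PIVOT needs the hour values spelled out as literals, so list one entry per hour of the shift:
SELECT *
FROM (
  SELECT db.shift_start_time, T1.n AS hr, db.sales
  FROM db,
       (SELECT LEVEL AS n FROM DUAL CONNECT BY LEVEL <= 10) T1
  WHERE db.sale_time BETWEEN db.shift_start_time + (T1.n - 1)/24
                         AND db.shift_start_time + T1.n/24
)
PIVOT (SUM(sales) FOR hr IN (1 AS hour_1, 2 AS hour_2, 3 AS hour_3, 4 AS hour_4, 5 AS hour_5,
                             6 AS hour_6, 7 AS hour_7, 8 AS hour_8, 9 AS hour_9, 10 AS hour_10))
ORDER BY shift_start_time;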

Related

SELECT statement optimization

I'm not an expert in SQL queries, but I'm not a complete newbie either.
I'm exporting data from an MS-SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and a time; these are the columns I use in the WHERE clause.
In detail, I have about 200 rows for each day, each with a different time, across many days. I need to extract the first value after 15:00 of each day, for several days.
Since the times differ from day to day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes 15:03 and so on (I'm looking for the closest value after 15:00). To fix this I used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > '15:00'
This way I can take the first value after 15:00 for one day. The problem is that I need this for several days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data='first_day' AND hour > '15:00'
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data='second_day' AND hour > '15:00'
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data='third_day' AND hour > '15:00'
...and so on for all the days (I build the SQL string with a loop over each day in the specified interval).
Until now this has worked, but now I need to expand the interval of days (currently a week at most, so 5 days) to up to 60 days. I don't want to build a huge query string, but I can't imagine an alternative way to write the SQL.
Any help appreciated
Ettore
A typical solution for this uses ROW_NUMBER():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;
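To cover the user-specified interval of days, the same query only needs a range predicate. A sketch, assuming day is a date column and that @first_day / @last_day are parameters supplied by your export code (hypothetical names):
SELECT a, b, hour, day
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) AS seqnum
      FROM table t
      WHERE hour > '15:00'
        AND day BETWEEN @first_day AND @last_day
     ) t
WHERE seqnum = 1;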

SQL workaround to substitute FOLLOWING / PRECEDING in PostgreSQL 8.4

I have a query that does a basic moving average using the FOLLOWING / PRECEDING syntax of PostgreSQL 9.0. To my horror I discovered our pg server runs on 8.4 and there is no scope to get an upgrade in the near future.
I am therefore looking for the simplest way to make a backwards compatible query of the following:
SELECT time_series,
avg_price AS daily_price,
CASE WHEN row_number() OVER (ORDER BY time_series) > 7
THEN avg(avg_price) OVER (ORDER BY time_series DESC ROWS BETWEEN 0 FOLLOWING
AND 6 FOLLOWING)
ELSE NULL
END AS avg_price
FROM (
SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
SUM(price) / COUNT(itemname) AS avg_price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000
GROUP BY time_series
) sub
It is a basic 7-day moving average for a table containing price and timestamp columns:
closing_date timestamp
price numeric
itemname text
The requirement for basic is due to my basic knowledge of SQL.
Postgres 8.4 already has CTEs.
I suggest using one: calculate the daily average in a CTE, then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:
WITH cte AS (
SELECT closing_date::date AS closing_day
, sum(price) AS day_sum
, count(price) AS day_ct
FROM auction_prices
WHERE itemname = 'iphone6_16gb'
AND price <= 1000 -- including upper border
GROUP BY 1
)
SELECT d.closing_day
, CASE WHEN d.day_ct > 1
THEN d.day_sum / d.day_ct
ELSE d.day_sum
END AS avg_day -- also avoids division-by-zero
, CASE WHEN sum(w.day_ct) > 1
THEN sum(w.day_sum) / sum(w.day_ct)
ELSE sum(w.day_sum)
END AS week_avg_proper -- also avoids division-by-zero
FROM cte d
JOIN cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP BY d.closing_day, d.day_sum, d.day_ct
ORDER BY 1;
SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)
Notes
I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.
This calculates averages for every day in the base table, including corner cases. But it returns no row for days without any data.
One can subtract integer from date: d.closing_day - 6. (But not from varchar or timestamp!)
It's rather confusing that you call a timestamp column closing_date - it's not a date, it's a timestamp.
And time_series for the resulting column with a date value? I use closing_day instead ...
Note how I count prices (count(price)), not items (COUNT(itemname)) - which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL, count(*) would be superior.
The CASE construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could use COALESCE for the purpose, but while being at it I simplified the case for exactly 1 price as well.
-- make a subset and rank it on date
WITH xxx AS (
SELECT
rank() OVER(ORDER BY closing_date) AS rnk
, closing_date
, price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000
)
-- select subset, + aggregate on self-join
SELECT this.*
, (SELECT AVG(price) AS mean
FROM xxx that
WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
AND that.rnk < this.rnk + 7 -- <<-- here
)
FROM xxx this
ORDER BY this.rnk
;
Note: the CTE is for convenience (Postgres 8.4 does have CTEs), but it could be replaced by a subquery or, more elegantly, by a view.
The code assumes that the time series has no gaps (one observation for every {product × day}). If not: join with a calendar table (which could also contain the rank).
(Also note that I did not cover the corner cases.)
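For illustration, the view variant mentioned in the first note might look like this sketch (the filter is hard-coded, which is the main drawback compared to the CTE):
CREATE VIEW ranked_prices AS
SELECT rank() OVER (ORDER BY closing_date) AS rnk
     , closing_date
     , price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000;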
PostgreSQL 8.4... wasn't that back in the day when everybody thought Windows 95 was great? Anyway...
The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:
CREATE FUNCTION auction_prices(item text, price_limit real)
RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
last_date date;
first_date date;
cur refcursor;
rec record;
dt date;
today date;
today_avg real;
p real;
sum_p real;
n integer;
BEGIN
-- There may be days when an item was not traded under the price limit, so need a
-- series of consecutive days to find all days. Find the end-points of that
-- interval.
-- (Alias "ap" avoids a clash between the table column and the OUT column closing_date.)
SELECT max(ap.closing_date), min(ap.closing_date) INTO last_date, first_date
FROM auction_prices ap
WHERE ap.itemname = item AND ap.price < price_limit;
-- Need at least some data, so quit if the item was never traded under the price limit.
-- (A SELECT INTO over aggregates always sets FOUND, so test the result for NULL.)
IF last_date IS NULL THEN
RETURN;
END IF;
-- Create a scrollable cursor over the auction_prices daily average and the
-- series of consecutive days. The LEFT JOIN means that you will get a NULL
-- for avg_price on days without trading.
OPEN cur SCROLL FOR
SELECT days.dt, sub.avg_price
FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
LEFT JOIN (
SELECT ap.closing_date::date AS closing_day, -- expose the grouped day for the join
sum(ap.price) / count(ap.price) AS avg_price
FROM auction_prices ap
WHERE ap.itemname = item AND ap.price < price_limit
GROUP BY ap.closing_date::date
) sub ON sub.closing_day = days.dt::date;
<<all_recs>>
LOOP -- over the entire date series
-- Get today's data (today = first day of 7-day period)
FETCH cur INTO today, today_avg;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF today_avg IS NULL THEN
n := 0;
sum_p := 0.0;
ELSE
n := 1;
sum_p := today_avg;
END IF;
-- Loop over the remaining 6 days
FOR i IN 2 .. 7 LOOP
FETCH cur INTO dt, p;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF p IS NOT NULL THEN
sum_p := sum_p + p;
n := n + 1;
END IF;
END LOOP;
-- Save the data to the result set: assign the OUT columns, then RETURN NEXT
-- (in a RETURNS TABLE function, RETURN NEXT takes no expression list)
closing_date := today;
avg_day := today_avg;
IF n > 0 THEN
avg_7_day := sum_p / n;
ELSE
avg_7_day := NULL;
END IF;
RETURN NEXT;
-- Move the cursor back to the starting row of the next 7-day period
MOVE RELATIVE -6 FROM cur;
END LOOP all_recs;
CLOSE cur;
RETURN;
END; $$ LANGUAGE plpgsql STRICT;
A few notes:
There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
The cursor needs to be scrollable so that you can read ahead 6 rows (to earlier dates, since the series runs backwards) to get the data needed for the calculation, and then move back 6 rows to calculate the average for the next day.
You cannot calculate a moving average on the last 6 days. The simple reason is that the MOVE command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading).
This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM clause like any other table (and even JOIN to other relations):
SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);

SQL Average Inter-arrival Time, Time Between Dates

I have a table with sequential timestamps:
2011-03-17 10:31:19
2011-03-17 10:45:49
2011-03-17 10:47:49
...
I need to find the average time difference between each of these (there could be dozens), in seconds or whatever is easiest; I can work with it from there. So for example the inter-arrival time for only the first two times above would be 870 s (14m 30s). For all three times it would be (870 + 120)/2 = 495 s (8m 15s).
A note: I am using PostgreSQL 8.1.22.
EDIT: The table I mention above is from a different query that is literally just a one-column list of timestamps
Not sure I understood your question completely, but this might be what you are looking for:
SELECT avg(difference)
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) as difference
FROM your_table
) t
The inner query calculates the distance between each row and the preceding row. The result is an interval for each row in the table.
The outer query simply does an average over all differences.
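Worth noting: consecutive differences telescope (they sum to max minus min), so the same average is available without window functions at all, which matters on 8.1. A minimal sketch, assuming the same timestamp_col and at least two rows:
SELECT (max(timestamp_col) - min(timestamp_col)) / (count(*) - 1) AS avg_difference
FROM your_table;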
I think you want to find avg(timestamptz).
My solution is avg(current - min value); but since the result is an interval, add it to the min value again:
SELECT avg(target_col - (select min(target_col) from your_table))
+ (select min(target_col) from your_table)
FROM your_table
If you cannot upgrade to a version of PG that supports window functions, you
may compute your table's sequential steps "the slow way."
Assuming your table is "tbl" and your timestamp column is "ts":
SELECT AVG(t1 - t0)
FROM (
-- All this silliness would be moot if we could use
-- `` lead(ts) over (order by ts) ''
SELECT tbl.ts AS t0,
next.ts AS t1
FROM tbl
CROSS JOIN
tbl next
WHERE next.ts = (
SELECT MIN(ts)
FROM tbl subquery
WHERE subquery.ts > tbl.ts
)
) derived;
But don't do that. Its performance will be terrible. Please do what
a_horse_with_no_name suggests, and use window functions.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only SQL on Postgres, to select a range of time-ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5-minute intervals for that hour. The resulting rows should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping through the results, pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our DB is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions turn into very simple queries in SQL that scale and can be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that a computed (expression) index matching the WHERE clause - where the value of the expression makes up the index - could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
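Such an expression index might look like this sketch, assuming the tbl / timecolumn names used elsewhere on this page and a timestamp without time zone column (extract on timestamptz is only STABLE, so it cannot back an index directly):
CREATE INDEX tbl_minute_mod_idx
ON tbl ((extract(minute from timecolumn)::integer % 5));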
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. Hard to find in book stores now, but well worth it.
Extract the minutes, convert to int4, and see if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time-based and you just want every 5th row, or if the times are regular and you always have one record per minute, then the query below gives you one record per every 5:
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table
group by bucket
order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or if your readings skip a minute. Instead of using min, even better would be to use one of the first() aggregate functions - code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
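For reference, the first() aggregate on that wiki page is defined along these lines (a sketch from memory; prefer the page's current version):
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS
$$ SELECT $1; $$;

CREATE AGGREGATE first (
  sfunc    = first_agg,
  basetype = anyelement,
  stype    = anyelement
);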
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by date_trunc('hour', your_timestamp), -- keep different hours in different buckets
(cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
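A sketch of that wrapping, using the safer grouped query above (names are placeholders; the hour is part of the grouping so buckets from different hours stay separate):
CREATE VIEW five_minute_marks AS
SELECT min(your_timestamp) AS mark_ts
FROM your_table
GROUP BY date_trunc('hour', your_timestamp),
         (cast(extract(minute from your_timestamp) as integer) / 5);

SELECT t.*
FROM your_table t
JOIN five_minute_marks m ON m.mark_ts = t.your_timestamp;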

Sqlite3: Need to Cartesian On date

I have a table in a SQLite3 database that is a list of games that have been played. The field "datetime" is the datetime of when the game ended. The field "duration" is the number of seconds the game lasted. I want to know what percent of the past 24 hours had at least 5 games running simultaneously. I figured out how to tell how many games were running at a given time:
select count(*)
from games
where strftime('%s',datetime)+0 >= 1257173442 and
strftime('%s',datetime)-duration <= 1257173442
If I had a table that was simply a list of every second (or every 30 seconds or something), I could do an intentional Cartesian product like this:
select count(*)
from (
select count(*) as concurrent, d.second
from games g, date d
where strftime('%s',datetime)+0 >= d.second and
strftime('%s',datetime)-duration <= d.second and
d.second >= strftime('%s','now') - 24*60*60 and
d.second <= strftime('%s','now')
group by d.second) x
where concurrent >=5
Is there a way to create this date table on the fly? Or a way to get a similar effect without having to actually create a new table that is simply a list of all the seconds this week?
Thanks
First, I can't think of a way to approach your problem by creating a table on the fly or without the aid of an extra table. Sorry.
My suggestion is for you to rely on a static Numbers table.
Create a fixed table with the format:
CREATE TABLE Numbers (
number INTEGER PRIMARY KEY
);
Populate it with the number of seconds in 24h (24*60*60 = 86400). I would use any scripting language to do that, repeating this insert statement:
insert into numbers default values;
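Alternatively, if your SQLite is 3.8.3 or newer (an assumption; that release added recursive CTEs), the table can be populated in one statement:
WITH RECURSIVE seq(number) AS (
  SELECT 1
  UNION ALL
  SELECT number + 1 FROM seq WHERE number < 86400
)
INSERT INTO Numbers (number)
SELECT number FROM seq;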
Now the Numbers table has the numbers 1 through 86400. Your query will then be modified to be:
select count(*)
from (
select count(*) as concurrent, strftime('%s','now') - 86401 + n.number second
from games g, numbers n
where strftime('%s',datetime)+0 >= strftime('%s','now') - 86401 + n.number and
strftime('%s',datetime)-duration <= strftime('%s','now') - 86401 + n.number
group by second) x
where concurrent >=5
Without a procedural language in the mix, that is the best you'll be able to do, I think.
Great question!
Here's a query that I think gives you what you want without using a separate table. Note this is untested (so probably contains errors), and I've assumed datetime is an int column holding the number of seconds, to avoid a ton of strftime calls.
select sum(concurrent_period) from (
select min(end_table.datetime - begin_table.begin_time) as concurrent_period
from (
select g1.datetime, g1.num_end, count(*) as concurrent
from (
select datetime, count(*) as num_end
from games group by datetime
) g1, games g2
where g2.datetime >= g1.datetime and
g2.datetime-g2.duration < g1.datetime and
g1.datetime >= strftime('%s','now') - 24*60*60 and
g1.datetime <= strftime('%s','now')+0
) end_table, (
select g3.begin_time, g3.num_begin, count(*) as concurrent
from (
select datetime-duration as begin_time,
count(*) as num_begin
from games group by datetime-duration
) g3, games g4
where g4.datetime >= g3.begin_time and
g4.datetime-g4.duration < g3.begin_time and
g3.begin_time >= strftime('%s','now') - 24*60*60 and
g3.begin_time <= strftime('%s','now')+0
) begin_table
where end_table.datetime > begin_table.begin_time
and begin_table.concurrent < 5
and begin_table.concurrent+begin_table.num_begin >= 5
and end_table.concurrent >= 5
and end_table.concurrent-end_table.num_end < 5
group by begin_table.begin_time
) aah
The basic idea is to make two tables: one with the # of concurrent games at the begin time of each game, and one with the # of concurrent games at the end time. Then join the tables together and only take rows at "critical points" where # of concurrent games crosses 5. For each critical begin time, take the critical end time that happened soonest and that hopefully gives all the periods where at least 5 games were running concurrently.
Hope that's not too convoluted to be helpful!
Kevin rather beat me to the punchline there (+1), but I'll post this variation as it differs at least a little.
The key ideas are
Map the data in to a stream of events with attributes time and 'polarity' (=start or end of game)
Keep a running total of how many games are open at the time of each event
(this is done by forming a self-join on the event stream)
Find the event times where the number of games (as Kevin says) transitions up to 5, or down to 4
A little trick: add up all the down-to-4 times and take away the up-to-5s - the order is not important
The result is the number of seconds spent with 5 or more games open
I don't have SQLite, so I've been testing with MySQL, and I've not bothered to limit the time window, to preserve some sanity. It shouldn't be difficult to revise.
Also, and more importantly, I've not considered what to do if games are open at the beginning or end of the period!
Something tells me there's a big simplification to be had here, but I've not spotted it yet.
SELECT SUM( event_time )
FROM (
SELECT -ga.event_type * ga.event_time AS event_time,
SUM( ga.event_type * gb.event_type ) event_type
FROM
( SELECT UNIX_TIMESTAMP( g1.endtime - g1.duration ) AS event_time
, 1 event_type
FROM games g1
UNION
SELECT UNIX_TIMESTAMP( g1.endtime )
, -1
FROM games g1 ) AS ga,
( SELECT UNIX_TIMESTAMP( g1.endtime - g1.duration ) AS event_time
, 1 event_type
FROM games g1
UNION
SELECT UNIX_TIMESTAMP( g1.endtime )
, -1
FROM games g1 ) AS gb
WHERE
ga.event_time >= gb.event_time
GROUP BY ga.event_time
HAVING SUM( ga.event_type * gb.event_type ) IN ( -4, 5 )
) AS gr
Why don't you trim the date and keep only the time? If you filter your data for any given date, every time is unique. This way you'll only need a table with numbers from 1 to 86400 (or fewer if you take bigger intervals); you may create two columns, "from" and "to", to define the intervals.
I'm not familiar with SQLite functions, but according to the manual you have to use the strftime function with this format: HH:MM:SS.
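As a sketch of that idea against the games table from the question: strftime can both format the time-of-day and derive a second-of-day number that such an intervals table could join against:
SELECT strftime('%H:%M:%S', datetime) AS time_of_day,
       strftime('%s', datetime) - strftime('%s', date(datetime)) AS second_of_day
FROM games;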