SQL workaround to substitute FOLLOWING / PRECEDING in PostgreSQL 8.4

I have a query that does a basic moving average using the FOLLOWING / PRECEDING syntax of PostgreSQL 9.0. To my horror I discovered our pg server runs on 8.4 and there is no scope to get an upgrade in the near future.
I am therefore looking for the simplest way to make a backwards compatible query of the following:
SELECT time_series,
       avg_price AS daily_price,
       CASE WHEN row_number() OVER (ORDER BY time_series) > 7
            THEN avg(avg_price) OVER (ORDER BY time_series DESC
                                      ROWS BETWEEN 0 FOLLOWING AND 6 FOLLOWING)
            ELSE NULL
       END AS avg_price
FROM (
    SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
           SUM(price) / COUNT(itemname) AS avg_price
    FROM   auction_prices
    WHERE  itemname = 'iphone6_16gb' AND price < 1000
    GROUP  BY time_series
) sub
It is a basic 7-day moving average for a table containing price and timestamp columns:
closing_date timestamp
price numeric
itemname text
The requirement for basic is due to my basic knowledge of SQL.

Postgres 8.4 already has CTEs.
I suggest using one: calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:
WITH cte AS (
   SELECT closing_date::date AS closing_day
        , sum(price)  AS day_sum
        , count(price) AS day_ct
   FROM   auction_prices
   WHERE  itemname = 'iphone6_16gb'
   AND    price <= 1000 -- including upper border
   GROUP  BY 1
   )
SELECT d.closing_day
     , CASE WHEN d.day_ct > 1
            THEN d.day_sum / d.day_ct
            ELSE d.day_sum
       END AS avg_day -- also avoids division-by-zero
     , CASE WHEN sum(w.day_ct) > 1
            THEN sum(w.day_sum) / sum(w.day_ct)
            ELSE sum(w.day_sum)
       END AS week_avg_proper -- also avoids division-by-zero
FROM   cte d
JOIN   cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP  BY d.closing_day, d.day_sum, d.day_ct
ORDER  BY 1;
SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)
Notes
I used a different (correct) algorithm to calculate the weekly average: a weighted average over all prices of the week, not an average of the daily averages. (Example: a day with one sale at 10 and a day with three sales at 20 should average to 17.5, not 15.) See considerations in my comment to the question.
This calculates averages for every day in the base table, including corner cases. But it returns no row for days without any rows in the base table.
One can subtract integer from date: d.closing_day - 6. (But not from varchar or timestamp! A quick demo follows after these notes.)
It's rather confusing that you call a timestamp column closing_date - it's not a date, it's a timestamp.
And time_series for the resulting column with a date value? I use closing_day instead ...
Note how I count prices with count(price), not items with COUNT(itemname) - which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL, count(*) would be superior.
The CASE construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could use COALESCE for the purpose, but while I was at it I simplified the case for exactly one price as well.
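A quick demonstration of the date-arithmetic note (plain PostgreSQL, nothing assumed beyond the core types):

SELECT DATE '2014-01-10' - 6;          -- works: yields the date 2014-01-04
-- SELECT TIMESTAMP '2014-01-10' - 6;  -- fails: there is no timestamp - integer operator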

-- make a subset and rank it on date
WITH xxx AS (
    SELECT rank() OVER (ORDER BY closing_date) AS rnk
         , closing_date
         , price
    FROM   auction_prices
    WHERE  itemname = 'iphone6_16gb' AND price < 1000
)
-- select subset, + aggregate on self-join
SELECT this.*
     , (SELECT AVG(price) AS mean
        FROM   xxx that
        WHERE  that.rnk > this.rnk + 0 -- <<-- adjust window
        AND    that.rnk < this.rnk + 7 -- <<-- here
       )
FROM xxx this
ORDER BY this.rnk
;
Note: the CTE is for convenience (Postgres 8.4 does have CTEs), but it could be replaced by a subquery or, more elegantly, by a view.
The code assumes that the time series has no gaps (one observation for every {product*day}). When that does not hold: join with a calendar table, which could also contain the rank. A sketch of that variant follows below.
(Also note that I did not cover the corner cases.)
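A minimal sketch of the calendar-table variant, assuming daily granularity; the generate_series bounds and the cal name are illustrative, not from the original post:

WITH cal AS (
    SELECT d::date AS day
         , row_number() OVER (ORDER BY d) AS rnk
    FROM   generate_series(timestamp '2014-01-01'
                         , timestamp '2014-12-31'
                         , interval '1 day') AS g(d)
)
SELECT c.day, c.rnk, p.price
FROM   cal c
LEFT   JOIN auction_prices p ON p.closing_date::date = c.day
                            AND p.itemname = 'iphone6_16gb'
                            AND p.price < 1000
ORDER  BY c.rnk;

The rank taken from the calendar is gapless even on days without trades, so the windowed self-join above keeps its meaning.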

PostgreSQL 8.4... wasn't that from the days when everybody thought Windows 95 was great? Anyway...
The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:
CREATE FUNCTION auction_prices(item text, price_limit real)
  RETURNS TABLE (closing_day date, avg_day real, avg_7_day real) AS $$
DECLARE
    last_date  date;
    first_date date;
    cur        refcursor;
    dt         date;
    today      date;
    today_avg  real;
    p          real;
    sum_p      real;
    n          integer;
BEGIN
    -- There may be days when an item was not traded under the price limit, so we need a
    -- series of consecutive days to find all days. Find the end points of that interval.
    SELECT max(closing_date), min(closing_date) INTO last_date, first_date
    FROM   auction_prices
    WHERE  itemname = item AND price < price_limit;

    -- Need at least some data, so quit if the item was never traded under the price
    -- limit. (An aggregate always returns a row, so test the value rather than FOUND.)
    IF last_date IS NULL THEN
        RETURN;
    END IF;

    -- Create a scrollable cursor over the auction_prices daily average and the
    -- series of consecutive days. The LEFT JOIN means that you will get a NULL
    -- for avg_price on days without trading.
    OPEN cur SCROLL FOR
        SELECT days.dt::date, sub.avg_price
        FROM   generate_series(last_date, first_date, interval '-1 day') AS days(dt)
        LEFT   JOIN (
            SELECT closing_date::date AS close_day
                 , sum(price) / count(itemname) AS avg_price
            FROM   auction_prices
            WHERE  itemname = item AND price < price_limit
            GROUP  BY closing_date::date  -- one row per calendar day
        ) sub ON sub.close_day = days.dt::date
        ORDER  BY days.dt DESC;  -- make the descending order explicit

    <<all_recs>>
    LOOP -- over the entire date series
        -- Get today's data (today = first day of the 7-day period)
        FETCH cur INTO today, today_avg;
        EXIT all_recs WHEN NOT FOUND; -- no more data, so exit the loop
        IF today_avg IS NULL THEN
            n     := 0;
            sum_p := 0.0;
        ELSE
            n     := 1;
            sum_p := today_avg;
        END IF;
        -- Loop over the remaining 6 days
        FOR i IN 2 .. 7 LOOP
            FETCH cur INTO dt, p;
            EXIT all_recs WHEN NOT FOUND; -- no more data, so exit the loop
            IF p IS NOT NULL THEN
                sum_p := sum_p + p;
                n     := n + 1;
            END IF;
        END LOOP;
        -- Save the data to the result set (assign the OUT columns, then RETURN NEXT)
        closing_day := today;
        avg_day     := today_avg;
        IF n > 0 THEN
            avg_7_day := sum_p / n;
        ELSE
            avg_7_day := NULL;
        END IF;
        RETURN NEXT;
        -- Move the cursor back to the starting row of the next 7-day period
        MOVE RELATIVE -6 FROM cur;
    END LOOP all_recs;
    CLOSE cur;
    RETURN;
END; $$ LANGUAGE plpgsql STRICT;
A few notes:
There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
The cursor needs to be scrollable: you read ahead 6 rows (to earlier dates) to get the data needed for the calculation, and then move back 6 rows to calculate the average for the next day.
You cannot calculate a moving average for the last 6 days of the series (the earliest dates): the MOVE command needs a constant number of records to move, and parameter substitution is not supported. On the upside, your moving average will always cover a full 7 days (of which not all may have seen trading).
This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM clause like any other table (and even JOIN to other relations):
SELECT to_char(closing_day, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM   auction_prices('iphone6_16gb', 1000);

Related

show results of sql code per month in one view

I have this SQL statement that generates the revenue, the total customers and the revenue for new business.
I would like to amend my SQL so it shows the same result, but also runs for the previous 3 years with the same logic.
One option is to use a union and amend current_date every time with the DATEADD function by -1, -2 etc., but this would be so inefficient.
Is there a better way to amend the code? With a date dimension maybe?
select date_trunc('month', current_date),
       count(distinct case when (case when RELEVANT_DATE_OUTBOUND > current_date
                                      then TOTAL_REVENUE end) > 0
                           then CUSTOMER_NAME end) as customer_id,
       sum(case when (case when date_trunc('month', reporting_date) = date_trunc('month', current_date)
                           then NB_EUR end) > 0
                then nb_eur end) as nb_eur
from REVENUE_DATABASE_AGR_VIEW
(The dataset, the results of the SQL, and the desired outcome were attached as screenshots.)
I would suggest that you create a date_dim table and use that to provide the date values to join with your query as the date parameter.
You could also use a loop, but row-based processing is pretty non-performant in Snowflake.
However, if you're only running it once to populate the prior 3 years, and then once per day to process each new day's data, you could wrap your existing code in a loop like the following to process all the historical data, and then run your query daily with just current_date()-1 going forward:
create or replace table date_dim (loop_date varchar);

declare
    ctr number(4, 0);
begin
    ctr := 0;
    while (ctr <= 1095) do -- create dates for the prior 3 years
        insert into date_dim
        select current_date() - :ctr;
        ctr := ctr + 1;
    end while;
end;
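A hedged sketch of how date_dim could then feed the original query, replacing current_date with each stored as-of date (untested; the view and column names come from the question, the cross-join shape is my assumption):

select d.loop_date::date as as_of_date,
       count(distinct case when (case when r.RELEVANT_DATE_OUTBOUND > d.loop_date::date
                                      then r.TOTAL_REVENUE end) > 0
                           then r.CUSTOMER_NAME end) as customer_id,
       sum(case when (case when date_trunc('month', r.reporting_date) = date_trunc('month', d.loop_date::date)
                           then r.NB_EUR end) > 0
                then r.NB_EUR end) as nb_eur
from date_dim d
cross join REVENUE_DATABASE_AGR_VIEW r
group by d.loop_date::date;

This returns one row per as-of date, exactly as if the original query had been run once on each of those days.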

Limit result rows for minimal time intervals for PostgreSQL

Background: I am running TeslaMate/Grafana for monitoring my car status; one of the gauges plots the battery level fetched from the database. My server is located remotely and runs in Docker on an old NAS, so both query performance and network overhead matter.
I found the kiosk page frequently hangs, and by investigation it might be caused by the query -- two of the plots return 10~100k rows of results from the database. I want to limit the number of rows returned by the SQL queries, as the plots certainly don't have that much precision for drawing such detailed intervals.
I tried to follow this answer and use row_number() to keep only every 100th row of the results, but a more complicated issue turned up: the time intervals among the rows are not consistent.
The car has 4 status, driving / online / asleep / offline.
If the car is at driving status, the time interval could be less than 200ms as the car pushes the status whenever it has new data.
If the car is at online status, the time interval could be several minutes as the system actively fetches the status from the car.
Even worse, if the system thinks the car is going to sleep and needs to stop fetching status (to avoid preventing the car from sleeping), the interval could be up to 40 minutes depending on settings.
If the car is in asleep/offline status, no data is recorded at all.
This obviously makes skipping every n-th row a bad idea, as for cases 2-4 above lots of data points might be missing, so Grafana cannot plot a correct graph representing the battery level at satisfactory precision.
I wonder if it is possible to skip rows by the time interval from a datetime field rather than by row_number(), without much overhead in the query? I.e., fetch every row that is at least 1000 ms after the previously returned row.
E.g., given the following data in the table, I want the returned rows to be 1, 4 and 5.
row date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
The current (problematic) method I am using is as follows:
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
SELECT date, battery_level, row_number() OVER(ORDER BY date ASC) AS row
FROM (
SELECT battery_level, date
FROM positions
WHERE car_id = $car_id AND $__timeFilter(date)
UNION ALL
SELECT battery_level, date
FROM charges c
JOIN charging_processes p ON p.id = c.charging_process_id
WHERE $__timeFilter(date) AND p.car_id = $car_id) AS data
ORDER BY date ASC) as t
WHERE t.row % 100 = 0;
This method clearly has the problem that it only returns every n-th row instead of what I wanted (given that the last line reads t.row % 2 = 0).
PS: please ignore the table structures and the UNION in the sample code; I haven't dug deep enough into the tables, where other tweaks may be possible, but that's irrelevant to this question anyway.
Thanks in advance!
You can use a recursive CTE:
WITH RECURSIVE rec(cur_row, cur_date) AS (
    (
        SELECT row, date
        FROM   t
        ORDER  BY date
        LIMIT  1
    )
    UNION ALL
    (
        SELECT row, date
        FROM   t
        JOIN   rec ON t.date >= cur_date + 1000
        ORDER  BY t.date
        LIMIT  1
    )
)
SELECT *
FROM rec;
cur_row | cur_date
------- | -------------
      1 | 1610000001000
      4 | 1610000002000
      5 | 1610000005000
View on DB Fiddle
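Note that each recursion step performs an ORDER BY date ... LIMIT 1 lookup, so this approach really wants an index on the timestamp column. A minimal sketch, using the fiddle's table and column names:

CREATE INDEX t_date_idx ON t (date);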
Using a function instead would probably be faster:
CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
$$
DECLARE
    row      t%ROWTYPE;
    cur_date BIGINT;
BEGIN
    FOR row IN
        SELECT *
        FROM   t
        ORDER  BY date
    LOOP
        IF row.date >= cur_date + 1000 OR cur_date IS NULL THEN
            cur_date := row.date;
            RETURN NEXT row;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT *
FROM f();
row | date
--- | -------------
  1 | 1610000001000
  4 | 1610000002000
  5 | 1610000005000

adding business days in oracle sql

I have two date fields, DATE_FIELD_ONE = 8/30/2018 and DATE_FIELD_TWO = DATE_FIELD_ONE + 20. I need to find what DATE_FIELD_TWO should be if I'm only adding 20 business days. How would I accomplish this? I thought maybe trying 'DY', but I'm not sure how to get it to work. Thanks.
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SAT' THEN 1 ELSE 0 END
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SUN' THEN 1 ELSE 0 END
You may try this:

select max(date_field_two) as date_field_two
from
(
  select date '2018-08-30' +
         cast(case when to_char(date '2018-08-30' + level, 'D', 'NLS_DATE_LANGUAGE=ENGLISH')
                        in ('6','7')
                   then 0
                   else level
              end as int) as date_field_two,
         sum(cast(case when to_char(date '2018-08-30' + level, 'D', 'NLS_DATE_LANGUAGE=ENGLISH')
                            in ('6','7')
                       then 0
                       else 1
                  end as int)) over (order by level) as next_day
  from dual
  connect by level <= 20 * 1.5
  -- 20 is the number of business days to add. The 1.5 factor oversizes the
  -- generated calendar range so it is guaranteed to contain 20 non-weekend days:
  -- a calendar week has 7 = 5 + 2 < 5 * 1.5 days, so level <= 20 * 1.5 covers
  -- the 4 full weeks needed (any larger coefficient, e.g. 2, works as well).
)
where next_day = 20;
DATE_FIELD_TWO
-----------------
27.09.2018
This works by generating one row per day with a CONNECT BY LEVEL query against dual.
P.S. This ignores public holidays, which differ from one culture to another; the question is only concerned with weekends.
Rextester Demo
Edit: Assume you have national holidays on '2018-09-25' and '2018-09-26' (within this range of days); then consider the following:
select max(date_field_two) as date_field_two
from
(
  select date '2018-08-30' +
         (case when to_char(date '2018-08-30' + level, 'D', 'NLS_DATE_LANGUAGE=ENGLISH')
                    in ('6','7')
               then 0
               when date '2018-08-30' + level in (date '2018-09-25', date '2018-09-26')
               then 0
               else level
          end) as date_field_two,
         sum(cast(case when to_char(date '2018-08-30' + level, 'D', 'NLS_DATE_LANGUAGE=ENGLISH')
                            in ('6','7')
                       then 0
                       when date '2018-08-30' + level in (date '2018-09-25', date '2018-09-26')
                       then 0
                       else 1
                  end as int)) over (order by level) as next_day
  from dual
  connect by level <= 20 * 2
)
where next_day = 20;
DATE_FIELD_TWO
-----------------
01.10.2018
Each holiday inside the range pushes the result one business day further, as in this case, unless the holiday coincides with a weekend.
You can define workdays to be whatever you like if you use a PL/SQL function.
Here is a simple prototype, without any holidays, but it could be adapted for that purpose using the same kind of logic.
create or replace function add_business_days (from_date IN date, bd IN integer)
  return date
as
  -- assumes from_date itself is a weekday and counts it as the first business day
  fd  date := trunc(from_date, 'iw');     -- Monday of from_date's week
  cnt int  := (from_date - fd) + bd - 1;  -- business days counted from that Monday
  ww  int  := floor(cnt / 5);             -- whole weeks to add
  wd  int  := mod(cnt, 5);                -- weekday offset within the final week
begin
  return fd + (ww * 7) + wd;
end;
/
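A quick usage sketch (my own, untested). Note that this prototype counts from_date itself as the first business day, so its result lands one day earlier than the CONNECT BY answer above:

SELECT add_business_days(DATE '2018-08-30', 20) AS date_field_two
FROM   dual; -- 26.09.2018 under this counting convention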
I realize you already have an answer, but for what it's worth this is something we deal with all the time and have what has turned out to be a very good solution.
In effect, we maintain a separate table called "work days" that has every conceivable date we would ever compare (and that definition will vary from application to application, of course -- but in any case it will never be "huge" by RDBMS standards). There is a boolean flag that dictates if the date is a work day or a weekend/holiday, but more importantly there is an index value that only increments on work days. The table looks like this:
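(The screenshot of the table is not reproduced here; what follows is a hypothetical reconstruction of its shape, with column names taken from the queries below.)

create table util.work_days (
    cal_date            date primary key,
    is_workday          boolean not null,  -- false on weekends/holidays
    workday_index       integer not null,  -- increments on work days only
    workday_index_back  integer not null   -- same idea, rolling the other direction
);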
The advantage to this is transparency and scalability. If you want the difference between two dates in work days:
select h.entry_date,
       h.invoice_date,
       wd2.workday_index - wd1.workday_index as delta
from   sales_order_data h
join   util.work_days wd1 on h.sales_order_entry_dte = wd1.cal_date
join   util.work_days wd2 on h.invoice_dte = wd2.cal_date
If you need to take a date in a table and add 20 work days (like your original problem statement):

select h.date_field_1,
       wd2.cal_date as date_field_1_plus_20
from   my_table h
join   util.work_days wd1 on h.date_field_1 = wd1.cal_date
join   util.work_days wd2 on wd1.workday_index + 20 = wd2.workday_index
                         and wd2.is_workday
(disclaimer, this is in PostgreSQL, which is why I have the boolean. In Oracle, I'm guessing you need to change that to an integer and say =1)
Also, for the bonus question, this gives two different options for defining "work day": one that rolls forward and another that rolls backwards (hence the workday_index and workday_index_back). For example, if you need something on a Saturday, and Saturday is not a work day, that means you need it on Friday. Conversely, if something is to be delivered on Saturday, and Saturday is not a work day, then it will be available on Monday. The context of how to handle non-workdays differs, and this method affords you the option of choosing the right one.
As a final selling point, this option allows you to define holidays as non-work days as well... and you can do this or not; the solution permits either option. You could theoretically add two more columns for a weekend-only work-day index, giving you both conventions there too.

Identify groups of rows in close proximity

I am doing a project for school and have been given a database of GPS recordings for three people during the course of a week. I am trying to group these recordings into trips based on the time between them. If a recording is within 300 seconds of the recording before it, they are considered part of the same trip; otherwise, they are considered part of different trips.
So far I have managed to calculate the time difference between a recording on row n and the one on row n-1, and I am now trying to create a function for merging the recordings into trips. This would have been real easy in another programming language, but in this course we are using PostgreSQL, which I am not that well versed in.
To solve this, I am trying to create a function with a variable that increases every time the time difference between two recordings is greater than 300 seconds and assigns each row to a trip based on the variable. This is as far as I have currently gotten, although at the moment the variable X resets all the time, thus assigning all rows to trip 1...
CREATE OR REPLACE FUNCTION tripmerge(time_diff double precision)
RETURNS integer AS $$
DECLARE
    X  integer := 1;
    ID integer;
BEGIN
    IF time_diff < 300 THEN
        ID := X;
    ELSE
        ID := X;
        X  := X + 1;
    END IF;
    RETURN ID;
END; $$
LANGUAGE plpgsql;
How do I change this so that X does not reset all the time? I am using PostgreSQL 9.1.
EDIT:
This is the table I am working with:
curr_rec (timestamp), prev_rec (timestamp), time_diff (double precision)
With this being a sample of the dataset:
'2013-11-14 05:22:33.991+01',null ,null
'2013-11-14 09:15:40.485+01','2013-11-14 05:22:33.991+01',13986.494
'2013-11-14 09:17:04.837+01','2013-11-14 09:15:40.485+01',84.352
'2013-11-14 09:17:43.055+01','2013-11-14 09:17:04.837+01',38.218
'2013-11-14 09:23:24.205+01','2013-11-14 09:17:43.055+01',341.15
The expected result would add a column:
tripID
1
2
2
2
3
And I think this fiddle should be working: http://sqlfiddle.com/#!1/4e3e5/1/0
This query uses only curr_rec, not the other redundant, precomputed columns:
SELECT 1 + count(step OR NULL) OVER (ORDER BY curr_rec) AS trip_id
FROM (
    SELECT curr_rec
         , lag(curr_rec) OVER (ORDER BY curr_rec) AS prev_rec
         , curr_rec - lag(curr_rec) OVER (ORDER BY curr_rec)
             > interval '5 min' AS step
    FROM   timestamps
) x;
Key features are:
The window function lag(), which I use to see if the previous row is more than 5 minutes ago. (Just using an interval for the comparison, no need to extract seconds)
The window aggregate function count() - that's just the basic aggregate function with an OVER clause.
The expression step OR NULL only leaves TRUE or NULL, where only TRUE is counted in a running count, thereby arriving at your desired result (a demo follows below).
SQL Fiddle (building on the one you provided).
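The OR NULL trick from the last note can be seen in isolation (plain SQL, nothing assumed):

SELECT (true OR NULL) AS t, (false OR NULL) AS f;  -- t is true, f is NULL

count() skips NULLs, so only the TRUE values (the trip boundaries) advance the running count.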

How to SELECT fields dynamically (LOOP) in PLSQL?

I'm at work and need to SUM a specific value by hour. Let's say a shift is 10 hours; I want to be able to loop and do the following pseudocode in PL/SQL:
For (int i=0; i<10; i++) {
SUM( DB.Sales) AS Hour#i
FROM
DB
WHERE DB.sale_Time BETWEEN
DB.Shift_Start_Time + i AND
DB.Shift_Start_Time + (i+1)
}
It should return a table with 11 columns: one for the time of the shift, and each of the other ten columns sums the sales of one hour in the 10-hour shift, named after the hour it represents.
Basically I want to use it for other purposes, as my work has nothing to do with sales, where "i" can be as big as 1000, so I'm looking for a generic solution to my problem.
Again, I am using PL/SQL.
Help will be very appreciated!
Thank you
You can do this using the analytic functions in Oracle. First, I aggregate the sales data by hour, since that seems to be the unit you want, and then sum up the hours using the analytic functions with a windowing clause:
select db.thehour,
       sum(db.sales) over (order by db.thehour
                           range between current row
                                     and interval '<i>' hour following) -- <<-- adjust window
from  (select trunc(sale_time, 'HH24') as thehour, -- truncate each sale to its hour
              sum(sales) as sales
       from   DB
       group  by trunc(sale_time, 'HH24')
      ) db
Note that this assumes that there are sales for every hour.
You can do this with a cursor, looping over each element. The key is to group by the hour you're interested in.
BEGIN
    FOR x IN (
        SELECT TRUNC( DB.sale_Time, 'HH' ) AS start_time,
               SUM( db.sales ) AS total_sales
        FROM   db
        GROUP  BY TRUNC( DB.sale_Time, 'HH' )
    ) LOOP
        -- do loop stuff here, e.g.:
        DBMS_OUTPUT.PUT_LINE( x.start_time || ': ' || x.total_sales );
    END LOOP;
END;
I eventually gave up on my original demands and found a nice and simple way to loop through the hours using:
SELECT db.shift_start_time,
       T1.n AS Hour,
       SUM(db.sales) AS Hourly_Sales
FROM   db,
       (SELECT n
        FROM  (SELECT rownum n FROM DUAL CONNECT BY LEVEL <= 10)
        WHERE n > 0) T1
WHERE  db.sale_time BETWEEN db.shift_start_time + (T1.n - 1)/24
                        AND db.shift_start_time + (T1.n)/24
GROUP  BY db.shift_start_time, T1.n
ORDER  BY db.shift_start_time, T1.n
It also solved my problem where hours are not rounded. If a shift starts at 9:45 it will work as expected and not round it to 9 or 10.
Only issue I have with this solution is that there is a separate row for each hour (see the PIVOT sketch below)... nevertheless, it's still the best solution yet.
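If one row per shift is wanted, the hourly rows could be pivoted into columns. A rough sketch (Oracle 11g+ PIVOT, untested), assuming the query above has been saved as a view named hourly_sales_v, which is a hypothetical name:

SELECT *
FROM  (SELECT shift_start_time, Hour, Hourly_Sales FROM hourly_sales_v)
PIVOT (SUM(Hourly_Sales)
       FOR Hour IN (1 AS hour1, 2 AS hour2, 3 AS hour3, 4 AS hour4, 5 AS hour5,
                    6 AS hour6, 7 AS hour7, 8 AS hour8, 9 AS hour9, 10 AS hour10));

The column list in PIVOT must be static, so a truly generic version (i up to 1000) would still need dynamic SQL.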