Calculating differences in average monthly between years - sql

I have a table which contains average monthly values from a sensor over the last 3 years
Is there a way in which I can calculate the differences between, for example, the monthly values in 2019 and the monthly values in 2018, and perhaps create a new table or view that includes the 2018 dates in one column, 2019 dates in another and the difference in sensor reading value in a third ?
Thanks
TP

Assuming that your data has no missing month/year, one option uses window functions:
select
t.*,
lag(average) over(
partition by sensor_id, extract(month from m)
order by extract(year from m)
) last_year_average
from mytable
This puts all rows that belong to the same sensor and the same month in the same partition. You can then use the year part of the timestamp as an ordering column.
You can use the new column as needed to compare it to the current average.

If you have a value for every month, you can just use a 12-month lag:
select t.*,
lag(average, 12) over (partition by sensor_id
order by m
) as last_year_average
from t;
Filtering this to just 2019/2018 requires a subquery:
select t.*
from (select t.*,
lag(average, 12) over (partition by sensor_id
order by m
) as last_year_average
from t
) t
where m >= '2019-01-01'::date and
m < '2020-01-01'::date
If you are missing months, then neither this (nor GMB's answer) will work correctly. Instead, you can use a join, aggregation, or window function:
select t.*
from t left join
t tprev
on tprev.sensor_id = t.sensor_id and
tprev.m = t.m - interval '12 month'
where t.m >= '2019-01-01'::date and t.m < '2020-01-01'::date;
Two other methods are:
select t.sensor_id, month(t.m)
max(average) filter (where year(t.m) = 2019) as avg_2019,
max(average) filter (where year(t.m) = 2018) as avg_2018
from t
group by t.sensor_id, month(t.m);
And to use window functions safely if there is the possibility of missing months:
select t.*,
max(average) over (partition by sensor_id
order by m
range between '1 year preceding' and '1 year preceding'
) as average_prev
from t;

Related

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (I had to do the 2nd one in Excel for now)
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for an answer how exactly to solve it because this is not the point of getting better, but just for hints for functions to use that are specific to PostgreSQL. What I am trying to calculate is the 2 month average in the 2nd table based on the lagged values of the 2nd table. Thanks!
Here is the query for the 1st table:
with t1 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_1M_before
from data
where date between '2019-07-01' and '2019-11-30'
group by 1),
t2 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_2M_before
from data
where date between '2019-07-01' and '2019-10-31'
group by 1)
select t0.date,
count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate
t1.SR_1M_before,
t2.SR_2M_before
from data as t0
left join t1 on t0.date=t1.date
left join t2 on t0.date=t1.date
where date between '2019-07-01' and '2019-12-31'
group by 1,3,4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous monthes with a range clause:
select
date,
count(sales)::numeric/count(poss_sales) as Sales_Rate,
avg(count(sales)::numeric/count(poss_sales)) over(
order by date
rows between '2 month' preceding and '1 month' preceding
) Sales_Rate,
count(sales)::numeric/count(poss_sales) as Sales_Rate
- avg(count(sales)::numeric/count(poss_sales)) over(
order by date
rows between '2 month' preceding and '1 month' preceding
) PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
Your data is a bit confusing -- it would be less confusing if you had decimal places (that is, 58% being the average of 57% and 58% is not obvious).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
<whatever generates the data you have shown>
)
select q.*,
(sum(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
) /
nullif(count(*) over (order by date
rows between 2 preceding and 1 preceding
)
) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
(case when row_number() over (order by date) > 2)
then avg(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
)
end) as two_month_average
from q;

ORACLE SQL: Find last minimum and maximum consecutive period

I have the sample data set below which list the water meters not working for specific reason for a certain range period (jan 2016 to december 2018).
I would like to have a query that retrieves the last maximum and minimum consecutive period where the meter was not working within that range of period.
any help will be greatly appreciated.
You have two options:
select code, to_char(min_period, 'yyyymm') min_period, to_char(max_period, 'yyyymm') max_period
from (
select code, min(period) min_period, max(period) max_period,
max(min(period)) over (partition by code) max_min_period
from (
select code, period, sum(flag) over (partition by code order by period) grp
from (
select code, period,
case when add_months(period, -1)
= lag(period) over (partition by code order by period)
then 0 else 1 end flag
from (select mrdg_acc_code code, to_date(mrdg_per_period, 'yyyymm') period from t)))
group by code, grp)
where min_period = max_min_period
Explanation:
flag rows where period is not equal previous period plus one month,
create column grp which sums flags consecutively,
group data using code and grp additionaly finding maximal start of period,
show only rows where min_period = max_min_period
Second option is recursive CTE available in Oracle 11g and above:
with
data(period, code) as (
select to_date(mrdg_per_period, 'yyyymm'), mrdg_acc_code from t
where mrdg_per_period between 201601 and 201812),
cte (period, code) as (
select to_char(period, 'yyyymm'), code from data
where (period, code) in (select max(period), code from data group by code)
union all
select to_char(data.period, 'yyyymm'), cte.code
from cte
join data on data.code = cte.code
and data.period = add_months(to_date(cte.period, 'yyyymm'), -1))
select code, min(period) min_period, max(period) max_period
from cte group by code
Explanation:
subquery data filters only rows from 2016 - 2018 additionaly converting period to date format. We need this for function add_months to work.
cte is recursive. Anchor finds starting rows, these with maximum period for each code. After union all is recursive member, which looks for the row one month older than current. If it finds it then net row, if not then stop.
final select groups data. Notice that period which were not consecutive were rejected by cte.
Though recursive queries are slower than traditional ones, there can be scenarios where second solution is better.
Here is the dbfiddle demo for both queries. Good luck.
use aggregate function with group by
select max(mdrg_per_period) mdrg_per_period, mrdg_acc_code,max(mrdg_date_read),rea_Desc,min(mdrg_per_period) not_working_as_from
from tablename
group by mrdg_acc_code,rea_Desc
This is a bit tricky. This is a gap-and-islands problem. To get all continuous periods, it will help if you have an enumeration of months. So, convert the period to a number of months and then subtract a sequence generated using row_number(). The difference is constant for a group of adjacent months.
This looks like:
select acc_code, min(period), max(period)
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by (period_num - seqnum);
Then, if you want the last one for each account, you can use a subquery:
select t.*
from (select acc_code, min(period), max(period),
row_number() over (partition by acc_code order by max(period desc) as seqnum
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by (period_num - seqnum)
) t
where seqnum = 1;

Same output in two different lateral joins

I'm working on a bit of PostgreSQL to grab the first 10 and last 10 invoices of every month between certain dates. I am having unexpected output in the lateral joins. Firstly the limit is not working, and each of the array_agg aggregates is returning hundreds of rows instead of limiting to 10. Secondly, the aggregates appear to be the same, even though one is ordered ASC and the other DESC.
How can I retrieve only the first 10 and last 10 invoices of each month group?
SELECT first.invoice_month,
array_agg(first.id) first_ten,
array_agg(last.id) last_ten
FROM public.invoice i
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id ASC
LIMIT 10
) first ON i.id = first.id
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id DESC
LIMIT 10
) last on i.id = last.id
WHERE i.invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
GROUP BY first.invoice_month, last.invoice_month;
This can be done with a recursive query that will generate the interval of months for who we need to find the first and last 10 invoices.
WITH RECURSIVE all_months AS (
SELECT date_trunc('month','2018-01-01'::TIMESTAMP) as c_date, date_trunc('month', '2018-05-11'::TIMESTAMP) as end_date, to_char('2018-01-01'::timestamp, 'YYYY-MM') as current_month
UNION
SELECT c_date + interval '1 month' as c_date,
end_date,
to_char(c_date + INTERVAL '1 month', 'YYYY-MM') as current_month
FROM all_months
WHERE c_date + INTERVAL '1 month' <= end_date
),
invocies_with_month as (
SELECT *, to_char(invoice_date::TIMESTAMP, 'YYYY-MM') invoice_month FROM invoice
)
SELECT current_month, array_agg(first_10.id), 'FIRST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date ASC limit 10
) first_10 ON TRUE
GROUP BY current_month
UNION
SELECT current_month, array_agg(last_10.id), 'LAST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date DESC limit 10
) last_10 ON TRUE
GROUP BY current_month;
In the code above, '2018-01-01' and '2018-05-11' represent the dates between we want to find the invoices. Based on those dates, we generate the months (2018-01, 2018-02, 2018-03, 2018-04, 2018-05) that we need to find the invoices for.
We store this data in all_months.
After we get the months, we do a lateral join in order to join the invoices for every month. We need 2 lateral joins in order to get the first and last 10 invoices.
Finally, the result is represented as:
current_month - the month
array_agg - ids of all selected invoices for that month
type - type of the selected invoices ('first 10' or 'last 10').
So in the current implementation, you will have 2 rows for each month (if there is at least 1 invoice for that month). You can easily join that in one row if you need to.
LIMIT is working fine. It's your query that's broken. JOIN is just 100% the wrong tool here; it doesn't even do anything close to what you need. By joining up to 10 rows with up to another 10 rows, you get up to 100 rows back. There's also no reason to self join just to combine filters.
Consider instead window queries. In particular, we have the dense_rank function, which can number every row in the result set according to groups:
SELECT
invoice_month,
time_of_month,
ARRAY_AGG(id) invoice_ids
FROM (
SELECT
id,
invoice_month,
-- Categorize as end or beginning of month
CASE
WHEN month_rank <= 10 THEN 'beginning'
WHEN month_reverse_rank <= 10 THEN 'end'
ELSE 'bug' -- Should never happen. Just a fall back in case of a bug.
END AS time_of_month
FROM (
SELECT
id,
invoice_month,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date) month_rank,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date DESC) month_rank_reverse
FROM (
SELECT
id,
invoice_date,
to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) AS fiscal_year_invoices
) ranked_invoices
-- Get first and last 10
WHERE month_rank <= 10 OR month_reverse_rank <= 10
) first_and_last_by_month
GROUP BY
invoice_month,
time_of_month
Don't be intimidated by the length. This query is actually very straightforward; it just needed a few subqueries.
This is what it does logically:
Fetch the rows for the fiscal year in question
Assign a "rank" to the row within its month, both counting from the beginning and from the end
Filter out everything that doesn't rank in the 10 top for its month (counting from either direction)
Adds an indicator as to whether it was at the beginning or end of the month. (Note that if there's less than 20 rows in a month, it will categorize more of them as "beginning".)
Aggregate the IDs together
This is the tool set designed for the job you're trying to do. If really needed, you can adjust this approach slightly to get them into the same row, but you have to aggregate before joining the results together and then join on the month; you can't join and then aggregate.

Postgres windowing (determine contiguous days)

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. If we assume we have a regular time series and weather report:
date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"
I want something count the contiguous days of the same weather. The results should look something like this:
date|weather|contiguous_days
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2
I've been banging my head on this for a while trying to use windowing functions. At first, it seems like it should be no-brainer, but then I found out its much harder than expected.
Here is what I've tried...
Select date, weather, Row_Number() Over (partition by weather order by date)
from t_weather
Would it be better just easier to compare the current row to the next? How would you do that while maintaining a count? Any thoughts, ideas, or even solutions would be helpful!
-Kip
You need to identify the contiguous where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract a sequence of increasing numbers from the dates and it is constant for contiguous dates.
One you have the grouping, the rest is row_number():
Select date, weather,
Row_Number() Over (partition by weather, grp order by date)
from (select w.*,
(date - row_number() over (partition by weather order by date) * interval '1 day') as grp
from t_weather w
) w;
The SQL Fiddle is here.
I'm not sure what the query engine is going to do when scanning multiple times across the same data set (kinda like calculating area under a curve), but this works...
WITH v(date, weather) AS (
VALUES
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date,
weather,
CASE WHEN lag(weather) OVER () = weather THEN 1 ELSE 0 END change
FROM v)
SELECT date
, weather
,(SELECT count(weather) -- number of times the weather didn't change
FROM changes v2
WHERE v2.date <= v1.date AND v2.weather = v1.weather
AND v2.date >= ( -- bounded between changes of weather
SELECT max(date)
FROM changes v3
WHERE change = 0
AND v3.weather = v1.weather
AND v3.date <= v1.date) --<-- here's the expensive part
) curve
FROM changes v1
Here is another approach based off of this answer.
First we add a change column that is 1 or 0 depending on whether the weather is different or not from the previous day.
Then we introduce a group_nr column by summing the change over an order by date. This produces a unique group number for each sequence of consecutive same-weather days since the sum is only incremented on the first day of each sequence.
Finally we do a row_number() over (partition by group_nr order by date) to produce the running count per group.
select date, weather, row_number() over (partition by group_nr order by date)
from (
select *, sum(change) over (order by date) as group_nr
from (
select *, (weather != lag(weather,1,'') over (order by date))::int as change
from tmp_weather
) t1
) t2;
sqlfiddle (uses equivalent WITH syntax)
You can accomplish this with a recursive CTE as follows:
WITH RECURSIVE CTE_ConsecutiveDays AS
(
SELECT
my_date,
weather,
1 AS consecutive_days
FROM My_Table T
WHERE
NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
UNION ALL
SELECT
T.my_date,
T.weather,
CD.consecutive_days + 1
FROM
CTE_ConsecutiveDays CD
INNER JOIN My_Table T ON
T.my_date = CD.my_date + INTERVAL '1 day' AND
T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;
Here's the SQL Fiddle to test: http://www.sqlfiddle.com/#!15/383e5/3

Aggregates for today and the previous day depending on data

Having trouble putting together a query to pull the aggregate values of a give timestamp and the timestamp before it. Given the following schema:
name TEXT,
ts TIMESTAMP,
X NUMERIC,
Y NUMERIC
where there are gaps in the ts column due to gaps in data, I'm trying to construct a query to produce
name,
date_trunc('day' q1.ts),
avg(q1.X),
sum(q2.Y),
date_trunc('day', q2.ts),
avg(q2.X),
sum(q2.Y)
The first half is straightforward:
SELECT q1.name, date_trunc('day', q1.ts), avg(q1.X), sum(q1.Y)
FROM data as q1
GROUP BY 1, 2
ORDER BY 1, 2;
But not sure how to generate the relation to find the "day" before for each row. I'm trying to work an inner join like this:
SELECT q1.name, q1.day, q1.avg, q1.sum, q2.day, q2.avg, q2.sum
FROM (
SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
FROM data
GROUP BY 1,2
ORDER BY 1,2
) q1 INNER JOIN (
SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
FROM data
GROUP BY 1,2
ORDER BY 1,2
) q2 ON (
q1.name = q2.name
AND q2.day = q1.day - interval '1 day'
);
The problem with this is, it doesn't cover the cases when the next "day" is more than 1 day before the current day.
The special difficulty here is that you need to number days after aggregating rows. You can do this in a single query level with the window function row_number(), since window functions are applied after aggregation by GROUP BY.
Also, use a CTE to avoid executing the same subquery multiple times:
WITH q AS (
SELECT name, ts::date AS day
,avg(x) AS avg_x, sum(y) AS sum_y
,row_number() OVER (PARTITION BY name ORDER BY ts::date) AS rn
FROM data
GROUP BY 1,2
)
SELECT q1.name, q1.day, q1.avg_x, q1.sum_y
,q2.day AS day2, q2.avg_x AS avg_x2, q2.sum_y AS sum_y2
FROM q q1
LEFT JOIN q q2 ON q1.name = q2.name
AND q1.rn = q2.rn + 1
ORDER BY 1,2;
Using the simpler cast to date (ts::date) instead of date_trunc('day', ts) to get "days".
LEFT [OUTER] JOIN (as opposed to [INNER] JOIN) is instrumental to preserve the corner case of the first row, where there is no previous day.
And ORDER BY should be applied to the outer query.
The question isn't crystal clear, but it sounds like you're actually trying to fill gaps while keeping track of leading/lagging rows.
To fill the gaps, look into generate_series() and left join it with your table:
select d
from generate_series(timestamp '2013-12-01', timestamp '2013-12-31', interval '1 day') d;
http://www.postgresql.org/docs/current/static/functions-srf.html
For previous and next row values, look into lead() and lag() window functions:
select date_trunc('day', ts) as curr_row_day,
lag(date_trunc('day', ts)) over w as prev_row_day
from data
window w as (order by ts)
http://www.postgresql.org/docs/current/static/tutorial-window.html