Finding the highest after grouping by month - sql

In Postgres, I want to output the persons who have the highest number of "discussed" requests for each month, irrespective of the year, i.e. there should be 12 output rows.
ID  PERSON      REQUEST    DATE
4   datanoise   opened     2010-09-02
5   marsuboss   opened     2010-09-02
6   m3talsmith  opened     2010-09-06
7   sferik      opened     2010-09-08
8   sferik      opened     2010-09-09
8   dtrasbo     discussed  2010-09-09
8   brianmario  discussed  2010-09-09
8   sferik      discussed  2010-09-09
9   rsim        opened     2011-09-09
.....more tuples to follow
*This is just a small part of the database. Also assume that the dataset is big enough that all months are represented in the date column.

Test data:
CREATE TEMPORARY TABLE foo(
    id SERIAL PRIMARY KEY,
    name INTEGER NOT NULL,
    dt DATE NULL,
    request BOOL NOT NULL
);
INSERT INTO foo (name, dt, request)
SELECT random()*1000,
       '2010-01-01'::DATE + ('1 DAY'::INTERVAL) * (random()*3650),
       random() > 0.5
FROM generate_series(1,100000) n;
SELECT * FROM foo LIMIT 10;
id | name | dt | request
----+------+------------+---------
1 | 110 | 2014-11-05 | f
2 | 747 | 2015-03-12 | t
3 | 604 | 2014-09-26 | f
4 | 211 | 2011-12-14 | t
5 | 588 | 2016-12-15 | f
6 | 96 | 2012-02-19 | f
7 | 17 | 2018-09-18 | t
8 | 591 | 2018-02-15 | t
9 | 370 | 2015-07-28 | t
10 | 844 | 2019-05-16 | f
Now you have to get the count per name and month, then get the maximum count. But the maximum alone won't tell you which name reached it, so you have to join back against the per-name counts. To do the GROUP BY only once, it goes into a CTE:
WITH totals AS (
    SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt
    FROM foo
    WHERE request = true
    GROUP BY name, mon
)
SELECT * FROM
    (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
    JOIN totals USING (mon, cnt);
If several names share the same maximum count, they will all be returned. To keep only one, you can use DISTINCT ON:
WITH (same as above)
SELECT DISTINCT ON (mon) * FROM
(SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
JOIN totals USING (mon,cnt) ORDER BY mon,name;
You can also use DISTINCT ON on its own to keep only one row per month, picked according to the ORDER BY clause, in this case by count descending, so it keeps the highest count.
SELECT DISTINCT ON (mon) * FROM (
    SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt
    FROM foo
    WHERE request = true
    GROUP BY name, mon
) x ORDER BY mon, cnt DESC;
...or you could hack an argmax() by sticking the primary key into an array passed to max(): arrays compare element by element, so max(ARRAY[cnt,id]) carries along the id of a row that has the maximum count:
SELECT mon, cntid[1] cnt, name FROM
    (SELECT mon, max(ARRAY[cnt,id]) cntid FROM (
        SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt, min(id) id
        FROM foo
        WHERE request = true
        GROUP BY name, mon
    ) x GROUP BY mon) y
JOIN foo ON (foo.id = cntid[2]);
Which one will be faster?...
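One way to settle that is to measure each variant against the generated test data with EXPLAIN ANALYZE (plain Postgres, using only the table defined above); for example:

EXPLAIN ANALYZE
WITH totals AS (
    SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt
    FROM foo
    WHERE request = true
    GROUP BY name, mon
)
SELECT * FROM
    (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
    JOIN totals USING (mon, cnt);

Run the same EXPLAIN ANALYZE over each variant and compare the reported execution times; the winner can change with data volume and Postgres version.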

Given your table is named t01 and the date column is date1 (stored as a string):
create temp table t02 as
select extract(month from CAST(date1 as date)) as month, person, count(*) nb
from t01
where request = 'discussed'
group by 1, 2;

create temp table t03 as
select month, max(nb) max_nb
from t02
group by 1;
The result is:

select month, person
from t02 a natural join t03 b
where a.nb = b.max_nb;
Run it here: https://rextester.com/BYMM84335
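The two temp tables can also be collapsed into a single statement with CTEs (a sketch under the same t01/date1 assumptions):

with t02 as (
    select extract(month from CAST(date1 as date)) as month, person, count(*) nb
    from t01
    where request = 'discussed'
    group by 1, 2
), t03 as (
    select month, max(nb) max_nb
    from t02
    group by 1
)
select month, person
from t02 a natural join t03 b
where a.nb = b.max_nb;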

I would recommend distinct on. If you want to combine all the months into a single "uber-month":
select distinct on (extract(month from date)) person, extract(month from date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, extract(month from date)
order by extract(month from date), num_discussed desc;
Distinct on is a very handy Postgres extension. It returns one row per "group", which is defined by the expressions in parentheses. The row is the "first" one determined by the ORDER BY clause.
If instead you want the best actual month (keeping its year) for each calendar month:
select distinct on (extract(month from date)) person, date_trunc('month', date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, date_trunc('month', date)
order by extract(month from date), num_discussed desc;

Related

sum last 7 days of sales in new column

I have the following data set:
I want to create a new column that sums the last 7 days of sales. So the query result should look like the following:
Please help, thanks!
In standard SQL, you would use a window function -- assuming you have data for each day:
select t.*,
       sum(sales) over (partition by itemid
                        order by date
                        rows between 6 preceding and current row) as sales_7
from t;
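If there can be gaps (days with no row), the ROWS frame above counts the last 7 rows, not the last 7 days. A RANGE frame with a date offset handles that; a sketch, assuming PostgreSQL 11 or later and the same hypothetical table t:

select t.*,
       -- range frame: include every row whose date falls in the last 7 days,
       -- regardless of how many rows each day has
       sum(sales) over (partition by itemid
                        order by date
                        range between interval '6 days' preceding
                              and current row) as sales_7
from t;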
Use the sum() aggregate function and GROUP BY:

select country, itemid, year, monthnumber, week, sum(sales) as sales_last_7days
from your_table
where date >= DATEADD(day, -7, getdate()) and date < getdate()
group by country, itemid, year, monthnumber, week
With a window:

select (list other columns here),
       sum(sum(sales)) over (partition by week
                             order by date
                             rows between 6 preceding and current row)
from table
group by date, week;

Note that week doesn't change the grouping, because a date belongs to exactly one week, but it is needed in the window.
It seems you are working with SQL Server. If so, you can use APPLY:

select t.*, t1.[last7day]
from table t outer apply
     (select sum(t1.sales) as [last7day]
      from table t1
      where t.itemid = t1.itemid and
            t1.date between dateadd(day, -6, t.date) and t.date
     ) t1;
If you don't have exactly one day for each row, for example if you have a list of transactions...
The below example completely confused me the first time I saw it, so I've tried to comment as much as I can to explain what's happening.
Suppose we have a table tbl with date column dt and amount column amt, and for each date in tbl we want to return a rolling sum of the amount from the current day and the past 6 days.
select distinct -- see note after code on what this distinct is doing.
dt
, ( -- Has to be in brackets to denote we're returning 1 value per row.
-- for each row of T1:
select sum(b.amt) -- the sum of amounts in T2. The where clause will restrict which rows in T2 will be summed.
from tbl T2
where T2.dt between T1.dt - 6 and T1.dt -- for each row in T1, give me all rows in T2 where the date is between 6 days before this T1 row's date and T1 row's date, giving us our rolling sum
-- WARNING: CHECK YOUR VERSION OF SQL FOR HOW TO SUBTRACT DAYS FROM A DATE, I'VE MADE IT (T1.dt - 6) FOR SIMPLICITY
-- we don't need a group by, because we're returning one value for each row in T1
)
from tbl T1
We have a main version of tbl, aliased T1. We then have a secondary table, aliased T2. For each row in T1, we're going to ask for a set of rows in T2 that we're going to sum before giving it to our main query.
To understand what's happening, run the code without the distinct. You'll notice that we have the same number of rows as in tbl, because the T2 statement is happening for every row in T1.
Notes:
If you have any days for which no rows exist in your table, you will not get a calculation for that day. To be certain this doesn't happen, join your table to a table containing a distinct list of consecutive dates, and use that as your date column (see the sketch after these notes).
If you have nulls in your amount column the calculation will still work, but if the rolling window contains only nulls you will have null instead of 0 as your result. If that troubles you, convert your nulls to zeros before (or after) you run the query.
The beginning of the period will have a 'ramp up'. This would be the same whatever method you use to do a rolling sum. If it bothers you, don't return the first 6 days.
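For the calendar-table approach in the first note, here is a sketch in Postgres (generate_series and the fixed date range are my assumptions; in SQL Server you would build the date list differently, e.g. with a recursive CTE):

-- one row per calendar day, whether or not tbl has transactions that day
select cal.day,
       coalesce(sum(b.amt), 0) as past_7_days_amt  -- coalesce turns all-null windows into 0
from (
    select g::date as day
    from generate_series('2019-10-01'::date, '2019-10-14'::date, interval '1 day') g
) cal
left join tbl b on b.dt between cal.day - 6 and cal.day
group by cal.day
order by cal.day;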
Finally, a worked example if you're playing along at home using SQL Server. Note that UNION (without ALL) removes duplicate rows, so the repeated identical transactions below collapse to one each, and the totals reflect that:
with tbl as (
-- a list of transactions from 1.10.2019 to 14.10.2019
select cast('2019-10-01' as date) dt, 1 amt
union select cast('2019-10-02' as date), 4
union select cast('2019-10-01' as date), 10
union select cast('2019-10-03' as date), 3
union select cast('2019-10-04' as date), 20
union select cast('2019-10-04' as date), 2
union select cast('2019-10-04' as date), 12
union select cast('2019-10-04' as date), 17
union select cast('2019-10-05' as date), null -- a whole week of null values because we all had the week off... I hope this data wasn't important
union select cast('2019-10-06' as date), null
union select cast('2019-10-07' as date), null
union select cast('2019-10-08' as date), null
union select cast('2019-10-09' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-11' as date), null
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-13' as date), 2
union select cast('2019-10-14' as date), 1000
)
select distinct
a.dt
, (
select sum(b.amt)
from tbl b
where b.dt between dateadd(dd, -6, a.dt) and a.dt
) past_7_days_amt
from tbl a
Returns:
+------------+-----------------+
| dt | past_7_days_amt |
+------------+-----------------+
| 2019-10-01 | 11 |
| 2019-10-02 | 15 |
| 2019-10-03 | 18 |
| 2019-10-04 | 69 |
| 2019-10-05 | 69 |
| 2019-10-06 | 69 |
| 2019-10-07 | 69 |
| 2019-10-08 | 58 |
| 2019-10-09 | 54 |
| 2019-10-10 | 51 |
| 2019-10-11 | NULL |
| 2019-10-12 | 1 |
| 2019-10-13 | 3 |
| 2019-10-14 | 1003 |
+------------+-----------------+

Combining multiple scalar bigquery queries into a single query to generate one table

I have a BigQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph over a year of active users, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time consuming.
Is there a way to make a single bigquery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run). As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.
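For what it's worth, the UNION ALL rows come back unlabeled and in no guaranteed order. One way to label and parameterize the dates in a single pass (a sketch in BigQuery standard SQL, reusing MyTable and activityTime from the question):

#standardSQL
SELECT d AS cutoff,
       COUNTIF(t.activityTime < d) AS cnt  -- count rows before each cutoff
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2017-01-01', DATE '2017-05-01', INTERVAL 1 MONTH)) AS d
CROSS JOIN MyTable t
GROUP BY d
ORDER BY d;

GENERATE_DATE_ARRAY produces the month-start cutoffs, so adding a month is a one-character change instead of another UNION ALL branch.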

Querying for an ID that has the most number of reads

Suppose I have a table like the one below:
+----+-----------+
| ID | TIME |
+----+-----------+
| 1 | 12-MAR-15 |
| 2 | 23-APR-14 |
| 2 | 01-DEC-14 |
| 1 | 01-DEC-15 |
| 3 | 05-NOV-15 |
+----+-----------+
What I want to do is, for each year (extracted from the TIME column), list the ID that has the highest count in that year. So for example, ID 1 occurs the most in 2015, ID 2 occurs the most in 2014, etc.
What I have for a query is:
SELECT EXTRACT(year from time) "YEAR", COUNT(ID) "ID"
FROM table
GROUP BY EXTRACT(year from time)
ORDER BY COUNT(ID) DESC;
But this query just counts how many rows fall in each year. How do I fix it to get the ID with the highest count in each year?
Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 2 |
| 2012 | 2 |
+------+----+
Expected Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 1 |
| 2014 | 2 |
+------+----+
Starting with your sample query, the first change is simply to group by the ID as well as by the year.
SELECT EXTRACT(year from time) "YEAR" , id, COUNT(*) "TOTAL"
FROM table
GROUP BY EXTRACT(year from time), id
ORDER BY EXTRACT(year from time) DESC, COUNT(*) DESC
With that, you could find the rows you want by visual inspection (the first row for each year is the ID with the most rows).
To have the query just return the rows with the highest totals, there are several different ways to do it. You need to consider what you want to do if there are ties - do you want to see all IDs tied for highest in a year, or just an arbitrary one?
Here is one approach - if there is a tie, this should return just the lowest of the tied IDs:
WITH groups AS (
    SELECT EXTRACT(year from time) "YEAR", id, COUNT(*) "TOTAL"
    FROM table
    GROUP BY EXTRACT(year from time), id
)
SELECT year, MIN(id) KEEP (DENSE_RANK FIRST ORDER BY total DESC)
FROM groups
GROUP BY year
ORDER BY year DESC
You need to count per ID and then apply a RANK on that count:

SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           RANK() OVER (PARTITION BY EXTRACT(year from time)
                        ORDER BY COUNT(*) DESC) AS rnk
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rnk = 1
If this returns multiple rows with the same top count per year and you want just one of them (picked arbitrarily), you can switch to ROW_NUMBER, as sketched below.
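For example (same query shape, only the ranking function changes):

SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY EXTRACT(year from time)
                              ORDER BY COUNT(*) DESC) AS rn
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rn = 1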
This should do what you're after, I think:
with sample_data as (select 1 id, to_date('12/03/2015', 'dd/mm/yyyy') time from dual union all
select 2 id, to_date('23/04/2014', 'dd/mm/yyyy') time from dual union all
select 2 id, to_date('01/12/2014', 'dd/mm/yyyy') time from dual union all
select 1 id, to_date('01/12/2015', 'dd/mm/yyyy') time from dual union all
select 3 id, to_date('05/11/2015', 'dd/mm/yyyy') time from dual)
-- End of creating a subquery to mimic a table called "sample_data" containing your input data.
-- See SQL below:
select yr,
id most_frequent_id,
cnt_id_yr cnt_of_most_freq_id
from (select to_char(time, 'yyyy') yr,
id,
count(*) cnt_id_yr,
dense_rank() over (partition by to_char(time, 'yyyy') order by count(*) desc) dr
from sample_data
group by to_char(time, 'yyyy'),
id)
where dr = 1;
YR MOST_FREQUENT_ID CNT_OF_MOST_FREQ_ID
---- ---------------- -------------------
2014 2 2
2015 1 2

Select distinct users group by time range

I have a table with the following info
|date | user_id | week_beg | month_beg|
SQL to create table with test values:
CREATE TABLE uniques
(
date DATE,
user_id INT,
week_beg DATE,
month_beg DATE
)
INSERT INTO uniques VALUES ('2013-01-01', 1, '2012-12-30', '2013-01-01')
INSERT INTO uniques VALUES ('2013-01-03', 3, '2012-12-30', '2013-01-01')
INSERT INTO uniques VALUES ('2013-01-06', 4, '2013-01-06', '2013-01-01')
INSERT INTO uniques VALUES ('2013-01-07', 4, '2013-01-06', '2013-01-01')
INPUT TABLE:
| date | user_id | week_beg | month_beg |
| 2013-01-01 | 1 | 2012-12-30 | 2013-01-01 |
| 2013-01-03 | 3 | 2012-12-30 | 2013-01-01 |
| 2013-01-06 | 4 | 2013-01-06 | 2013-01-01 |
| 2013-01-07 | 4 | 2013-01-06 | 2013-01-01 |
OUTPUT TABLE:
| date | time_series | cnt |
| 2013-01-01 | D | 1 |
| 2013-01-01 | W | 1 |
| 2013-01-01 | M | 1 |
| 2013-01-03 | D | 1 |
| 2013-01-03 | W | 2 |
| 2013-01-03 | M | 2 |
| 2013-01-06 | D | 1 |
| 2013-01-06 | W | 1 |
| 2013-01-06 | M | 3 |
| 2013-01-07 | D | 1 |
| 2013-01-07 | W | 1 |
| 2013-01-07 | M | 3 |
I want to calculate the number of distinct user_id's for a date:
1. For that date
2. For that week up to that date (week to date)
3. For the month up to that date (month to date)
1 is easy to calculate.
For 2 and 3 I am trying to use such queries:
SELECT
    date,
    'W' AS "time_series",
    COUNT(DISTINCT user_id) OVER (PARTITION BY week_beg) AS "cnt"
FROM user_subtitles

SELECT
    date,
    'M' AS "time_series",
    COUNT(DISTINCT user_id) OVER (PARTITION BY month_beg) AS "cnt"
FROM user_subtitles
Postgres does not allow window functions for DISTINCT calculation, so this approach does not work.
I have also tried a GROUP BY approach, but it does not work, as it gives me numbers for whole weeks/months.
What's the best way to approach this problem?
Count all rows
SELECT date, '1_D' AS time_series, count(DISTINCT user_id) AS cnt
FROM uniques
GROUP BY 1
UNION ALL
SELECT DISTINCT ON (1)
date, '2_W', count(*) OVER (PARTITION BY week_beg ORDER BY date)
FROM uniques
UNION ALL
SELECT DISTINCT ON (1)
date, '3_M', count(*) OVER (PARTITION BY month_beg ORDER BY date)
FROM uniques
ORDER BY 1, time_series
Your columns week_beg and month_beg are 100% redundant and can easily be replaced by
date_trunc('week', date + 1) - 1 and date_trunc('month', date) respectively.
Your week seems to start on Sunday (off by one from Postgres' Monday-based ISO weeks), therefore the + 1 .. - 1.
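A quick sanity check against the test data shows the expressions line up with the stored columns:

SELECT date,
       week_beg,  date_trunc('week', date + 1)::date - 1 AS week_calc,
       month_beg, date_trunc('month', date)::date        AS month_calc
FROM uniques;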
The default frame of a window function with ORDER BY in the OVER clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That's exactly what you need.
Use UNION ALL, not UNION.
Your unfortunate choice for time_series (D, W, M) does not sort well, so I renamed the values to make the final ORDER BY easier.
This query can deal with multiple rows per day. Counts include all peers for a day.
More about DISTINCT ON:
Select first row in each GROUP BY group?
DISTINCT users per day
To count every user only once per day, use a CTE with DISTINCT ON:
WITH x AS (SELECT DISTINCT ON (1,2) date, user_id FROM uniques)
SELECT date, '1_D' AS time_series, count(user_id) AS cnt
FROM x
GROUP BY 1
UNION ALL
SELECT DISTINCT ON (1)
date, '2_W'
,count(*) OVER (PARTITION BY (date_trunc('week', date + 1)::date - 1)
ORDER BY date)
FROM x
UNION ALL
SELECT DISTINCT ON (1)
date, '3_M'
,count(*) OVER (PARTITION BY date_trunc('month', date) ORDER BY date)
FROM x
ORDER BY 1, 2
DISTINCT users over dynamic period of time
You can always resort to correlated subqueries. They tend to be slow with big tables!
Building on the previous queries:
WITH du AS (SELECT date, user_id FROM uniques GROUP BY 1,2)
,d AS (
SELECT date
,(date_trunc('week', date + 1)::date - 1) AS week_beg
,date_trunc('month', date)::date AS month_beg
FROM uniques
GROUP BY 1
)
SELECT date, '1_D' AS time_series, count(user_id) AS cnt
FROM du
GROUP BY 1
UNION ALL
SELECT date, '2_W', (SELECT count(DISTINCT user_id) FROM du
WHERE du.date BETWEEN d.week_beg AND d.date )
FROM d
GROUP BY date, week_beg
UNION ALL
SELECT date, '3_M', (SELECT count(DISTINCT user_id) FROM du
WHERE du.date BETWEEN d.month_beg AND d.date)
FROM d
GROUP BY date, month_beg
ORDER BY 1,2;
SQL Fiddle for all three solutions.
Faster with dense_rank()
@Clodoaldo came up with a major improvement: use the window function dense_rank(). Here is another idea for an optimized version. It should be even faster to exclude daily duplicates right away. The performance gain grows with the number of rows per day.
Building on a simplified and sanitized data model
- without the redundant columns
- day as column name instead of date
date is a reserved word in standard SQL and a basic type name in PostgreSQL and shouldn't be used as identifier.
CREATE TABLE uniques(
day date -- instead of "date"
,user_id int
);
Improved query:
WITH du AS (
SELECT DISTINCT ON (1, 2)
day, user_id
,date_trunc('week', day + 1)::date - 1 AS week_beg
,date_trunc('month', day)::date AS month_beg
FROM uniques
)
SELECT day, count(user_id) AS d, max(w) AS w, max(m) AS m
FROM (
SELECT user_id, day
,dense_rank() OVER(PARTITION BY week_beg ORDER BY user_id) AS w
,dense_rank() OVER(PARTITION BY month_beg ORDER BY user_id) AS m
FROM du
) s
GROUP BY day
ORDER BY day;
SQL Fiddle demonstrating the performance of 4 faster variants. It depends on your data distribution which is fastest for you.
All of them are about 10x as fast as the correlated subqueries version (which isn't bad for correlated subqueries).
Without correlated subqueries. SQL Fiddle
with u as (
select
"date", user_id,
date_trunc('week', "date" + 1)::date - 1 week_beg,
date_trunc('month', "date")::date month_beg
from uniques
)
select
"date", count(distinct user_id) D,
max(week_dr) W, max(month_dr) M
from (
select
user_id, "date",
dense_rank() over(partition by week_beg order by user_id) week_dr,
dense_rank() over(partition by month_beg order by user_id) month_dr
from u
) s
group by "date"
order by "date"
Try

SELECT *
FROM
(
    SELECT dates, count(user_id), 'D' as time_series FROM users_data GROUP BY dates
    UNION
    SELECT max(dates), count(user_id), 'W' FROM users_data GROUP BY date_trunc('week', dates)
    UNION
    SELECT max(dates), count(user_id), 'M' FROM users_data GROUP BY date_trunc('month', dates)
) temp
ORDER BY dates, time_series
SQLFIDDLE
Try queries like this:

SELECT count(distinct user_id), to_char(date, 'YYYY-MM-DD') as date_period
FROM uniques
GROUP BY date_period

Need to find Average of top 3 records grouped by ID in SQL

I have a postgres table with customer ID's, dates, and integers. I need to find the average of the top 3 records for each customer ID that have dates within the last year. I can do it with a single ID using the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).
One caveat: the maximum values are per month, meaning we're only looking at the highest value in a given month to create our dataset, thus why we're extracting month from the date.
SELECT
id,
round(avg(max),0)
FROM
(
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max
FROM
myTable
WHERE
weekending >= now() - interval '1 year' AND
id=110070 group by id,month,year
ORDER BY
max desc limit 3
) AS t
GROUP BY id;
How can I expand this query to include all ID's and a single averaged number for each one?
Here is some sample data:
ID | MaxAttached | Weekending
110070 | 5 | 2011-11-10
110070 | 6 | 2011-11-17
110071 | 4 | 2011-11-10
110071 | 7 | 2011-11-17
110070 | 3 | 2011-12-01
110071 | 8 | 2011-12-01
110070 | 5 | 2012-01-01
110071 | 9 | 2012-01-01
So, for this sample table, I would expect to receive the following results:
ID | MaxAttached
110070 | 5
110071 | 8
This averages the highest value in a given month for each ID (6,3,5 for 110070 and 7,8,9 for 110071)
Note: postgres version 8.1.15
First - get the max(maxattached) for every customer and month:
SELECT id,
max(maxattached) as max_att
FROM myTable
WHERE weekending >= now() - interval '1 year'
GROUP BY id, date_trunc('month',weekending);
Next - for every customer rank all his values:
SELECT id,
max_att,
row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;
Next - get the top 3 for every customer:
SELECT id,
max_att
FROM <previous select here>
WHERE max_att_rank <= 3;
Next - get the avg of the values for every customer:
SELECT id,
avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;
Next - just put all the queries together and rewrite/simplify them for your case; a sketch of the assembled query follows.
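Put together, the four steps collapse into one statement (a sketch; window functions need PostgreSQL 8.4 or later, so this will not run on 8.1; the UPDATE2 query below covers that case):

SELECT id,
       round(avg(max_att), 0) AS avg_att
FROM (SELECT id, max_att,
             -- rank each customer's monthly maximums, highest first
             row_number() OVER (PARTITION BY id ORDER BY max_att DESC) AS max_att_rank
      FROM (SELECT id,
                   max(maxattached) AS max_att
            FROM myTable
            WHERE weekending >= now() - interval '1 year'
            GROUP BY id, date_trunc('month', weekending)) monthly) ranked
WHERE max_att_rank <= 3
GROUP BY id;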
UPDATE: Here is an SQLFiddle with your test data and the queries: SQLFiddle.
UPDATE2: Here is the query that will work on 8.1:
SELECT customer_id,
(SELECT round(avg(max_att),0)
FROM (SELECT max(maxattached) as max_att
FROM table1
WHERE weekending >= now() - interval '2 year'
AND id = ct.customer_id
GROUP BY date_trunc('month',weekending)
ORDER BY max_att DESC
LIMIT 3) sub
) as avg_att
FROM customer_table ct;
The idea is to take your initial query and run it for every customer (customer_table is a table with one row per unique customer id).
Here is SQLFiddle with this query: SQLFiddle.
Only tested on version 8.3 (8.1 is too old to be on SQLFiddle).
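If there is no separate customer table, a derived list of distinct ids can stand in for it (same sketch, same assumptions as UPDATE2):

SELECT customer_id,
       (SELECT round(avg(max_att),0)
        FROM (SELECT max(maxattached) as max_att
              FROM table1
              WHERE weekending >= now() - interval '2 year'
                AND id = ct.customer_id
              GROUP BY date_trunc('month',weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub
       ) as avg_att
FROM (SELECT DISTINCT id AS customer_id FROM table1) ct;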
8.3 version
8.3 is the oldest version I've got access to, so I can't guarantee it'll work in 8.1
I'm using a temporary table to work out the best three records.
CREATE TABLE temp_highest_per_month as
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month,
0 as priority
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year;
UPDATE temp_highest_per_month t
SET priority =
    (select count(*) from temp_highest_per_month t2
     where t2.id = t.id and
           (t.max_in_month < t2.max_in_month or
            (t.max_in_month = t2.max_in_month and
             t.year * 12 + t.month > t2.year * 12 + t2.month)));

select id, round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3
group by id;
The year & month are included in working out the priority so that if two months have the same maximum, they'll still be numbered correctly.
9.1 version
Similar to Igor's answer, but I used the WITH clause to split the steps.
with highest_per_month as
( select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year),
prioritised as
( select id, month, year, max_in_month,
         row_number() over (partition by id
                            order by max_in_month desc)
         as priority
  from highest_per_month
)
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;