BigQuery Running Count of Unique ID per Year - sql

I found a bunch of similar questions, but none addressing this one specifically (correct me if I'm wrong).
I am trying, on BigQuery, to index each row of a table with the running count of distinct users per year, using an analytic function.
So with:
with dataset as (
select 'A' as user, '2020' as year, RAND() as some_value
union all
select 'A' as user, '2020' as year, RAND() as some_value
union all
select 'B' as user, '2020' as year, RAND() as some_value
union all
select 'B' as user, '2020' as year, RAND() as some_value
union all
select 'B' as user, '2020' as year, RAND() as some_value
union all
select 'C' as user, '2020' as year, RAND() as some_value
union all
select 'C' as user, '2020' as year, RAND() as some_value
union all
select 'A' as user, '2021' as year, RAND() as some_value
union all
select 'A' as user, '2021' as year, RAND() as some_value
union all
select 'B' as user, '2021' as year, RAND() as some_value
union all
select 'C' as user, '2021' as year, RAND() as some_value
union all
select 'C' as user, '2021' as year, RAND() as some_value
union all
select 'C' as user, '2021' as year, RAND() as some_value
union all
select 'C' as user, '2021' as year, RAND() as some_value
union all
select 'C' as user, '2021' as year, RAND() as some_value
)
I would like to get:
rcount | user | year | some_value
1 | A | 2020 | 0.2365421124968884
1 | A | 2020 | 0.21087749308191206
2 | B | 2020 | 0.6096882013526258
2 | B | 2020 | 0.8544447727632739
2 | B | 2020 | 0.6113604025541309
3 | C | 2020 | 0.5803237472480643
3 | C | 2020 | 0.165305669127888
1 | A | 2021 | 0.1200575362708826
1 | A | 2021 | 0.015721175944171915
2 | B | 2021 | 0.21890252010457295
3 | C | 2021 | 0.5087613385277634
3 | C | 2021 | 0.9949262690813603
3 | C | 2021 | 0.50824183164116
3 | C | 2021 | 0.8262428736484341
3 | C | 2021 | 0.6866964737106948
I tried:
count(user) over (partition by year, user)
I also tried using ranges like order by year range between unbounded preceding and current row, and row_count().
I have no idea where to look for a solution now.

A simpler solution would be to use DENSE_RANK:
SELECT
DENSE_RANK() OVER (PARTITION BY year ORDER BY user) as rcount,
user,
year,
some_value
FROM dataset
Information about DENSE_RANK can be found in the BigQuery documentation on numbering functions.
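For reference, a minimal runnable sketch that prepends an abbreviated version of the question's dataset CTE (the full CTE from the question works the same) to the query above:
with dataset as (
  select 'A' as user, '2020' as year, rand() as some_value union all
  select 'B' as user, '2020' as year, rand() as some_value union all
  select 'C' as user, '2020' as year, rand() as some_value union all
  select 'A' as user, '2021' as year, rand() as some_value union all
  select 'C' as user, '2021' as year, rand() as some_value
)
select
  dense_rank() over (partition by year order by user) as rcount,
  user,
  year,
  some_value
from dataset
order by year, rcount;
Because of the PARTITION BY year, rcount restarts at 1 for each year, as in the expected output.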

Try the following:
select user
     , year
     , some_value
     , sum(flag) over (partition by year order by year, user ROWS UNBOUNDED PRECEDING) as rcount
from (
    select user
         , year
         , some_value
         , IF(lag(user, 1) OVER (order by year, user) = user, 0, 1) as flag
    from dataset
) t
The sub-select decides whether to count each record based on the previous row (1 when the user changes, 0 otherwise); the outer select then takes a running sum of that flag within each year.
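To see why this works, here is the flag the sub-select produces for the sample data (ordered by year, user), together with its running sum within each year, which is exactly the expected rcount:
-- year | user | flag | running sum within year (= rcount)
-- 2020 | A    | 1    | 1   (lag is NULL, so the IF condition is NULL and yields 1)
-- 2020 | A    | 0    | 1
-- 2020 | B    | 1    | 2
-- 2020 | B    | 0    | 2
-- 2020 | B    | 0    | 2
-- 2020 | C    | 1    | 3
-- 2020 | C    | 0    | 3
-- 2021 | A    | 1    | 1   (the year partition restarts the sum)
-- 2021 | A    | 0    | 1
-- 2021 | B    | 1    | 2
-- 2021 | C    | 1    | 3
-- 2021 | C    | 0    | 3   (and so on for the remaining C rows)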

Related

Get detail days between two dates (MySQL query)

I have data like this:
id | start_date | end_date
----------------------------
1 | 16-09-2019 | 22-12-2019
I want to get the following results:
id | month | year | days
------------------------
1 | 09 | 2019 | 15
1 | 10 | 2019 | 31
1 | 11 | 2019 | 30
1 | 12 | 2019 | 22
Is there a way to get that result?
This is what you want to do:
SELECT id, EXTRACT(MONTH FROM start_date) AS month, EXTRACT(YEAR FROM start_date) AS year, DATEDIFF(end_date, start_date) AS days
FROM tbl
You can use the MONTH(), YEAR() and DATEDIFF() functions:
SELECT id, MONTH(start_date) AS month, YEAR(start_date) AS year, DATEDIFF(end_date, start_date) AS days FROM table_name
One way is to create a Calendar table and use that.
select month,year, count(*)
from Calendar
where db_date between '2019-09-16'
and '2019-12-22'
group by month,year
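This assumes a Calendar table with one row per date; a minimal sketch of such a table (db_date, month, and year are the assumed column names) for MySQL 8+:
CREATE TABLE Calendar (
  db_date DATE PRIMARY KEY,
  month INT NOT NULL,
  year INT NOT NULL
);
-- populate one row per day of the range with a recursive CTE
INSERT INTO Calendar (db_date, month, year)
WITH RECURSIVE d AS (
  SELECT DATE '2019-01-01' AS db_date
  UNION ALL
  SELECT db_date + INTERVAL 1 DAY FROM d WHERE db_date < DATE '2019-12-31'
)
SELECT db_date, MONTH(db_date), YEAR(db_date) FROM d;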
You can also use a recursive CTE to achieve the same result.
You can use a recursive CTE and aggregation:
with recursive cte as (
select id, start_date, end_date
from t
union all
select id, start_date + interval 1 day, end_date
from cte
where start_date < end_date
)
select id, year(start_date), month(start_date), count(*) as days
from cte
group by id, year(start_date), month(start_date);
Here is a db<>fiddle.
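One caveat worth noting: MySQL limits recursive CTE iterations via the cte_max_recursion_depth system variable (default 1000), so the recursion above fails for date ranges longer than roughly 1000 days unless the limit is raised for the session:
SET SESSION cte_max_recursion_depth = 100000;  -- one row per day, so this covers ~270 years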

SQL: How to create a weekly user count summary by month

I’m trying to create a week-over-week active user count summary report/table, aggregated by month. I have one table for June 2017 and one table for May 2017, which I need to join together in order to compare them. The date timestamp is created_utc, a UNIX timestamp, which I can transform into a human-readable format and from there extract the week-of-year value (1 through 52). The questions I have are:
How to number the weeks just by values 1 through 4: week 1 for June, week 1 for May, week 2 for June, week 2 for May, and so on.
How to join the tables based on those week 1 through 4 values.
How to pivot the table and add a WOW_Change variable.
I'd like the final table to look like this:
| Week | June_count | May_count |WOW_Change |
|:-----------|:-----------:|:------------:|:----------:
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with this result (the sample data covers only the first two weeks, so the result also shows only two weeks; this should not be an issue when you apply it to real data):
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
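The week bucket in the queries above comes from integer arithmetic on the day of the month; the DIV expression maps days to weeks like this:
-- DIV(day_of_month - 1, 7) + 1
-- days  1..7  -> week 1
-- days  8..14 -> week 2
-- days 15..21 -> week 3
-- days 22..28 -> week 4
-- days 29..31 -> week 5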
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count as june_user_count,
       m.user_count as may_user_count
FROM (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM June
      GROUP BY week_number
     ) j JOIN
     (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM May
      GROUP BY week_number
     ) m
     ON m.week_number = j.week_number;
Note that splitting data into different tables just based on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.
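As a sketch of that suggestion (the table and column names here are hypothetical), a single date-partitioned table in BigQuery could look like:
CREATE TABLE `project.dataset.events`
(
  event_ts TIMESTAMP,
  userid STRING
)
PARTITION BY DATE(event_ts);
Queries then filter on the partition column instead of picking between per-month tables, and BigQuery prunes partitions automatically.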

Oracle first and last observation over multiple windows

I have a problem with a query in Oracle.
My table contains all of the loan applications from the last year. Some customers have more than one application. I want to aggregate those applications as follows:
For each customer, I want to find his first application in the last year (call it A), and then find the last application within the 30-day interval counting from that first application (say B is that last one). Next, I need to find the application following B and again find the last one within its 30-day interval, as in the previous step. What I want as the result is a table with the earliest and latest applications in each customer's interval. It is also possible that the first one is the same as the last one.
How could I do this in Oracle without PL/SQL? Is this possible? Should I use cumulative sums of time intervals for it? (But then the starting point of each sum depends on the sum itself...)
Let's say the table has the following form:
application_id (unique) | customer_id (not unique) | create_date
1 1 2017-01-02 <- first
2 1 2017-01-10 <- middle
3 1 2017-01-30 <- last
4 1 2017-05-02 <- first and last
5 1 2017-06-02 <- first
6 1 2017-06-30 <- middle
7 1 2017-06-30 <- middle
8 1 2017-07-01 <- last
What I expect is:
application_id (unique) | customer_id (not unique) | create_date
1 1 2017-01-02 <- first
3 1 2017-01-30 <- last
4 1 2017-05-02 <- first and last
5 1 2017-06-02 <- first
8 1 2017-07-01 <- last
Thanks in advance for help.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( application_id, customer_id, create_date ) AS
SELECT 1, 1, DATE '2017-01-02' FROM DUAL UNION ALL -- <- first
SELECT 2, 1, DATE '2017-01-10' FROM DUAL UNION ALL -- <- middle
SELECT 3, 1, DATE '2017-01-30' FROM DUAL UNION ALL -- <- last
SELECT 4, 1, DATE '2017-05-02' FROM DUAL UNION ALL -- <- first and last
SELECT 5, 1, DATE '2017-06-02' FROM DUAL UNION ALL -- <- first
SELECT 6, 1, DATE '2017-06-30' FROM DUAL UNION ALL -- <- middle
SELECT 7, 1, DATE '2017-06-30' FROM DUAL UNION ALL -- <- middle
SELECT 8, 1, DATE '2017-07-01' FROM DUAL -- <- last
Query 1:
WITH data ( application_id, customer_id, create_date, first_date, grp ) AS (
SELECT t.application_id,
t.customer_id,
t.create_date,
t.create_date,
1
FROM table_name t
WHERE application_id = 1
UNION ALL
SELECT t.application_id,
t.customer_id,
t.create_date,
CASE WHEN t.create_date <= d.first_date + INTERVAL '30' DAY
THEN d.first_date
ELSE t.create_date
END,
CASE WHEN t.create_date <= d.first_date + INTERVAL '30' DAY
THEN grp
ELSE grp + 1
END
FROM data d
INNER JOIN table_name t
ON ( d.customer_id = t.customer_id
AND d.application_id + 1 = t.application_id )
)
SELECT application_id,
customer_id,
create_date,
grp
FROM (
SELECT d.*,
ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date ASC ) AS rn_a,
ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date DESC ) AS rn_d
FROM data d
)
WHERE rn_a = 1
OR rn_d = 1
Results:
| APPLICATION_ID | CUSTOMER_ID | CREATE_DATE | GRP |
|----------------|-------------|----------------------|-----|
| 1 | 1 | 2017-01-02T00:00:00Z | 1 |
| 3 | 1 | 2017-01-30T00:00:00Z | 1 |
| 4 | 1 | 2017-05-02T00:00:00Z | 2 |
| 5 | 1 | 2017-06-02T00:00:00Z | 3 |
| 8 | 1 | 2017-07-01T00:00:00Z | 3 |
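Note that the recursive step joins on d.application_id + 1 = t.application_id, so it assumes application IDs are consecutive per customer (and the anchor is hard-coded to application_id = 1). If IDs can have gaps, a sketch that first derives a dense per-customer sequence with ROW_NUMBER and recurses on that instead:
WITH seq ( application_id, customer_id, create_date, rn ) AS (
  SELECT application_id,
         customer_id,
         create_date,
         ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY create_date, application_id )
  FROM table_name
),
data ( application_id, customer_id, create_date, first_date, grp, rn ) AS (
  SELECT application_id, customer_id, create_date, create_date, 1, rn
  FROM seq
  WHERE rn = 1
  UNION ALL
  SELECT s.application_id,
         s.customer_id,
         s.create_date,
         CASE WHEN s.create_date <= d.first_date + INTERVAL '30' DAY
              THEN d.first_date ELSE s.create_date END,
         CASE WHEN s.create_date <= d.first_date + INTERVAL '30' DAY
              THEN d.grp ELSE d.grp + 1 END,
         s.rn
  FROM data d
  INNER JOIN seq s
          ON s.customer_id = d.customer_id
         AND s.rn = d.rn + 1
)
-- the final SELECT with rn_a/rn_d stays the same as in the query above
SELECT application_id, customer_id, create_date, grp
FROM (
  SELECT d.*,
         ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date ASC )  AS rn_a,
         ROW_NUMBER() OVER ( PARTITION BY customer_id, grp ORDER BY create_date DESC ) AS rn_d
  FROM data d
)
WHERE rn_a = 1 OR rn_d = 1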

Postgresql - How to get value from last record of each month

I have a view like this:
Year | Month | Week | Category | Value |
2017 | 1 | 1 | A | 1
2017 | 1 | 1 | B | 2
2017 | 1 | 1 | C | 3
2017 | 1 | 2 | A | 4
2017 | 1 | 2 | B | 5
2017 | 1 | 2 | C | 6
2017 | 1 | 3 | A | 7
2017 | 1 | 3 | B | 8
2017 | 1 | 3 | C | 9
2017 | 1 | 4 | A | 10
2017 | 1 | 4 | B | 11
2017 | 1 | 4 | C | 12
2017 | 2 | 5 | A | 1
2017 | 2 | 5 | B | 2
2017 | 2 | 5 | C | 3
2017 | 2 | 6 | A | 4
2017 | 2 | 6 | B | 5
2017 | 2 | 6 | C | 6
2017 | 2 | 7 | A | 7
2017 | 2 | 7 | B | 8
2017 | 2 | 7 | C | 9
2017 | 2 | 8 | A | 10
2017 | 2 | 8 | B | 11
2017 | 2 | 8 | C | 12
And I need to make a new view which needs to show average of value column (let's call it avg_val) and the value from the max week of the month (max_val_of_month). Ex: max week of january is 4, so the value of category A is 10. Or something like this to be clear:
Year | Month | Category | avg_val | max_val_of_month
2017 | 1 | A | 5.5 | 10
2017 | 1 | B | 6.5 | 11
2017 | 1 | C | 7.5 | 12
2017 | 2 | A | 5.5 | 10
2017 | 2 | B | 6.5 | 11
2017 | 2 | C | 7.5 | 12
I have used a window function, over partition by year, month, category, to get the avg value. But how can I get the value from the max week of each month?
Assuming that you need a monthly average and the value from the max week, not the max value per month:
SELECT year, month, category, avg_val, value max_week_val
FROM (
SELECT *,
AVG(value) OVER (PARTITION BY year, month, category) avg_val,
ROW_NUMBER() OVER (PARTITION BY year, month, category ORDER BY week DESC) rn
FROM view1
) q
WHERE rn = 1
ORDER BY year, month, category
or more verbose version without window functions
SELECT q.year, q.month, q.category, q.avg_val, v.value max_week_val
FROM (
SELECT year, month, category, avg(value) avg_val, MAX(week) max_week
FROM view1
GROUP BY year, month, category
) q JOIN view1 v
ON q.year = v.year
AND q.month = v.month
AND q.category = v.category
AND q.max_week = v.week
ORDER BY year, month, category
Here is a dbfiddle demo for both queries
And here is my new version.
My thanks to @peterm for pointing out that the prior value of val_from_max_week_of_month was wrong. So, I corrected it:
SELECT
a.Year,
a.Month,
a.Category,
max(a.Week) AS max_week,
AVG(a.Value) AS avg_val,
(
SELECT b.Value
FROM decades AS b
WHERE
b.Year = a.Year AND
b.Month = a.Month AND
b.Week = max(a.Week) AND
b.Category = a.Category
) AS val_from_max_week_of_month
FROM decades AS a
GROUP BY
a.Year,
a.Month,
a.Category
;
First, you need to check how you handle the first week of January. If the 1st of January is not a Monday, there are several interpretations, and not every one of them fits the solutions here. You'll need to use either:
the ISO week concept, i.e. the week column should hold the ISO week and the year column should hold the ISO year (week-year, rather); note that in this concept the 1st of January sometimes actually belongs to the previous year, or
your own concept, where the first week of the year is "split" into two if the 1st of January is not a Monday.
Note: the solutions below will not work if (in your table) the first week of January can be 52 or 53.
avg_val is just a simple aggregation, while max_val_of_month can be calculated with typical greatest-n-per-group queries. That problem has many possible solutions in PostgreSQL, with varying performance. Fortunately, your query naturally has an easily determined selectivity: you'll always need (approximately) a quarter of your data.
The usual winners (in performance) are the two variants below. (No surprise, as these two tend to perform better the larger the portion of the original data you need.)
array_agg() with order by variant:
select year, month, category, avg(value) avg_val,
(array_agg(value order by week desc))[1] max_val_of_month
from table_name
group by year, month, category;
distinct on variant:
select distinct on (year, month, category) year, month, category,
avg(value) over (partition by year, month, category) avg_val,
value max_val_of_month
from table_name
order by year, month, category, week desc;
The pure window function variant is not that bad either:
row_number() variant:
select year, month, category, avg_val, max_val_of_month
from (select year, month, category, value max_val_of_month,
avg(value) over (partition by year, month, category) avg_val,
row_number() over (partition by year, month, category order by week desc) rn
from table_name) w
where rn = 1;
But the LATERAL variant is only viable with an index:
LATERAL variant:
create index idx_table_name_year_month_category_week_desc
on table_name(year, month, category, week desc);
select year, month, category,
avg(value) avg_val,
max_val_of_month
from table_name t
cross join lateral (select value max_val_of_month
from table_name
where (year, month, category) = (t.year, t.month, t.category)
order by week desc
limit 1) m
group by year, month, category, max_val_of_month;
But most of the solutions above can actually utilize this index, not just this last one.
Without the index: http://rextester.com/WNEL86809
With the index: http://rextester.com/TYUA52054
with data (yr, mnth, wk, cat, val) as
(
-- begin test data
select 2017 , 1 , 1 , 'A' , 1 from dual union all
select 2017 , 1 , 1 , 'B' , 2 from dual union all
select 2017 , 1 , 1 , 'C' , 3 from dual union all
select 2017 , 1 , 2 , 'A' , 4 from dual union all
select 2017 , 1 , 2 , 'B' , 5 from dual union all
select 2017 , 1 , 2 , 'C' , 6 from dual union all
select 2017 , 1 , 3 , 'A' , 7 from dual union all
select 2017 , 1 , 3 , 'B' , 8 from dual union all
select 2017 , 1 , 3 , 'C' , 9 from dual union all
select 2017 , 1 , 4 , 'A' , 10 from dual union all
select 2017 , 1 , 4 , 'B' , 11 from dual union all
select 2017 , 1 , 4 , 'C' , 12 from dual union all
select 2017 , 2 , 5 , 'A' , 1 from dual union all
select 2017 , 2 , 5 , 'B' , 2 from dual union all
select 2017 , 2 , 5 , 'C' , 3 from dual union all
select 2017 , 2 , 6 , 'A' , 4 from dual union all
select 2017 , 2 , 6 , 'B' , 5 from dual union all
select 2017 , 2 , 6 , 'C' , 6 from dual union all
select 2017 , 2 , 7 , 'A' , 7 from dual union all
select 2017 , 2 , 8 , 'A' , 10 from dual union all
select 2017 , 2 , 8 , 'B' , 11 from dual union all
select 2017 , 2 , 7 , 'B' , 8 from dual union all
select 2017 , 2 , 7 , 'C' , 9 from dual union all
select 2018 , 2 , 7 , 'C' , 9 from dual union all
select 2017 , 2 , 8 , 'C' , 12 from dual
-- end test data
)
select * from
(
select
-- data.*: all columns of the data table
data.*,
-- avrg: partition by a combination of year,month and category to work out -
-- the avg for each category in a month of a year
avg(val) over (partition by yr, mnth, cat) avrg,
-- mwk: partition by year and month to work out -
-- the max week of a month in a year
max(wk) over (partition by yr, mnth) mwk
from
data
)
-- as OP's interest is in the max week of each month of a year, -
-- "wk" column value is matched against
-- the derived column "mwk"
where wk = mwk
order by yr,mnth,cat;
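Note that the query above uses Oracle's dual just to build the test rows; a sketch of the same max-week-per-month approach in plain PostgreSQL, assuming the view is named view1 as in the earlier answers:
select year, month, category, avg_val, max_val_of_month
from (
  select year, month, category,
         value as max_val_of_month,
         avg(value) over (partition by year, month, category) as avg_val,
         max(week) over (partition by year, month) as mwk,
         week
  from view1
) t
where week = mwk
order by year, month, category;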

Querying for an ID that has the most number of reads

Suppose I have a table like the one below:
+----+-----------+
| ID | TIME |
+----+-----------+
| 1 | 12-MAR-15 |
| 2 | 23-APR-14 |
| 2 | 01-DEC-14 |
| 1 | 01-DEC-15 |
| 3 | 05-NOV-15 |
+----+-----------+
What I want to do is, for each year (extracted from the DATE value), list the ID that has the highest count in that year. So for example, ID 1 occurs the most in 2015, ID 2 occurs the most in 2014, etc.
What I have for a query is:
SELECT EXTRACT(year from time) "YEAR", COUNT(ID) "ID"
FROM table
GROUP BY EXTRACT(year from time)
ORDER BY COUNT(ID) DESC;
But this query just counts how many rows fall in each year. How do I fix it to get the ID with the highest count in each year?
Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 2 |
| 2012 | 2 |
+------+----+
Expected Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 1 |
| 2014 | 2 |
+------+----+
Starting with your sample query, the first change is simply to group by the ID as well as by the year.
SELECT EXTRACT(year from time) "YEAR" , id, COUNT(*) "TOTAL"
FROM table
GROUP BY EXTRACT(year from time), id
ORDER BY EXTRACT(year from time) DESC, COUNT(*) DESC
With that, you could find the rows you want by visual inspection (the first row for each year is the ID with the most rows).
To have the query just return the rows with the highest totals, there are several different ways to do it. You need to consider what you want to do if there are ties - do you want to see all IDs tied for highest in a year, or just an arbitrary one?
Here is one approach - if there is a tie, this should return just the lowest of the tied IDs:
WITH groups AS (
SELECT EXTRACT(year from time) "YEAR" , id, COUNT(*) "TOTAL"
FROM table
GROUP BY EXTRACT(year from time), id
)
SELECT year, MIN(id) KEEP (DENSE_RANK FIRST ORDER BY total DESC)
FROM groups
GROUP BY year
ORDER BY year DESC
You need to count per id and then apply a RANK on that count:
SELECT *
FROM
(
SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt
, RANK() OVER (PARTITION BY EXTRACT(year from time) ORDER BY COUNT(*) DESC) AS rnk
FROM table
GROUP BY EXTRACT(year from time), ID
) dt
WHERE rnk = 1
If this returns multiple rows with the same highest count per year and you want just one of them arbitrarily, you can switch to ROW_NUMBER.
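For example, only the window function changes (a sketch of that substitution):
SELECT *
FROM
(
SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt
, ROW_NUMBER() OVER (PARTITION BY EXTRACT(year from time) ORDER BY COUNT(*) DESC) AS rnk
FROM table
GROUP BY EXTRACT(year from time), ID
) dt
WHERE rnk = 1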
This should do what you're after, I think:
with sample_data as (select 1 id, to_date('12/03/2015', 'dd/mm/yyyy') time from dual union all
select 2 id, to_date('23/04/2014', 'dd/mm/yyyy') time from dual union all
select 2 id, to_date('01/12/2014', 'dd/mm/yyyy') time from dual union all
select 1 id, to_date('01/12/2015', 'dd/mm/yyyy') time from dual union all
select 3 id, to_date('05/11/2015', 'dd/mm/yyyy') time from dual)
-- End of creating a subquery to mimic a table called "sample_data" containing your input data.
-- See SQL below:
select yr,
id most_frequent_id,
cnt_id_yr cnt_of_most_freq_id
from (select to_char(time, 'yyyy') yr,
id,
count(*) cnt_id_yr,
dense_rank() over (partition by to_char(time, 'yyyy') order by count(*) desc) dr
from sample_data
group by to_char(time, 'yyyy'),
id)
where dr = 1;
YR MOST_FREQUENT_ID CNT_OF_MOST_FREQ_ID
---- ---------------- -------------------
2014 2 2
2015 1 2