Count records and group by different field while showing max value - sql

id
year
1
2014
10
2015
10
2019
102
2015
102
2019
104
2015
104
2017
104
2019
104
2021
The output I want in postgres is below. The max year is populated based on the id and the count should also count based on id. If id = 10 then it should show the max date within id 10 and also count how many records have the id as 10.
id
year
max year
count
1
2014
2014
1
10
2015
2017
2
10
2017
2017
2
102
2015
2019
2
102
2019
2019
2
104
2015
2021
4
104
2017
2021
4
104
2019
2021
4
104
2021
2021
4
SELECT aa.id,
aa.year,
aa.max_year,
count(aa.id)
from (SELECT id,MAX(year) AS year FROM table
GROUP BY id) aa
FULL JOIN table2 b ON aa.id = b.id

You can use window functions:
select id,
year,
max(year) over (partition by id) as max_year,
count(*) over (partition by id)
from the_table

Related

sql - How To Remove All Rows After 4th Occurence of Column Combination in postgresql

I have a sql query that results in a table similar to the following after grouping by name, quarter, year and ordering by year DESC, quarter DESC:
name
count
quarter
year
orange
22
4
2022
apple
1
4
2022
banana
123
3
2022
pie
93
2
2022
apple
12
2
2022
orange
0
1
2022
apple
900
4
2021
...
...
...
...
I want to remove any rows that come after the 4th unique combination of quarter and year is reached (for the table above this would be any rows after the last combination of quarter 1, year 2022), like so:
name
count
quarter
year
orange
22
4
2022
apple
1
4
2022
banana
123
3
2022
pie
93
2
2022
apple
12
2
2022
orange
0
1
2022
I am using Postgres 6.10.
If the next year were reached, it would still need to work with the quarter at the top being 1 and the year 2023.
select name
,count
,quarter
,year
from
(
select *
,dense_rank() over(order by year desc, quarter desc) as dns_rnk
from t
) t
where dns_rnk <= 4
name
count
quarter
year
orange
22
4
2022
apple
1
4
2022
banana
123
3
2022
pie
93
2
2022
apple
12
2
2022
orange
0
1
2022
Fiddle

Calculate running sum of previous 3 months from monthly aggregated data

I have a dataset that I have aggregated at monthly level. The next part needs me to take, for every block of 3 months, the sum of the data at monthly level.
So essentially my input data (after aggregated to monthly level) looks like:
month
year
status
count_id
08
2021
stat_1
1
09
2021
stat_1
3
10
2021
stat_1
5
11
2021
stat_1
10
12
2021
stat_1
10
01
2022
stat_1
5
02
2022
stat_1
20
and then my output data to look like:
month
year
status
count_id
3m_sum
08
2021
stat_1
1
1
09
2021
stat_1
3
4
10
2021
stat_1
5
8
11
2021
stat_1
10
18
12
2021
stat_1
10
25
01
2022
stat_1
5
25
02
2022
stat_1
20
35
i.e 3m_sum for Feb = Feb + Jan + Dec. I tried to do this using a self join and wrote a query along the lines of
WITH CTE AS(
SELECT date_part('month',date_col) as month
,date_part('year',date_col) as year
,status
,count(distinct id) as count_id
FROM (date_col, status, transaction_id) as a
)
SELECT a.month, a.year, a.status, sum(b.count_id) as 3m_sum
from cte as a
left join cte as b on a.status = b.status
and b.month >= a.month - 2 and b.month <= a.month
group by 1,2,3
This query NEARLY works. Where it falls apart is in Jan and Feb. My data is from August 2021 to Apr 2022. The means, the value for Jan should be Nov + Dec + Jan. Similarly for Feb it should be Dec + Jan + Feb.
As I am doing a join on the MONTH, all the months of Aug - Nov are treated as being values > month of jan/feb and so the query isn't doing the correct sum.
How can I adjust this bit to give the correct sum?
I did think of using a LAG function, but (even though I'm 99% sure a month won't ever be missed), I can't guarantee we will never have a month with 0 values, and therefore my LAG function will be summing the wrong rows.
I also tried doing the same join, but at individual date level (and not aggregating in my nested query) but this gave vastly different numbers, as I want the sum of the aggregation and I think the sum from the individual row was duplicated a lot of stuff I do a COUNT DISTINCT on to remove.
You can use a SUM with a window frame of 2 PRECEDING. To ensure you don't miss rows, use a calendar table and left-join all the results to it.
SELECT *,
SUM(a.count_id) OVER (ORDER BY c.year, c.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Calendar c
LEFT JOIN a ON a.year = c.year AND a.month = c.month
WHERE c.year >= 2021 AND c.year <= 2022;
db<>fiddle
You could also use LAG but you would need it twice.
It should be #Charlieface's answer - only that I get one different result than you put in your expected result table:
WITH
-- your input - and I avoid keywords like "MONTH" or "YEAR"
-- and also identifiers starting with digits are forbidden -
indata(mm,yy,status,count_id,sum_3m) AS (
SELECT 08,2021,'stat_1',1,1
UNION ALL SELECT 09,2021,'stat_1',3,4
UNION ALL SELECT 10,2021,'stat_1',5,8
UNION ALL SELECT 11,2021,'stat_1',10,18
UNION ALL SELECT 12,2021,'stat_1',10,25
UNION ALL SELECT 01,2022,'stat_1',5,25
UNION ALL SELECT 02,2022,'stat_1',20,35
)
SELECT
*
, SUM(count_id) OVER(
ORDER BY yy,mm
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_3m_calc
FROM indata;
-- out mm | yy | status | count_id | sum_3m | sum_3m_calc
-- out ----+------+--------+----------+--------+-------------
-- out 8 | 2021 | stat_1 | 1 | 1 | 1
-- out 9 | 2021 | stat_1 | 3 | 4 | 4
-- out 10 | 2021 | stat_1 | 5 | 8 | 9
-- out 11 | 2021 | stat_1 | 10 | 18 | 18
-- out 12 | 2021 | stat_1 | 10 | 25 | 25
-- out 1 | 2022 | stat_1 | 5 | 25 | 25
-- out 2 | 2022 | stat_1 | 20 | 35 | 35

SQL query to Find highest value in table and sum the corresponding value

I would like to group Highest values in month column group by year and Sum the value column
value
Year
Month
4
2019
10
1
2019
11
5
2019
11
1
2019
11
1
2019
12
8
2019
12
1
2019
12
1
2020
1
10
2020
1
3
2021
1
2
2021
2
11
2021
2
1
2021
2
3
2021
2
2
2021
3
In above table I would like to extract highest value of month group by year
in year 2019 highest month is 12 so there are 3 rows and sum of value column will be 10
The output should be
value
Year
Month
10
2019
12
11
2020
1
2
2021
3
supposing that the table is called "example_table" you can use the following query:
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
select year, max(month) "month"
from example_table
group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year

Add column value to next column in SQL

My sql table is
Week Year Applications
1 2017 0
2 2017 10
3 2017 20
4 2017 50
5 2017 0
1 2018 10
2 2018 0
3 2018 40
4 2018 50
5 2018 10
And I want SQL query which give below output
Week Year Applications
1 2017 0
2 2017 10
3 2017 30
4 2017 80
5 2017 80
1 2018 10
2 2018 10
3 2018 50
4 2018 100
5 2018 110
Can anyone help me to write below query?
You could use SUM() OVER to get cumulative sum:
SELECT *, SUM(Applications) OVER(PARTITION BY Year ORDER BY Week)
FROM tab
It looks like you want a cumulative sum:
select week, year,
sum(applications) over (partition by year order by week) as cumulative_applications
from t;

SQL Query Return 0 on weeks in between

i have this query that works , but the result is not like i want
returns only year and weeks that has data , i want to return 0 to the result
for example this returns
year week totalstop
2017 50 7
2018 1 3
2018 3 5
but i want to return
year week totalstop
2017 50 7
2017 51 0
2017 52 0
2018 1 3
2018 2 0
2018 3 5
and so on
here is the current query
SELECT year(Stopdate)[year],datepart(week,date1) [week],sum(stop) totalstop
from Table1 where
building in (select item from dbo.fn_Split('A1,A2,A3,A4,A5',','))
and
date1 between '2017-12-12' and '2018-05-08'
and grp = 1
group by year(date1),datepart(week,date1)
order by year(date1),[week]
iam using ms sql-server 2016
need help to modify it to my needs as iam out of ideas atm.