How to filter data based on top values from a specific year in SQL? - sql

Let's assume my data looks like this:
year person cash
0 2020 personone 29
1 2021 personone 40
2 2020 persontwo 17
3 2021 persontwo 13
4 2020 personthree 62
5 2021 personthree 55
What I want to do is the following. I'd like to get the top 2 people comparing their cash based on year 2021. We can see that in 2021 personone and personthree are the top 2 people, then it can be ordered by cash in 2021. So the output I'm after is:
year person cash
0 2020 personthree 62
1 2021 personthree 55
2 2020 personone 29
3 2021 personone 40
I've been trying an approach similar to the one described here, without much luck.

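We can use DENSE_RANK here. Rank the 2021 rows by cash across all people (no PARTITION BY, otherwise every person gets rank 1), keep the top 2, and join back to pick up both years:
WITH cte AS (
SELECT person, cash, DENSE_RANK() OVER (ORDER BY cash DESC) dr
FROM yourTable
WHERE year = 2021
)
SELECT t.*
FROM yourTable t
INNER JOIN cte c ON c.person = t.person
WHERE c.dr <= 2
ORDER BY c.cash DESC, t.year;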
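As a minimal runnable sketch of the ranking approach, the snippet below loads the sample rows from the question into an in-memory SQLite database (an assumption made only so the query can be executed; window functions need SQLite 3.25 or newer) and ranks the 2021 rows across all people before joining back to fetch both years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (year INTEGER, person TEXT, cash INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (2020, "personone", 29), (2021, "personone", 40),
        (2020, "persontwo", 17), (2021, "persontwo", 13),
        (2020, "personthree", 62), (2021, "personthree", 55),
    ],
)

query = """
WITH cte AS (
    -- rank everyone against each other using only the 2021 rows
    SELECT person, cash, DENSE_RANK() OVER (ORDER BY cash DESC) AS dr
    FROM yourTable
    WHERE year = 2021
)
SELECT t.year, t.person, t.cash
FROM yourTable t
INNER JOIN cte c ON c.person = t.person
WHERE c.dr <= 2                 -- top 2 people by 2021 cash
ORDER BY c.cash DESC, t.year    -- best 2021 earner first, then by year
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

With the sample data this prints the two personthree rows followed by the two personone rows, which is the ordering the question asks for.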

Related

sql query to find student having average of 90 from last 2 years?

I was trying this on a test table:
create table years (
yr bigint,
average decimal(10,2),
rollno bigint
)
I created this table to store 2 years of data, e.g. 2020 and 2021, with the average of the marks scored in the average column.
The condition is to find only those students whose average is above 54 in the last 2 years.
The data is as follows:
year average rollno
2021 55.20 1
2020 55.50 1
2020 54.50 2
2020 55.50 3
2021 55.40 3
select rollno
from years
where average > 54
and yr = (YEAR(GETDATE())-1)
and yr = (YEAR(GETDATE())-2)
I tried this query, but it returns nothing when I want to find the rows where the condition is true.
If I write the query like this:
select rollno
from years
where average > 54
and yr = (YEAR(GETDATE())-1) or yr = (YEAR(GETDATE())-2)
it runs but doesn't give me the desired result.
The result I want is as follows:
year average rollno
2020 55.50 1
2021 55.20 1
2020 55.50 3
2021 55.40 3
but I am getting roll no 2 in the output.
After reviewing your query, it seems some parentheses are missing in your expression.
Here is an example based on your query:
SELECT * FROM years WHERE
average > 55
and --below the bracket P1
(--Open P1
(--Open P2
yr = year(getdate())-1
)--Close P2
or
(--Open P3
yr = year(getdate())-2
)--Close P3
)--Close P1
In the WHERE clause this means that average > 55 is mandatory, AND everything inside bracket P1 (either of the two years) must hold as a second condition.
The result:
year average rollno
2020 55.50 1
2021 55.20 1
2020 55.50 3
2021 55.40 3
Best Regards
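Note that with the question's threshold of 54, parentheses alone would still let roll no 2 through, because its 2020 row (54.50) satisfies both conditions. One way to express "above the threshold in both of the last 2 years" is a GROUP BY with HAVING, sketched below against an in-memory SQLite database. The `current_year` value is an assumption standing in for `YEAR(GETDATE())`, fixed at 2022 so that the last 2 years are 2021 and 2020.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE years (yr INTEGER, average REAL, rollno INTEGER)")
conn.executemany(
    "INSERT INTO years VALUES (?, ?, ?)",
    [(2021, 55.20, 1), (2020, 55.50, 1), (2020, 54.50, 2),
     (2020, 55.50, 3), (2021, 55.40, 3)],
)

current_year = 2022  # assumption: stands in for YEAR(GETDATE())

query = """
SELECT y.yr, y.average, y.rollno
FROM years y
WHERE y.yr IN (:cy - 1, :cy - 2)
  AND y.rollno IN (
      -- students with a qualifying row in BOTH of the last two years
      SELECT rollno
      FROM years
      WHERE average > 54
        AND yr IN (:cy - 1, :cy - 2)
      GROUP BY rollno
      HAVING COUNT(DISTINCT yr) = 2
  )
ORDER BY y.rollno, y.yr
"""
rows = conn.execute(query, {"cy": current_year}).fetchall()
for row in rows:
    print(row)
```

Roll no 2 drops out here because it has only one qualifying year, not because the threshold was raised.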

sql - How To Remove All Rows After 4th Occurence of Column Combination in postgresql

I have a sql query that results in a table similar to the following after grouping by name, quarter, year and ordering by year DESC, quarter DESC:
name count quarter year
orange 22 4 2022
apple 1 4 2022
banana 123 3 2022
pie 93 2 2022
apple 12 2 2022
orange 0 1 2022
apple 900 4 2021
... ... ... ...
I want to remove any rows that come after the 4th unique combination of quarter and year is reached (for the table above this would be any rows after the last combination of quarter 1, year 2022), like so:
name count quarter year
orange 22 4 2022
apple 1 4 2022
banana 123 3 2022
pie 93 2 2022
apple 12 2 2022
orange 0 1 2022
I am using Postgres 6.10.
If the next year were reached, it would still need to work with the quarter at the top being 1 and the year 2023.
select name
,count
,quarter
,year
from
(
select *
,dense_rank() over(order by year desc, quarter desc) as dns_rnk
from t
) t
where dns_rnk <= 4
name count quarter year
orange 22 4 2022
apple 1 4 2022
banana 123 3 2022
pie 93 2 2022
apple 12 2 2022
orange 0 1 2022
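The dense_rank idea can be tried end to end with the sketch below. It uses an in-memory SQLite database rather than Postgres (an assumption made so the example is self-contained; the query itself uses only standard window syntax) and the sample rows shown in the question. Rows that share a quarter and year get the same rank, so filtering on rank <= 4 keeps the first four distinct combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (name TEXT, "count" INTEGER, quarter INTEGER, year INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("orange", 22, 4, 2022), ("apple", 1, 4, 2022),
        ("banana", 123, 3, 2022), ("pie", 93, 2, 2022),
        ("apple", 12, 2, 2022), ("orange", 0, 1, 2022),
        ("apple", 900, 4, 2021),
    ],
)

query = """
SELECT name, "count", quarter, year
FROM (
    SELECT *,
           -- identical (year, quarter) pairs share one rank, with no gaps
           DENSE_RANK() OVER (ORDER BY year DESC, quarter DESC) AS dns_rnk
    FROM t
) AS ranked
WHERE dns_rnk <= 4
ORDER BY year DESC, quarter DESC
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

Because the ranking is driven by the data rather than by fixed values, the same query keeps working when the newest combination becomes quarter 1 of 2023.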

How to get most recent balance for every user and its corresponding dates

I have a table called balances. I want to get the most recent balance for each user, for every financial year, and the corresponding date it was updated.
name balance financial_year date_updated
Bob 20 2021 2021-04-03
Bob 58 2019 2019-11-13
Bob 43 2019 2022-01-24
Bob -4 2019 2019-12-04
James 92 2021 2021-09-11
James 86 2021 2021-08-18
James 33 2019 2019-03-24
James 46 2019 2019-02-12
James 59 2019 2019-08-12
So my desired output would be:
name balance financial_year date_updated
Bob 20 2021 2021-04-03
Bob 43 2019 2022-01-24
James 92 2021 2021-09-11
James 59 2019 2019-08-12
I've attempted this but found that using max() sometimes does not work since I use it across multiple columns
SELECT name, max(balance), financial_year, max(date_updated)
FROM balances
group by name, financial_year
select NAME
,BALANCE
,FINANCIAL_YEAR
,DATE_UPDATED
from (
select t.*
,row_number() over(partition by name, financial_year order by date_updated desc) as rn
from t
) t
where rn = 1
NAME BALANCE FINANCIAL_YEAR DATE_UPDATED
Bob 43 2019 24-JAN-22
Bob 20 2021 03-APR-21
James 59 2019 12-AUG-19
James 92 2021 11-SEP-21
The problem is not that you use max() across multiple columns, but that max() returns the maximum of each column independently, so the two values can come from different rows. In your example, Bob's highest balance in financial year 2019 was 58. The 'highest' (latest) date_updated was 2022-01-24, but at that time the balance was 43.
What you're looking for is the balance at the time it was last updated within a financial year per user, that is, something like:
SELECT b.name, b.financial_year, b.balance, b.date_updated
FROM balances b
INNER JOIN (SELECT name, financial_year, max(date_updated) last_updated
FROM balances GROUP BY name, financial_year) u
ON b.name = u.name AND b.financial_year = u.financial_year AND b.date_updated = u.last_updated;
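To make the difference concrete, here is a small sketch that runs both the per-column max() query and a row_number() query against the sample data, using an in-memory SQLite database (an assumption made for the sake of a self-contained example). Dates are stored as ISO strings, which sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (name TEXT, balance INTEGER, "
    "financial_year INTEGER, date_updated TEXT)"
)
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?, ?)",
    [
        ("Bob", 20, 2021, "2021-04-03"), ("Bob", 58, 2019, "2019-11-13"),
        ("Bob", 43, 2019, "2022-01-24"), ("Bob", -4, 2019, "2019-12-04"),
        ("James", 92, 2021, "2021-09-11"), ("James", 86, 2021, "2021-08-18"),
        ("James", 33, 2019, "2019-03-24"), ("James", 46, 2019, "2019-02-12"),
        ("James", 59, 2019, "2019-08-12"),
    ],
)

# Each MAX() is evaluated on its own, so balance and date may mix rows.
per_column_max = conn.execute("""
    SELECT name, financial_year, MAX(balance), MAX(date_updated)
    FROM balances
    GROUP BY name, financial_year
    ORDER BY name, financial_year
""").fetchall()

# ROW_NUMBER() picks one whole row per group: the most recently updated one.
latest = conn.execute("""
    SELECT name, financial_year, balance, date_updated
    FROM (
        SELECT b.*,
               ROW_NUMBER() OVER (PARTITION BY name, financial_year
                                  ORDER BY date_updated DESC) AS rn
        FROM balances b
    ) AS ranked
    WHERE rn = 1
    ORDER BY name, financial_year
""").fetchall()

print(per_column_max[0])  # Bob, 2019: balance 58 paired with the 2022-01-24 date
print(latest[0])          # Bob, 2019: balance 43, the value on 2022-01-24
```

For the other three groups the two queries happen to agree, which is why the max() version only "sometimes" looks wrong.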

Calculate running sum of previous 3 months from monthly aggregated data

I have a dataset that I have aggregated at monthly level. The next part needs me to take, for every block of 3 months, the sum of the data at monthly level.
So essentially my input data (after aggregated to monthly level) looks like:
month year status count_id
08 2021 stat_1 1
09 2021 stat_1 3
10 2021 stat_1 5
11 2021 stat_1 10
12 2021 stat_1 10
01 2022 stat_1 5
02 2022 stat_1 20
and then my output data to look like:
month year status count_id 3m_sum
08 2021 stat_1 1 1
09 2021 stat_1 3 4
10 2021 stat_1 5 8
11 2021 stat_1 10 18
12 2021 stat_1 10 25
01 2022 stat_1 5 25
02 2022 stat_1 20 35
i.e 3m_sum for Feb = Feb + Jan + Dec. I tried to do this using a self join and wrote a query along the lines of
WITH CTE AS(
SELECT date_part('month',date_col) as month
,date_part('year',date_col) as year
,status
,count(distinct id) as count_id
FROM (date_col, status, transaction_id) as a
)
SELECT a.month, a.year, a.status, sum(b.count_id) as 3m_sum
from cte as a
left join cte as b on a.status = b.status
and b.month >= a.month - 2 and b.month <= a.month
group by 1,2,3
This query NEARLY works. Where it falls apart is in Jan and Feb. My data is from August 2021 to Apr 2022. This means the value for Jan should be Nov + Dec + Jan. Similarly, for Feb it should be Dec + Jan + Feb.
As I am doing a join on the MONTH, all the months of Aug - Nov are treated as being values > month of jan/feb and so the query isn't doing the correct sum.
How can I adjust this bit to give the correct sum?
I did think of using a LAG function, but (even though I'm 99% sure a month won't ever be missed), I can't guarantee we will never have a month with 0 values, and therefore my LAG function will be summing the wrong rows.
I also tried doing the same join at individual date level (without aggregating in my nested query), but this gave vastly different numbers. I want the sum of the aggregation, and I think summing individual rows duplicated a lot of the records that my COUNT DISTINCT is there to remove.
You can use a SUM with a window frame of 2 PRECEDING. To ensure you don't miss rows, use a calendar table and left-join all the results to it.
SELECT *,
SUM(a.count_id) OVER (ORDER BY c.year, c.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Calendar c
LEFT JOIN a ON a.year = c.year AND a.month = c.month
WHERE c.year >= 2021 AND c.year <= 2022;
You could also use LAG but you would need it twice.
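A runnable sketch of the calendar-table idea follows, using an in-memory SQLite database. The table names `monthly` and `calendar` are assumptions for illustration, and the calendar deliberately extends one month past the sample rows (March 2022) to show how a month without data is handled: the LEFT JOIN still produces a row, and COALESCE turns the missing count into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, year INTEGER, status TEXT, count_id INTEGER)")
conn.execute("CREATE TABLE calendar (year INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?, ?)",
    [
        (8, 2021, "stat_1", 1), (9, 2021, "stat_1", 3), (10, 2021, "stat_1", 5),
        (11, 2021, "stat_1", 10), (12, 2021, "stat_1", 10),
        (1, 2022, "stat_1", 5), (2, 2022, "stat_1", 20),
    ],
)
# Calendar covers Aug 2021 through Mar 2022; March has no row in monthly.
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?)",
    [(2021, m) for m in range(8, 13)] + [(2022, m) for m in range(1, 4)],
)

query = """
SELECT c.year, c.month,
       COALESCE(a.count_id, 0) AS count_id,
       -- current row plus the two calendar months before it
       SUM(COALESCE(a.count_id, 0)) OVER (
           ORDER BY c.year, c.month
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS sum_3m
FROM calendar c
LEFT JOIN monthly a ON a.year = c.year AND a.month = c.month
ORDER BY c.year, c.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The sample has a single status. With several statuses, the window would likely need a PARTITION BY on status, and the calendar would need one row per status and month.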
It should be @Charlieface's answer, except that I get one result that differs from your expected result table (9 rather than 8 for October 2021, since 1 + 3 + 5 = 9):
WITH
-- your input - and I avoid keywords like "MONTH" or "YEAR"
-- and also identifiers starting with digits are forbidden -
indata(mm,yy,status,count_id,sum_3m) AS (
SELECT 08,2021,'stat_1',1,1
UNION ALL SELECT 09,2021,'stat_1',3,4
UNION ALL SELECT 10,2021,'stat_1',5,8
UNION ALL SELECT 11,2021,'stat_1',10,18
UNION ALL SELECT 12,2021,'stat_1',10,25
UNION ALL SELECT 01,2022,'stat_1',5,25
UNION ALL SELECT 02,2022,'stat_1',20,35
)
SELECT
*
, SUM(count_id) OVER(
ORDER BY yy,mm
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_3m_calc
FROM indata;
-- out mm | yy | status | count_id | sum_3m | sum_3m_calc
-- out ----+------+--------+----------+--------+-------------
-- out 8 | 2021 | stat_1 | 1 | 1 | 1
-- out 9 | 2021 | stat_1 | 3 | 4 | 4
-- out 10 | 2021 | stat_1 | 5 | 8 | 9
-- out 11 | 2021 | stat_1 | 10 | 18 | 18
-- out 12 | 2021 | stat_1 | 10 | 25 | 25
-- out 1 | 2022 | stat_1 | 5 | 25 | 25
-- out 2 | 2022 | stat_1 | 20 | 35 | 35
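If the self-join from the question is preferred over a window function, one possible adjustment is to join on a running month index (year * 12 + month) instead of the bare month number, so that January 2022 sits directly after December 2021. The sketch below is a hedged illustration of that idea on an in-memory SQLite database; the table name `monthly` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, year INTEGER, status TEXT, count_id INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?, ?)",
    [
        (8, 2021, "stat_1", 1), (9, 2021, "stat_1", 3), (10, 2021, "stat_1", 5),
        (11, 2021, "stat_1", 10), (12, 2021, "stat_1", 10),
        (1, 2022, "stat_1", 5), (2, 2022, "stat_1", 20),
    ],
)

query = """
SELECT a.year, a.month, a.status, SUM(b.count_id) AS sum_3m
FROM monthly a
LEFT JOIN monthly b
       ON b.status = a.status
      -- year * 12 + month increases by exactly 1 per calendar month,
      -- so the range also works across the year boundary
      AND (b.year * 12 + b.month)
          BETWEEN (a.year * 12 + a.month) - 2 AND (a.year * 12 + a.month)
GROUP BY a.year, a.month, a.status
ORDER BY a.year, a.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the join condition is a range on calendar position rather than a fixed number of rows, a skipped month simply contributes nothing to the sum instead of shifting the window.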

Determine the first occurrence of a particular customer visiting the store in a particular month

I need to determine the count, broken down per month (and year), of customers [aliased as Patient_ID] who made their first visit to a store. The date-times of store visits are stored in the [MDT Review Date] column of the table.
Customers can come to the store multiple times throughout the year and increase the total count, but what I require is ONLY the first time a customer visited.
E.g. Tom Bombadil visited the store once in January 2019, so the count increased to 1, then again 4 times in March, so the count should be 1 for the month of March, 0 for February and 1 for January, then again 4 times in October, then again 2 times in December.
I require that Tom Bombadil be counted once and only once for a particular month: his first occurrence in that month.
The output should look like:
rn1 YEAR Month_Number Month Total_Count
1 2010 6 June 2
1 2010 7 July 1
1 2010 8 August 5
1 2010 10 October 5
1 2010 11 November 3
1 2011 1 January 4
1 2011 2 February 6
1 2011 4 April 7
1 2011 5 May 4
1 2011 6 June 10
1 2011 7 July 10
1 2011 8 August 14
1 2011 9 September 4
1 2011 10 October 8
1 2011 11 November 11
1 2011 12 December 11
1 2012 1 January 8
1 2012 2 February 21
Please refer to my query below. It attempts to use the windowing function COUNT to count the store visits per month, and the ROW_NUMBER function to assign a unique number to each visit. What am I doing wrong?
select
*
from
(select distinct
row_number() over (partition by p.Patient_ID, p.PAT_Forename1, p.PAT_Surname
order by PAT_Forename1, p.Patient_ID, PAT_Surname) AS rn1,
datepart(year, [DATE_COLUMN]) as YEAR,
datepart(month, [DATE_COLUMN]) as Month_Number,
datename(month,[DATE_COLUMN]) as Month,
count(p.Patient_ID) over (partition by datepart(year,[DATE_COLUMN]),
datename(month, [DATE_COLUMN])) as Total_Count
from
Tablename m
inner join
TableName p on m.PK_ID = p.PK_ID
) as temp
where
rn1 = 1
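One likely issue with the query above is that ROW_NUMBER is partitioned by patient only, so `rn1 = 1` keeps a single arbitrary row per patient overall, while the COUNT window still counts every visit in the month. The sketch below shows two readings of the requirement on an in-memory SQLite database. Everything in it is an assumption for illustration: the `visits` table, its columns, and the sample rows are hypothetical, and SQLite's `strftime` stands in for SQL Server's `DATEPART`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, one row per store visit.
conn.execute("CREATE TABLE visits (patient_id TEXT, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("tom", "2019-01-15"), ("tom", "2019-03-02"), ("tom", "2019-03-20"),
        ("ann", "2019-03-05"), ("ann", "2019-03-28"), ("bob", "2019-01-10"),
    ],
)

# Reading 1: each customer counts at most once per month they visited.
once_per_month = conn.execute("""
    SELECT CAST(strftime('%Y', visit_date) AS INTEGER) AS yr,
           CAST(strftime('%m', visit_date) AS INTEGER) AS mon,
           COUNT(DISTINCT patient_id) AS total_count
    FROM visits
    GROUP BY yr, mon
    ORDER BY yr, mon
""").fetchall()

# Reading 2: each customer counts only in the month of their first-ever visit.
first_ever = conn.execute("""
    SELECT CAST(strftime('%Y', first_visit) AS INTEGER) AS yr,
           CAST(strftime('%m', first_visit) AS INTEGER) AS mon,
           COUNT(*) AS total_count
    FROM (
        SELECT patient_id, MIN(visit_date) AS first_visit
        FROM visits
        GROUP BY patient_id
    ) AS firsts
    GROUP BY yr, mon
    ORDER BY yr, mon
""").fetchall()

print(once_per_month)
print(first_ever)
```

The Tom Bombadil example in the question (counted once in January and once in March) matches the first reading, where a plain GROUP BY with COUNT(DISTINCT ...) is enough and no ROW_NUMBER is needed.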