Presto SQL - Rank Multiple Conditions for Multiple Columns - sql

I am trying to write a single query (if possible) to rank ids based on multiple conditions.
My table is like this:
id group subgroup value
1 A Q 12
2 A Z 10
3 B Z 14
4 A Z 20
5 B W 20
I tried this query:
SELECT id,
CASE WHEN group = 'A' THEN ROW_NUMBER() OVER (PARTITION BY group ORDER BY SUM(value) DESC) AS rank_group
CASE WHEN group = 'A' AND subgroup = 'Z' THEN ROW_NUMBER() OVER (PARTITION BY group, subgroup ORDER BY SUM(value) DESC) AS rank_subgroup
FROM table
GROUP BY group, subgroup
But ended up with something like this:
id rank_group rank_subgroup
1 1 1
1 2 2
I would like to get each distinct id and return its rank based on the conditions of the CASE expressions, but adding the needed partition seems to multiply the rows, since the GROUP BY is necessary. I could write individual queries for each column, but I'd like to avoid that if possible.

Do you want something like this?
-- "group" is a reserved word, so it is quoted
select t.*,
dense_rank() over (order by sumg, "group") as rank_group,
dense_rank() over (partition by "group" order by sumsg, subgroup) as rank_subgroup
from (select t.*,
sum(value) over (partition by "group") as sumg,
sum(value) over (partition by "group", subgroup) as sumsg
from t
) t;
This is my best guess at interpreting what you might want.
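To see what this query returns on the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for Presto (an assumption for illustration), and the aliases rank_group and rank_subgroup are borrowed from the question.

```python
import sqlite3

# In-memory SQLite stands in for Presto here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, "group" TEXT, subgroup TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", "Q", 12), (2, "A", "Z", 10), (3, "B", "Z", 14),
     (4, "A", "Z", 20), (5, "B", "W", 20)],
)

query = '''
SELECT id,
       dense_rank() OVER (ORDER BY sumg, "group") AS rank_group,
       dense_rank() OVER (PARTITION BY "group" ORDER BY sumsg, subgroup) AS rank_subgroup
FROM (SELECT t.*,
             sum(value) OVER (PARTITION BY "group") AS sumg,
             sum(value) OVER (PARTITION BY "group", subgroup) AS sumsg
      FROM t) t
ORDER BY id
'''
# One row per id: (id, rank of its group, rank of its subgroup within the group)
rank_rows = conn.execute(query).fetchall()
for row in rank_rows:
    print(row)
```

Because the sums are computed with window functions rather than GROUP BY, every id keeps its own row. Both ranks are ascending by sum here; add DESC to the ORDER BY clauses if the largest total should rank first.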

Related

How do I create a new SQL table with custom column names and populate these columns

I currently have an SQL statement that returns the most frequently occurring value and the least frequently occurring value in a table. However, the result has 2 rows, each with a name and its frequency. I need a result with 2 columns, min and max, and a single row holding one value for each. The values for these columns need to be in the same row.
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency DESC limit 1)
UNION
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency ASC limit 1);
So for the above query I would need the names of the min and max values in one row. I need to be able to define the name of new columns for the generated SQL query as well.
Min_Name | Max_Name
Certif_1 | Certif_2
I think this query should give you the results you want. It ranks each name according to the number of times it appears in the table, then uses conditional aggregation to select the min and max frequency names in one row:
with cte as (
select name,
row_number() over (order by count(*) desc) as maxr,
row_number() over (order by count(*)) as minr
from firefighter_certifications
group by name
)
select max(case when minr = 1 then name end) as Min_Name,
max(case when maxr = 1 then name end) as Max_Name
from cte
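As a quick check of the conditional-aggregation query above, the sketch below runs it in an in-memory SQLite database from Python. The certification names and their counts are made up for illustration, and the counts are all distinct so that row_number() has no ties to break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firefighter_certifications (name TEXT)")
# Hypothetical data: EMT appears 3 times, Hazmat twice, Rescue once.
names = [("EMT",)] * 3 + [("Hazmat",)] * 2 + [("Rescue",)]
conn.executemany("INSERT INTO firefighter_certifications VALUES (?)", names)

query = """
WITH cte AS (
    SELECT name,
           row_number() OVER (ORDER BY count(*) DESC) AS maxr,
           row_number() OVER (ORDER BY count(*)) AS minr
    FROM firefighter_certifications
    GROUP BY name
)
SELECT max(CASE WHEN minr = 1 THEN name END) AS Min_Name,
       max(CASE WHEN maxr = 1 THEN name END) AS Max_Name
FROM cte
"""
# A single row with the least and most frequent names side by side
min_name, max_name = conn.execute(query).fetchone()
print(min_name, max_name)
```

If two names share the lowest or highest count, row_number() picks one of them arbitrarily; adding name as a second ORDER BY key would make the choice repeatable.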
Postgres doesn't offer "first" and "last" aggregation functions. But there are other, similar methods:
select distinct first_value(name) over (order by cnt desc, name) as name_at_max,
first_value(name) over (order by cnt asc, name) as name_at_min
from (select name, count(*) as cnt
from firefighter_certifications
group by name
) n;
Or without any subquery at all:
select first_value(name) over (order by count(*) desc, name) as name_at_max,
first_value(name) over (order by count(*) asc, name) as name_at_min
from firefighter_certifications
group by name
limit 1;

Oracle ListaGG, Top 3 most frequent values, given in one column, grouped by ID

I have a problem with an SQL query. It could be done in "plain" SQL, but I am fairly sure I need some kind of group concatenation (I can't use MySQL), so the second option is the Oracle dialect, since the target is an Oracle database. Let's say we have the following entity:
Table: Veterinarian visits
Visit_Id,
Animal_id,
Veterinarian_id,
Sickness_code
Let's say there are 100 visits (100 visit_id values) and each animal_id appears in around 20 of them.
I need to create a SELECT, grouped by Animal_id, with 3 columns:
the first shows animal_id
the second shows the total number of flu visits for this particular animal (let's say flu is sickness_code = 5)
the third shows the top three sickness codes for each animal (the 3 most frequent codes for this particular animal_id)
How can I do it? The first and second columns are easy, but the third? I know that I need to use Oracle's LISTAGG, OVER (PARTITION BY ...), COUNT and RANK. I tried to tie them together, but it didn't work out as I expected. What should this query look like?
Here is some sample data:
create table VET as
select
rownum+1 Visit_Id,
mod(rownum+1,5) Animal_id,
cast(NULL as number) Veterinarian_id,
trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <=100;
Query
Basically, the subqueries do the following:
aggregate the counts and calculate the flu count (over all records of the animal)
calculate RANK (if you really need only 3 records, use ROW_NUMBER; see the discussion below)
filter the top 3 RANKs
LISTAGG the result
with agg as (
select Animal_id, Sickness_code, count(*) cnt,
sum(case when SICKNESS_CODE = 5 then 1 else 0 end) over (partition by animal_id) as cnt_flu
from vet
group by Animal_id, Sickness_code
), agg2 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
from agg
), agg3 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
from agg2
where rnk <= 3
)
select
ANIMAL_ID, max(CNT_FLU) CNT_FLU,
LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
from agg3
group by ANIMAL_ID
order by 1;
gives
ANIMAL_ID CNT_FLU CNT_LTS
---------- ---------- ---------------------------------------------
0 1 6(5), 1(4), 9(3)
1 1 1(5), 3(4), 2(3), 8(3)
2 0 1(5), 10(3), 4(3), 6(3), 7(3)
3 1 5(4), 2(3), 4(3), 7(3)
4 1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2)
I intentionally show Sickness_code(count of visits) to demonstrate that the top 3 can contain ties, which you should handle.
Check the RANK function. Using ROW_NUMBER is not deterministic in this case, because tied rows are numbered in arbitrary order.
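The difference between RANK and ROW_NUMBER under ties can be sketched with a small example. The snippet below uses an in-memory SQLite database from Python as a stand-in for Oracle (an assumption for illustration), with made-up visits for a single animal where three sickness codes tie for second place. The helper top3 is hypothetical and exists only for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vet (animal_id INTEGER, sickness_code INTEGER)")
# Made-up visits for one animal: code 1 three times, codes 2, 3, 4 twice each, code 5 once.
visits = [(1, 1)] * 3 + [(1, 2)] * 2 + [(1, 3)] * 2 + [(1, 4)] * 2 + [(1, 5)]
conn.executemany("INSERT INTO vet VALUES (?, ?)", visits)

def top3(window_fn):
    """Return the sickness codes whose window_fn value is <= 3 (demo helper)."""
    sql = f"""
        SELECT sickness_code
        FROM (SELECT sickness_code,
                     {window_fn}() OVER (PARTITION BY animal_id
                                         ORDER BY count(*) DESC) AS rnk
              FROM vet
              GROUP BY animal_id, sickness_code)
        WHERE rnk <= 3
    """
    return [code for (code,) in conn.execute(sql)]

with_rank = top3("rank")              # keeps every code tied within the top 3
with_row_number = top3("row_number")  # keeps exactly 3, dropping one tied code arbitrarily
print(len(with_rank), len(with_row_number))
```

With RANK the tied codes all survive the filter, so the list can be longer than three; with ROW_NUMBER the list is always three entries, but which tied code is dropped is not defined unless the ORDER BY has a tie-breaker.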
I think the most natural way uses two levels of aggregation, along with a dash of window functions here and there:
select vas.animal_id,
sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal_id, sickness_code, count(*) as cnt,
row_number() over (partition by animal_id order by count(*) desc) as seqnum
from visits
group by animal_id, sickness_code
) vas
group by vas.animal_id;
This uses the fact that listagg() ignores NULL values.

Aggregate function like MAX for most common cell in column?

Grouping to get the highest number in a column works great with MAX(), but what if I want the value that is most common?
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and, instead of getting the lowest (MIN) or highest (MAX) number, get the most common one (that would be 250, because it appears 3 times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the id's with the highest counts. This would handle cases when there are ties for the highest counts as well.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;
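The CTE approach can be tried on the six sample IDs from the question. This is a minimal sketch using an in-memory SQLite database from Python rather than SQL Server (an assumption for illustration); the column alias themode replaces MODE only to keep the demo free of keyword-like names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
# The sample IDs from the question: 250 appears three times.
conn.executemany("INSERT INTO t VALUES (?)",
                 [(100,), (250,), (250,), (300,), (200,), (250,)])

query = """
WITH cte AS (
    SELECT id, count(*) AS cnt,
           row_number() OVER (ORDER BY count(*) DESC) AS seqnum
    FROM t
    GROUP BY id
)
SELECT max(id) AS themax,
       max(CASE WHEN seqnum = 1 THEN id END) AS themode
FROM cte
"""
# One row holding both the maximum id and the most frequent id
themax, themode = conn.execute(query).fetchone()
print(themax, themode)
```

The CASE expression is NULL for every row except the one numbered 1, so the outer MAX simply picks that single id out of the grouped rows.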

SQL - Find Differences Between Columns

Let's say I have the following table
Sku | Number | Name
11 1 hat
12 1 hat
13 1 hats
22 2 car
33 3 truck
44 4 boat
45 4 boat
Is there an easy way to find the rows whose Name differs from the others within each Number? For example, with the table above, I would want the query to output:
13 | 1 | hats
The reason for this is because our program processes the rows as long as the number matches the name. If there is an instance where the name doesn't match but the rest of the names do, it will fail.
You can find the most common value (the "mode") using window functions and aggregation:
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum = 1;
You could then find everything that is not the mode using a join. The easier way is just to change the where condition:
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum > 1;
Note: If there are ties in frequency for the most common value, then an arbitrary most common value is chosen.
EDIT:
Actually, if you want the original skus, you might as well do the join:
with modes as (
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum = 1
)
select t.*
from t join
modes
on t.number = modes.number and t.name <> modes.name;
This will ignore NULL values (but the logic can easily be fixed to accommodate them).
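Here is a small sketch of the join version run against the sample table from the question, using an in-memory SQLite database from Python (an assumption for illustration; the question does not name a database). The inner alias g is introduced only to avoid reusing the table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sku INTEGER, number INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(11, 1, "hat"), (12, 1, "hat"), (13, 1, "hats"), (22, 2, "car"),
     (33, 3, "truck"), (44, 4, "boat"), (45, 4, "boat")],
)

query = """
WITH modes AS (
    SELECT g.number, g.name
    FROM (SELECT number, name, count(*) AS cnt,
                 row_number() OVER (PARTITION BY number
                                    ORDER BY count(*) DESC) AS seqnum
          FROM t
          GROUP BY number, name) g
    WHERE g.seqnum = 1
)
SELECT t.sku, t.number, t.name
FROM t
JOIN modes ON t.number = modes.number AND t.name <> modes.name
ORDER BY t.sku
"""
# Rows whose name differs from the most common name for their number
mismatches = conn.execute(query).fetchall()
print(mismatches)
```

For number 1 the most common name is hat (two rows against one), so only sku 13 is reported; the other numbers have a single name each and produce no rows.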

Select only 3 best ranked after rank() over

I'd like to select the 3 best results of a rank() function for each partition
For instance, in this query :
SELECT id, rank() over (PARTITION BY year order by ...) as rank
FROM table1
GROUP BY year
I'd like to have 3 best ranked for every year.
I can manage that by wrapping it in a new query:
Select *
from ...
where rank <= 3
but then, if there are ties, I'll get more than 3 rows per year.
Does anyone have an idea how to solve that?
We don't have much information about your table and query structures, but as a generic solution I'd suggest adding row_number() over (PARTITION BY year ORDER BY ... desc) as rn and filtering on it with where rn <= 3.
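One way to cap each year at exactly three rows despite ties is row_number() partitioned by year. The sketch below runs that idea in an in-memory SQLite database from Python. The score column, the sample rows, and the use of id as a tie-breaker are all assumptions for illustration, since the question elides its ORDER BY expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "score" is a hypothetical ranking column; the question does not show the real one.
conn.execute("CREATE TABLE table1 (id INTEGER, year INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 2020, 10), (2, 2020, 10), (3, 2020, 10), (4, 2020, 10), (5, 2020, 5),
     (6, 2021, 9), (7, 2021, 8), (8, 2021, 7), (9, 2021, 6)],
)

query = """
SELECT year, id
FROM (SELECT id, year,
             row_number() OVER (PARTITION BY year
                                ORDER BY score DESC, id) AS rn
      FROM table1)
WHERE rn <= 3
ORDER BY year, id
"""
# At most three rows per year, even though four rows tie for first place in 2020
top3_per_year = conn.execute(query).fetchall()
print(top3_per_year)
```

With rank() the four tied rows in 2020 would all have rank 1 and all pass the filter. row_number() numbers them 1 to 4 instead, and the id tie-breaker makes the choice of which three survive repeatable.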