PARTITION BY to consider only two specific columns for aggregation? - sql

My table has the following data:
REF_NO  PRD_GRP  ACC_NO
-----------------------
ABC     12       1234
ABC     9C       1234
DEF     AB       7890
DEF     TY       9891
I'm trying to build a query that summarises the number of accounts per customer - the product group is irrelevant for this purpose so my expected result is:
REF_NO  PRD_GRP  ACC_NO  NO_OF_ACC
----------------------------------
ABC     12       1234    1
ABC     9C       1234    1
DEF     AB       7890    2
DEF     TY       9891    2
I tried doing this using a window function:
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
COUNT(T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM TABLE T
However, the NUM_OF_ACC value returned is 2 and not 1 in the above example for the first customer (ABC). It seems that the query is simply counting the number of unique rows for each customer, rather than identifying the number of accounts as desired.
How can I fix this error?
Link to Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=83344cbe95fb46d4a1640caf0bb6d0b2

You need COUNT(DISTINCT ...), which is unfortunately not supported by SQL Server as a window function.
But you can simulate it with DENSE_RANK and MAX:
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
MAX(T.rn) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM (
SELECT *,
DENSE_RANK() OVER (PARTITION BY T.REF_NO ORDER BY T.ACC_NO) AS rn
FROM [TABLE] T
) T;
DENSE_RANK will count up rows ordered by ACC_NO, but ignoring ties, therefore the MAX of that will be the number of distinct values.
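To see the DENSE_RANK + MAX trick on the question's four rows, here is a minimal sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server purely so it can be run anywhere; the table name TEST_DATA is borrowed from the other answer, and the column types are my guess.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_DATA (REF_NO TEXT, PRD_GRP TEXT, ACC_NO TEXT)")
conn.executemany(
    "INSERT INTO TEST_DATA VALUES (?, ?, ?)",
    [("ABC", "12", "1234"), ("ABC", "9C", "1234"),
     ("DEF", "AB", "7890"), ("DEF", "TY", "9891")],
)

# Inner query: rank the accounts inside each customer, ties share a rank.
# Outer query: the highest rank per customer is the number of distinct accounts.
rows = conn.execute("""
    SELECT T.REF_NO, T.PRD_GRP, T.ACC_NO,
           MAX(T.rn) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
    FROM (
        SELECT *,
               DENSE_RANK() OVER (PARTITION BY REF_NO ORDER BY ACC_NO) AS rn
        FROM TEST_DATA
    ) T
    ORDER BY T.REF_NO, T.PRD_GRP
""").fetchall()

for row in rows:
    print(row)
```

Both ABC rows share account 1234, so they both get rank 1 and the customer's maximum is 1; DEF has two different accounts, so its maximum is 2.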

What you need is COUNT(DISTINCT T.ACC_NO) which is unfortunately not supported in window functions. Therefore you have to write a sub-query to allow you to use COUNT(DISTINCT T.ACC_NO) without a window function.
SELECT
T.REF_NO,
T.PRD_GRP,
T.ACC_NO,
-- Use of DISTINCT is not allowed with the OVER clause.
-- COUNT(DISTINCT T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC,
(
SELECT COUNT(DISTINCT T1.ACC_NO)
FROM TEST_DATA T1
WHERE T1.REF_NO = T.REF_NO
) AS NUM_OF_ACC
FROM TEST_DATA T

The simplest way to implement count(distinct) as a window function is by summing two dense_rank() calls:
SELECT T.REF_NO, T.PRD_GRP, T.ACC_NO,
(-1 +
DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO ASC) +
DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO DESC)
) as cnt_distinct
FROM [TABLE] T;
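Why this works: the ascending rank is the number of distinct values up to and including the current one, the descending rank is the number of distinct values from the current one onwards, and the current value is counted in both, hence the -1. A small sketch to check it against a plain grouped COUNT(DISTINCT), again assuming SQLite as a stand-in and the TEST_DATA table name from the other answer:

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_DATA (REF_NO TEXT, PRD_GRP TEXT, ACC_NO TEXT)")
conn.executemany(
    "INSERT INTO TEST_DATA VALUES (?, ?, ?)",
    [("ABC", "12", "1234"), ("ABC", "9C", "1234"),
     ("DEF", "AB", "7890"), ("DEF", "TY", "9891")],
)

# Windowed version: ascending rank + descending rank - 1.
windowed = conn.execute("""
    SELECT REF_NO, PRD_GRP, ACC_NO,
           DENSE_RANK() OVER (PARTITION BY REF_NO ORDER BY ACC_NO ASC)
         + DENSE_RANK() OVER (PARTITION BY REF_NO ORDER BY ACC_NO DESC)
         - 1 AS cnt_distinct
    FROM TEST_DATA
    ORDER BY REF_NO, PRD_GRP
""").fetchall()

# Reference: ordinary grouped COUNT(DISTINCT) per customer.
reference = dict(conn.execute(
    "SELECT REF_NO, COUNT(DISTINCT ACC_NO) FROM TEST_DATA GROUP BY REF_NO"
).fetchall())

for ref_no, prd_grp, acc_no, cnt in windowed:
    print(ref_no, prd_grp, acc_no, cnt, "grouped:", reference[ref_no])
```

One caveat: DENSE_RANK treats NULL as a value of its own, whereas COUNT(DISTINCT ...) ignores NULLs, so the two can differ by one if ACC_NO is nullable.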

Related

How to choose max of one column per other column

I am using SQL Server and I have a table "a"
month segment_id price
-----------------------------
1 1 100
1 2 200
2 3 50
2 4 80
3 5 10
I want to make a query which presents the original columns where the price will be the max per month
The result should be:
month segment_id price
----------------------------
1 2 200
2 4 80
3 5 10
I tried to write SQL code:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried to fix it in many ways but didn't find how to fix it
Because you need a group by clause without segment_id
Select month, max(price) as MaxPrice
from a
Group By month
as you want results per each month, and segment_id is non-aggregated in your original select statement.
If you want to have segment_id with the maximum price repeated on each row of its month, use max() as a window (analytic) function without a GROUP BY clause. Leave ORDER BY out of the window: with it, the default frame ends at the current row and you get a running maximum instead.
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
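One thing worth checking with this kind of query is whether the window has an ORDER BY. A sketch on the question's data, assuming SQLite as a stand-in for SQL Server (the column aliases are mine):

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)],
)

# max_price: whole-partition maximum (no ORDER BY in the window).
# running_max: with ORDER BY the default frame stops at the current row.
rows = conn.execute("""
    SELECT month, segment_id,
           MAX(price) OVER (PARTITION BY month)                     AS max_price,
           MAX(price) OVER (PARTITION BY month ORDER BY segment_id) AS running_max
    FROM a
    ORDER BY month, segment_id
""").fetchall()

for row in rows:
    print(row)
```

For month 1, segment 1 the partition-wide maximum is 200 while the running maximum is still 100, which is why the ORDER BY matters here.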
Edit (due to your most recently edited desired results): you need one more window analytic function, row_number(), as @Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1
I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
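The two forms are not quite interchangeable when prices tie. A small sketch, assuming SQLite as a stand-in for SQL Server; the row (3, 6, 10) is a hypothetical addition to the question's data, there only to create a tie in month 3:

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10),
     (3, 6, 10)],  # hypothetical extra row: ties with (3, 5, 10)
)

# Correlated subquery: keeps every row whose price equals the month's maximum.
by_subquery = conn.execute("""
    SELECT t.*
    FROM t
    WHERE t.price = (SELECT MAX(t2.price) FROM t t2 WHERE t2.month = t.month)
    ORDER BY t.month, t.segment_id
""").fetchall()

# row_number(): keeps exactly one row per month, chosen arbitrarily among ties.
by_row_number = conn.execute("""
    SELECT month, segment_id, price
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY month ORDER BY price DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY month
""").fetchall()

print(len(by_subquery), "rows from the correlated subquery")
print(len(by_row_number), "rows from row_number()")
```

If you need exactly one row per month and care which one, add a tie-breaker to the window's ORDER BY (for instance segment_id).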
Only because it was not mentioned.
Yet another option is the WITH TIES clause.
To be clear, the approach by Gordon and Barbaros would be a nudge more performant, but this technique does not require or generate an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)
With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price

How to get maximum sequence number in SQL

This is the data I have in my table. What I want is maximum sequence number for each order number.
Order No seq Sta
--------------------
32100 1 rd
32100 3 rd
23600 1 rd
23600 6 rd
I want to get the following result without using cursor.
Output:
Order No seq Sta
-----------------
32100 3 rd
23600 6 rd
If you want entire records you could use ROW_NUMBER:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY [Order] ORDER BY No_Seq DESC) AS rn
FROM tab) s
WHERE rn = 1;
Please do not use keywords like Order and spaces in column names.
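The simplest solution is using group by with max. Every non-aggregated column in the select list (here Sta) must also appear in the group by.
Give this a try:
select [Order No], max(seq) as seq, Sta
from myTable
group by [Order No], Sta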
Just use group by order no and order by sequence desc and you will get your record.
If you are using Oracle Database then you can use ROW_NUMBER() analytical function to achieve this result
Try the below query:
select
*
from
(select
ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY seq DESC) as "ROW_NUM",
order_no, seq, sta
from
Order_Details) temp
where
temp.row_num = 1 ;
The following is probably the most efficient solution in most databases (with the right index):
select t.*
from t
where t.seq = (select max(t2.seq) from t t2 where t2.orderno = t.orderno);
You can also do this with group by:
select orderno, max(seq), sta
from t
group by orderno, sta;
Note that all columns referenced in the select are either group by keys or arguments to aggregation functions. This is proper SQL.

Oracle ListaGG, Top 3 most frequent values, given in one column, grouped by ID

I have a problem with an SQL query. It could probably be done in "plain" SQL, but I am sure I need some kind of group concatenation (I can't use MySQL), so the second option is the Oracle dialect, as there will be an Oracle database. Let's say we have the following entity:
Table: Veterinarian visits
Visit_Id,
Animal_id,
Veterinarian_id,
Sickness_code
Let's say there are 100 visits (100 visit_id values) and each animal_id visits around 20 times.
I need to create a SELECT, grouped by Animal_id, with 3 columns:
the first is animal_id
the second shows the aggregated number of flu visits for this particular animal (let's say flu is sickness_code = 5)
the third shows the top three sickness codes for each animal (the 3 most frequent codes for this particular animal_id)
How do I do it? The first and second columns are easy, but the third? I know that I need to use LISTAGG from Oracle, OVER PARTITION BY, COUNT and RANK. I tried to tie them together but it didn't work out as I expected :( What should this query look like?
Here is some sample data:
create table VET as
select
rownum+1 Visit_Id,
mod(rownum+1,5) Animal_id,
cast(NULL as number) Veterinarian_id,
trunc(10*dbms_random.value)+1 Sickness_code
from dual
connect by level <=100;
Query
basically the subqueries do the following:
aggregate count and calculate flu count (in all records of the animal)
calculate RANK (if you really need only 3 records, use ROW_NUMBER; see the discussion below)
Filter top 3 RANKs
LISTAGGregate result
with agg as (
select Animal_id, Sickness_code, count(*) cnt,
-- weight by count(*): after GROUP BY, code 5 is a single row per animal
sum(case when SICKNESS_CODE = 5 then count(*) else 0 end) over (partition by animal_id) as cnt_flu
from vet
group by Animal_id, Sickness_code
), agg2 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
from agg
), agg3 as (
select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
from agg2
where rnk <= 3
)
select
ANIMAL_ID, max(CNT_FLU) CNT_FLU,
LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
from agg3
group by ANIMAL_ID
order by 1;
which gives, for example (the sample data is random, so the exact values vary per run):
ANIMAL_ID CNT_FLU CNT_LTS
---------- ---------- ---------------------------------------------
0 1 6(5), 1(4), 9(3)
1 1 1(5), 3(4), 2(3), 8(3)
2 0 1(5), 10(3), 4(3), 6(3), 7(3)
3 1 5(4), 2(3), 4(3), 7(3)
4 1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2)
I intentionally show Sickness_code(count of visits) to demonstrate that the top 3 can have ties that you should handle.
Check the RANK function. Using ROW_NUMBER is not deterministic in this case.
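To make the tie issue concrete, here is a small sketch. It assumes SQLite as a stand-in for Oracle and starts from already aggregated counts in a hypothetical table code_counts. The first four counts are the ones shown for animal 1 in the output above; code 7 with count 1 is made up to have something below the cut.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
# code_counts is a hypothetical table holding the per-animal counts per sickness code.
conn.execute("CREATE TABLE code_counts (animal_id INTEGER, sickness_code INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO code_counts VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 3, 4), (1, 2, 3), (1, 8, 3),
     (1, 7, 1)],  # made-up row below the top 3
)

ranked = conn.execute("""
    SELECT sickness_code, cnt,
           RANK()       OVER (PARTITION BY animal_id ORDER BY cnt DESC) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY animal_id ORDER BY cnt DESC) AS rn
    FROM code_counts
""").fetchall()

# RANK keeps both codes tied at rank 3; ROW_NUMBER keeps only one of them.
top3_rank = sorted(code for code, cnt, rnk, rn in ranked if rnk <= 3)
top3_row_number = sorted(code for code, cnt, rnk, rn in ranked if rn <= 3)

print("rank <= 3:      ", top3_rank)
print("row_number <= 3:", top3_row_number)
```

With RANK you get four codes because 2 and 8 tie on a count of 3. With ROW_NUMBER you get exactly three, but which of the tied codes survives is up to the database unless you add a tie-breaker to the ORDER BY.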
I think the most natural way uses two levels of aggregation, along with a dash of window functions here and there:
select vas.animal,
sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
listagg(case when seqnum <= 3 then sickness_code end, ',') within group (order by seqnum) as top3sicknesses
from (select animal, sickness_code, count(*) as cnt,
row_number() over (partition by animal order by count(*) desc) as seqnum
from visits
group by animal, sickness_code
) vas
group by vas.animal;
This uses the fact that listagg() ignores NULL values.

Aggregate function like MAX for most common cell in column?

Grouping by the highest number in a column worked great with MAX(), but what if I would like to get the value that is most common?
As an example:
ID
100
250
250
300
200
250
So I would like to group by ID and, instead of getting the lowest (MIN) or highest (MAX) number, get the most common one (that would be 250, because it occurs 3 times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the id's with the highest counts. This would handle cases when there are ties for the highest counts as well.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
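A quick sketch of the dense_rank approach on the question's six values, assuming SQLite as a stand-in for SQL Server. I split the count and the ranking into two nested queries to keep each step visible; the three extra 100 rows inserted later are hypothetical and only there to force a tie.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [(100,), (250,), (250,), (300,), (200,), (250,)],
)

MODE_SQL = """
    SELECT id
    FROM (SELECT id, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
          FROM (SELECT id, COUNT(*) AS cnt FROM tablename GROUP BY id) AS counts
         ) AS ranked
    WHERE rnk = 1
    ORDER BY id
"""

# Question's data: 250 occurs three times, everything else once.
mode_original = [row[0] for row in conn.execute(MODE_SQL)]

# Hypothetical extra rows so that 100 also occurs three times.
conn.executemany("INSERT INTO tablename VALUES (?)", [(100,), (100,)])
mode_with_tie = [row[0] for row in conn.execute(MODE_SQL)]

print(mode_original)
print(mode_with_tie)
```

With the tie in place both values come back, which is the behaviour described above; a TOP 1 query would return only one of them.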
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;

Select Record with Maximum Creation Date

Let us say that I have a database table with the following records:
CACHE_ID BUSINESS_DATE CREATED_DATE
1183 13-09-06 13-09-19 16:38:59.336000000
1169 13-09-06 13-09-24 17:19:05.762000000
1152 13-09-06 13-09-17 14:18:59.336000000
1173 13-09-05 13-09-19 15:48:59.136000000
1139 13-09-05 13-09-24 12:59:05.263000000
1152 13-09-05 13-09-27 13:28:59.332000000
I need to write a query that will return the CACHE_ID for the record which has the most recent CREATED_DATE.
I am having trouble crafting such a query. I can do a GROUP BY based on BUSINESS_DATE and get the MAX(CREATED_DATE)...of course, I won't have the CACHE_ID of the record.
Could someone help with this?
Not positive on oracle syntax, but use the ROW_NUMBER() function:
SELECT BUSINESS_DATE, CACHE_ID
FROM (SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY BUSINESS_DATE ORDER BY CREATED_DATE DESC) RN
FROM YourTable t
)sub
WHERE RN = 1
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but is used to start the numbering over for each value in that group, i.e. if you PARTITION BY BUSINESS_DATE, then for each unique BUSINESS_DATE value the numbering starts over at 1. ORDER BY defines how the counting should go, and is required in the ROW_NUMBER() function.
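Here is the same query run against the question's six rows, as a sketch. It assumes SQLite as a stand-in for Oracle and stores both dates as text exactly as printed in the question, which sorts correctly only because the format is fixed-width with the most significant part first.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption, chosen so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
# Dates are kept as text in the question's own format (an assumption for this sketch).
conn.execute("CREATE TABLE YourTable (CACHE_ID INTEGER, BUSINESS_DATE TEXT, CREATED_DATE TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1183, "13-09-06", "13-09-19 16:38:59.336000000"),
        (1169, "13-09-06", "13-09-24 17:19:05.762000000"),
        (1152, "13-09-06", "13-09-17 14:18:59.336000000"),
        (1173, "13-09-05", "13-09-19 15:48:59.136000000"),
        (1139, "13-09-05", "13-09-24 12:59:05.263000000"),
        (1152, "13-09-05", "13-09-27 13:28:59.332000000"),
    ],
)

# Number the rows of each BUSINESS_DATE from newest to oldest, keep number 1.
latest = conn.execute("""
    SELECT BUSINESS_DATE, CACHE_ID
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY BUSINESS_DATE
                                    ORDER BY CREATED_DATE DESC) AS RN
          FROM YourTable t
         ) sub
    WHERE RN = 1
    ORDER BY BUSINESS_DATE
""").fetchall()

for row in latest:
    print(row)
```

For 13-09-05 the newest row is CACHE_ID 1152 (created 13-09-27), and for 13-09-06 it is 1169 (created 13-09-24).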
You want to group on business date, and get the CACHE_ID with the most current created date? Use something like this:
select yt.CACHE_ID, yt.BUSINESS_DATE, yt.CREATED_DATE
from YourTable yt
where yt.CREATED_DATE = (select max(yt1.CREATED_DATE)
from YourTable yt1
where yt1.BUSINESS_DATE = yt.BUSINESS_DATE)
Not sure of the exact syntax, but conceptually, can't you just sort by CREATED_DATE descending and take the first one?
Across all records -
select top 1 CACHE_ID from YourTable order by CREATED_DATE desc
For each BUSINESS_DATE -
select distinct
a.BUSINESS_DATE,
(
select top 1 b.CACHE_ID
from YourTable b where a.BUSINESS_DATE = b.BUSINESS_DATE
order by b.CREATED_DATE desc
) as Last_CACHE_ID
from YourTable a