select value based on max of other column - sql

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id  area  count  function
-------------------------
1   100   20     living
1   200   30     industry
2   400   10     living
2   400   10     industry
2   400   20     education
3   150   1      industry
3   150   1      education
I want to group by id, summing area and count, and get the dominant function based on the max area. When the areas are equal, it should be based on the max count; when both area and count are equal, it should be based on a function priority (I still have to decide whether education has priority over industry or vice versa). So the result should be:
id  area  count  function
-------------------------
1   300   50     industry
2   1200  40     education
3   300   2      industry
I tried a lot of things and maybe it's easy, but I don't get it. Can someone help me get the right SQL?

One method uses row_number() and conditional aggregation:
select id, sum(area) as area, sum(count) as count,
       max(function) filter (where seqnum = 1) as function
from (select t.*,
             row_number() over (partition by id order by area desc, count desc, function desc) as seqnum
      from t
     ) t
group by id;
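A minimal sketch to check the row_number() approach on the sample data, assuming Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions). It swaps FILTER for MAX(CASE ...) so it does not depend on FILTER support, and breaks full ties on "function" descending, which only happens to rank industry above education because of alphabetical order.

```python
import sqlite3

# In-memory copy of the input table from the question.
conn = sqlite3.connect(":memory:")
conn.execute('create table t (id integer, area integer, "count" integer, "function" text)')
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"), (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education"),
    ],
)

# Rank rows per id by area, then count, then function (alphabetical, descending);
# sum everything, but keep only the function of the top-ranked row.
query = """
select id,
       sum(area) as area,
       sum("count") as "count",
       max(case when seqnum = 1 then "function" end) as "function"
from (select t.*,
             row_number() over (partition by id
                                order by area desc, "count" desc, "function" desc) as seqnum
      from t) s
group by id
order by id
"""
rows = conn.execute(query).fetchall()
print(rows)
```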
Another method uses distinct on:
select distinct on (t.id) t.id,
       sum(t.area) over (partition by t.id) as area,
       sum(t.count) over (partition by t.id) as count,
       t.function
from t
order by t.id, t.area desc, t.count desc, t.function desc;

Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
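Ordering by "function" desc only works because 'industry' sorts after 'education' alphabetically. If the priority must be chosen explicitly, one option is a CASE expression in the ORDER BY of the scalar sub-query. This is a sketch in SQLite via Python's sqlite3. The priority numbers are hypothetical and simply encode industry before education before everything else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table the_table (id integer, area integer, "count" integer, "function" text)')
conn.executemany(
    "insert into the_table values (?, ?, ?, ?)",
    [
        (1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"), (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education"),
    ],
)

# Hypothetical priority: when area and count tie, the lower number wins.
query = """
select t.id, sum(t.area) as area, sum(t."count") as "count",
       (select s."function"
        from the_table s
        where s.id = t.id
        order by s.area desc, s."count" desc,
                 case s."function"
                     when 'industry' then 1
                     when 'education' then 2
                     else 3
                 end
        limit 1) as "function"
from the_table as t
group by t.id
order by t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Swapping the numbers for 'industry' and 'education' would make id 3 report education, while ids 1 and 2 stay the same because area or count already decides them.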

You can use sum as a window function:
select distinct on (t.id)
       t.id,
       sum(t.area) over (partition by t.id) as area,
       sum(t.count) over (partition by t.id) as count,
       (select s.function from tbl_test s
        where s.id = t.id
        order by s.area desc, s.count desc, s.function desc
        limit 1) as function
from tbl_test t
order by t.id;

This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(This ensures that no yt2 row exists with the same id but a higher area.)
This works nicely, but several rows can share the max area with different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;
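The anti-join only compares area. On the sample data, id 2 therefore comes out as living, which is the alphabetical max of the three rows tied at area 400, instead of education. The variation below also compares count inside the join condition, and it reproduces the expected output. It is a sketch in SQLite via Python's sqlite3, and it still breaks complete ties with max().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table yourtable (id integer, area integer, "count" integer, "function" text)')
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?)",
    [
        (1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"), (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education"),
    ],
)

# A row survives only if no row with the same id beats it on (area, count);
# remaining full ties are broken by max("function").
query = """
select y.id, sum(y.area) as area, sum(y."count") as "count", w."function"
from yourtable y
join (select yt1.id, max(yt1."function") as "function"
      from yourtable yt1
      left join yourtable yt2
        on yt1.id = yt2.id
       and (yt1.area < yt2.area
            or (yt1.area = yt2.area and yt1."count" < yt2."count"))
      where yt2.id is null
      group by yt1.id) w
  on y.id = w.id
group by y.id, w."function"
order by y.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```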

Related

How to choose max of one column per other column

I am using SQL Server and I have a table "a"
month segment_id price
-----------------------------
1 1 100
1 2 200
2 3 50
2 4 80
3 5 10
I want a query that returns the original columns for the row with the maximum price in each month.
The result should be:
month segment_id price
----------------------------
1 2 200
2 4 80
3 5 10
I tried this SQL:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried to fix it in many ways but couldn't find a solution.
You need a GROUP BY clause, and segment_id has to be left out of the select list:
Select month, max(price) as MaxPrice
from a
Group By month
This is because you want one result per month, and segment_id is neither aggregated nor grouped in your original select statement.
If you want every row to show its segment_id together with the maximum price for its month, use max() as a window (analytic) function without a GROUP BY clause. Leave out ORDER BY inside OVER, because it would turn the result into a running maximum:
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
Edit (based on your updated desired results): you need one more window function, row_number(), as @Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1
I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
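This sketch compares the correlated subquery with row_number() on the sample data. It assumes SQLite 3.25+ through Python's sqlite3 and keeps the answer's table name t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (month integer, segment_id integer, price integer)")
conn.executemany("insert into t values (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)])

# Correlated subquery: keep rows whose price equals their month's maximum.
correlated = conn.execute("""
    select t.* from t
    where t.price = (select max(t2.price) from t t2 where t2.month = t.month)
    order by month
""").fetchall()

# row_number(): number rows per month by descending price, keep the first.
numbered = conn.execute("""
    select month, segment_id, price
    from (select t.*,
                 row_number() over (partition by month order by price desc) as seqnum
          from t) s
    where seqnum = 1
    order by month
""").fetchall()

print(correlated)
print(numbered)
```

The two queries only disagree when a month has tied top prices. The correlated subquery returns every tied row, while row_number() picks one of them arbitrarily.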
Only because it hasn't been mentioned yet: another option is the WITH TIES clause.
To be clear, the approaches by Gordon and Barbaros would be slightly more performant, but this technique neither requires nor generates an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)
With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price
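This sketch checks that the NOT EXISTS form and the join-to-aggregate form return the same rows, ties included. The extra row (month 1, segment_id 6, price 200) is hypothetical and exists only to create a tie. It runs in SQLite through Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (month integer, segment_id integer, price integer)")
conn.executemany("insert into tablename values (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10),
                  (1, 6, 200)])  # hypothetical row that ties month 1's maximum

# NOT EXISTS: no other row in the same month has a higher price.
not_exists = conn.execute("""
    select t.* from tablename t
    where not exists (select 1 from tablename t2
                      where t2.month = t.month and t2.price > t.price)
    order by t.month, t.segment_id
""").fetchall()

# Join each row to its month's maximum price.
joined = conn.execute("""
    select t.* from tablename t
    inner join (select month, max(price) as price
                from tablename group by month) g
      on g.month = t.month and g.price = t.price
    order by t.month, t.segment_id
""").fetchall()

print(not_exists)
print(joined)
```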

Finding top count of a value in a table using SQL

I'm looking for a way to find the value with the top count in a column, per id, using SQL.
For example, if this is my data:
id type
----------
1 A
1 B
1 A
2 C
2 D
2 D
I would like the result to be:
1 A
2 D
I'm looking for a way to do it without grouping by the column I count (type in the example).
Thanks
Statistically, this is called the "mode". You can calculate it using window functions:
select id, type, cnt
from (select id, type, count(*) as cnt,
row_number() over (partition by id order by count(*) desc) as seqnum
from t
group by id, type
) t
where seqnum = 1;
If there are ties, then an arbitrary value is chosen from among the ties.
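The window-function version of the mode can be tried on the sample data with SQLite 3.25+ via Python's sqlite3. SQLite evaluates window functions after GROUP BY, so `order by count(*)` inside OVER works here as well. The table name t comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, type text)")
conn.executemany("insert into t values (?, ?)",
                 [(1, "A"), (1, "B"), (1, "A"), (2, "C"), (2, "D"), (2, "D")])

# Count each (id, type) pair, then rank the pairs within each id by that count.
rows = conn.execute("""
    select id, type, cnt
    from (select id, type, count(*) as cnt,
                 row_number() over (partition by id order by count(*) desc) as seqnum
          from t
          group by id, type) s
    where seqnum = 1
    order by id
""").fetchall()
print(rows)
```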
You are looking for the statistical mode (the most frequently occurring value):
select id, stats_mode(type)
from mytable
group by id
order by id;
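Not all DBMSs support this, however: STATS_MODE is an Oracle function, and PostgreSQL offers mode() WITHIN GROUP (ORDER BY ...) instead. Check your docs to see whether this function or a similar one is available in your DBMS.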
Just GROUP BY id, type and keep the rows whose count equals the maximum count for that id:
select t.id, t.type
from tablename t
group by t.id, t.type
having count(*) = (
select count(*) from tablename t2 where t2.id = t.id group by t2.type order by count(*) desc limit 1
)
Or
select t.id, t.type
from tablename t
group by t.id, t.type
having count(*) = (
select max(c.counter) from (select id, count(*) as counter from tablename group by id, type) c where c.id = t.id
)
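This sketch shows why the maximum count has to be computed per id rather than over the whole table. It runs in SQLite via Python's sqlite3, and the id 3 rows are hypothetical: their top count (1) is lower than the other ids' (2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (id integer, type text)")
conn.executemany("insert into tablename values (?, ?)",
                 [(1, "A"), (1, "B"), (1, "A"), (2, "C"), (2, "D"), (2, "D"),
                  (3, "E"), (3, "F")])  # hypothetical id 3: E and F tie at count 1

# Keep (id, type) groups whose count equals the highest count for that id.
per_id = conn.execute("""
    select t.id, t.type
    from tablename t
    group by t.id, t.type
    having count(*) = (select max(c.counter)
                       from (select id, count(*) as counter
                             from tablename group by id, type) c
                       where c.id = t.id)
    order by t.id, t.type
""").fetchall()

# Same test against the maximum over the whole table: id 3 drops out.
global_max = conn.execute("""
    select id, type
    from tablename
    group by id, type
    having count(*) = (select max(c.counter)
                       from (select count(*) as counter
                             from tablename group by id, type) c)
    order by id, type
""").fetchall()

print(per_id)
print(global_max)
```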

How to get max value and using group by clause

I have a query like this:
select transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
Now I would like to get the max value based on id.
I tried the queries below, but they did not work:
select max(id),
transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
or
select transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
and max(id)
For example, I have three conditions:
type must be 1
group by transactions_id
max id
You can find aggregates in one query and join its result with the table to get the relevant rows.
select *
from times t1
join (
select transactions_id,
max(id) as id
from times
where type = 1
group by transactions_id
) t2 using (transactions_id, id);
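This is a sketch of the join-to-aggregate answer in SQLite via Python's sqlite3. The question shows no data, so every row below (including the time_stamp and clock values) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table times (id integer, transactions_id integer, type integer,
                                    time_stamp text, clock text)""")
conn.executemany("insert into times values (?, ?, ?, ?, ?)", [
    (1, 10, 1, "2020-01-01", "08:00"),
    (2, 10, 1, "2020-01-02", "09:00"),
    (3, 10, 2, "2020-01-03", "10:00"),  # type 2: ignored despite the higher id
    (4, 20, 1, "2020-01-01", "11:00"),
])

# Find the highest type-1 id per transaction, then join back for the full row.
rows = conn.execute("""
    select t1.*
    from times t1
    join (select transactions_id, max(id) as id
          from times
          where type = 1
          group by transactions_id) t2
      using (transactions_id, id)
    order by t1.transactions_id
""").fetchall()
print(rows)
```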
If I understand correctly, you can use the ANSI standard row_number() function:
select t.*
from (select t.*,
row_number() over (partition by transactions_id order by id desc) as seqnum
from times t
) t
where seqnum = 1;
I am not sure what having sum(distinct type) = 1 is meant to do; that condition is not explained in the question.

Aggregate function like MAX for most common cell in column?

Grouping by the highest number in a column worked great with MAX(), but what if I want the value that is most common?
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and, instead of getting the lowest (MIN) or highest (MAX) number, get the most common one (that would be 250, because it appears 3 times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the ids with the highest count. This also handles ties for the highest count.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;
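The CTE itself uses nothing specific to SQL Server, so the same query runs in SQLite 3.25+. That makes it easy to check against the sample values with Python's sqlite3, keeping the answer's table name t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer)")
conn.executemany("insert into t values (?)",
                 [(100,), (250,), (250,), (300,), (200,), (250,)])

# Count each value, rank values by frequency, then read max and mode together.
themax, mode = conn.execute("""
    with cte as (
        select t.id, count(*) as cnt,
               row_number() over (order by count(*) desc) as seqnum
        from t
        group by id
    )
    select max(id) as themax, max(case when seqnum = 1 then id end) as mode
    from cte
""").fetchone()
print(themax, mode)
```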

Group By Retrieve 4 Values

I have the following query
SELECT Cod ,
MIN(Id) AS id_Min,
-- retrieve value min in the middle as id_Min_Middle,
-- retrieve value max in the middle as id_Max_Middle,
MAX(Id) AS id_Max,
COUNT(*) AS Tot
FROM Table a ( NOLOCK )
GROUP BY Cod
HAVING COUNT(*)=4
How could I retrieve the values between min and max as I have done for min and max?
If I use SUM(Id) - (MIN(Id) + MAX(Id)), I get the sum of the two middle values, but not the individual values I want.
EXAMPLES
Cod | Id
Stack 10
Stack 15
Stack 11
Stack 40
Overflow 1
Overflow 120
Overflow 15
Overflow 100
Required output
Cod | Min | Min_In_The_Middle | Max_In_The_Middle | Max
Stack 10 11 15 40
Overflow 1 15 100 120
This needs just one table (or clustered index) scan:
SELECT pvt.Cod,
pvt.[1] AS MinValue,
pvt.[2] AS MinInterValue,
pvt.[3] AS MaxInterValue,
pvt.[4] AS MaxValue
FROM
(
SELECT x.Cod, x.ID, x.RowNumAsc
FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID ASC) RowNumAsc,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID DESC) RowNumDesc
FROM MyTable t
) x
WHERE x.RowNumAsc = 1 AND x.RowNumDesc = 4
OR x.RowNumAsc = 2 AND x.RowNumDesc = 3
OR x.RowNumAsc = 3 AND x.RowNumDesc = 2
OR x.RowNumAsc = 4 AND x.RowNumDesc = 1
) y
PIVOT ( MAX(y.ID) FOR y.RowNumAsc IN ([1], [2], [3], [4]) ) pvt;
Try using this, best of luck
WITH temp AS
(SELECT cod, MIN (ID) min_id, MAX (ID) max_id
FROM tab
GROUP BY cod
HAVING COUNT (ID) = 4)
SELECT temp.cod, temp.min_id,
(SELECT MIN (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.min_id)
GROUP BY cod) min_mid_id,
(SELECT MAX (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.max_id)
GROUP BY cod) max_mid_id, temp.max_id
FROM temp;
I'm not sure what it means for your question to be tagged plsql and sql-server. But I'll assume you're working with a database system that supports CTEs and window functions.
To generalize what you've been trying to do, first assign row numbers to the rows, then use whatever technique you want to achieve the pivot:
;WITH OrderedValues as (
SELECT Cod,Id,ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id) as rn,
COUNT(*) OVER (PARTITION BY Cod) as Cnt
FROM [Table] WITH (NOLOCK)
), With4Values as (
SELECT * from OrderedValues where Cnt=4
)
SELECT Cod,
--However you want to do the pivot. Here I'll use MAX/CASE
MAX(CASE WHEN rn=1 THEN Id END) as Value1,
MAX(CASE WHEN rn=2 THEN Id END) as Value2,
MAX(CASE WHEN rn=3 THEN Id END) as Value3,
MAX(CASE WHEN rn=4 THEN Id END) as Value4
FROM
With4Values
GROUP BY
Cod
You can hopefully see that this is more easily extended to more columns than answering your overly specific questions about 3 rows, or 4 rows. But if you need to deal with an arbitrary number of columns, you'll have to switch to dynamic SQL.
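The row-number-then-pivot idea does not depend on SQL Server. This sketch runs the same CTE in SQLite 3.25+ via Python's sqlite3. The table is renamed to a hypothetical tab because Table is a reserved word, and NOLOCK is dropped because it is SQL Server only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tab (cod text, id integer)")
conn.executemany("insert into tab values (?, ?)", [
    ("Stack", 10), ("Stack", 15), ("Stack", 11), ("Stack", 40),
    ("Overflow", 1), ("Overflow", 120), ("Overflow", 15), ("Overflow", 100),
])

# Number the ids within each cod, keep cods with exactly 4 rows, then pivot.
rows = conn.execute("""
    with ordered_values as (
        select cod, id,
               row_number() over (partition by cod order by id) as rn,
               count(*) over (partition by cod) as cnt
        from tab
    )
    select cod,
           max(case when rn = 1 then id end) as value1,
           max(case when rn = 2 then id end) as value2,
           max(case when rn = 3 then id end) as value3,
           max(case when rn = 4 then id end) as value4
    from ordered_values
    where cnt = 4
    group by cod
    order by cod
""").fetchall()
print(rows)
```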
I understand you want to exclude the extreme values and find min and max for the rest.
This is what I think of, but I had no chance to run and test it...
WITH Extremes AS ( SELECT Cod, MAX(ID) AS Id_Max, MIN(ID) AS Id_Min
FROM [Table] a GROUP BY Cod)
SELECT
e.Cod,
e.Id_Min,
MIN(a.Id) AS id_Min_Middle,
MAX(a.Id) AS id_Max_Middle,
e.Id_Max
FROM Extremes e
LEFT JOIN [Table] a ON a.Cod = e.Cod AND a.Id > e.Id_Min AND a.Id < e.Id_Max
GROUP BY e.Cod, e.Id_Min, e.Id_Max
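The answer above was written without being run, so here is a sketch that runs it in SQLite via Python's sqlite3. It uses a hypothetical table name tab and adds e.Id_Min and e.Id_Max to the GROUP BY, which SQL Server requires for non-aggregated columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tab (cod text, id integer)")
conn.executemany("insert into tab values (?, ?)", [
    ("Stack", 10), ("Stack", 15), ("Stack", 11), ("Stack", 40),
    ("Overflow", 1), ("Overflow", 120), ("Overflow", 15), ("Overflow", 100),
])

# Find each cod's extremes, then take min/max over the rows strictly between them.
rows = conn.execute("""
    with extremes as (
        select cod, max(id) as id_max, min(id) as id_min
        from tab
        group by cod
    )
    select e.cod, e.id_min,
           min(a.id) as id_min_middle,
           max(a.id) as id_max_middle,
           e.id_max
    from extremes e
    left join tab a
      on a.cod = e.cod and a.id > e.id_min and a.id < e.id_max
    group by e.cod, e.id_min, e.id_max
    order by e.cod
""").fetchall()
print(rows)
```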