How to choose max of one column per other column - sql

I am using SQL Server and I have a table "a"
month   segment_id   price
-----------------------------
1       1            100
1       2            200
2       3            50
2       4            80
3       5            10
I want a query that returns the original columns, keeping only the row with the maximum price for each month.
The result should be:
month   segment_id   price
-----------------------------
1       2            200
2       4            80
3       5            10
I tried to write SQL code:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried several ways to fix it but could not find one that works.

Because you need a group by clause without segment_id
Select month, max(price) as MaxPrice
from a
Group By month
as you want results per each month, and segment_id is non-aggregated in your original select statement.
If you want segment_id together with the maximum price of its month repeated on every row, use max() as a window (analytic) function instead of a GROUP BY clause. Leave ORDER BY out of the OVER clause: with it, the default frame ends at the current row and max() becomes a running maximum.
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
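A small check of how a window MAX behaves, sketched with Python's sqlite3 module as a stand-in for SQL Server (an assumption, chosen only so the snippet runs anywhere; it needs a SQLite build with window functions, 3.25 or later). Without ORDER BY inside OVER every row sees the maximum of its month; with ORDER BY the default frame stops at the current row, which gives a running maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)],
)

# No ORDER BY inside OVER: the frame is the whole partition (the month).
whole_partition = conn.execute(
    """
    SELECT month, segment_id,
           MAX(price) OVER (PARTITION BY month) AS MaxPrice
    FROM a
    ORDER BY month, segment_id
    """
).fetchall()

# ORDER BY inside OVER: the default frame ends at the current row,
# so MAX is a running maximum within the month.
running = conn.execute(
    """
    SELECT month, segment_id,
           MAX(price) OVER (PARTITION BY month ORDER BY segment_id) AS MaxPrice
    FROM a
    ORDER BY month, segment_id
    """
).fetchall()

print(whole_partition)
print(running)
```

On the sample data the two results differ only for the first row of months 1 and 2, where the running maximum has not yet seen the larger price.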
Edit (following your latest edit to the desired results): you need one more window function, row_number(), as @Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1

I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
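To see that both queries return the rows the question asks for, here is a minimal sketch on the sample data. It uses Python's sqlite3 module rather than SQL Server (an assumption for runnability), and the index name idx_t_month_price is hypothetical; the snippet checks results only and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)],
)
# Hypothetical index that lets the correlated MAX be answered per month.
conn.execute("CREATE INDEX idx_t_month_price ON t (month, price)")

# Correlated subquery: keep rows whose price equals the month's maximum.
correlated = conn.execute(
    """
    SELECT t.month, t.segment_id, t.price
    FROM t
    WHERE t.price = (SELECT MAX(t2.price) FROM t t2 WHERE t2.month = t.month)
    ORDER BY t.month
    """
).fetchall()

# row_number(): number rows per month by price descending, keep the first.
ranked = conn.execute(
    """
    SELECT month, segment_id, price
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY month ORDER BY price DESC) AS seqnum
          FROM t) AS r
    WHERE seqnum = 1
    ORDER BY month
    """
).fetchall()

print(correlated)
print(ranked)
```

The two agree here because no month has tied maximum prices. With ties, the correlated subquery keeps every tied row while row_number() keeps exactly one.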

Only because it was not mentioned.
Yet another option is the WITH TIES clause.
To be clear, the approach by Gordon and Barbaros would be a nudge more performant, but this technique does not require or generate an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)

With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price
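Both forms keep every row that ties for the maximum, which may or may not be what you want. A minimal sketch of that behaviour, again using Python's sqlite3 as a stand-in for SQL Server; the tied prices in month 1 are made-up test data, not the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    # Month 1 deliberately has two rows tied at the maximum price.
    [(1, 1, 200), (1, 2, 200), (2, 3, 50), (2, 4, 80)],
)

# Anti-join: keep a row when no row of the same month has a higher price.
anti_join = conn.execute(
    """
    SELECT t.month, t.segment_id, t.price
    FROM tablename t
    WHERE NOT EXISTS (
        SELECT 1 FROM tablename x
        WHERE x.month = t.month AND x.price > t.price
    )
    ORDER BY t.month, t.segment_id
    """
).fetchall()

# Join back to the per-month maximum.
joined = conn.execute(
    """
    SELECT t.month, t.segment_id, t.price
    FROM tablename t
    INNER JOIN (
        SELECT month, MAX(price) AS price
        FROM tablename
        GROUP BY month
    ) g ON g.month = t.month AND g.price = t.price
    ORDER BY t.month, t.segment_id
    """
).fetchall()

print(anti_join)
print(joined)
```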

Related

Second minimum value for every customer

I am using MySQL database. So, there are two columns I am working on, CustomerId, and OrderDate. I want to find a second-order date (2nd minimum order date) for each customer.
If you are using MySQL 8+, then ROW_NUMBER can be used here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) rn
FROM yourTable
)
SELECT CustomerId, OrderDate
FROM cte
WHERE rn = 2;
I would recommend using dense_rank as it can give you correct result even if there is duplicate order_date as follows:
SELECT * FROM
(SELECT t.*, DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY OrderDate) dr
FROM yourTable t
) t where dr = 2;
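You can use a correlated subquery as follows if your MySQL version does not support analytic functions. A row holds the second-earliest date when exactly one distinct earlier date exists for the same customer:
SELECT T.*
FROM yourTable T
WHERE 1 = (
SELECT COUNT(DISTINCT TT.OrderDate)
FROM yourTable TT
WHERE TT.CustomerId = T.CustomerId
AND TT.OrderDate < T.OrderDate
)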
I would use a subquery like this:
select o.*
from orders o
where o.order_date = (select o2.order_date
from orders o2
where o2.customer_id = o.customer_id
order by o2.order_date
limit 1 offset 1
);
The subquery is a correlated subquery that returns the second date. If you want the second date with other columns, it can be moved to the select.
With an index on (customer_id, order_date), this is likely to be the fastest solution.
This assumes that there is one row per date (or that if there are multiple rows, "second" can be the earliest date). If you want the second distinct date, use select distinct in the subquery; however, select distinct and group by would incur additional overhead.
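The difference between "second row" and "second distinct date" is easy to see on a customer with a duplicated first order date. The sketch below uses Python's sqlite3 in place of MySQL 8 (an assumption for runnability); the sample rows and the helper second_date are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (CustomerId INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [
        # Customer 1 placed two orders on the same first date.
        (1, "2021-01-05"), (1, "2021-01-05"), (1, "2021-02-10"), (1, "2021-03-01"),
        (2, "2021-04-01"), (2, "2021-04-20"),
    ],
)

def second_date(rank_fn):
    """Return (CustomerId, OrderDate) where the given ranking function equals 2."""
    # rank_fn is a fixed function name chosen below, never user input.
    sql = f"""
        SELECT DISTINCT CustomerId, OrderDate
        FROM (SELECT CustomerId, OrderDate,
                     {rank_fn}() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS rk
              FROM yourTable) AS ranked
        WHERE rk = 2
        ORDER BY CustomerId
    """
    return conn.execute(sql).fetchall()

by_row_number = second_date("ROW_NUMBER")  # second row, duplicates count
by_dense_rank = second_date("DENSE_RANK")  # second distinct date

print(by_row_number)
print(by_dense_rank)
```

For customer 1, ROW_NUMBER reports the duplicated first date again, while DENSE_RANK moves on to the next distinct date.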

How to get the records from inner query results with the MAX value

The results are below. I need to get the records (seller and purchaser) with the max count, grouped by purchaser (marked in yellow).
You can use window functions:
with q as (
<your query here>
)
select q.*
from (select q.*,
row_number() over (order by seller desc) as seqnum_s,
row_number() over (order by purchaser desc) as seqnum_p
from q
) q
where seqnum_s = 1 or seqnum_p = 1;
Try this:
SELECT COUNT,seller,purchaser FROM YourTable ORDER BY seller,purchaser DESC
SELECT T2.MaxCount,T2.purchaser,T1.Seller FROM <Yourtable> T1
Inner JOIN
(
Select Max(Count) as MaxCount, purchaser
FROM <Yourtable>
GROUP BY Purchaser
)T2
On T2.Purchaser=T1.Purchaser AND T2.MaxCount=T1.Count
First you select the Seller from <Yourtable>, which will give you a list of all 5 sellers. Then you write another query that selects only the Purchaser and the Max(Count) grouped by Purchaser, which will give you the two yellow-marked lines. Join the two queries on the fields Purchaser and Max(Count), and add the columns from the joined table to your first query.
I can't think of a faster way, but this works pretty fast even with rather large queries. You can further order the fields as needed.

Aggregate function like MAX for most common cell in column?

Grouping and taking the highest number in a column worked great with MAX(), but what if I would like to get the value that is most common?
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and, instead of getting the lowest (MIN) or highest (MAX) number, get the most common one (that would be 250, because it appears 3 times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the id's with the highest counts. This would handle cases when there are ties for the highest counts as well.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;
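A minimal sketch of the mode query on the question's sample values, run through Python's sqlite3 instead of SQL Server 2012 (an assumption for runnability). It counts in a CTE first and ranks the counts in a second step, which is a slightly more verbose arrangement than the one-level dense_rank query above; the helper modes is a name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(100,), (250,), (250,), (300,), (200,), (250,)],
)

def modes():
    """Return every (id, count) whose count is the highest, ties included."""
    return conn.execute(
        """
        WITH counts AS (
            SELECT id, COUNT(*) AS cnt
            FROM t
            GROUP BY id
        )
        SELECT id, cnt
        FROM (SELECT id, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
              FROM counts) AS ranked
        WHERE rnk = 1
        ORDER BY id
        """
    ).fetchall()

single_mode = modes()

# Add two more 100s so that 100 and 250 both occur three times.
conn.executemany("INSERT INTO t VALUES (?)", [(100,), (100,)])
tied_modes = modes()

print(single_mode)
print(tied_modes)
```

A TOP 1 style query would return only one of the two tied values in the second case; which one is not determined unless the ORDER BY adds a tie-breaker.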

Need to change LIMIT into something else

Is there a way to replace "LIMIT 1" and get the same output? I have to get the name and surname of the client who has the most books, along with the number of books.
SELECT stud.skaitytojas.name, stud.skaitytojas.surname,
COUNT (stud.skaitytojas.nr) AS quantity
FROM stud.egzempliorius , stud.skaitytojas
WHERE stud.egzempliorius.client = stud.skaitytojas.nr
GROUP BY stud.skaitytojas.nr
ORDER BY quantity DESC
LIMIT 1
Postgres supports the ANSI standard FETCH FIRST 1 ROW ONLY, so you can do:
SELECT s.name, s.surname, COUNT(s.nr) AS quantity
FROM stud.egzempliorius e JOIN
stud.skaitytojas s
ON e.client = s.nr
GROUP BY s.name, s.surname
ORDER BY quantity DESC
FETCH FIRST 1 ROW ONLY;
Also notice the use of table aliases and proper JOIN syntax. I also prefer to list the columns in the SELECT in the GROUP BY, although that is optional if s.nr is unique.
You can select the row with the highest quantity using row_number()
SELECT * FROM (
SELECT * , row_number() over (order by quantity desc) rn FROM (
SELECT stud.skaitytojas.name, stud.skaitytojas.surname, COUNT (stud.skaitytojas.nr) AS quantity
FROM stud.egzempliorius , stud.skaitytojas
WHERE stud.egzempliorius.client = stud.skaitytojas.nr
GROUP BY stud.skaitytojas.name, stud.skaitytojas.surname
) t
) t where rn = 1
If you want to include ties for the highest quantity, then use rank() instead.

Query to find the FIRST AND SECOND largest value from a group

i have a query like this:
SELECT
DATEPART(year,some_date),
DATEPART(month,some_date),
MAX(some_value) max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
This returns a table with: year, month, the largest value for the month.
I would like to modify the query so that i could obtain:
year, month, the largest value for the month, the second largest value for the month in each row.
It seems to me that the well-known solutions like "TOP 2", "NOT IN TOP 1" or a subselect won't work here.
(To be really specific - i am using SQL Server 2008.)
It seems to me that the question calls for a query that would return best, and second best in the same row for each month and year, like so:
month, year, best, second best
...
...
and not two rows for the same month and year containing best and second best value.
This is the solution that I came up with, so if anyone has a simpler way of achieving this, I would like to know.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
t1.views as [best],
t2.views as [second best]
from ranks t1
inner join ranks t2
on t1.year = t2.year
and t1.month = t2.month
and t1.rank = 1
and t2.rank = 2
EDIT: Just out of curiosity I did a bit more testing and ended up with a simpler variation on Stephanie Page's answer that doesn't use an additional subquery. I also changed the rank() function to row_number(), because rank() doesn't work when the two highest values are the same (both rows get rank 1 and no row gets rank 2).
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
row_number() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
max(case when t1.rank = 1 then t1.views else 0 end) as [best],
max(case when t1.rank = 2 then t1.views else 0 end) as [second best]
from
ranks t1
where
t1.rank in (1,2)
group by
t1.year, t1.month
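The row_number() plus conditional aggregation approach can be tried on a few made-up rows. This sketch uses Python's sqlite3 rather than SQL Server 2008, so year() and month() are replaced by strftime() and the bracketed aliases by plain names (yr, mon, rn, second_best); those substitutions and the sample data are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (entrydate TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [
        ("2010-01-03", 40), ("2010-01-15", 90), ("2010-01-20", 90),  # tied maximum
        ("2010-02-01", 10), ("2010-02-09", 25), ("2010-02-11", 5),
        ("2010-03-07", 7),                                           # single row
    ],
)

result = conn.execute(
    """
    WITH ranks AS (
        SELECT CAST(strftime('%Y', entrydate) AS INTEGER) AS yr,
               CAST(strftime('%m', entrydate) AS INTEGER) AS mon,
               views,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y', entrydate), strftime('%m', entrydate)
                   ORDER BY views DESC
               ) AS rn
        FROM product
    )
    SELECT yr, mon,
           MAX(CASE WHEN rn = 1 THEN views ELSE 0 END) AS best,
           MAX(CASE WHEN rn = 2 THEN views ELSE 0 END) AS second_best
    FROM ranks
    WHERE rn IN (1, 2)
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()

print(result)
```

January shows the tie case (90 appears as both best and second best), and March shows that a month with a single row gets 0 as its second best because of the ELSE 0 branch.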
RANK() is maybe the thing you are looking for...
http://msdn.microsoft.com/en-us/library/ms176102.aspx
To do this without joins (I'll show the Oracle version; in SQL Server you'd just use CASE instead of DECODE):
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
SELECT [year], [month], Max([best]), Max([second best])
FROM
( select
t1.year,
t1.month,
Decode([rank],1,t1.views,0) as [best],
Decode([rank],2,t1.views,0) as [second best]
from ranks t1
where t1.rank <= 2 ) x
GROUP BY [year], [month]
This is a bit old-school but TOP and a subquery will work if you use ORDER BY. Try this:
SELECT TOP 2
DATEPART(year,some_date),
DATEPART(month,some_date),
(SELECT MAX(st1.some_value) FROM some_table AS st1
WHERE DATEPART(month,some_date) = DATEPART(month,st1.some_date)) AS max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
ORDER BY DATEPART(month,some_date) DESC
That will give you the two rows with the "highest" month values and the added subselect should give you the max from each grouping.
You can use a CTE with the ranking functions in SQL Server 2005 and up:
;WITH TopValues AS
(
SELECT
YEAR(some_date) AS 'Year',
MONTH(some_date) AS 'Month',
Some_Value,
ROW_NUMBER() OVER(PARTITION BY YEAR(some_date),MONTH(some_date)
ORDER BY Some_Value DESC) AS 'RowNumber'
FROM
dbo.some_table
)
SELECT
Year, Month, Some_Value
FROM
TopValues
WHERE
RowNumber <= 2
This will "partition" (i.e. group) your data by month/year, order inside each group by Some_Value descending (largest first), and then you can select the first two of each group from that CTE.
RANK() works as well (I most often use ROW_NUMBER) - it produces slightly different results, though - really depends on what your needs are.
Hmmm it's kind of a rig, but you can do this with subqueries... instead of using that max I'd select the some_values which have the matching year & month, row_number()=1 / row_number() = 2 respectively and order by some_value DESC.
The inability to use OFFSET / LIMIT like you can in SQLite is one of my dislikes about SQL Server.