Query to find the FIRST AND SECOND largest value from a group

I have a query like this:
SELECT
DATEPART(year,some_date),
DATEPART(month,some_date),
MAX(some_value) max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
This returns a table with: year, month, the largest value for the month.
I would like to modify the query so that I could obtain:
year, month, the largest value for the month, the second largest value for the month in each row.
It seems to me that the well-known solutions like "TOP 2", "NOT IN TOP 1" or a subselect won't work here.
(To be really specific - I am using SQL Server 2008.)

It seems to me that the question calls for a query that would return best, and second best in the same row for each month and year, like so:
month, year, best, second best
...
...
and not two rows for the same month and year containing best and second best value.
This is the solution that I came up with, so if anyone has a simpler way of achieving this, I would like to know.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
t1.views as [best],
t2.views as [second best]
from ranks t1
inner join ranks t2
on t1.year = t2.year
and t1.month = t2.month
and t1.rank = 1
and t2.rank = 2
EDIT: Just out of curiosity I did a bit more testing and ended up with a simpler variation on Stephanie Page's answer that doesn't use an additional subquery. I also changed the rank() function to row_number(), because rank() doesn't work when two max values are the same: both tied rows get rank 1 and no row gets rank 2.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
row_number() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
max(case when t1.rank = 1 then t1.views else 0 end) as [best],
max(case when t1.rank = 2 then t1.views else 0 end) as [second best]
from
ranks t1
where
t1.rank in (1,2)
group by
t1.year, t1.month
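To see how this behaves on ties and on months with a single row, here is a minimal, self-contained sketch that runs the same idea through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The sample rows are made up, strftime() stands in for SQL Server's year()/month(), and the CASE expressions omit the else 0 branch so that a missing second value shows up as NULL rather than 0. All of these are assumptions for illustration, not part of the answer above.

```python
import sqlite3

# Hypothetical sample data standing in for the `product` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (entrydate TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [
        ("2010-01-05", 10),
        ("2010-01-12", 30),
        ("2010-01-20", 30),  # ties with the previous row for first place
        ("2010-02-03", 7),
        ("2010-02-14", 5),
        ("2010-03-09", 4),   # March has a single row
    ],
)

QUERY = """
WITH ranks AS (
    SELECT strftime('%Y', entrydate) AS yr,
           strftime('%m', entrydate) AS mon,
           views,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y', entrydate), strftime('%m', entrydate)
               ORDER BY views DESC
           ) AS rn
    FROM product
)
SELECT yr, mon,
       MAX(CASE WHEN rn = 1 THEN views END) AS best,
       MAX(CASE WHEN rn = 2 THEN views END) AS second_best
FROM ranks
WHERE rn IN (1, 2)
GROUP BY yr, mon
ORDER BY yr, mon
"""

# One row per month: (year, month, best, second_best).
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With row_number(), the tied January values fill both columns, and March, which has only one row, gets NULL (None in Python) as its second best.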

RANK() is maybe the thing you are looking for...
http://msdn.microsoft.com/en-us/library/ms176102.aspx

To do this without joins (I'll show the Oracle version; in SQL Server you'd use CASE instead of DECODE):
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
SELECT [year], [month], Max([best]), Max([second best])
FROM
( select
t1.year,
t1.month,
Decode([rank],1,t1.views,0) as [best],
Decode([rank],2,t1.views,0) as [second best]
from ranks t1
where t1.rank <= 2 ) x
GROUP BY [year], [month]

This is a bit old-school but TOP and a subquery will work if you use ORDER BY. Try this:
SELECT TOP 2
DATEPART(year,some_date),
DATEPART(month,some_date),
(SELECT MAX(st1.some_value) FROM some_table AS st1
WHERE DATEPART(month,some_date) = DATEPART(month,st1.some_date)) AS max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
ORDER BY DATEPART(month,some_date) DESC
That will give you the two rows with the "highest" month values and the added subselect should give you the max from each grouping.

You can use a CTE with the ranking functions in SQL Server 2005 and up:
;WITH TopValues AS
(
SELECT
YEAR(some_date) AS 'Year',
MONTH(some_date) AS 'Month',
Some_Value,
ROW_NUMBER() OVER(PARTITION BY YEAR(some_date),MONTH(some_date)
ORDER BY Some_Value DESC) AS 'RowNumber'
FROM
dbo.some_table
)
SELECT
Year, Month, Some_Value
FROM
TopValues
WHERE
RowNumber <= 2
This will "partition" (i.e. group) your data by month/year, order inside each group by Some_Value descending (largest first), and then you can select the first two of each group from that CTE.
RANK() works as well (I most often use ROW_NUMBER), but it produces slightly different results: tied values share the same rank, so a group with ties can return more than two rows. Which one to use really depends on what your needs are.
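The difference between the ranking functions on tied values can be seen in a small sketch. It uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later), a made-up four-row some_table, and a single partition instead of the month/year partition, purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_value INTEGER)")
# Hypothetical values; 40 appears twice to create a tie for second place.
conn.executemany("INSERT INTO some_table VALUES (?)", [(50,), (40,), (40,), (30,)])

rows = conn.execute("""
SELECT some_value,
       ROW_NUMBER() OVER (ORDER BY some_value DESC) AS row_num,
       RANK()       OVER (ORDER BY some_value DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY some_value DESC) AS dense_rnk
FROM some_table
ORDER BY some_value DESC, row_num
""").fetchall()

# How many rows a "<= 2" filter would keep for each function.
kept = {
    name: sum(1 for r in rows if r[i] <= 2)
    for i, name in enumerate(["row_number", "rank", "dense_rank"], start=1)
}
print(rows)
print(kept)
```

ROW_NUMBER always keeps exactly two rows here, while RANK and DENSE_RANK keep three because both rows with the value 40 share rank 2.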

Hmmm it's kind of a rig, but you can do this with subqueries... instead of using that max I'd select the some_values which have the matching year & month, row_number()=1 / row_number() = 2 respectively and order by some_value DESC.
The inability to use OFFSET / LIMIT like you can in SQLite is one of my dislikes about SQL Server.

Related

Second minimum value for every customer

I am using MySQL database. So, there are two columns I am working on, CustomerId, and OrderDate. I want to find a second-order date (2nd minimum order date) for each customer.
If you are using MySQL 8+, then ROW_NUMBER can be used here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) rn
FROM yourTable
)
SELECT CustomerId, OrderDate
FROM cte
WHERE rn = 2;
I would recommend using DENSE_RANK, as it gives the correct result even if there are duplicate order dates:
SELECT * FROM
(SELECT t.*, DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY OrderDate) dr
FROM yourTable t
) t where dr = 2;
If your MySQL version does not support analytic functions, you can use a correlated subquery instead:
SELECT T.*
FROM yourTable T
WHERE 1 = (
SELECT COUNT(DISTINCT TT.OrderDate)
FROM yourTable TT
WHERE TT.CustomerId = T.CustomerId
AND TT.OrderDate < T.OrderDate
)
I would use a subquery like this:
select o.*
from orders o
where o.order_date = (select o2.order_date
from orders o2
where o2.customer_id = o.customer_id
order by o2.order_date
limit 1 offset 1
);
The subquery is a correlated subquery that returns the second date. If you want the second date with other columns, it can be moved to the select.
With an index on (customer_id, order_date), this is likely to be the fastest solution.
This assumes that there is one row per date (or that if there are multiple rows, "second" can be the earliest date). If you want the second distinct date, use select distinct in the subquery; however, select distinct and group by would incur additional overhead.
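The "second row" versus "second distinct date" distinction can be checked with a short sketch. It runs through Python's built-in sqlite3 module rather than MySQL (assuming SQLite 3.25 or later), and the rows in yourTable are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (CustomerId INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [
        (1, "2021-01-01"),
        (1, "2021-01-01"),  # duplicate of the earliest date
        (1, "2021-02-01"),
        (2, "2021-03-01"),
        (2, "2021-04-01"),
        (3, "2021-05-01"),  # only one order, so no second date
    ],
)

# ROW_NUMBER: the second row per customer, duplicates counted separately.
second_row = conn.execute("""
WITH cte AS (
    SELECT CustomerId, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS rn
    FROM yourTable
)
SELECT CustomerId, OrderDate FROM cte WHERE rn = 2 ORDER BY CustomerId
""").fetchall()

# DENSE_RANK: the second distinct date per customer.
second_distinct = conn.execute("""
SELECT DISTINCT CustomerId, OrderDate
FROM (
    SELECT CustomerId, OrderDate,
           DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS dr
    FROM yourTable
) t
WHERE dr = 2
ORDER BY CustomerId
""").fetchall()

print(second_row)
print(second_distinct)
```

For customer 1 the two approaches disagree: ROW_NUMBER returns the duplicated first date, DENSE_RANK the next distinct one. Customer 3 is absent from both results.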

How to choose max of one column per other column

I am using SQL Server and I have a table "a"
month segment_id price
-----------------------------
1 1 100
1 2 200
2 3 50
2 4 80
3 5 10
I want to make a query which presents the original columns where the price will be the max per month
The result should be:
month segment_id price
----------------------------
1 2 200
2 4 80
3 5 10
I tried to write SQL code:
Select
month, segment_id, max(price) as MaxPrice
from
a
but I got an error:
Column segment_id is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I tried to fix it in many ways but didn't find how to fix it
Because you need a group by clause without segment_id
Select month, max(price) as MaxPrice
from a
Group By month
as you want results per each month, and segment_id is non-aggregated in your original select statement.
If you want to have segment_id with maximum price repeating per each month for each row, you need to use max() function as window analytic function without Group by clause
Select month, segment_id,
max(price) over ( partition by month ) as MaxPrice
from a
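A point worth checking with aggregate window functions: whether the OVER clause contains an ORDER BY changes the result. Without it, the aggregate covers the whole partition; with it, the default frame ends at the current row, so MAX becomes a running maximum. The sketch below shows both side by side on the question's sample rows, using Python's built-in sqlite3 module (assuming SQLite 3.25 or later) in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 200), (2, 3, 50), (2, 4, 80), (3, 5, 10)],
)

rows = conn.execute("""
SELECT month, segment_id, price,
       -- whole-partition maximum
       MAX(price) OVER (PARTITION BY month) AS max_per_month,
       -- running maximum up to the current row
       MAX(price) OVER (PARTITION BY month ORDER BY segment_id) AS running_max
FROM a
ORDER BY month, segment_id
""").fetchall()

for row in rows:
    print(row)
```

The first row of month 1 shows the difference: max_per_month is 200, while running_max is still 100.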
Edit (following your latest edit to the desired results): you need one more window analytic function, row_number(), as @Gordon already mentioned:
Select month, segment_id, price From
(
Select a.*,
row_number() over ( partition by month order by price desc ) as Rn
from a
) q
Where rn = 1
I would recommend a correlated subquery:
select t.*
from t
where t.price = (select max(t2.price) from t t2 where t2.month = t.month);
The "canonical" solution is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by month order by price desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery often performs better.
Only because it was not mentioned.
Yet another option is the WITH TIES clause.
To be clear, the approach by Gordon and Barbaros would be a nudge more performant, but this technique does not require or generate an extra column.
Select Top 1 with ties *
From YourTable
Order By row_number() over (partition by month order by price desc)
With not exists:
select t.*
from tablename t
where not exists (
select 1 from tablename
where month = t.month and price > t.price
)
or:
select t.*
from tablename t inner join (
select month, max(price) as price
from tablename
group By month
) g on g.month = t.month and g.price = t.price
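These approaches differ when two rows in the same month share the maximum price: row_number() keeps exactly one row per month, while the max-based filters (correlated subquery, not exists, join on the grouped maximum) keep every tied row. A minimal sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later), with invented rows and an extra segment_id tiebreaker that is not in the answers above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (month INTEGER, segment_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, 1, 200), (1, 2, 200), (2, 3, 50), (2, 4, 80)],  # month 1 has a tie
)

# Exactly one row per month; segment_id breaks the tie deterministically.
one_per_month = conn.execute("""
SELECT month, segment_id, price
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY month
                              ORDER BY price DESC, segment_id) AS rn
    FROM a
) q
WHERE rn = 1
ORDER BY month
""").fetchall()

# Every row whose price is not exceeded within its month (ties included).
all_max_rows = conn.execute("""
SELECT t.month, t.segment_id, t.price
FROM a AS t
WHERE NOT EXISTS (
    SELECT 1 FROM a AS t2
    WHERE t2.month = t.month AND t2.price > t.price
)
ORDER BY t.month, t.segment_id
""").fetchall()

print(one_per_month)
print(all_max_rows)
```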

MS SQL add max()-1 to query

How can I add the max(o.Acct)-1 rows to the query? I need to visualize the last two o.Acct rows. My query is currently showing only the max(o.Acct):
SELECT Max(o.Acct) AS [MaxAcct],o.ObjectID,o.Opertype
FROM Operations o
GROUP By o.ObjectID,o.Opertype
If you want to see the last two rows (per group), you're better off using ROW_NUMBER() rather than GROUP BY.
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ObjectID,
Opertype
ORDER BY Acct DESC
)
AS sequence_id
FROM
Operations
)
sortedOperations
WHERE
sequence_id <= 2
ORDER BY
ObjectID,
Opertype,
Acct
If you want the last two of something, I'm thinking order by and top. Something like this:
select top (2) o.*
from Operations o
order by o.acct desc;
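The two answers return different things: the ROW_NUMBER() query keeps the last two rows within each (ObjectID, Opertype) group, whereas the TOP query keeps the last two rows of the whole table. A small sketch with invented data makes this visible. It uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later), where LIMIT 2 stands in for SQL Server's TOP (2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Operations (Acct INTEGER, ObjectID TEXT, Opertype TEXT)")
conn.executemany(
    "INSERT INTO Operations VALUES (?, ?, ?)",
    [(1, "A", "x"), (2, "A", "x"), (3, "A", "x"), (4, "B", "y"), (5, "B", "y")],
)

# Last two Acct values inside each (ObjectID, Opertype) group.
per_group = conn.execute("""
SELECT ObjectID, Opertype, Acct
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ObjectID, Opertype
                              ORDER BY Acct DESC) AS sequence_id
    FROM Operations
) sortedOperations
WHERE sequence_id <= 2
ORDER BY ObjectID, Opertype, Acct
""").fetchall()

# Last two Acct values of the whole table (LIMIT replaces TOP here).
overall = conn.execute("""
SELECT ObjectID, Opertype, Acct
FROM Operations
ORDER BY Acct DESC
LIMIT 2
""").fetchall()

print(per_group)
print(overall)
```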

Need help to find the middle row using Row_Number

SELECT median.spaid
,median.total
,ROW_NUMBER() OVER (
ORDER BY median.total
) AS row
FROM (
SELECT SpaID
,COUNT(1) AS Total
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID
) AS median
ORDER BY median.total
My issue here is that I need to find the middle row for column "Total" using Row_number. I need to find which "SpaID" is linked to the middle row of the "Total" column.
This is a shot in the dark based on very sparse details but I think you are looking for something like this.
with numberedResults as
(
select spaid
, ROW_NUMBER() over(order by count(*)) as RowNum
from [order]
where DateCreated between '20140401' AND '20140430'
group by SpaID
)
, Medians as
(
select MAX(RowNum) / 2 as Median
, MAX(RowNum) as TotalCount
from numberedResults
)
select *
from numberedResults r
join Medians m on m.Median = r.RowNum
I would suggest not relying on ROW_NUMBER in your query, as its results can be unpredictable when several rows share the same Total (the order among ties is arbitrary). I understand this seems bulky; the challenge is that the "median" is the middle of the grouped rows. Here's the query I believe should work for you:
SELECT SpaID, d FROM
(SELECT SpaID,
DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) ranked
WHERE d =
(SELECT ROUND(MAX(d)/2,0)
FROM (SELECT DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) ranked2)
Here is one method of finding the median:
SELECT o.*
FROM (SELECT SpaID, COUNT(*) AS Total,
ROW_NUMBER() OVER (ORDER BY COUNT(*)) as seqnum,
COUNT(*) OVER () as cnt
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
GROUP BY SpaID
) o
WHERE 2*o.seqnum IN (cnt, cnt + 1);
This is approximate when you have an even number of rows. You are looking for the exact row id, so you have to choose either the one before or after the median (which is between two rows).
Note: You should express date constants using the ISO standard formats, either YYYYMMDD or YYYY-MM-DD. The first is the safest way in SQL Server (although I personally prefer the hyphens for readability).
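As a rough check of the middle-row idea, the sketch below numbers the per-SpaID totals and keeps the row whose position is (cnt + 1) / 2 using integer division, which is the exact middle for an odd count and the lower of the two middle rows for an even count. It runs through Python's built-in sqlite3 module (assuming SQLite 3.25 or later); the orders table, its rows, the two-step CTE layout, and the SpaID tiebreaker are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (SpaID INTEGER, DateCreated TEXT)")

# Hypothetical data: SpaID -> number of orders placed in April 2014.
april_counts = {10: 1, 20: 4, 30: 2, 40: 7, 50: 3}
data = [(spa_id, "2014-04-15") for spa_id, n in april_counts.items() for _ in range(n)]
data.append((60, "2014-05-02"))  # outside the date range, must be ignored
conn.executemany("INSERT INTO orders VALUES (?, ?)", data)

middle = conn.execute("""
WITH totals AS (
    SELECT SpaID, COUNT(*) AS Total
    FROM orders
    WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
    GROUP BY SpaID
),
numbered AS (
    SELECT SpaID, Total,
           ROW_NUMBER() OVER (ORDER BY Total, SpaID) AS seqnum,
           COUNT(*) OVER () AS cnt
    FROM totals
)
SELECT SpaID, Total
FROM numbered
WHERE seqnum = (cnt + 1) / 2
""").fetchall()

print(middle)
```

With five SpaIDs whose totals are 1, 2, 3, 4 and 7, the third row is the middle one, which here belongs to SpaID 50.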

How to write a derived query in Netezza SQL?

I need to query the data by inviteid. For each inviteid I need the top 5 IDs and ID descriptions.
The query I wrote takes forever to return. I didn't notice an error or anything wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A
I'm not sure I understood your requirement completely, but as far as I can tell the group by in the derived table is not necessary (nor is the order by, as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;
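If the goal really is the first few IDs within each inviteid (an assumption, since the requirement is not fully clear), the window would need a PARTITION BY inviteid so that numbering restarts for every invite. The sketch below illustrates that with Python's built-in sqlite3 module (assuming SQLite 3.25 or later) rather than Netezza. The rows are invented, ordering by ID is a guess at what "top" means, and it keeps two rows per invite instead of five only to keep the sample data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact_s (ID INTEGER, IDdesc TEXT, inviteid INTEGER)")
conn.executemany(
    "INSERT INTO Fact_s VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 100), (3, "c", 100), (4, "d", 200), (5, "e", 200)],
)

TOP_N = 2  # the question asks for 5; 2 keeps the example small

top_per_invite = conn.execute(
    """
    SELECT inviteid, ID, IDdesc
    FROM (
        SELECT ID, IDdesc, inviteid,
               ROW_NUMBER() OVER (PARTITION BY inviteid ORDER BY ID) AS rn
        FROM Fact_s
    ) B
    WHERE rn <= ?
    ORDER BY inviteid, ID
    """,
    (TOP_N,),
).fetchall()

print(top_per_invite)
```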