UPD: Thanks to everyone, topic closed. After sleeping on it I understand everything =)
I have a problem understanding the OVER clause and the ROW_NUMBER function. Simple table: name and mark. I want to calculate the average mark for each name.
SELECT top 1 with ties name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM table
ORDER BY AVG(mark) OVER(PARTITION BY name)
It will display something like this, and I understand why: that is what ROW_NUMBER() does.
name|number
Pete 1
Pete 2
But if I write
SELECT top 1 with ties name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM table
ORDER BY AVG(mark) OVER(PARTITION BY name), number
it will display
name|number
Pete 1
And this time I don't understand how ORDER BY works with ROW_NUMBER() function. Can somebody explain it to me?
You can certainly order by the ROW_NUMBER column, because the SELECT clause is evaluated before the ORDER BY clause. You can ORDER BY any column or column alias, which is why no error was thrown: the query is valid.
SELECT name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM #table
ORDER BY number
Evaluates to
name number
---------- --------------------
John 1
pete 1
pete 2
John 2
pete 3
OP's second example of row_number is not correct.
SELECT AVG(mark) OVER(PARTITION BY name), name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM #table
ORDER BY AVG(mark) OVER(PARTITION BY name), number
Returns as expected because AVG is the first sort column followed by number.
            name       number
----------- ---------- --------------------
11          pete       1
11          pete       2
11          pete       3
17          John       1
17          John       2
Change the sort to number DESC and pete is still first; however, the row numbers are now in descending order.
            name       number
----------- ---------- --------------------
11          pete       3
11          pete       2
11          pete       1
17          John       2
17          John       1
SQL Order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
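The evaluation order listed above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption: the thread is about SQL Server, and SQLite 3.25 or later is needed for window functions); the marks table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [("pete", 10), ("pete", 11), ("pete", 12), ("John", 16), ("John", 18)],
)

# ORDER BY is evaluated after SELECT, so it can refer to the alias "number".
ordered = conn.execute(
    """
    SELECT name, ROW_NUMBER() OVER (PARTITION BY name ORDER BY mark) AS number
    FROM marks
    ORDER BY number, name
    """
).fetchall()

# WHERE is evaluated before SELECT, so a window function is rejected there.
try:
    conn.execute(
        "SELECT name FROM marks "
        "WHERE ROW_NUMBER() OVER (PARTITION BY name ORDER BY mark) = 1"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(ordered)
print("WHERE failed with:", where_error)
```

Filtering on a row number therefore needs a derived table or CTE, while sorting on it does not.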
You can't ORDER BY the ROW_NUMBER directly: I don't know why you didn't get an error in this case, but normally you would. Hence the use of derived tables or CTEs.
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) as number,
AVG(mark) OVER (PARTITION BY name) AS nameavg
FROM table
) foo
ORDER BY
nameavg, number
However, PARTITION BY name ORDER BY name is meaningless: every row in a partition has the same name, so the sort key is the partition key and the order within each partition is arbitrary.
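The derived-table query above can be tried end to end with a minimal sketch. It assumes SQLite (through Python's sqlite3 module) in place of SQL Server, a hypothetical marks table, and invented marks chosen so that the averages are 11 for pete and 17 for John. The window is ordered by mark rather than name so that the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [("pete", 10), ("pete", 11), ("pete", 12), ("John", 16), ("John", 18)],
)

# Compute both window values in the derived table, then sort on their aliases.
rows = conn.execute(
    """
    SELECT name, number, nameavg
    FROM (
        SELECT name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY mark) AS number,
               AVG(mark) OVER (PARTITION BY name) AS nameavg
        FROM marks
    ) AS foo
    ORDER BY nameavg, number
    """
).fetchall()

for row in rows:
    print(row)
```

All of pete's rows come first because the lower average is the leading sort key; number only breaks ties inside each name.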
I suspect you want something like this where ROW_NUMBER is based on AVG
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY nameavg) AS number
FROM
(
SELECT
name,
AVG(mark) OVER (PARTITION BY name) AS nameavg
FROM table
) foo
) bar
ORDER BY
number
Or, more traditionally, with GROUP BY (but each name is collapsed to one row for the average)
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY nameavg) AS number
FROM
(
SELECT
name,
AVG(mark) AS nameavg
FROM
table
GROUP BY
name
) foo
) bar
ORDER BY
number
You may be able to collapse the derived tables foo and bar into one with
ROW_NUMBER() OVER (PARTITION BY name ORDER BY AVG(mark))
But none of this makes much sense: I understand that your question is an abstract one about how this works, but it is unclear. It would make more sense if you described what you want in plain English, with sample input and output.
Related
So I have a table employees as shown below
ID | name | department
---|------|-----------
1 | john | home
2 | alex | home
3 | ryan | tech
I'm trying to group these by department and display the count, but I want to select the second most common department, which in this case should return (tech 1). Any help on how to approach this is appreciated.
Edit:
Ideally using only MINUS; I keep finding LIMIT when searching online, but I'm not familiar with it yet.
We can use COUNT along with DENSE_RANK:
WITH cte AS (
SELECT department, COUNT(*) AS cnt,
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM yourTable
GROUP BY department
)
SELECT department, cnt
FROM cte
WHERE rnk = 2;
As of Oracle 12c, you might find the following limit query satisfactory:
SELECT department, COUNT(*) AS cnt
FROM yourTable
GROUP BY department
ORDER BY COUNT(*) DESC
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY;
But this limit approach does not handle ties well, e.g. when two or more departments are tied for first place. DENSE_RANK does a better job of handling such edge cases.
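The difference between the two approaches shows up as soon as first place is tied. The sketch below assumes SQLite via Python's sqlite3 module rather than Oracle, so LIMIT 1 OFFSET 1 stands in for OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY; the employees rows are hypothetical and deliberately contain a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "john", "home"),
        (2, "alex", "home"),
        (3, "ryan", "tech"),
        (4, "mia", "tech"),  # hypothetical row: tech now ties with home
        (5, "zoe", "ops"),   # hypothetical row: the real second place
    ],
)

# DENSE_RANK gives tied departments the same rank, so rank 2 is the next count.
dense = conn.execute(
    """
    WITH counts AS (
        SELECT department, COUNT(*) AS cnt
        FROM employees
        GROUP BY department
    ),
    ranked AS (
        SELECT department, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
        FROM counts
    )
    SELECT department, cnt FROM ranked WHERE rnk = 2
    """
).fetchall()

# Skipping one row lands on the other department that is tied for first.
offset = conn.execute(
    """
    SELECT department, COUNT(*) AS cnt
    FROM employees
    GROUP BY department
    ORDER BY cnt DESC
    LIMIT 1 OFFSET 1
    """
).fetchall()

print("DENSE_RANK:", dense)
print("OFFSET:", offset)
```

With this data the DENSE_RANK query returns ops with a count of 1, while the offset query returns a department with a count of 2.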
I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria.
To illustrate:
Let's say I have a table with 1 million records, containing full names and phone numbers.
There is no limit on how many phone numbers can be assigned to a person, but I only want to select up to 10 numbers per person, even if the person has more than 10.
Note that I don't want to select just 10 records. I want the query to return every name in the table and only ignore the extra phone numbers when a person already has 10 of them.
Can this be done?
You can use window functions to solve this greatest-n-per-group problem:
select t.*
from (
select
t.*,
row_number() over(partition by name order by phone_number) rn
from mytable t
) t
where rn <= 10
Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number, but you can change that to whatever suits your use case best.
Better yet: as commented by waldente, Snowflake has the QUALIFY clause, which removes the need for a subquery:
select t.*
from mytable t
qualify row_number() over(partition by name order by phone_number) <= 10
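To see the per-person cap in action, here is a small sketch of the subquery form. It assumes SQLite via Python's sqlite3 module (which has no QUALIFY), a hypothetical mytable with invented rows, and a cap of 2 instead of 10 to keep the data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, phone_number TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("alice", "555-0003"),
        ("alice", "555-0001"),
        ("alice", "555-0002"),
        ("bob", "555-0100"),
    ],
)

MAX_PER_PERSON = 2  # the question uses 10

# Number each person's phone numbers, then keep only the first few per person.
rows = conn.execute(
    """
    SELECT name, phone_number
    FROM (
        SELECT name, phone_number,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY phone_number) AS rn
        FROM mytable
    ) AS t
    WHERE rn <= ?
    ORDER BY name, phone_number
    """,
    (MAX_PER_PERSON,),
).fetchall()

print(rows)
```

Every name still appears in the result; only alice's third number is dropped.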
This query should meet your requirement. It partitions by full_name so that the numbering restarts for each person:
select
full_name,
phonenumber
from
(select
full_name,
phonenumber,
row_number() over (partition by full_name order by phonenumber) as rn from sample_tab) a
where
a.rn <= 10
order by
full_name asc,
phonenumber desc;
Using Snowflake's QUALIFY clause:
select
full_name,
phonenumber
from
sample_tab qualify row_number() over (partition by full_name order by phonenumber) <= 10
order by
full_name asc ,
phonenumber desc;
Consider a table with two columns: mark and name
I need to get the second highest value, and the name of the second highest value.
You can use ROW_NUMBER(), RANK(), DENSE_RANK() functions in SQL.
In this case you should use DENSE_RANK(), because two or more students may share the maximum mark; with such ties, ROW_NUMBER() or RANK() would not reliably return the second highest value.
SELECT * FROM (
SELECT name, mark, DENSE_RANK() over (order by mark desc) RankNo
FROM tablename
) AS Result
WHERE Result.RankNo = 2
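A quick way to see why the choice of ranking function matters is to run all three against data where the top mark is shared. This is a sketch under assumptions: SQLite via Python's sqlite3 module, a hypothetical students table, and invented marks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("ann", 90), ("ben", 90), ("cat", 85), ("dan", 70)],
)


def second_place(func):
    # func comes from the fixed list below, never from user input.
    sql = f"""
        SELECT name, mark
        FROM (
            SELECT name, mark, {func}() OVER (ORDER BY mark DESC) AS rank_no
            FROM students
        ) AS result
        WHERE rank_no = 2
        ORDER BY name
    """
    return conn.execute(sql).fetchall()


by_row_number = second_place("ROW_NUMBER")  # one of the tied top students
by_rank = second_place("RANK")              # ranks are 1, 1, 3, 4: no rank 2
by_dense_rank = second_place("DENSE_RANK")  # ranks are 1, 1, 2, 3

print(by_row_number, by_rank, by_dense_rank)
```

Only DENSE_RANK returns cat with 85, the actual second highest mark.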
SELECT *
FROM (SELECT name,
mark,
ROW_NUMBER() OVER (ORDER BY mark DESC) AS rownums
FROM employees) ranked
WHERE rownums = 2;
SELECT name,mark FROM table ORDER BY mark desc limit 1,1
This code sorts all records by mark in descending order. In limit 1,1 the first 1 is the offset (skip one row, the record with the highest mark) and the second 1 is the row count (return one row, the second record).
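The offset and count roles can be confirmed with a short sketch. It assumes SQLite via Python's sqlite3 module, which accepts the same LIMIT offset, count shorthand as MySQL; the students table and its rows are hypothetical, with distinct marks so that ties do not interfere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("dan", 70), ("ann", 90), ("cat", 85)],
)

# Shorthand form: LIMIT <offset>, <count>
shorthand = conn.execute(
    "SELECT name, mark FROM students ORDER BY mark DESC LIMIT 1, 1"
).fetchall()

# Explicit form: LIMIT <count> OFFSET <offset>
explicit = conn.execute(
    "SELECT name, mark FROM students ORDER BY mark DESC LIMIT 1 OFFSET 1"
).fetchall()

print(shorthand, explicit)
```

Both forms skip ann and return cat. If two students shared the top mark, this approach would return the second of them rather than the second highest value.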
I have the following problem:
My table looks like this.
ID Name
1 Company LTD.
1 Company Limited
1 Company ltd
2 Example Corp.
2 Example Corporation
...
Since these are "different" names for the same company, I decided to keep the longest name as the company name.
So my question is: how do I find the longest name and, at the same time, keep just one entry per ID? E.g.
ID Name
1 Company Limited
2 Example Corporation
The table should look like this afterwards.
You can do this with a ROW_NUMBER() with a PARTITION on the ID and ordering by the LEN() desc:
;With Cte As
(
Select *,
Row_Number() Over (Partition By Id Order By Len(Name) Desc) As RN
From YourTable
)
Delete Cte
Where RN <> 1
Note: This will physically remove the records from your table that are not the longest entry. If you do not wish to physically remove them, and only SELECT the longest entries, use the following instead:
;With Cte As
(
Select *,
Row_Number() Over (Partition By Id Order By Len(Name) Desc) As RN
From YourTable
)
Select Id, Name
From Cte
Where RN = 1
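The SELECT variant can be tried on the sample data from the question. This sketch assumes SQLite via Python's sqlite3 module, so LENGTH() replaces SQL Server's LEN(); the table name companies is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [
        (1, "Company LTD."),
        (1, "Company Limited"),
        (1, "Company ltd"),
        (2, "Example Corp."),
        (2, "Example Corporation"),
    ],
)

# Rank the names inside each id by length, longest first, and keep rank 1.
longest = conn.execute(
    """
    WITH cte AS (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY LENGTH(name) DESC) AS rn
        FROM companies
    )
    SELECT id, name FROM cte WHERE rn = 1 ORDER BY id
    """
).fetchall()

print(longest)
```

If two names for the same ID had equal length, ROW_NUMBER would pick one of them arbitrarily; adding a second sort key such as name makes the choice repeatable.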
One more option, using TOP 1 WITH TIES. Note the DESC, so that the longest name gets row number 1:
select
top 1 with ties
id,name
from
table
order by
row_number() over (partition by id order by len(name) desc)
Let's say I have the following table
Sku | Number | Name
11 1 hat
12 1 hat
13 1 hats
22 2 car
33 3 truck
44 4 boat
45 4 boat
Is there an easy way to find the rows that differ within each Number? For example, with the table above, I would want the query to output:
13 | 1 | hats
The reason for this is because our program processes the rows as long as the number matches the name. If there is an instance where the name doesn't match but the rest of the names do, it will fail.
You can find the most common value (the "mode") using window functions and aggregation:
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum = 1;
You could then find everything that is not the mode using a join. The easier way is just to change the where condition:
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum > 1;
Note: If there are ties in frequency for the most common value, then an arbitrary most common value is chosen.
EDIT:
Actually, if you want the original skus, you might as well do the join:
with modes as (
select t.*
from (select number, name, count(*) as cnt,
row_number() over (partition by number order by count(*) desc) as seqnum
from t
group by number, name
) t
where seqnum = 1
)
select t.*
from t join
modes
on t.number = modes.number and t.name <> modes.name;
This will ignore NULL values (but the logic can easily be fixed to accommodate them).
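Run against the sample rows from the question, the join approach can be sketched as follows. It assumes SQLite via Python's sqlite3 module and a hypothetical table name items; the counts are computed in their own CTE before ranking, which is a small restructuring of the query above rather than the answer's exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (sku INTEGER, number INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (11, 1, "hat"),
        (12, 1, "hat"),
        (13, 1, "hats"),
        (22, 2, "car"),
        (33, 3, "truck"),
        (44, 4, "boat"),
        (45, 4, "boat"),
    ],
)

# Find the most common name per number, then return rows that disagree with it.
outliers = conn.execute(
    """
    WITH counts AS (
        SELECT number, name, COUNT(*) AS cnt
        FROM items
        GROUP BY number, name
    ),
    modes AS (
        SELECT number, name
        FROM (
            SELECT number, name,
                   ROW_NUMBER() OVER (PARTITION BY number ORDER BY cnt DESC) AS seqnum
            FROM counts
        ) AS ranked
        WHERE seqnum = 1
    )
    SELECT i.sku, i.number, i.name
    FROM items AS i
    JOIN modes AS m
      ON i.number = m.number AND i.name <> m.name
    ORDER BY i.sku
    """
).fetchall()

print(outliers)  # [(13, 1, 'hats')]
```

As in the original query, rows with a NULL name never satisfy the <> comparison and are silently skipped.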