How to query data with sequence of duplicated rows - sql

I have the following table with data:
ID SAL
_______________
1 1500
1 1500
1 2000
1 1500
2 2500
2 2500
3 1500
I want to query the same data BUT with new column that gives me a sequence of each duplicated row (based on id and sal columns).
My desired query results should be like this:
ID SAL ROW_SEQ
_______________________
1 1500 1
1 1500 2
1 2000 1
1 1500 3
2 2500 1
2 2500 2
3 1500 1
Any one has any idea PLEASE!..

You want to use a ROW_NUMBER() function here.
SELECT ID, SAL, ROW_NUMBER() OVER (PARTITION BY ID, SAL, ORDER BY SAL) AS ROW_SEQ
FROM MyTable
ORDER BY ID, SAL
By partitioning over your two desired fields, it will give you a ranking within each group in a sequential manner.
The documentation on it can be found here:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions137.htm

Related

SQL: finding the maximum average grouped by an ID

Consider following dataset:
id
name
mgr_id
salary
bonus
1
Paul
1
68000
10000
2
Lucas
2
29000
null
3
Max
1
50000
20000
4
Zack
2
30000
null
I now want to find the manager who pays his subordinates the highest average salary plus bonus. A manager is someone who is present in of the mgr_id cells. So in this example Paul and Lucas are managers because their id is present in the mgr_id column of themselves and Max for Paul and Zack for Lucas.
Basically I want MAX(AVG(salary + bonus)) and then grouped by the mgr_id. How can I do that?
I am using SQLite.
My expected output in this example would be simply the employee name 'Paul' because he has 2 subordinates (himself and Max) and pays them more than the other manager Lucas.
SELECT
mrg_id
, pay_avg
FROM
(
SELECT mrg_id
, AVG(salary + COALESCE(bonus,0)) pay_avg
FROM <table>
GROUP
BY mrg_id
) q
ORDER
BY pay_avg
desc
LIMIT 1
select top 1 t1.mgr_id,AVG((t1.salary)+(t1.bonus)) as tot_sal
from #tbl_emps as t1
group by t1.mgr_id
order by AVG((t1.salary)+(t1.bonus)) desc

Sql Query for Total Salary Deptno wise where more than two employess exits from the below table (tried lot can any one help me with this)

Empid Deptno Sal
1 1 1000
1 2 2000
2 3 3000
3 4 4000
4 1 5000
2 2 6000
Try below query :
select Sum(Sal),DeptNo from Table1
where Deptno in (
select Deptno from Table1
group by Deptno
having Count(DeptNo)>1
) group by DeptNo

Query to find least fifth salaried employee [duplicate]

This question already has answers here:
Select second most minimum value in Oracle
(3 answers)
Closed 4 years ago.
EMPID NAME DEPTID SALARY
---------- ------------------------------------------ --
101 surendra 201 1000
102 narendra 202 2000
103 rajesh 203 3000
104 ramesh 203 2000
105 hanumanth 202 10000
a) Write a Query to find least 5th (Least salary in 5th position from
least salary in the order) salaried employee?
b) Query to find the highest earning employee in each department
a) Write a Query to find least 5th (Least salary in 5th position from least salary in the order) salaried employee?
SELECT EMPID,NAME,DEPTID,SALARY
FROM
(
SELECT EMPID,NAME,DEPTID,SALARY, DENSE_RANK() OVER (ORDER BY SALARY) AS RN
FROM Table1
)
WHERE RN=5
b) Query to find the highest earning employee in each department
SELECT EMPID,NAME,DEPTID,SALARY
FROM
(
SELECT EMPID,NAME,DEPTID,SALARY,
DENSE_RANK() OVER (PARTITION BY DEPTID ORDER BY SALARY DESC) AS RN
FROM Table1
)
WHERE RN=1
Demo
http://sqlfiddle.com/#!4/63ce0/12
Let's assume we have a table named EmployeeSalary which has 3 columns:
Id
Name
Salary
Salary might be same for multiple employee according to the following table data
Id Name Salary
----------
6 Belalo 74
1 Karim 100
5 dIPU 100
4 Satter 102
9 Kiron 120
10 Rash 120
11 Harun 130
13 Baki 130
12 Munshi 132
2 Rahim 500
7 Kaif 987
8 Sony 987
3 Belal 4000
Now the query will be
SELECT A.Salary
FROM
(
SELECT DISTINCT Salary,
DENSE_RANK() OVER (ORDER BY Salary) AS Ranks
FROM [dbo].[EmployeeSalary]
) A
WHERE A.Ranks=5

Why cant we use rank() analytic function to delete duplicates in a table?

I have created an emp table with the following records in it.
create table emp(
EMPNO integer,
EMPNAME varchar2(20),
SALARY number);
select * from emp;
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
12 mark 3000
12 mark 3000
12 philip 3000
12 john 3000
13 tom 4000
14 tom 4000
14 jerry 5000
14 matt 5000
15 susan 5000
To delete duplicates i have been using the rownum() function along with partition by and order by clause with the query as follows:
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
--6 rows deleted
The query deletes all the employee records with duplicate empno's and the result looks somethin like this:
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
13 tom 4000
14 tom 4000
15 susan 5000
When i use the inner query to fetch the rownumbers for all the results in the table it gives me the following result:
select rowid as rid,empno,empname,
row_number() over(partition by empno order by empno) rn
from emp;
rowid rownumber
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 2
AACDJUAAPAAGLlTAAC 12 mark 3
AACDJUAAPAAGLlTAAF 12 philip 4
AACDJUAAPAAGLlTAAG 12 john 5
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 2
AACDJUAAPAAGLlTAAK 14 matt 3
AACDJUAAPAAGLlTAAL 15 susan 1
But when i use rank() in place of the rownumber() function it gives me the following result:
select rowid as rid,empno,empname,
rank() over(partition by empno order by empno) rn
from emp;
rowid rank
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 1
AACDJUAAPAAGLlTAAC 12 mark 1
AACDJUAAPAAGLlTAAF 12 philip 1
AACDJUAAPAAGLlTAAG 12 john 1
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 1
AACDJUAAPAAGLlTAAK 14 matt 1
AACDJUAAPAAGLlTAAL 15 susan 1
So my question here is why does rank() give the same value to all the records in the table even though there are duplicate empid's?
That's the way RANK() works. It would be rather surprising to get different RANK values for equal-ranking rows within the partition. In fact, the ORDER BY clause is the significant driver for RANK within a partition, but since you're using the same columns for the partitions as for the ordering, it is clear that every row ranks first within their respective partition (as they're the only value in the partition)
See an explanation in this blog post, where this SQL (PostgreSQL syntax)
SELECT
v,
ROW_NUMBER() OVER (window) row_number,
RANK() OVER (window) rank,
DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v
... produces this output
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
There are three "ranking" analytic functions: row_number(), rank(), and dense_rank().
These all work very similarly. They assign numbers, in order, to rows within a group. The group is defined by the partition by clause. The ordering is defined by the order by clause. The difference between the three is how they handle duplicate values.
row_number() always returns sequential numbers within the group. When there are ties, then equal valued rows have sequential values, but they are different.
dense_rank() assigns sequential values with no gaps. However, equal valued rows are given the same values. The next value has a rank one more.
rank() assigns sequential values with gaps. Equal valued rows have the same value but subsequent rows have a gap.
Here is an example:
value row_number dense_rank rank
a 1 1 1
b 2 2 2
b 3 2 2
b 4 2 2
c 5 3 5
d 6 4 6
d 7 4 6

What's the difference between RANK() and DENSE_RANK() functions in oracle?

What's the difference between RANK() and DENSE_RANK() functions? How to find out nth salary in the following emptbl table?
DEPTNO EMPNAME SAL
------------------------------
10 rrr 10000.00
11 nnn 20000.00
11 mmm 5000.00
12 kkk 30000.00
10 fff 40000.00
10 ddd 40000.00
10 bbb 50000.00
10 ccc 50000.00
If in the table data having nulls, what will happen if I want to find out nth salary?
RANK() gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.
DENSE_RANK() again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.
As for nulls, it depends on the ORDER BY clause. Here is a simple test script you can play with to see what happens:
with q as (
select 10 deptno, 'rrr' empname, 10000.00 sal from dual union all
select 11, 'nnn', 20000.00 from dual union all
select 11, 'mmm', 5000.00 from dual union all
select 12, 'kkk', 30000 from dual union all
select 10, 'fff', 40000 from dual union all
select 10, 'ddd', 40000 from dual union all
select 10, 'bbb', 50000 from dual union all
select 10, 'xxx', null from dual union all
select 10, 'ccc', 50000 from dual)
select empname, deptno, sal
, rank() over (partition by deptno order by sal nulls first) r
, dense_rank() over (partition by deptno order by sal nulls first) dr1
, dense_rank() over (partition by deptno order by sal nulls last) dr2
from q;
EMP DEPTNO SAL R DR1 DR2
--- ---------- ---------- ---------- ---------- ----------
xxx 10 1 1 4
rrr 10 10000 2 2 1
fff 10 40000 3 3 2
ddd 10 40000 3 3 2
ccc 10 50000 5 4 3
bbb 10 50000 5 4 3
mmm 11 5000 1 1 1
nnn 11 20000 2 2 2
kkk 12 30000 1 1 1
9 rows selected.
Here's a link to a good explanation and some examples.
I have explained this more in detail in this article. Essentially, you can look at it as such:
CREATE TABLE t AS
SELECT 'a' v FROM dual UNION ALL
SELECT 'a' FROM dual UNION ALL
SELECT 'a' FROM dual UNION ALL
SELECT 'b' FROM dual UNION ALL
SELECT 'c' FROM dual UNION ALL
SELECT 'c' FROM dual UNION ALL
SELECT 'd' FROM dual UNION ALL
SELECT 'e' FROM dual;
SELECT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number,
RANK() OVER (ORDER BY v) rank,
DENSE_RANK() OVER (ORDER BY v) dense_rank
FROM t
ORDER BY v;
The above will yield:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
In words
ROW_NUMBER() attributes a unique value to each row
RANK() attributes the same row number to the same value, leaving "holes"
DENSE_RANK() attributes the same row number to the same value, leaving no "holes"
rank() : It is used to rank a record within a group of rows.
dense_rank() : The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks.
Query -
select
ENAME,SAL,RANK() over (order by SAL) RANK
from
EMP;
Output -
+--------+------+------+
| ENAME | SAL | RANK |
+--------+------+------+
| SMITH | 800 | 1 |
| JAMES | 950 | 2 |
| ADAMS | 1100 | 3 |
| MARTIN | 1250 | 4 |
| WARD | 1250 | 4 |
| TURNER | 1500 | 6 |
+--------+------+------+
Query -
select
ENAME,SAL,dense_rank() over (order by SAL) DEN_RANK
from
EMP;
Output -
+--------+------+-----------+
| ENAME | SAL | DEN_RANK |
+--------+------+-----------+
| SMITH | 800 | 1 |
| JAMES | 950 | 2 |
| ADAMS | 1100 | 3 |
| MARTIN | 1250 | 4 |
| WARD | 1250 | 4 |
| TURNER | 1500 | 5 |
+--------+------+-----------+
SELECT empno,
deptno,
sal,
RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM emp;
EMPNO DEPTNO SAL rank
---------- ---------- ---------- ----------
7934 10 1300 1
7782 10 2450 2
7839 10 5000 3
7369 20 800 1
7876 20 1100 2
7566 20 2975 3
7788 20 3000 4
7902 20 3000 4
7900 30 950 1
7654 30 1250 2
7521 30 1250 2
7844 30 1500 4
7499 30 1600 5
7698 30 2850 6
SELECT empno,
deptno,
sal,
DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM emp;
EMPNO DEPTNO SAL rank
---------- ---------- ---------- ----------
7934 10 1300 1
7782 10 2450 2
7839 10 5000 3
7369 20 800 1
7876 20 1100 2
7566 20 2975 3
7788 20 3000 4
7902 20 3000 4
7900 30 950 1
7654 30 1250 2
7521 30 1250 2
7844 30 1500 3
7499 30 1600 4
7698 30 2850 5
select empno
,salary
,row_number() over(order by salary desc) as Serial
,Rank() over(order by salary desc) as rank
,dense_rank() over(order by salary desc) as denseRank
from emp ;
Row_number() -> Used for generating serial number
Dense_rank() will give continuous rank but Rank() will skip rank in case of clash of rank.
The only difference between the RANK() and DENSE_RANK() functions is in cases where there is a “tie”; i.e., in cases where multiple values in a set have the same ranking. In such cases, RANK() will assign non-consecutive “ranks” to the values in the set (resulting in gaps between the integer ranking values when there is a tie), whereas DENSE_RANK() will assign consecutive ranks to the values in the set (so there will be no gaps between the integer ranking values in the case of a tie).
For example, consider the set {25, 25, 50, 75, 75, 100}. For such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4}.
Rank() SQL function generates rank of the data within ordered set of values but next rank after previous rank is row_number of that particular row. On the other hand, Dense_Rank() SQL function generates next number instead of generating row_number. Below is the SQL example which will clarify the concept:
Select ROW_NUMBER() over (order by Salary) as RowNum, Salary,
RANK() over (order by Salary) as Rnk,
DENSE_RANK() over (order by Salary) as DenseRnk from (
Select 1000 as Salary union all
Select 1000 as Salary union all
Select 1000 as Salary union all
Select 2000 as Salary union all
Select 3000 as Salary union all
Select 3000 as Salary union all
Select 8000 as Salary union all
Select 9000 as Salary) A
It will generate following output:
----------------------------
RowNum Salary Rnk DenseRnk
----------------------------
1 1000 1 1
2 1000 1 1
3 1000 1 1
4 2000 4 2
5 3000 5 3
6 3000 5 3
7 8000 7 4
8 9000 8 5
Rank(), Dense_rank(), row_number()
These all are window functions that means they act as window over some ordered input set at first. These windows have different functionality attached to it based on the requirement. Heres the above 3 :
row_number()
Starting by row_number() as this forms the basis of these related window functions. row_number() as the name suggests gives a unique number to the set of rows over which its been applied. Similar to giving a serial number to each row.
Rank()
A subversion of row_number() can be said as rank(). Rank() is used to give same serial number to those ordered set rows which are duplicates but it still keeps the count mantained as similar to a row_number() for all those after duplicates rank() meaning as from below eg. For data 2 row_number() =rank() meaning both just differs in the form of duplicates.
Data row_number() rank() dense_rank()
1 1 1 1
1 2 1 1
1 3 1 1
2 4 4 2
Finally,
Dense_rank() is an extended version of rank() as the name suggests its dense because as you can see from the above example rank() = dense_rank() for all data 1 but just that for data 2 it differs in the form that it mantains the order of rank() from previous rank() not the actual data
Rank and Dense rank gives the rank in the partitioned dataset.
Rank() : It doesn't give you consecutive integer numbers.
Dense_rank() : It gives you consecutive integer numbers.
In above picture , the rank of 10008 zip is 2 by dense_rank() function and 24 by rank() function as it considers the row_number.
The only difference between the RANK() and DENSE_RANK() functions is in cases where there is a “tie”; i.e., in cases where multiple values in a set have the same ranking. In such cases, RANK() will assign non-consecutive “ranks” to the values in the set (resulting in gaps between the integer ranking values when there is a tie), whereas DENSE_RANK() will assign consecutive ranks to the values in the set (so there will be no gaps between the integer ranking values in the case of a tie).
For example, consider the set {30, 30, 50, 75, 75, 100}. For such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4}.