RANK partition function used in conjunction with SUM OVER - sql

I have the following, which works, but I think I've done my usual trick of overcomplicating something that could be a lot simpler.
If you run the script you will see what I'm trying to achieve: simply a rank initially by Department score and then a rank by each name's score within each department.
How do I simplify the following?
IF OBJECT_ID('TEMPDB..#Table') IS NOT NULL BEGIN DROP TABLE #Table END;
CREATE TABLE #Table
(
Department VARCHAR(100),
Name VARCHAR(100),
Score INT
);
INSERT INTO #Table VALUES
('Sales','michaeljackson',7),
('Sales','jim',10),
('Sales','jill',66),
('Sales','j',1),
('DataAnalysis','jagoda',66),
('DataAnalysis','phil',5),
('DataAnalysis','jesus',6),
('DataAnalysis','sam',79),
('DataAnalysis','michaeljackson',9999);
WITH SumCte AS
(
SELECT Department,
sm = sum(Score)
FROM #Table
GROUP BY Department
)
, RnkDepCte AS
(
SELECT Department,
rk =RANK() OVER (ORDER BY sm DESC)
FROM SumCte
)
, RnkCte AS
(
SELECT Department,
Name,
Score,
rnk = RANK() OVER (PARTITION BY a.Department ORDER BY a.Score DESC)
FROM #Table a
)
SELECT a.Department,
a.Name,
a.Score,
FinalRank = RANK() OVER (ORDER BY ((10000/b.rk) + (100/a.rnk)) DESC)
FROM RnkCte a
INNER JOIN RnkDepCte b
ON a.Department = b.Department

There is a simpler way. Try this:
select t.*,
RANK() over (order by SumScore desc, Score desc) as FinalRank
from (select t.*,
SUM(score) over (partition by department) as SumScore
from #Table t
) t
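
To check the shorter query against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite standing in for SQL Server, the table name Scores, and a SQLite build of 3.25 or later (needed for window functions) are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Department TEXT, Name TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [
        ("Sales", "michaeljackson", 7),
        ("Sales", "jim", 10),
        ("Sales", "jill", 66),
        ("Sales", "j", 1),
        ("DataAnalysis", "jagoda", 66),
        ("DataAnalysis", "phil", 5),
        ("DataAnalysis", "jesus", 6),
        ("DataAnalysis", "sam", 79),
        ("DataAnalysis", "michaeljackson", 9999),
    ],
)

# The inner query attaches each department's total to every row;
# the outer query ranks by that total first, then by the individual score.
query = """
SELECT Department, Name, Score,
       RANK() OVER (ORDER BY SumScore DESC, Score DESC) AS FinalRank
FROM (SELECT s.*,
             SUM(Score) OVER (PARTITION BY Department) AS SumScore
      FROM Scores s) AS t
ORDER BY FinalRank
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

All rows of the department with the larger total come first, each ordered by its own score. One difference from the original script: if two departments had the same total, this version would interleave their rows by score instead of giving the departments a shared rank.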

Related

How to join to a statement with a row_number() function in SQL?

I have a SQL query with a row_number() function, and I would like to join on additional tables to get the fields below. How would I accomplish this?
Desired fields:
EMPLOYEE.EMPLID
EMPLOYEE.JOBTITLE
NAME.FIRST_NAME
NAME.LAST_NAME
LOCATION.ADDRESS
PROFESSIONAL_NAME.PROF_NAME
Beginning SQL:
SELECT COUNT(*)
FROM
(
SELECT EMPLOYEE.*, ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY
PRIM_ROLE_IND DESC, EMPL_RCD ASC) as RN
FROM EMPLOYEE
WHERE JOB_INDICATOR = 'P'
) dt
WHERE RN = 1
When I try to add a left join at the end, I get an error that says "EMPLOYEE"."EMPLID": invalid identifier.
What I'm trying:
SELECT
EMPLOYEE.EMPLID,
EMPLOYEE.JOBTITLE,
NAME.FIRST_NAME,
NAME.LAST_NAME,
LOCATION.ADDRESS,
PROFESSIONAL_NAME.PROF_NAME
FROM
(
SELECT EMPLOYEE.*, ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY
PRIM_ROLE_IND DESC, EMPL_RCD ASC) as RN
FROM EMPLOYEE
WHERE JOB_INDICATOR = 'P'
)
LEFT JOIN NAME ON EMPLOYEE.EMPLID = NAME.EMPLID
WHERE
RN = 1
AND
NAME.EFFDT = (
SELECT
MAX (NAME2.EFFDT)
FROM
NAME NAME2
WHERE
NAME2.EMPLID = NAME.EMPLID
AND NAME.NAME_TYPE = 'PRI'
)
AND EMPLOYEE.JOB_INDICATOR = 'P'
You just need to alias your table:
...
(
SELECT EMPLOYEE.*, ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY
PRIM_ROLE_IND DESC, EMPL_RCD ASC) as RN
FROM EMPLOYEE
WHERE JOB_INDICATOR = 'P'
) temp_employee --add this
LEFT JOIN NAME ON temp_employee.EMPLID = NAME.EMPLID
...
When you compute row_number() in an inner select, you essentially create a new (derived) table. You need to give this table an alias and then refer to that alias. In the query above, the FROM clause reads from the inner select, not from the EMPLOYEE table, so the name EMPLOYEE is no longer visible to the outer query. See below for a simplified example.
select newtable.field from (select field from mytable) newtable
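
As a runnable illustration of the aliasing fix, here is a small sketch using Python's sqlite3 module. The rows, the reduced column lists and the aliases e and n are made up for the example, and SQLite (3.25 or later) is only a stand-in for the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the EMPLOYEE and NAME tables.
conn.executescript("""
CREATE TABLE EMPLOYEE (EMPLID INTEGER, EMPL_RCD INTEGER, PRIM_ROLE_IND TEXT,
                       JOB_INDICATOR TEXT, JOBTITLE TEXT);
CREATE TABLE NAME (EMPLID INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT);
INSERT INTO EMPLOYEE VALUES
  (1, 0, 'Y', 'P', 'Analyst'),
  (1, 1, 'N', 'P', 'Tutor'),
  (2, 0, 'N', 'P', 'Clerk'),
  (3, 0, 'Y', 'S', 'Temp');
INSERT INTO NAME VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
""")

query = """
SELECT e.EMPLID, e.JOBTITLE, n.FIRST_NAME, n.LAST_NAME
FROM (SELECT EMPLOYEE.*,
             ROW_NUMBER() OVER (PARTITION BY EMPLID
                                ORDER BY PRIM_ROLE_IND DESC, EMPL_RCD ASC) AS RN
      FROM EMPLOYEE
      WHERE JOB_INDICATOR = 'P') AS e          -- alias for the derived table
LEFT JOIN NAME AS n ON e.EMPLID = n.EMPLID     -- join through the alias
WHERE e.RN = 1
ORDER BY e.EMPLID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every outer reference goes through the alias e, so any remaining EMPLOYEE.column reference in the outer WHERE clause would need the same change.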

How to select multiple max values from a sql table

I am trying to get the top performers from a table, grouped by the company but can't seem to get the grouping right.
I have tried to use subqueries but this goes beyond my knowledge
I am trying to make a query that selects the rows in green. In other words I want to include the name, the company, and what they paid but only the top performers of each company.
Here is the raw data
create table test (person varchar(50),company varchar(50),paid numeric);
insert into
test
values
('bob','a',200),
('jane','a',100),
('mark','a',350),
('susan','b',650),
('thabo','b',100),
('thembi','b',210),
('lucas','b',110),
('oscar','c',10),
('janet','c',20),
('nancy','c',30)
You can use MAX() in a subquery:
CREATE TABLE T(
Person VARCHAR(45),
Company CHAR(1),
Paid INT
);
INSERT INTO T
VALUES ('Person1', 'A', 10),
('Person2', 'A', 20),
('Person3', 'B', 10);
SELECT T.*
FROM T INNER JOIN
(
SELECT Company, MAX(Paid) Paid
FROM T
GROUP BY Company
) TT ON T.Company = TT.Company AND T.Paid = TT.Paid;
Or use a window function:
SELECT Person,
Company,
Paid
FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Company ORDER BY Paid DESC) RN
FROM T
) TT
WHERE RN = 1;
Here's your query.
select a.person, a.company, a.paid from tableA a
inner join
(select person, company, row_number() over (partition by company order by paid desc) as rn from tableA) as t1
on t1.person = a.person and t1.company = a.company
where t1.rn = 1
Maybe something like
WITH ranked AS (SELECT person, company, paid
, rank() OVER (PARTITION BY company ORDER BY paid DESC) AS rnk
FROM yourtable)
SELECT person, company, paid
FROM ranked
WHERE rnk = 1
ORDER BY company;
You can use the rank() function with a PARTITION BY clause and keep the rows ranked first in each company.
DENSE_RANK also gives the ranking within the ordered partition, but its ranks are consecutive: no ranks are skipped when several rows share a rank.
WITH cte AS (
SELECT person, company, paid,
rank() OVER (PARTITION BY company ORDER BY paid desc) rn
FROM yourtable
)
SELECT
*
FROM cte
WHERE rn = 1
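
The difference between the ranking functions only shows up when there are ties, so here is a small sketch that puts RANK, DENSE_RANK and ROW_NUMBER side by side. The table pay and its three rows are hypothetical, and it runs on Python's sqlite3 module (SQLite 3.25 or later assumed) purely for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay (person TEXT, company TEXT, paid INTEGER)")
# Hypothetical data with a tie for first place.
conn.executemany(
    "INSERT INTO pay VALUES (?, ?, ?)",
    [("bob", "a", 200), ("jane", "a", 200), ("mark", "a", 100)],
)

query = """
SELECT person, paid,
       RANK()       OVER (PARTITION BY company ORDER BY paid DESC)         AS rnk,
       DENSE_RANK() OVER (PARTITION BY company ORDER BY paid DESC)         AS dense_rnk,
       ROW_NUMBER() OVER (PARTITION BY company ORDER BY paid DESC, person) AS row_num
FROM pay
ORDER BY paid DESC, person
"""
comparison = conn.execute(query).fetchall()
for row in comparison:
    print(row)
# ('bob', 200, 1, 1, 1)
# ('jane', 200, 1, 1, 2)
# ('mark', 100, 3, 2, 3)
```

Filtering on rnk = 1 (or dense_rnk = 1) keeps both tied top performers, while row_num = 1 keeps exactly one row per company. The extra person column in the ROW_NUMBER ordering is only there to make the tie-break deterministic.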

SQL code looks too complicated

Test Table
create table Test (
Id integer,
Store_N varchar(25),
Department varchar(25)
);
INSERT INTO Test (Id, Store_N, Department )
Values (25,'1','A'), (67,'1','A'), (34,'1','A'), (97,'1','C'),
(21,'1','C'), (268,'1','B'), (456,'2','A'), (349,'2','A'),
(935,'2','B'), (36,'3','B'), (637,'3','B'), (388,'3','B'),
(891,'3','B'), (344,'4','A'), (763,'4','A'), (836,'4','A')
SELECT * , ROW_NUMBER() OVER( Partition BY Store_N ORDER BY Store_N ) AS AA
FROM Test;
Result is
I need to exclude all stores that have only one department and return only the DISTINCT departments for each remaining store. The result should look like this
And this is the code:
SELECT DISTINCT TB4.Department, TB4.Store_N
From
(
SELECT TB0.Store_N, TB0.Department FROM Test TB0
INNER JOIN
(
SELECT TB2.Store_N , Count(*) AS AA1
FROM
(
SELECT DISTINCT TB1.Department , TB1.Store_N
FROM
( SELECT * , ROW_NUMBER() OVER( Partition BY Store_N ORDER BY Store_N ) AA
FROM Test ) TB1
) TB2
group by TB2.Store_N
HAVING
COUNT(*) > 1 ) TB3
ON TB0.Store_N = TB3.Store_N
) TB4
Now the question: how do I simplify this code?
Thank you
You can basically do:
select store_n, department
from test
group by store_n, department;
But you want to exclude stores that have only one department, so let's do a count:
select store_n, department
from (select store_n, department, count(*) over (partition by store_n) as cnt
from test
group by store_n, department
) t
where cnt > 1;
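
For reference, here is a sketch that runs the query above against the question's test data using Python's sqlite3 module. SQLite (3.25 or later) is an assumption standing in for the asker's database; the table and column names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER, Store_N TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [
        (25, "1", "A"), (67, "1", "A"), (34, "1", "A"), (97, "1", "C"),
        (21, "1", "C"), (268, "1", "B"), (456, "2", "A"), (349, "2", "A"),
        (935, "2", "B"), (36, "3", "B"), (637, "3", "B"), (388, "3", "B"),
        (891, "3", "B"), (344, "4", "A"), (763, "4", "A"), (836, "4", "A"),
    ],
)

# GROUP BY collapses the table to one row per (store, department);
# the window COUNT then counts those rows, i.e. departments, per store.
query = """
SELECT Store_N, Department
FROM (SELECT Store_N, Department,
             COUNT(*) OVER (PARTITION BY Store_N) AS cnt
      FROM Test
      GROUP BY Store_N, Department) AS t
WHERE cnt > 1
ORDER BY Store_N, Department
"""
multi_dept = conn.execute(query).fetchall()
for row in multi_dept:
    print(row)
```

Stores 3 and 4 each have a single department and drop out; stores 1 and 2 keep one row per distinct department.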
You are going a long way round to get the functionality of the GROUP BY clause. Count the distinct departments per store, then join back to list them:
SELECT DISTINCT T.Store_N , T.Department
FROM Test T
INNER JOIN
(
SELECT Store_N
FROM Test
GROUP BY Store_N
HAVING COUNT(DISTINCT Department) > 1) as TB2
ON T.Store_N = TB2.Store_N

How to select both row_number and count over partition?

I need to find duplicate records (with the master record ID and the duplicate record IDs):
select ciid, name from (
select ciid, name, row_number() over (
partition by related_id, name order by updatedate desc) rn
) where rn = 1;
This gives me the master record IDs, but it also includes records without duplicates.
If I use
select ciid, name from (
select ciid, name, row_number() over (
partition by related_id, name order by updatedate desc) rn
) where rn > 1;
This gets me all the duplicate records, but not the master record.
I was hoping to do something like:
select ciid, name from (
select ciid, name, row_number() over (
partition by related_id, name order by updatedate desc
) rn, count(*) over (
partition by related_id, name order by updatedate desc
) cnt
) where rn = 1 and cnt > 1;
But I am worried about the performance, and I am not sure it actually does what I want.
How do I get the master record only for the ones with duplicates? Please note that name is not unique column. Only ciid is unique.
I ended up using a query similar to the one in my question:
select ciid, name from (
select ciid, name, row_number() over (
partition by related_id, name order by updatedate desc
) rn, count(*) over (
partition by related_id, name
) cnt
from tablename -- placeholder table name
) where rn = 1 and cnt > 1;
Works surprisingly well. The master record is where rn = 1 and duplicates are where rn > 1. Make sure the count(*) over (partition by ...) has no ORDER BY clause; with one, it becomes a running count instead of the size of the whole partition.
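
To see why the ORDER BY matters inside the COUNT window, here is a sketch that runs both variants on a tiny data set with Python's sqlite3 module. The table records and its rows are hypothetical, and SQLite (3.25 or later) is assumed only as a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (ciid INTEGER, related_id INTEGER, name TEXT, updatedate TEXT)"
)
# Hypothetical rows: three copies of (10, 'x') and a single (20, 'y').
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 10, "x", "2020-01-01"),
        (2, 10, "x", "2020-02-01"),
        (3, 10, "x", "2020-03-01"),
        (4, 20, "y", "2020-01-15"),
    ],
)

template = """
SELECT ciid, name
FROM (SELECT ciid, name,
             ROW_NUMBER() OVER (PARTITION BY related_id, name
                                ORDER BY updatedate DESC) AS rn,
             COUNT(*) OVER (PARTITION BY related_id, name {order_by}) AS cnt
      FROM records) AS t
WHERE rn = 1 AND cnt > 1
"""

# Without ORDER BY, cnt is the size of the whole partition.
masters = conn.execute(template.format(order_by="")).fetchall()

# With ORDER BY, cnt is a running count, so it is 1 on the newest row
# of each partition and the filter rejects every master record.
masters_running = conn.execute(
    template.format(order_by="ORDER BY updatedate DESC")
).fetchall()

print(masters)          # [(3, 'x')]
print(masters_running)  # []
```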
I haven't tested this (because I don't have real data and am too lazy to create some), but it seems something along these lines might work:
with has_duplicates as (
select related_id, name
from yourtable
group by related_id, name
having count (*) > 1
),
with_dupes as (
select
y.ciid, y.name,
row_number() over (partition by y.related_id, y.name order by y.updatedate desc) rn
from
yourtable y,
has_duplicates d
where
y.related_id = d.related_id and
y.name = d.name
)
select
ciid, name
from with_dupes
where rn = 1
select ciid, name
from (
select ciid, name,
dense_rank() over (partition by related_id, name order by updatedate desc) rn
from tablename) t
group by ciid,name
having count(distinct rn) > 1;
Edit: To find duplicates, why not just do this?
select x.ciid, x.name, x.updatedate
from tablename x join
(
select name, related_id, max(updatedate) as mxdt, count(*) as cnt
from tablename
group by name, related_id
having count(*) > 1
) t
on x.updatedate = t.mxdt and x.name = t.name and x.related_id = t.related_id
You can use GROUP BY with HAVING to keep only the (name, related_id) groups that have more than one row; the join then picks the most recently updated row of each such group.

Get max value of column, min value of column corresponding to rank at once in ms sql

I have this query
WITH summary AS
(
SELECT Msisdn, DateRegistered ,
RANK() OVER (ORDER BY DateRegistered ASC) AS dRank
FROM dbo.SubscriptionsArchive
WHERE MSISDN='123456'
)
SELECT s.msisdn, s.DateRegistered AS firstReg
FROM summary s
WHERE dRank = (SELECT min(dRank) FROM summary)
This displays the firstReg corresponding to the minimum rank; I want to get the lastReg corresponding to the maximum rank at the same time.
How do I achieve this?
Solution by cross joining the first and last line of the CTE, with TOP 1 syntax:
WITH summary AS
(
SELECT Msisdn, DateRegistered ,
RANK() OVER (ORDER BY DateRegistered ASC) AS dRank
FROM dbo.SubscriptionsArchive
WHERE MSISDN='123456'
)
SELECT minrow.Msisdn, minrow.DateRegistered AS firstReg, maxrow.DateRegistered AS lastReg
FROM
(select TOP 1 * from summary order by dRank asc) minrow
CROSS JOIN
(select TOP 1 * from summary order by dRank desc) maxrow ;
WITH summary AS
(
SELECT Msisdn, DateRegistered ,
RANK() OVER (ORDER BY DateRegistered ASC) AS ASCRank,
RANK() OVER (ORDER BY DateRegistered DESC) AS DESCRank
FROM dbo.SubscriptionsArchive
WHERE MSISDN='123456'
)
SELECT s.msisdn,
MAX(CASE WHEN ASCRank=1 THEN s.DateRegistered END) AS firstReg,
MAX(CASE WHEN DESCRank=1 THEN s.DateRegistered END) AS lastReg
FROM summary s
GROUP BY s.msisdn
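
Here is a sketch of getting the first and last registration in a single row by ranking in both directions and then aggregating. It runs on Python's sqlite3 module (SQLite 3.25 or later assumed) instead of SQL Server, so the dbo. schema prefix is dropped, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SubscriptionsArchive (Msisdn TEXT, DateRegistered TEXT)")
# Hypothetical rows; ISO date strings sort correctly as text.
conn.executemany(
    "INSERT INTO SubscriptionsArchive VALUES (?, ?)",
    [
        ("123456", "2020-03-10"),
        ("123456", "2019-01-05"),
        ("123456", "2021-07-20"),
        ("999999", "2018-01-01"),
    ],
)

# Each CASE is non-NULL on exactly one row, so MAX picks that value
# and GROUP BY folds everything into one row per Msisdn.
query = """
WITH summary AS (
    SELECT Msisdn, DateRegistered,
           RANK() OVER (ORDER BY DateRegistered ASC)  AS AscRank,
           RANK() OVER (ORDER BY DateRegistered DESC) AS DescRank
    FROM SubscriptionsArchive
    WHERE Msisdn = '123456'
)
SELECT Msisdn,
       MAX(CASE WHEN AscRank = 1 THEN DateRegistered END)  AS firstReg,
       MAX(CASE WHEN DescRank = 1 THEN DateRegistered END) AS lastReg
FROM summary
GROUP BY Msisdn
"""
first_last = conn.execute(query).fetchone()
print(first_last)  # ('123456', '2019-01-05', '2021-07-20')
```

When only the two dates are needed, a plain MIN(DateRegistered) and MAX(DateRegistered) with the same GROUP BY returns the same values without any ranking; the ranked form is mainly useful when other columns of the first and last rows are wanted too.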