RANK, ROW_NUMBER on T-SQL - sql

I have rows like this in SQL Server 2014:
id | fld1
---+-----
1 | 100
2 | 100
3 | 80
4 | 102
5 | 100
6 | 80
7 | 102
I need a numbering that, without changing the row order, returns:
NewFld | id | fld1
-------+----+------
1 | 1 | 100
1 | 2 | 100
2 | 3 | 80
3 | 4 | 102
1 | 5 | 100
2 | 6 | 80
3 | 7 | 102
NewFld should be the same for rows with the same fld1, and it should be assigned in the order in which each fld1 value first appears by id.
I tried with ROW_NUMBER, RANK, DENSE_RANK but nothing works for me.

Use min() over() in a subquery to establish the ordering values needed for the dense_rank().
SELECT id
     , fld1
     , DENSE_RANK() OVER (ORDER BY fld1_idmin) AS Rank
FROM (
    SELECT id
         , fld1
         , MIN(id) OVER (PARTITION BY fld1) AS fld1_idmin
    FROM yourtable
) d
ORDER BY id
With an index on fld1, these window functions need just a single index scan for this query.
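To see why this works, it can help to look at the intermediate fld1_idmin column on its own. This is a minimal sketch that loads the question's sample rows into a table variable (the name @t is made up for illustration) and runs only the inner query:

```sql
-- Sample data from the question; @t is a hypothetical stand-in for yourtable.
DECLARE @t TABLE (id int PRIMARY KEY, fld1 int);
INSERT INTO @t (id, fld1)
VALUES (1, 100), (2, 100), (3, 80), (4, 102), (5, 100), (6, 80), (7, 102);

-- For every row, the smallest id that carries the same fld1 value.
SELECT id
     , fld1
     , MIN(id) OVER (PARTITION BY fld1) AS fld1_idmin
FROM @t
ORDER BY id;
-- fld1_idmin is 1 for fld1 = 100, 3 for fld1 = 80 and 4 for fld1 = 102,
-- so DENSE_RANK() ordered by it numbers the groups 1, 2, 3.
```

Each fld1 group is tagged with the id of its first appearance, and DENSE_RANK() then turns those ids (1, 3, 4) into consecutive numbers.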

You may use this:
with mytab as
(
SELECT *
,(SELECT MIN(ID) FROM yourtable sub where sub.fld1 = yourtable.fld1) as ranks
FROM yourtable
)
SELECT ID ,fld1 , DENSE_RANK()OVER(ORDER BY Ranks)
FROM mytab
ORDER BY ID

Related

Selecting the first row of group with additional group by columns

Say I have a table with the following results:
How can I select one row per distinct parent_id, namely the row with the minimum object0_behaviour?
Expected output:
parent_id | id | object0_behaviour | type
------------------------------------------
1 | 1 | 5 | IP
2 | 3 | 5 | IP
3 | 5 | 7 | ID
4 | 6 | 7 | ID
5 | 8 | 5 | IP
6 | 18 | 7 | ID
7 | 10 | 7 | ID
8 | 9 | 5 | IP
I have tried:
SELECT parent_id, min(object0_behaviour) FROM table GROUP BY parent_id
It works, but if I want the other two columns as well, I have to add them to the GROUP BY clause, and then I am back to square one.
I saw examples with R : Select the first row by group
Similar output from what I need, but I can't seem to convert it into SQL
You can try the row_number() window function:
select * from
(
select *, row_number() over(partition by parent_id order by object0_behaviour) as rn
from tablename
)A where rn=1
select * from table
join (
SELECT parent_id, min(object0_behaviour) object0_behaviour
FROM table GROUP BY parent_id
) grouped
on grouped.parent_id = table.parent_id
and grouped.object0_behaviour = table.object0_behaviour

Get sum over a column value that is determined by two other column values in the same table

I have the following table MY_TABLE
ID | SEQ | TYPE | VAL
1 | 2 | A | 100
1 | 3 | A | 100
1 | 2 | B | 200
1 | 3 | A | 100
1 | 3 | B | 200
2 | 25 | X | 100
2 | 24 | Y | 200
2 | 24 | X | 300
2 | 25 | Y | 400
2 | 25 | X | 50
In MY_TABLE, each ID has a set of SEQ values and TYPE values. I want the sum of VAL per TYPE, restricted to the rows that belong to each ID's max(SEQ).
Expected output:
ID| SEQ | TYPE | SUM(VAL)
1 | 3 | A | 200 <- 100 + 100
1 | 3 | B | 200
2 | 25 | X | 150 <- 100 + 50
2 | 25 | Y | 400
What I tried:
-- this sub query finds the max(seq) for each ID
with max_seq as (
select id, max(seq) max_seq
from my_table t
group by id)
-- select query on my_table
select
bd.id,
bd.seq,
bd.type,
sum(bd.val)
from my_table bd
-- joining on id-max_seq pair
inner join max_seq
on
(max_seq.id = bd.id)
and
(max_seq.max_seq = bd.seq)
-- sum(val) per ID, MAX(SEQ), TYPE
group by bd.id, bd.seq, bd.type;
Question:
The above query works well for smaller tables but gets slower when the table is bigger. Is there an efficient way of getting this output? (Maybe without using two joins on the same table with a sub query?)
You could avoid the self-join by using a subquery which gets a ranking for each row based on the id and seq:
select id, seq, type, sum(val)
from (
select id, seq, type, val, rank() over (partition by id order by seq desc) as rnk
from my_table
)
where rnk = 1
group by id, seq, type
order by id, seq, type;
ID SEQ T SUM(VAL)
---------- ---------- - ----------
1 3 A 200
1 3 B 200
2 25 X 150
2 25 Y 400
Because of the order by seq desc, the rnk value is 1 for the highest seq for each id. The outer query then just filters on rnk = 1, limiting the output and the aggregation to those lowest-rank (highest-seq) rows.
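As a rough illustration of the ranking step, here is the inner query run against only the ID 2 rows from the question. It assumes Oracle (the answer's output looks like SQL*Plus), and the inline WITH clause is just a stand-in for the real my_table:

```sql
-- Assumption: Oracle syntax. The CTE replaces my_table with the ID 2 sample rows.
with my_table (id, seq, type, val) as (
  select 2, 25, 'X', 100 from dual union all
  select 2, 24, 'Y', 200 from dual union all
  select 2, 24, 'X', 300 from dual union all
  select 2, 25, 'Y', 400 from dual union all
  select 2, 25, 'X', 50  from dual
)
select id, seq, type, val,
       rank() over (partition by id order by seq desc) as rnk
from my_table
order by seq desc, type, val;
-- The three seq = 25 rows tie at rnk = 1; the two seq = 24 rows get rnk = 4.
```

Because rank() gives tied rows the same value, every row at the highest seq survives the rnk = 1 filter. row_number() would keep only one of them, which is not what the sum needs.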

SQL - Return records with highest version for each quotation ID

If I have a table with three columns like below
CREATE TABLE QUOTATIONS
(
ID INT NOT NULL,
VERSION INT NOT NULL,
PRICE FLOAT NOT NULL
);
In addition, let's say the table contains the following records:
ID | VERSION | PRICE
----+-----------+--------
1 | 1 | 50
1 | 2 | 40
1 | 3 | 30
2 | 1 | 100
2 | 2 | 80
3 | 1 | 50
Is there any single SQL query that can be run and return the rows of all quotations with the highest version only?
The results should be as follows:
ID | VERSION | PRICE
----+-----------+--------
1 | 3 | 30
2 | 2 | 80
3 | 1 | 50
I like this method which uses no subqueries:
select top (1) with ties q.*
from quotations q
order by row_number() over (partition by id order by version desc);
Basically, the row_number() assigns "1" to the highest version for each id. The top (1) with ties returns all the 1s.
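To make the sort key visible, this sketch selects the row_number() as an ordinary column instead of using it only in the ORDER BY. The table variable @quotations is a made-up stand-in holding the question's sample rows:

```sql
-- Sample data from the question; @quotations is a hypothetical stand-in for QUOTATIONS.
DECLARE @quotations TABLE (id int NOT NULL, version int NOT NULL, price float NOT NULL);
INSERT INTO @quotations (id, version, price)
VALUES (1, 1, 50), (1, 2, 40), (1, 3, 30), (2, 1, 100), (2, 2, 80), (3, 1, 50);

-- rn restarts at 1 for every id, counting down from the highest version.
SELECT q.id
     , q.version
     , q.price
     , ROW_NUMBER() OVER (PARTITION BY q.id ORDER BY q.version DESC) AS rn
FROM @quotations q
ORDER BY rn, q.id;
-- The rn = 1 rows are (1, 3, 30), (2, 2, 80) and (3, 1, 50).
```

Sorting by rn puts one rn = 1 row per id at the top, and TOP (1) WITH TIES keeps all rows that tie with the first one on that sort key.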
SELECT t.id, t.version, t.price
FROM quotations t
JOIN (SELECT id, MAX(version) AS version
      FROM quotations
      GROUP BY id) AS q1 ON q1.id = t.id AND q1.version = t.version

Select entire partition where max row in partition is greater than 1

I'm partitioning by a non-unique identifier, but I'm only interested in the partitions with at least two rows. How can I filter out the rows whose identifier occurs exactly once?
Query I'm using:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
What I'm getting:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1235 | 2014-10-08...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
What I want:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
Thanks for any direction :)
Based on syntax, I'm assuming this is SQL Server 2005 or higher. My answer will be meant for that.
You have a couple options.
One, use a CTE:
;WITH CTE AS (
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
)
SELECT *
FROM CTE t
WHERE EXISTS (SELECT 1 FROM CTE WHERE row = 2 and nonUniqueId = t.nonUniqueId);
Or, you can use subqueries:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable t
WHERE EXISTS (SELECT 1 FROM myTable
WHERE nonUniqueId = t.nonUniqueId GROUP BY nonUniqueId HAVING COUNT(*) >= 2);
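A third option, not part of the answer above, is a windowed COUNT(*) next to the ROW_NUMBER(), filtered in an outer query. This is only a sketch: @myTable is a hypothetical stand-in for myTable, and the dates are made-up values cut to the day because the question's timestamps are truncated.

```sql
-- Hypothetical sample data shaped like the question's output.
DECLARE @myTable TABLE (nonUniqueId int, aTimeStamp date);
INSERT INTO @myTable (nonUniqueId, aTimeStamp)
VALUES (1234, '2014-10-08'), (1234, '2014-10-09'),
       (1235, '2014-10-08'),
       (1236, '2014-10-08'), (1236, '2014-10-09');

SELECT x.[row], x.nonUniqueId, x.aTimeStamp
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY nonUniqueId ORDER BY aTimeStamp) AS [row]
         , COUNT(*) OVER (PARTITION BY nonUniqueId) AS cnt  -- partition size on every row
         , nonUniqueId
         , aTimeStamp
    FROM @myTable
) x
WHERE x.cnt >= 2
ORDER BY x.nonUniqueId, x.[row];
-- Returns the two rows for 1234 and the two rows for 1236; 1235 is dropped.
```

The window functions have to sit in a derived table (or CTE) because they cannot be referenced directly in a WHERE clause.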

select top 1 with max 2 fields

I have this table :
+------+-------+------------------------------------+
| id | rev | class |
+------+-------+------------------------------------+
| 1 | 10 | 2 |
| 1 | 10 | 5 |
| 2 | 40 | 6 |
| 2 | 50 | 6 |
| 2 | 52 | 1 |
| 3 | 33 | 3 |
| 3 | 63 | 5 |
+------+-------+------------------------------------+
For each id, I only need the row with the max rev and, among those, the max class.
+------+-------+------------------------------------+
| id | rev | class |
+------+-------+------------------------------------+
| 1 | 10 | 5 |
| 2 | 52 | 1 |
| 3 | 63 | 5 |
+------+-------+------------------------------------+
Query cost is important for me.
Do you want just the rows that have both max values? If so:
SELECT h.id, h.rev, h.class
FROM ( SELECT id,
              MAX( rev ) rev,
              MAX( class ) class
       FROM Herp
       GROUP BY id ) derp
INNER JOIN Herp h
    ON h.id = derp.id
   AND h.rev = derp.rev
   AND h.class = derp.class;
The fastest way might be to have an index on t(id, rev) and t(id, class) and then do:
select t.*
from table t
where not exists (select 1
from table t2
where t2.id = t.id and t2.rev > t.rev
) and
not exists (select 1
from table t2
where t2.id = t.id and t2.class > t.class
);
SQL Server is pretty smart in terms of optimization, so the aggregation approach might be just as good. However, in terms of performance, this is just a bunch of index lookups.
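Note what "both max values" means for the question's sample data. This sketch runs the same NOT EXISTS pattern against a table variable (@t is a made-up name) filled with the rows from the question:

```sql
-- Sample data from the question; @t is a hypothetical stand-in for the real table.
DECLARE @t TABLE (id int, rev int, [class] int);
INSERT INTO @t (id, rev, [class])
VALUES (1, 10, 2), (1, 10, 5), (2, 40, 6), (2, 50, 6), (2, 52, 1), (3, 33, 3), (3, 63, 5);

SELECT t.id, t.rev, t.[class]
FROM @t t
WHERE NOT EXISTS (SELECT 1 FROM @t t2 WHERE t2.id = t.id AND t2.rev > t.rev)
  AND NOT EXISTS (SELECT 1 FROM @t t2 WHERE t2.id = t.id AND t2.[class] > t.[class])
ORDER BY t.id;
-- Returns (1, 10, 5) and (3, 63, 5). Nothing comes back for id 2, because its
-- highest rev (52) and its highest class (6) are on different rows.
```

So this form answers "rows holding both maxima at once", which is stricter than the expected output in the question, where id 2 yields (2, 52, 1).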
Here is a SQL Server 2012 example. It is straightforward, using a derived table and ROW_NUMBER() with PARTITION BY.
Basically, with each ID as a partition/group, sort the values of the other fields in a descending order assigning each one an incrementing RowId, then only take the first one.
select id, rev, [class]
from
(
SELECT id, rev, [class],
ROW_NUMBER() OVER(PARTITION BY id ORDER BY rev DESC, [class] desc) AS RowId
FROM sample
) t
where RowId = 1
Keep in mind, this works with the criteria in the example dataset, and not the MAX of two fields as stated in the question's title.
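To check the numbering against the question's data, this sketch shows RowId for every row rather than only the first per id. The table variable @sample is a made-up stand-in for the sample table:

```sql
-- Sample data from the question; @sample is a hypothetical stand-in for the sample table.
DECLARE @sample TABLE (id int, rev int, [class] int);
INSERT INTO @sample (id, rev, [class])
VALUES (1, 10, 2), (1, 10, 5), (2, 40, 6), (2, 50, 6), (2, 52, 1), (3, 33, 3), (3, 63, 5);

SELECT id
     , rev
     , [class]
     , ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC, [class] DESC) AS RowId
FROM @sample
ORDER BY id, RowId;
-- RowId = 1 falls on (1, 10, 5), (2, 52, 1) and (3, 63, 5),
-- which matches the expected output in the question.
```

For id 1 the two rows tie on rev, so the second sort key (class descending) decides; for id 2 the highest rev wins even though its class is the lowest.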
I guess you mean: the max of rev and the max of class. If not, please clarify what to do when there is no row where both fields have the highest value.
select id
, max(rev)
, max(class)
from table
group
by id
If you mean the highest total of rev and class, use this:
select t.id
, t.rev
, t.class
from table t
join
( select id
, max(rev + class) total
from table
group
by id
) m
on m.id = t.id
and t.rev + t.class = m.total