Unable to get dedupe records with rank - sql

I am trying to dedupe my dataset using rank, but it is not assigning a different number to the second record. What am I doing wrong here?
with get_rank as (
select id, code, rank() over (partition by id order by z.rowid) as ranking
from mytable z
)
select *
from get_rank
where ranking = 1
and id = 72755
ID CODE RANKING
---------- ---- ----------
72755 M 1
72755 M 1

Use row_number():
with get_rank as (
select id, code,
row_number() over (partition by id order by z.rowid) as ranking
from mytable z
)
select *
from get_rank
where ranking = 1 and id = 72755;
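row_number() is guaranteed to return a different value for each row within a partition, whereas rank() assigns the same number to rows that tie on the ORDER BY key.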
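To see the difference in isolation, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table `events` and its column `loaded_on` are hypothetical, and the two rows are deliberately given the same ordering key; that tie is an assumption about why rank() repeated a value in the question, not something the question confirms.

```python
import sqlite3

# Hypothetical table: two duplicate rows that tie on the ordering key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, code TEXT, loaded_on TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(72755, "M", "2020-01-01"), (72755, "M", "2020-01-01")],
)

# Compute both functions over the same window to compare them side by side.
rows = conn.execute(
    """
    SELECT id, code,
           RANK()       OVER (PARTITION BY id ORDER BY loaded_on) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_on) AS rn
    FROM events
    ORDER BY rn
    """
).fetchall()

rank_values = [r[2] for r in rows]        # tied rows share a rank
row_number_values = [r[3] for r in rows]  # every row gets its own number
```

Filtering on `rn = 1` therefore keeps exactly one row per id, while filtering on `rnk = 1` keeps every row that ties for first place.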

Related

DB2 Using max aggregate function

Can I rewrite this select to retrieve the highest value without using an aggregate function?
Select *
From A
Where Id= 123
and value = (
select max(value)
from A a2
where a2.id = 123 )
If you are certain that only one record has the max value, or you don't care which of the tied records is returned, you can use a LIMIT query (on DB2, FETCH FIRST 1 ROW ONLY is the native equivalent of LIMIT 1):
SELECT *
FROM A
WHERE Id = 123
ORDER BY value DESC
LIMIT 1;
If this doesn't meet your expectations, then stick with your current approach. Note that you could also use RANK() here:
WITH cte AS (
SELECT A.*, RANK() OVER (ORDER BY value DESC) AS rnk
FROM A
WHERE Id = 123
)
SELECT *
FROM cte
WHERE rnk = 1;
But like your version, the above rank query also requires a subquery.
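The practical difference between the two queries only shows up when there are ties. The following is a small sketch using Python's sqlite3 module in place of DB2 (an assumption made so the example is runnable); the `note` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, value INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(123, 10, "x"), (123, 50, "y"), (123, 50, "z"), (999, 99, "w")],
)

# LIMIT 1 keeps a single row, even though two rows tie for the maximum.
limit_rows = conn.execute(
    "SELECT * FROM A WHERE id = 123 ORDER BY value DESC LIMIT 1"
).fetchall()

# RANK() keeps every row that ties for the maximum, like the MAX() subquery.
rank_rows = conn.execute(
    """
    WITH cte AS (
        SELECT A.*, RANK() OVER (ORDER BY value DESC) AS rnk
        FROM A
        WHERE id = 123
    )
    SELECT id, value, note FROM cte WHERE rnk = 1 ORDER BY note
    """
).fetchall()
```

Which of the two tied rows the LIMIT query returns is not specified, so only its row count and value are predictable.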

ORACLE SQL find row with max date for each grouping

I am trying to write a query that returns only the rows whose time is the greatest for each id.
Table: positions
id time otherCols...
---------- ----------- ----------
1 1
1 2
3 1
1 3
2 1
3 2
Result should look like:
id time otherCols...
---------- ----------- ----------
1 3
2 1
3 2
I tried grouping by id but I don't know how to sort after that and pick only the top result.
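You can use the MAX(..) KEEP (DENSE_RANK ..) OVER (PARTITION BY ..) analytic function without any subquery. Note that this appends the per-id maximum to every row rather than reducing the result to one row per id: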
SELECT MAX(time) KEEP (DENSE_RANK LAST ORDER BY time)
OVER (PARTITION BY id) AS time_max,
p.*
FROM positions p
ORDER BY id
You can use window functions:
select t.*
from (select t.*,
row_number() over (partition by id order by time desc) as seqnum
from t
) t
where seqnum = 1;
An alternative method is a correlated subquery:
select t.*
from t
where t.time = (select max(t2.time) from t t2 where t2.id = t.id);
This differs from the first query in two respects:
If several rows share the maximum time for an id, this returns all of those tied rows. You can get the same behavior from the first query by using rank() instead of row_number().
This does not return rows with a NULL id, or ids whose time is NULL on every row. The first query does.
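Both differences can be reproduced on a small data set. This is a sketch that uses Python's sqlite3 module rather than Oracle (an assumption made for runnability), with hypothetical rows: id 1 has a tie for its maximum time, and id 2 has only a NULL time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 3), (2, None), (3, 2)],
)

# Window-function version: exactly one row per id, including the all-NULL id.
window_rows = conn.execute(
    """
    SELECT id, time
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS seqnum
          FROM t)
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

# Correlated-subquery version: keeps ties, drops ids whose max(time) is NULL.
subquery_rows = conn.execute(
    """
    SELECT id, time
    FROM t
    WHERE t.time = (SELECT MAX(t2.time) FROM t t2 WHERE t2.id = t.id)
    ORDER BY id
    """
).fetchall()
```

One portability caveat: Oracle sorts NULLs first in a descending sort by default, so when an id mixes NULL and non-NULL times, adding NULLS LAST to the ORDER BY keeps row_number() from picking a NULL row.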

Selecting a column where another column is maximal

I guess this is a standard problem. But I could not find a proper solution yet.
I have three columns in table A:
ID ID_Version Var
1 1 A
1 2 A
1 3 X
1 4 D
2 1 B
2 2 Z
2 3 D
3 1 A
4 1 B
4 2 Q
4 3 Z
For every unique ID, I would like to isolate the Var-value that belongs to the maximal ID-Version.
For ID = 1 this would be D, for ID = 2 this would be D, for ID = 3 this would be A and for ID = 4 this would be Z.
I tried to use a group by statement but I cannot select Var-values when using the max-function on ID-Version and grouping by ID.
Does anyone have a clue how to write fast, effective code for this simple problem?
Use the row_number() analytic function:
select ID,Var from
(
select row_number() over (partition by id order by id_version desc) as rn,
t.*
from tab t
)
where rn = 1
Or use max(var) keep (dense_rank ...):
select id, max(var) keep (dense_rank first order by id_version desc) as var
from tab
group by id
You could use a ranking function:
SELECT *
FROM (SELECT tab.*, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID_Version DESC) rn
FROM tab)
WHERE rn = 1
Oracle has the keep syntax, so you can also use aggregation:
select id, max(id_version) as id_version,
max(var) keep (dense_rank first order by id_version desc) as var
from a
group by id;
You could also use a simple join to do what you want:
SELECT A.id, A.var FROM A
JOIN
(SELECT id, MAX(id_version) as id_version
FROM A
GROUP BY id) temp ON (temp.id = A.id AND temp.id_version = A.id_version)
Or you could use a correlated subquery like this:
SELECT a1.id, a1.var FROM A a1
WHERE a1.id_version = (SELECT MAX(id_version) FROM A a2 WHERE a2.id = a1.id)
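As a quick check against the sample data from the question, the sketch below runs the row_number() approach and the join approach side by side. It uses Python's sqlite3 module instead of Oracle (an assumption made for runnability), so the KEEP variant is left out; the derived-table alias `latest` is also an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, id_version INTEGER, var TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, 1, "A"), (1, 2, "A"), (1, 3, "X"), (1, 4, "D"),
     (2, 1, "B"), (2, 2, "Z"), (2, 3, "D"),
     (3, 1, "A"),
     (4, 1, "B"), (4, 2, "Q"), (4, 3, "Z")],
)

# Approach 1: number the versions per id, newest first, and keep number 1.
window_rows = conn.execute(
    """
    SELECT id, var
    FROM (SELECT A.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY id_version DESC) AS rn
          FROM A)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

# Approach 2: find the max version per id, then join back to fetch var.
join_rows = conn.execute(
    """
    SELECT A.id, A.var
    FROM A
    JOIN (SELECT id, MAX(id_version) AS id_version
          FROM A
          GROUP BY id) latest
      ON latest.id = A.id AND latest.id_version = A.id_version
    ORDER BY A.id
    """
).fetchall()
```

Both queries agree here because each (id, id_version) pair is unique; if versions could repeat within an id, the join would return every tied row while row_number() would keep one.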

Finding top count of a value in a table using SQL

I'm looking for a way to find the most frequent value of a column using SQL.
If, for example, this is my data:
id type
----------
1 A
1 B
1 A
2 C
2 D
2 D
I would like the result to be:
1 A
2 D
I'm looking for a way to do it without grouping by the column I count (type in the example).
Thanks
Statistically, this is called the "mode". You can calculate it using window functions:
select id, type, cnt
from (select id, type, count(*) as cnt,
row_number() over (partition by id order by count(*) desc) as seqnum
from t
group by id, type
) t
where seqnum = 1;
If there are ties, then an arbitrary value is chosen from among the ties.
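The same idea can be tried on the question's data with Python's sqlite3 module (an assumption made for runnability). In this sketch the aggregation is moved into a CTE named `counts`, which is a restructuring for clarity and not the answer's exact form; the window function then ranks the precomputed counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "A"), (2, "C"), (2, "D"), (2, "D")],
)

mode_rows = conn.execute(
    """
    WITH counts AS (
        -- one row per (id, type) with its frequency
        SELECT id, type, COUNT(*) AS cnt
        FROM t
        GROUP BY id, type
    )
    SELECT id, type, cnt
    FROM (SELECT id, type, cnt,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC) AS seqnum
          FROM counts)
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()
```

Adding a tie-breaker such as `ORDER BY cnt DESC, type` would make the choice among tied types deterministic.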
You are looking for the statistical mode (the most frequently occurring value):
select id, stats_mode(type)
from mytable
group by id
order by id;
Not every DBMS supports this, however. Check your documentation to see whether this function or a similar one is available in your DBMS.
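Just GROUP BY id, type and keep the rows with the maximum count. Note that the subquery computes the maximum over the whole table rather than per id, so this returns the mode of every id only when all ids share the same top count, as they do in the sample data: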
select id, type
from tablename
group by id, type
having count(*) = (
select count(*) from tablename group by id, type order by count(*) desc limit 1
)
Or
select id, type
from tablename
group by id, type
having count(*) = (
select max(t.counter) from (select count(*) counter from tablename group by id, type) t
)

Tsql to get first random product in a category

I've this result set:
select a.id, a.categoria from Articolo a
where novita = 1
order by a.categoria, newid()
id categoria
----------- -----------
3 4
11 4
1 4
12 5
13 5
4 6
and I would like to get the first product (in a random order) from each category:
id categoria
----------- -----------
3 4
12 5
4 6
Ideally something like
select FIRST(a.id), a.categoria from Articolo a
where novita = 1
order by a.categoria, newid()
Any ideas?
Use MAX(a.id) with GROUP BY a.categoria
SELECT MAX(a.id), a.categoria
from Articolo a
where novita = 1
GROUP BY a.categoria
Update
To get a random id for each categoria, you can use the ranking function ROW_NUMBER() OVER(PARTITION BY categoria) with ORDER BY NEWID() to get a random ordering, like this:
WITH CTE
AS
(
SELECT id, categoria, ROW_NUMBER() OVER(PARTITION BY categoria
ORDER BY NEWID()) AS rn
FROM Articolo
WHERE novita = 1
)
SELECT id, categoria
FROM CTE
WHERE rn = 1;
This way, it will give you a random id for each categoria each time.
However, if you just want one row per categoria without randomizing, you can use ORDER BY (SELECT 1) inside the ranking function ROW_NUMBER():
WITH CTE
AS
(
SELECT id, categoria, ROW_NUMBER() OVER(PARTITION BY categoria
ORDER BY (select 1)) AS rn
FROM Articolo
WHERE novita = 1
)
SELECT id, categoria
FROM CTE
WHERE rn = 1;
This will give you one id for each categoria, in whatever order the engine happens to produce the rows.
Note that there is no meaningful notion of a "first" row in a database, because row order is not significant in the relational model. The result is not guaranteed to come back in the same order each time; you have to ORDER BY a specific column to get a consistent ordering.
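The contrast between a random pick and a repeatable one can be sketched with Python's sqlite3 module. This is an illustration under stated assumptions: SQLite's RANDOM() stands in for T-SQL's NEWID(), the rows are the ones from the question, and ordering by id is just one possible choice of a stable key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articolo (id INTEGER, categoria INTEGER, novita INTEGER)")
conn.executemany(
    "INSERT INTO Articolo VALUES (?, ?, 1)",
    [(3, 4), (11, 4), (1, 4), (12, 5), (13, 5), (4, 6)],
)

def pick_one_per_category(order_expr):
    """Return one (id, categoria) per categoria, ranked by order_expr."""
    # order_expr is interpolated directly, so only pass trusted SQL fragments.
    sql = f"""
        WITH cte AS (
            SELECT id, categoria,
                   ROW_NUMBER() OVER (PARTITION BY categoria
                                      ORDER BY {order_expr}) AS rn
            FROM Articolo
            WHERE novita = 1
        )
        SELECT id, categoria FROM cte WHERE rn = 1 ORDER BY categoria
    """
    return conn.execute(sql).fetchall()

random_pick = pick_one_per_category("RANDOM()")  # may change on every call
stable_pick = pick_one_per_category("id")        # lowest id per categoria
```

With an explicit key such as id, the result is the same on every run; with RANDOM() only the shape of the result (one row per categoria) is predictable.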