Select duplicate rows - sql

I have data like this:
| col1 |
--------
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
How can I get a result like this, with each value listed once and ordered from max to min:
| col1 |
--------
| 2 |
| 1 |
I tried this:
SELECT col1, count(col1) FROM myTable GROUP BY col1
But I got strange results.

If you want to order by the count of occurrences of each value:
SELECT col1, count(1) FROM myTable GROUP BY col1 ORDER BY count(1) DESC
If you want to order by the actual value contained in col1:
SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC

You can use the SQL DISTINCT keyword to only show unique results.
SELECT DISTINCT col1 FROM myTable;
You can then order by that column.
SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC;
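To check both variants against the sample data, here is a minimal sketch that runs them through Python's built-in sqlite3 module. The in-memory database and the alias n are assumptions made for illustration; the table and column names come from the question. Because both values occur five times, the count-based query adds col1 DESC as a tie-breaker so the output order is deterministic.

```python
import sqlite3

# Throwaway in-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER)")
conn.executemany("INSERT INTO myTable (col1) VALUES (?)", [(1,), (2,)] * 5)

# Unique values, largest first.
distinct_desc = conn.execute(
    "SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC"
).fetchall()

# Occurrence count per value, most frequent first; col1 breaks ties.
counts = conn.execute(
    "SELECT col1, COUNT(*) AS n FROM myTable "
    "GROUP BY col1 ORDER BY n DESC, col1 DESC"
).fetchall()

print(distinct_desc)
print(counts)
```

The GROUP BY query in the question was not wrong as such: it returns one row per value plus its count, but without an ORDER BY the row order is unspecified.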

Related

Rank with current order

I have table like this:
col1 | col2
__________________
15077244 | 544648
15077320 | 544648
15080285 | 544632
15382858 | 544648
15584221 | 544648
15584222 | 544648
15584223 | 544628
15584224 | 544628
15584225 | 544628
15584226 | 544628
15584227 | 544632
15584228 | 544632
I want to assign a rank according to the col2 value, as in the example below (this is what I want to achieve):
col1 | col2 | rank
________________________
15077244 | 544648 | 1
15077320 | 544648 | 1
15080285 | 544632 | 2
15382858 | 544648 | 1
15584221 | 544648 | 1
15584222 | 544648 | 1
15584223 | 544628 | 3
15584224 | 544628 | 3
15584225 | 544628 | 3
15584226 | 544628 | 3
15584227 | 544632 | 2
15584228 | 544632 | 2
I found an answer that suggested using the DENSE_RANK() function, so I tried it:
SELECT col1, col2, DENSE_RANK() OVER(ORDER BY col2) as rank
FROM myTable
but it changes the order of col1, like this:
col1 | col2 | rank
____________________________
15584223 | 544628 | 1
15584224 | 544628 | 1
15584225 | 544628 | 1
15584226 | 544628 | 1
15080285 | 544632 | 2
15584227 | 544632 | 2
15584228 | 544632 | 2
15077244 | 544648 | 3
15077320 | 544648 | 3
15382858 | 544648 | 3
15584221 | 544648 | 3
15584222 | 544648 | 3
When I add ORDER BY col1 at the end of my SELECT query, the rows come back in the correct order but the ranks are wrong: for example, col2 value 544648 has rank 3 but it should have rank 1.
How can I use DENSE_RANK, or something else, to rank my col2 values without changing the row order?
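You need to change the ordering inside DENSE_RANK to DESC, and order the results by col1 ASC. Note that this relies on the col2 values happening to first appear in descending order in the sample data.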
SELECT
col1
, col2
, DENSE_RANK() OVER(ORDER BY col2 DESC) as rank
FROM myTable
ORDER BY col1 ASC
While there may be an easier solution, here's one approach using a subquery with row_number to establish a grouping of results, ordering by min(col1):
SELECT t.col1, t.col2, t2.rank
FROM myTable t JOIN (
SELECT MIN(col1) minCol1, col2, ROW_NUMBER() OVER (ORDER BY MIN(col1)) rank
FROM myTable
GROUP BY col2
) t2 ON t.col2 = t2.col2
ORDER BY t.col1
You can use a correlated subquery together with a windowed minimum (the table is named Data in this query):
;WITH CorrelatedDistinctCount AS
(
SELECT
D.col1,
D.col2,
(
SELECT
COUNT(DISTINCT(X.col2))
FROM
Data X
WHERE
X.col1 <= D.col1) AS DistinctCol2Count
FROM
Data D
)
SELECT
C.col1,
C.col2,
MIN(C.DistinctCol2Count) OVER (PARTITION BY C.col2) AS rank
FROM
CorrelatedDistinctCount C
ORDER BY
C.col1 ASC
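As a rough check of the "rank by first appearance" idea, the sketch below runs a variant of the MIN(col1) / ROW_NUMBER approach on the sample data with Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or later); the aliases first_col1, firsts, g and rnk are made up for this example. The aggregate is computed in its own subquery before numbering, which is a cautious restructuring rather than a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    (15077244, 544648), (15077320, 544648), (15080285, 544632),
    (15382858, 544648), (15584221, 544648), (15584222, 544648),
    (15584223, 544628), (15584224, 544628), (15584225, 544628),
    (15584226, 544628), (15584227, 544632), (15584228, 544632),
])

query = """
SELECT t.col1, t.col2, g.rnk
FROM myTable AS t
JOIN (
    -- Number each col2 value by the smallest col1 at which it occurs.
    SELECT col2, ROW_NUMBER() OVER (ORDER BY first_col1) AS rnk
    FROM (
        SELECT col2, MIN(col1) AS first_col1
        FROM myTable
        GROUP BY col2
    ) AS firsts
) AS g ON g.col2 = t.col2
ORDER BY t.col1
"""
ranked = conn.execute(query).fetchall()
ranks = [rnk for _, _, rnk in ranked]
print(ranks)
```

Unlike DENSE_RANK() OVER (ORDER BY col2 DESC), this numbering does not depend on the col2 values themselves, only on where each value first shows up when the rows are ordered by col1.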

Select most recent rows - last 24 hours

I have a table that looks like this:
col1 | col2 | col3 | t_insert
---------------------------------
1 | z | |2018-04-25 17:23:46.686816+10
1 | zy | |2018-04-26 18:53:46.686816+10
2 | f | |2018-04-26 19:23:46.686816+10
3 | g | |2018-04-27 17:23:46.686816+10
2 | z | |2018-04-27 18:23:46.686816+10
4 | z | |2018-04-27 20:13:46.686816+10
Where there are duplicate values in col1 I want to select by most recent timestamp and create a new column (col4) and insert the string 'update'.
Where there are not duplicate values in col1 I want to select the value and insert the string 'new' into col4.
Also I only want to select rows that have a timestamp from the last 24 hours.
The expected result (this example does not apply the last-24-hours filter):
col1 | col2 | col3 | t_insert | col4 |
-------------------------------------------------------------
1 | zy | |2018-04-26 18:53:46.686816+10 |update |
3 | g | |2018-04-27 17:23:46.686816+10 |new |
2 | z | |2018-04-27 18:23:46.686816+10 |update |
4 | z | |2018-04-27 20:13:46.686816+10 |new |
Thanks in advance,
Hmmm, window functions can help here:
select col1, col2, col3, t_insert,
(case when cnt > 1 then 'update' else 'new' end) as col4
from (select t.*,
count(*) over (partition by col1) as cnt,
row_number() over (partition by col1 order by t_insert desc) as seqnum
from t
where t_insert >= now() - interval '24 hour'
) t
where seqnum = 1;
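One thing to watch in the query above: the time filter runs before the window functions, so cnt only counts rows from the last 24 hours. If 'update' should mean "col1 has appeared before at any time", the filter can be moved to the outer query. The sketch below shows that variant using Python's sqlite3 module. It rests on several assumptions: timestamps are stored as plain ISO text without the +10 offset, col3 is left out, the cutoff is a fixed string standing in for now() minus 24 hours, and the alias latest is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, t_insert TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "z",  "2018-04-25 17:23:46"),
    (1, "zy", "2018-04-26 18:53:46"),
    (2, "f",  "2018-04-26 19:23:46"),
    (3, "g",  "2018-04-27 17:23:46"),
    (2, "z",  "2018-04-27 18:23:46"),
    (4, "z",  "2018-04-27 20:13:46"),
])

# Hypothetical "24 hours ago" for a reference time of 2018-04-27 21:00:00.
cutoff = "2018-04-26 21:00:00"

query = """
SELECT col1, col2, t_insert,
       CASE WHEN cnt > 1 THEN 'update' ELSE 'new' END AS col4
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY col1) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY t_insert DESC) AS seqnum
    FROM t
) AS latest
WHERE seqnum = 1          -- newest row per col1
  AND t_insert >= ?       -- time filter applied after counting
ORDER BY col1
"""
recent = conn.execute(query, (cutoff,)).fetchall()
for row in recent:
    print(row)
```

With this cutoff, col1 = 1 drops out because its newest row is older than 24 hours, and col1 = 2 is still labelled 'update' because its earlier row was counted before the filter was applied.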

In SQL is there a way to partition by a value if it's not continuous

I would like to rank the values over a partition defined by two columns. col1 will be the key and col2 will be some value that is also going to be used in ORDER BY. I would like to start a new partition only when col2 stops being consecutive. For example, I would like to do the following:
+------+------+------+
| col1 | col2 | rank |
+------+------+------+
| a | 1 | 1 |
| a | 2 | 2 |
| a | 3 | 3 |
| a | 9 | 1 |
| a | 10 | 2 |
| b | 1 | 1 |
| b | 2 | 2 |
| b | 8 | 1 |
+------+------+------+
I'm thinking of something along the lines of:
SELECT col1, RANK() OVER (PARTITION BY col1, SOMETHING HERE??? ORDER BY col2 DESC)
Does anyone have any ideas?
If I understand correctly, you want to enumerate rows within "islands" of consecutive values. You can do so with a simple observation: within a run of consecutive values, col2 minus a row number (ordered by col2) is constant for each group. So, let's use this observation:
select t.*,
row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
(col2 - row_number() over (partition by col1 order by col2)) as grp
from t
) t
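To make the "difference is constant" observation concrete, here is a small sketch that runs the islands query on the question's sample data with Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The alias islands is made up, and the final row number is ordered by col2 so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("a", 1), ("a", 2), ("a", 3), ("a", 9), ("a", 10),
    ("b", 1), ("b", 2), ("b", 8),
])

query = """
SELECT col1, col2,
       ROW_NUMBER() OVER (PARTITION BY col1, grp ORDER BY col2) AS rnk
FROM (
    -- Within a run of consecutive col2 values, col2 and the row number
    -- both grow by 1, so their difference identifies the island.
    SELECT col1, col2,
           col2 - ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS grp
    FROM t
) AS islands
ORDER BY col1, col2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For key a, the grp values work out to 0, 0, 0, 5, 5, which is why the numbering restarts at 9. The trick assumes col2 has no duplicates within a key; duplicates would break the constant difference.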

If 2 rows have the same ID select one with the greater other column value

I'm having difficulty getting my head round this one, which should be simple.
When selecting from the table, if multiple rows have the same ID then select the row which has a greater value in Col2.
Here is my sample table:
ID | Col2 |
----------------
123 | 1 |
123 | 2 |
1234 | 2 |
12345 | 3 |
Expected output:
ID | Col2 |
----------------
123 | 2 |
1234 | 2 |
12345 | 3 |
For this example, group by is sufficient:
select id, max(col2) as col2
from t
group by id;
If you want the whole row with the maximum col2 value, then I would often recommend row_number():
select t.*
from (select t.*, row_number() over (partition by id order by col2 desc) as seqnum
from t
) t
where seqnum = 1;
However, the "old-fashioned" method might have better performance:
select t.*
from t
where t.col2 = (select max(t2.col2) from t t2 where t2.id = t.id);
The NOT EXISTS operator can also be used:
SELECT * FROM Table1 t1
WHERE NOT EXISTS(
SELECT 'Anything' FROM Table1 t2
WHERE t1.id = t2.id
AND t1.Col2 < t2.col2
)
Demo: http://sqlfiddle.com/#!18/5e1d6/3
| ID | Col2 |
|-------|------|
| 123 | 2 |
| 1234 | 2 |
| 12345 | 3 |
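For comparison, the sketch below runs the row_number() and NOT EXISTS variants side by side on the sample table using Python's sqlite3 module. The table name t follows the earlier answers; the aliases ranked, t1 and t2 and the trailing ORDER BY id are additions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(123, 1), (123, 2), (1234, 2), (12345, 3)])

# Variant 1: number the rows per id, highest col2 first, and keep row 1.
window_query = """
SELECT id, col2
FROM (
    SELECT id, col2,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY col2 DESC) AS seqnum
    FROM t
) AS ranked
WHERE seqnum = 1
ORDER BY id
"""

# Variant 2: keep a row only if no other row with the same id has a larger col2.
not_exists_query = """
SELECT t1.id, t1.col2
FROM t AS t1
WHERE NOT EXISTS (
    SELECT 1 FROM t AS t2
    WHERE t2.id = t1.id AND t2.col2 > t1.col2
)
ORDER BY t1.id
"""

by_window = conn.execute(window_query).fetchall()
by_not_exists = conn.execute(not_exists_query).fetchall()
print(by_window)
print(by_not_exists)
```

The two agree on this data. They differ when an id has ties for the maximum col2: row_number() keeps exactly one row per id, while NOT EXISTS (like the correlated max() subquery) returns every tied row.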

display records based on ranks and also delete duplicated data

I have a table like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 2 | C | Y | 3 |
| | A | X | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
(5 rows)
I need output like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
so I wrote the query below:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1;
but it's not giving the proper output.
Try this:
select a.col1,a.col2,a.col3,max(a.rank) as rank
from [dbo].[5] a join [dbo].[5] b
on a.col1=b.col1 group by a.col1,a.col2,a.col3
Looks like you need aggregation with max() (the rank column is named rnk in these queries):
select
col1,col2,col3,
max(rnk)
from table1
group by col1,col2,col3
If you could have different values of col1 for one combination of col2, col3, then DISTINCT ON (a PostgreSQL feature) is what you need:
select distinct on (col2, col3)
col1,col2,col3,
rnk
from table1
order by col2, col3, rnk desc
The following should match what you are looking for:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1
WHERE col1 IS NOT NULL
GROUP BY 1, 2, 3, 4;
You can also use numeric column positions in your ORDER BY clause if you want one.
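Outside PostgreSQL, the effect of DISTINCT ON (col2, col3) can be approximated with ROW_NUMBER(). The sketch below tries this on the question's sample data through Python's sqlite3 module. The column is named rnk as in the answers above; the aliases ranked and rn are made up, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT, rnk INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, "A", "X", 4),
    (2, "C", "Y", 3),
    (2, "C", "Y", 3),
    (None, "A", "X", 3),   # the row with an empty col1
    (1, "B", "Z", 2),
])

query = """
SELECT col1, col2, col3, rnk
FROM (
    -- One numbered group per (col2, col3); the highest rnk gets rn = 1.
    SELECT col1, col2, col3, rnk,
           ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY rnk DESC) AS rn
    FROM table1
) AS ranked
WHERE rn = 1
ORDER BY rnk DESC
"""
deduped = conn.execute(query).fetchall()
for row in deduped:
    print(row)
```

Partitioning by (col2, col3) removes both the exact duplicate of the C/Y row and the A/X row with the empty col1, because each group keeps only its highest-ranked row.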