In SQL is there a way to partition by a value if it's not continuous - sql

I would like to rank the values over a partition defined by two columns. col1 will be the key, and col2 will be a value that is also used in ORDER BY. I would like to start a new partition only when the col2 sequence is interrupted. For example, I would like to produce the following:
+------+------+------+
| col1 | col2 | rank |
+------+------+------+
| a | 1 | 1 |
| a | 2 | 2 |
| a | 3 | 3 |
| a | 9 | 1 |
| a | 10 | 2 |
| b | 1 | 1 |
| b | 2 | 2 |
| b | 8 | 1 |
+------+------+------+
I am thinking of something along the lines of:
SELECT col1, RANK() OVER (PARTITION BY col1, SOMETHING HERE??? ORDER BY col2 DESC)
Does anyone have any ideas?

If I understand correctly, you want to enumerate by "islands" of adjoining sequential values. You can do so with a simple observation: the difference between col2 and a row number ordered by col2 is constant within each island. So, let's use this observation:
select t.*,
row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
(col2 - row_number() over (partition by col1 order by col2)) as grp
from t
) t
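The grp trick above can be checked on the question's sample data. The following is a minimal sketch, assuming SQLite (3.25 or later, for window functions) driven from Python's sqlite3 module; the table name t and the in-memory database are only for illustration.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("a", 9), ("a", 10),
     ("b", 1), ("b", 2), ("b", 8)],
)

# grp = col2 minus its row number; it stays constant while col2 is consecutive.
query = """
SELECT col1, col2, grp,
       ROW_NUMBER() OVER (PARTITION BY col1, grp ORDER BY col2) AS rnk
FROM (SELECT col1, col2,
             col2 - ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS grp
      FROM t) AS s
ORDER BY col1, col2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For key a, grp comes out as 0 for the values 1, 2, 3 and as 5 for 9, 10, so the numbering restarts at 9.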

Related

Rank with current order

I have table like this:
col1 | col2
__________________
15077244 | 544648
15077320 | 544648
15080285 | 544632
15382858 | 544648
15584221 | 544648
15584222 | 544648
15584223 | 544628
15584224 | 544628
15584225 | 544628
15584226 | 544628
15584227 | 544632
15584228 | 544632
And I want to rank the rows as the col2 value changes, as in the example below (this is what I want to achieve):
col1 | col2 | rank
________________________
15077244 | 544648 | 1
15077320 | 544648 | 1
15080285 | 544632 | 2
15382858 | 544648 | 1
15584221 | 544648 | 1
15584222 | 544648 | 1
15584223 | 544628 | 3
15584224 | 544628 | 3
15584225 | 544628 | 3
15584226 | 544628 | 3
15584227 | 544632 | 2
15584228 | 544632 | 2
I found an answer that suggested using the DENSE_RANK() function, so I used it:
SELECT col1, col2, DENSE_RANK() OVER(ORDER BY col2) as rank
FROM myTable
but when I use it, it changes the order of col1, like this:
col1 | col2 | rank
____________________________
15584223 | 544628 | 1
15584224 | 544628 | 1
15584225 | 544628 | 1
15584226 | 544628 | 1
15080285 | 544632 | 2
15584227 | 544632 | 2
15584228 | 544632 | 2
15077244 | 544648 | 3
15077320 | 544648 | 3
15382858 | 544648 | 3
15584221 | 544648 | 3
15584222 | 544648 | 3
Now when I add ORDER BY col1 at the end of my SELECT query, the data is in the correct order but the ranks are wrong, because, for example, col2 value 544648 has rank 3 but it should have rank 1.
How can I use DENSE_RANK, or something else, to rank my col2 values without changing the order of the data?
You need to change the order for dense_rank to desc and order the results by col1 asc. Note that this matches the desired ranks only because the col2 values in the sample first appear in descending order.
SELECT
col1
, col2
, DENSE_RANK() OVER(ORDER BY col2 DESC) as rank
FROM myTable
ORDER BY col1 ASC
While there may be an easier solution, here's one approach using a subquery with row_number to establish a grouping of results, ordering by min(col1):
SELECT t.col1, t.col2, t2.rank
FROM myTable t JOIN (
SELECT MIN(col1) minCol1, col2, ROW_NUMBER() OVER (ORDER BY MIN(col1)) rank
FROM myTable
GROUP BY col2
) t2 ON t.col2 = t2.col2
ORDER BY t.col1
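The "rank by first appearance" idea above can be tried on the question's data. This is a sketch under assumptions: SQLite via Python's sqlite3, the rank alias renamed to rnk to avoid the keyword, and the aggregation moved into an inner subquery (named first_seen here) so the row numbering runs over an already grouped result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(15077244, 544648), (15077320, 544648), (15080285, 544632),
     (15382858, 544648), (15584221, 544648), (15584222, 544648),
     (15584223, 544628), (15584224, 544628), (15584225, 544628),
     (15584226, 544628), (15584227, 544632), (15584228, 544632)],
)

# Each col2 value is numbered by the smallest col1 at which it occurs.
query = """
SELECT t.col1, t.col2, t2.rnk
FROM myTable t
JOIN (SELECT col2, ROW_NUMBER() OVER (ORDER BY first_col1) AS rnk
      FROM (SELECT col2, MIN(col1) AS first_col1
            FROM myTable
            GROUP BY col2) AS first_seen) AS t2
  ON t.col2 = t2.col2
ORDER BY t.col1
"""
rows = conn.execute(query).fetchall()
ranks = [r[2] for r in rows]
print(ranks)
```

Unlike DENSE_RANK() with a descending order, this does not depend on the col2 values themselves being in any particular order.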
You can use a correlated subquery together with a windowed minimum.
;WITH CorrelatedDistinctCount AS
(
SELECT
D.col1,
D.col2,
(
SELECT
COUNT(DISTINCT(X.col2))
FROM
Data X
WHERE
X.col1 <= D.col1) AS DistinctCol2Count
FROM
Data D
)
SELECT
C.col1,
C.col2,
MIN(C.DistinctCol2Count) OVER (PARTITION BY C.col2) AS rank
FROM
CorrelatedDistinctCount C
ORDER BY
C.col1 ASC

How to ignore nulls in PostgreSQL window functions? or return the next non-null value in a column

Let's say I have the following table:
| User_id | COL1 | COL2 |
+---------+----------+------+
| 1 | | 1 |
| 1 | | 2 |
| 1 | 2421 | |
| 1 | | 1 |
| 1 | 3542 | |
| 2 | | 1 |
I need another column indicating the next non-null COL1 value for each row, so the result would look like the below:
| User_id | COL1 | COL2 | COL3 |
+---------+----------+------+------+
| 1 | | 1 | 2421 |
| 1 | | 2 | 2421 |
| 1 | 2421 | | |
| 1 | | 1 | 3542 |
| 1 | 3542 | | |
| 2 | | 1 | |
SELECT
first_value(COL1 ignore nulls) over (partition by user_id order by COL2 rows between current row and unbounded following)
FROM table;
would work but I'm using PostgreSQL which doesn't support the ignore nulls clause.
Any suggested workarounds?
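You can still do it with a window function if you add a CASE WHEN expression to the ORDER BY, like this:
select
first_value(COL1)
over (
partition by user_id
order by case when COL1 is not null then 0 else 1 end ASC, COL2
rows between unbounded preceding and unbounded following
)
from table
This sorts the non-null values first, so first_value returns a non-null value whenever the partition contains one. Note that it returns the same value for every row of the partition, not the next non-null value after each row.
However, performance will probably not be great compared to ignore nulls, because the database has to sort on the additional criterion.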
I also had the same problem. The other solutions may work, but they require building multiple windows for each row I need.
You can try these snippets: https://wiki.postgresql.org/wiki/First/last_(aggregate)
Once you create the aggregates, you can use them:
SELECT
first(COL1) over (partition by user_id order by COL2 rows between current row and unbounded following)
FROM table;
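There is always the tried and true approach of using a correlated subquery (this assumes a column id that defines the row order):
select t.*,
(select t2.col1
from t t2
where t2.id >= t.id and t2.col1 is not null
order by t2.id asc -- ascending, so the nearest following non-null value comes first
fetch first 1 row only
) as nextcol1
from t;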
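The correlated-subquery approach can be sketched against the question's sample rows. The assumptions here are mine: SQLite via Python's sqlite3, an id column added to define the row order (the question's table shows none), LIMIT 1 in place of fetch first, a match on user_id, and a CASE wrapper so that rows which already hold a value get NULL, as in the question's expected COL3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical ordering column; the question's table does not show one.
conn.execute("CREATE TABLE t (id INTEGER, user_id INTEGER, col1 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, None), (2, 1, None), (3, 1, 2421),
     (4, 1, None), (5, 1, 3542), (6, 2, None)],
)

# For each row with a NULL col1, take the nearest later non-null col1
# belonging to the same user.
query = """
SELECT t.id, t.col1,
       CASE WHEN t.col1 IS NULL THEN
            (SELECT t2.col1
             FROM t AS t2
             WHERE t2.user_id = t.user_id
               AND t2.id > t.id
               AND t2.col1 IS NOT NULL
             ORDER BY t2.id ASC
             LIMIT 1)
       END AS nextcol1
FROM t
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
next_values = [r[2] for r in rows]
print(next_values)
```

The ascending order inside the subquery is what makes it pick the next value rather than the last one.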
Hope this helps,
SELECT * FROM TABLE ORDER BY COALESCE(colA, colB);
which orders by colA and if colA has NULL value it orders by colB.
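You can use the COALESCE() function. For your query:
SELECT
first_value(COALESCE(COL1)) over (partition by user_id order by COL2 rows between current row and unbounded following)
FROM table;
Note that COALESCE with a single argument, where the dialect accepts it at all, returns that argument unchanged, so it does not skip nulls by itself.
However, I don't understand the reason for sorting by COL2, because these rows have a null value for COL2: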
| User_id | COL1 | COL2 |
+---------+----------+------+
| 1 | | 1 |
| 1 | | 2 |
| 1 | 2421 | | <<--- null?
| 1 | | 1 |
| 1 | 3542 | | <<--- null?
| 2 | | 1 |

Frequency based sort in sql [duplicate]

I have a problem with sorting SQL tables.
I have this:
+------+------+
| col1 | col2 |
+------+------+
| a | 1 |
| b | 3 |
| c | 4 |
| d | 3 |
| e | 2 |
| f | 2 |
| g | 2 |
| h | 1 |
+------+------+
And I need to have this:
+------+------+
| col1 | col2 |
+------+------+
| e | 2 |
| f | 2 |
| g | 2 |
| a | 1 |
| h | 1 |
| b | 3 |
| d | 3 |
| c | 4 |
+------+------+
I tried COUNT(), but it works only with GROUP BY, so it isn't what I need.
Sorry for my bad English, and thanks for all responses.
If your database supports the OVER clause, then it is quite simple:
SELECT t.id, t.value
FROM t
ORDER BY count(*) over (partition by value) DESC
See SQL Fiddle - http://sqlfiddle.com/#!6/ce805/3
I see. You want to sort by the frequency of the values. Most dialects of SQL support window functions, so this does what you want:
select t.col1, t.col2
from (select t.*, count(*) over (partition by col2) as cnt
from table t
) t
order by cnt desc, col2;
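The windowed count can be checked against the question's table. This is a small sketch, assuming SQLite via Python's sqlite3; the table name t and the extra col1 tie-breaker (added only to make the output deterministic) are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("b", 3), ("c", 4), ("d", 3),
     ("e", 2), ("f", 2), ("g", 2), ("h", 1)],
)

# cnt is the number of rows sharing the same col2; the most frequent come first.
query = """
SELECT col1, col2
FROM (SELECT col1, col2, COUNT(*) OVER (PARTITION BY col2) AS cnt
      FROM t) AS counted
ORDER BY cnt DESC, col2, col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ties on the count (values 1 and 3 both occur twice) are broken by col2 ascending, which is what puts 1 ahead of 3 in the expected output.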
Another way of writing this uses a join and aggregation:
select t.*
from table t join
(select col2, count(*) as cnt
from table t
group by col2
) tt
on t.col2 = tt.col2
order by tt.cnt desc, t.col2;
If I understand your sort order correctly, you want the rows with the most occurrences of the Col2 value first, and so on.
Here is a suggestion for getting your result:
SELECT T.Col1, T.Col2
FROM YourTable T
ORDER BY (SELECT COUNT(*)
FROM YourTable T2
WHERE T2.Col2 = T.Col2) DESC, T.Col2 ASC, T.Col1 ASC
Hope this will help.

display records based on ranks and also delete duplicated data

I have a table like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 2 | C | Y | 3 |
| | A | X | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
(5 rows)
I need output like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
so I wrote the query below:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1;
but it is not giving the proper output.
Try this:
select a.col1,a.col2,a.col3,max(a.rank) as rank
from [dbo].[5] a join [dbo].[5] b
on a.col1=b.col1 group by a.col1,a.col2,a.col3
It looks like you need aggregation with max():
select
col1,col2,col3,
max(rnk)
from table1
group by col1,col2,col3
If you could have different values of col1 for one combination of col2, col3, then distinct on is what you need:
select distinct on (col2, col3)
col1,col2,col3,
rnk
from table1
order by col2, col3, rnk desc
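distinct on is specific to PostgreSQL. As a hedged sketch of the same idea in a more portable form, the following swaps in ROW_NUMBER() (a different technique from the answer above) and runs it in SQLite via Python's sqlite3; the rank column is named rnk, as in the answer's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT, rnk INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "A", "X", 4), (2, "C", "Y", 3), (2, "C", "Y", 3),
     (None, "A", "X", 3), (1, "B", "Z", 2)],
)

# Keep only the highest-ranked row for each (col2, col3) combination.
query = """
SELECT col1, col2, col3, rnk
FROM (SELECT col1, col2, col3, rnk,
             ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY rnk DESC) AS rn
      FROM table1) AS ranked
WHERE rn = 1
ORDER BY rnk DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the partition ignores col1, the row with the empty col1 falls into the same group as (1, A, X) and loses to it on rank.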
The following should match what you are looking for:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1
WHERE col1 IS NOT NULL
GROUP BY 1, 2, 3, 4;
You can also use numeric positions (1, 2, ...) in your ORDER BY clause if you want one.

Select duplicate rows

I have data like this:
| col1 |
--------
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
| 1 |
| 2 |
How can I get a result like this, ordered from max to min:
| col1 |
--------
| 2 |
| 1 |
I tried this:
SELECT col1 , count(col1 ) FROM myTable GROUP BY col1
But I got strange results.
If you want to order by the count of occurrences of each value:
SELECT col1, count(1) FROM myTable GROUP BY col1 ORDER BY count(1) DESC
If you want to order by the actual value contained in col1:
SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC
You can use the SQL DISTINCT keyword to only show unique results.
SELECT DISTINCT col1 FROM myTable;
You can then order by that column.
SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC;
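Both variants from the answers can be compared side by side. This is a minimal sketch, assuming SQLite via Python's sqlite3; the col1 tie-breaker in the count query is an addition, since both values occur equally often in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER)")
# Five copies each of 1 and 2, alternating as in the question.
conn.executemany("INSERT INTO myTable VALUES (?)", [(v,) for v in [1, 2] * 5])

# Unique values only, largest first.
distinct_desc = conn.execute(
    "SELECT DISTINCT col1 FROM myTable ORDER BY col1 DESC"
).fetchall()

# One row per value together with how often it occurs.
by_count = conn.execute(
    "SELECT col1, COUNT(*) AS n FROM myTable "
    "GROUP BY col1 ORDER BY n DESC, col1 DESC"
).fetchall()

print(distinct_desc)
print(by_count)
```

The GROUP BY query from the question was not wrong as such; it simply returns the count as a second column and leaves the row order unspecified without an ORDER BY.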