Frequency-based sort in SQL [duplicate]

I have a problem with sorting sql tables.
I have this:
+------+------+
| col1 | col2 |
+------+------+
| a | 1 |
| b | 3 |
| c | 4 |
| d | 3 |
| e | 2 |
| f | 2 |
| g | 2 |
| h | 1 |
+------+------+
And I need to get this:
+------+------+
| col1 | col2 |
+------+------+
| e | 2 |
| f | 2 |
| g | 2 |
| a | 1 |
| h | 1 |
| b | 3 |
| d | 3 |
| c | 4 |
+------+------+
I tried COUNT(), but it only works with GROUP BY, which collapses the rows, so it isn't what I need.
Sorry for my bad English, and thanks for all responses.

If your database supports the OVER clause, it is quite simple:
SELECT t.id, t.value
FROM t
ORDER BY count(*) over (partition by value) DESC
See SQL Fiddle - http://sqlfiddle.com/#!6/ce805/3

I see. You want to sort by the frequency of the values. Most dialects of SQL support window functions, so this does what you want:
select t.col1, t.col2
from (select t.*, count(*) over (partition by col2) as cnt
from table t
) t
order by cnt desc, col2;
Another way of writing this uses a join and aggregation:
select t.*
from table t join
(select col2, count(*) as cnt
from table t
group by col2
) tt
on t.col2 = tt.col2
order by tt.cnt desc, t.col2;
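The join-and-aggregation variant can be tried without a database server. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a table named t (an illustrative name, not from the question). The extra col1 tie-breaker is also an addition, there only to make the output order fully deterministic.

```python
import sqlite3

# Sample rows from the question; the table name "t" is an illustrative choice.
rows = [("a", 1), ("b", 3), ("c", 4), ("d", 3),
        ("e", 2), ("f", 2), ("g", 2), ("h", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Count each col2 value once, join the counts back, and sort by them.
# col2 and col1 are tie-breakers so that equal counts sort predictably.
query = """
    SELECT t.col1, t.col2
    FROM t
    JOIN (SELECT col2, COUNT(*) AS cnt
          FROM t
          GROUP BY col2) AS tt ON t.col2 = tt.col2
    ORDER BY tt.cnt DESC, t.col2, t.col1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the sample data this lists e, f, g first (value 2 occurs three times), then a, h, then b, d, and finally c, which matches the order requested in the question.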

If I understand your sort order correctly, you want the rows whose Col2 value occurs most often to come first, then the next most frequent, and so on.
Here is a suggestion for getting your result:
SELECT T.Col1, T.Col2
FROM YourTable T
ORDER BY (SELECT COUNT(*)
FROM YourTable T2
WHERE T2.Col2 = T.Col2) DESC, T.Col2 ASC, T.Col1 ASC
Hope this will help.

Related

SQL: Select Most Recent Sequentially Distinct Value w/ Grouping

I am having trouble writing a query that would select the last "new" sequentially distinct value (let's call this column Col A) grouped based on another column (Col B). Since this is a bit ambiguous/confusing, here is an example to explain (assume row number is indicative of sequence inside groups; in my issue the rows are ordered by date):
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | B | B |
Would select:
| 3 | C | A |
| 6 | B | B |
Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if the table looked like this:
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | A | B | <--
Then we would want to select:
| 3 | C | A |
| 5 | A | B |
I think that this would be an easier problem if I wasn't concerned with values being distinct but not sequential. I'm not really sure how to even consider sequence when making a query.
I have attempted to solve this by calculating the min/max row numbers where each value of Col A appears. That calculation (using the second sample table) would produce a result like this:
|--------|--------|--------|--------|
| ColA | ColB | MinRow | MaxRow |
|--------|--------|--------|--------|
| A | A | 1 | 1 |
| B | A | 2 | 2 |
| C | A | 3 | 3 |
| A | B | 5 | 6 |
| B | B | 4 | 4 |
A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) went down a similar path, essentially taking the most recent RowNum whose ColA differs from the last ColA and then picking the next row. However, in that question I failed to address the need for the query to work for multiple groups, hence the new post.
Any help with this problem, if it is at all possible to do in SQL, would be greatly appreciated. I am running SQL 2008 SP4.
Hmmm . . . One method is to get the last value. Then choose all the last rows with that value and aggregate:
select min(rownum), colA, colB
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
)
group by colA, colB;
Or, without the aggregation:
select t.*
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA,
lag(colA) over (partition by colB order by rownum) as prev_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
) and
(prev_colA is null or prev_colA <> colA);
SQL Server 2008 does not support first_value() or lag() (both were added in SQL Server 2012), so there let's treat this as a gaps-and-islands problem:
select t.*
from (select t.*,
min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
from (select t.*,
row_number() over (partition by colB order by rownum) as seqnum_b,
row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
max(rownum) over (partition by colB) as max_rownum
from t
) t
) t
where rownum = min_rownum_group and -- first row in the group defined by adjacent colA, colB
max_rownum_group = max_rownum; -- last group for each colB
This identifies each group of adjacent rows using a difference of row numbers. It then calculates the maximum rownum for each group and for each colB as a whole; the two are equal only for the last group.
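To see why the difference of row numbers isolates each run, the same logic can be traced in plain Python. This is only an illustrative sketch of the idea, not a replacement for the query; the function name last_new_value_rows and the tuple layout (rownum, colA, colB) are assumptions made for the example.

```python
from itertools import groupby

# Second sample table from the question: (rownum, colA, colB).
rows = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
        (4, "B", "B"), (5, "A", "B"), (6, "A", "B")]

def last_new_value_rows(rows):
    """Return, per colB, the first row of the final run of equal colA values."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))  # by colB, then rownum
    for col_b, partition in groupby(ordered, key=lambda r: r[2]):
        partition = list(partition)
        seq_ab = {}   # per-colA counter, plays the role of seqnum_ab
        islands = {}  # (colA, seqnum_b - seqnum_ab) -> rows of that island
        for seq_b, row in enumerate(partition, start=1):
            seq_ab[row[1]] = seq_ab.get(row[1], 0) + 1
            islands.setdefault((row[1], seq_b - seq_ab[row[1]]), []).append(row)
        max_rownum = partition[-1][0]
        for island in islands.values():
            if island[-1][0] == max_rownum:  # the island that ends the partition
                result.append(island[0])     # keep its first row
    return result

selected = last_new_value_rows(rows)
print(selected)
```

For the second sample table this selects rows 3 and 5; for the first sample table (row 6 holding B instead of A) it selects rows 3 and 6, as the question requires.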

Rank with current order

I have table like this:
col1 | col2
__________________
15077244 | 544648
15077320 | 544648
15080285 | 544632
15382858 | 544648
15584221 | 544648
15584222 | 544648
15584223 | 544628
15584224 | 544628
15584225 | 544628
15584226 | 544628
15584227 | 544632
15584228 | 544632
I want to rank the rows so that the rank increases each time a new col2 value appears, as in the example below (this is what I want to achieve):
col1 | col2 | rank
________________________
15077244 | 544648 | 1
15077320 | 544648 | 1
15080285 | 544632 | 2
15382858 | 544648 | 1
15584221 | 544648 | 1
15584222 | 544648 | 1
15584223 | 544628 | 3
15584224 | 544628 | 3
15584225 | 544628 | 3
15584226 | 544628 | 3
15584227 | 544632 | 2
15584228 | 544632 | 2
I found an answer that suggested using the DENSE_RANK() function, so I tried it:
SELECT col1, col2, DENSE_RANK() OVER(ORDER BY col2) as rank
FROM myTable
but when I use it, it changes the order of col1, like this:
col1 | col2 | rank
____________________________
15584223 | 544628 | 1
15584224 | 544628 | 1
15584225 | 544628 | 1
15584226 | 544628 | 1
15080285 | 544632 | 2
15584227 | 544632 | 2
15584228 | 544632 | 2
15077244 | 544648 | 3
15077320 | 544648 | 3
15382858 | 544648 | 3
15584221 | 544648 | 3
15584222 | 544648 | 3
When I add ORDER BY col1 at the end of my SELECT query, the rows come back in the correct order but the ranks are wrong: for example, col2 value 544648 gets rank 3 when it should get rank 1.
How can I use DENSE_RANK(), or something else, to rank my col2 values without changing the order of the data?
You need to change your order for dense_rank to desc. And order the results by col1 asc.
SELECT
col1
, col2
, DENSE_RANK() OVER(ORDER BY col2 DESC) as rank
FROM myTable
ORDER BY col1 ASC
While there may be an easier solution, here's one approach using a subquery with row_number to establish a grouping of results, ordering by min(col1):
SELECT t.col1, t.col2, t2.rank
FROM myTable t JOIN (
SELECT MIN(col1) minCol1, col2, ROW_NUMBER() OVER (ORDER BY MIN(col1)) rank
FROM myTable
GROUP BY col2
) t2 ON t.col2 = t2.col2
ORDER BY t.col1
You can use a correlated subquery combined with a windowed minimum.
;WITH CorrelatedDistinctCount AS
(
SELECT
D.col1,
D.col2,
(
SELECT
COUNT(DISTINCT(X.col2))
FROM
Data X
WHERE
X.col1 <= D.col1) AS DistinctCol2Count
FROM
Data D
)
SELECT
C.col1,
C.col2,
MIN(C.DistinctCol2Count) OVER (PARTITION BY C.col2) AS rank
FROM
CorrelatedDistinctCount C
ORDER BY
C.col1 ASC
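The rule behind the desired output appears to be "rank each col2 value by the order in which it first appears when the rows are sorted by col1", which is what the MIN(col1) and ROW_NUMBER() answer computes. Note that DENSE_RANK() with ORDER BY col2 DESC produces the same ranks here only because the sample values happen to appear in descending order. A minimal sketch of the rule in plain Python (the function name rank_by_first_appearance is made up for this illustration):

```python
# Sample data from the question: (col1, col2).
rows = [(15077244, 544648), (15077320, 544648), (15080285, 544632),
        (15382858, 544648), (15584221, 544648), (15584222, 544648),
        (15584223, 544628), (15584224, 544628), (15584225, 544628),
        (15584226, 544628), (15584227, 544632), (15584228, 544632)]

def rank_by_first_appearance(rows):
    """Rank each col2 value by when it first shows up in col1 order."""
    first_seen = {}  # col2 -> rank assigned at its first appearance
    ranked = []
    for col1, col2 in sorted(rows):
        # A new col2 value gets the next rank; a known one keeps its rank.
        rank = first_seen.setdefault(col2, len(first_seen) + 1)
        ranked.append((col1, col2, rank))
    return ranked

ranked = rank_by_first_appearance(rows)
for row in ranked:
    print(row)
```

The rows stay in col1 order, and 544648, 544632 and 544628 receive ranks 1, 2 and 3 respectively.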

In SQL is there a way to partition by a value if it's not continuous

I would like to rank values within a partition defined by two columns. col1 will be the key, and col2 will be a value that is also used in ORDER BY. I would like to start a new partition whenever the sequence of col2 values is broken. For example, I would like to produce the following:
+------+------+------+
| col1 | col2 | rank |
+------+------+------+
| a | 1 | 1 |
| a | 2 | 2 |
| a | 3 | 3 |
| a | 9 | 1 |
| a | 10 | 2 |
| b | 1 | 1 |
| b | 2 | 2 |
| b | 8 | 1 |
+------+------+------+
I'm thinking of something along the lines of
SELECT col1, RANK() OVER (PARTITION BY col1, SOMETHING HERE??? ORDER BY col2 DESC)
Does anyone have any ideas?
If I understand correctly, you want to enumerate by "islands" of adjoining sequential values. You can do so with a simple observation: subtracting a sequence from col2 will be constant for each group. So, let's use this observation:
select t.*,
row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
(col2 - row_number() over (partition by col1 order by col2)) as grp
from t
) t
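The observation that col2 minus a running sequence number stays constant inside each island can be checked by hand. Below is a small Python sketch of that arithmetic, assuming col2 holds distinct integers within each col1; the function name rank_within_islands is hypothetical.

```python
from itertools import groupby

# Sample data from the question: (col1, col2).
rows = [("a", 1), ("a", 2), ("a", 3), ("a", 9), ("a", 10),
        ("b", 1), ("b", 2), ("b", 8)]

def rank_within_islands(rows):
    """Number rows 1, 2, 3, ... inside each run of consecutive col2 values."""
    ranked = []
    for col1, partition in groupby(sorted(rows), key=lambda r: r[0]):
        # col2 minus its position is constant within a run of consecutive values.
        keyed = [(col2 - pos, col2)
                 for pos, (_, col2) in enumerate(partition, start=1)]
        for _, island in groupby(keyed, key=lambda k: k[0]):
            for rnk, (_, col2) in enumerate(island, start=1):
                ranked.append((col1, col2, rnk))
    return ranked

ranked = rank_within_islands(rows)
for row in ranked:
    print(row)
```

For key a, the differences are 0, 0, 0, 5, 5, so the values 1 to 3 form one island and 9 to 10 another, and the numbering restarts at 9.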

Getting the last updated name

I have a table with records like this:
+------+------+
| ID | name |
+------+------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | A |
| 5 | B |
| 6 | A |
| 7 | A |
| 8 | A |
+------+------+
I need to get the row where the name last changed to A from a different value; in this example that would be the row with ID 6.
Try this query (MySQL syntax):
select min(ID)
from records
where name = 'A'
and ID >=
(
select max(ID)
from records
where name <> 'A'
);
Illustration:
select * from records;
+------+------+
| ID | name |
+------+------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | A |
| 5 | B |
| 6 | A |
| 7 | A |
| 8 | A |
+------+------+
-- run query:
+---------+
| min(ID) |
+---------+
| 6 |
+---------+
Using the Lag function...
SELECT Max([ID])
FROM (SELECT [name], [ID],
Lag([name]) OVER (ORDER BY [ID]) AS PrvVal
FROM tablename) tbl
WHERE [name] = 'A'
AND prvval <> 'A'
Online Demo: http://www.sqlfiddle.com/#!18/a55eb/2/0
If you want to get the whole row, you can do this...
SELECT Top 1 *
FROM (SELECT [name], [ID],
Lag([name]) OVER (ORDER BY [ID]) AS PrvVal
FROM tablename) tbl
WHERE [name] = 'A' AND prvval <> 'A'
ORDER BY [ID] DESC
Online Demo: http://www.sqlfiddle.com/#!18/a55eb/22/0
The ANSI SQL below uses a self-join on the previous id.
And the where-clause gets those with a name that's different from the previous.
select max(t1.ID) as ID
from YourTable as t1
left join YourTable as t2 on t1.ID = t2.ID+1
where (t1.name <> t2.name or t2.name is null)
and t1.name = 'A';
It should work on most RDBMSs, including MS SQL Server.
Note that joining on ID+1 assumes there are no gaps between the IDs.
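All of the answers above compare each row with the one before it. As a rough sketch of that comparison outside SQL, the following Python walks the rows in ID order and remembers the last place where the name switched to the target. It compares with the previous row in sort order rather than with ID-1, so gaps in the IDs do not matter. The function name last_change_to is an assumption for this example.

```python
# Sample data from the question: (ID, name).
rows = [(1, "A"), (2, "B"), (3, "C"), (4, "A"),
        (5, "B"), (6, "A"), (7, "A"), (8, "A")]

def last_change_to(rows, target):
    """Return the ID of the last row where name switched to `target`, or None."""
    last_id = None
    prev_name = None
    for row_id, name in sorted(rows):
        # A change happens when this row is the target and the previous is not.
        if name == target and name != prev_name:
            last_id = row_id
        prev_name = name
    return last_id

print(last_change_to(rows, "A"))
```

Like the "t2.name is null" branch of the self-join, this treats a target value in the very first row as a change.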

display records based on ranks and also delete duplicated data

I have a table like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 2 | C | Y | 3 |
| | A | X | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
(5 rows)
I need output like this:
+------+------+------+------+
| col1 | col2 | col3 | rank |
+------+------+------+------+
| 1 | A | X | 4 |
| 2 | C | Y | 3 |
| 1 | B | Z | 2 |
+------+------+------+------+
So I wrote the query below:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1;
but it's not giving the expected output.
Try this:
select a.col1,a.col2,a.col3,max(a.rank) as rank
from [dbo].[5] a join [dbo].[5] b
on a.col1=b.col1 group by a.col1,a.col2,a.col3
It looks like you need aggregation with max():
select
col1,col2,col3,
max(rnk)
from table1
group by col1,col2,col3
If you could have different values of col1 for one combination of col2, col3, then distinct on (a PostgreSQL feature) is what you need:
select distinct on (col2, col3)
col1,col2,col3,
rnk
from table1
order by col2, col3, rnk desc
The following should match what you are looking for:
select col1,col2,col3,rank,dense_rank() over(order by rank desc) from table1
WHERE col1 IS NOT NULL
GROUP BY 1, 2, 3, 4;
You can also use the same numeric column positions in an ORDER BY clause if you want one.
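For databases without distinct on, the "keep the highest-ranked row per (col2, col3)" idea from the answers above can be expressed with a join against the per-group maximum. The sketch below assumes Python's built-in sqlite3 module and names the rank column rnk, as two of the answers do; whether col1 should be ignored when grouping is a guess based on the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT, rnk INTEGER)"
)
# Sample rows from the question; the blank col1 is stored as NULL.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "A", "X", 4), (2, "C", "Y", 3), (2, "C", "Y", 3),
     (None, "A", "X", 3), (1, "B", "Z", 2)],
)

# Find the highest rnk per (col2, col3), join back to fetch the full row,
# and use DISTINCT to collapse rows that are exact duplicates.
query = """
    SELECT DISTINCT t.col1, t.col2, t.col3, t.rnk
    FROM table1 AS t
    JOIN (SELECT col2, col3, MAX(rnk) AS max_rnk
          FROM table1
          GROUP BY col2, col3) AS m
      ON t.col2 = m.col2 AND t.col3 = m.col3 AND t.rnk = m.max_rnk
    ORDER BY t.rnk DESC
"""
deduped = conn.execute(query).fetchall()
for row in deduped:
    print(row)
```

This drops both the exact duplicate (2, C, Y, 3) and the lower-ranked (NULL, A, X, 3) row, leaving the three rows shown in the expected output.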