SQL: Select Most Recent Sequentially Distinct Value w/ Grouping - sql

I am having trouble writing a query that would select the last "new" sequentially distinct value (let's call this column Col A) grouped based on another column (Col B). Since this is a bit ambiguous/confusing, here is an example to explain (assume row number is indicative of sequence inside groups; in my issue the rows are ordered by date):
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | B | B |
Would select:
| 3 | C | A |
| 6 | B | B |
Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if table looked like this:
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | A | B | <--
Then we would want to select:
| 3 | C | A |
| 5 | A | B |
I think that this would be an easier problem if I wasn't concerned with values being distinct but not sequential. I'm not really sure how to even consider sequence when making a query.
I have attempted to solve this by calculating the min/max row numbers where each value of Col A appears. That calculation (using the second sample table) would produce a result like this:
|--------|--------|--------|--------|
| ColA | ColB | MinRow | MaxRow |
|--------|--------|--------|--------|
| A | A | 1 | 1 |
| B | A | 2 | 2 |
| C | A | 3 | 3 |
| A | B | 5 | 6 |
| B | B | 4 | 4 |
A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) went on a similar path, essentially taking the most recent RowNum which differs from the last ColA and then picks the next row. However, in that question I failed to address the need for the query to work for multiple groups, hence the new post.
Any help with this problem, if it is at all possible to do in SQL, would be greatly appreciated. I am running SQL 2008 SP4.

Hmmm . . . One method is to get the last value. Then choose all the last rows with that value and aggregate:
select min(rownum), colA, colB
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
)
group by colA, colB;
Or, without the aggregation:
select t.*
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA,
lag(colA) over (partition by colB order by rownum) as prev_clA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
) and
(prev_colA is null or prev_colA <> colA);
But in SQL Server 2008, let's treat this as a gaps-and-islands problem:
select t.*
from (select t.*,
min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
from (select t.*,
row_number() over (partition by colB order by rownum) as seqnum_b,
row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
max(rownum) over (partition by colB order by rownum) as max_rownum
from t
) t
) t
where rownum = min_rownum_group and -- first row in the group defined by adjacent colA, colB
max_rownum_group = max_rownum -- last group for each colB;
This identifies each of the groups using a difference of row numbers. It calculates the maximum rownum for the group and overall in the data. These are the same for the last group.

Related

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select not correctly handling the case when the address first changed, and then changed to the previous value. For example id=1: group by return:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
These values are the same only when the value is 1, which is the most recent address in the data. Then aggregation pulls back the earliest row in this group.
I answered my question myself, if anyone needs it, my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

delete all duplicated rows in sql

I have table like this
|-------------------------|
| A | B | C |
|-------------------------|
| 1 | 2 | 5 |
|-------------------------|
| 1 | 2 | 10 |
|-------------------------|
| 1 | 2 | 2 |
|-------------------------|
I need to delete all duplicated rows with equals A nad B value and lower C value
after running sql script i need to have only this row with top C Value for every equals A and B columns
|-------------------------|
| A | B | V |
|-------------------------|
| 1 | 2 | 10 |
|-------------------------|
One method is window functions:
select t.*
from (select t.*,
row_number() over (partition by a, b order by v desc) as seqnum
from t
) t
where seqnum = 1;
This returns the entire row, which can be handy if you want additional columns. If you really need just the three columns, then aggregation does what you want:
select a, b, max(v)
from t
group by a, b;
In standard SQL, you can keep only the maximum value using:
delete from t
where t.v < (select max(t2.v) from t t2 where t2.a = t.a and t2.b = t.b);

How to list the latest series with no gaps of a given clause?

Given the following example table:
+-----------+
| Id | Name |
+----+------+
| 1 | A |
| 2 | B |
| 3 | B |
| 4 | C |
| 5 | A |
| 6 | B |
| 7 | B |
| 8 | B |
| 9 | B |
| 10 | X |
+----+------+
I would like a query to get the following result:
+----+------+
| 6 | B |
| 7 | B |
| 8 | B |
| 9 | B |
+----+------+
The best query I could do was:
SELECT * FROM
(SELECT id, name, LEAD(id) OVER (ORDER BY id) t
FROM test WHERE name = 'B' ORDER BY id)
WHERE ID <> t-1;
sqlfiddle here
If you want the length and where it starts:
select min(id), max(id)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by name order by id) as seqnum_1
from test t
) t
where name = 'B'
group by (seqnum - seqnum_1)
order by min(id) desc
fetch first 1 row only;
You can join back to the table to get the original rows.
Another method using window functions to count the number of non-Bs after a given row . . . and then choose the first:
select t.*
from (select t.*,
dense_rank() over (order by nonbs_after asc) as grp
from (select t.*,
sum(case when name <> 'B' then 1 else 0 end) over (order by id desc) as nonbs_after
from test t
) t
where name = 'B'
) t
where grp = 1;
Here is a db<>fiddle.

How can I select each particular data up to a certain quantity?

How can I select each particular data upto a certain quantity. For example in the below table, there are 4 A, 4 B, 2 C and 1 D. Now I want to select all letters but not more than two each of it, Which will yield 2 A, 2 B, 2 C and 1 D.
+====+========+
| ID | Letter |
+====+========+
| 1 | A |
+----+--------+
| 2 | B |
+----+--------+
| 3 | B |
+----+--------+
| 4 | C |
+----+--------+
| 5 | A |
+----+--------+
| 6 | A |
+----+--------+
| 7 | C |
+----+--------+
| 8 | B |
+----+--------+
| 9 | B |
+----+--------+
| 10 | D |
+----+--------+
| 11 | A |
+----+--------+
Can anyone please help me for the above scenario?
I can think of a simple way:
select
case
when count(*) > 1
then 2
else count(*)
end,
second_column
from your_table
group by second_column;
This will give the result you want, but it won't really 'select ONLY two or less records' of each.
Using a ROW_NUMBER() function and a derived table:
CREATE TABLE myTable (id int, Letter varchar(1))
INSERT INTO myTable
VALUES (1,'A')
,(2,'B')
,(3,'B')
,(4,'C')
,(5,'A')
,(6,'A')
,(7,'C')
,(8,'B')
,(9,'B')
,(10,'D')
,(11,'A')
SELECT id, Letter
FROM
(SELECT *
,ROW_NUMBER() OVER(PARTITION BY Letter ORDER BY Letter) as rn
FROM myTable) myTable
WHERE rn = 1 or rn = 2
In essence, "cut" (PARTITION) the rows by Letters, and assign them each a number for its unique group, then pick the first two of each Letter.
Try it here:
http://rextester.com/WTKYCE51114
Use ROW_NUMBER() function to tag each record the row number and PARTITION it BY (grouping by) letter and ORDER it BY (id)
SELECT id,
letter
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY letter ORDER BY id) rnum
FROM myTable
) t
WHERE rnum <=2
Ordering it by id, you will have the first two instances of each letter in ascending order, thus you will have below result (note that id 1 and 5 are selected for A, 2 and 3 for B)
id letter
1 A
5 A
2 B
3 B
4 C
7 C
10 D

Frequency based sort in sql [duplicate]

This question already has answers here:
Order SQL query records by frequency
(2 answers)
Closed 8 years ago.
I have a problem with sorting sql tables.
I have this:
+------+------+
| col1 | col2 |
+------+------+
| a | 1 |
| b | 3 |
| c | 4 |
| d | 3 |
| e | 2 |
| f | 2 |
| g | 2 |
| h | 1 |
+------+------+
And i need to have this:
+------+------+
| col1 | col2 |
+------+------+
| e | 2 |
| f | 2 |
| g | 2 |
| a | 1 |
| h | 1 |
| b | 3 |
| d | 3 |
| c | 4 |
+------+------+
I tried with COUNT(), but it work only with GROUP OF that's why it isn't what i need.
Sorry for my bad english and thanks for all responses.
If database supports OVER clause then it is quite simple:
SELECT t.id, t.value
FROM t
ORDER BY count(*) over (partition by value) DESC
See SQL Fiddle - http://sqlfiddle.com/#!6/ce805/3
I see. You want to sort by the frequency of the values. Most dialects of SQL support window functions, so this does what you want:
select t.col1, t.col2
from (select t.*, count(*) over (partition by col2) as cnt
from table t
) t
order by cnt desc, col2;
Another way of writing this uses a join and aggregation:
select t.*
from table t join
(select col2, count(*) as cnt
from table t
group by col2
) tt
on t.col2 = tt.col2
order by tt.cnt desc, t.col2;
If I understand well your sort order, you want to first have the the rows with the most occurrences of Col2 value, etc...
Here is a suggestion for getting your result:
SELECT T.Col1, T.Col2
FROM YourTable T
ORDER BY (SELECT COUNT(*)
FROM YourTable T2
WHERE T2.Col2 = T.Col2) DESC, T.Col2 DESC, T.Col1 ASC
Hope this will help.