How to get first and last row of each island? - sql

So I recently got excellent help on a problem. However, I need to get a little more precise, which hopefully is doable in SQL.
This was my last problem:
Select only rows that has a column changed from the rows before it, given an unique ID
Clarification:
The answer to that question gave me the start of every island. However, I want both the start and the end of every island.
Here is what I mean:
personID | status | unixtime | column d | column e | column f
1        | 2      | 213214   | x        | y        | z
1        | 2      | 213325   | x        | y        | z
1        | 2      | 213326   | x        | y        | z
1        | 2      | 213327   | x        | y        | z
1        | 2      | 213328   | x        | y        | z   <-- I want this
1        | 3      | 214330   | x        | y        | z   <-- Any of this is OK
1        | 3      | 214331   | x        | y        | z
1        | 3      | 214332   | x        | y        | z   <-- I want this or
1        | 2      | 324543   | x        | y        | z   <-- I want this
So instead of the start of each island, I want its end. Getting a row in between is totally OK, but preferably it would be the end. What I really want are the rows right before and right after each status change, if that makes sense. This could be for a specific status.

This query returns every row that either starts or ends an island (or both, for a single-row island):
SELECT *
FROM (
   SELECT *
        , lag(status)  OVER w IS DISTINCT FROM status AS partition_start
        , lead(status) OVER w IS DISTINCT FROM status AS partition_end
   FROM   tbl
   WINDOW w AS (PARTITION BY personID ORDER BY unixtime)
   ) sub
WHERE (partition_start OR partition_end)
ORDER BY personID, unixtime;
db<>fiddle here
Note that with PARTITION BY personID, rows with a different personID do not interrupt the "island". I added rows to your test case in the fiddle to demonstrate the effect.
If your requirements are different, you'll have to define how.
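A minimal sketch for trying this query without a PostgreSQL server, using Python's built-in sqlite3 module. It assumes the bundled SQLite supports window functions (3.25 or newer), swaps IS DISTINCT FROM for SQLite's NULL-safe IS NOT, and loads only personID, status and unixtime from the sample above into a table named tbl; the function name island_edges is made up for illustration.

```python
import sqlite3

# personID, status, unixtime from the question's sample (columns d, e, f omitted).
ROWS = [
    (1, 2, 213214), (1, 2, 213325), (1, 2, 213326), (1, 2, 213327),
    (1, 2, 213328), (1, 3, 214330), (1, 3, 214331), (1, 3, 214332),
    (1, 2, 324543),
]

QUERY = """
SELECT personID, status, unixtime
FROM (
    SELECT *,
           -- IS NOT is SQLite's NULL-safe inequality (like IS DISTINCT FROM),
           -- so the missing lag/lead at a partition edge counts as a change.
           (lag(status)  OVER w) IS NOT status AS island_start,
           (lead(status) OVER w) IS NOT status AS island_end
    FROM tbl
    WINDOW w AS (PARTITION BY personID ORDER BY unixtime)
) sub
WHERE island_start OR island_end
ORDER BY personID, unixtime
"""


def island_edges(rows):
    """Return the first and last row of every status island."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (personID INTEGER, status INTEGER, unixtime INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

island_edges(ROWS) should return five rows: 213214 and 213328 for the first run of status 2, 214330 and 214332 for the status 3 run, and 324543, which is both the start and the end of the last single-row island.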

select t.*
from (select t.*,
             case when status <> lag(status, 1, NULL) over (partition by personID order by unixtime)
                  then 1
                  when lag(status, 1, NULL) over (partition by personID order by unixtime) is null
                  then 1
                  else 0
             end as start_status,
             case when status <> lead(status, 1, NULL) over (partition by personID order by unixtime)
                  then 1
                  when lead(status, 1, NULL) over (partition by personID order by unixtime) is null
                  then 1
                  else 0
             end as end_status
      from mytable t
     ) t
where end_status = 1
--or start_status = 1 -- uncomment this line if you want start statuses as well
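If you would rather get one row per island carrying both its start and end time, a common alternative that neither answer here uses is the row-number difference: inside a run of consecutive rows with the same status, the overall row number and the per-status row number rise together, so their difference stays constant for that run. A minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or newer and a table tbl holding the sample's personID, status and unixtime values; island_summary is a hypothetical helper name.

```python
import sqlite3

# personID, status, unixtime from the question's sample (columns d, e, f omitted).
ROWS = [
    (1, 2, 213214), (1, 2, 213325), (1, 2, 213326), (1, 2, 213327),
    (1, 2, 213328), (1, 3, 214330), (1, 3, 214331), (1, 3, 214332),
    (1, 2, 324543),
]

QUERY = """
SELECT personID, status,
       MIN(unixtime) AS island_start,
       MAX(unixtime) AS island_end,
       COUNT(*)      AS n_rows
FROM (
    SELECT *,
           -- Both row numbers grow by 1 per row inside a run of equal status,
           -- so (status, difference) identifies the run.
           row_number() OVER (PARTITION BY personID ORDER BY unixtime)
         - row_number() OVER (PARTITION BY personID, status ORDER BY unixtime) AS grp
    FROM tbl
) sub
GROUP BY personID, status, grp
ORDER BY personID, island_start
"""


def island_summary(rows):
    """Return (personID, status, start, end, row count) for every island."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (personID INTEGER, status INTEGER, unixtime INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Grouping by status as well as grp matters: two islands with different statuses can end up with the same difference value, and grouping by grp alone would merge them.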

Related

Finding adjacent column values from the last non-null value of a certain column in Snowflake (SQL) using partition by

Say I have the following table:
ID | T | R
1  | 2 |
1  | 3 | Y
1  | 4 |
1  | 5 |
1  | 6 | Y
1  | 7 |
I would like to add a column that equals the value of column T from the last preceding row where column R is non-null. This means the following:
ID | T | R | GOAL
1  | 2 |   |
1  | 3 | Y |
1  | 4 | Y | 3
1  | 5 |   | 4
1  | 6 | Y | 4
1  | 7 |   | 6
I have many IDs, so I need to use the OVER (PARTITION BY ...) clause. Also, if possible, I would like to use a single statement, like
SELECT *
, GOAL
FROM TABLE
So without any extra select statement.
Because T is in ascending order, you can null out T wherever R is null and take the maximum over the preceding rows.
select *,
       max(case when R is not null then T end)
           over (
               partition by id
               order by T
               rows between unbounded preceding and 1 preceding
           ) as GOAL
from TBL
http://sqlfiddle.com/#!18/c927a5/5
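To check the running-maximum idea, here is a minimal sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The R values are taken from the GOAL table in the question, where the row with T = 4 carries a Y; the table name tbl and the function with_goal are placeholders. Because the frame ends at 1 PRECEDING, the current row is excluded, which is why the row with T = 6 gets 4 rather than 6.

```python
import sqlite3

# (ID, T, R) from the question's GOAL table; None stands for an empty R.
ROWS = [(1, 2, None), (1, 3, "Y"), (1, 4, "Y"), (1, 5, None), (1, 6, "Y"), (1, 7, None)]

QUERY = """
SELECT id, T, R,
       -- T survives only where R is non-NULL; max() over the earlier rows
       -- then picks the T of the most recent such row.
       max(CASE WHEN R IS NOT NULL THEN T END) OVER (
           PARTITION BY id
           ORDER BY T
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS GOAL
FROM tbl
ORDER BY id, T
"""


def with_goal(rows):
    """Return (id, T, R, GOAL) rows computed by the running-max query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, T INTEGER, R TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

The GOAL column comes out as NULL, NULL, 3, 4, 4, 6, matching the desired output.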

How to select the last value which is not null?

I have the following table:
id a b
1 1 kate
1 4 null
1 3 paul
1 3 paul
1 2 lola
2 1 kim
2 9 null
2 2 null
The result should be:
1 3 paul
2 1 kim
I want to get the last a where b is not null. Something like:
select id, a, b
from (select id, a, b,
             row_number() over (partition by id order by a desc) as num
      from tbl) as f  -- tbl: placeholder table name
where num = 1
But this returns a null value, because the row with the greatest a (a = 4) has b IS NULL. Is there a way to reproduce the ffill method from pandas?
Assuming:
a is defined NOT NULL.
You want the row with the greatest a where b IS NOT NULL - per id.
SELECT DISTINCT ON (id) *
FROM tbl
WHERE b IS NOT NULL
ORDER BY id, a DESC;
db<>fiddle here
Detailed explanation:
Select first row in each GROUP BY group?
Try:
select id, a, b
from (select id, a, b,
             row_number() over (partition by id order by a desc nulls last) as num
      from unnamedTable) t
where num = 1
Or, if that isn't right, try it with nulls first. I can never remember which way it works with desc (in PostgreSQL, NULLS FIRST is the default for DESC).
If you aren't guaranteed to have at least one non-null per id then you'll want to move nulls to the bottom of the list rather than filtering those rows out entirely.
select id, a, b
from (
    select id, a, b,
           row_number() over (
               partition by id
               order by case when b is not null then 0 else 1 end, a desc
           ) as num
    from tbl  -- tbl: placeholder table name
) as f
where num = 1
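A minimal sqlite3 sketch of the row_number() approach just above, using the question's rows plus a hypothetical id 3 whose b values are all NULL, to show that such an id still gets a row (its greatest a, with b NULL) instead of disappearing. The table name tbl and the function last_non_null are placeholders; window functions require SQLite 3.25 or newer.

```python
import sqlite3

# (id, a, b) from the question; id 3 is a hypothetical id with only NULL b values.
ROWS = [
    (1, 1, "kate"), (1, 4, None), (1, 3, "paul"), (1, 3, "paul"), (1, 2, "lola"),
    (2, 1, "kim"), (2, 9, None), (2, 2, None),
    (3, 5, None), (3, 7, None),
]

QUERY = """
SELECT id, a, b
FROM (
    SELECT id, a, b,
           row_number() OVER (
               PARTITION BY id
               -- rows with a non-NULL b sort first, then the greatest a wins
               ORDER BY CASE WHEN b IS NOT NULL THEN 0 ELSE 1 END, a DESC
           ) AS num
    FROM tbl
) AS f
WHERE num = 1
ORDER BY id
"""


def last_non_null(rows):
    """Return one (id, a, b) row per id, preferring rows where b is not NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, a INTEGER, b TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

This should yield (1, 3, 'paul'), (2, 1, 'kim') and (3, 7, None); filtering with WHERE b IS NOT NULL instead would drop id 3 entirely.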
You can wrap this in a CTE and join it back to the main table if you want to keep the original columns as they are, but judging by your expected output and logic, this should do it. That said, a row_number()-based approach might be a tad faster.
select distinct
id,
max(a) over (partition by id) as a,
first_value(b) over (partition by id order by a desc) as b
from tbl
where b is not null;

SQL query to find the entries corresponding to the maximum count of each type

I have a table X in Postgres with the following entries
A B C
2 3 1
3 3 1
0 4 1
1 4 1
2 4 1
3 4 1
0 5 1
1 5 1
2 5 1
3 5 1
0 2 2
1 2 3
For each value of B (i.e. group by B), I would like to find the entry with the maximum value of column C and return the corresponding A and B, using the most efficient query possible.
Expected Output:
A B C
1 2 3
2 3 1
0 4 1
0 5 1
Please help me with this problem. Thank you.
demo: db<>fiddle
Using DISTINCT ON:
SELECT DISTINCT ON (B)
A, B, C
FROM
my_table
ORDER BY B, C DESC, A
DISTINCT ON returns exactly the first row of each group, in ORDER BY order; here the groups are defined by B.
After ordering by B (which is necessary, because the DISTINCT ON expressions must match the leftmost ORDER BY expressions), C DESC puts the maximum C at the top of each group. Then, if several rows tie on the maximum C, ordering by A puts the minimum A at the top.
This looks like a greatest-n-per-group problem:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY B ORDER BY C DESC, A ASC) AS rnk
FROM t
)
SELECT *
FROM cte
WHERE rnk = 1
It's not clear which A should be returned; the above returns the row with the smallest A.
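SQLite has no DISTINCT ON, so this minimal sketch runs the RANK() variant instead, through Python's sqlite3 module (SQLite 3.25 or newer), on the sample rows from the question. The table name t follows that answer; max_c_per_b is a hypothetical helper name.

```python
import sqlite3

# (A, B, C) rows from the question.
ROWS = [
    (2, 3, 1), (3, 3, 1), (0, 4, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1),
    (0, 5, 1), (1, 5, 1), (2, 5, 1), (3, 5, 1), (0, 2, 2), (1, 2, 3),
]

QUERY = """
WITH cte AS (
    -- Highest C first within each B; ties on C go to the smallest A.
    SELECT *, RANK() OVER (PARTITION BY B ORDER BY C DESC, A ASC) AS rnk
    FROM t
)
SELECT A, B, C
FROM cte
WHERE rnk = 1
ORDER BY B
"""


def max_c_per_b(rows):
    """Return the (A, B, C) row with the maximum C for each B."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Ordered by B, the rows returned are (1, 2, 3), (2, 3, 1), (0, 4, 1) and (0, 5, 1), the expected output from the question.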
It seems to me you need max():
select A,B, max(c) from table_name
group by A,B
This will work:
select *
from (select t.*,
             rank() over (partition by B order by C desc, A) as rnk
      from tablename t) sub
where rnk = 1;

how to select one row from several rows with minimum value

This question is based on "SQL query to select distinct row with minimum value".
Consider the table:
id game point
1 x 1
1 y 10
1 z 1
2 x 2
2 y 5
2 z 8
Using the answers suggested in that question (select the ids that have the minimum value in the point column, grouped by game), we obtain
id game point
1 x 1
1 z 1
2 x 2
The question is how to obtain a single output row for each ID. Both outputs
id game point
1 x 1
2 x 2
and
id game point
1 z 1
2 x 2
are acceptable.
Use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by point asc) as seqnum
from t
) t
where seqnum = 1;
Assuming that the point entries are distinct for each id (so that the minimum of each id identifies exactly one game), a subquery plus an inner join on two conditions gives you the result you're waiting for. If that doesn't work for you, I have another solution:
SELECT yt.*
FROM Yourtable yt INNER JOIN
(
    SELECT ID, MIN(point) MinPoint
    FROM Yourtable
    GROUP BY ID
) t ON yt.ID = t.ID AND yt.point = t.MinPoint
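The join approach only returns one row per id when each id's minimum point is unique. This minimal sqlite3 sketch runs it on the question's sample, with the join condition written as yt.point = t.MinPoint (the table name Yourtable comes from the answer; min_point_rows is a hypothetical helper), and id 1 still comes back twice because games x and z tie at 1. ROW_NUMBER() is what breaks such ties down to a single row.

```python
import sqlite3

# (id, game, point) rows from the question.
ROWS = [(1, "x", 1), (1, "y", 10), (1, "z", 1), (2, "x", 2), (2, "y", 5), (2, "z", 8)]

QUERY = """
SELECT yt.id, yt.game, yt.point
FROM Yourtable yt
INNER JOIN (
    -- minimum point per id
    SELECT id, MIN(point) AS MinPoint
    FROM Yourtable
    GROUP BY id
) t ON yt.id = t.id AND yt.point = t.MinPoint
ORDER BY yt.id, yt.game
"""


def min_point_rows(rows):
    """Return every row whose point equals its id's minimum (ties included)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Yourtable (id INTEGER, game TEXT, point INTEGER)")
    con.executemany("INSERT INTO Yourtable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```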

PLSQL or SSRS, How to select having all values in a group?

I have a table like this.
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
4 B X
5 B Y
6 C X
7 C Z
8 D Z
9 E X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN ('X', 'Z')
This query gives me
ID NAME VALUE
______________
1 A X
3 A Z
4 B X
6 C X
7 C Z
8 D Z
9 E X
But I want to see all rows for the names that have all of the parameters. Only A and C have both X and Z values, so my desired result is:
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
6 C X
7 C Z
How can I get the desired result? It doesn't matter whether it is done in SQL or in Reporting Services. Maybe a "GROUP BY ... HAVING" clause will help, but I'm not sure.
By the way, I don't know how many parameters will be in the list.
I really appreciate any help.
The standard approach would be something like
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
               FROM table1 b
               WHERE b.value IN ('X', 'Z')
               GROUP BY name
               HAVING COUNT(DISTINCT value) = 2)
That requires you to know how many values are in the list, so that you can use a 2 in the HAVING clause if there are 2 elements, a 5 if there are 5 elements, etc. You could also use analytic functions, although this version returns only the rows whose value is in the list, not every row of each qualifying name:
SELECT id, name, value
FROM (SELECT id,
             name,
             value,
             count(distinct value) over (partition by name) cnt
      FROM table1 t1
      WHERE t1.value IN ('X', 'Z'))
WHERE cnt = 2
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
                 from t
                 group by name
                 having sum(case when value = 'X' then 1 else 0 end) > 0 and
                        sum(case when value = 'Z' then 1 else 0 end) > 0
                )
The subquery for the in finds all names that have at least one X value and at least one Z value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Y but not Z; and so on). The outer query just returns all the rows instead of only the names.
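Since the number of parameters isn't known in advance, one option is to build the IN list and the HAVING count from the parameter list in client code. A minimal sketch with Python's sqlite3 module, assuming the table is named table1 as in the question; names_with_all is a hypothetical helper. Only the ? placeholders are generated; the values themselves are passed as bind parameters, so they are never pasted into the SQL text.

```python
import sqlite3

# (id, name, value) rows from the question.
ROWS = [
    (1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"), (4, "B", "X"), (5, "B", "Y"),
    (6, "C", "X"), (7, "C", "Z"), (8, "D", "Z"), (9, "E", "X"),
]


def names_with_all(rows, params):
    """Return every row whose name has at least one row for each value in params."""
    params = sorted(set(params))  # de-duplicate so the HAVING count is correct
    placeholders = ", ".join("?" for _ in params)
    query = f"""
        SELECT id, name, value
        FROM table1
        WHERE name IN (SELECT name
                       FROM table1
                       WHERE value IN ({placeholders})
                       GROUP BY name
                       HAVING COUNT(DISTINCT value) = ?)
        ORDER BY id
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # The list values fill the IN placeholders; the last ? receives the count.
    result = con.execute(query, [*params, len(params)]).fetchall()
    con.close()
    return result
```

With ['X', 'Z'] it returns the rows with ids 1, 2, 3, 6 and 7 (names A and C), the desired result from the question.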