Select Distinct Rows And Include Non-Distinct Identifier - sql

Consider a table like the following.
ID Value Change_Date
1 A 1/1/2017
1 B 1/2/2017
1 B 1/3/2017
2 C 1/1/2017
2 C 1/3/2017
3 D 1/1/2017
3 E 1/3/2017
3 F 1/4/2017
3 D 1/10/2017
I would like to perform a select statement which effectively does a distinct, but includes the change_date value of the first chronological occurrence of the distinct row. So the above table would be rendered into something like this:
ID Value Change_Date
1 A 1/1/2017
1 B 1/2/2017
2 C 1/1/2017
3 D 1/1/2017
3 E 1/3/2017
3 F 1/4/2017
3 D 1/10/2017
Is this even remotely possible in Oracle? I'm trying to weed out a bunch of dummy updates from an audit table, but need the change_date so I can show when it changed from value to value. A query like
select distinct id, value from my_table
will obviously get me seed information, but I need to tie it back to the proper dates. I could possibly use this and min(change_date), but that would mean the query would see the first row for id 3 as being the same as the last row for id 3, which is not correct.
EDIT: Please note, I'm not looking for the simple distinct on ID and Value. I need to also include when it switches back to a previous value, as seen for ID 3. All four values of ID 3 should be preserved in the output.

You seem to only want the first row when multiple id/value pairs are in a row. Use lag():
select t.*
from (select t.*,
             lag(value) over (partition by id order by change_date) as prev_value
      from t
     ) t
where prev_value is null or prev_value <> value;
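To see the lag() filter at work on the sample data from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module rather than Oracle. It assumes the bundled SQLite is 3.25 or newer (needed for window functions), and it stores the dates as ISO strings so that they sort chronologically; both choices are assumptions made for the demo, not part of the original answer.

```python
import sqlite3

# In-memory database; the table name "t" follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT, change_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "2017-01-01"), (1, "B", "2017-01-02"), (1, "B", "2017-01-03"),
        (2, "C", "2017-01-01"), (2, "C", "2017-01-03"),
        (3, "D", "2017-01-01"), (3, "E", "2017-01-03"),
        (3, "F", "2017-01-04"), (3, "D", "2017-01-10"),
    ],
)

# Keep a row only when its value differs from the previous row of the same id.
query = """
SELECT id, value, change_date
FROM (SELECT t.*,
             lag(value) OVER (PARTITION BY id ORDER BY change_date) AS prev_value
      FROM t) AS s
WHERE prev_value IS NULL OR prev_value <> value
ORDER BY id, change_date
"""
first_changes = conn.execute(query).fetchall()
for row in first_changes:
    print(row)
```

The second D for ID 3 survives because its previous row holds F, which is exactly the behaviour a plain DISTINCT or MIN(change_date) cannot give. If value can be NULL, the `<>` comparison would need a NULL-safe form.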

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID Identifier Admission_Date Release_Date
234 2 5/1/22 5/5/22
234 1 4/25/22 4/30/22
234 2 4/20/22 4/24/22
234 2 4/15/22 4/18/22
789 1 7/15/22 7/19/22
789 2 7/8/22 7/14/22
789 2 7/1/22 7/5/22
321 2 6/1/21 6/3/21
321 2 5/27/21 5/31/21
321 1 5/20/21 5/26/21
321 2 5/15/21 5/19/21
321 2 5/6/21 5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID Identifier Admission_Date Release_Date
234 2 5/1/22 5/5/22
234 1 4/25/22 4/30/22
234 2 4/20/22 4/24/22
789 1 7/15/22 7/19/22
789 2 7/8/22 7/14/22
321 2 5/27/21 5/31/21
321 1 5/20/21 5/26/21
321 2 5/15/21 5/19/21
I am using DBeaver, which runs PostgreSQL.
I admittedly don't know Postgres well, so the following could possibly be optimised. However, using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
  select *,
    case when identifier = 1 then lag(admission_date) over (partition by id order by admission_date desc) end as pd,
    case when identifier = 1 then lead(admission_date) over (partition by id order by admission_date desc) end as nd
  from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
or exists (
  select * from d d2
  where d2.id = d.id
  and (d.Admission_Date = d2.pd or d.Admission_Date = d2.nd)
)
order by id, Admission_Date desc;
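A simpler variation on the same lag/lead idea (not the exact query above) is to look at the identifier of the neighbouring rows instead of their dates: a row is kept if it, its predecessor, or its successor has identifier = 1. The sketch below checks this against the question's sample data using Python's sqlite3 module instead of PostgreSQL; it assumes SQLite 3.25 or newer and ISO-formatted date strings, both chosen only for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, identifier INTEGER, "
    "admission_date TEXT, release_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (234, 2, "2022-05-01", "2022-05-05"), (234, 1, "2022-04-25", "2022-04-30"),
        (234, 2, "2022-04-20", "2022-04-24"), (234, 2, "2022-04-15", "2022-04-18"),
        (789, 1, "2022-07-15", "2022-07-19"), (789, 2, "2022-07-08", "2022-07-14"),
        (789, 2, "2022-07-01", "2022-07-05"),
        (321, 2, "2021-06-01", "2021-06-03"), (321, 2, "2021-05-27", "2021-05-31"),
        (321, 1, "2021-05-20", "2021-05-26"), (321, 2, "2021-05-15", "2021-05-19"),
        (321, 2, "2021-05-06", "2021-05-10"),
    ],
)

# A row qualifies if it has identifier = 1 or sits directly next to such a row
# within the same id, ordered from most recent to least recent.
query = """
SELECT id, identifier, admission_date, release_date
FROM (SELECT t.*,
             lag(identifier)  OVER (PARTITION BY id ORDER BY admission_date DESC) AS prev_identifier,
             lead(identifier) OVER (PARTITION BY id ORDER BY admission_date DESC) AS next_identifier
      FROM t) AS s
WHERE identifier = 1 OR prev_identifier = 1 OR next_identifier = 1
ORDER BY id, admission_date DESC
"""
neighbours = conn.execute(query).fetchall()
for row in neighbours:
    print(row)
```

The result is ordered by id, so the ids appear as 234, 321, 789 rather than in the order shown in the question.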
One way:
SELECT (x.my_row).*  -- decompose fields from row type
FROM  (
   SELECT identifier
        , lag(t)  OVER w AS t0  -- take whole row
        , t              AS t1
        , lead(t) OVER w AS t2
   FROM   tbl t
   WINDOW w AS (PARTITION BY id ORDER BY admission_date)
   ) sub
CROSS  JOIN LATERAL (
   VALUES (t0), (t1), (t2)  -- pivot
   ) x(my_row)
WHERE  sub.identifier = 1
AND    (x.my_row).id IS NOT NULL;  -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

How can I select a table skipping duplicated value postgreSQL

I have a table like this.
id grade_1 grade_2 createdAt
1 1 1 20220304
2 1 1 20220301
3 4 2 20220228
I want to select the current row (here, id=1) and a row whose grade values differ from those of the row I selected (here, id=3).
Like this:
id grade_1 grade_2 createdAt
1 1 1 20220304
3 4 2 20220228
I tried to use a subquery, but it didn't really work for me. Is there any way to skip the duplicated values when selecting from the table?
You can do it with GROUP BY and MAX to retrieve the row you want:
SELECT
grade_1,
grade_2,
Max(createdAt)
from
yourTable
Group by
grade_1,
grade_2
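As a quick check of the GROUP BY approach on the question's three rows, here is a minimal sketch using Python's sqlite3 module in place of PostgreSQL. The alias latest_created and the ORDER BY are additions for the demo. Note that the query returns one row per grade pair but does not return the id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, grade_1 INTEGER, "
    "grade_2 INTEGER, createdAt TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "20220304"), (2, 1, 1, "20220301"), (3, 4, 2, "20220228")],
)

# One row per (grade_1, grade_2) pair, carrying the latest createdAt of the pair.
latest_per_grade = conn.execute(
    """
    SELECT grade_1, grade_2, MAX(createdAt) AS latest_created
    FROM yourTable
    GROUP BY grade_1, grade_2
    ORDER BY latest_created DESC
    """
).fetchall()
for row in latest_per_grade:
    print(row)
```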

SQL - Need to find duplicates where one column can have multiple values

I am pretty sure this SQL requires using GROUP BY and HAVING, but not sure how to write it.
I have a table similar to this:
ID Cust# Order# ItemCode DataPoint1 DataPoint2
1 001 123 I xxxyyyxxx 123456
2 001 123 Insert xxxyyyxxx 123456
3 001 123 Delete asdf 9999
4 001 123 D asdf 9999
In this table Rows 1 & 2 are effectively duplicates, as are rows 3 & 4.
This is determined by the ItemCode having the value of 'I' or 'Insert' in rows 1 & 2. And 'D' or 'Delete' in rows 3 & 4.
How could I write a SQL select statement that returns rows 2 and 4? I am interested in pulling out the duplicated rows with the higher ID value.
Thanks for any help.
Replace the "offending" column with a consistent value. Then, you can use row_number() or a similar mechanism:
select t.*
from (select t.*,
             row_number() over (partition by Cust#, Order#, left(ItemCode, 1), DataPoint1, DataPoint2
                                order by id asc
                               ) as seqnum
      from t
     ) t
where seqnum > 1;
Note: Not all databases support left(), but all support the functionality somehow. This does assume that the first character of the ItemCode is sufficient to identify identical rows, regardless of the value.
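The row_number() approach can be tried out with Python's sqlite3 module as a rough sketch. SQLite has no left(), so substr(item_code, 1, 1) stands in for it, and the column names cust_no, order_no, item_code, data_point1 and data_point2 are hypothetical renames of the question's columns (the `#` character would otherwise need quoting). It assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, cust_no TEXT, order_no TEXT, "
    "item_code TEXT, data_point1 TEXT, data_point2 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "123", "I", "xxxyyyxxx", "123456"),
        (2, "001", "123", "Insert", "xxxyyyxxx", "123456"),
        (3, "001", "123", "Delete", "asdf", "9999"),
        (4, "001", "123", "D", "asdf", "9999"),
    ],
)

# Number the rows inside each group of effective duplicates, lowest id first;
# every row numbered above 1 is a later duplicate.
query = """
SELECT id
FROM (SELECT t.*,
             row_number() OVER (
                 PARTITION BY cust_no, order_no, substr(item_code, 1, 1),
                              data_point1, data_point2
                 ORDER BY id
             ) AS seqnum
      FROM t) AS s
WHERE seqnum > 1
ORDER BY id
"""
duplicate_ids = [row[0] for row in conn.execute(query)]
print(duplicate_ids)
```

With more than two copies in a group, this returns every copy except the one with the lowest id.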

Select the first row in the last group of consecutive rows

How would I select the row that is the first occurrence in the last 'grouping' of consecutive rows, where a grouping is defined by the consecutive appearance of a particular column value (in the example below state).
For example, given the following table:
id  datetime                    state       value_needed
1   2021-04-01 09:42:41.319000  incomplete  A
2   2021-04-04 09:42:41.319000  done        B
3   2021-04-05 09:42:41.319000  incomplete  C
4   2021-04-05 10:42:41.319000  incomplete  C
5   2021-04-07 09:42:41.319000  done        D
6   2021-04-12 09:42:41.319000  done        E
I would want the row with id=5, as it is the first occurrence of state=done in the last (i.e. most recent) grouping of state=done.
Assuming all columns NOT NULL.
SELECT *
FROM   tbl t1
WHERE  NOT EXISTS (
   SELECT FROM tbl t2
   WHERE  t2.state <> t1.state
   AND    t2.datetime > t1.datetime
   )
ORDER  BY datetime
LIMIT  1;
NOT EXISTS is only true for the last group of peers. (There is no later row with a different state.)
ORDER BY datetime and take the first. Voilà.
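The NOT EXISTS logic can be checked against the question's six rows with a small sketch in Python's sqlite3 module. Two adjustments are assumptions made for SQLite rather than part of the answer: the subquery uses `SELECT 1` because the empty select list is PostgreSQL-specific, and the timestamp column is renamed to the hypothetical event_time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, event_time TEXT, state TEXT, value_needed TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "2021-04-01 09:42:41.319", "incomplete", "A"),
        (2, "2021-04-04 09:42:41.319", "done", "B"),
        (3, "2021-04-05 09:42:41.319", "incomplete", "C"),
        (4, "2021-04-05 10:42:41.319", "incomplete", "C"),
        (5, "2021-04-07 09:42:41.319", "done", "D"),
        (6, "2021-04-12 09:42:41.319", "done", "E"),
    ],
)

# Rows with no later row in a different state form the last group;
# the earliest of them is the first row of that group.
query = """
SELECT id, event_time, state, value_needed
FROM tbl AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl AS t2
    WHERE t2.state <> t1.state
      AND t2.event_time > t1.event_time
)
ORDER BY event_time
LIMIT 1
"""
first_of_last_group = conn.execute(query).fetchone()
print(first_of_last_group)
```

Only ids 5 and 6 pass the NOT EXISTS test here, and the ORDER BY with LIMIT 1 picks id 5.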
Here's a window function solution that accesses your table only once (which may or may not perform better for large data sets):
SELECT *
FROM (
SELECT *,
LEAD (state) OVER (ORDER BY datetime DESC)
IS DISTINCT FROM state AS first_in_group
FROM tbl
) t
WHERE first_in_group
ORDER BY datetime DESC
LIMIT 1
A dbfiddle based on Erwin Brandstetter's. To illustrate, here's the value of first_in_group for each row:
id datetime state value_needed first_in_group
---------------------------------------------------------------------
6 2021-04-12 09:42:41.319 done E f
5 2021-04-07 09:42:41.319 done D t
4 2021-04-05 10:42:41.319 incomplete C f
3 2021-04-05 09:42:41.319 incomplete C t
2 2021-04-04 09:42:41.319 done B t
1 2021-04-01 09:42:41.319 incomplete A t

SQL Group by only correlative rows

Say I have the following table:
Code A B C Date ID
------------------------------
50 1 1 A 2018-01-08 150001
50 1 1 A 2018-01-15 165454
50 1 1 B 2018-02-01 184545
50 1 1 A 2018-02-02 195487
I need the sql query to output the following:
Code A B C Min(Date) Min(ID)
-------------------------------
50 1 1 A 2018-01-08 150001
50 1 1 B 2018-02-01 184545
50 1 1 A 2018-02-02 195487
If I use a standard GROUP BY, rows 1, 2 and 4 are grouped into one row, which is not what I want.
I want to select the row with MIN(date) and MIN(id) from each run of adjacent duplicate records, based on columns Code, A, B and C.
In this case the first two rows are duplicates, so I want the min() row, while the 3rd and 4th rows are distinct.
Note that the database is Vertica 8.1, that is very similar to Oracle or PostgreSQL
I think you would need the analytic function LAG(). Using this function, you can get the value of the previous row (or NULL if it's the first row itself). So you can check if the value on the previous row is different or not, and filter accordingly.
I'm not familiar with Vertica, but this should be the correct documentation for it: https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/LAGAnalytic.htm
Please try the query below, it should do it:
SELECT l.Code, l.A, l.B, l.C, l.Date, l.ID
FROM (SELECT t.*,
             LAG(t.C, 1) OVER (PARTITION BY t.Code, t.A, t.B ORDER BY t.Date) prev_val
      FROM table_1 t) l
WHERE l.C != l.prev_val
   OR l.prev_val IS NULL
ORDER BY l.Code, l.A, l.B, l.Date
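As a sanity check of the LAG() filter on the four sample rows, here is a minimal sketch using Python's sqlite3 module rather than Vertica. It partitions on Code, A and B, since the question treats all three as grouping columns, and it renames the Date column to the hypothetical event_date to sidestep any keyword concerns; SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (Code INTEGER, A INTEGER, B INTEGER, "
    "C TEXT, event_date TEXT, ID INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (50, 1, 1, "A", "2018-01-08", 150001),
        (50, 1, 1, "A", "2018-01-15", 165454),
        (50, 1, 1, "B", "2018-02-01", 184545),
        (50, 1, 1, "A", "2018-02-02", 195487),
    ],
)

# Keep the first row of every run of consecutive rows sharing the same C.
query = """
SELECT Code, A, B, C, event_date, ID
FROM (SELECT t.*,
             LAG(t.C, 1) OVER (PARTITION BY t.Code, t.A, t.B
                               ORDER BY t.event_date) AS prev_val
      FROM table_1 AS t) AS l
WHERE prev_val IS NULL OR C <> prev_val
ORDER BY Code, A, B, event_date
"""
run_starts = conn.execute(query).fetchall()
for row in run_starts:
    print(row)
```

Because each run is ordered by date, its first row carries the minimum date of the run. It carries the minimum ID as well only when IDs increase with the date, as they do in the sample.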