How do I "dedup" rows based on most recently updated


Let's say I have a table whose content looks like this:
ID Name Last Update
============================
1 A 1 JAN 2018
1 A 2 JAN 2018
1 A 3 JAN 2018
2 B 3 JAN 2018
2 B 6 JAN 2018
I want to get the result
ID Name Last Update
============================
1 A 3 JAN 2018
2 B 6 JAN 2018
How can I do it?
I tried grouping by ID, but how do I get the most recent row?

@Nik's solution works when there are no ties for the MAX(date) value within an ID, or when it does not matter that a tie produces multiple output rows. An alternative is to partition the records by ID, sort each partition by date in descending order, and pick the first row of each partition.
This can be achieved with the SQL standard window function ROW_NUMBER(), like this:
SELECT ID, NAME, DATE
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY DATE DESC) RN
         , ID
         , NAME
         , DATE
    FROM <TABLE_NAME>
) T  -- some databases, such as MySQL, require an alias for the derived table
WHERE RN = 1;
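
To try the ROW_NUMBER() approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name items, the column name last_update and the ISO date strings are assumptions made for the example (ISO text sorts chronologically, which "3 JAN 2018" would not), and it assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "A", "2018-01-01"),
        (1, "A", "2018-01-02"),
        (1, "A", "2018-01-03"),
        (2, "B", "2018-01-03"),
        (2, "B", "2018-01-06"),
    ],
)

# Number the rows of each id from newest to oldest, then keep row 1 only.
latest_rows = conn.execute(
    """
    SELECT id, name, last_update
    FROM (
        SELECT id, name, last_update,
               ROW_NUMBER() OVER (PARTITION BY id
                                  ORDER BY last_update DESC) AS rn
        FROM items
    ) AS numbered
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(latest_rows)  # [(1, 'A', '2018-01-03'), (2, 'B', '2018-01-06')]
```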

You could use a query like this to get the results that you need:
SELECT *
FROM <TABLE_NAME>
WHERE (ID, DATE) IN (SELECT ID, MAX(DATE)
                     FROM <TABLE_NAME>
                     GROUP BY ID)
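
The behaviour of the MAX() approach on ties can be checked with a small sketch, again with Python's sqlite3 module. The table items, the column last_update and the second name 'B2' are hypothetical; the extra row was added only so that id 2 has two rows sharing the latest date. It assumes an SQLite version that accepts row values in IN (3.15 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "A", "2018-01-01"),
        (1, "A", "2018-01-03"),
        (2, "B", "2018-01-06"),
        (2, "B2", "2018-01-06"),  # hypothetical row tying with (2, 'B')
    ],
)

# Keep every row whose (id, last_update) pair equals the per-id maximum.
max_rows = conn.execute(
    """
    SELECT id, name, last_update
    FROM items
    WHERE (id, last_update) IN (SELECT id, MAX(last_update)
                                FROM items
                                GROUP BY id)
    ORDER BY id, name
    """
).fetchall()

print(max_rows)
```

Both rows for id 2 come back, because both carry the maximum date. If exactly one row per id is required, a ROW_NUMBER() query with a tie-breaking sort key is the safer choice.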

Related

SQL select 1 row out of several rows that have similar values

I have a table like this:
ID | OtherID | Date
1  | z       | 2022-09-19
1  | b       | 2021-04-05
2  | e       | 2022-04-05
3  | t       | 2022-07-08
3  | z       | 2021-03-02
I want a table like this:
ID | OtherID | Date
1  | z       | 2022-09-19
2  | e       | 2022-04-05
3  | t       | 2022-07-08
That is, for each ID I want the single ID-OtherID pair with the most recent Date.
The problem is that the relationship between ID and OtherID is 1:M.
I've looked at SELECT DISTINCT, GROUP BY and LAG, but I couldn't figure it out. I'm sorry if this is a duplicate question; I couldn't find the right keywords to search for the answer.
Update: I use Postgres but would like to know other SQL as well.

This works in many DBMSs (recent versions of Postgres, MySQL and others), but you may need to adapt it for other systems. You could use a CTE, a join, or a subquery such as this:
select id, otherid, date
from (
    select id, otherid, date,
           rank() over (partition by id order by date desc) as id_rank
    from my_table
) z
where id_rank = 1
id | otherid | date
1  | z       | 2022-09-19T00:00:00.000Z
2  | e       | 2022-04-05T00:00:00.000Z
3  | t       | 2022-07-08T00:00:00.000Z
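
RANK() and ROW_NUMBER() only differ when two rows of the same id share the latest date. The sketch below, using Python's sqlite3 module, adds one hypothetical row (3, 'q', 2022-07-08) to the question's data to create such a tie. The helper top_rows and the alias pos are names invented for the example, and window function support in SQLite (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, otherid TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "z", "2022-09-19"),
        (1, "b", "2021-04-05"),
        (2, "e", "2022-04-05"),
        (3, "t", "2022-07-08"),
        (3, "z", "2021-03-02"),
        (3, "q", "2022-07-08"),  # hypothetical row tying with (3, 't')
    ],
)


def top_rows(window_function, order_by):
    """Return the rows placed first per id by the given window function."""
    query = f"""
        SELECT id, otherid, date
        FROM (
            SELECT id, otherid, date,
                   {window_function}() OVER (PARTITION BY id
                                             ORDER BY {order_by}) AS pos
            FROM my_table
        ) AS ranked
        WHERE pos = 1
        ORDER BY id, otherid
    """
    return conn.execute(query).fetchall()


# RANK gives tied rows the same position, so both rows for id 3 survive.
with_rank = top_rows("RANK", "date DESC")
# ROW_NUMBER with otherid as a tie-breaker keeps exactly one row per id.
with_row_number = top_rows("ROW_NUMBER", "date DESC, otherid")

print(with_rank)
print(with_row_number)
```

With the tie in place, the RANK() variant returns four rows and the ROW_NUMBER() variant three, so the choice depends on whether tied rows should all be reported.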

You can use a Common Table Expression (CTE) with ROW_NUMBER() to assign a row number based on the ID column (then return the first row for each ID in the WHERE clause rn = 1):
WITH cte AS
  (SELECT ID,
          OtherID,
          Date,
          ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
   FROM sample_table)
SELECT ID,
       OtherID,
       Date
FROM cte
WHERE rn = 1;
Result:
ID | OtherID | Date
1  | z       | 2022-09-19
2  | e       | 2022-04-05
3  | t       | 2022-07-08

How to add single value in a new column

My goal is to put the value from the first row into every row of a new column.
In this example, the first value is the number 10.
The "New Table" below shows my goal.
Table
Product ID Name Value
1 ABC 10
2 XYZ 22
3 LMM 8
New Table
Product ID Name Value New Column
1 ABC 10 10
2 XYZ 22 10
3 LMM 8 10
I would fetch the value with the row_number function, but how do I get that value into every row?

You can use the first_value() window function:
select product_id, name, value,
first_value(value) over (order by product_id) as new_column
from the_table
order by product_id;
Rows in a table have no implied sort order. So the "first row" can only be defined when an order by is present.
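
A minimal sketch of the first_value() approach with Python's sqlite3 module, using the sample rows from the question. The table name products is an assumption, and window function support in SQLite (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "ABC", 10), (2, "XYZ", 22), (3, "LMM", 8)],
)

# The ORDER BY inside OVER defines which row counts as "first".
with_first = conn.execute(
    """
    SELECT product_id, name, value,
           first_value(value) OVER (ORDER BY product_id) AS new_column
    FROM products
    ORDER BY product_id
    """
).fetchall()

print(with_first)  # [(1, 'ABC', 10, 10), (2, 'XYZ', 22, 10), (3, 'LMM', 8, 10)]
```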

Assuming you want to pick the first one according to the product ID, you can do:
select *,
( select value
from (select *, row_number() over(order by product_id) as rn from t) x
where rn = 1
) as new_column
from t

Extract column from SQL table based on another column in the same table

I'm using PostgreSQL.
Table of PURCHASES looks like this:
ID | CUSTOMER_ID | YEAR
1 1 2011
2 2 2012
3 2 2012
4 1 2013
5 3 2014
6 3 2014
7 3 2015
I need to extract the 'ID' of the purchase with the latest 'date/year' for each CUSTOMER.
For example, for CUSTOMER_ID 1 the year is 2013, which corresponds to id '4'.
I need to get ONE column as the returned data structure.
PS. I'm stuck with this seemingly simple task )))

If you want one row per customer, you can use distinct on:
select distinct on (customer_id) id
from purchases
order by customer_id, year desc;
This returns one column which is an id from the most recent year for that customer.

This should work, but doesn't look too pretty...
SELECT DISTINCT ON(CUSTOMER_ID) ID FROM PURCHASES P
WHERE (CUSTOMER_ID,YEAR) =
(SELECT CUSTOMER_ID,MAX(YEAR) FROM PURCHASES WHERE CUSTOMER_ID = P.CUSTOMER_ID
GROUP BY CUSTOMER_ID)
ORDER BY CUSTOMER_ID, ID; -- without ORDER BY, DISTINCT ON keeps an arbitrary row per customer
So for input
ID | CUSTOMER_ID | YEAR
1 1 2011
2 2 2012
3 2 2012
4 1 2013
5 3 2014
6 3 2014
7 3 2015
It will return
id
4
2
7
Meaning:
For the lowest CUSTOMER_ID (1) the id is 4 (year 2013).
Next, for CUSTOMER_ID 2 the id is 2 (year 2012).
Lastly, for CUSTOMER_ID 3 the id is 7 (year 2015).
The idea behind this:
1. Group by CUSTOMER_ID.
2. For each group, select max(year).
3. While going over all records: if CUSTOMER_ID and YEAR equal those from step 2, select the ID from that record.
Without DISTINCT ON(CUSTOMER_ID) it would return 2 records for CUSTOMER_ID = 2, because both of that customer's purchases are from 2012, so both match.
If you write in the beginning instead of:
SELECT DISTINCT ON(CUSTOMER_ID) ID FROM PURCHASES P
this code:
SELECT DISTINCT ON(CUSTOMER_ID) * FROM PURCHASES P
then you will see everything clearly.

Use the row_number() analytic function with partition by customer_id, ordering each customer's rows by year descending. If ties occur for the latest year, the additional sort key ID makes the query return the lowest ID for that customer (4, 2 and 7 respectively for the sample data):
WITH P2 AS
(
 SELECT ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY YEAR DESC, ID) AS RN,
        *
 FROM PURCHASES
)
SELECT ID FROM P2 WHERE RN = 1
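
A runnable sketch of this ROW_NUMBER() query on the PURCHASES sample data, using Python's sqlite3 module. The lowercase names are an adaptation for the example, and window function support in SQLite (3.25 or later) is assumed. The sort by year descending is followed by id, so that when a customer has several purchases in the latest year the lowest id is picked deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER, customer_id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 1, 2011),
        (2, 2, 2012),
        (3, 2, 2012),
        (4, 1, 2013),
        (5, 3, 2014),
        (6, 3, 2014),
        (7, 3, 2015),
    ],
)

# Rank each customer's purchases: latest year first, lowest id breaks ties.
rows = conn.execute(
    """
    WITH p2 AS (
        SELECT id, customer_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY year DESC, id) AS rn
        FROM purchases
    )
    SELECT id FROM p2 WHERE rn = 1 ORDER BY customer_id
    """
).fetchall()

latest_ids = [row[0] for row in rows]
print(latest_ids)  # [4, 2, 7]
```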

How to get MAX Hike in Min month?

Below is the table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2

Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records might be tied for having the greatest hike%, then replace ROW_NUMBER with RANK.
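
A minimal sketch of the same idea with Python's sqlite3 module, using the question's data. The table name hikes and the column name hike_pct (standing in for Hike%) are assumptions for the example, and window function support in SQLite (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hikes (name TEXT, hike_pct INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO hikes VALUES (?, ?, ?)",
    [
        ("A", 7, 1),
        ("A", 6, 2),
        ("A", 8, 3),
        ("b", 4, 1),
        ("b", 7, 2),
        ("b", 7, 3),
    ],
)

# Highest hike first; among equal hikes the earliest month wins.
best_hikes = conn.execute(
    """
    SELECT name, hike_pct, month
    FROM (
        SELECT name, hike_pct, month,
               ROW_NUMBER() OVER (PARTITION BY name
                                  ORDER BY hike_pct DESC, month) AS rn
        FROM hikes
    ) AS t
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()

print(best_hikes)  # [('A', 8, 3), ('b', 7, 2)]
```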

Use a correlated subquery:
select Name, min(Hike) as Hike, min(Month) as Month
from
(
    select * from tablename a
    where Hike in (select max(Hike) from tablename b where a.name = b.name)
) A
group by Name

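You can use something similar to the below:
SELECT Name, MAX(Hike), Month
FROM yourTable
GROUP BY Name, Month
Note that grouping by both Name and Month yields one row per name and month, so for the sample data this returns every row; to get a single row per name, the maximum has to be taken per Name alone, as in the other answers.
Hope this helps :)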

Limiting output with different criteria

I have the following SQL statement:
select
row_number() over(),
car, group, yearout
from (select..... )inner
where year(inner.yearout) between '2010' and '2030'
order by inner.group)temp
the output is like
1 test1 1 2010
2 test2 1 2010
3 test3 1 2012
4 test1 2 2010
5 test1 3 2011
and so on.
There is another table called outerno, which is filled like this:
no yearo amnt
1 2010 10
2 2010 15
3 2010 5
4 2010 10
5 2010 15
6 2010 8
1 2011 4
2 2011 15
and so on.
There are 6 groups in the table for each year.
Now the problem is that I need to limit the output of the query as stated in the outerno table.
So I need the first 10 row for 2010 for group 1, the first 15 rows of 2010 for group 2 and so on. For each year and group there is a value in the outerno.
I tried to use row_number but I don't know how to limit the output in this way since I would be needing for example rows 1-10, 50-65, 83-88 and so on.
Any idea on how to do this?
Thanks in advance for all your help.
TheVagabond

You'd use ROW_NUMBER() to give you record numbers per group. Then add a WHERE clause to only get row numbers up to the desired number. In ROW_NUMBER's ORDER BY you can specify which records to prefer.
select row_number() over (), car, group, yearout
from
(
    select
        row_number() over (partition by inner.group, inner.yearout order by inner.car) as rn,
        inner.car, inner.group, inner.yearout
    from (select..... ) inner
    where inner.yearout between '2010' and '2030'
    order by inner.group
) all_records
where all_records.rn <=
(
    select amnt
    from outerno
    where outerno.yearo = all_records.yearout  -- the year column in outerno is named yearo
    and outerno.no = all_records.group
);
BTW: I wouldn't choose group for a column name, as it is a reserved word in SQL.
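
The per-group limit can be sketched end to end with Python's sqlite3 module. Everything here is a simplified assumption: the cars table replaces the elided inner query, the columns group and no are renamed to grp and grp_no to stay clear of keywords, and the limits (2 rows for group 1, 1 row for group 2) are made up for the example. Window function support in SQLite (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (car TEXT, grp INTEGER, yearout INTEGER)")
conn.execute("CREATE TABLE outerno (grp_no INTEGER, yearo INTEGER, amnt INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("test1", 1, 2010),
        ("test2", 1, 2010),
        ("test3", 1, 2010),
        ("test4", 2, 2010),
        ("test5", 2, 2010),
    ],
)
# Hypothetical limits: 2 rows for group 1 in 2010, 1 row for group 2 in 2010.
conn.executemany("INSERT INTO outerno VALUES (?, ?, ?)", [(1, 2010, 2), (2, 2010, 1)])

# Number rows per (group, year), then keep those up to that group's limit.
limited = conn.execute(
    """
    SELECT car, grp, yearout
    FROM (
        SELECT car, grp, yearout,
               ROW_NUMBER() OVER (PARTITION BY grp, yearout
                                  ORDER BY car) AS rn
        FROM cars
    ) AS all_records
    WHERE rn <= (
        SELECT amnt
        FROM outerno
        WHERE outerno.yearo = all_records.yearout
          AND outerno.grp_no = all_records.grp
    )
    ORDER BY grp, yearout, car
    """
).fetchall()

print(limited)  # [('test1', 1, 2010), ('test2', 1, 2010), ('test4', 2, 2010)]
```

If a group and year have no entry in outerno, the subquery yields NULL and the comparison filters out all of that group's rows, which may or may not be the desired behaviour.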
