Remove Duplicate Values from Table and leave the ID with the most current date - sql

Not a SQL guru, but I need a SQL statement that will return a table of unique values with the most current date, so remove all the duplicate values based on ID and keep the ID with the most current date.
My current SQL statement is this:
Select Bill_To_Merchant_ID, Agreement_Termination_Notification_Date
From STG.Fact_Agreement
Current result:
Bill_To_Merchant_ID Agreement_Termination_Notification_Date
----------------------------------------------------------------
1 01/09/2020
1 03/09/2020
2 05/09/2020
2 07/09/2020
3 06/09/2020
3 16/09/2020
Expected result:
Bill_To_Merchant_ID Agreement_Termination_Notification_Date
----------------------------------------------------------------
1 03/09/2020
2 07/09/2020
3 16/09/2020
If there are no duplicates, that record remains in the result set.

If you want to actually delete all but the most recent record for each ID, then use:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.Bill_To_Merchant_ID = t1.Bill_To_Merchant_ID AND
t2.Agreement_Termination_Notification_Date > t1.Agreement_Termination_Notification_Date);
If you instead can tolerate just viewing your data as in the expected output, then aggregation should work:
SELECT
Bill_To_Merchant_ID,
MAX(Agreement_Termination_Notification_Date) AS Agreement_Termination_Notification_Date
FROM yourTable
GROUP BY
Bill_To_Merchant_ID;
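Both variants above can be tried out in a throwaway database. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the asker's actual database; the table name fact_agreement and the ISO-8601 text dates are assumptions made so that plain string comparison follows date order.

```python
import sqlite3

# Hypothetical in-memory copy of the question's data. Dates are stored as
# ISO-8601 text (an assumption) so that string comparison matches date order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_agreement ("
    " Bill_To_Merchant_ID INTEGER,"
    " Agreement_Termination_Notification_Date TEXT)"
)
conn.executemany(
    "INSERT INTO fact_agreement VALUES (?, ?)",
    [(1, "2020-09-01"), (1, "2020-09-03"),
     (2, "2020-09-05"), (2, "2020-09-07"),
     (3, "2020-09-06"), (3, "2020-09-16")],
)

# Non-destructive variant: one row per ID with its latest date.
latest = conn.execute(
    "SELECT Bill_To_Merchant_ID, MAX(Agreement_Termination_Notification_Date)"
    " FROM fact_agreement"
    " GROUP BY Bill_To_Merchant_ID"
    " ORDER BY Bill_To_Merchant_ID"
).fetchall()

# Destructive variant: delete every row for which a newer row with the
# same ID exists. The outer table is referenced by its own name, the
# inner copy by the alias t2.
conn.execute(
    "DELETE FROM fact_agreement"
    " WHERE EXISTS (SELECT 1 FROM fact_agreement t2"
    "  WHERE t2.Bill_To_Merchant_ID = fact_agreement.Bill_To_Merchant_ID"
    "  AND t2.Agreement_Termination_Notification_Date >"
    "      fact_agreement.Agreement_Termination_Notification_Date)"
)
remaining = conn.execute(
    "SELECT * FROM fact_agreement ORDER BY Bill_To_Merchant_ID"
).fetchall()

print(latest)
print(remaining)
```

After the DELETE, the table holds the same three rows that the aggregation query returned. If two rows share the same ID and the same latest date, both survive the DELETE, which may or may not be what you want.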

I think you just want group by:
select Bill_To_Merchant_ID, max(Agreement_Termination_Notification_Date)
from STG.Fact_Agreement
group by Bill_To_Merchant_ID;

Related

SQL Group By and Order -- retrieve detail for most recent entries in table

I want to write a SQL query that returns the newest entry for each bot_id.
My current query looks like this, but it always returns the first entry. DESC and ASC don't make any difference:
SELECT bot_id, id
FROM t_request_history
GROUP BY bot_id
ORDER BY request_time DESC
The table looks like this:
t_request_history
id  bot_id                       request  response  error  request_time
--  ---------------------------  -------  --------  -----  -------------------
1   usr_e74ae42b-080c-48e0-9e6c  a        a         0      2021-09-16 23:37:10
2   usr_e74ae42b-080c-48e0-9e6c  a        a         1      2021-09-16 23:37:35
3   usr_e74ae42b-080c-48e0-9e6c  a        a         1      2021-09-16 23:43:20
4   delete                       1        1         1      2021-09-16 23:44:21
5   delete                       1        1         0      2021-09-16 23:44:32
6   delete                       1        1         0      2021-09-16 23:44:41
Wanted Result
bot_id                       id
---------------------------  --
delete                       6
usr_e74ae42b-080c-48e0-9e6c  3
Actual Result
bot_id                       id
---------------------------  --
delete                       4
usr_e74ae42b-080c-48e0-9e6c  1
Is there any way to make this query work?
It looks like your id values go up with time. That is, it looks like newer entries in your table have higher id values than older entries. If this is true,
SELECT bot_id, MAX(id) id
FROM t_request_history
GROUP BY bot_id
gets you what you want.
If the id values don't go up with time, you have to use a subquery to find the latest time for each bot_id.
SELECT bot_id, MAX(request_time) request_time
FROM t_request_history
GROUP BY bot_id
Then you join that subquery to your table like this:
SELECT a.bot_id, a.id
FROM t_request_history a
JOIN (
SELECT bot_id, MAX(request_time) request_time
FROM t_request_history
GROUP BY bot_id
) b ON a.bot_id = b.bot_id
AND a.request_time = b.request_time
The ON condition of the JOIN chooses just the rows with the latest times from your table.
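The join approach can be checked against the sample rows from the question. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the asker's database; only the id, bot_id and request_time columns are reproduced, since the others play no part in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical copy of t_request_history (three columns only).
conn.execute(
    "CREATE TABLE t_request_history ("
    " id INTEGER PRIMARY KEY, bot_id TEXT, request_time TEXT)"
)
conn.executemany(
    "INSERT INTO t_request_history VALUES (?, ?, ?)",
    [(1, "usr_e74ae42b-080c-48e0-9e6c", "2021-09-16 23:37:10"),
     (2, "usr_e74ae42b-080c-48e0-9e6c", "2021-09-16 23:37:35"),
     (3, "usr_e74ae42b-080c-48e0-9e6c", "2021-09-16 23:43:20"),
     (4, "delete", "2021-09-16 23:44:21"),
     (5, "delete", "2021-09-16 23:44:32"),
     (6, "delete", "2021-09-16 23:44:41")],
)

# The subquery finds the latest time per bot_id; the join keeps only the
# rows of the base table that carry that time.
rows = conn.execute("""
    SELECT a.bot_id, a.id
    FROM t_request_history a
    JOIN (
        SELECT bot_id, MAX(request_time) AS request_time
        FROM t_request_history
        GROUP BY bot_id
    ) b ON a.bot_id = b.bot_id
       AND a.request_time = b.request_time
    ORDER BY a.bot_id
""").fetchall()

print(rows)
```

Note that if two rows of the same bot_id share the latest request_time, the join returns both of them.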
You could do this using a Window/Analytic function
SELECT distinct
bot_id, MAX(request_time) OVER(PARTITION BY bot_id)
FROM t_request_history
;
SQLFiddle
You can find more details on its usage in the MariaDB window function documentation for MAX.
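The MAX() OVER query returns the latest time per bot_id but not the id of that row. If the id is needed as well, one possible variant (not taken from the answer above) ranks the rows with ROW_NUMBER and keeps the first of each partition. This is a sketch assuming SQLite 3.25 or later through Python's sqlite3 module, with a reduced copy of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical copy of t_request_history (three columns only).
conn.execute(
    "CREATE TABLE t_request_history ("
    " id INTEGER PRIMARY KEY, bot_id TEXT, request_time TEXT)"
)
conn.executemany(
    "INSERT INTO t_request_history VALUES (?, ?, ?)",
    [(1, "usr_e74ae42b-080c-48e0-9e6c", "2021-09-16 23:37:10"),
     (3, "usr_e74ae42b-080c-48e0-9e6c", "2021-09-16 23:43:20"),
     (4, "delete", "2021-09-16 23:44:21"),
     (6, "delete", "2021-09-16 23:44:41")],
)

# Number the rows of each bot_id from newest to oldest, then keep rn = 1.
newest = conn.execute("""
    SELECT bot_id, id
    FROM (
        SELECT bot_id, id,
               ROW_NUMBER() OVER (PARTITION BY bot_id
                                  ORDER BY request_time DESC) AS rn
        FROM t_request_history
    ) ranked
    WHERE rn = 1
    ORDER BY bot_id
""").fetchall()

print(newest)
```

Unlike the join on MAX(request_time), this returns exactly one row per bot_id even when two rows tie on the latest time; which of the tied rows wins is then arbitrary unless a tie-breaker is added to the ORDER BY.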

update in oracle sql : multiple rows in 1 table

I am new to SQL and I am no good with more advanced queries and functions.
So, I have this 1 table with sales:
id date seller_name buyer_name
---- ------------ ------------- ------------
1 2015-02-02 null Adrian
1 2013-05-02 null John B
1 2007-11-15 null Chris F
2 2014-07-12 null Jane A
2 2011-06-05 null Ted D
2 2010-08-22 null Maryanne A
3 2015-12-02 null Don P
3 2012-11-07 null Chris T
3 2011-10-02 null James O
I would like to update the seller_name for each id by putting the buyer_name from the previous sale in as the seller_name of the next, newer sale. For example, for id 1, John B would then be the seller on 2015-02-02 and the buyer on 2013-05-02. Does that make sense?
P.S. This is the perfect case; the table is big and the ids are not ordered so neatly.
merge into your_table a
using ( select rowid rid,
lead(buyer_name, 1) over (partition by id order by date desc) seller
from your_table
) b
on (a.rowid = b.rid )
when matched then update set a.seller_name= b.seller;
Explanation: the MERGE INTO statement performs different operations depending on whether rows match or not. Here you merge into your own table; the USING subquery supplies the new values you want to apply together with the rowid, which serves as the matching key. The LEAD function returns a value from the row n rows ahead, where n is the number you specify after the comma. You also specify the window to work on, which in your case is partitioned by id and ordered by date descending, so the next row is the previous sale and its buyer becomes the seller. Hope this clears it up a bit.
Either of the queries below can be used to perform the desired action:
merge into sandeep24nov16_2 table1
using(select rowid r, lag(buyer_name) over (partition by id order by "DATE" asc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
or
merge into sandeep24nov16_2 table1
using(select rowid r, lead(buyer_name) over (partition by id order by "DATE" desc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
select a.*,
lag(buyer_name, 1) over(partition by id order by sale_date) seller_name
from <your_table> a;
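The LAG query above can be previewed before running any MERGE. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module instead of Oracle; the table name sales and the column name sale_date are hypothetical, and only the rows for ids 1 and 2 (in part) are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; sale_date is ISO-8601 text so it sorts chronologically.
conn.execute(
    "CREATE TABLE sales (id INTEGER, sale_date TEXT, buyer_name TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2015-02-02", "Adrian"),
     (1, "2013-05-02", "John B"),
     (1, "2007-11-15", "Chris F"),
     (2, "2014-07-12", "Jane A"),
     (2, "2011-06-05", "Ted D")],
)

# LAG over ascending dates picks the buyer of the previous sale of the
# same id; the oldest sale of each id has no predecessor and gets NULL.
rows = conn.execute("""
    SELECT id, sale_date, buyer_name,
           LAG(buyer_name) OVER (PARTITION BY id
                                 ORDER BY sale_date) AS seller_name
    FROM sales
    ORDER BY id, sale_date DESC
""").fetchall()

for row in rows:
    print(row)
```

The oldest sale of each id keeps a NULL seller_name, because there is no earlier buyer to copy from.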

Using GROUP BY, select ID of record in each group that has lowest ID

I am creating a file organization system where you can add content items to multiple folders.
I am storing the data in a table that has a structure similar to the following:
ID TypeID ContentID FolderID
1 101 1001 1
2 101 1001 2
3 102 1002 3
4 103 1002 2
5 103 1002 1
6 104 1001 1
7 105 1005 2
I am trying to select the first record for each unique TypeID and ContentID pair. For the above table, I would want the results to be:
ID
1
3
4
6
7
As you can see, the pairs 101 1001 and 103 1002 were each added to two folders, yet I only want the record with the first folder they were added to.
When I try the following query, however, I only get results that have at least two entries with the same TypeID and ContentID:
select MIN(ID)
from table
group by TypeID, ContentID
results in
ID
1
4
If I change MIN(ID) to MAX(ID) I get the correct amount of results, yet I get the record with the last folder they were added to and not the first folder:
ID
2
3
5
6
7
Am I using GROUP BY or the MIN wrong? Is there another way that I can accomplish this task of selecting the first record of each TypeID ContentID pair?
MIN() and MAX() should return the same number of rows. Changing the aggregate function should not change the number of rows returned by the query.
Is this query part of a larger query? From looking at the sample data provided, I would assume that this code is only a snippet from a larger action you are trying to do. Do you later try to join TypeID, ContentID or FolderID with the tables the IDs are referencing?
If yes, this error is likely being caused by another part of your query and not this select statement. If you are using joins or multi-level select statements, you can get a different number of results if the referenced tables do not contain a record for all the foreign IDs.
Another suggestion: check whether any of the values in your records are NULL. Although this should not affect the GROUP BY, I have sometimes encountered strange behavior when dealing with NULL values.
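The claim that MIN and MAX return the same number of rows can be checked directly on the sample data. This is a small sketch, assuming SQLite through Python's sqlite3 module; the table name content_folders and the helper grouped_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the rows are the sample data from the question.
conn.execute(
    "CREATE TABLE content_folders (ID INTEGER, TypeID INTEGER,"
    " ContentID INTEGER, FolderID INTEGER)"
)
conn.executemany(
    "INSERT INTO content_folders VALUES (?, ?, ?, ?)",
    [(1, 101, 1001, 1), (2, 101, 1001, 2), (3, 102, 1002, 3),
     (4, 103, 1002, 2), (5, 103, 1002, 1), (6, 104, 1001, 1),
     (7, 105, 1005, 2)],
)

def grouped_ids(aggregate):
    # aggregate is "MIN" or "MAX"; only these fixed strings are interpolated.
    sql = (f"SELECT {aggregate}(ID) AS picked FROM content_folders"
           " GROUP BY TypeID, ContentID ORDER BY picked")
    return [row[0] for row in conn.execute(sql)]

min_ids = grouped_ids("MIN")
max_ids = grouped_ids("MAX")

print(min_ids)
print(max_ids)
```

Both calls return one value per (TypeID, ContentID) pair, and MIN yields exactly the ids the question asks for, which supports the suspicion that the missing rows come from some other part of the real query.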
Use ROW_NUMBER
WITH CTE AS
(SELECT ID,TypeID,ContentID,FolderID,
ROW_NUMBER() OVER (PARTITION BY TypeID,ContentID ORDER BY ID) as rn FROM t
)
SELECT ID FROM CTE WHERE rn=1
Use it with ORDER BY:
select *
from table
group by TypeID, ContentID
order by id
SQLFiddle: http://sqlfiddle.com/#!9/024016/12
Try first(id) instead of min(id):
select first(id)
from table
group by TypeID, ContentID
Does it work?

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all 3 records since they are distinct, but what I need is
empid date CMPID
1 17-06-2015 001
2 NULL 002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
NULL is different from any other date value. If you want to exclude NULL records, you have to add an AND condition like table.field IS NOT NULL.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee, with a date column showing the date on which they worked, or null if they didn't work. Right?
Something like this should do you:
select master.empid, detail.date
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of distinct employees; the detail table gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in detail are returned once, with the contributing columns from detail set to null. Rows in master with matching rows in detail are repeated once for each such match, with the detail columns reflecting the matching row's values.
This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
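The reason the grouped query yields "the lowest date or null" is that aggregate functions such as MIN skip NULL inputs and return NULL only when every input in the group is NULL. Here is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server; the column name work_date is a hypothetical stand-in for the question's date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# work_date stands in for the question's "date" column (an assumption).
conn.execute(
    "CREATE TABLE employee (empid INTEGER, work_date TEXT, cmpid TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "2015-06-17", "001"),
     (1, "2015-06-17", None),
     (1, None, "001"),
     (2, None, "002"),
     (2, None, "002")],
)

# MIN ignores NULLs inside a group; a group with only NULLs yields NULL.
rows = conn.execute(
    "SELECT empid, MIN(work_date), MIN(cmpid)"
    " FROM employee GROUP BY empid ORDER BY empid"
).fetchall()

print(rows)
```

Employee 1 collapses to its single non-null date, while employee 2, who has no dates at all, still appears with a NULL date, matching the result the question asks for.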
try this
select distinct empid,date from employee where date is not null

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
) sub
WHERE x = 1
If you don't want to use a sub query based on a composite of all three keys then you need to create a "dense rank" window function field that orders subsets of id, user_id and foo_id by effective date with the rank order field. Then subquery that and take the records where rank_order=1. Since the rank ordering was by effective date you are getting all fields of the record with the highest effective date for each foo and user.
DATASET
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
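The ranking walkthrough above can be reproduced as follows. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module instead of Postgres; the rows are the five from the dataset above, with dates rewritten as ISO-8601 text (an assumption) so they sort correctly, and rank_order is the name used in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_foos (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " foo_id INTEGER, effective_at TEXT)"
)
conn.executemany(
    "INSERT INTO user_foos VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "2001-01-01"),
     (2, 1, 1, "2002-01-01"),
     (3, 1, 1, "2003-01-01"),
     (4, 1, 2, "2001-01-01"),
     (5, 2, 1, "2001-01-01")],
)

# Rank the rows of each (user_id, foo_id) pair from newest to oldest,
# then keep only the top-ranked row of each pair.
rows = conn.execute("""
    SELECT id, user_id, foo_id, effective_at
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY user_id, foo_id
                                  ORDER BY effective_at DESC) AS rank_order
        FROM user_foos t
    ) ranked
    WHERE rank_order = 1
    ORDER BY id
""").fetchall()

print(rows)
```

Because the table has a unique constraint on (user_id, foo_id, effective_at), no two rows of a pair can tie on the date, so DENSE_RANK and ROW_NUMBER pick the same rows here.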