Access query: Delete duplicated records based on max value

I have the following table structure :
I would like to be able to delete from the table the duplicated mails, leaving for each mail account only the one with the highest quality score. At the moment I have come up with the following SQL code :
DELETE *
FROM Table
WHERE ( Table.[Email Adress] & Table.[Quality Score] ) NOT IN
(
SELECT (Table.[Email Adress] & Max(Table.[Quality Score])
FROM Table
GROUP BY Table.[Email Adress]
);
However when I run it, it asks me for a parameter value and clearly doesn't work as I intended.
Do you have any solution?

You can simplify your query to this:
DELETE FROM Table AS t
WHERE t.[Quality Score] <> (
SELECT Max([Quality Score])
FROM Table
WHERE [Email Adress] = t.[Email Adress]
);
No need to GROUP BY [Email Adress] but you need a WHERE clause.
Or with EXISTS:
DELETE FROM Table AS t
WHERE EXISTS (
SELECT 1 FROM Table
WHERE [Email Adress] = t.[Email Adress] AND [Quality Score] > t.[Quality Score]
);
In case there are duplicate scores then you can keep the row with the highest score and the lowest id like this:
DELETE FROM Table AS t
WHERE EXISTS (
SELECT 1 FROM Table
WHERE [Email Adress] = t.[Email Adress]
AND ([Quality Score] > t.[Quality Score] OR ([Quality Score] = t.[Quality Score] AND id < t.id))
);
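To check the tie-breaking rule on a small data set, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The table name mails, the column names and the sample rows are made up for illustration, and SQLite syntax differs slightly from Access (the inner copy of the table is aliased instead of the outer one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the one in the question.
conn.execute("CREATE TABLE mails (id INTEGER PRIMARY KEY, email TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO mails (id, email, score) VALUES (?, ?, ?)",
    [(1, "a@x.com", 5), (2, "a@x.com", 9), (3, "a@x.com", 9), (4, "b@x.com", 3)],
)

# Delete a row when the same email has a "better" row:
# a higher score, or the same score with a lower id.
conn.execute(
    """
    DELETE FROM mails
    WHERE EXISTS (
        SELECT 1 FROM mails AS other
        WHERE other.email = mails.email
          AND (other.score > mails.score
               OR (other.score = mails.score AND other.id < mails.id))
    )
    """
)

remaining = conn.execute("SELECT id, email, score FROM mails ORDER BY id").fetchall()
print(remaining)
```

With these rows, ids 1 and 3 are removed: id 1 loses on score, id 3 ties with id 2 on score but has the higher id.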

One method uses a correlated subquery:
delete from t
where t.quality_score < (select max(t2.quality_score)
from t as t2
where t2.email_address = t.email_address
);
Note: If you have duplicate highest scores, this keeps all of them. To address, you can use the id column:
delete from t
where t.id <> (select top 1 t2.id
from t as t2
where t2.email_address = t.email_address
order by t2.quality_score desc, t2.id
);

Related

SQL Create a duplicate row for each additional count

I have a table with a many to many relationship, in which I need to make a 1 to 1 without modifying the schema. Here is the pseudo code:
Reports {
Id INT,
Description NVARCHAR(256),
ReportFields...
}
ScheduledReports {
ScheduledReportId INT
ReportId INT (FK)
Frequency INT
}
When I run this query:
SELECT [ReportID], COUNT(*) as NumberOfReports
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1
This returns all the reports that have duplicates:
ReportId, NumberOfReports
1, 2
2, 4
For each additional report (i.e. NumberOfReports - 1), I need to create a duplicate row in the Reports table. However, I'm having trouble figuring out how to turn the count into a join (since I don't want to use cursors).
Here is my query:
INSERT INTO Reports (Description)
SELECT Description
FROM Reports
WHERE ReportId IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)
How do I Join the ReportRow on itself for Count(*) -1 times?

The query below should give you a sequencing of the schedules per unique report. Rows with a sequencing > 1 tell you which values need to be inserted into your Reports table. The output of this select should probably be cached, since it (1) indicates which rows need to be added to Reports, by their current Id, and (2) can be used later to update the referenced ReportId in your schedules table.
SELECT *
FROM (
SELECT Reports.Id
,ScheduledReportId
,ROW_NUMBER() OVER (
PARTITION BY ReportId
ORDER BY ScheduledReportId
) AS [Sequencing]
FROM Reports
INNER JOIN ScheduledReports on ScheduledReports.ReportId = Reports.Id
WHERE ReportId IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)) AS SequencedReportAndSchedules
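As a rough illustration of how the cached sequencing could drive both steps (insert the copies, then repoint the schedules), here is a sketch in Python with the built-in sqlite3 module instead of SQL Server. The sample rows are invented, the loop is just one way to consume the cached result, and SQLite 3.25 or newer is assumed for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Reports (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE ScheduledReports (
        ScheduledReportId INTEGER PRIMARY KEY, ReportId INTEGER, Frequency INTEGER);
    INSERT INTO Reports VALUES (1, 'Sales'), (2, 'Stock'), (3, 'Audit');
    INSERT INTO ScheduledReports VALUES
        (10, 1, 7), (11, 1, 30), (12, 2, 1), (13, 2, 7), (14, 2, 30), (15, 3, 7);
    """
)

# Cache the sequencing: one row per schedule, numbered within its report.
sequenced = conn.execute(
    """
    SELECT ScheduledReportId, ReportId,
           ROW_NUMBER() OVER (PARTITION BY ReportId
                              ORDER BY ScheduledReportId) AS Sequencing
    FROM ScheduledReports
    """
).fetchall()

# Every schedule after the first one gets its own copy of the report row.
for schedule_id, report_id, seq in sequenced:
    if seq > 1:
        cur = conn.execute(
            "INSERT INTO Reports (Description) "
            "SELECT Description FROM Reports WHERE Id = ?",
            (report_id,),
        )
        conn.execute(
            "UPDATE ScheduledReports SET ReportId = ? WHERE ScheduledReportId = ?",
            (cur.lastrowid, schedule_id),
        )

report_count = conn.execute("SELECT COUNT(*) FROM Reports").fetchone()[0]
distinct_refs = conn.execute(
    "SELECT COUNT(DISTINCT ReportId) FROM ScheduledReports"
).fetchone()[0]
print(report_count, distinct_refs)
```

After the loop every schedule points at its own report row, so the relationship is effectively 1 to 1 without a schema change.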

How remove from user database type duplicate records with id 221

I have the following stored procedure:
ALTER PROCEDURE [dbo].[sp_ImportSurveys]
@surveys udtSurveys readonly
AS
BEGIN
INSERT INTO Surveys
(Funzione,
Id_Intervento,
Titolo_Intervento,
Titolo_Rilievo,
ImportDownloadDate,
Oggetto_Valutato,
Id_Oggetto_Valutato,
Id,
Id_Banca,
Cod_ABI,
Legal_Entity,
Title,
Descrizione_Rilievo,
Azione_di_Mitigazione,
Owner_Azione_di_Mitigazione,
Utente_Censimento,
Severita_Rilievo,
Data_Scadenza,
Anno,
StatusId)
SELECT Funzione,
Id_Intervento,
Titolo_Intervento,
Titolo_Rilievo,
DataDownload,
Oggetto_Valutato,
Id_Oggetto_Valutato,
CONVERT(nvarchar(450), Id) + Funzione,
Id_Banca,
Cod_ABI,
Legal_Entity,
Titolo_Rilievo,
Descrizione_Rilievo,
Azione_di_Mitigazione,
Owner_Azione_di_Mitigazione,
Utente_Censimento,
Severita_Rilievo,
Data_Scadenza,
Anno,
2
FROM @surveys sur
WHERE NOT EXISTS (Select * from dbo.Surveys WHERE dbo.Surveys.Id = (CONVERT(nvarchar(450), sur.Id) + Funzione))
END
udtSurveys is the table type of the stored procedure's parameter.
Before inserting records into the Surveys table, I need to remove all rows with a duplicate Id from the udtSurveys parameter.
Could you show me an example of how to use GROUP BY, or another way, to remove the duplicate records before inserting into the table?

You can simply use a CTE to filter out the duplicate rows from the @surveys parameter.
I've updated your query with a cte_tbl, assuming you want to keep one row per Id and drop its duplicates.
ALTER PROCEDURE [dbo].[sp_ImportSurveys]
@surveys udtSurveys readonly
AS
BEGIN
;WITH cte_tbl AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Funzione)
FROM @surveys sur
WHERE NOT EXISTS ( SELECT 1
FROM dbo.Surveys
WHERE dbo.Surveys.Id = (CONVERT(nvarchar(450), sur.Id) + Funzione))
)
INSERT INTO Surveys
(Funzione,
Id_Intervento,
Titolo_Intervento,
Titolo_Rilievo,
ImportDownloadDate,
Oggetto_Valutato,
Id_Oggetto_Valutato,
Id,
Id_Banca,
Cod_ABI,
Legal_Entity,
Title,
Descrizione_Rilievo,
Azione_di_Mitigazione,
Owner_Azione_di_Mitigazione,
Utente_Censimento,
Severita_Rilievo,
Data_Scadenza,
Anno,
StatusId)
SELECT Funzione,
Id_Intervento,
Titolo_Intervento,
Titolo_Rilievo,
DataDownload,
Oggetto_Valutato,
Id_Oggetto_Valutato,
CONVERT(nvarchar(450), Id) + Funzione,
Id_Banca,
Cod_ABI,
Legal_Entity,
Titolo_Rilievo,
Descrizione_Rilievo,
Azione_di_Mitigazione,
Owner_Azione_di_Mitigazione,
Utente_Censimento,
Severita_Rilievo,
Data_Scadenza,
Anno,
2
FROM cte_tbl
WHERE RN = 1 -- will only fetch the distinct id-rows
END
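The ROW_NUMBER filter can be tried out on a cut-down version of the tables. This is only a sketch in Python's sqlite3 (SQLite 3.25 or newer), not SQL Server: the incoming table stands in for the table-valued parameter, only two target columns are kept, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Surveys (Id TEXT PRIMARY KEY, Title TEXT);
    -- "incoming" is a hypothetical stand-in for the udtSurveys parameter.
    CREATE TABLE incoming (Id INTEGER, Funzione TEXT, Titolo_Rilievo TEXT);
    INSERT INTO Surveys VALUES ('3Audit', 'already imported');
    INSERT INTO incoming VALUES
        (1, 'Audit', 'first copy'), (1, 'Audit', 'second copy'),
        (2, 'Risk', 'only copy'), (3, 'Audit', 'exists in target');
    """
)

# Number the rows within each Id, skip ids already in the target,
# and insert only the first row of each group.
conn.execute(
    """
    WITH cte_tbl AS (
        SELECT Id, Funzione, Titolo_Rilievo,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Funzione) AS RN
        FROM incoming AS sur
        WHERE NOT EXISTS (
            SELECT 1 FROM Surveys
            WHERE Surveys.Id = CAST(sur.Id AS TEXT) || sur.Funzione)
    )
    INSERT INTO Surveys (Id, Title)
    SELECT CAST(Id AS TEXT) || Funzione, Titolo_Rilievo
    FROM cte_tbl
    WHERE RN = 1
    """
)

imported_ids = [row[0] for row in conn.execute("SELECT Id FROM Surveys ORDER BY Id")]
print(imported_ids)
```

Note that when two duplicates also share the same Funzione, the ORDER BY does not decide between them, so which copy survives is arbitrary. Add another column to the ORDER BY if that matters.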

One way is to nest a query that finds the duplicated record. The inner select returns the id only if there is more than one record with it.
declare @id varchar(10) = 'ABC'
delete from [dbo].[TABLE_NAME]
where id in (Select id from [TABLE_NAME]
where [id] = @id
group by id
having count(*) > 1
)

Query to determine cumulative changes to records

Given the following table containing the example rows, I’m looking for a query to give me the aggregate results of changes made to the same record. All changes are made against a base record in another table (results table), so the contents of the results table are not cumulative.
Base Records (from which all changes are made)
Edited Columns highlighted
I’m looking for a query that would give me the cumulative changes (in order by date). This would be the resulting rows:
Any help appreciated!
UPDATE:
Let me offer some clarification. The records being edited exist in one table, let's call that [dbo].[Base]. When a person updates a record from [dbo].[Base], his updates go into [dbo].[Updates]. Therefore, a person is always editing from the base table.
At some point, let's say once a day, we need to calculate the sum of changes with the following rule:
For any given record, determine the latest change for each column and take the latest change. If no change was made to a column, take the value from [dbo].[Base]. So, one way of looking at the [dbo].[Updates] table would be to see only the changed columns.
Please let's not discuss the merits of this approach, I realize it's strange. I just need to figure out how to determine the final state of each record.
Thanks!

This is dirty, but you can give this a shot (test here: https://rextester.com/MKSBU15593)
I use a CTE to do an initial CROSS JOIN of the Base and Updates tables and then a second CTE to filter it to only the rows where the IDs match. From there I use FIRST_VALUE() for each column, partitioned by the ID value and ordered by a CASE expression (1 if the Base column value matches the Updates column value, else 0) and then by the DATEMODIFIED column descending, to get the most recent changed version of each column.
It spits out
CREATE TABLE Base
(
ID INT
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Base
VALUES
( 100,'John','Doe','123 First',3,'Emp','W2'),
( 200,'Jane','Smith','Wacker Dr.',2,'Emp','W2');
CREATE TABLE Updates
(
ID INT
,DATEMODIFIED DATE
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Updates
VALUES
( 100,'1/15/2019','John','Doe','123 First St.',3,'Emp','W2'),
( 200,'1/15/2019','Jane','Smyth','Wacker Dr.',2,'Emp','W2'),
( 100,'1/17/2019','Johnny','Doe','123 First',3,'Emp','W2'),
( 200,'1/19/2019','Jane','Smith','2 Wacker Dr.',2,'Emp','W2'),
( 100,'1/20/2019','Jon','Doe','123 First',3,'Cont','W2');
WITH merged AS
(
SELECT b.ID AS IDOrigin
,'1/1/1900' AS DATEMODIFIEDOrigin
,b.FNAME AS FNAMEOrigin
,b.LNAME AS LNAMEOrigin
,b.ADDRESS AS ADDRESSOrigin
,b.RATING AS RATINGOrigin
,b.[TYPE] AS TYPEOrigin
,b.SUBTYPE AS SUBTYPEOrigin
,u.*
FROM base b
CROSS JOIN
dbo.Updates u
), filtered AS
(
SELECT *
FROM merged
WHERE IDOrigin = ID
)
SELECT distinct
ID
,FNAME = FIRST_VALUE(FNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN FNAME = FNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,LNAME = FIRST_VALUE(LNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN LNAME = LNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,ADDRESS = FIRST_VALUE(ADDRESS) OVER (PARTITION BY ID ORDER BY CASE WHEN ADDRESS = ADDRESSOrigin THEN 1 ELSE 0 end, datemodified desc)
,RATING = FIRST_VALUE(RATING) OVER (PARTITION BY ID ORDER BY CASE WHEN RATING = RATINGOrigin THEN 1 ELSE 0 end, datemodified desc)
,[TYPE] = FIRST_VALUE([TYPE]) OVER (PARTITION BY ID ORDER BY CASE WHEN [TYPE] = TYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
,SUBTYPE = FIRST_VALUE(SUBTYPE) OVER (PARTITION BY ID ORDER BY CASE WHEN SUBTYPE = SUBTYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
FROM filtered
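The rule itself (latest change per column, otherwise the base value) is easy to check outside the database. Here is a small plain-Python sketch using the same sample rows as above; the function name final_state and the tuple layout are my own choices, not anything from the original post.

```python
from datetime import date

COLUMNS = ("FNAME", "LNAME", "ADDRESS", "RATING", "TYPE", "SUBTYPE")

base = {
    100: ("John", "Doe", "123 First", 3, "Emp", "W2"),
    200: ("Jane", "Smith", "Wacker Dr.", 2, "Emp", "W2"),
}
updates = [
    (100, date(2019, 1, 15), ("John", "Doe", "123 First St.", 3, "Emp", "W2")),
    (200, date(2019, 1, 15), ("Jane", "Smyth", "Wacker Dr.", 2, "Emp", "W2")),
    (100, date(2019, 1, 17), ("Johnny", "Doe", "123 First", 3, "Emp", "W2")),
    (200, date(2019, 1, 19), ("Jane", "Smith", "2 Wacker Dr.", 2, "Emp", "W2")),
    (100, date(2019, 1, 20), ("Jon", "Doe", "123 First", 3, "Cont", "W2")),
]


def final_state(base, updates):
    """Apply updates oldest first; a column counts as changed only
    when its value differs from the base record."""
    result = {rid: list(values) for rid, values in base.items()}
    for rid, _, values in sorted(updates, key=lambda u: u[1]):
        for i, value in enumerate(values):
            if value != base[rid][i]:
                result[rid][i] = value  # later changes overwrite earlier ones
    return {rid: dict(zip(COLUMNS, values)) for rid, values in result.items()}


final = final_state(base, updates)
print(final[100])
print(final[200])
```

As in the query, a value that was set back to the base value is treated as "no change", so record 100 keeps the address '123 First St.' from the 1/15 edit.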

Don't you just want the last record?
select e.*
from edited e
where e.datemodified = (select max(e2.datemodified)
from edited e2
where e2.id = e.id
);

Find the rows that has the same column

I want to know how to do the following in SQL :
SELECT *
FROM table_A
WHERE id IN(:myValues)
AND other_colum has the same value
For example, if I have a conversation table (iduser, idconversation), I want a SQL query that returns the given user ids together with the conversation ids they have in common. It should return
35;105
37;105
35;106
37;106
With 35,37 the idUsers and 105,106 the conversations they have in common.
To go further, I work with Doctrine and PostgreSQL, and the table I want to query is generated (many-to-many relation), but I'm having difficulty integrating the subquery.
public function getAllCommonConversationByUserId($ids)
{
return $this->createQueryBuilder('c')
->select('c.id')
->innerJoin('c.idUser', 'recievedConversation')
->where('recievedConversation IN (:ids)')
->andWhere('$qb->expr()->eq("SELECT id FROM table GROUP BY(id) HAVING COUNT(*) >1")')
->setParameter(':ids', $ids)
->getQuery()
->getResult();
}

Just:
SELECT *
FROM table_A
WHERE idconversation in ('105','106') and iduser in ('35','37')
UPDATE:
Are you saying if the idconversation is duplicate? (showing multiple times?)
If so:
Select *
From table
where idconversation in
(
Select idconversation
From table
group by (idconversation)
Having count(*) >1
)
--where iduser in ('35','37')

Try to do this:
select id, conversation
from [your table name]
where
conversation in (
select conversation
from [your table name]
where id in (35)
)
It returns all participants of the conversations that user id 35 takes part in.
If you have duplicates in your table, add DISTINCT to the select statement.

You can get the conversations using group by and having:
SELECT conversationid
FROM table_A
WHERE userid in (35, 37)
GROUP BY conversationid
HAVING count(distinct userid) = 2;
If you want the original rows, you can join back to the original table.
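Here is a sketch of the "join back" step, run against SQLite through Python's sqlite3 module rather than PostgreSQL. It uses the column names from the question (iduser, idconversation); the extra sample rows (conversation 107 and user 40) are invented to show what gets filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversation (iduser INTEGER, idconversation INTEGER)")
conn.executemany(
    "INSERT INTO conversation VALUES (?, ?)",
    [(35, 105), (37, 105), (35, 106), (37, 106), (35, 107), (40, 105)],
)

user_ids = [35, 37]
placeholders = ", ".join("?" for _ in user_ids)

# Inner query: conversations that contain every requested user.
# Outer query: the original rows of those users in those conversations.
rows = conn.execute(
    f"""
    SELECT c.iduser, c.idconversation
    FROM conversation AS c
    JOIN (
        SELECT idconversation
        FROM conversation
        WHERE iduser IN ({placeholders})
        GROUP BY idconversation
        HAVING COUNT(DISTINCT iduser) = ?
    ) AS common ON common.idconversation = c.idconversation
    WHERE c.iduser IN ({placeholders})
    ORDER BY c.idconversation, c.iduser
    """,
    [*user_ids, len(user_ids), *user_ids],
).fetchall()
print(rows)
```

Conversation 107 is dropped because only user 35 is in it, and user 40 is dropped because it was not requested.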

How to insert using different table based on condition in same query

I am merging data from tables in two databases into one table. The structure is as follows:
Table in new Database :
User Table : {UserName,Email}
Table in Database1 :
User Table : {UserName,Email,LastLogin}
Table in Database2 :
User Table : {UserName,Email,LastLogin}
Now I need to write a query so that, if the email address is the same in the tables from Database1 and Database2, the record with the latest LastLogin is inserted.
Can someone suggest how to do this?

I think you are in need of this.. :)
Try modifying it accordingly..
declare @Email_1 nvarchar(100),@Email_2 nvarchar(100),@UserName nvarchar(100),@Lastlogin_1 datetime,@Lastlogin_2 datetime,@loop int=0
use [Database1]
while @loop != (select count(Distinct Email) from [User Table])
BEGIN
use [Database1]
set @Email_1 = (select Distinct Email from [User Table] order by email asc offset @loop rows fetch next 1 rows only)
set @LastLogin_1 = (select top 1 max(LastLogin) from [User Table] where email=@Email_1)
use [Database2]
set @Email_2 = (select top 1 Email from [User Table] where Email = @Email_1)
set @LastLogin_2 = (select top 1 max(LastLogin) from [User Table] where email=@Email_2)
if @email_1=@email_2
BEGIN
if @LastLogin_1>@LastLogin_2
BEGIN
use [Database1]
set @username = (select top 1 Username from [user table] where email=@email_1 and lastlogin=@Lastlogin_1)
use [New Database]
insert into [User Table]
select @username,@email_1
END
else if @LastLogin_1<@LastLogin_2
BEGIN
use [Database2]
set @username = (select top 1 Username from [user table] where email=@email_2 and lastlogin=@Lastlogin_2)
use [New Database]
insert into [User Table]
select @username,@email_1
use [Database1]
END
END
set @loop=@loop+1
END

The following code assumes all tables are in one database, purely for demo convenience. In the real world, when the tables are in different databases, you need to use the 3-part naming convention, i.e.
[DB].[Schema].[Table]
Also I am testing in sql server environment.
use tempdb
drop table dbo.merge_tbl, dbo.t1, dbo.t2;
create table dbo.merge_tbl (username varchar(30), email varchar(30));
create table dbo.t1 (username varchar(30), email varchar(30), lastlogin datetime)
create table dbo.t2 (username varchar(30), email varchar(30), lastlogin datetime)
go
-- insert a few records to tables
insert into dbo.t1 (username, email, lastlogin)
values ('james1', 'j@a.com', '20161001'), ('jenny1', 'j2@b.com', '20161002'), ('jeffrey1', 'j3@c.com', '20150101')
insert into dbo.t2(username, email, lastlogin)
values ('james2', 'j@a.com', '20161006'), ('jenny2', 'j2@b.com', '20151002'), ('jeffrey2', 'j4@c.com', '20170101')
go
-- this is the insert statement
insert into dbo.merge_tbl (username, email)
select case when t1.lastlogin >= t2.lastlogin then t1.username else t2.username end
, case when t1.lastlogin >= t2.lastlogin then t1.email else t2.email end
from dbo.t1
inner join dbo.t2
on t1.email = t2.email;
go
-- check result
select * from dbo.merge_tbl
Here is the result

Follow the 3-part naming convention ([Database Name].[Schema].[Table name]) when accessing tables in a different database. There may also be multiple login entries for a single user in [User Table], so use an aggregate function in that case.
If you are using SQL Server, use the script below to achieve the result.
INSERT INTO [New Database].dbo.[User Table] (UserName,Email)
SELECT CASE WHEN MAX(a.LastLogin)>=MAX(b.LastLogin) THEN a.[UserName] ELSE b.[UserName] END [UserName],a.Email
FROM Database1.dbo.[User Table] a
JOIN Database2.dbo.[User Table] b
ON a.Email=b.Email
GROUP BY a.[UserName],b.[UserName],a.Email
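If several login rows per email are possible, another option is to rank the rows instead of joining and grouping. The sketch below is not the script above: it swaps in UNION ALL plus ROW_NUMBER(), runs on SQLite (3.25 or newer) through Python's sqlite3 module, and puts everything in one database for convenience. The table names t1, t2 and merge_tbl and the sample rows are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE merge_tbl (username TEXT, email TEXT);
    CREATE TABLE t1 (username TEXT, email TEXT, lastlogin TEXT);
    CREATE TABLE t2 (username TEXT, email TEXT, lastlogin TEXT);
    INSERT INTO t1 VALUES ('james1', 'j@a.com', '2016-10-01'),
                          ('jenny1', 'j2@b.com', '2016-10-02'),
                          ('jeffrey1', 'j3@c.com', '2015-01-01');
    INSERT INTO t2 VALUES ('james2', 'j@a.com', '2016-10-06'),
                          ('jenny2', 'j2@b.com', '2015-10-02'),
                          ('jeffrey2', 'j4@c.com', '2017-01-01');

    -- Stack both sources, keep only emails present in both,
    -- and take the row with the latest login per email.
    INSERT INTO merge_tbl (username, email)
    SELECT username, email
    FROM (
        SELECT username, email,
               ROW_NUMBER() OVER (PARTITION BY email
                                  ORDER BY lastlogin DESC) AS rn
        FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) AS all_users
        WHERE email IN (SELECT email FROM t1)
          AND email IN (SELECT email FROM t2)
    ) AS ranked
    WHERE rn = 1;
    """
)

merged = conn.execute("SELECT username, email FROM merge_tbl ORDER BY email").fetchall()
print(merged)
```

Because the username comes from the same row as the winning lastlogin, the name and the login date always belong together, even with several rows per email in either source.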

Assuming three databases db0, db1 and db2, run the following query to merge the data into db0:
insert into db0.user(name, email)
select name, email from (
select db1.user.name, db1.user.email,
case
when db1.user.lastlogin >= db2.user.lastlogin
then db1.user.lastlogin
else db2.user.lastlogin
end as lastlogin
from db1.user,db2.user
where db1.user.email = db2.user.email ) as a
UPDATE
insert into db0.user(all columns)
select * from (
select "apply case for some columns" from db1.user,db2.user
where "apply your condition" ) as a
Consider the following:
1. The query above takes the cross product of db1.user and db2.user, which yields every combination of rows from the two tables.
2. If you filter on email in the "where" clause (for example db1.user.email = db2.user.email), you can select either db1.user.email or db2.user.email, because both values are the same.
3. For a >= or <= comparison, use "case" in the select list, because you have to pick one value from the two tables. For example, lastlogin uses a case expression here, so the column value comes from either db1.user or db2.user.
4. Conditions in the where clause can also be written as case expressions in the select list.
Please have a look at the UPDATE portion.
And I went for prayer, that is why this late reply.
If it helps mark as right answer.
Thank you
