SQL Server: Duplicates but based on specific criteria


I am trying to find duplicates based on forename, surname, and dateofbirth in my database. Below is what I have
Customers table:
custid  cust_refno  forename  surname  dateofbirth
1       10          David     John     10-02-1980
2       20          Peter     Broad    15-08-1978
3       30          Sarah     Holly    16-09-1982
4       40          Mathew    Mark     25-08-2001
5       50          Matt      Mark     25-08-2001
Address table:
addid  cust_refno  addresstype  line1
1      10          address      No. 10, Mineview Road
2      10          address      No. 20, Mineview Lane
3      20          address      Rockview cottage, blackthorn
4      30          mobile       0504135864
5      40          address      No. 64, New Lane
6      40          mobile       0504896532
7      50          address      No. 11, John's cottage
Some customers have multiple addresses, so they are not duplicates. I am trying to find a way to avoid displaying those as duplicates. Can you advise how I can do that?
my query:
SELECT DISTINCT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH, a.LINE1
FROM CUSTOMERS AS t
LEFT OUTER JOIN dbo.ADDRESS a
ON t.CUST_REFNO = a.CUST_REFNO
INNER JOIN (
SELECT FORENAME, SURNAME, DATE_OF_BIRTH
FROM CUSTOMERS GROUP BY FORENAME, SURNAME, DATE_OF_BIRTH
HAVING COUNT(*) > 1) AS td
ON t.FORENAME = td.FORENAME AND t.DATE_OF_BIRTH = td.DATE_OF_BIRTH
AND t.SURNAME = td.SURNAME
WHERE a.addresstype = 'address'
my result is:
Forename  surname  cust_refno  dateofbirth  line1
David     John     10          10-02-1980   No. 10, Mineview Road
David     John     10          10-02-1980   No. 20, Mineview Lane
But in reality it is not a duplicate. It's just that the addresses are different. Is there a way to compare the cust_refno and see if it already exists, so that a customer with the same cust_refno is not shown again even if the address is different?

If you want to get the customers with duplicate addresses, you can count how many times each customer has the same address and return only those that appear more than once:
SELECT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH , a.LINE1
FROM CUSTOMERS AS t INNER JOIN ADDRESS a ON t.CUST_REFNO = a.CUST_REFNO
GROUP BY t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH , a.LINE1
HAVING COUNT(a.LINE1) > 1

You can use window functions to filter out customers with more than one address. Then aggregation can be used to return the duplicates:
select forename, surname, dateofbirth
from customers c join
(select a.*,
count(*) over (partition by cust_refno) as cnt
from address a
where addresstype = 'address'
) a
on c.cust_refno = a.cust_refno
where cnt = 1
group by forename, surname, dateofbirth
having count(*) > 1;
If you want the full customer record, just use window functions twice:
select c.*
from (select c.*,
count(*) over (partition by forename, surname, dateofbirth) as cnt
from customers c
) c join
(select a.*,
count(*) over (partition by cust_refno) as cnt
from address a
where addresstype = 'address'
) a
on c.cust_refno = a.cust_refno
where a.cnt = 1 and c.cnt > 1;

You can use the analytic functions COUNT and ROW_NUMBER as follows:
select * from
(SELECT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH ,
a.LINE1,
row_number() over (partition by t.FORENAME, t.SURNAME, t.DATE_OF_BIRTH
order by t.CUST_REFNO) as rn,
count(1) over (partition by t.FORENAME, t.SURNAME, t.DATE_OF_BIRTH) as cnt
FROM CUSTOMERS AS t
LEFT OUTER JOIN dbo.ADDRESS a ON t.CUST_REFNO = a.CUST_REFNO
WHERE a.addresstype = 'address') t
where cnt > 1 and rn = 1
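To make the cust_refno point concrete, here is a minimal sketch that detects duplicates on the customers table alone and only then looks up an address, so a customer with two address rows is never counted twice. It runs the SQL through Python's built-in sqlite3 module as a stand-in for SQL Server, so syntax may need small adjustments there. The sixth customer row (cust_refno 60) is a hypothetical addition so that the sample data contains a real duplicate, and picking MIN(line1) as the single address to show is just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (custid INTEGER, cust_refno INTEGER,
                        forename TEXT, surname TEXT, dateofbirth TEXT);
CREATE TABLE address (addid INTEGER, cust_refno INTEGER,
                      addresstype TEXT, line1 TEXT);
INSERT INTO customers VALUES
  (1, 10, 'David',  'John',  '1980-02-10'),
  (2, 20, 'Peter',  'Broad', '1978-08-15'),
  (3, 30, 'Sarah',  'Holly', '1982-09-16'),
  (4, 40, 'Mathew', 'Mark',  '2001-08-25'),
  (5, 50, 'Matt',   'Mark',  '2001-08-25'),
  -- hypothetical row, not in the question: a genuine duplicate of cust_refno 20
  (6, 60, 'Peter',  'Broad', '1978-08-15');
INSERT INTO address VALUES
  (1, 10, 'address', 'No. 10, Mineview Road'),
  (2, 10, 'address', 'No. 20, Mineview Lane'),
  (3, 20, 'address', 'Rockview cottage, blackthorn'),
  (4, 30, 'mobile',  '0504135864'),
  (5, 40, 'address', 'No. 64, New Lane'),
  (6, 40, 'mobile',  '0504896532'),
  (7, 50, 'address', 'No. 11, John''s cottage');
""")

# Duplicates are decided on customers only (derived table d), so the number
# of address rows per customer cannot inflate the count. The correlated
# subquery returns at most one address line per customer.
rows = conn.execute("""
SELECT c.cust_refno, c.forename, c.surname, c.dateofbirth,
       (SELECT MIN(a.line1)
          FROM address a
         WHERE a.cust_refno = c.cust_refno
           AND a.addresstype = 'address') AS line1
FROM customers c
JOIN (SELECT forename, surname, dateofbirth
        FROM customers
       GROUP BY forename, surname, dateofbirth
      HAVING COUNT(*) > 1) d
  ON d.forename = c.forename
 AND d.surname = c.surname
 AND d.dateofbirth = c.dateofbirth
ORDER BY c.cust_refno
""").fetchall()

for row in rows:
    print(row)
```

With this data only the two Peter Broad rows come back; David John, who has two addresses but a single customer record, is not reported.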

Related

SQL Server : finding duplicates based on first few characters on column

I want to find duplicates based on the first three characters of the surname. Is there a way to do that in SQL? I can compare the whole name, but how do we compare only the first few characters?
Below are my tables
custid forename surname dateofbirth
----------------------------------------
1 David John 16-09-1985
2 David Jon 16-09-1985
3 Sarah Smith 10-08-2015
4 Peter Proca 11-06-2011
5 Peter Proka 11-06-2011
This is my query that I am currently running to compare
SELECT
y.id, y.forename, y.surname
FROM
customers y
INNER JOIN
(SELECT
forename, surname, COUNT(*) AS CountOf
FROM customers
GROUP BY forename, surname
HAVING COUNT(*) > 1) dt ON y.forename = dt.forename

You can use left():
select c.*
from (select c.*, count(*) over (partition by left(surname, 3)) as cnt
from customers c
) c
where cnt > 1
order by surname;
You can include the forename as well in the partition by if you mean forename and first three letters of surname.

You can use exists as follows:
select t.* from t
Where exists
(select 1 from t tt
Where left(t.surname, 3) = left(tt.surname, 3) and t.custid <> tt.custid
)
order by t.surname;
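The prefix comparison can be tried out with the question's sample rows. The sketch below is an illustration only: it uses Python's sqlite3 module, and because SQLite has no LEFT() function it substitutes substr(surname, 1, n), which is an assumption of this example rather than SQL Server syntax. The helper name prefix_duplicates is hypothetical, and it partitions by forename as well as the surname prefix. Note that with three characters, "John" and "Jon" do not match ("Joh" vs "Jon"), while "Proca" and "Proka" do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (custid INTEGER, forename TEXT,
                                        surname TEXT, dateofbirth TEXT)""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "David", "John", "16-09-1985"),
        (2, "David", "Jon", "16-09-1985"),
        (3, "Sarah", "Smith", "10-08-2015"),
        (4, "Peter", "Proca", "11-06-2011"),
        (5, "Peter", "Proka", "11-06-2011"),
    ],
)


def prefix_duplicates(conn, n):
    """Return customers sharing a forename and the first n surname characters."""
    sql = """
    SELECT custid, forename, surname
    FROM (SELECT c.*,
                 COUNT(*) OVER (PARTITION BY forename,
                                             substr(surname, 1, :n)) AS cnt
          FROM customers c) AS counted
    WHERE cnt > 1
    ORDER BY custid
    """
    return conn.execute(sql, {"n": n}).fetchall()


print(prefix_duplicates(conn, 3))  # three-character prefix
print(prefix_duplicates(conn, 2))  # two-character prefix also pairs John/Jon
```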

Name of Teacher with Highest Wage - recursive CTE

I am trying to get the max salary of each dept and display that teacher's first name as a separate column. So dept 1 may have 4 rows, but only one name shows for the max salary. I'm using SQL Server.
With TeacherList AS(
Select Teachers.FirstName,Teachers.LastName,
Teachers.FacultyID,TeacherID, 1 AS LVL,PrincipalTeacherID AS ManagerID
FROM dbo.Teachers
WHERE PrincipalTeacherID IS NULL
UNION ALL
Select Teachers.FirstName,Teachers.LastName,
Teachers.FacultyID,Teachers.TeacherID, TeacherList.LVL +
1,Teachers.PrincipalTeacherID
FROM dbo.Teachers
INNER JOIN TeacherList ON Teachers.PrincipalTeacherID =
TeacherList.TeacherID
WHERE Teachers.PrincipalTeacherID IS NOT NULL)
SELECT * FROM TeacherList;
SAMPLE OUTPUT :
Teacher First Name | Teacher Last Name | Faculty | Highest Paid In Faculty
Eric               | Smith             | 1       | Eric
Alex               | John              | 1       | Eric
Jessica            | Sewel             | 1       | Eric
Aaron              | Gaye              | 2       | Aaron
Bob                | Turf              | 2       | Aaron

I'm not sure from your description, but this will return all teachers, with the last column showing the name of the highest-paid teacher on the faculty.
select tr.FirstName,
tr.LastName,
tr.FacultyID,
th.FirstName
from Teachers tr
join (
select FacultyID, max(pay) highest_pay
from Teachers
group by FacultyID
) t on tr.FacultyID = t.FacultyID
join Teachers th on th.FacultyID = t.FacultyID and
th.pay = t.highest_pay
This will produce unexpected results (duplicate rows) if more than one person on the faculty has the highest salary. In that case you can use window functions as follows:
select tr.FirstName,
tr.LastName,
tr.FacultyID,
t.FirstName
from Teachers tr
join
(
select t.FirstName,
t.FacultyID
from
(
select t.*,
row_number() over (partition by FacultyID order by pay desc) rn
from Teachers t
) t
where t.rn = 1
) t on tr.FacultyID = t.FacultyID
This will display just one teacher per faculty with the highest salary; if several are tied, the one chosen is arbitrary.

You can do this with a CROSS APPLY.
SELECT FirstName, LastName, FacultyID, HighestPaid
FROM Teachers t
CROSS APPLY (SELECT TOP 1 FirstName AS HighestPaid
FROM Teachers
WHERE FacultyID = t.FacultyID
ORDER BY Salary DESC) ca
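If an arbitrary pick among tied teachers is not acceptable, a tie-breaker can be added to the ordering. The following is a rough sketch using FIRST_VALUE with TeacherID as the tie-breaker, run through Python's sqlite3 module as a stand-in for SQL Server. The pay column name and all pay figures are assumptions made up for the illustration, since the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Teachers (TeacherID INTEGER, FirstName TEXT,
                                       LastName TEXT, FacultyID INTEGER,
                                       pay INTEGER)""")
# Names follow the sample output in the question; pay values are hypothetical.
conn.executemany(
    "INSERT INTO Teachers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Eric", "Smith", 1, 5000),
        (2, "Alex", "John", 1, 4000),
        (3, "Jessica", "Sewel", 1, 4500),
        (4, "Aaron", "Gaye", 2, 6000),
        (5, "Bob", "Turf", 2, 6000),  # tied with Aaron on purpose
    ],
)

# FIRST_VALUE returns the first row of each faculty partition when sorted by
# pay descending; TeacherID breaks ties so the result is deterministic.
rows = conn.execute("""
SELECT FirstName, LastName, FacultyID,
       FIRST_VALUE(FirstName) OVER (PARTITION BY FacultyID
                                    ORDER BY pay DESC, TeacherID) AS HighestPaid
FROM Teachers
ORDER BY TeacherID
""").fetchall()

for row in rows:
    print(row)
```

Every teacher keeps their own row, and the tie in faculty 2 always resolves to the lower TeacherID.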

Find duplicates in two columns in a table

I just browsed this post and tried all the scripts, but I am still not getting what I expect.
Here is my table
Name            email
BRIAN MAT       BRIAN.MAT#abc.Com
BRIAN MAT       BRIAN MAT#abc.Com
AMY Lee         AMY.Lee#abc.Com
AMY.Lee         AMY.Lee#abc.Com
Madison Taylor  Madison.Tyl#abc.com
SELECT Name
FROM Employee
GROUP BY Name
HAVING COUNT(Name) > 1
result
BRIAN MAT
SELECT email
FROM Employee
GROUP BY email
HAVING COUNT(email) > 1
Result
AMY.Lee#abc.Com
I tried to combine these two scripts, but the result is blank:
SELECT
Name, email,COUNT(*)
FROM
Employee
GROUP BY
Name, email
HAVING
COUNT(*) > 1
Please correct what I am missing in my script to achieve a result like the one below:
Name       email
BRIAN MAT  BRIAN.MAT#abc.Com
BRIAN MAT  BRIAN MAT#abc.Com
AMY Lee    AMY.Lee#abc.Com
AMY.Lee    AMY.Lee#abc.Com

You could use windowed COUNT:
WITH cte AS (
SELECT *,
COUNT(*) OVER(PARTITION BY name) AS c_name,
COUNT(*) OVER(PARTITION BY Email) AS c_email
FROM Employee
)
SELECT name, email
FROM cte
WHERE c_name > 1 OR c_email > 1;

SELECT *
FROM Employee
WHERE Name IN (SELECT Name
FROM Employee
GROUP BY Name
HAVING COUNT(Name) > 1
)
OR Email IN (SELECT email
FROM Employee
GROUP BY email
HAVING COUNT(email) > 1
)

The least complicated, quick-and-dirty solution is a self-join:
SELECT
a.name,
a.email, count(*)
FROM
employee a
INNER JOIN
employee b on b.name = a.name or b.email = a.email
GROUP BY a.name, a.email
HAVING COUNT(*) > 1
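It may help to see why grouping by both columns comes back blank: no (Name, email) pair repeats in the sample data, so the duplicate test has to be applied to each column separately and the results combined with OR. A small sketch with the question's rows, using Python's sqlite3 module as a stand-in for SQL Server (SQLite compares text case-sensitively by default, which happens not to matter for this data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("BRIAN MAT", "BRIAN.MAT#abc.Com"),
        ("BRIAN MAT", "BRIAN MAT#abc.Com"),
        ("AMY Lee", "AMY.Lee#abc.Com"),
        ("AMY.Lee", "AMY.Lee#abc.Com"),
        ("Madison Taylor", "Madison.Tyl#abc.com"),
    ],
)

# Grouping by the pair: every (name, email) combination occurs exactly once,
# so nothing survives the HAVING filter.
rows_both = conn.execute("""
SELECT name, email, COUNT(*)
FROM employee
GROUP BY name, email
HAVING COUNT(*) > 1
""").fetchall()

# Testing each column on its own and combining with OR returns every row
# that shares a name or an email with some other row.
rows_either = conn.execute("""
SELECT name, email
FROM employee
WHERE name IN (SELECT name FROM employee GROUP BY name HAVING COUNT(*) > 1)
   OR email IN (SELECT email FROM employee GROUP BY email HAVING COUNT(*) > 1)
ORDER BY rowid
""").fetchall()

print(rows_both)
for row in rows_either:
    print(row)
```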

oracle display customer who purchased most cars WITHOUT analytic functions

I'm currently trying to answer the following question:
Display the name of the customer who has purchased the most cars from Archie’s Luxury Motors.
Tables I'm working with:
Customer
(custID, name, DOB, streetAddress, suburb, postcode,
gender, phoneNo, email, type)
SalesTransaction
(VIN, custID, agentID, dateOfSale, agreedPrice)
My query
select *
from (
select customer.name
from customer, salestransaction
where customer.custID = salestransaction.custID
group by (customer.name), salestransaction.custID
order by count(*) desc
)
where rownum=1;
Now I've found out that I cannot use analytic functions (rownum & rank)
How would I go about doing this using pure transactional SQL only?

You could use MAX and COUNT aggregate functions:
WITH data AS
(SELECT c.name cust_nm,
COUNT(*) cnt
FROM customer c,
salestransaction s
WHERE c.custID = s.custID
GROUP BY c.NAME
ORDER BY cnt DESC
)
SELECT cust_nm FROM data WHERE cnt =
(SELECT MAX(cnt) FROM DATA
);
An example from EMP and DEPT table:
SQL> WITH data AS
2 (SELECT e.deptno,
3 COUNT(*) cnt
4 FROM emp e,
5 dept d
6 WHERE e.deptno = d.deptno
7 GROUP BY e.deptno
8 ORDER BY cnt DESC
9 )
10 SELECT deptno FROM DATA WHERE cnt =
11 (SELECT MAX(cnt) FROM DATA
12 );
DEPTNO
----------
30
SQL>
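The same MAX-of-COUNT idea can also be written with HAVING and a derived table, with no ROWNUM, RANK, or ORDER BY involved. The sketch below runs it through Python's sqlite3 module rather than Oracle, so treat it as an illustration of the technique only. The customer names and sales rows are made up, and only the columns needed for the query are created. Grouping by custID as well as name keeps two different customers who share a name from being merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custID INTEGER, name TEXT);
CREATE TABLE salestransaction (VIN TEXT, custID INTEGER);
-- hypothetical sample data, not from the question
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cara');
INSERT INTO salestransaction VALUES
  ('V1', 1), ('V2', 1),
  ('V3', 2), ('V4', 2), ('V5', 2),
  ('V6', 3);
""")

# Keep the customers whose purchase count equals the largest per-customer
# count. If several customers tie for the maximum, all of them are returned.
names = [row[0] for row in conn.execute("""
SELECT c.name
FROM customer c
JOIN salestransaction s ON s.custID = c.custID
GROUP BY c.custID, c.name
HAVING COUNT(*) = (SELECT MAX(cnt)
                   FROM (SELECT COUNT(*) AS cnt
                         FROM salestransaction
                         GROUP BY custID) AS per_customer)
""")]

print(names)
```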

Oracle Sql query Group by Clause

MY_TABLE = Table with 2 columns Number, City.
Desired Output = City and the count of unique Numbers associated with the city. Seattle and Bellevue are part of Combined. Even though there are 4 numbers associated with Seattle and Bellevue, the output is 3, as there are only 3 distinct numbers: 123, 456, 786.
MY_TABLE
Number  City
123     Seattle
456     Bellevue
789     LosAngeles
780     LosAngeles
123     Bellevue
786     Bellevue
Desired Output:
Combined 3
LosAngeles 2
Query so far:
SELECT NUMBER, CITY FROM MY_TABLE WHERE LOOKUP_ID=100 AND CITY IN
('Seattle', 'Bellevue', 'LosAngeles')
GROUP BY NUMBER, CITY
I would highly appreciate any recommendations.

You could do something like
SELECT (case when city IN ('Seattle', 'Bellevue')
then 'Combined'
else city
end) city,
count( distinct number )
FROM my_table
WHERE lookup_id = 100
AND city IN ('Seattle', 'Bellevue', 'LosAngeles')
GROUP BY (case when city IN ('Seattle', 'Bellevue')
then 'Combined'
else city
end)
Of course, my guess is that you have some other table that tells you which CITY values need to be combined rather than having a hard-coded CASE statement.
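If such a mapping table does exist, the CASE expression can be replaced by a join. Here is a minimal sketch of that idea, run through Python's sqlite3 module instead of Oracle. The city_group table and its group_name column are hypothetical names invented for the example, the Number column is called num because NUMBER is a reserved word in Oracle, and the lookup_id filter is left out to keep the sample small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (num INTEGER, city TEXT);
-- hypothetical mapping table: which cities roll up into which label
CREATE TABLE city_group (city TEXT, group_name TEXT);
INSERT INTO my_table VALUES
  (123, 'Seattle'), (456, 'Bellevue'), (789, 'LosAngeles'),
  (780, 'LosAngeles'), (123, 'Bellevue'), (786, 'Bellevue');
INSERT INTO city_group VALUES
  ('Seattle', 'Combined'), ('Bellevue', 'Combined');
""")

# Cities without a mapping row keep their own name via COALESCE; distinct
# numbers are counted per resulting label.
rows = conn.execute("""
SELECT COALESCE(g.group_name, t.city) AS city_label,
       COUNT(DISTINCT t.num) AS cnt
FROM my_table t
LEFT JOIN city_group g ON g.city = t.city
GROUP BY COALESCE(g.group_name, t.city)
ORDER BY city_label
""").fetchall()

for row in rows:
    print(row)
```

Adding another city to a group then only needs a new row in the mapping table, not a change to the query.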

with t as (
SELECT (
case when city IN ('Seattle', 'Bellevue')
then 'Combined'
else city
end
) city, number from my_table
)
select city, count(distinct number) from t
group by city
Please let me know if this was useful.

Try this:
SELECT
(CASE CITY
WHEN 'Seattle' THEN 'Combined'
WHEN 'Bellevue' THEN 'Combined'
ELSE CITY
END), COUNT(DISTINCT NUMBER)
FROM
MY_TABLE
WHERE
LOOKUP_ID=100 AND CITY IN ('Seattle', 'Bellevue', 'LosAngeles')
GROUP BY
(CASE CITY
WHEN 'Seattle' THEN 'Combined'
WHEN 'Bellevue' THEN 'Combined'
ELSE CITY
END)
That should do what you asked for, but I suspect that you have some other tables where you define which cities should be considered the same. In that case you'll need to join on those tables.

There are 3 answers already and none of them are generic for more cities.
Try this:
SELECT City, COUNT(Number) AS ExclusiveNumbers
FROM (SELECT q2.City, q2.CityNumCount, b.Number
FROM MY_Table b INNER JOIN
(SELECT c.City, MAX(NumOccurs) AS CityNumCount
FROM My_Table c INNER JOIN
(SELECT Number, COUNT(City) AS NumOccurs
FROM My_Table
GROUP BY Number) q1 ON c.Number = q1.Number
GROUP BY c.City) q2 ON b.City = q2.City) q3
WHERE CityNumCount = 1
GROUP BY City
UNION
SELECT 'Combined', COUNT(DISTINCT Number)
FROM (SELECT q2.City, q2.CityNumCount, b.Number
FROM MY_Table b INNER JOIN
(SELECT c.City, MAX(NumOccurs) AS CityNumCount
FROM My_Table c INNER JOIN
(SELECT Number, COUNT(City) AS NumOccurs
FROM My_Table
GROUP BY Number) q1 ON c.Number = q1.Number
GROUP BY c.City) q2 ON b.City = q2.City) q3
WHERE CityNumCount > 1
The top half of the union works out, for each City name that has no numbers in common with any other city, how many different numbers it has.
The bottom half works out the count of different numbers for cities that do have numbers in common with other cities. These 2 figures will always add up to the count of distinct numbers in the original table.
