SQL Server: finding duplicates between two parameters in different tables

I am trying to find duplicate customers in SQL Server: rows where the forename, surname, and mobile number all match. The problem is that the names and the mobile numbers are stored in different tables.
custid  forename  surname  dateofbirth
------  --------  -------  -----------
1       David     John     16-09-1985
2       David     Jon      16-09-1985
3       Sarah     Smith    10-08-2015
4       Peter     Proca    11-06-2011
5       Peter     Broca    11-06-2011

addid  custid  line1
-----  ------  ----------
1      1       0504135846
2      2       0504135846
3      3       0506523145
4      4       0503698521
5      5       0503698521
I can currently find duplicates by forename and surname, but how can I bring the mobile number into the comparison?
select c.*
from
(select
c.*,
count(*) over (partition by left(surname, 3)) as cnt
from
customers c) c
order by
surname;

Use a join:
select c.*
from (select c.*, t2.line1,
             count(*) over (partition by surname, forename, line1) as cnt
      from customers c join
           table2 t2
           on t2.custid = c.custid
     ) c
order by surname;

Just JOIN the two tables. Using GROUP BY with HAVING might simplify your query as well:
SELECT COUNT(*), c.forename, c.surname, mn.line1
FROM customers c
INNER JOIN mobilenumber mn ON c.custid=mn.custid
GROUP BY c.forename, c.surname, mn.line1
HAVING COUNT(*)>1
Also, you might need a LEFT JOIN if there is a chance that some customers have no row in the mobilenumber table.
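The GROUP BY / HAVING approach can be tried end to end on the question's sample rows. This is only a sketch under some assumptions: it runs the SQL through Python's built-in sqlite3 module rather than SQL Server, the table name mobilenumber is borrowed from the answer above (the question never names the second table), and dateofbirth is left out for brevity. On this sample, exact matching on surname finds nothing because the surnames differ in spelling (John/Jon, Proca/Broca), so a second, looser query groups on forename and number only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (custid INTEGER PRIMARY KEY, forename TEXT, surname TEXT);
CREATE TABLE mobilenumber (addid INTEGER PRIMARY KEY, custid INTEGER, line1 TEXT);
INSERT INTO customers VALUES
  (1, 'David', 'John'), (2, 'David', 'Jon'), (3, 'Sarah', 'Smith'),
  (4, 'Peter', 'Proca'), (5, 'Peter', 'Broca');
INSERT INTO mobilenumber VALUES
  (1, 1, '0504135846'), (2, 2, '0504135846'), (3, 3, '0506523145'),
  (4, 4, '0503698521'), (5, 5, '0503698521');
""")

# Exact match on forename, surname and mobile number.
exact = conn.execute("""
SELECT c.forename, c.surname, mn.line1, COUNT(*) AS cnt
FROM customers c
JOIN mobilenumber mn ON mn.custid = c.custid
GROUP BY c.forename, c.surname, mn.line1
HAVING COUNT(*) > 1
""").fetchall()

# Looser match: same forename and same mobile number, surname ignored.
loose = conn.execute("""
SELECT c.forename, mn.line1, COUNT(*) AS cnt
FROM customers c
JOIN mobilenumber mn ON mn.custid = c.custid
GROUP BY c.forename, mn.line1
HAVING COUNT(*) > 1
ORDER BY c.forename
""").fetchall()

print(exact)
print(loose)
```

Which columns belong in the GROUP BY depends on how strict "duplicate" should be; the looser grouping is an illustration, not a recommendation.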

Related

Displaying columns as rows SQL

I have the below tables:
Corporate table:
CorporateId DirectorId ManagerId SalesId
1 1 1 1
2 2 2 3
3 3 4 5
Employee table:
EmployeeId FirstName LastName
1 Tim Sarah
2 Tom Paulsen
3 Tam Margo
4 Eli Lot
5 Ziva Lit
I want to display, for one corporate, the names of the Director, Manager, and Sales employees as rows. Example for corporate 3:
EmployeeId FirstName LastName
3 Tam Margo
4 Eli Lot
5 Ziva Lit
How can I do that? I know how to display rows as columns using pivot, but I am unsure whether pivot can be used here too.
Any help please?

You can join the two tables as follows:
SELECT E.EmployeeId, E.FirstName, E.LastName
FROM Employee E JOIN Corporate C
ON E.EmployeeID IN (C.DirectorId ,C.ManagerId ,C.SalesId)
WHERE C.CorporateId=3

You can first unpivot the three ID columns into rows using CROSS APPLY, then join the unpivoted rows to the employee table:
select e.*
from corporate c
cross apply (
select EmployeeId from (
values (Directorid), (ManagerId), (SalesId)
)r(EmployeeId)
)r
join employee e on e.EmployeeId = r.EmployeeId
where c.CorporateId = 3;
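The two answers above behave differently when one employee fills several roles. The sketch below shows this on the question's data. It is an illustration under assumptions: it uses Python's built-in sqlite3 module, and because SQLite has no CROSS APPLY, the unpivot step is written with UNION ALL instead. The Role and pos columns are added here purely for readability and ordering; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Corporate (CorporateId INTEGER, DirectorId INTEGER,
                        ManagerId INTEGER, SalesId INTEGER);
CREATE TABLE Employee (EmployeeId INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO Corporate VALUES (1, 1, 1, 1), (2, 2, 2, 3), (3, 3, 4, 5);
INSERT INTO Employee VALUES (1, 'Tim', 'Sarah'), (2, 'Tom', 'Paulsen'),
                            (3, 'Tam', 'Margo'), (4, 'Eli', 'Lot'),
                            (5, 'Ziva', 'Lit');
""")

# Unpivot: one row per (corporate, role), then join to Employee.
UNPIVOT_SQL = """
SELECT r.Role, e.EmployeeId, e.FirstName, e.LastName
FROM (
    SELECT CorporateId, 1 AS pos, 'Director' AS Role, DirectorId AS EmployeeId
    FROM Corporate
    UNION ALL
    SELECT CorporateId, 2, 'Manager', ManagerId FROM Corporate
    UNION ALL
    SELECT CorporateId, 3, 'Sales', SalesId FROM Corporate
) r
JOIN Employee e ON e.EmployeeId = r.EmployeeId
WHERE r.CorporateId = ?
ORDER BY r.pos
"""

# Join with IN: one row per distinct employee referenced by the corporate.
IN_JOIN_SQL = """
SELECT e.EmployeeId, e.FirstName, e.LastName
FROM Employee e
JOIN Corporate c ON e.EmployeeId IN (c.DirectorId, c.ManagerId, c.SalesId)
WHERE c.CorporateId = ?
ORDER BY e.EmployeeId
"""

rows_3 = conn.execute(UNPIVOT_SQL, (3,)).fetchall()
unpivot_1 = conn.execute(UNPIVOT_SQL, (1,)).fetchall()
in_join_1 = conn.execute(IN_JOIN_SQL, (1,)).fetchall()

print(rows_3)
print(unpivot_1)
print(in_join_1)
```

For corporate 1, where employee 1 holds all three roles, the unpivot version returns that employee three times (once per role) while the IN join returns the employee once. Which behaviour is wanted depends on whether the output is meant to list roles or people.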

SQL MAX aggregate function not bringing the latest date

Purpose: I am trying to find each teacher's most recent purchase date, together with the order type of that purchase.
Orders table
ID  Ordertype  Status     TeacherID  PurchaseDate  SchoolID  TeacherassistantID
1   Pencils    Completed  1          1/1/2021      1         1
2   Paper      Completed  1          3/5/2021      1         1
3   Notebooks  Completed  1          4/1/2021      1         1
4   Erasers    Completed  2          2/1/2021      2         2
Teachers table
TeacherID  Teachername
1          Mary Smith
2          Jason Crane
School table
ID  schoolname
1   ABC school
2   PS1
3   PS2
Here is my attempted code:
SELECT o.ordertype, o.status, t.Teachername, s.schoolname
,MAX(o.Purchasedate) OVER (PARTITION by t.ID) last_purchase
FROM orders o
INNER JOIN teachers t ON t.ID=o.TeacherID
INNER JOIN schools s ON s.ID=o.schoolID
WHERE o.status in ('Completed','In-progress')
AND o.ordertype not like 'notebook'
It should look like this:
Ordertype  Status     teachername  last_purchase  schoolname
Paper      Completed  Mary Smith   3/5/2021       ABC school
Erasers    Completed  Jason Crane  2/1/2021       PS1
It returns multiple rows per teacher instead of just the latest purchase date and its associated values. I think I need a subquery.

Aggregation functions are not appropriate for what you are trying to do. Their purpose is to summarize values in multiple rows, not to choose a particular row.
A window function on its own does not filter any rows.
You want a window function combined with filtering:
SELECT ordertype, status, Teachername, schoolname, Purchasedate
FROM (SELECT o.ordertype, o.status, t.Teachername, s.schoolname,
             o.Purchasedate,
             -- number each teacher's orders, newest first
             ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY o.PurchaseDate DESC) AS seqnum
      FROM orders o JOIN
           teachers t
           ON t.ID = o.TeacherID JOIN
           schools s
           ON s.ID = o.schoolID
      WHERE o.status IN ('Completed', 'In-progress') AND
            o.ordertype NOT LIKE 'notebook%'  -- trailing % so that 'Notebooks' is excluded too
     ) o
WHERE seqnum = 1;
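The ROW_NUMBER pattern can be checked against the question's sample rows. The following is a sketch under assumptions, not SQL Server code: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), the dates are rewritten in ISO format so that they sort correctly as text, the teachers key is named teacherid as in the question's table listing, and the pattern 'notebook%' is assumed so that 'Notebooks' is excluded, as the expected output implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, ordertype TEXT, status TEXT,
                     teacherid INTEGER, purchasedate TEXT, schoolid INTEGER);
CREATE TABLE teachers (teacherid INTEGER, teachername TEXT);
CREATE TABLE schools (id INTEGER, schoolname TEXT);
INSERT INTO orders VALUES
  (1, 'Pencils',   'Completed', 1, '2021-01-01', 1),
  (2, 'Paper',     'Completed', 1, '2021-03-05', 1),
  (3, 'Notebooks', 'Completed', 1, '2021-04-01', 1),
  (4, 'Erasers',   'Completed', 2, '2021-02-01', 2);
INSERT INTO teachers VALUES (1, 'Mary Smith'), (2, 'Jason Crane');
INSERT INTO schools VALUES (1, 'ABC school'), (2, 'PS1'), (3, 'PS2');
""")

latest = conn.execute("""
SELECT ordertype, status, teachername, purchasedate, schoolname
FROM (
    SELECT o.ordertype, o.status, t.teachername, o.purchasedate, s.schoolname,
           -- 1 = newest remaining order of each teacher
           ROW_NUMBER() OVER (PARTITION BY t.teacherid
                              ORDER BY o.purchasedate DESC) AS seqnum
    FROM orders o
    JOIN teachers t ON t.teacherid = o.teacherid
    JOIN schools s ON s.id = o.schoolid
    WHERE o.status IN ('Completed', 'In-progress')
      AND o.ordertype NOT LIKE 'notebook%'
) ranked
WHERE seqnum = 1
ORDER BY teachername
""").fetchall()

for row in latest:
    print(row)
```

The filter on status and order type is applied before the rows are numbered, so seqnum = 1 picks the newest order among those that survive the filter.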

Another option is TOP 1 with ORDER BY. The row filters belong in WHERE rather than HAVING (HAVING must follow GROUP BY and is meant for conditions on aggregates), and no GROUP BY is needed. Note that this returns only the single most recent purchase overall, not one row per teacher:
SELECT TOP 1 o.ordertype, o.status, t.Teachername, s.schoolname
,o.Purchasedate
FROM orders o
INNER JOIN teachers t ON t.ID=o.TeacherID
INNER JOIN schools s ON s.ID=o.schoolID
WHERE o.status in ('Completed','In-progress')
AND o.ordertype not like 'notebook%'
ORDER BY o.Purchasedate DESC

Case Statement With Multiple Joins

I have two tables, emp and location. I need to fetch the city for every eid in the emp table that has a matching location record, based on the location type.
If a type=2 record exists, we need to fetch the city associated with it.
If there is no type=2 record, we need to fetch the city associated with the type=1 record for the matching eid.
My case statement works fine until an eid has records of both type 1 and type 2. In that case I need to fetch only the type 2 record.
select case when a.type=2 then a.city
When a.type=1 then a.city
Else '0' End As City
From location a
Join emp r
On a.eid=r.eid
emp table
eid ename
1 james
2 mark
3 kristie
4 john
5 allen
location table
city eid type
athens 1 2
melbourne 2 1
london 2 2
newyork 3 1
output:
eid ename city type
1 james athens 2
2 mark london 2
3 kristie newyork 1

I think the most direct way to represent what you're asking for is:
select coalesce(l2.city, l1.city, '0') as city
From emp r
left join location l1
on l1.eid = r.eid
and l1.type=1
left join location l2
on l2.eid = r.eid
and l2.type=2
The subquery-based solution proposed by Jeremy Real may also work, but it assumes that 1 and 2 are the only values in the table for location.type (and I just don't find it as intuitive).
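The COALESCE approach can be run against the question's data to see what it returns. This is a sketch using Python's built-in sqlite3 module rather than the asker's database, with eid and ename added to the select list so the rows are identifiable. Because emp is the left side of both joins, employees without any location row come back with city '0'. The second query adds a filter (an addition for illustration, not part of the answer) to reproduce the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (eid INTEGER, ename TEXT);
CREATE TABLE location (city TEXT, eid INTEGER, type INTEGER);
INSERT INTO emp VALUES (1, 'james'), (2, 'mark'), (3, 'kristie'),
                       (4, 'john'), (5, 'allen');
INSERT INTO location VALUES ('athens', 1, 2), ('melbourne', 2, 1),
                            ('london', 2, 2), ('newyork', 3, 1);
""")

# Prefer the type 2 city, fall back to type 1, then to '0'.
all_emps = conn.execute("""
SELECT r.eid, r.ename, COALESCE(l2.city, l1.city, '0') AS city
FROM emp r
LEFT JOIN location l1 ON l1.eid = r.eid AND l1.type = 1
LEFT JOIN location l2 ON l2.eid = r.eid AND l2.type = 2
ORDER BY r.eid
""").fetchall()

# Same query, restricted to employees that have a type 1 or type 2 row.
matched = conn.execute("""
SELECT r.eid, r.ename, COALESCE(l2.city, l1.city) AS city
FROM emp r
LEFT JOIN location l1 ON l1.eid = r.eid AND l1.type = 1
LEFT JOIN location l2 ON l2.eid = r.eid AND l2.type = 2
WHERE COALESCE(l2.city, l1.city) IS NOT NULL
ORDER BY r.eid
""").fetchall()

print(all_emps)
print(matched)
```

Both queries assume at most one location row per eid and type; with several rows of the same type the joins would multiply the output rows.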

Try this:
select a.eid
,r.ename
,case when a.type=2 then b.city
when a.type=1 then b.city
else '0' End As City
from (
select a.eid, max(a.type) as type
From location a
group by a.eid
) a
inner join location b
on a.eid = b.eid and a.type=b.type
inner join emp r
on b.eid=r.eid

You want to rank your cities. Use ROW_NUMBER to do that:
select e.eid, e.ename, l.city, l.type
from emp e
join
(
select
city, eid, type,
row_number() over (partition by eid order by type desc) as rn
from location
) l on l.eid = e.eid and l.rn = 1;
rn is 1 for the better city per eid (where "better" is the one with the higher type).

INNER JOIN on CTE (Common Table Expression) Without PK

I have a CTE in which I am finding duplicate records matching on 5 columns:
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
I am then selecting Status and TotalCount from that CTE and joining an Enum table to produce readable data
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display, dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.Index
In this scenario, I am able to pull back readable data and use Excel to spit out a graph report of duplicates by Status.
Problem
I need to join the Customer_1 table once again to gather one more column: Stage. Here is how I tried to do it:
;WITH DuplicateCount AS
(
SELECT customerID,
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY customerID, FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.Index
INNER JOIN Customer_1 c1 ON c1.customerID = dc.customerID
Obviously, that didn't work because none of my records will have duplicate PKs.
Is there a way to join a table to my CTE without a PK? Or somehow add a PK to my CTE without grouping by it?
Edit: This is what I am trying to achieve
|FirstName | LastName | Stage | Total Count
| John | Smith | First | 2
| John | Smith | Third | 2
| Alex | Smith | First | 2
| Jane | Smith | Third | 2
| Jane | Smith | First | 2
| Jack | Smith | Second | 2
Then, when reporting on this data:
John Smith has 4 total records. Two in First, two in Third
Alex Smith has 2 total records. Two in First
Jane Smith has 4 total records. Two in First and two in Third
Jack Smith has 2 total records. Two in Second.
When graphing this data, I should be able to see:
First: 6 total.
Second: 2 total.
Third: 4 total.
Ideally, I could then also bring in CreatedDate and begin to gather data-over-time reports for:
How many duplicates per Stage.
How many duplicates per Person.
How many duplicates for specific date ranges, events, etc.

The cardinalities of the two data sets don't match. The first set, with the identified duplicates, is aggregated across a number of customers and no longer identifies any individual customer, so you can't take the separate customer IDs and attribute them back to the aggregated rows.
I think you need to re-frame what you are trying to get out of your data and work backwards. Post an example set of results that you are trying to achieve.
UPDATE:
It seems you want a list of customer/stage groups with counts:
SELECT FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
-- customerID is left out of the grouping: it is unique per row, so grouping by it would leave no group with COUNT(*) > 1
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status, c1.Stage
HAVING COUNT(*) > 1
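To see why the primary key has to stay out of the grouping, here is a small sketch using Python's built-in sqlite3 module. Everything about the data is hypothetical: the rows, the dates of birth, the e-mail addresses, and the reduced CASE expression (only three stages) are made up for illustration, and the column types are guesses. Only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, FirstName TEXT,
                       LastName TEXT, DateofBirth TEXT, Email TEXT);
CREATE TABLE Customer_1 (customerID INTEGER PRIMARY KEY,
                         Status INTEGER, Stage INTEGER);
-- Hypothetical rows: John appears twice in stage 6 and twice in stage 8,
-- Jack twice in stage 7, Ann only once.
INSERT INTO Customer VALUES
  (1, 'John', 'Smith', '1980-01-01', 'john@example.com'),
  (2, 'John', 'Smith', '1980-01-01', 'john@example.com'),
  (3, 'John', 'Smith', '1980-01-01', 'john@example.com'),
  (4, 'John', 'Smith', '1980-01-01', 'john@example.com'),
  (5, 'Jack', 'Smith', '1975-05-05', 'jack@example.com'),
  (6, 'Jack', 'Smith', '1975-05-05', 'jack@example.com'),
  (7, 'Ann',  'Lee',   '1990-09-09', 'ann@example.com');
INSERT INTO Customer_1 VALUES
  (1, 1, 6), (2, 1, 6), (3, 1, 8), (4, 1, 8), (5, 1, 7), (6, 1, 7), (7, 1, 6);
""")

# Grouping by the matching columns plus Stage finds the duplicate groups.
by_stage = conn.execute("""
SELECT c.FirstName, c.LastName,
       CASE c1.Stage WHEN 6 THEN 'First' WHEN 7 THEN 'Second'
                     WHEN 8 THEN 'Third' ELSE '' END AS Stage,
       COUNT(*) AS TotalCount
FROM Customer c
JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY c.FirstName, c.LastName, c.DateofBirth, c.Email, c1.Status, c1.Stage
HAVING COUNT(*) > 1
ORDER BY c.FirstName, c1.Stage
""").fetchall()

# Grouping by the primary key puts every row in its own group.
by_id = conn.execute("""
SELECT c.customerID, COUNT(*) AS TotalCount
FROM Customer c
JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY c.customerID
HAVING COUNT(*) > 1
""").fetchall()

print(by_stage)
print(by_id)
```

With this made-up data the first query yields one row per person and stage, which is the shape of the table in the question's edit, while the second query returns nothing.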

Having trouble finding the right join statement for my query

I am trying to export data from a database and am joining the "customers" table with the "orders" table. It's a one to many relationship where customers can have multiple orders. I'm trying to write a query that returns basic customer info from the customers table - email_address, firstname, lastname, but to also include the date of the last order they placed.
customers as c
- customer_id
- firstname
- lastname
- email_address
orders as o
- orders_id
- customers_id
- purchase_date
I want the result to return a single result for each customer where the purchase date is the last purchase that customer made.
c.firstname, c.lastname, c.email_address, o.purchase_date
What is the correct SQL syntax to make this happen?

select c.*, o.LastOrderDate
from customers c
LEFT JOIN
(select customers_id, max(purchase_date) as LastOrderDate
from orders
group by customers_id) o on o.customers_id = c.customer_id
This returns all customers, with the date of the last order if one exists.

What about:
SELECT c.firstname, c.lastname, c.email_address, MAX(o.purchase_date)
FROM customers AS c
JOIN orders AS o ON o.customers_id = c.customer_id
GROUP BY c.firstname, c.lastname, c.email_address
This only lists customers who have placed at least one order. If you want all customers, then you should be able to use a LEFT JOIN instead of a simple (INNER) JOIN as shown.
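The difference between the INNER JOIN and the LEFT JOIN can be shown with a few rows. The sketch below uses Python's built-in sqlite3 module and the column names from the question; the customers and orders themselves are hypothetical, and dates are stored as ISO text so that MAX picks the latest one. Adding customer_id to the GROUP BY is a choice made here so that two customers with identical names and e-mail addresses would not be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, firstname TEXT,
                        lastname TEXT, email_address TEXT);
CREATE TABLE orders (orders_id INTEGER PRIMARY KEY, customers_id INTEGER,
                     purchase_date TEXT);
-- Hypothetical data: customer 3 has never placed an order.
INSERT INTO customers VALUES
  (1, 'Ada',   'Lovelace', 'ada@example.com'),
  (2, 'Alan',  'Turing',   'alan@example.com'),
  (3, 'Grace', 'Hopper',   'grace@example.com');
INSERT INTO orders VALUES
  (1, 1, '2010-03-01'), (2, 1, '2010-04-07'), (3, 2, '2010-03-26');
""")

# Same statement for both variants; only the join keyword changes.
LAST_ORDER_SQL = """
SELECT c.firstname, c.lastname, c.email_address,
       MAX(o.purchase_date) AS last_purchase
FROM customers AS c
{join} orders AS o ON o.customers_id = c.customer_id
GROUP BY c.customer_id, c.firstname, c.lastname, c.email_address
ORDER BY c.customer_id
"""

inner_rows = conn.execute(LAST_ORDER_SQL.format(join="JOIN")).fetchall()
left_rows = conn.execute(LAST_ORDER_SQL.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # customers with at least one order
print(left_rows)   # all customers; last_purchase is None without orders
```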

This returns all Customers, regardless of whether they have any Orders:
SQL> select c.name
          , c.email_address
          , ( select max (o.order_date) from orders o
              where o.customer_no = c.customer_no ) as last_order
     from customers c
     /
NAME                 EMAIL_ADDRESS             LAST_ORDE
-------------------- ------------------------- ---------
ACME Industries      info@acme.com             07-APR-10
Tyrell Corporation   accounts@tyrellcorp.com   26-MAR-10
Lorax Textiles Co    the.lorax@hotmail.com
It is equivalent to the LEFT OUTER JOIN:
SQL> select c.name
          , c.email_address
          , o.last_order_date
     from customers c
     left join ( select o.customer_no
                      , max (o.order_date) as last_order_date
                 from orders o
                 group by o.customer_no ) o
     on o.customer_no = c.customer_no
     /
NAME                 EMAIL_ADDRESS             LAST_ORDE
-------------------- ------------------------- ---------
ACME Industries      info@acme.com             07-APR-10
Tyrell Corporation   accounts@tyrellcorp.com   26-MAR-10
Lorax Textiles Co    the.lorax@hotmail.com
A RIGHT OUTER JOIN would only return rows for Customers with Orders. Assuming an Order must have a Customer (i.e. enforced foreign key) then that would be the same as an INNER JOIN.
If your flavour of database supports analytic functions then RANK() offers an alternative way of solving it...
SQL> select name
          , email_address
          , order_date
     from (
         select c.name
              , c.email_address
              , o.order_date
              , rank () over (partition by c.customer_no
                              order by o.order_date desc ) as rnk
         from customers c
         join orders o
         on ( o.customer_no = c.customer_no)
     )
     where rnk = 1
     /
NAME                 EMAIL_ADDRESS             ORDER_DAT
-------------------- ------------------------- ---------
ACME Industries      info@acme.com             07-APR-10
Tyrell Corporation   accounts@tyrellcorp.com   26-MAR-10
This also only returns rows for Customers with Orders.
