Can we use join with in same table while using group by function? - sql

For instance, I have a table with columns below:
pk_id,address,first_name,last_name
and I have a query like this to display the first name ans last name that are repetitive(duplicates)
select first_name,last_name
from table
group by first_name,last_name
having count(*)>1;
but the above query just returns first and last names but I want to display pk_id and address too that are tied to these duplicate first and last names
Can we use joins to do this on the same table.Please help!!

A simple way of doing is to build a view with the pk_id and the count of duplicates. Once you have it, it is only a matter of using a JOIN on the base table, and a filter to only keep rows having a duplicate:
SELECT T.*
FROM T
JOIN (SELECT "pk_id",
COUNT(*) OVER(PARTITION BY "first_name", "last_name") cnt
FROM T) V
ON T."pk_id" = V."pk_id"
WHERE cnt > 1
See http://sqlfiddle.com/#!4/3ecd0/9

You have to call it from an outer query, like this:
select * from table
where first_name||last_name in
(select first_name||last_name from
(select first_name, last_name, count( * )
from table
group by first_name,last_name
having count( * ) > 1
)
)
note: you may not need to concatenate the 2 fields, but I haven't tested thaT.

with
my_duplicates as
(
select
first_name,
last_name
from
my_table
group by
first_name,
last_name
having
count(*) > 1
)
select
bb.pk_id,
bb.address,
bb.first_name,
bb.last_name
from
my_duplicates aa
join my_table bb on
(
aa.first_name = bb.first_name
and
aa.last_name = bb.last_name
)
order by
bb.last_name,
bb.first_name,
bb.pk_id

Related

solution for writing query

I have a match table with winningteamid and stadiumid as attributes,
I need to retrieve the winningteamid which has won all its games in the same stadium.
I tried this, and I'm getting additional unwanted rows:
select winningteamid
from match
group by winningteamid
having count(winningteamid) in (
select count(*) from match group by (winningteamid,stadiumid)
Try this please (please write which RDBMS you are using):
with cte as (
select winningteamid, stadiumid, count(stadiumid) over (partition by winningteamid) as count
from match
group by winningteamid, stadiumid
)
select * from cte where count = 1;
You should use this:
SELECT MAX(winningteamid)
FROM (
SELECT DISTINCT winningteamid, stadiumid
FROM match
)
GROUP BY stadiumid
HAVING COUNT(*) = 1;
I suppose it's as easy as using HAVING to verify only one distinct stadiumid:
select winningteamid
from match
group by winningteamid
having count(distinct stadiumid) = 1

Select a NON-DISTINCT column in a query that return distincts rows

The following query returns the results that I need but I have to add the ID of the row to then update it. If I add the ID directly in the select statement it will return me more results then I need because each ID is unique so the DISTINCT statement see the line as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID in the Select statement while keeping DISTINCT values for MemberID,ProductID and UserID.
Any Ideas ?
Thank you
According to you comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of thoses records. It's not important which one, so this is why I use DISTINC to obtain unique combinaison of given employee with a given product and a given client.
You can use MIN() or MAX() and GROUP BY instead of DISTINCT
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
UPDATE:
From you comments I think the below query is what you need
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM #UserCustomerProductSalaryExceptions
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
HAVING COUNT(ucpse.ID) >= 2
)
If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum<br
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
ID's not necessary - try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId
From the sound of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. It sounds as if each duplicate is important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

Need to select ALL columns while using COUNT/Group By

Ok so I have a table in which ONE of the columns have a FEW REPEATING records.
My task is to select the REPEATING records with all attributes.
CustID FN LN DOB City State
the DOB has some repeating values which I need to select from the whole table and list all columns of all records that are same within the DOB field..
My try...
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
This only returns two columns and one row 1st column is the DOB column that occurs more than once and the 2nd column gives count on how many.
I need to figure out a way to list all attributes not just these two...
Please guide me in the right direction.
I think a more general solution is to use windows functions:
select *
from (select *, count(*) over (partition by dob) as NumDOB
from table
) t
where numDOB > 1
The reason this is more general is because it is easy to change to duplicates across two or more columns.
Select *
FROM Table1 T
WHERE T.DOB IN( Select I.DOB
FROM Table1 I
GROUP BY I.DOB
HAVING COUNT(I.DOB) > 1)
Try joining with a subquery, which will also allow you to see the count
select t.*, a.SameDOB from Table1 t
join (
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
) a on a.dob = t.dob
select *
from table1, (select count(*) from table1) as cnt

How do I create a SQL Distinct query and add some additional fields

I have the following query that selects combinations of first and last names and show me dupes. It works, not problems here.
I want to include three other fields for reference; Id, cUser, and cDate. These additional fields, however, should not be used to determine duplicates as I'd likely not end up with any duplicates.
SELECT * FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X
WHERE COUNT > 1
ORDER BY COUNT DESC
Any suggestions? Thanks!
SELECT *
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY FirstName, LastName) AS cnt
FROM Contacts
WHERE ContactTypeId = 1
) q
WHERE cnt > 1
ORDER BY
cnt DESC
This will return all fields for each of the duplicated records.
If these fields are always the same then you can include them in GROUP BY and it will not affect the detection of duplicates
If they are not then you must decide what kind of aggregate function you will apply on them, for example MAX() or MIN() would work and would give you some indication of which values are associated with some of the attributes for the duplicates.
Otherwise, if you want to see all of the records you can join back to the source
SELECT X2.* FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X INNER JOIN Contact X2 ON X.LastName = X2.LastName AND X.FirstName = X2.FirstName
WHERE COUNT > 1
ORDER BY COUNT DESC

SQL query, distinct rows needed

I have the following table structured like this:
So basically as you can see, the department goes through name changes every couple of years. Look at number 16 for example. I want a select query that will only get the name when the date is the greatest. How do I do that?
select ID, Name from departments o
where o.thedate=
(select max(i.thedate) from departments i where o.id=i.id)
SELECT ID,
First(Name) AS FirstOfName, First(DateChange) AS FirstOfDateChange
FROM departments
GROUP BY ID
ORDER BY First(DateChange) DESC;
What is the primary key for this table? This does a subquery the same table with a name comparison.
SELECT
id,
name,
date
FROM table
WHERE name = (SELECT TOP 1 name
FROM table AS subtable
WHERE subtable.name = table.name
ORDER BY date DESC)
SELECT d.*
FROM Departments d
INNER JOIN (SELECT pk
FROM Departments
GROUP BY ID
HAVING theDate=MAX(theDate)) m ON m.pk=d.pk
WHERE [Name]="Department"