SQL Server | Select all specific duplicate in columns - sql

My problem is that I need to write a statement that selects all rows that are duplicates according to specific criteria.
For example I got table 1 (users):
Users:
----------------------------------------------
ID  name   lastname  birth       filenumber
1   Max    Lix       2015-02-01  D43-892
2   Chris  Maura     2010-12-25  E33-722
4   Lena   Paul      2005-05-11  S85-458
5   Max    Lix       2019-02-01  D23-992
6   Lena   Paul      2005-05-11  S84-488
7   Lena   Paul      2005-05-11  S75-258
----------------------------------------------
Address (u_ID = ID of Users table):
----------------------------------------------
ID  u_ID  Street   number  zip
1   1     Heystr.  12      4556
2   2     Nostr.   2       8978
3   4     Yesstr.  8a      2545
----------------------------------------------
I need to get all rows where the name, lastname and birth match those of other rows, and also get the address for that person.
The result should look like this:
Result:
----------------------------------------------
name  lastname  birth       filenumber  address
Max   Lix       2015-02-01  D43-892     Heystr. 12 4556
Max   Lix       2019-02-01  D23-992     Heystr. 12 4556
Lena  Paul      2005-05-11  S85-458     Yesstr. 8a 2545
Lena  Paul      2005-05-11  S84-488     Yesstr. 8a 2545
Lena  Paul      2005-05-11  S75-258     Yesstr. 8a 2545
My first idea was to use GROUP BY and HAVING, but that returns only one row per group. I need every single duplicate row that matches on name, lastname and birth.

Use this:
select u.name, u.lastname, u.birth, u.filenumber, concat(a.street, ' ', a.number, ' ', a.zip) address
from users u
left join address a
on a.u_id = u.id
where
exists (
select 1 from users
where users.name = u.name and users.lastname = u.lastname and users.birth = u.birth and users.id <> u.id
)
The condition
users.name = u.name and users.lastname = u.lastname and users.birth = u.birth and users.id <> u.id
finds the duplicates: it matches any other row with the same name, lastname and birth.
Use inner join instead of left join if you only want the duplicates that have a row in address.
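As a quick check of the EXISTS condition, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for SQL Server so that the example is self-contained (an assumption of this sketch), so `||` replaces concat, and the inner alias d is introduced for illustration. With the rows exactly as posted, only the three Lena Paul rows share name, lastname and birth, because the two Max Lix rows have different birth dates.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, lastname TEXT,
                    birth TEXT, filenumber TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, u_id INTEGER, street TEXT,
                      number TEXT, zip TEXT);
INSERT INTO users VALUES
    (1, 'Max',   'Lix',   '2015-02-01', 'D43-892'),
    (2, 'Chris', 'Maura', '2010-12-25', 'E33-722'),
    (4, 'Lena',  'Paul',  '2005-05-11', 'S85-458'),
    (5, 'Max',   'Lix',   '2019-02-01', 'D23-992'),
    (6, 'Lena',  'Paul',  '2005-05-11', 'S84-488'),
    (7, 'Lena',  'Paul',  '2005-05-11', 'S75-258');
INSERT INTO address VALUES
    (1, 1, 'Heystr.', '12', '4556'),
    (2, 2, 'Nostr.',  '2',  '8978'),
    (3, 4, 'Yesstr.', '8a', '2545');
""")

rows = conn.execute("""
SELECT u.name, u.lastname, u.birth, u.filenumber,
       a.street || ' ' || a.number || ' ' || a.zip AS address
FROM users u
LEFT JOIN address a ON a.u_id = u.id
WHERE EXISTS (              -- some other row with the same three values
    SELECT 1 FROM users d
    WHERE d.name = u.name AND d.lastname = u.lastname
      AND d.birth = u.birth AND d.id <> u.id
)
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

Users 6 and 7 have no row in address, so with the left join their address comes back as NULL (None in Python). Switching to an inner join would drop those two rows.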

SELECT name, lastname, birth, filenumber, concat(street,' ' , number, ' ', zip) as address
FROM Users A, Address
WHERE u_id = A.id
AND (SELECT COUNT(1)
FROM Users B
WHERE A.name = B.name
AND A.lastname = B.lastname
AND A.birth = B.birth) > 1

with duplicates as ( -- this CTE lists the name, lastname, birth combinations that occur more than once
select name, lastname, birth
from users
group by name, lastname, birth
having count(*) >= 2
)
select concat(a.street, ' ', a.number, ' ', a.zip) as address,
u.name, u.lastname, u.birth, u.filenumber
from duplicates d -- gather the data
join users u on u.name = d.name and u.lastname = d.lastname and u.birth = d.birth
left join address a on a.u_id = u.id
This returns a report of all the people who share their name and birthday with someone else.

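Please try the query below:
select name, lastname, birth, filenumber, address
from (
select U.name, U.lastname, U.birth, U.filenumber, concat(A.street,' ',A.number,' ',A.zip) as address,
count(*) over (partition by U.name, U.lastname, U.birth) as cnt
from Users U left join Address A on U.ID = A.u_Id
) t
where cnt > 1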

Related

Aggregate Functions To Pull More Record Field Data

I would like to know what would be the best way to get the data from a specific row when I use a Group By query. The real query is more complex than the example I'm providing here so I'm looking for something other than a sub-select on the Sales table.
I'm using MSSQL 2008 and I would like something that allows me to get the date field from the Sales record that has the max(amount).
Query
select uid, firstName, lastName, AmountFromTagetedRow, DateFromTargetedRow
from users u inner join
sales s on u.uid = s.custID
group by uid, firstName, lastName
order by uid
USERS
uid firstName lastName
1 Sam Smith
2 Joe Brown
3 Kim Young
SALES
sid Custid date amount ...
1 1 2016-01-02 100
2 3 2016-01-12 485
3 1 2016-01-22 152
4 2 2016-02-01 156
5 1 2016-02-02 12
6 1 2016-03-05 84
7 2 2016-03-10 68
RESULTS
uid firstName LastName amount date
1 Sam Smith 152 2016-01-22
2 Joe Brown 156 2016-02-01
3 Kim Young 485 2016-01-12
The query you posted doesn't match your sample tables, but something like this should point you in the right direction.
with SortedResults as
(
select u.uid
, u.firstName
, u.lastName
, s.amount
, s.[date]
-- number each user's sales from the largest amount down
, ROW_NUMBER() over (partition by u.uid order by s.amount desc) as RowNum
from users u inner join
sales s on u.uid = s.custID
)
select uid, firstName, lastName, amount, [date]
from SortedResults
where RowNum = 1
order by uid
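To see the ROW_NUMBER idea against the sample data, here is a minimal sketch using Python's sqlite3 module with an in-memory database. SQLite is only a stand-in for SQL Server here (an assumption of this sketch; window functions need SQLite 3.25 or later), and the placeholder columns from the question are replaced with the real amount and date columns. The alias sale_date is introduced for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE sales (sid INTEGER PRIMARY KEY, custID INTEGER,
                    "date" TEXT, amount INTEGER);
INSERT INTO users VALUES (1, 'Sam', 'Smith'), (2, 'Joe', 'Brown'), (3, 'Kim', 'Young');
INSERT INTO sales VALUES
    (1, 1, '2016-01-02', 100),
    (2, 3, '2016-01-12', 485),
    (3, 1, '2016-01-22', 152),
    (4, 2, '2016-02-01', 156),
    (5, 1, '2016-02-02', 12),
    (6, 1, '2016-03-05', 84),
    (7, 2, '2016-03-10', 68);
""")

rows = conn.execute("""
WITH SortedResults AS (
    SELECT u.uid, u.firstName, u.lastName, s.amount, s."date" AS sale_date,
           -- rank each user's sales from the largest amount down
           ROW_NUMBER() OVER (PARTITION BY u.uid ORDER BY s.amount DESC) AS RowNum
    FROM users u
    JOIN sales s ON u.uid = s.custID
)
SELECT uid, firstName, lastName, amount, sale_date
FROM SortedResults
WHERE RowNum = 1          -- keep only the top-ranked sale per user
ORDER BY uid
""").fetchall()

for row in rows:
    print(row)
```

No GROUP BY is needed: the window function ranks the joined rows, and the outer filter keeps one complete sales row per user, so any other column of that row can be selected alongside the amount.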

Pulling Datetime when using COUNT(UserID)

I have the tables Users
UserID  FirstName  LastName  Email
------  ---------  --------  -----
1       Fred       Smith     fs@abc.com
2       Bob        Hill      bh@abc.com
3       Jane       Doe       jd@abc.com
and LoginSession
LoginSessionID  UserID  StartDate
--------------  ------  ---------
1               1       2014-11-23 08:37:14.836
2               1       2014-11-25 11:13:53.225
3               2       2014-12-01 03:15:33.846
4               1       2014-12-01 17:34:19.036
5               3       2014-12-05 12:55:01.998
6               1       2014-12-14 17:20:14.636
7               3       2014-12-15 10:02:17.376
What I am trying to do is find the users who have logged on only once and find out when that was.
I have managed to find the users who have logged on only once by using
SELECT
U.FirstName, U.LastName, COUNT(L.UserID) AS Visits
FROM
LoginSession L
JOIN
Users U ON U.UserID = L.UserID
GROUP BY
U.FirstName, U.LastName
HAVING
COUNT(L.UserID) = 1
But I also want to pull through the L.StartDate of those users. If I add it to the select query I get an error because it's not contained in an aggregate function or GROUP BY clause. If I add it to the GROUP BY line (to avoid that error) I get each and every login handily marked as 1 visit!
I also tried using a subquery but I got an error because it returned more than one result.
I really am totally stumped!
You can do this with aggregation:
select UserId, min(StartDate) as StartDate
from LoginSession ls
group by UserId
having count(*) = 1;
The min() returns the value you want, because there is only one row in each group that passes the having filter. You can use an additional join to get more information about the users.
select u.*, lsu.StartDate
from Users u join
(select UserId, min(StartDate) as StartDate
from LoginSession ls
group by UserId
having count(*) = 1
) lsu
on lsu.UserId = u.UserId;
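The aggregation trick can be checked against the sample data with a minimal sketch in Python's sqlite3 module. The in-memory SQLite database is only a stand-in for the real server (an assumption of this sketch), and the Email column is left out because the query does not need it.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE LoginSession (LoginSessionID INTEGER PRIMARY KEY,
                           UserID INTEGER, StartDate TEXT);
INSERT INTO Users VALUES (1, 'Fred', 'Smith'), (2, 'Bob', 'Hill'), (3, 'Jane', 'Doe');
INSERT INTO LoginSession VALUES
    (1, 1, '2014-11-23 08:37:14.836'),
    (2, 1, '2014-11-25 11:13:53.225'),
    (3, 2, '2014-12-01 03:15:33.846'),
    (4, 1, '2014-12-01 17:34:19.036'),
    (5, 3, '2014-12-05 12:55:01.998'),
    (6, 1, '2014-12-14 17:20:14.636'),
    (7, 3, '2014-12-15 10:02:17.376');
""")

rows = conn.execute("""
SELECT u.FirstName, u.LastName, lsu.StartDate
FROM Users u
JOIN (
    -- groups with exactly one row: MIN() simply returns that row's date
    SELECT UserID, MIN(StartDate) AS StartDate
    FROM LoginSession
    GROUP BY UserID
    HAVING COUNT(*) = 1
) lsu ON lsu.UserID = u.UserID
""").fetchall()

print(rows)
```

Grouping by UserID rather than by name also avoids merging two different users who happen to share a first and last name.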
You can use windowed version of COUNT:
SELECT FirstName, LastName, StartDate
FROM (
SELECT U.FirstName, U.LastName, L.StartDate,
COUNT(*) OVER (PARTITION BY U.UserID) AS cnt
FROM LoginSession L
JOIN Users U ON U.UserID = L.UserID ) AS t
WHERE t.cnt = 1
COUNT with OVER clause will return the number of records per U.UserID. Using an outer query you can fetch exactly these records.

SQL Server - select min date and id from foreign key

These are my tables:
USER:
id_user  name      email          last_access  id_company
1        jhonatan  abc@abc.com    2014-12-15   1
2        cesar     cef@cef.com    2014-12-31   1
3        john      123@123.com    2015-01-09   2
4        steven    897@asdd.cpom  2015-01-02   2
5        greg      sd@touch.com   2014-12-07   1
6        kyle      fb@fb.com      2014-11-20   1
COMPANY:
id_company  company
1           Facebook
2           Appslovers
I need to find the user with the MIN last_access in each company (just one per company). The result could look like this:
id_user name last_access company
6 kyle 2014-11-20 Facebook
4 steven 2015-01-02 Appslovers
Is it possible?
Use a window function:
SELECT id_user,
NAME,
last_access,
company
FROM (SELECT id_user,
NAME,
last_access,
company,
Row_number()OVER(partition BY company ORDER BY last_access) rn
FROM users u
JOIN company c
ON u.id_company = c.id_company) a
WHERE rn = 1
Or join the two tables, find the minimum last_access date per company, then join that result back to the users table:
SELECT id_user,
NAME,
a.last_access,
a.company
FROM users u
JOIN(SELECT u.id_company,
Min(last_access) last_access,
company
FROM users u
JOIN company c
ON u.id_company = c.id_company
GROUP BY u.id_company,
company) a
ON a.id_company = u.id_company
AND u.last_access = a.last_access
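The join-back approach can be tried on the sample data with a minimal sketch in Python's sqlite3 module. The in-memory SQLite database is only a stand-in for SQL Server (an assumption of this sketch), the email column is omitted because the query does not use it, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id_company INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE users (id_user INTEGER PRIMARY KEY, name TEXT,
                    last_access TEXT, id_company INTEGER);
INSERT INTO company VALUES (1, 'Facebook'), (2, 'Appslovers');
INSERT INTO users VALUES
    (1, 'jhonatan', '2014-12-15', 1),
    (2, 'cesar',    '2014-12-31', 1),
    (3, 'john',     '2015-01-09', 2),
    (4, 'steven',   '2015-01-02', 2),
    (5, 'greg',     '2014-12-07', 1),
    (6, 'kyle',     '2014-11-20', 1);
""")

rows = conn.execute("""
SELECT u.id_user, u.name, a.last_access, a.company
FROM users u
JOIN (
    -- earliest last_access per company
    SELECT u2.id_company, MIN(u2.last_access) AS last_access, c.company
    FROM users u2
    JOIN company c ON u2.id_company = c.id_company
    GROUP BY u2.id_company, c.company
) a ON a.id_company = u.id_company
   AND u.last_access = a.last_access   -- join back to find whose date it is
ORDER BY a.company
""").fetchall()

for row in rows:
    print(row)
```

One difference between the two approaches: if two users in a company share the same earliest last_access, the join-back query returns both of them, while a ROW_NUMBER filter returns exactly one row per company.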
This can be done in many ways, for example by using a window function like row_number to partition the data and then selecting the top rows from each group like this:
;with cte (id_user, name, last_access, company, seq) as (
select
id_user,
name,
last_access,
company,
seq = row_number() over (partition by u.id_company order by last_access)
from [user] u
inner join [company] c on u.id_company = c.id_company
)
select id_user, name, last_access, company
from cte where seq = 1

INNER JOIN on CTE (Common Table Expression) Without PK

I have a CTE in which I am finding duplicate records matching on 5 columns:
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
I am then selecting Status and TotalCount from that CTE and joining an Enum table to produce readable data
;WITH DuplicateCount AS
(
SELECT
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display, dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.Index
In this scenario, I am able to pull back readable data and use Excel to spit out a graph report of duplicates by Status.
Problem
I need to join the Customer_1 table once again to gather one more column: Stage. Here is how I tried to do it:
;WITH DuplicateCount AS
(
SELECT customerID,
FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY customerID, FirstName, LastName, DateofBirth, Email, c1.Status
HAVING COUNT(*) > 1
)
SELECT e.Display,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
dc.TotalCount
FROM DuplicateCount dc
INNER JOIN Enum e ON dc.Status = e.Index
INNER JOIN Customer_1 c1 ON c1.customerID = dc.customerID
Obviously, that didn't work because none of my records will have duplicate PKs.
Is there a way to join a table to my CTE without a PK? Or somehow add a PK to my CTE without grouping by it?
Edit: This is what I am trying to achieve
|FirstName | LastName | Stage | Total Count
| John | Smith | First | 2
| John | Smith | Third | 2
| Alex | Smith | First | 2
| Jane | Smith | Third | 2
| Jane | Smith | First | 2
| Jack | Smith | Second | 2
Then, when reporting on this data:
John Smith has 4 total records. Two in First, two in Third
Alex Smith has 2 total records. Two in First
Jane Smith has 4 total records. Two in First and two in Third
Jack Smith has 2 total records. Two in Second.
When graphing this data, I should be able to see:
First: 6 total.
Second: 2 total.
Third: 4 total.
Ideally, I could then also bring in CreatedDate and begin to gather data-over-time reports for:
How many duplicates per Stage.
How many duplicates per Person.
How many duplicates for specific date ranges, events, etc.
The cardinalities of the two sets of data don't match. The first set, with the identified duplicates, is aggregated across a number of customers and no longer identifies any individual customer. You can't then take the separate customer IDs and attribute them back to the aggregated rows.
I think what you need to do is re-frame what you are trying to get out of your data and work backwards. Post an example set of results that you are trying to achieve.
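The cardinality point can be illustrated with a small sketch in Python's sqlite3 module. Everything here is hypothetical: the three Customer rows are made up for illustration, SQLite stands in for SQL Server, and the windowed COUNT in the second query is a different technique from the one in the question (it needs SQLite 3.25 or later). The grouped query returns one row per duplicate group with no customerID left to join on, while the windowed query keeps one row per customer.

```python
import sqlite3

# Hypothetical data, in-memory SQLite standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
INSERT INTO Customer VALUES (1, 'John', 'Smith'), (2, 'John', 'Smith'), (3, 'Alex', 'Smith');
""")

# Aggregated: one row per duplicate group, individual customer IDs are gone.
grouped = conn.execute("""
SELECT FirstName, LastName, COUNT(*) AS TotalCount
FROM Customer
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
""").fetchall()

# Windowed: every duplicate row survives, together with its customerID.
per_row = conn.execute("""
SELECT customerID, FirstName, LastName, cnt
FROM (
    SELECT customerID, FirstName, LastName,
           COUNT(*) OVER (PARTITION BY FirstName, LastName) AS cnt
    FROM Customer
) t
WHERE cnt > 1
ORDER BY customerID
""").fetchall()

print(grouped)
print(per_row)
```

With the per-row form, a further join on customerID is possible because each result row still belongs to exactly one customer.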
UPDATE:
It seems you want a list of customer\stage groups with counts?:
SELECT FirstName,
LastName,
DateofBirth,
Email,
c1.Status,
CASE
WHEN c1.Stage = 6 THEN 'First'
WHEN c1.Stage = 7 THEN 'Second'
WHEN c1.Stage = 8 THEN 'Third'
WHEN c1.Stage = 11 THEN 'Fourth'
WHEN c1.Stage = 9 THEN 'Fifth'
WHEN c1.Stage = 10 THEN 'Sixth'
WHEN c1.Stage = 12 THEN 'Unknown'
ELSE ''
END AS Stage,
Count(*) AS TotalCount
FROM Customer c
INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status, c1.Stage
HAVING COUNT(*) > 1

MySql Join a View Table as a Boolean

I have a users table, and a view table which lists some user ids... They look something like this:
Users:
User_ID | Name | Age | ...
555 John Doe 35
556 Jane Doe 24
557 John Smith 18
View_Table
User_ID
555
557
Now, when I do run a query to retrieve a user:
SELECT User_ID,Name,Age FROM Users WHERE User_ID = 555
SELECT User_ID,Name,Age FROM Users WHERE User_ID = 556
I also would like to select a boolean, stating whether or not the user I'm retrieving is present in the View_Table.
Result:
User_ID Name Age In_View
555 John Doe 35 1
556 Jane Doe 24 0
Any help would be greatly appreciated. Efficiency is a huge plus. Thanks!!
SELECT Users.User_ID,Name,Age, View_Table.User_ID IS NOT NULL AS In_View
FROM Users
LEFT JOIN View_table USING (User_ID)
WHERE User_ID = 555
SELECT
u.User_ID, Name, Age,
CASE WHEN v.User_ID is not null THEN 1 ELSE 0 END AS In_View
FROM Users u
LEFT JOIN View_Table v on u.User_ID = v.User_ID
WHERE u.User_ID ...;
I would do a LEFT JOIN. So long as you have key/index for User_ID, it should be very efficient.
SELECT User_ID,Name,Age, IF(View_Table.User_ID IS NOT NULL, 1, 0) AS In_View
FROM Users LEFT JOIN View_Table USING(User_ID)
WHERE User_ID = 555
I know this is an "Old" question but just happened upon this and none of these answers seemed to be that good. So I thought I would throw in my 2 cents
SELECT
u.User_ID,
u.Name,
u.Age,
COALESCE((SELECT 1 FROM View_Table AS v WHERE v.User_ID = u.User_ID ), 0) AS In_View
FROM
Users AS u
WHERE
u.User_ID = 555
Simply select 1 with a correlated subquery, which yields NULL when there is no match. To get the 0, use the handy function COALESCE, which returns the first non-null value from left to right.
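Here is a minimal sketch of the COALESCE approach on the sample data, using Python's sqlite3 module. The in-memory SQLite database is only a stand-in for MySQL (an assumption of this sketch), and the helper fetch_user is a hypothetical name introduced for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (User_ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE View_Table (User_ID INTEGER);
INSERT INTO Users VALUES (555, 'John Doe', 35), (556, 'Jane Doe', 24), (557, 'John Smith', 18);
INSERT INTO View_Table VALUES (555), (557);
""")


def fetch_user(user_id):
    """Return (User_ID, Name, Age, In_View) for one user (hypothetical helper)."""
    return conn.execute("""
        SELECT u.User_ID, u.Name, u.Age,
               -- the subquery yields 1 on a match and NULL otherwise;
               -- COALESCE turns that NULL into 0
               COALESCE((SELECT 1 FROM View_Table AS v
                         WHERE v.User_ID = u.User_ID), 0) AS In_View
        FROM Users AS u
        WHERE u.User_ID = ?
    """, (user_id,)).fetchone()


in_view_row = fetch_user(555)
not_in_view_row = fetch_user(556)
print(in_view_row)
print(not_in_view_row)
```

In MySQL a scalar subquery must return at most one row, so this form assumes View_Table lists each User_ID no more than once.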