Identifying duplicate records in SQL along with primary key - sql

I have a business case where I need to look up our SQL "Users" table to find email addresses that are duplicated. I was able to do that with the query below:
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
ORDER BY
DuplicateEmails DESC
I get an output like this:
user_email DuplicateEmails
--------------------------------
abc@gmail.com 2
xyz@yahoo.com 3
Now I am asked to list each duplicate record on a row of its own and display some additional properties such as first name, last name, and userID. All of this information is stored in the same "Users" table. I am having difficulty doing so. Can anyone help me or point me in the right direction?
My output needs to look like this:
user_email DuplicateEmails FirstName LastName UserID
------------------------------------------------------------------------------
abc@gmail.com 2 Tim Lentil timLentil
abc@gmail.com 2 John Doe johnDoe12
xyz@yahoo.com 3 brian boss brianTheBoss
xyz@yahoo.com 3 Thomas Hood tHood
xyz@yahoo.com 3 Mark Brown MBrown12

There are several ways you could do this. Here is one using a CTE.
with FoundDuplicates as
(
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
)
select fd.user_email
, fd.DuplicateEmails
, u.FirstName
, u.LastName
, u.UserID
from Users u
join FoundDuplicates fd on fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC

Use COUNT() OVER (PARTITION BY ...).

You can solve it like:
DECLARE @T TABLE
(
UserID VARCHAR(20),
FirstName NVARCHAR(45),
LastName NVARCHAR(45),
UserMail VARCHAR(45)
);
INSERT INTO @T (UserMail, FirstName, LastName, UserID) VALUES
('abc@gmail.com', 'Tim', 'Lentil', 'timLentil'),
('abc@gmail.com', 'John', 'Doe', 'johnDoe12'),
('xyz@yahoo.com', 'brian', 'boss', 'brianTheBoss'),
('xyz@yahoo.com', 'Thomas', 'Hood', 'tHood'),
('xyz@yahoo.com', 'Mark', 'Brown', 'MBrown12');
SELECT *, COUNT (1) OVER (PARTITION BY UserMail) MailCount
FROM @T;
Results:
+--------------+-----------+----------+---------------+-----------+
| UserID | FirstName | LastName | UserMail | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil | Tim | Lentil | abc@gmail.com | 2 |
| johnDoe12 | John | Doe | abc@gmail.com | 2 |
| brianTheBoss | brian | boss | xyz@yahoo.com | 3 |
| tHood | Thomas | Hood | xyz@yahoo.com | 3 |
| MBrown12 | Mark | Brown | xyz@yahoo.com | 3 |
+--------------+-----------+----------+---------------+-----------+

Use a window function like this:
SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) as numDuplicateEmails
FROM Users u
) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;
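
To try the window-function approach without a SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later); the table layout follows the question, and the extra non-duplicate row is made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server here; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID TEXT, FirstName TEXT, LastName TEXT, user_email TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        ("timLentil", "Tim", "Lentil", "abc@gmail.com"),
        ("johnDoe12", "John", "Doe", "abc@gmail.com"),
        ("brianTheBoss", "brian", "boss", "xyz@yahoo.com"),
        ("tHood", "Thomas", "Hood", "xyz@yahoo.com"),
        ("MBrown12", "Mark", "Brown", "xyz@yahoo.com"),
        ("solo1", "Sam", "Solo", "unique@example.com"),  # hypothetical non-duplicate
    ],
)

# Count rows per email without collapsing them, then keep only emails seen more than once.
duplicate_rows = conn.execute(
    """
    SELECT user_email, numDuplicateEmails, FirstName, LastName, UserID
    FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) AS numDuplicateEmails
          FROM Users u) AS counted
    WHERE numDuplicateEmails > 1
    ORDER BY numDuplicateEmails DESC, user_email, UserID
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

Every row of a duplicated email comes back with its count attached, and the row with the unique email is filtered out.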

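I think this will also work. Note that ROW_NUMBER() numbers the rows within each email, so the filter returns only the second and later copies, not the first row of each group.
WITH cte AS (
SELECT
*
,DuplicateEmails = ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY user_email)
FROM Users
)
SELECT * FROM cte
WHERE DuplicateEmails > 1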
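
A quick way to see what a ROW_NUMBER() filter returns is to run it on a small sample. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; it needs SQLite 3.25 or later), with made-up UserID values and a deterministic ORDER BY inside the window.

```python
import sqlite3

# Assumption: SQLite 3.25+ is used in place of SQL Server; UserID values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID TEXT, user_email TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [
        ("a1", "abc@gmail.com"),
        ("a2", "abc@gmail.com"),
        ("x1", "xyz@yahoo.com"),
        ("x2", "xyz@yahoo.com"),
        ("x3", "xyz@yahoo.com"),
    ],
)

# Rows are numbered 1, 2, 3... inside each email group; rn > 1 skips the first of each group.
extra_copies = conn.execute(
    """
    WITH cte AS (
        SELECT UserID, user_email,
               ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY UserID) AS rn
        FROM Users
    )
    SELECT UserID FROM cte WHERE rn > 1 ORDER BY UserID
    """
).fetchall()

print(extra_copies)  # [('a2',), ('x2',), ('x3',)]
```

Three of the five rows are returned, which suits a "delete the extra copies" task better than a "list every duplicate" report.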

Related

Bigquery: Group query results in arrays

I have a table that lists friends from a particular user:
user_id | friend_name
1 | JOEL
1 | JACK
2 | MARIA
I want to have them grouped by user_id and each row has an array with all the friends.
How can I make a selection that would do this transformation?
UPDATE:
select user_id, array_agg(friend_name) as friends
from your_table
group by user_id
Works fine.
However I forgot a small detail, the table has another column.
user_id | friend_name | friends_age
1 | JOEL | 21
1 | JACK | 30
2 | MARIA | 25
My solution was to add another array_agg:
select user_id, array_agg(friend_name) as friends, array_agg(friends_age) as ages
from your_table
group by user_id
I believe it works; the only problem is when the age is NULL. In that case, I need to add a CASE WHEN clause:
select user_id, array_agg(friend_name) as friends,
array_agg(CASE WHEN friends_age IS NULL THEN 0 ELSE friends_age END) as ages
from your_table
group by user_id
select user_id, array_agg(friend_name) as friends
from your_table
group by user_id
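
For readers who want to check the expected shape of the result outside BigQuery, the grouping can be sketched in plain Python. This is only an illustration of what the aggregation produces, not BigQuery code; the row with a missing age is invented to show the NULL-to-0 substitution.

```python
from collections import defaultdict

# Sample rows as (user_id, friend_name, friends_age); the None age is hypothetical.
rows = [(1, "JOEL", 21), (1, "JACK", 30), (2, "MARIA", None)]

# One entry per user_id, holding two parallel lists (names and ages).
friends_by_user = defaultdict(lambda: {"friends": [], "ages": []})
for user_id, friend_name, friends_age in rows:
    entry = friends_by_user[user_id]
    entry["friends"].append(friend_name)
    # Mirrors CASE WHEN friends_age IS NULL THEN 0 ELSE friends_age END
    entry["ages"].append(0 if friends_age is None else friends_age)

grouped = dict(friends_by_user)
print(grouped)
```

Keeping the two lists parallel means the name at position i always belongs to the age at position i, which is what the pair of array_agg calls relies on as well.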

sql GROUP within group

Let's say I have this database:
ID | Name | City
1 | John | TLV
2 | Abe | JLM
3 | John | JLM
I want to know how many people with different names are in each city.
I tried to use GROUP BY like this:
SELECT `city`, count(`index`) as `num` FROM `people`
GROUP BY `city`, `name`
But this seems to group by both.
City | num
TLV | 1
JLM | 1
What I want to do is to group by city, and group the results by name.
City | num
TLV | 1
JLM | 2
How can I do this?
I think you want this:
SELECT `city`, count(distinct name) as `num`
FROM `people`
GROUP BY `city`;
You might want just count(name); I'm not sure what you mean by "differently named". count(name) is preferable if you don't need the distinct.
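
The difference between the two counts only shows up when a name repeats within a city. Here is a small sketch using Python's sqlite3 module (an assumption; the question itself uses MySQL), with one extra row added to the sample data so that JLM contains the same name twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "John", "TLV"),
        (2, "Abe", "JLM"),
        (3, "John", "JLM"),
        (4, "John", "JLM"),  # hypothetical extra row: a second John in JLM
    ],
)

# Group by city only; COUNT(DISTINCT name) ignores repeated names, COUNT(name) does not.
counts = conn.execute(
    """
    SELECT city, COUNT(DISTINCT name) AS distinct_names, COUNT(name) AS all_names
    FROM people
    GROUP BY city
    ORDER BY city
    """
).fetchall()

print(counts)  # [('JLM', 2, 3), ('TLV', 1, 1)]
```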

How to get the previous row-text

First of all: I am a SQL beginner and I use SQL Server 2008.
The query as it is now is written as:
SELECT
Transaction.description, Person.name
FROM
Transaction, Person, SystemUser
WHERE
Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID
ORDER BY
Transaction.description
where personnumber is the PK, an nvarchar (for example N0890) whose numeric part grows by 1 for every new person.
art_ID (Transaction) is PK smallint, art_ID (SystemUser) is smallint, description is nvarchar.
I want to get the text from the previous row, in the same column, so that I can blank out repeated text and make the result table look simpler.
Example as it is now:
|Transactions | Persons |
|-------------------|----------|
|Statistic | Ursula |
|Statistic | Peter |
|Statistic | Alan |
|Settlement | Christie |
|Settlement | Tania |
|Deptor department | Jack |
|Economy department | Rickie |
|Economy department | Annie |
|Economy department | Tom |
|Economy department | Seth |
How I want it to be:
|Transactions | Persons |
|-------------------|----------|
|Statistic | Ursula |
| | Peter |
| | Alan |
|Settlement | Christie |
| | Tania |
|Deptor department | Jack |
|Economy department | Rickie |
| | Annie |
| | Tom |
| | Seth |
Something like: select case when description = description of the previous row then ''
I have searched for examples, but every one of them is based on integers, not varchar/nvarchar, and I keep getting errors when I try to do it with varchars, for example with a CTE, min() and max().
Do you have any ideas of what function I can use or how to set up the select-statement to do as I want?
First use a rank function to identify just one of them:
SELECT Transaction.description, Person.name,
RANK() OVER (PARTITION BY Transaction.description ORDER BY Person.name) As R
FROM Transaction, Person, SystemUser
WHERE Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID
ORDER BY Transaction.description, Person.name
Notice that the lines you want to see have 1 against them? Use that:
SELECT
CASE WHEN R=1 THEN Subtable.description ELSE '' END description,
Subtable.name
FROM
(
SELECT Transaction.description, Person.name,
RANK() OVER (PARTITION BY Transaction.description ORDER BY Person.name) As R
FROM Transaction, Person, SystemUser
WHERE Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID
) Subtable
ORDER BY Subtable.description, Subtable.name
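
The rank-then-blank idea can be tried on a small sample with Python's sqlite3 module. This is a sketch under assumptions: SQLite 3.25 or later stands in for SQL Server, and a single hypothetical table named entries replaces the three-table join so the example stays short.

```python
import sqlite3

# Assumption: "entries" is a made-up table holding the already-joined (description, name) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (description TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("Statistic", "Ursula"),
        ("Statistic", "Peter"),
        ("Statistic", "Alan"),
        ("Settlement", "Christie"),
        ("Settlement", "Tania"),
    ],
)

# Number rows inside each description; only the first row of a group keeps its text.
# Sorting uses the underlying columns, not the blanked-out output column.
report = conn.execute(
    """
    SELECT CASE WHEN rn = 1 THEN description ELSE '' END AS shown, name
    FROM (SELECT description, name,
                 ROW_NUMBER() OVER (PARTITION BY description ORDER BY name) AS rn
          FROM entries) AS ranked
    ORDER BY ranked.description, ranked.name
    """
).fetchall()

for shown, name in report:
    print(f"{shown:<12}| {name}")
```

The output column is given a different alias (shown) from the source column so the final ORDER BY cannot accidentally sort on the blanked values.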
I think the following SQL should work:
CREATE TABLE #TempTable (rownum INT, description VARCHAR(256), name VARCHAR(256));
-- Number the rows in display order; INSERT ... SELECT takes no VALUES keyword
INSERT INTO #TempTable (rownum, description, name)
SELECT ROW_NUMBER() OVER (ORDER BY Transaction.description, Person.name)
,Transaction.description
,Person.name
FROM Transaction, Person, SystemUser
WHERE Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID;
-- Blank the description when the previous row carries the same one
SELECT
CASE
WHEN prev.description = TT.description
THEN ''
ELSE TT.description
END AS description,
TT.name
FROM #TempTable TT
LEFT JOIN #TempTable prev ON prev.rownum = TT.rownum - 1
ORDER BY TT.rownum;

Data Matching with SQL and assigning Identity ID's

How can I write a query that will match data and produce an identity for it?
For Example:
RecordID | Name
1 | John
2 | John
3 | Smith
4 | Smith
5 | Smith
6 | Carl
I want a query which will assign an identity after matching exactly on Name.
Expected Output:
RecordID | Name | ID
1 | John | 1X
2 | John | 1X
3 | Smith | 1Y
4 | Smith | 1Y
5 | Smith | 1Y
6 | Carl | 1Z
Note: The ID should be unique for every match. Also, it can be numbers or varchar.
Can somebody help me with this? The main thing is to assign the IDs.
Thanks.
How about this:
with temp as
(
select 1 as id,'John' as name
union
select 2,'John'
union
select 3,'Smith'
union
select 4,'Smith'
union
select 5,'Smith'
union
select 6,'Carl'
)
SELECT *, DENSE_RANK() OVER
(ORDER BY Name) as NewId
FROM TEMP
Order by id
The first part is for testing purposes only.
Please try:
SELECT *,
Rank() over (order by Name ASC)
FROM table
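
RANK() and DENSE_RANK() both give every row of a matching name the same value, but they number the groups differently. The sketch below compares them on the question's data using Python's sqlite3 module (an assumption; it needs SQLite 3.25 or later, and the table name records is made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (RecordID INTEGER, Name TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "John"), (2, "John"), (3, "Smith"), (4, "Smith"), (5, "Smith"), (6, "Carl")],
)

# RANK leaves gaps after ties (1, 2, 2, 4, ...); DENSE_RANK does not (1, 2, 2, 3, ...).
ids = conn.execute(
    """
    SELECT RecordID, Name,
           RANK() OVER (ORDER BY Name) AS rank_id,
           DENSE_RANK() OVER (ORDER BY Name) AS dense_id
    FROM records
    ORDER BY RecordID
    """
).fetchall()

for row in ids:
    print(row)
```

Either column is unique per name, so both satisfy the requirement; DENSE_RANK() simply yields consecutive IDs.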
This structure seems to work:
CREATE TABLE #Table
(
Department VARCHAR(100),
Name VARCHAR(100)
);
INSERT INTO #Table VALUES
('Sales','michaeljackson'),
('Sales','michaeljackson'),
('Sales','jim'),
('Sales','jim'),
('Sales','jill'),
('Sales','jill'),
('Sales','jill'),
('Sales','j');
WITH Cte_Rank AS
(
SELECT [Name],
rw = ROW_NUMBER() OVER (ORDER BY [Name])
FROM #Table
GROUP BY [Name]
)
SELECT a.Department,
a.Name,
b.rw
FROM #Table a
INNER JOIN Cte_Rank b
ON a.Name = b.Name;

get table join with column value

How can I join tables using a column value?
I have three tables as listed below:
messages_table
-----------------------------------------------------------------------------
msg_id | msg_sub | msg_to | to_user_type | msg_from | from_user_type
-----------------------------------------------------------------------------
001 | test | 88 | manager | 42 | admin
002 | test2 | 88 | manager | 94 | manager
admin_table
--------------------------
admin_id | admin_name
--------------------------
001 | Super Admin
manager_table
---------------------------
manager_id | manager_name
---------------------------
88 | Mandela
94 | Kristen
How can I get the desired output shown below with an SQL query? That is,
join tables according to column values when the following criteria are met:
If user_type = admin then it should join with admin_table.
If user_type = manager then it should join with manager_table.
Desired output:
-----------------------------------------------------
msg_id | msg_sub | msg_to_name | msg_from_name
-----------------------------------------------------
001 | test | Mandela | Super Admin
002 | test2 | Mandela | Kristen
I.e. build the SQL join based on a column value.
EDIT:
I want to fetch the data with an SQL query, not with server-side code.
I tried a query from another answer (Winfred's idea), but I could not understand it.
msg_by_usertype is the deciding column: where its value is manager, the query should select from manager_table, and where it is admin, from admin_table.
As far as I understood your question, you can try this:
SELECT msg_id,
msg_body,
usersBy.userName AS msg_by,
usersTo.userName AS msg_to,
msg_by_usertype
FROM messages
INNER JOIN
(SELECT admin_id As id, admin_name as userName
FROM admin_table
UNION
SELECT manager_id As id, manager_name as userName
FROM manager_table ) usersTo ON msg_to = usersTo.id
INNER JOIN
(SELECT admin_id As id, admin_name as userName
FROM admin_table
UNION
SELECT manager_id As id, manager_name as userName
FROM manager_table ) usersBy ON msg_by = usersBy.id
Here is an SQL Fiddle to see how it works. (It only works if you can't have an admin who has the same id as a manager. The id should be unique across both tables.)
Please use the SQL below:
SELECT msg_id,
msg_body,
usersBy.userName AS msg_by,
usersTo.userName AS msg_to,
msg_by_usertype
FROM messages
INNER JOIN
(SELECT admin_id As id, admin_name as userName,'admin' as usertype
FROM admin_table
UNION
SELECT manager_id As id, manager_name as userName,'manager' as usertype
FROM manager_table ) usersBy
ON msg_by = usersBy.id and msg_by_usertype = usersBy.usertype
INNER JOIN
(SELECT admin_id As id, admin_name as userName
FROM admin_table
UNION
SELECT manager_id As id, manager_name as userName
FROM manager_table ) usersTo
ON msg_to = usersTo.id
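
Tagging each row of the UNION with its user type lets the join match on both id and type, so an admin and a manager may share an id. Below is a sketch of that idea against the column names from the question (to_user_type, from_user_type), run through Python's sqlite3 module. The admin ids are invented for illustration: 42 so that the sender of message 001 resolves, and 88 to create a deliberate id collision with a manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages_table (msg_id TEXT, msg_sub TEXT, msg_to INTEGER,
                                 to_user_type TEXT, msg_from INTEGER, from_user_type TEXT);
    CREATE TABLE admin_table (admin_id INTEGER, admin_name TEXT);
    CREATE TABLE manager_table (manager_id INTEGER, manager_name TEXT);
    INSERT INTO messages_table VALUES ('001', 'test', 88, 'manager', 42, 'admin'),
                                      ('002', 'test2', 88, 'manager', 94, 'manager');
    -- Hypothetical admin ids; 88 collides with a manager id on purpose.
    INSERT INTO admin_table VALUES (42, 'Super Admin'), (88, 'Other Admin');
    INSERT INTO manager_table VALUES (88, 'Mandela'), (94, 'Kristen');
    """
)

# One combined user list with a type tag; each join matches on id AND type.
resolved = conn.execute(
    """
    WITH all_users AS (
        SELECT admin_id AS id, admin_name AS user_name, 'admin' AS user_type
        FROM admin_table
        UNION ALL
        SELECT manager_id, manager_name, 'manager'
        FROM manager_table
    )
    SELECT m.msg_id, m.msg_sub, t.user_name AS msg_to_name, f.user_name AS msg_from_name
    FROM messages_table m
    JOIN all_users t ON t.id = m.msg_to AND t.user_type = m.to_user_type
    JOIN all_users f ON f.id = m.msg_from AND f.user_type = m.from_user_type
    ORDER BY m.msg_id
    """
).fetchall()

for row in resolved:
    print(row)
```

Despite the shared id 88, both messages resolve to the manager Mandela because the type condition rules out the admin row.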
If I understand your question correctly, you want a result like this?
MSG_ID MSG_BODY MSG_TO BY MSG_BY_USERTYPE
---------- ---------- ---------- ----------- ---------------
001 test adm1 managone manager
002 sadff adm1 adm3? admin
If so, you could use this
SELECT MSG_ID, MSG_BODY, MSG_TO,
CASE
WHEN MSG_BY_USERTYPE = 'admin' THEN COALESCE(
(SELECT ADMIN_NAME FROM ADMIN_TABLE
WHERE MSG_BY = ADMIN_ID), RTRIM(MSG_BY) CONCAT '?')
WHEN MSG_BY_USERTYPE = 'manager' THEN COALESCE(
(SELECT MANAGER_NAME FROM MANAGER_TABLE
WHERE MSG_BY = MANAGER_ID), RTRIM(MSG_BY) CONCAT '?')
ELSE ' '
END AS BY,
MSG_BY_USERTYPE
FROM MESSAGES