find duplicated record by first and last name

I have a table called beneficials. Some facts about it:
A beneficial belongs to one organization.
An organization has many beneficials.
Beneficials have first and last names and no other form of identification.
Some sample data from the table:
| id | firstname | lastname | organization_id |
|----|-----------|----------|-----------------|
| 1 | jan | kowalski | 1 |
| 2 | jan | kovalski | 3 |
| 3 | john | doe | 1 |
| 4 | jan | kowalski | 2 |
I want to find out whether a beneficial from one organization is also present in other organizations, matching on first and last name, and if so, get the ids of those organizations.
In the sample data above, given organization id 1, the query should return 2, because jan kowalski is also a beneficial in organization 2. It should not return 3: that row matches the first name but not the last name.
I came up with the following query:
with org_beneficials as (
select firstname, lastname from beneficials where organization_id = ? and deleted_at is null
)
select organization_id from beneficials
where firstname in (select firstname from org_beneficials)
and lastname in (select lastname from org_beneficials)
and deleted_at is null
and organization_id <> ?;
It kind of works, but it returns false positives when beneficials from different organizations share only the first name or only the last name. I need to match both first and last names on the same row and I can't figure out how.
I thought about joining the table to itself, but I'm not sure that would work since an organization has many beneficials. Adding a column like fullname is not something I want to do here.

You can group by first and last names, then filter for duplicates:
SELECT firstname, lastname
FROM beneficials
GROUP BY firstname, lastname
HAVING COUNT(*) > 1;
After your edits, it seems you want to select the records of people in other organizations who also appear in a given organization (here, organization 1):
SELECT *
FROM beneficials a
WHERE a.organization_id != 1
AND EXISTS (
SELECT 1
FROM beneficials b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND b.organization_id = 1
);
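To check the EXISTS approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the exact query above: it returns only the distinct organization ids, and it assumes a nullable deleted_at column as used in the question's own query. The parameter name :org is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE beneficials (
    id INTEGER PRIMARY KEY,
    firstname TEXT,
    lastname TEXT,
    organization_id INTEGER,
    deleted_at TEXT  -- assumed: NULL means the row is not deleted
);
INSERT INTO beneficials (id, firstname, lastname, organization_id) VALUES
    (1, 'jan',  'kowalski', 1),
    (2, 'jan',  'kovalski', 3),
    (3, 'john', 'doe',      1),
    (4, 'jan',  'kowalski', 2);
""")

# Both names are compared on the same row of b, so a row that matches
# only the first name (id 2) does not count.
query = """
SELECT DISTINCT a.organization_id
FROM beneficials a
WHERE a.organization_id <> :org
  AND a.deleted_at IS NULL
  AND EXISTS (
      SELECT 1
      FROM beneficials b
      WHERE b.firstname = a.firstname
        AND b.lastname = a.lastname
        AND b.organization_id = :org
        AND b.deleted_at IS NULL
  )
ORDER BY a.organization_id
"""
other_orgs = [row[0] for row in conn.execute(query, {"org": 1})]
print(other_orgs)
```

For organization 1 this prints [2]: organization 3 is left out because kovalski does not match kowalski.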

Related

Reorganize multiple rows in a new table with more columns

I have a table that looks like this:
+----------+----------+----------+----------+--------------------------+
| Club | Role | Name | Lastname | Email |
+----------+----------+----------+----------+--------------------------+
| Porto | 1 | Peter | Pan | peter.pan#mail.com |
| Porto | 2 | Michelle | Obama | michelle.obama#mail.com |
| Monaco | 1 | Serena | Williams | serena.williams#mail.com |
| Monaco | 2 | David | Beckham | david.beckham#mail.com |
+----------+----------+----------+----------+--------------------------+
and I want to get a table like this:
+----------+-----------------+-----------------+---------------------------------+-----------------+-----------------+---------------------------------+
| Club | Role 1 Name | Role 1 Lastname | Role 1 Email | Role 2 Name | Role 2 Lastname | Role 2 Email |
+----------+-----------------+-----------------+---------------------------------+-----------------+-----------------+---------------------------------+
| Porto | Peter | Pan | peter.pan#mail.com | Michelle | Obama | michelle.obama#mail.com |
| Monaco | Serena | Williams | serena.williams#mail.com | David | Beckham | david.beckham#mail.com |
+----------+-----------------+-----------------+---------------------------------+-----------------+-----------------+---------------------------------+
where the people with different roles in each club are put in the same row.
I would ideally like to find a way to do that in Excel, but I am not sure if it's possible. If not, SQL code would also help a lot.

Here is what I could come up with for an excel formula. Hopefully it can push you in the right direction.
This formula is assuming that your first table exists at the range A1:E5 and the second table exists at the range G1:M3. It is also assuming that the second table's column names are just repeating without the Role number attached to the front of it (same as the first table). This formula is an array formula, so you have to make sure to do CTRL+SHIFT+ENTER when inputting it.
{=INDEX($A$2:$E$5,
MATCH(1,($G2=$A$2:$A$5)*((FLOOR((COLUMN() - COLUMN($H$1) ) / 3,1) + 1)=$B$2:$B$5),0),
MATCH(H$1,$A$1:$E$1,0))}
The first part uses the INDEX formula, which pulls data from the range supplied ($A$2:$E$5) based on the row and column numbers returned by the MATCH formulas that follow.
The first MATCH supplies the row number where the lookup array evaluates to 1. Two conditions are multiplied together: the first checks the "Club" name ($G2=$A$2:$A$5) and the second checks which "Role" we are currently on ((FLOOR((COLUMN() - COLUMN($H$1)) / 3,1) + 1)=$B$2:$B$5). FLOOR rounds the column offset divided by the number of columns per role (3 in this case: Name, Lastname, and Email) down to a whole number.
The final MATCH pulls the column number based on the header names from both tables. If you wanted to incorporate the changing names of the roles in the column headers, you could change this part to something like this:
{=INDEX($A$2:$E$5,
MATCH(1,($G2=$A$2:$A$5)*((FLOOR((COLUMN() - COLUMN($H$1) ) / 3,1) + 1)=$B$2:$B$5),0),
MOD(COLUMN() - COLUMN($H$1),3) + 3)}
I am adding 3 to the end of the mod because of the original range that was selected for table 1. The columns that we want to pull start at location 3 in the range.
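The index arithmetic in that second formula can be sketched outside Excel. The snippet below is only an illustration in Python, with made-up names (lookup, FIELDS_PER_ROLE, FIRST_FIELD); offset stands for COLUMN() - COLUMN($H$1), i.e. 0 for the first output column after Club.

```python
# First table, in the same column order as A:E (Club, Role, Name, Lastname, Email).
rows = [
    ("Porto", 1, "Peter", "Pan", "peter.pan#mail.com"),
    ("Porto", 2, "Michelle", "Obama", "michelle.obama#mail.com"),
    ("Monaco", 1, "Serena", "Williams", "serena.williams#mail.com"),
    ("Monaco", 2, "David", "Beckham", "david.beckham#mail.com"),
]

FIELDS_PER_ROLE = 3  # Name, Lastname, Email
FIRST_FIELD = 3      # 1-based position of Name within A:E


def lookup(club, offset):
    """Mimic INDEX/MATCH for one output cell of the second table."""
    role = offset // FIELDS_PER_ROLE + 1             # FLOOR(offset / 3, 1) + 1
    column = offset % FIELDS_PER_ROLE + FIRST_FIELD  # MOD(offset, 3) + 3
    for row in rows:
        if row[0] == club and row[1] == role:        # both MATCH conditions
            return row[column - 1]                   # INDEX is 1-based
    return None                                      # no matching row


porto = [lookup("Porto", offset) for offset in range(6)]
print(porto)
```

Offsets 0 to 2 land on role 1 and offsets 3 to 5 on role 2, which is why the six output columns fill in as Name, Lastname, Email for each role in turn.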

If you want to do this in Oracle SQL, there's a nice approach using analytic SQL.
To convert rows to columns or columns to rows, we can use the PIVOT or UNPIVOT operators.
In your example, use the query below to convert the data as you like:
select * from
(
with all_roles as
(select 1 role from dual union all
select 2 role from dual),
ddata as
(select 1 c_role, 'porto' club, 'peter' fname,'pan' lname,'peter.pan#mail.com' email from dual union all
select 2 c_role, 'porto' club, 'Michelle' fname, 'Obama' lname,'michelle.obama#mail.com' email from dual union all
select 1 c_role, 'monaco' club, 'Serena' fname, 'Williams' lname,'serena.williams#mail.com' email from dual union all
select 2 c_role, 'monaco' club, 'David' fname, 'Beckham' lname,'david.beckham#mail.com' email from dual )
(select role, club, fname,lname, email from ddata,all_roles
where all_roles.role=ddata.c_role)) all_data
pivot (
max(fname) fname,
max(lname) lname,
max(email) email
for role in ( 1 role1, 2 role2 )
)
order by club;
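On databases without a PIVOT operator, a common substitute is conditional aggregation with MAX(CASE ...). This is a different technique from the Oracle query above, shown here as a rough sketch on SQLite via Python's sqlite3 module; the table name club_roles and the output column aliases are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE club_roles (club TEXT, role INTEGER, name TEXT, lastname TEXT, email TEXT);
INSERT INTO club_roles VALUES
    ('Porto',  1, 'Peter',    'Pan',      'peter.pan#mail.com'),
    ('Porto',  2, 'Michelle', 'Obama',    'michelle.obama#mail.com'),
    ('Monaco', 1, 'Serena',   'Williams', 'serena.williams#mail.com'),
    ('Monaco', 2, 'David',    'Beckham',  'david.beckham#mail.com');
""")

# One output row per club; each CASE keeps only the value for its role,
# and MAX collapses the group to that single non-NULL value.
query = """
SELECT club,
       MAX(CASE WHEN role = 1 THEN name     END) AS role1_name,
       MAX(CASE WHEN role = 1 THEN lastname END) AS role1_lastname,
       MAX(CASE WHEN role = 1 THEN email    END) AS role1_email,
       MAX(CASE WHEN role = 2 THEN name     END) AS role2_name,
       MAX(CASE WHEN role = 2 THEN lastname END) AS role2_lastname,
       MAX(CASE WHEN role = 2 THEN email    END) AS role2_email
FROM club_roles
GROUP BY club
ORDER BY club
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The trade-off is that every role needs its own set of CASE expressions, so this only suits a small, known list of roles.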

SQL Server query - don't want multiple rows with identical data

I have a SQL Server database that has the following three tables - this is simplified for this post.
Stakeholder table (a table that stores a persons personal data... name, address city, state, zip, etc)
Stakeholder_id full_name
---------------------------------------
1 Joe Stakeholder
2 Eric Stakeholder
SH Inquiry table (a table that stores information about when a stakeholder contacts us)
sh_inquiry_id inquiry_link_ID
-----------------------------------------------
1 1
2 1
3 2
Sh Contacts (a table that stores information about when we contact a stakeholder)
sh_contact_id contact_link_id
-----------------------------------------
1 1
2 1
3 2
I want to write a SQL query that shows the stakeholder information once and then shows all inquiries and all contacts underneath the stakeholder row. Is that possible with SQL? In this case Joe Stakeholder would be shown once, followed by 4 rows (2 inquiries and 2 contacts). Eric Stakeholder would be shown once, followed by two rows (1 inquiry and 1 contact).
Thanks for any assistance in advance.

As has already been mentioned, you probably want to handle this in your application code. However, you can use a UNION query to sort of do what you want.
With the query below, I changed your latter 2 tables to SH_Inquiry and SH_Contacts (replaced spaces with underscores), which is generally a good habit (it's a bad idea to have spaces in your object names). Also, depending on how your tables are laid out, you might want to merge your Contacts and Inquiry tables (e.g. have one table, with a contact_type field that identifies it as "inbound" or "outbound").
Anyways, using a CTE and union:
WITH Unionized AS
(
SELECT
stakeholder_id,
full_name,
NULL AS contact_or_inquiry,
NULL AS contact_or_inquiry_id
FROM Stakeholder
UNION ALL
SELECT
inquiry_link_id AS stakeholder_id,
NULL AS full_name,
'inquiry' AS contact_or_inquiry,
sh_inquiry_id AS contact_or_inquiry_id
FROM SH_Inquiry
UNION ALL
SELECT
contact_link_id AS stakeholder_id,
NULL AS full_name,
'contact' AS contact_or_inquiry,
sh_contact_id AS contact_or_inquiry_id
FROM SH_Contacts
)
SELECT
full_name,
contact_or_inquiry,
contact_or_inquiry_id
FROM Unionized
ORDER BY
stakeholder_id,
contact_or_inquiry,
contact_or_inquiry_id
giving you these results for the sample data in the question:
+------------------+--------------------+-----------------------+
| full_name        | contact_or_inquiry | contact_or_inquiry_id |
+------------------+--------------------+-----------------------+
| Joe Stakeholder  | NULL               | NULL                  |
| NULL             | contact            | 1                     |
| NULL             | contact            | 2                     |
| NULL             | inquiry            | 1                     |
| NULL             | inquiry            | 2                     |
| Eric Stakeholder | NULL               | NULL                  |
| NULL             | contact            | 3                     |
| NULL             | inquiry            | 3                     |
+------------------+--------------------+-----------------------+
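If you want to try the UNION ALL idea without a SQL Server instance, here is a small sketch on SQLite through Python's sqlite3 module, loaded with the sample rows from the question. It relies on NULL sorting before non-NULL values in ascending order, which holds for SQLite and SQL Server but not for every database, so treat the ordering as an assumption to verify on your platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stakeholder (stakeholder_id INTEGER, full_name TEXT);
CREATE TABLE SH_Inquiry (sh_inquiry_id INTEGER, inquiry_link_id INTEGER);
CREATE TABLE SH_Contacts (sh_contact_id INTEGER, contact_link_id INTEGER);
INSERT INTO Stakeholder VALUES (1, 'Joe Stakeholder'), (2, 'Eric Stakeholder');
INSERT INTO SH_Inquiry VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO SH_Contacts VALUES (1, 1), (2, 1), (3, 2);
""")

# Column names come from the first SELECT of the UNION.
query = """
WITH Unionized AS (
    SELECT stakeholder_id, full_name,
           NULL AS contact_or_inquiry, NULL AS contact_or_inquiry_id
    FROM Stakeholder
    UNION ALL
    SELECT inquiry_link_id, NULL, 'inquiry', sh_inquiry_id FROM SH_Inquiry
    UNION ALL
    SELECT contact_link_id, NULL, 'contact', sh_contact_id FROM SH_Contacts
)
SELECT full_name, contact_or_inquiry, contact_or_inquiry_id
FROM Unionized
ORDER BY stakeholder_id, contact_or_inquiry, contact_or_inquiry_id
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

The stakeholder row comes first within each stakeholder_id only because its contact_or_inquiry is NULL; an explicit sort key (for example 0 for the header row, 1 for detail rows) would make that ordering independent of NULL handling.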

How to return unique rows having count() of multiple columns = 1 using group by?

So here is my situation:
____________________________________________
| idnumber | name | sectiongroup |
--------------------------------------------
| 123 | Joe | one |
| 123 | Barry | two |
| 1234 | Laura | one |
| 1234 | LauraCopyCat | one |
--------------------------------------------
I am trying to build a query which will return any unique (i.e. COUNT(idnumber) = 1) id numbers in a given sectiongroup. So if you are in sectiongroup one and no one else in your sectiongroup has the same ID number as you, then I want your idnumber. If someone in sectiongroup two happens to have the same idnumber, that is okay, I still want your idnumber.
For example, Barry and Joe have the same id number but they are in separate sectiongroups, so I want to return their idnumbers. However, Laura and LauraCopyCat have the SAME sectiongroup, so I do NOT want their idnumbers to be returned. So far I have the following:
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING(COUNT(idnumber) = 1)
Is there a way to add sectiongroup into the COUNT()=1 condition?

Just use COUNT(*) to avoid confusion. This will count the number of records in the particular group. Remember, a group consists of the unique combinations of values in the fields specified in your GROUP BY statement.
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING COUNT(*) = 1
Note that this will result in duplicate idnumbers if you have records that share an id but have different sectiongroups. To remove the duplicates, just change SELECT to SELECT DISTINCT.
Tested here: http://sqlfiddle.com/#!9/b0a50c/3
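In case the fiddle link stops working, the same check can be reproduced locally. This is a small sketch on SQLite through Python's sqlite3 module, using the four sample rows from the question; the variable names are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE namestable (idnumber INTEGER, name TEXT, sectiongroup TEXT);
INSERT INTO namestable VALUES
    (123,  'Joe',          'one'),
    (123,  'Barry',        'two'),
    (1234, 'Laura',        'one'),
    (1234, 'LauraCopyCat', 'one');
""")

# Each (idnumber, sectiongroup) pair that occurs exactly once.
per_group = conn.execute("""
    SELECT idnumber, sectiongroup
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
    ORDER BY idnumber, sectiongroup
""").fetchall()

# Same filter, but with repeated idnumbers collapsed by DISTINCT.
distinct_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT idnumber
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
""")]

print(per_group)
print(distinct_ids)
```

Without DISTINCT, 123 comes back twice (once per sectiongroup); 1234 never appears because both Laura rows fall into the same group.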

Selecting Active roles from denormalized table with duplicates

I have a bit of a garbage table that I need to extract data from.
Name | Person# | Assignment_Status | Group
--------------------------------------------------
Smith, John | 1234567 | NLE | G1
Smith, John | 1234567 | Active | G2
Jones, Jane | 7654321 | Active | G1
James, Jack | 9876541 | LOA | G3
Peep, Laura | 6549871 | ServiceLOA | G1
Some, One | 3219875 | NLE | G2
Every time a person moves groups, their current assignment_status gets set to NLE and a new record gets created with the assignment_status set to Active for the new group. When a person leaves the company, their assignment_status is also set to NLE. This table has neither a unique row ID nor a date stamp.
I need a query that reduces the table to 1 record per employee, and if the employee has multiple records I need the one whose Assignment_Status is not NLE. For example, John Smith should show as Active for G2.
My first attempt was:
SELECT *
INTO #TempAssignments
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY aID) AS ID,
Name,
Person#,
(CASE WHEN Assignment_Status='NLE' THEN 1 ELSE 0 END) AS aID,
Group
FROM
tblAssignments)
With the data in a temp table, I then created a second query that selects the MIN of ID and the MIN of aID, grouped by Name and Person#, and joined that back to the temp table to get the Group for the given ID.
This seems to work; however, this solution needs to be deployed in multiple reports, so I was wondering whether there is a more compact way of doing this.

The following query:
SELECT Name, Person#, [Group]
FROM (
SELECT Name, Person#, [Group],
ROW_NUMBER() OVER (PARTITION BY Person#, Name
ORDER BY CASE
WHEN Assignment_Status <> 'NLE' THEN 0
ELSE 1
END) AS rn
FROM tblAssignments ) t
WHERE t.rn = 1
will select one record for each employee, as identified by a Person#, Name value pair. If the employee has multiple records, then a record with Assignment_Status that is not NLE will be selected.
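Here is a runnable sketch of the same ROW_NUMBER() idea on SQLite (window functions need SQLite 3.25 or newer) via Python's sqlite3 module. The column names person_no and grp are stand-ins for Person# and Group, chosen only to avoid quoting, and assignment_status is added to the output so the chosen status is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (name TEXT, person_no INTEGER, assignment_status TEXT, grp TEXT);
INSERT INTO assignments VALUES
    ('Smith, John', 1234567, 'NLE',        'G1'),
    ('Smith, John', 1234567, 'Active',     'G2'),
    ('Jones, Jane', 7654321, 'Active',     'G1'),
    ('James, Jack', 9876541, 'LOA',        'G3'),
    ('Peep, Laura', 6549871, 'ServiceLOA', 'G1'),
    ('Some, One',   3219875, 'NLE',        'G2');
""")

# Within each employee, non-NLE rows sort first (0 before 1), so rn = 1
# picks a non-NLE row whenever one exists and an NLE row otherwise.
query = """
SELECT name, person_no, assignment_status, grp
FROM (
    SELECT name, person_no, assignment_status, grp,
           ROW_NUMBER() OVER (
               PARTITION BY person_no, name
               ORDER BY CASE WHEN assignment_status <> 'NLE' THEN 0 ELSE 1 END
           ) AS rn
    FROM assignments
) t
WHERE rn = 1
ORDER BY person_no
"""
current = conn.execute(query).fetchall()
for row in current:
    print(row)
```

Note that an employee with two non-NLE rows would get one of them arbitrarily, since nothing in the table orders them further.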

SELECT certain fields based on certain WHERE conditions

I am writing an advanced MySQL query that searches a database and retrieves user information. What I am wondering is can I select certain fields if WHERE condition 1 is met and select other fields if WHERE condition 2 is met?
Database: users
------------------------
| user_id | first_name |
------------------------
| 1 | John |
------------------------
| 2 | Chris |
------------------------
| 3 | Sam |
------------------------
| 4 | Megan |
------------------------
Database: friendship
--------------------------------------
| user_id_one | user_id_two | status |
--------------------------------------
| 2 | 4 | 0 |
--------------------------------------
| 4 | 1 | 1 |
--------------------------------------
Status 0 = Unconfirmed
Status 1 = Confirmed
OK, as you can see, John & Megan are confirmed friends, while Chris & Megan have a friendship entry that is still unconfirmed.
The query I am trying to write is as follows: when Megan (4) searches for new friends, I want all users returned except the ones she is a confirmed friend with. So the results should return 2 and 3. But since a relationship with user_id 2 exists but is not confirmed, I also want the status returned, since an entry in the friendship table does exist between the two. If a user exists but there is no entry in the friendship table, the query should still return that user's information, with status as NULL or no status at all.
I hope this makes sense. Ask questions if you need to.

Why not use a left join or an if-not-exists?
SELECT users.*
FROM (users LEFT JOIN friendship
ON status=1 AND (user_id_one=user_id OR user_id_two=user_id) )
WHERE
status IS NULL
or
SELECT users.*
FROM users
WHERE
NOT EXISTS (SELECT *
FROM friendship
WHERE status=1
AND (user_id_one=user_id
OR user_id_two=user_id))
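Neither query above ties the friendship rows to the searching user or returns the status, so here is one possible variation of the LEFT JOIN idea that does both. It is a sketch on SQLite through Python's sqlite3 module with the sample data from the question; the :me parameter and the choice to exclude the searching user herself are assumptions about what the question wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE friendship (user_id_one INTEGER, user_id_two INTEGER, status INTEGER);
INSERT INTO users VALUES (1, 'John'), (2, 'Chris'), (3, 'Sam'), (4, 'Megan');
INSERT INTO friendship VALUES (2, 4, 0), (4, 1, 1);
""")

# Join each user to the friendship row (if any) that links them to :me,
# whichever side of the pair :me is stored on, then drop confirmed friends.
query = """
SELECT u.user_id, u.first_name, f.status
FROM users u
LEFT JOIN friendship f
       ON (f.user_id_one = :me AND f.user_id_two = u.user_id)
       OR (f.user_id_two = :me AND f.user_id_one = u.user_id)
WHERE u.user_id <> :me
  AND (f.status IS NULL OR f.status <> 1)
ORDER BY u.user_id
"""
candidates = conn.execute(query, {"me": 4}).fetchall()
print(candidates)
```

For Megan (4) this returns Chris with status 0 and Sam with a NULL status, and leaves out John, who is a confirmed friend.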

You can create two separate queries and then UNION the result tables. In each query, add a field that always has the same value.
So something like this should work:
(SELECT id, 'Not Friends' As Status FROM t1 WHERE condition1)
UNION
(SELECT id, 'Unconfirmed' As Status FROM t1 WHERE condition2)
Just make sure both queries return the same number of fields, in the same order and with compatible types.
