SQL - check for duplicate names/nicknames


I have a query that gives me duplicate names in my table, but I need to add checking of nicknames. I've tried many variations but am still stumped. The following query took over 12 minutes to run, so I canceled it.
WITH TEAM2 as
(
SELECT ID, LastName, FirstName, Name,
ROW_NUMBER() OVER (PARTITION BY LastName, FirstName order by LastName, FirstName,ID DESC) RN
FROM dbo.vw_Users_Details
WHERE Lastname <> ''
AND Firstname <> ''
AND Not_Dupe_Flag <> 1
)
SELECT a.ID, a.LastName, a.FirstName
FROM TEAM2 a
where exists (select 1
from TEAM2 b
where (b.FirstName = a.FirstName
and b.LastName = a.LastName
and b.RN > 1)
OR
(b.LastName = a.LastName
AND EXISTS (SELECT 1 FROM pdNicknames AS c WHERE c.NAME = a.firstname AND c.variation = b.firstname)
and b.RN > 1)
)
order by a.LastName, a.FirstName, a.id

You can use the having clause.
For example:
select b.Branches_ShortName
from kplus..Folders f
inner join kplus..Portfolios p on p.Portfolios_Id = f.Portfolios_Id
inner join kplus..Branches b on b.Branches_Id = p.Branches_Id
group by Branches_ShortName
having count(Branches_ShortName) > 1
This will provide only the Branches that have more than 1 Folder :)
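To connect this to the names question, here is a minimal sketch of the same GROUP BY / HAVING idea, run through Python's built-in sqlite3 module against an in-memory table. The `users` table, its columns, and the rows are made up for illustration and are not the asker's schema. Note that this returns one row per duplicated name (not each duplicated user) and does nothing about nicknames.

```python
import sqlite3

def duplicated_names(conn):
    """Return (last_name, first_name, count) for every name used more than once."""
    return conn.execute(
        """
        SELECT last_name, first_name, COUNT(*) AS n
        FROM users
        GROUP BY last_name, first_name
        HAVING COUNT(*) > 1
        ORDER BY last_name, first_name
        """
    ).fetchall()

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Smith", "Robert"), (2, "Smith", "Robert"),
     (3, "Smith", "Bob"), (4, "Jones", "Ann")],
)
print(duplicated_names(conn))
```

With this sample data only Smith/Robert is reported; Smith/Bob is missed because HAVING only sees exact matches within a group.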

Okay, you're attempting to find all users who share the same name/nickname.
I believe the following should work;
SELECT a.ID, a.LastName, a.FirstName
FROM dbo.vw_Users_Details as a
WHERE a.LastName <> ''
AND a.FirstName <> ''
AND EXISTS (SELECT '1'
FROM dbo.vw_Users_Details as b
LEFT JOIN pdNicknames as c
ON (c.name = b.FirstName
AND c.variation = a.FirstName)
OR (c.name = a.FirstName
AND c.variation = b.FirstName)
WHERE b.ID <> a.ID
AND b.LastName = a.LastName
AND (b.FirstName = a.FirstName
OR (c.name IS NOT NULL OR c.variation IS NOT NULL)
)
)
I make no guarantees about the execution performance of this statement, as you haven't provided enough information for us to know. However, it's likely to be better, given you won't need the OLAP function; I do recommend indexes on the name columns and on variation, of course. I left off Not_Dupe_Flag because I'm a little confused by its use (you seem to be using '1' as 'false', which is the opposite of how most comparisons are set up); at minimum, never include 'Not' as part of a boolean variable name, because it makes reasoning about it difficult (use Unique_Name or Duplicated_Name, either of which is immediately understandable).
EDIT:
If you need to restrict your selection, I recommend encapsulating the query in a view (including the ROW_NUMBER() function), and querying the view. Alternatively, if your RDBMS supports it, wrap the query in a CTE. Multiple nested FROM clauses are like multiple nested if statements - confusing. Being able to logically separate parts of the query with a view or CTE goes a long way to retaining sanity.
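For anyone who wants to try the idea on a small data set, here is a rough sketch using Python's built-in sqlite3 module. The tables `users` and `nicknames` are hypothetical stand-ins for dbo.vw_Users_Details and pdNicknames, and the rows are invented. It is a variation on the query above, not the same statement: the nickname lookup is written as a nested EXISTS rather than a LEFT JOIN, which should give the same matches.

```python
import sqlite3

# A user is reported when another user shares the last name and either the
# same first name or a first name linked to it through the nickname table.
QUERY = """
SELECT a.id, a.last_name, a.first_name
FROM users AS a
WHERE EXISTS (
    SELECT 1
    FROM users AS b
    WHERE b.id <> a.id
      AND b.last_name = a.last_name
      AND (
          b.first_name = a.first_name
          OR EXISTS (
              SELECT 1
              FROM nicknames AS c
              WHERE (c.name = a.first_name AND c.variation = b.first_name)
                 OR (c.name = b.first_name AND c.variation = a.first_name)
          )
      )
)
ORDER BY a.last_name, a.first_name, a.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE nicknames (name TEXT, variation TEXT);
INSERT INTO users VALUES
    (1, 'Smith', 'Robert'), (2, 'Smith', 'Bob'), (3, 'Smith', 'Alice'),
    (4, 'Jones', 'Robert'), (5, 'Jones', 'Robert'), (6, 'Brown', 'Bob');
INSERT INTO nicknames VALUES ('Robert', 'Bob'), ('Robert', 'Rob');
""")

matches = conn.execute(QUERY).fetchall()
print(matches)
```

Here users 1 and 2 are paired through the Robert/Bob nickname, users 4 and 5 through an exact match, and users 3 and 6 are left out.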

Related

SQL Nested select grouped with several rows of results

Hope this makes sense..
I have the following database tables.
I am trying to group a resultset together in an SQL statement.
This is my current SQL statements:
SELECT
Patient.ID,
Patient.Name,
AnimalType.Value as AnimalType,
Patient.Age,
Customer.Firstname,
Customer.Lastname
FROM Patient
INNER JOIN Customer ON Patient.Owner_FK = Customer.ID
INNER JOIN AnimalType ON Patient.Type_FK = AnimalType.ID
SELECT
Treatment.Treatment_Date,
TreatmentType.Type
FROM Treatment
INNER JOIN TreatmentItem ON Treatment.ID = TreatmentItem.Treatment_FK
INNER JOIN TreatmentType ON TreatmentItem.TreatmentType_FK = TreatmentType.ID
INNER JOIN Patient ON TreatmentItem.Patient_FK = Patient.ID
WHERE Patient.ID = 132
There are two issues with this:
I have a static ID, and the results are split.
This is the result of the above SQL statements:
My issue is that the last result set should be together with the corresponding "Animal (patient)",
but without duplicate data. I could get the data all in one go, but then I would have a lot of duplicate rows of data with only the TreatmentType being different.
So how do I make this work?
I have searched to no avail, and have not been able to write a correct GROUP BY that would make it work.
Does it make any sense?
Is it even possible?
example of desired result:

I believe you can achieve what you want with a single query, CASE statements, and the ROW_NUMBER() function, but it would require conversions of all non-text columns.
Here is a rough stab at a potential solution (I did not build your DB, so I haven't verified that this exact SQL runs, but the overall concept works).
WITH CTE_PatientTreatments AS (
SELECT
-- Get the row number for each treatment for a given patient
ROW_NUMBER() OVER (PARTITION BY Patient.ID ORDER BY Treatment.ID) AS RowNum,
Patient.ID,
Patient.Name,
AnimalType.Value as AnimalType,
Patient.Age,
Customer.Firstname,
Customer.Lastname,
Treatment.Treatment_Date,
TreatmentType.Type
FROM Patient
INNER JOIN Customer ON Patient.Owner_FK = Customer.ID
INNER JOIN AnimalType ON Patient.Type_FK = AnimalType.ID
INNER JOIN TreatmentItem ON TreatmentItem.Patient_FK = Patient.ID
INNER JOIN Treatment ON Treatment.ID = TreatmentItem.Treatment_FK
INNER JOIN TreatmentType ON TreatmentItem.TreatmentType_FK = TreatmentType.ID
WHERE Patient.ID = 132
)
-- Only display patient information for the first row
SELECT -- Convert numeric columns to text so that the "ELSE ''" doesn't get coerced into a number (0)
CASE WHEN (RowNum > 1) THEN '' ELSE CAST(ID AS VARCHAR) END AS ID,
CASE WHEN (RowNum > 1) THEN '' ELSE Name END AS Name,
CASE WHEN (RowNum > 1) THEN '' ELSE AnimalType END AS AnimalType,
CASE WHEN (RowNum > 1) THEN '' ELSE CAST(Age AS VARCHAR) END AS Age,
CASE WHEN (RowNum > 1) THEN '' ELSE Firstname END AS Firstname,
CASE WHEN (RowNum > 1) THEN '' ELSE Lastname END AS Lastname,
Treatment_Date,
Type
FROM CTE_PatientTreatments
-- Sort in the outer query so that rows for the same patient stay together
-- (SQL Server rejects ORDER BY inside a CTE unless TOP or OFFSET is used)
ORDER BY CTE_PatientTreatments.ID, RowNum
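A cut-down version of this idea can be tried with Python's built-in sqlite3 module. This is only a sketch: the two tables `patient` and `treatment` and their rows are invented and much simpler than the schema in the question, and it assumes the bundled SQLite is version 3.25 or later, since that is when window functions such as ROW_NUMBER() were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE treatment (id INTEGER PRIMARY KEY, patient_id INTEGER, type TEXT);
INSERT INTO patient VALUES (132, 'Rex'), (133, 'Tom');
INSERT INTO treatment VALUES
    (1, 132, 'Vaccination'), (2, 132, 'Checkup'), (3, 133, 'Surgery');
""")

rows = conn.execute("""
WITH numbered AS (
    -- Number the treatments of each patient, starting at 1
    SELECT ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY t.id) AS row_num,
           p.id AS patient_id,
           p.name AS patient_name,
           t.type AS treatment_type
    FROM patient AS p
    JOIN treatment AS t ON t.patient_id = p.id
)
-- Show the patient columns only on the first row of each patient
SELECT CASE WHEN row_num > 1 THEN '' ELSE CAST(patient_id AS TEXT) END,
       CASE WHEN row_num > 1 THEN '' ELSE patient_name END,
       treatment_type
FROM numbered
ORDER BY patient_id, row_num
""").fetchall()

for row in rows:
    print(row)
```

The second treatment for patient 132 comes back with empty patient columns, which is the "no repeated data" layout the question asks for. Blanking repeated values like this is arguably a presentation concern, so doing it in the application or report layer is a reasonable alternative.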

Selecting ONLY Duplicates from a joined tables query

I have the following query, in which I'm trying to join two tables on their ID so I can get the duplicated values in "c.code". I've tried a lot of queries but nothing works. I have 500k rows in my database and with this query I only get 5k back, which is not right. I'm positive it's at least 200k. I also tried to use Excel but it's too much for it to handle.
Any ideas?
Thanks in advance, everyone.
SELECT c.code, c.name as SCT_Name, t.name as SYNONYM_Name, count(c.code)
FROM database.Terms as t
join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
Group by c.code, c.name, t.name
HAVING COUNT(c.code) > = 1
Order by c.code

with data as (
select c.code, c.name as SCT_Name, t.name as SYNONYM_Name
from database.Terms as t inner join database.dbo.Concepts as c
on c.ConceptId = t.ConceptId
where
t.TermTypeCode = 'SYNONYM'
and t.ConceptTypeCode = 'NAME_Code'
and c.retired = '0'
)
select *
--, (select count(*) from data as d2 where d2.code = data.code) as code_count
--, count(*) over (partition by code) as code_count
from data
where code in (select code from data group by code having count(*) > 1)
order by code

If you want just duplicates of c.code, your Group By is wrong (and so is your Having clause). Try this:
SELECT c.code
FROM database.Terms as t
join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
Group by c.code
HAVING COUNT(c.code) > 1
This will return every c.code value that appears on more than one row.
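To see why the grouping matters, here is a small sketch using Python's built-in sqlite3 module. The single table `synonyms` is a made-up stand-in for the joined Terms/Concepts result, with invented rows. It compares grouping by code only, grouping by every selected column, and pulling back the full duplicate rows with an IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE synonyms (code TEXT, sct_name TEXT, synonym_name TEXT);
INSERT INTO synonyms VALUES
    ('A1', 'Fever', 'Pyrexia'),
    ('A1', 'Fever', 'High temperature'),
    ('B2', 'Cough', 'Tussis');
""")

# Group by the code alone: one row per code that occurs more than once
duplicate_codes = conn.execute("""
SELECT code FROM synonyms GROUP BY code HAVING COUNT(*) > 1
""").fetchall()

# Group by every selected column: each group here holds a single row,
# so a "> 1" filter finds nothing
overgrouped = conn.execute("""
SELECT code, sct_name, synonym_name
FROM synonyms
GROUP BY code, sct_name, synonym_name
HAVING COUNT(*) > 1
""").fetchall()

# Full rows for the duplicated codes
duplicate_rows = conn.execute("""
SELECT code, sct_name, synonym_name
FROM synonyms
WHERE code IN (SELECT code FROM synonyms GROUP BY code HAVING COUNT(*) > 1)
ORDER BY code, synonym_name
""").fetchall()

print(duplicate_codes, overgrouped, duplicate_rows)
```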

You need to use INTERSECT instead of JOIN. Basically you perform the select on the first table then intersect with the second table. The result is the duplicate rows.
Only select the id column, though, otherwise the intersect won't work as expected.

Create custom field in SELECT if other field is null

This is a seemingly simple thing to do but I can't find any reference to it. I want to add a customized field to my select statement if the value of another field is null. In the below I want to create a field named 'IMPACT' that shows a value of 'Y' if the LOCATION_ACCOUNT_ID field in the subquery is null. How do I do this?
SELECT FIRST_NAME,LAST_NAME,ULTIMATE_PARENT_NAME, IMPACT = IF LOCATION_ACCOUNT_ID IS NULL THEN 'Y' ELSE ''
FROM (SELECT DISTINCT A.FIRST_NAME,
A.LAST_NAME,
B.LOCATION_ACCOUNT_ID,
A.ULTIMATE_PARENT_NAME
FROM ACTIVE_ACCOUNTS A,
QL_ASSETS B
WHERE A.ACCOUNT_ID = B.LOCATION_ACCOUNT_ID(+)

Use CASE instead of IF:
SELECT
FIRST_NAME,
LAST_NAME,
ULTIMATE_PARENT_NAME,
CASE WHEN LOCATION_ACCOUNT_ID IS NULL THEN 'Y' ELSE '' END AS IMPACT
FROM (
SELECT DISTINCT
A.FIRST_NAME,
A.LAST_NAME,
B.LOCATION_ACCOUNT_ID,
A.ULTIMATE_PARENT_NAME
FROM ACTIVE_ACCOUNTS A,
QL_ASSETS B
WHERE A.ACCOUNT_ID = B.LOCATION_ACCOUNT_ID(+)
)
You should also use LEFT JOIN syntax instead of the old (+) syntax (but that's more of a style choice in this case - it does not change the result):
SELECT
FIRST_NAME,
LAST_NAME,
ULTIMATE_PARENT_NAME,
CASE WHEN LOCATION_ACCOUNT_ID IS NULL THEN 'Y' ELSE '' END AS IMPACT
FROM (
SELECT DISTINCT
A.FIRST_NAME,
A.LAST_NAME,
B.LOCATION_ACCOUNT_ID,
A.ULTIMATE_PARENT_NAME
FROM ACTIVE_ACCOUNTS A
LEFT JOIN QL_ASSETS B
ON A.ACCOUNT_ID = B.LOCATION_ACCOUNT_ID
)
In fact, since you aren't using any of the columns from B in your result (only checking for existence) you can just use EXISTS:
SELECT
FIRST_NAME,
LAST_NAME,
ULTIMATE_PARENT_NAME,
CASE WHEN NOT EXISTS(SELECT NULL
FROM QL_ASSETS
WHERE LOCATION_ACCOUNT_ID = A.ACCOUNT_ID)
THEN 'Y'
ELSE ''
END AS IMPACT
FROM ACTIVE_ACCOUNTS A
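As a quick check of the requirement in the question (IMPACT is 'Y' when an account has no matching QL_ASSETS row), here is a sketch using Python's built-in sqlite3 module. The tables are cut down to the columns that matter and the rows are invented. SQLite is used only for convenience: it has no (+) join syntax, and unlike Oracle it does not treat '' as NULL. The sketch compares a LEFT JOIN with CASE against a correlated NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE active_accounts (account_id INTEGER, first_name TEXT);
CREATE TABLE ql_assets (location_account_id INTEGER);
INSERT INTO active_accounts VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO ql_assets VALUES (1), (1);  -- Ann has two assets, Ben has none
""")

# Outer join, then flag the rows where nothing matched
via_left_join = conn.execute("""
SELECT DISTINCT a.first_name,
       CASE WHEN b.location_account_id IS NULL THEN 'Y' ELSE '' END AS impact
FROM active_accounts AS a
LEFT JOIN ql_assets AS b ON a.account_id = b.location_account_id
ORDER BY a.first_name
""").fetchall()

# No join at all: ask directly whether a matching asset is missing
via_not_exists = conn.execute("""
SELECT a.first_name,
       CASE WHEN NOT EXISTS (SELECT 1
                             FROM ql_assets AS b
                             WHERE b.location_account_id = a.account_id)
            THEN 'Y' ELSE '' END AS impact
FROM active_accounts AS a
ORDER BY a.first_name
""").fetchall()

print(via_left_join, via_not_exists)
```

Both forms flag only Ben. The NOT EXISTS form also avoids the DISTINCT, because it never multiplies rows for accounts with several assets.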

Use a case statement:
SELECT FIRST_NAME,
LAST_NAME,
ULTIMATE_PARENT_NAME,
CASE WHEN Location_Account_ID IS NULL THEN 'Y' ELSE '' END AS IMPACT
FROM (
SELECT DISTINCT A.FIRST_NAME,
A.LAST_NAME,
B.LOCATION_ACCOUNT_ID,
A.ULTIMATE_PARENT_NAME
FROM ACTIVE_ACCOUNTS A,
QL_ASSETS B
WHERE A.ACCOUNT_ID = B.LOCATION_ACCOUNT_ID(+)
) a
P.S. I also added an alias for your derived table so you won't get an error for that.

I didn't exactly get what you were asking, based on this part of your question:
(that shows a value of 'Y' if the LOCATION_ACCOUNT_ID field in the
subquery is null)
but I can suggest that you use an expression.
Put this expression in your select list:
NVL(B.LOCATION_ACCOUNT_ID,'Y') IMPACT
Note that NVL returns the LOCATION_ACCOUNT_ID value itself when it is not null, rather than an empty string.

Fetch rows joined that don't match a query

I have an application with interns who can apply to internships.
We have a script that automatically qualifies the interns.
I would like to fetch all interns who aren't qualified to any internship.
The database is like this :
Intern (Id,...)
Application (Id, status, intern_id, internship_id,...)
Internship (Id,...)
For the status, we have 'applied', 'qualified', 'current' and 'done'
Basically I need a join query for Interns who don't have an Application with 'qualified' status, but my SQL skills are pretty basic.
EDIT
I forgot to mention that there can be several applications for an intern, an intern can only be qualified to one internship, but he can still be applying to another, so this won't work
SELECT *
FROM Intern i
INNER JOIN Application a ON i.Id = a.intern_id
WHERE a.status <> 'qualified'

Include the status in a LEFT JOIN, and filter for non-matches like so:
SELECT ....
FROM Intern i
LEFT JOIN
Application a
ON i.Id = a.intern_id
AND a.status = 'qualified'
WHERE a.Id IS NULL
Substitute your list of columns, obviously!
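Here is a small sketch of why the filter has to go in the join condition, using Python's built-in sqlite3 module. The tables follow the layout in the question, but the rows are invented, and the sketch assumes that an intern with no applications at all should also count as "not qualified".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE intern (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE application (
    id INTEGER PRIMARY KEY, status TEXT, intern_id INTEGER, internship_id INTEGER
);
INSERT INTO intern VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
INSERT INTO application VALUES
    (10, 'qualified', 1, 100),  -- Ana is qualified for one internship...
    (11, 'applied',   1, 101),  -- ...and still applying to another
    (12, 'applied',   2, 100);  -- Bo has only applied; Cy has no applications
""")

# Anti-join: keep interns for whom no 'qualified' application was found
not_qualified = conn.execute("""
SELECT i.id
FROM intern AS i
LEFT JOIN application AS a
       ON i.id = a.intern_id
      AND a.status = 'qualified'
WHERE a.id IS NULL
ORDER BY i.id
""").fetchall()

# Filtering joined rows instead: Ana slips through via her 'applied' row,
# and Cy disappears because the inner join drops interns with no applications
naive = conn.execute("""
SELECT DISTINCT i.id
FROM intern AS i
JOIN application AS a ON i.id = a.intern_id
WHERE a.status <> 'qualified'
ORDER BY i.id
""").fetchall()

print(not_qualified, naive)
```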

SELECT *
FROM Intern i
WHERE NOT EXISTS
(SELECT *
FROM Intern j
, Application a
WHERE j.id = a.intern_id
AND i.id = j.id
AND a.status = 'qualified')

SELECT * FROM Intern
WHERE Id IN ( SELECT Intern_id FROM Application WHERE status <> 'qualified' );

This query should return what you're looking for:
SELECT *
FROM Intern i
INNER JOIN Application a ON i.Id = a.intern_id
WHERE a.status <> 'qualified'
Obviously, you could replace the * with just the fields you were looking for. If you're looking for the interns that DIDN'T qualify, I wouldn't think you would need the Internship table. This does assume that there is only one application per intern... if that is not the case, you could use this query:
SELECT *
FROM Intern i
WHERE i.Id IN
(
SELECT intern_id
FROM Application a
WHERE a.status <> 'qualified'
)

SQL, Matching Rows Across Multiple Tables

If I have two tables, A and B which have identical layout of:
Forename
Middlename
Surname
Date of Birth
Table A contains my data, table B contains data I wish to compare to table A.
I'd like to return all matches that are full matches (Forename, Middlename and Surname) as well as partial matches (First initial, surname, dob).
What would be the most efficient way of doing this and being able to distinguish between the two?
My initial thoughts are that I could do this with two passes however there must be a more efficient way as over a large number of records this could be quite inefficient.

You can do this:
select T1.*, T2.*, 'exact-match' as mode
from T1 inner join T2
on T1.fname = T2.fname
and T1.mname = T2.mname
and T1.lname = T2.lname
and t1.dob = T2.dob
UNION
select t1.*, t2.*, 'partial-match' as mode
from T1 inner join T2
on left(T1.fname,1) = LEFT(T2.fname,1)
and T1.lname = T2.lname
and T1.dob = T2.dob
where T1.fname <> T2.fname
The last line is there because otherwise exact matches would also satisfy the partial match test. You can get rid of that where-clause if you like. The second part of the query ignores middle name, and treats "Tim Q Jones" and "Tom X Jones" as a partial-match if they're born on the same day. That's what you asked for, right?

If you really want to avoid two queries, you could do something like this:
SELECT A.*,
CASE WHEN A.Middlename <> B.Middlename THEN 'Partial'
ELSE 'Full'
END AS MatchType
FROM A
JOIN B ON (A.Forename = B.Forename AND
A.Middlename = B.Middlename AND
A.Surname = B.Surname)
OR
(LEFT(A.Forename,1) = LEFT(B.Forename,1) AND
A.Surname = B.Surname AND
A.DoB = B.DoB)
A JOIN with two different sets of JOIN criteria, and a case in the select that identifies which of the sets must have resulted in the joined records (If Middlename doesn't match, it must not have been a "full" match that resulted in the join).

This will do it in a single pass.
The condition for recognizing a full match has to be on both forename and middlename, otherwise it will classify some matches incorrectly.
select A.Forename, A.Middlename, A.Surname, A.DateOfBirth,
Case
when A.ForeName=B.ForeName and A.Middlename = B.middlename then 'full'
Else 'partial'
end as MatchType
from A
inner join B on
-- (Forename, Middlename and Surname)
(A.ForeName=B.ForeName
and A.Middlename = B.middlename
and A.Surname = B.surname)
or
-- (First initial, surname, dob)
(A.ForeName LIKE LEFT(B.ForeName,1)+'%'
and A.Surname = B.surname
and A.DateOfBirth = B.DateOfBirth)
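The single-pass approach can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: the tables `a` and `b`, their column names, and the rows are invented, and `substr(x, 1, 1)` stands in for `LEFT(x, 1)` because SQLite has no LEFT function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (forename TEXT, middlename TEXT, surname TEXT, dob TEXT);
CREATE TABLE b (forename TEXT, middlename TEXT, surname TEXT, dob TEXT);
INSERT INTO a VALUES
    ('Tim', 'Q', 'Jones', '1990-01-01'),
    ('Ann', 'M', 'Lee',   '1985-05-05'),
    ('Sam', 'K', 'Park',  '1970-07-07');
INSERT INTO b VALUES
    ('Tim', 'Q', 'Jones', '1990-01-01'),  -- same names as Tim in a
    ('Tom', 'X', 'Jones', '1990-01-01'),  -- same initial, surname and dob
    ('Ann', 'M', 'Lee',   '1999-09-09'),  -- same names, different dob
    ('Sue', 'K', 'Park',  '1971-01-01');  -- same initial and surname only
""")

matches = conn.execute("""
SELECT a.forename AS a_forename,
       b.forename AS b_forename,
       CASE WHEN a.forename = b.forename AND a.middlename = b.middlename
            THEN 'full' ELSE 'partial' END AS match_type
FROM a
JOIN b
  ON (a.forename = b.forename
      AND a.middlename = b.middlename
      AND a.surname = b.surname)
  OR (substr(a.forename, 1, 1) = substr(b.forename, 1, 1)
      AND a.surname = b.surname
      AND a.dob = b.dob)
ORDER BY a.forename, b.forename
""").fetchall()

for row in matches:
    print(row)
```

Tim matches twice (once fully, once partially against Tom), Ann matches fully even though the dates differ, and Sam/Sue is rejected because the dates of birth differ. An OR in the join condition can make it hard for the optimizer to use indexes, so on large tables it is worth comparing this against the UNION version.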

Select
T1.Forename
, T1.Middlename
, T1.Surname
, T1.[Date of Birth]
, Case When T1.[Forename] = T2.[Forename] and T1.Middlename = T2.Middlename
Then 'Full'
else 'Partial'
end as Match_Type
From Table1 as T1
Inner Join Table2 as T2
on Left(T1.[Forename], 1) = Left(T2.[Forename], 1)
and T1.[Date of Birth] = T2.[Date of Birth]
and T1.Surname = T2.Surname
