Pick exact records - sql

INSERT INTO filerecord (fname,
lname,
transdate,
memberid)
VALUES ('tyler',
'smith',
TO_DATE ('07/01/2016', 'mm/dd/yyyy'),
'111');
I get the following result:
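+----------+-----------+----------------+
| fname    | lname     | email          |
+----------+-----------+----------------+
| fernando | hernandez | fh#yahoo.com   |
| fernando | hernandez | ts#hotmail.com |
+----------+-----------+----------------+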
It should not display Tyler Smith's email (ts#hotmail.com) because of the transdate condition. I want to get only the Fernando result.
I cannot change the structure of the tables.

You are getting a partial Cartesian product.
You have two rows with memberid = '111' in each table, so this condition
AND b.memberid = e.memberid
only filters you down to 4 rows (2 x 2):
one with Tyler Smith joined to Tyler Smith's email,
one with Fernando Hernandez joined to Fernando Hernandez's email,
one with Tyler Smith joined to Fernando Hernandez's email, and
one with Fernando Hernandez joined to Tyler Smith's email.
Then, this condition:
AND b.transdate >= TO_DATE ('07/02/2016', 'mm/dd/yyyy');
excludes two of those but not the other two.
You need a better data model and/or a more exact way to join your two tables.
In general, any time you join two tables, your join and WHERE conditions should pin down all of the primary key columns of at least one of the tables to avoid this type of error. (There are exceptions to that, as always.)

It is not. I just created a new table and checked.
Table member
create table member (programid varchar(10), memberid int, fname varchar(10), lname varchar(10), email varchar(25))
Content
+-----------+----------+----------+-----------+----------------+
| PROGRAMID | MEMBERID | FNAME | LNAME | EMAIL |
+-----------+----------+----------+-----------+----------------+
| BLUE | 111 | tyler | smith | ts#hotmail.com |
| GREEN | 111 | fernando | hernandez | fh#yahoo.com |
+-----------+----------+----------+-----------+----------------+
Table filerecord
create table filerecord (fname varchar(10), lname varchar(10), transdate date, memberid int);
Content
+-------+-------+-----------+----------+
| FNAME | LNAME | TRANSDATE | MEMBERID |
+-------+-------+-----------+----------+
| tyler | smith | 7/1/2016 | 111 |
+-------+-------+-----------+----------+
Last Query
SELECT b.fname, b.lname, e.email
FROM filerecord b, member e
WHERE b.memberid IN ('111')
AND b.memberid = e.memberid
AND b.transdate >= TO_DATE ('07/02/2016', 'mm/dd/yyyy');
Result:
no rows selected.

Since memberid is not a unique identifier, there is no reason to expect good results from a join condition like b.memberid = e.memberid.
Instead, you seem to suggest (or believe) that (memberid, fname, lname) should be a unique identifier. So, use it in the join condition:
... WHERE ... AND b.memberid = e.memberid
AND b.fname = e.fname
AND b.lname = e.lname
...
Like so:
SELECT b.fname, b.lname, e.email
FROM filerecord b, member e
WHERE b.memberid IN ('111')
AND b.memberid = e.memberid
AND b.fname = e.fname
AND b.lname = e.lname
AND b.transdate >= TO_DATE ('07/02/2016', 'mm/dd/yyyy');
FNAME                LNAME                EMAIL
-------------------- -------------------- --------------------
fernando             hernandez            fh#yahoo.com
1 row selected.
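A minimal sketch of the fan-out and the fix side by side, using Python's built-in sqlite3 module so it runs without an Oracle instance. Assumptions: dates are stored as ISO-8601 text rather than Oracle DATE values, and Fernando's filerecord row (dated 2016-07-05) is hypothetical, added so the date filter keeps exactly one filerecord row. Joining on memberid alone pairs that row with both member rows; adding fname and lname to the join leaves one.

```python
import sqlite3


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE member (programid TEXT, memberid INTEGER,
                             fname TEXT, lname TEXT, email TEXT);
        CREATE TABLE filerecord (fname TEXT, lname TEXT,
                                 transdate TEXT, memberid INTEGER);
        INSERT INTO member VALUES
            ('BLUE',  111, 'tyler',    'smith',     'ts#hotmail.com'),
            ('GREEN', 111, 'fernando', 'hernandez', 'fh#yahoo.com');
        -- Hypothetical: Fernando's filerecord row is dated after the cutoff.
        INSERT INTO filerecord VALUES
            ('tyler',    'smith',     '2016-07-01', 111),
            ('fernando', 'hernandez', '2016-07-05', 111);
    """)
    # Join on the non-unique memberid only: each filerecord row fans out
    # to both member rows, so Fernando also picks up Tyler's email.
    loose = conn.execute("""
        SELECT b.fname, b.lname, e.email
        FROM filerecord b JOIN member e ON b.memberid = e.memberid
        WHERE b.transdate >= '2016-07-02'
        ORDER BY e.email
    """).fetchall()
    # Join on (memberid, fname, lname): each filerecord row matches one member.
    exact = conn.execute("""
        SELECT b.fname, b.lname, e.email
        FROM filerecord b JOIN member e
          ON b.memberid = e.memberid
         AND b.fname = e.fname
         AND b.lname = e.lname
        WHERE b.transdate >= '2016-07-02'
    """).fetchall()
    conn.close()
    return loose, exact


loose, exact = run_demo()
print(loose)  # Fernando twice, once with each email
print(exact)  # Fernando once, with fh#yahoo.com
```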

Related

SQL Lookup function

I have one table with employee info and another with manager assignments.
The manager table references the employee table by employee ID, both for the employee and for the manager.
I have been able to left join the tables, but I want to display the manager's name rather than just the employee ID.
SELECT CONCAT_WS(' ',[FirstName],[LastName]) AS FullName,
[EmployeeID]
,[Status]
,[LastName]
,[FirstName]
,[ManagersTbl].[ManagerIDf]
FROM [EmployeesTbl]
LEFT JOIN [ManagersTbl] on [EmployeesTbl].[EmployeeID] = [ManagersTbl].[EmployeeIDf]
WHERE Status = 'A'
Manager Table
+-------------+------------+
| EmployeeIDf | ManagerIDf |
+-------------+------------+
| 001T        | 005C       |
| 002J        | 005C       |
+-------------+------------+
Employee Table
+------------+-----------+----------+--------+
| EmployeeID | FirstName | LastName | Status |
+------------+-----------+----------+--------+
| 001T       | Tom       | Spanks   | A      |
| 002J       | John      | Doe      | A      |
| 005C       | Cruisin   | Bruisin  | A      |
+------------+-----------+----------+--------+
End Result needed
+------------+------------+-----------------+
| EmployeeID | StaffName  | ManagerName     |
+------------+------------+-----------------+
| 001T       | Tom Spanks | Cruisin Bruisin |
| 002J       | John Doe   | Cruisin Bruisin |
+------------+------------+-----------------+
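Please try this:
SELECT
A.[EmployeeID]
,CONCAT_WS(' ',A.[FirstName],A.[LastName]) AS StaffName
,CONCAT_WS(' ',B.[FirstName],B.[LastName]) AS ManagerName
FROM [EmployeesTbl] A
-- Use an INNER JOIN here instead to omit employees with no manager row (e.g. 005C).
LEFT JOIN [ManagersTbl] M ON A.[EmployeeID] = M.[EmployeeIDf]
LEFT JOIN [EmployeesTbl] B ON M.[ManagerIDf] = B.[EmployeeID]
-- Qualify Status and EmployeeID: both A and B expose them, so unqualified names are ambiguous.
WHERE A.[Status] = 'A'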
There are numerous routes you can take. One possible solution is to join your two tables to get StaffName and then use an inline correlated subquery to get ManagerName:
select e.EmployeeId,
concat_ws(' ',e.FirstName,e.LastName) StaffName,
(select concat_ws(' ',em.FirstName,em.LastName) from EmployeesTbl em where em.EmployeeID=m.ManagerIDf) ManagerName
from ManagersTbl m join EmployeesTbl e on e.EmployeeID=m.EmployeeIDf
where e.status='A'
You can join the two tables on the primary key EmployeeID and build the names by concatenation,
using the CONCAT function or the + operator.
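A small self-join sketch using Python's sqlite3 module and the sample rows above. SQLite's `||` operator stands in for CONCAT_WS here. The join to ManagersTbl is an inner join in this sketch, so 005C (who has no manager row) drops out, matching the End Result; a LEFT JOIN would keep that employee with a NULL ManagerName.

```python
import sqlite3


def staff_with_managers():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EmployeesTbl (EmployeeID TEXT PRIMARY KEY, FirstName TEXT,
                                   LastName TEXT, Status TEXT);
        CREATE TABLE ManagersTbl (EmployeeIDf TEXT, ManagerIDf TEXT);
        INSERT INTO EmployeesTbl VALUES
            ('001T', 'Tom', 'Spanks', 'A'),
            ('002J', 'John', 'Doe', 'A'),
            ('005C', 'Cruisin', 'Bruisin', 'A');
        INSERT INTO ManagersTbl VALUES ('001T', '005C'), ('002J', '005C');
    """)
    rows = conn.execute("""
        SELECT A.EmployeeID,
               A.FirstName || ' ' || A.LastName AS StaffName,   -- employee copy
               B.FirstName || ' ' || B.LastName AS ManagerName  -- manager copy
        FROM EmployeesTbl A
        JOIN ManagersTbl M ON A.EmployeeID = M.EmployeeIDf
        LEFT JOIN EmployeesTbl B ON M.ManagerIDf = B.EmployeeID
        WHERE A.Status = 'A'
        ORDER BY A.EmployeeID
    """).fetchall()
    conn.close()
    return rows


for row in staff_with_managers():
    print(row)
```

The key idea is that EmployeesTbl appears twice under different aliases: once for the staff member and once for the manager.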

SQL Query Find Exact and Near Dupes

I have a SQL table with FirstName, LastName, Add1 and other fields. I am working to get this data cleaned up. There are a few kinds of likely dupes:
All 3 columns are the exact same for more than 1 record
The First and Last are the same, only 1 has an address, the other is blank
The First and Last are similar (John | Doe vs John C. | Doe) and the address is the same or one is blank
I'm wanting to generate a query I can provide to the users, so they can check these records out, compare their related records and then delete the one they don't need.
I've been looking at similarity functions, soundex, and such, but it all seems so complicated. Is there an easy way to do this?
Thanks!
Edit:
So here is some sample data:
FirstName | LastName | Add1
John | Doe | 1 Main St
John | Doe |
John A. | Doe |
Jane | Doe | 2 Union Ave
Jane B. | Doe | 2 Union Ave
Alex | Smith | 3 Broad St
Chris | Anderson | 4 South Blvd
Chris | Anderson | 4 South Blvd
I really like Critical Error's query for identifying all different types of dupes. That would give me the above sample data, with the Alex Smith result not included, because there are no dupes for that.
What I want to do is take that result set and identify which are dupes for Jane Doe. She should only have 2 dupes. John Doe has 3, and Chris Anderson has 2. Can I get at that sub-result set?
Edit:
I figured it out! I will be marking Critical Error's answer as the solution, since it totally got me where I needed to go. Here is the solution, in case it might help others. Basically, this is what we are doing.
Selecting the records from the table where there are dupes
Adding a WHERE EXISTS sub-query to look in the same table for exact dupes, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for similar dupes, using a Difference factor between duplicative columns, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for dupes on 2 fields where a 3rd may be null for one of the records, where the ID from the main query and sub-query do not match
Each subquery is connected with an OR, so that any kind of duplicate is found
At the end of each sub-query add a nested requirement that either the main query or sub-query be the ID of the record you are looking to identify duplicates for.
DECLARE @CID AS INT;
SET ANSI_NULLS ON;
SET NOCOUNT ON;
SET @CID = 12345; -- ID of the customer whose duplicates you want
BEGIN
SELECT
*
FROM #Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = @CID OR c.ID = @CID)
)
-- First/Last names are the same or similar and the address is the same.
OR EXISTS (
SELECT * FROM #Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = @CID OR c.ID = @CID)
)
-- First/Last names match and only one of the two records has an address.
OR EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
AND (x.ID = @CID OR c.ID = @CID)
);
END
Assuming each record has a unique ID, you can give this a try:
DECLARE @Customers table ( FirstName varchar(50), LastName varchar(50), Add1 varchar(50), Id int IDENTITY(1,1) );
INSERT INTO @Customers ( FirstName, LastName, Add1 ) VALUES
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', NULL ),
( 'John C.', 'Doe', '123 Anywhere Ln' ),
( 'John C.', 'Doe', '15673 SW Liar Dr' );
SELECT
*
FROM @Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM @Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- First/Last names are the same or similar and the address is the same.
OR EXISTS (
SELECT * FROM @Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- First/Last names match and only one of the two records has an address.
OR EXISTS (
SELECT * FROM @Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
);
Returns
+-----------+----------+-----------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+-----------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
+-----------+----------+-----------------+----+
Initial resultset:
+-----------+----------+------------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+------------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
| John C. | Doe | 15673 SW Liar Dr | 5 |
+-----------+----------+------------------+----+
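The "duplicates for one customer ID" idea can be tried portably with Python's sqlite3 module on the question's sample data. DIFFERENCE() is a SQL Server function with no direct SQLite equivalent, so this sketch swaps in a much cruder similarity test (same first word of FirstName, so "Jane B." matches "Jane"); that substitution is an assumption, not the author's method. Blank addresses are normalized with NULLIF(TRIM(Add1), ''), since the sample data shows blanks rather than NULLs.

```python
import sqlite3

# Sample data from the question; '' stands for a blank address.
SAMPLE = [
    (1, "John", "Doe", "1 Main St"),
    (2, "John", "Doe", ""),
    (3, "John A.", "Doe", ""),
    (4, "Jane", "Doe", "2 Union Ave"),
    (5, "Jane B.", "Doe", "2 Union Ave"),
    (6, "Alex", "Smith", "3 Broad St"),
    (7, "Chris", "Anderson", "4 South Blvd"),
    (8, "Chris", "Anderson", "4 South Blvd"),
]

DUPES_SQL = """
WITH c AS (
    SELECT Id, LastName,
           NULLIF(TRIM(Add1), '') AS Add1,          -- treat blank as NULL
           CASE WHEN instr(FirstName, ' ') > 0      -- first word of FirstName
                THEN substr(FirstName, 1, instr(FirstName, ' ') - 1)
                ELSE FirstName END AS FirstWord
    FROM Customers)
SELECT a.Id
FROM c AS a
WHERE EXISTS (
    SELECT 1 FROM c AS x
    WHERE x.Id <> a.Id
      AND (x.Id = :cid OR a.Id = :cid)              -- pair must involve the target
      AND x.LastName = a.LastName
      AND x.FirstWord = a.FirstWord
      AND (x.Add1 = a.Add1                          -- same address, or
           OR (x.Add1 IS NULL) <> (a.Add1 IS NULL)) -- exactly one is blank
)
ORDER BY a.Id
"""


def dupes_for(cid):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers "
                 "(Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Add1 TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", SAMPLE)
    ids = [row[0] for row in conn.execute(DUPES_SQL, {"cid": cid})]
    conn.close()
    return ids


print(dupes_for(4))  # Jane Doe and her near-duplicate
```

With this data, Jane Doe (ID 4) yields two rows, John Doe (ID 1) three, and Chris Anderson (ID 7) two, which lines up with the counts in the question.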

Multi-Pass Duplication Identification with Exclusions

I have a customer table with several hundred thousand records. There are a LOT of duplicates of varying degrees. I am trying to identify duplicate records with level of possibility of being a duplicate.
My source table has 7 fields and looks like this:
I look for duplicates, and put them into an intermediate table with the level of possibility, table name, and the customer number.
Intermediate Table
CREATE TABLE DataCheck (
id int identity(1,1),
reason varchar(100) DEFAULT NULL,
tableName varchar(100) DEFAULT NULL,
tableID varchar(100) DEFAULT NULL
)
Here is my code to identify and insert:
-- Match on Company, Contact, Address, City, and Phone
-- DUPE
INSERT INTO DataCheck
SELECT 'Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
fname,
lname,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, fname, lname, add1, city, phone1
HAVING COUNT(*) > 1) dl
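ON dl.company = tcd.company
AND dl.fname = tcd.fname
AND dl.lname = tcd.lname
AND dl.add1 = tcd.add1
AND dl.city = tcd.city
AND dl.phone1 = tcd.phone1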
ORDER BY tcd.company
In this example, it would insert IDs 101 and 102.
The problem is when I perform the next pass:
-- Match on Company, Address, City, Phone (Diff Contacts)
-- LIKELY DUPE
INSERT INTO DataCheck
SELECT 'Likely Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, add1, city, phone1
HAVING COUNT(*) > 1) dl
ON dl.company = tcd.company
AND dl.add1 = tcd.add1
AND dl.city = tcd.city
AND dl.phone1 = tcd.phone1
ORDER BY tcd.company
This pass would then insert, 101, 102 & 103.
The next pass drops the phone so it would insert 101, 102, 103, 104
The next pass would look for company only which would insert all 5.
I now have 14 entries in my intermediate table for 5 records.
How can I add an exclusion so that the 2nd pass groups on the same Company, Address, City, and Phone but a DIFFERENT fname and lname? Then it should insert only 101 and 103.
I considered adding a NOT IN (SELECT tableID FROM DataCheck) to ensure IDs aren't added multiple times, but on the 3rd or 4th pass it may find a duplicate and enter it 700 records after the row it duplicates, so you lose the context of which record it is a dupe of.
My output uses:
SELECT
dc.reason,
dc.tableName,
tcd.*
FROM DataCheck dc
INNER JOIN #tmpCoreData tcd
ON tcd.uid = dc.tableID
ORDER BY dc.id
And looks something like this, which is a bit confusing:
I'm going to challenge your framing of the issue and instead propose that you calculate a simple "confidence score", which will also vastly simplify your results table:
WITH FirstCompany AS (SELECT custNo, company, fname, lname, add1, city, phone1
FROM(SELECT custNo, company, fname, lname, add1, city, phone1,
ROW_NUMBER() OVER(PARTITION BY company ORDER BY custNo) AS ordering
FROM CoreData) FC
WHERE ordering = 1)
SELECT RankMapping.description, Duplicate.custNo, Duplicate.company, Duplicate.fname, Duplicate.lname, Duplicate.add1, Duplicate.city, Duplicate.phone1
FROM (SELECT FirstCompany.custNo AS originalCustNo, Duplicate.*,
CASE WHEN FirstCompany.custNo = Duplicate.custNo THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.fname = Duplicate.fname AND FirstCompany.lname = Duplicate.lname THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.add1 = Duplicate.add1 AND FirstCompany.city = Duplicate.city THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.phone1 = Duplicate.phone1 THEN 1 ELSE 0 END
AS ranking
FROM FirstCompany
JOIN CoreData Duplicate
ON Duplicate.custNo >= FirstCompany.custNo
AND Duplicate.company = FirstCompany.company) Duplicate
JOIN (VALUES (4, 'original'),
(3, 'duplicate'),
(2, 'likely dupe'),
(1, 'possible dupe'),
(0, 'not likely dupe')) RankMapping(score, description)
ON RankMapping.score = Duplicate.ranking
ORDER BY Duplicate.originalCustNo, Duplicate.ranking DESC
... which generates results that look like this:
| description | custNo | company | fname | lname | add1 | city | phone1 |
|-----------------|--------|----------|---------|--------|--------------|--------------|------------|
| original | 101 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| duplicate | 102 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| likely dupe | 103 | ACME INC | JANE | SMITH | 123 ACME ST | LOONEY HILLS | 1231234567 |
| possible dupe | 104 | ACME INC | BOB | DOLE | 123 ACME ST | LOONEY HILLS | 4564567890 |
| not likely dupe | 105 | ACME INC | JESSICA | RABBIT | 456 ROGER LN | WARNER | 4564567890 |
This code baselessly assumes that the smallest custNo is the "original" and compares every other row only against that one, but it's entirely possible to get other matches as well (just unnest the subquery in the CTE and remove the row number).
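The confidence-score query can be tried without SQL Server using Python's sqlite3 module (window functions such as ROW_NUMBER need SQLite 3.25 or later). To stay portable, this sketch maps scores to labels with a CASE expression instead of the T-SQL VALUES join; the scoring itself keeps the answer's CASE WHEN ... THEN 1 ELSE 0 terms, which also treat NULL comparisons as 0. The rows are the ones from the result table above.

```python
import sqlite3

# Rows copied from the result table above.
CORE_DATA = [
    (101, "ACME INC", "JOHN", "DOE", "123 ACME ST", "LOONEY HILLS", "1231234567"),
    (102, "ACME INC", "JOHN", "DOE", "123 ACME ST", "LOONEY HILLS", "1231234567"),
    (103, "ACME INC", "JANE", "SMITH", "123 ACME ST", "LOONEY HILLS", "1231234567"),
    (104, "ACME INC", "BOB", "DOLE", "123 ACME ST", "LOONEY HILLS", "4564567890"),
    (105, "ACME INC", "JESSICA", "RABBIT", "456 ROGER LN", "WARNER", "4564567890"),
]

SCORE_SQL = """
WITH FirstCompany AS (              -- lowest custNo per company is the "original"
    SELECT custNo, company, fname, lname, add1, city, phone1
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (PARTITION BY company ORDER BY custNo) AS ordering
          FROM CoreData c)
    WHERE ordering = 1
),
Scored AS (                         -- one point per matching attribute group
    SELECT f.custNo AS originalCustNo, d.custNo,
           CASE WHEN f.custNo = d.custNo THEN 1 ELSE 0 END
         + CASE WHEN f.fname = d.fname AND f.lname = d.lname THEN 1 ELSE 0 END
         + CASE WHEN f.add1 = d.add1 AND f.city = d.city THEN 1 ELSE 0 END
         + CASE WHEN f.phone1 = d.phone1 THEN 1 ELSE 0 END AS ranking
    FROM FirstCompany f
    JOIN CoreData d ON d.custNo >= f.custNo AND d.company = f.company
)
SELECT CASE ranking
           WHEN 4 THEN 'original'
           WHEN 3 THEN 'duplicate'
           WHEN 2 THEN 'likely dupe'
           WHEN 1 THEN 'possible dupe'
           ELSE 'not likely dupe'
       END AS description,
       custNo
FROM Scored
ORDER BY originalCustNo, ranking DESC, custNo
"""


def score_duplicates():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CoreData (custNo INTEGER, company TEXT, fname TEXT, "
                 "lname TEXT, add1 TEXT, city TEXT, phone1 TEXT)")
    conn.executemany("INSERT INTO CoreData VALUES (?, ?, ?, ?, ?, ?, ?)", CORE_DATA)
    rows = conn.execute(SCORE_SQL).fetchall()
    conn.close()
    return rows


for description, cust_no in score_duplicates():
    print(description, cust_no)
```

Because every record gets exactly one score, each ID lands in the output once, which sidesteps the 14-entries-for-5-records problem from the question.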

TSQL Delete Duplicates in table after comparing results found in duplicate search

I have duplicate data in a single table.
Table Layout
accountNumber | firstName | lastName | address | zip
SMI2365894511 | Paul | Smith | 1245 Rd | 89120
SMI2365894511 | Paul | Smith | |
I have the below query to find and display the duplicates.
select *
from tableA a
join (select accountNumber
from tableA
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
What I would like to do is compare the results of the above query and remove the duplicate that doesn't have any address information. I'm using MS SQL Server 2014.
EDIT: I wrote the query this way so that I can see both duplicate rows.
delete a
from XmaCustomerDetails a
join ( select accountNumber
from XmaCustomerDetails
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
WHERE address is null
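The DELETE above only catches NULL addresses (not blank strings), and if every copy of an account lacks an address it removes all of them. A minimal sketch of a guarded variant, run here through Python's sqlite3 module: it deletes a blank-or-NULL-address row only when another row for the same accountNumber has an address. The single-row account 'DOE0000000001' is hypothetical, added to show that it survives. On SQL Server 2014, TRIM is not available; LTRIM(RTRIM(address)) is the usual substitute.

```python
import sqlite3


def delete_blank_duplicates():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE XmaCustomerDetails (accountNumber TEXT, firstName TEXT,
                                         lastName TEXT, address TEXT, zip TEXT);
        INSERT INTO XmaCustomerDetails VALUES
            ('SMI2365894511', 'Paul', 'Smith', '1245 Rd', '89120'),
            ('SMI2365894511', 'Paul', 'Smith', '', ''),
            -- Hypothetical single-row account with no address: must survive.
            ('DOE0000000001', 'John', 'Doe', NULL, NULL);
    """)
    conn.execute("""
        DELETE FROM XmaCustomerDetails
        WHERE NULLIF(TRIM(address), '') IS NULL         -- blank or NULL address
          AND EXISTS (SELECT 1                          -- another copy has one
                      FROM XmaCustomerDetails AS k
                      WHERE k.accountNumber = XmaCustomerDetails.accountNumber
                        AND NULLIF(TRIM(k.address), '') IS NOT NULL)
    """)
    rows = conn.execute("SELECT accountNumber, address FROM XmaCustomerDetails "
                        "ORDER BY accountNumber").fetchall()
    conn.close()
    return rows


print(delete_blank_duplicates())
```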

Need Solution. 2 row into 1 row sql server 2008

These are my two normalized tables; the Telephone table references Client through ClientID:
CREATE TABLE Client
(ClientID int Primary key,
FName varchar(25),
LName varchar(25),
HomeAddress varchar(50))
CREATE TABLE Telephone
(TelephoneID tinyint IDENTITY(1,1) Primary key,
TelephoneNo int,
ClientID int foreign key references Client(ClientID))
So the values in my Client table:
ClientID | FName | LName | HomeAddress
1 marvin Biu p.guevarra st.
2 harry sendon cali st.
and in my Telephone table:
TelephoneID | TelephoneNo | ClientID
1 1234567 1
2 7654321 1
3 2222222 2
Since a client can have multiple telephone numbers,
I would like the result to look like this:
ClientID | FName | LName | HomeAddress | Telephones
1 marvin Biu p.guevarra st. 1234567, 7654321
2 harry sendon cali st. 2222222
The only code I came up with is this:
select distinct lname, CAST(telephoneno AS VARCHAR(10)) + ',' + CAST(telephoneno AS VARCHAR(10)) as Telephones
from telephone
left join client
on client.clientid = telephone.clientid
It ended up like this:
LName | Telephones
Biu 1234567, 1234567
Biu 7654321, 7654321
sendon 2222222
Can anyone please help? It's OK if the result ends up in a simple form like the one shown above.
I really want the 1234567 and 7654321 telephone numbers joined with a comma in between in the Telephones column, next to a single LName "Biu", so that they make 1 row.
SELECT
lname,
Telephones = STUFF((SELECT ','+ CAST(t.telephoneno AS VARCHAR(10))
FROM telephone t
WHERE t.clientid = c.clientid
For XML PATH('')
),1,1,'')
FROM client c
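Always qualify your columns with table aliases in a join; it makes it much clearer what's going on:
select distinct c.lname, CAST(t.telephoneno AS VARCHAR(10)) + ',' + CAST(t.telephoneno AS VARCHAR(10)) as Telephones
from telephone t
left join client c
on c.clientid = t.clientid
Qualified this way, it is clear that both halves of the concatenation come from the same row's t.telephoneno, which is why each number appears twice.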
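The STUFF/FOR XML PATH answer is the usual approach on SQL Server 2008; SQL Server 2017 and later also provide STRING_AGG. The same correlated-subquery shape can be tried in SQLite through Python's sqlite3 module, where group_concat plays the role of the string aggregate. This is a sketch on the question's sample rows; note that group_concat here does not guarantee the order of the numbers within each list.

```python
import sqlite3


def client_phones():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Client (ClientID INTEGER PRIMARY KEY, FName TEXT,
                             LName TEXT, HomeAddress TEXT);
        CREATE TABLE Telephone (TelephoneID INTEGER PRIMARY KEY,
                                TelephoneNo INTEGER,
                                ClientID INTEGER REFERENCES Client(ClientID));
        INSERT INTO Client VALUES (1, 'marvin', 'Biu', 'p.guevarra st.'),
                                  (2, 'harry', 'sendon', 'cali st.');
        INSERT INTO Telephone VALUES (1, 1234567, 1), (2, 7654321, 1),
                                     (3, 2222222, 2);
    """)
    # One row per client; the subquery collapses that client's phone rows
    # into a single comma-separated string.
    rows = conn.execute("""
        SELECT c.ClientID, c.LName,
               (SELECT group_concat(CAST(t.TelephoneNo AS TEXT), ', ')
                FROM Telephone t
                WHERE t.ClientID = c.ClientID) AS Telephones
        FROM Client c
        ORDER BY c.ClientID
    """).fetchall()
    conn.close()
    return rows


rows = client_phones()
for row in rows:
    print(row)
```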