Query identifying duplicate records with different values [closed]

Closed 4 years ago.
I have a table in which I have loaded records from 4 different sources. There is no common factor between the records from each source except for FName, LName, Addr1, Addr2, City, State and Zip. Each record is assigned a FileID by me based upon which source file they were loaded from. I need to construct a query in which I can identify which person/household was found to be in all 4 files, 3 files, 2 files, etc. I only need to maintain one record for each duplicate person/household.
The other tricky part is that I have an email address on 2 of the 4 files and an Emailable field coming in on the other two. I also need to take this into account when choosing the single record to keep.
For example, one group will be indicated by: “individuals who are in all of the following lists: DMA , Vehicle Ownership and Lifestyle (Wealth Engine) lists. These individuals must have an email address.” But then another group needs to be identified as “individuals who are in all of the following lists: DMA , Vehicle Ownership and Lifestyle (Wealth Engine) lists. These individuals DO NOT have an email address”
Example Data:
ID                         FirstName  LastName  FullName  Address1   Address2  City     State  Zip    Zip4  EmailAddress           FILE  EMAILABLE
06925901SNDCR44110G6520 S  Nylah      Watson    NULL      1234 Main  NULL      Anytown  ST     10000  2000  NULL                   DMA   Y
1641189779                 Nylah      Watson    NULL      1234 Main            Anytown  ST     10000  2000  nylahwatson#gmail.com  LST
06925901SNDCR44110G6520 S  Nylah      Watson    NULL      1234 Main  NULL      Anytown  ST     10000  2000  NULL                   VEH   Y
374977111                  Nylah      Watson    NULL      1234 Main  NULL      Anytown  ST     10000  2000  nylahwatson#gmail.com  V12   NULL
48770181SBRNT 1345M6352 S  Watson     Nylah     NULL      4321 Main  NULL      HOUSTON  TX     20000  3000  NULL                   DMA   N
48770181SBRNT 1345M6352 S  Watson     Nylah     NULL      4321 Main  NULL      HOUSTON  TX     20000  3000  NULL                   VEH   N
1933990731                 Watson     Nylah     NULL      4321 Main            Houston  TX     20000  3000                         LST

Depending on how many/how flexible the groups you need are, you may want to do something like this:
SELECT Name, Address... -- fields you want to group on (the ones that identify the same person)
     -- File is bracketed because FILE is a reserved keyword in SQL Server
     , MAX(CASE WHEN [File] = 'DMA' THEN 1 ELSE 0 END) AS HasDMAFile
     , MAX(CASE WHEN [File] = 'VEH' THEN 1 ELSE 0 END) AS HasVEHFile -- repeat for your other file types, e.g. HasLSTFile
     , MAX(CASE WHEN EmailAddress IS NOT NULL THEN 1 ELSE 0 END) AS HasEmail
FROM MyTable
GROUP BY Name, Address... -- same list of fields as at the beginning of your SELECT
Then use that result set to create your combination groups, for example:
; WITH CTE AS (Query from Above)
SELECT *
     , CASE WHEN HasDMAFile = 1
             AND HasVEHFile = 1
             AND HasLSTFile = 1
             AND HasEmail = 0
            THEN 'Group 1'
            WHEN HasDMAFile = 1
             AND HasVEHFile = 1
             AND HasLSTFile = 1
             AND HasEmail = 1
            THEN 'Group 2'
            -- etc. for more groups here; make sure your groups are either mutually exclusive
            -- or in a logical order, since the CASE expression is evaluated in the order it's written
       END AS Grp
FROM CTE
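Putting the two steps above together might look like the following. This is only a sketch: MyTable and the group labels are placeholders, the column names are taken from the example data in the question, and Address2 is left out of the grouping on the assumption that it is sometimes NULL and sometimes blank for the same household.

```sql
-- Sketch only: MyTable and the 'Group N' labels are placeholders;
-- column names come from the example data in the question.
WITH Households AS (
    SELECT FirstName, LastName, Address1, City, State, Zip
         , MAX(CASE WHEN [File] = 'DMA' THEN 1 ELSE 0 END) AS HasDMAFile
         , MAX(CASE WHEN [File] = 'VEH' THEN 1 ELSE 0 END) AS HasVEHFile
         , MAX(CASE WHEN [File] = 'LST' THEN 1 ELSE 0 END) AS HasLSTFile
         -- NULLIF turns empty strings into NULL so blank emails do not count
         , MAX(CASE WHEN NULLIF(EmailAddress, '') IS NOT NULL THEN 1 ELSE 0 END) AS HasEmail
    FROM MyTable
    GROUP BY FirstName, LastName, Address1, City, State, Zip
)
SELECT *
     , CASE WHEN HasDMAFile = 1 AND HasVEHFile = 1 AND HasLSTFile = 1 AND HasEmail = 0 THEN 'Group 1'
            WHEN HasDMAFile = 1 AND HasVEHFile = 1 AND HasLSTFile = 1 AND HasEmail = 1 THEN 'Group 2'
       END AS Grp
FROM Households;
```

Households that match neither branch get a NULL Grp, which makes it easy to spot combinations that still need a rule.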

SELECT FName, LName, Addr1, Addr2, City, State, Zip, COUNT(DISTINCT [File]) AS FileCount
FROM <TABLE>
GROUP BY FName, LName, Addr1, Addr2, City, State, Zip
HAVING COUNT(DISTINCT [File]) = 4
The emails could be coalesced, but you need to figure out an order of preference.
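As a rough illustration of coalescing the emails, the sketch below keeps one row per household, reports how many files it appeared in, and picks an email by preference. MyTable is a placeholder, and preferring the LST file over the V12 file is purely an assumption made for the example.

```sql
-- Sketch only: MyTable is a placeholder and the LST-before-V12 preference is an assumption.
SELECT FirstName, LastName, Address1, City, State, Zip
     , COUNT(DISTINCT [File]) AS FileCount      -- 4, 3, 2 or 1 source files
     , COALESCE(
           MAX(CASE WHEN [File] = 'LST' THEN NULLIF(EmailAddress, '') END),
           MAX(CASE WHEN [File] = 'V12' THEN NULLIF(EmailAddress, '') END)
       ) AS PreferredEmail                      -- NULL when no file supplied an email
FROM MyTable
GROUP BY FirstName, LastName, Address1, City, State, Zip;
```

Leaving out the HAVING clause returns every household with its file count, so the 4-file, 3-file and 2-file groups can be filtered from one result set.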

Related

Use auxiliary table to define values for another table's column

I am working with a table that currently uses multiple CASE expressions to define the behavior of one of the columns, i.e.:
SELECT
Employee
,Company
,Department
,Area
,Flag = CASE
WHEN Company = 'Amazon' and Department in ('IT', 'HR')
THEN 0
WHEN Department = 'Legal'
THEN 1
WHEN Area = 'Cloud'
THEN 1
ELSE 0
END
FROM Table1
Which would result in something like the following dummy data:
Employee  Company    Department  Area       Flag
Cindy     Amazon     IT          Support    0
Jack      Amazon     HR          Support    0
Bob       Microsoft  Legal       Contracts  1
Joe       Amazon     Legal       Research   1
Lauren    Google     IT          Cloud      1
Jane      Apple      UX          Research   0
I am trying to simplify the Flag expression by using an auxiliary Mappings table that has the following structure, in order to get the value for the Flag column:
Company  Department  Area   Flag
Amazon   IT          NULL   0
Amazon   HR          NULL   0
NULL     Legal       NULL   1
NULL     NULL        Cloud  1
The NULL values mean the column could take any value. Is it possible to achieve this without falling into multiple CASE statements?
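One possible shape for such a lookup is sketched below, treating NULL in Mappings as a wildcard. It assumes SQL Server (OUTER APPLY, TOP) and a hypothetical Priority column added to Mappings, because a row can match more than one mapping and the original CASE resolves that by the order of its WHEN branches.

```sql
-- Sketch only: Priority is a hypothetical column added to Mappings
-- (lower number = rule checked first, mirroring the order of the WHEN branches).
SELECT t.Employee, t.Company, t.Department, t.Area
     , COALESCE(m.Flag, 0) AS Flag          -- 0 plays the role of the ELSE branch
FROM Table1 AS t
OUTER APPLY (
    SELECT TOP (1) mp.Flag
    FROM Mappings AS mp
    WHERE (mp.Company    IS NULL OR mp.Company    = t.Company)
      AND (mp.Department IS NULL OR mp.Department = t.Department)
      AND (mp.Area       IS NULL OR mp.Area       = t.Area)
    ORDER BY mp.Priority
) AS m;
```

With this layout, adding or reordering rules becomes a data change in Mappings rather than an edit to the query.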

Conditionally mapping column names to row values

Assume that we have a table where we have one field for zip code and the rest are binary fields (1 or NULL) with names corresponding to various places. For example, imagining the table has 201 fields with the first field titled "zip code" containing zip codes and the latter being 200 binary value fields titled with city names: Chicago, New York, Houston, etc.
Assume that row one contains zip code 11373. While one could use coalesce to find the first non-null value and return "New York" another value like "Elmhurst" may also be true.
zip_code new_york chicago elmhurst dover maspeth
10001 1 NULL NULL NULL NULL
07801 NULL NULL NULL 1 NULL
11373 1 NULL 1 NULL 1
The goal is to map the column names to each respective zip code and get an output like so:
zip_code city
10001 new_york
07801 dover
11373 new_york
11373 elmhurst
11373 maspeth
Any help is much appreciated.

This is a great use case for SQL UNPIVOT:
SELECT unpvt.zip_code, unpvt.city
FROM #x
UNPIVOT (v FOR city IN (new_york, chicago, elmhurst, dover, maspeth)) AS unpvt
-- UNPIVOT drops NULL values, so only the flagged cities remain

One method uses union all:
select zip_code, 'New York' as city from t where new_york = 1
union all
select zip_code, 'Chicago' as city from t where chicago = 1
union all
. . .

Using a field to filter a selection on a second field in SQL Server

I have a table ClientContacts, which holds basic information about a pairing of clients. Some of the details held in this table include P1Title, P2Title, P1FirstName, P2FirstName. For each row in this table there may be details of one or two clients, with a CustomerId that represents the pairing. Within this table is also ContactId, which is used to link to the table described below.
A second table, ContactDetails, contains rows that each hold a specific contact detail associated with a client. Each client may have a number of rows in this table, each holding a different detail such as HomeNumber, MobileNumber or Email. The table also has a Type field that represents the type of contact detail held in the row: 1 = home number, 2 = mobile number and 3 = email. There is also a Note field, which may hold either Mr or Mrs, denoting whether the mobile number belongs to Person1 or Person2 in the client pairing.
Here is a visual structure of the tables.
ClientContacts
CustomerId ContactId Person1Title Person1FirstName Person1LastName Person2Title Person2FirstName Person2LastName
1 100 Mr Bob BobLastname Mrs Bobette BobetteLastname
2 101 Mr John JohnLastname Mrs Johnette JohnetteLastname
ContactDetails
ContactId Detail Type Note
100 012345 1
100 077777 2 P1
100 012333 1
100 088888 2 P2
101 099999 1
101 012211 1
101 066666 2
101 email#email.com 3
I want to construct a query that pulls back the information for both clients and works out whether any of the mobile numbers stored in the ContactDetails table belongs to either of them. If one does, I need to be able to tell whether it belongs to Person1 or Person2 in the pairing.
In addition, if the note field is null for a particular mobile number (type = 2), the first mobile number should be used for Person1 and the second should be used for Person2.
Below is my desired output:
Output
CustomerId Person1Firstname Person1Lastname Person2Firstname Person2Lastname Home Person1Mobile Person2Mobile Person2Email
1 Bob BobLastname Bobette BobetteLastname 012211 077777 088888 null
I have a partially working query that manages to extract the mobile numbers and relates them to P1 or P2, however this only works if the Note field is not null.
select
    cc.CustomerId,
    cc.Person1Forename,
    cc.Person1Surname,
    cc.Person2Forename,
    cc.Person2Surname,
    max(case when cd.Type = 3 then cd.Detail end) as 'Home',
    max(case when cd.Type = 4 and cd.Note = cc.P1Title then cd.Detail end) as 'Person1Mobile',
    max(case when cd.Type = 4 and cd.Note = cc.P2Title then cd.Detail end) as 'Person2Mobile',
    max(case when cd.Type = 5 then cd.Detail end) as 'Email'
from ClientContacts cc
join ContactDetails cd on cc.ContactId = cd.ContactId
-- the aggregates need a GROUP BY over the non-aggregated columns
group by
    cc.CustomerId,
    cc.Person1Forename,
    cc.Person1Surname,
    cc.Person2Forename,
    cc.Person2Surname
I'm unsure how to proceed from here. Any help would be appreciated.
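One way the NULL-note fallback could be approached is to number each contact's mobile rows and use that number only when Note is missing. The sketch below uses the column names, type code (2 = mobile) and Note values (P1, P2) from the table listing rather than from the query, and it assumes "first" means lowest Detail value, since the sample shows no insertion-order column. Contacts that mix labelled and unlabelled mobiles would need extra rules.

```sql
-- Sketch only: ordering by Detail is an assumption; swap in a real
-- insertion-order column if the table has one.
WITH Mobiles AS (
    SELECT cd.ContactId, cd.Detail, cd.Note
         , ROW_NUMBER() OVER (PARTITION BY cd.ContactId ORDER BY cd.Detail) AS rn
    FROM ContactDetails AS cd
    WHERE cd.Type = 2                       -- 2 = mobile number
)
SELECT cc.CustomerId
     , MAX(CASE WHEN m.Note = 'P1' OR (m.Note IS NULL AND m.rn = 1) THEN m.Detail END) AS Person1Mobile
     , MAX(CASE WHEN m.Note = 'P2' OR (m.Note IS NULL AND m.rn = 2) THEN m.Detail END) AS Person2Mobile
FROM ClientContacts AS cc
LEFT JOIN Mobiles AS m ON m.ContactId = cc.ContactId
GROUP BY cc.CustomerId;
```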

Trying to find duplication in records where address is different only in the one field and only by a certain number

I have a table of listings that has NAP fields and I wanted to find duplication within it - specifically where everything is the same except the house number (within 2 or 3 digits).
My table looks something like this:
Name Housenumber Streetname Streettype City State Zip
1 36 Smith St Norwalk CT 6851
2 38 Smith St Norwalk CT 6851
3 1 Kennedy Ave Campbell CA 95008
4 4 Kennedy Ave Campbell CA 95008
I was wondering how to set up a query to find records like these.
I've tried a few things but can't figure out how to do it - any help would be appreciated.
Thanks

Are you looking to find something that shows the amount of these rows you have like this?
SELECT
    StreetName,
    City,
    State,
    Zip,
    COUNT(*)
FROM YourTable
GROUP BY StreetName, City, State, Zip
HAVING COUNT(*) > 1
Or maybe trying to find all of the rows that have the same street, city, state, and zip?
SELECT
A.HouseNumber,
A.StreetName,
A.City,
A.State,
A.Zip
FROM YourTable as A
INNER JOIN YourTable as B
ON A.StreetName = B.StreetName
AND A.City = B.City
AND A.State = B.State
AND A.Zip = B.Zip
AND A.HouseNumber <> B.HouseNumber

Here is one way to do it. You'll need a unique ID on the table to run this, as you wouldn't want a row to match itself when it is the only one there. This will just return all the rows that have at least one near-duplicate.
Edit: Whoops, just realized the comments say the street number is a varchar. In that case you could just run a cast on it. The OP never said anything in the original post about house numbers being varchar or containing letters as well as numbers. As for letters in the street number field, I was a third-party shipping provider for 2 years and never saw one, with the exception of an apartment, which would be a different field. It's just as likely that someone used varchar for some other reason (leading 0s), or for no reason. Of course there could be letters, but there is no way of knowing what's in the field without a response from the OP. To cast to int, the query is the same except that each instance becomes: Cast(mt.HouseNumber as int)
select *
from MyTable mt
where exists (select 1
              from MyTable mt2
              where mt.name = mt2.name
                and mt.street = mt2.street
                and mt.state = mt2.state
                and mt.city = mt2.city
                and mt2.HouseNumber between (mt.HouseNumber - 3) and (mt.HouseNumber + 3)
                and mt.UID != mt2.UID
             )
order by mt.state, mt.city, mt.street
;
Not sure how to run the -3/+3 check if there are letters involved, unless you know exactly where they are and can simply cut them out before casting.
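If the house number really is stored as varchar, one hedged option is TRY_CAST, which returns NULL instead of raising an error when a value is not numeric. The sketch assumes SQL Server 2012 or later and keeps the hypothetical UID column and table name from the query above.

```sql
-- Sketch only: assumes SQL Server 2012+ (TRY_CAST) and the hypothetical
-- UID column from the query above.
SELECT mt.*
FROM MyTable AS mt
WHERE EXISTS (
    SELECT 1
    FROM MyTable AS mt2
    WHERE mt.name   = mt2.name
      AND mt.street = mt2.street
      AND mt.city   = mt2.city
      AND mt.state  = mt2.state
      AND mt.UID   <> mt2.UID
      -- TRY_CAST yields NULL for values such as '12A', so those rows never match
      AND ABS(TRY_CAST(mt.HouseNumber AS int) - TRY_CAST(mt2.HouseNumber AS int)) <= 3
)
ORDER BY mt.state, mt.city, mt.street;
```

Rows whose house number cannot be converted are silently skipped, so they would need a separate check.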

statistic syntax in access

I want to compute some statistics for the Point table in my application. These are its columns:
id type city
1 food NewYork
2 food Washington
3 sport NewYork
4 food .....
Each point belongs to a certain type and is located in a certain city.
Now I want to calculate the number of points in each city for each type.
For example, there are two types here: food and sport.
Then I want to know:
how many points of `food` and `sport` at NewYork
how many points of `food` and `sport` at Washington
how many points of `food` and `sport` at Chicago
......
I have tried this:
select type,count(*) as num from point group by type ;
But I cannot work out how to group by city as well.
How can I do it?
Update
id type city
1 food NewYork
2 sport NewYork
3 food Chicago
4 food San
And I want to get something like this:
NewYork Chicago San
food 2 1 1
sport 1 0 0
I will use an HTML table and a chart to display this data.
So I need to do the counting. I could use something like this:
select count(*) from point where type='food' and city ='San'
select count(*) from point where type='food' and city ='NewYork'
....
However, I think this is a bad idea, so I wonder if I can do all the counting in one SQL query.
BTW, for table data like this, how do people organize the structure using JSON?

This is what you want:
SELECT city,
       COUNT(CASE WHEN [type] = 'food' THEN 1 END) AS FoodCount,
       COUNT(CASE WHEN [type] = 'sport' THEN 1 END) AS SportCount
FROM point
GROUP BY city

UPDATE:
To get the results in an aggregated row/column format you need to use a pivot table. In Access it's called a Crosstab query. You can use the Crosstab query wizard to generate the query via a nice UI or cut straight to the SQL:
TRANSFORM COUNT(id) AS CountOfId
SELECT type
FROM point
GROUP BY type
PIVOT city
The grouping is used to count the number of ids for each type. The additional PIVOT clause groups the data by city and displays each grouping in a separate column. The end result looks something like this:
NewYork Chicago San
food 2 1 1
sport 1 0 0
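If the column order matters for the HTML table, Access lets the PIVOT clause take an IN list of fixed column headings. The sketch below is the same crosstab with the three sample cities listed explicitly; the city list is an assumption based on the sample data and would need to be maintained by hand.

```sql
TRANSFORM COUNT(id) AS CountOfId
SELECT type
FROM point
GROUP BY type
PIVOT city IN ('NewYork', 'Chicago', 'San')
```

Cities not named in the IN list are left out of the result, and a listed city with no rows still gets a column.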
