SQL Query, GROUP/COUNT issue with INNER JOIN

I've got a data set composed primarily of dates, IDs, and addresses, that looks a bit like this:
datadate id address
20150801 Bob 123
20150801 Bob 123
20150801 Dan 345
20150801 Dan 456
20150801 Dan 567
20150801 George 234
20150801 Jim 123
20150801 Jim 123
20150801 John 678
20150801 John 123
20150802 Tom 123
20150802 Tom 234
20150802 Tom 345
My goal is to write a query which identifies any IDs which are associated with multiple distinct addresses for a specific date (or date range). I want the query results to give me the name and distinct addresses. So, for this data set, the results I'd like to see would look like this, for date 8/1/2015:
datadate id address
20150801 Dan 345
20150801 Dan 456
20150801 Dan 567
20150801 John 678
20150801 John 123
The query I've worked up so far is this, but it's not really working for me:
SELECT a.[datadate], a.[id], a.[address], b.[count1]
FROM table1 AS a INNER JOIN (SELECT [id], COUNT([address]) as [count1] FROM table1 GROUP BY [id] having count1 > 1 ) AS b ON a.[id]=b.[id]
WHERE a.[datadate] = '20150801'
ORDER BY a.[id], a.[address];
Any suggestions?

Just modifying your existing query a little bit: change your HAVING to use count(DISTINCT address), then join back to the table to get your address values, like this:
SELECT DISTINCT t.datadate
,t.id
,t1.address
FROM (
SELECT datadate
,id
,count(DISTINCT address) AS address_count
FROM test
WHERE datadate = '20150801'
GROUP BY datadate,id
HAVING count(DISTINCT address) > 1
) t
INNER JOIN test t1 ON t.datadate = t1.datadate
AND t.id = t1.id;
I tested this on SQL Server, but it should be similar in MS Access as well.

Edit
I just read your question again, and it appears you want all duplicates. In that case, I would use EXISTS to check whether another row with the same id but a different address exists:
select * from mytable t1
where datadate = '20150801'
and exists (
select 1 from mytable t2
where t2.id = t1.id
and t2.address <> t1.address
and t2.datadate = t1.datadate
)
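To check the two approaches above against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name table1 comes from the question. The in-memory SQLite database is an assumption made for illustration (the answers target SQL Server and MS Access), and DISTINCT is added to both queries so that repeated source rows collapse to one row per address.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (datadate TEXT, id TEXT, address TEXT)")
sample = [
    ("20150801", "Bob", "123"), ("20150801", "Bob", "123"),
    ("20150801", "Dan", "345"), ("20150801", "Dan", "456"),
    ("20150801", "Dan", "567"), ("20150801", "George", "234"),
    ("20150801", "Jim", "123"), ("20150801", "Jim", "123"),
    ("20150801", "John", "678"), ("20150801", "John", "123"),
    ("20150802", "Tom", "123"), ("20150802", "Tom", "234"),
    ("20150802", "Tom", "345"),
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", sample)

# Approach 1: find (datadate, id) pairs with more than one distinct
# address, then join back to the table to list those addresses.
having_sql = """
SELECT DISTINCT t1.datadate, t1.id, t1.address
FROM (
    SELECT datadate, id
    FROM table1
    WHERE datadate = '20150801'
    GROUP BY datadate, id
    HAVING COUNT(DISTINCT address) > 1
) AS t
INNER JOIN table1 AS t1
    ON t1.datadate = t.datadate AND t1.id = t.id
ORDER BY t1.id, t1.address
"""

# Approach 2: keep a row only if another row exists with the same id
# and date but a different address.
exists_sql = """
SELECT DISTINCT t1.datadate, t1.id, t1.address
FROM table1 AS t1
WHERE t1.datadate = '20150801'
  AND EXISTS (
      SELECT 1
      FROM table1 AS t2
      WHERE t2.id = t1.id
        AND t2.datadate = t1.datadate
        AND t2.address <> t1.address
  )
ORDER BY t1.id, t1.address
"""

having_rows = conn.execute(having_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
for row in having_rows:
    print(row)
```

On this sample both queries return only the Dan and John rows. Bob and Jim are excluded because their repeated rows all share one address.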

Related

Changing record values based on whether there are duplicates when two tables are combined

I know I can combine Table #1 and Table #2 with a UNION and then filter out duplicate IDs using DISTINCT. However, for the duplicate contacts I'd like to change DrinkPreference to Coke/Pepsi.
Is this possible?
Starting Table #1
Id FirstName LastName DrinkPreference
123 Tom Bannon Pepsi
124 Sarah Smith Pepsi
Starting Table #2
id FirstName LastName DrinkPreference
125 Jim Henry Coke
123 Tom Bannon Coke
Table? #3 - combined with DrinkPreference set to Coke/Pepsi where contact exists in both tables?
Id FirstName LastName DrinkPreference
125 Jim Henry Coke
123 Tom Bannon Coke/Pepsi
124 Sarah Smith Pepsi

You can try this one:
SELECT coalesce(t1.id, t2.id) AS id,
coalesce(t1.firstname, t2.firstname) AS firstname,
coalesce(t1.lastname, t2.lastname) AS lastname,
CASE WHEN t1.drinkpreference IS NULL THEN t2.drinkpreference
WHEN t2.drinkpreference IS NULL THEN t1.drinkpreference
ELSE t1.drinkpreference || '/' || t2.drinkpreference END AS drinkpreference
FROM table1 t1 FULL JOIN table2 t2 ON t1.id = t2.id

Achievable using multiple unions and joins.
select distinct a.Id, FirstName, LastName, case when ct = 2 then 'Coke/Pepsi' else DrinkPreference end as DrinkPreference
from (
select FirstName, LastName, DrinkPreference, Id from table1
union all
select FirstName, LastName, DrinkPreference, Id from table2) a
left join
(
select count(1) as ct, Id from
(select Id from table1
union all
select Id from table2) t1
group by Id
) b on b.Id = a.Id
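The union-and-count idea can also be written with a single GROUP BY over the combined rows. The sketch below is a variation on the answers above, not either author's exact query. It runs against an in-memory SQLite database through Python's sqlite3 module (an assumption for illustration) and assumes each Id appears at most once per table, so a count above 1 means the contact is in both tables. Like the answer above, it hard-codes the 'Coke/Pepsi' label.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Id INTEGER, FirstName TEXT, LastName TEXT, DrinkPreference TEXT);
CREATE TABLE table2 (Id INTEGER, FirstName TEXT, LastName TEXT, DrinkPreference TEXT);
INSERT INTO table1 VALUES (123, 'Tom', 'Bannon', 'Pepsi');
INSERT INTO table1 VALUES (124, 'Sarah', 'Smith', 'Pepsi');
INSERT INTO table2 VALUES (125, 'Jim', 'Henry', 'Coke');
INSERT INTO table2 VALUES (123, 'Tom', 'Bannon', 'Coke');
""")

# Stack both tables, then collapse to one row per contact.
# Assumes an Id occurs at most once in each table, so COUNT(*) > 1
# means the contact exists in both.
combine_sql = """
SELECT Id, FirstName, LastName,
       CASE WHEN COUNT(*) > 1 THEN 'Coke/Pepsi'
            ELSE MIN(DrinkPreference)
       END AS DrinkPreference
FROM (
    SELECT Id, FirstName, LastName, DrinkPreference FROM table1
    UNION ALL
    SELECT Id, FirstName, LastName, DrinkPreference FROM table2
) AS combined
GROUP BY Id, FirstName, LastName
ORDER BY Id
"""

combined_rows = conn.execute(combine_sql).fetchall()
for row in combined_rows:
    print(row)
```

Grouping also removes the need for DISTINCT, since each contact produces exactly one output row.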

Select records based on Id and most updated record of the same Id

I have a table, Applications.
A user can submit more than one application. A user can also update an existing application, but instead of updating the record itself, we insert a new record with the same ApplicationNum.
Id ApplicationNum ApplicantId ApplicantName CreateDate
1 101 789 John May-20-2021
2 101 789 John May-21-2021
3 102 789 John May-22-2021
4 103 123 Maria May-31-2021
I want to return the list of applications for a given ApplicantId, but I don't want to display both records of the same ApplicationNum.
If I use this select statement:
Select * from Applications where ApplicantId = 789
this is the result I currently get:
1 101 789 John May-20-2021
2 101 789 John May-21-2021
3 102 789 John May-22-2021
This is the result I want to get:
2 101 789 John May-21-2021
3 102 789 John May-22-2021
Notice that the record with Id = 1 is not displayed because it is an older version of the record with Id = 2.
How can I achieve this?

I like using ROW_NUMBER along with a TIES trick here:
SELECT TOP 1 WITH TIES *
FROM Applications
WHERE ApplicantId = 789
ORDER BY ROW_NUMBER() OVER (PARTITION BY ApplicantId, ApplicationNum ORDER BY Id DESC);

Might be easier to just use:
select max(Id) as Id, ApplicationNum, ApplicantId, ApplicantName, max(CreateDate) as CreateDate
from Applications
where ApplicantId = 789
group by ApplicationNum, ApplicantId, ApplicantName

The traditional way, which is usually the most performant, is to use row_number and select the desired row from each group of records that share an ApplicantId and ApplicationNum:
select Id, ApplicationNum, ApplicantId, ApplicantName, CreateDate
from (
select *, Row_Number() over(partition by ApplicantId, ApplicationNum order by Id desc) rn
from Applications
where ApplicantId=789
)a
where rn=1
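The row_number approach can be tried on the question's sample rows with the sketch below. It uses Python's sqlite3 module and an in-memory database, which is an assumption for illustration. Window functions need SQLite 3.25 or later. CreateDate is stored as plain text exactly as shown in the question, so the ordering relies on Id rather than on the date.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Applications ("
    "Id INTEGER, ApplicationNum INTEGER, ApplicantId INTEGER, "
    "ApplicantName TEXT, CreateDate TEXT)"
)
conn.executemany(
    "INSERT INTO Applications VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, 789, "John", "May-20-2021"),
        (2, 101, 789, "John", "May-21-2021"),
        (3, 102, 789, "John", "May-22-2021"),
        (4, 103, 123, "Maria", "May-31-2021"),
    ],
)

# Number the versions of each application from newest (highest Id)
# to oldest, then keep only the newest one.
latest_sql = """
SELECT Id, ApplicationNum, ApplicantId, ApplicantName, CreateDate
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ApplicantId, ApplicationNum
               ORDER BY Id DESC
           ) AS rn
    FROM Applications
    WHERE ApplicantId = ?
) AS ranked
WHERE rn = 1
ORDER BY Id
"""

latest_rows = conn.execute(latest_sql, (789,)).fetchall()
for row in latest_rows:
    print(row)
```

Passing the ApplicantId as a bound parameter keeps the query reusable for other applicants.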

sql that identifies which account numbers have multiple agents

I don't think a count will work here. Can someone help me write SQL that identifies which account numbers have multiple agents (two or more agents in the WHERE condition)?
AGENT_NAME ACCOUNT_NUMBER
Clemons, Tony 123
Cipollo, Michael 123
Jepsen, Sarah 567
Joanos, James 567
McMahon, Brian 890
Novak, Jason 437
Ralph, Melissa 197
Reitwiesner, John 221
Roman, Marlo 123
Rosenzweig, Marcie 890
Results should be something like this.
ACCOUNT_NUMBER AGENT_NAME
123 Cipollo, Michael
123 Roman, Marlo
123 Clemons, Tony
890 Rosenzweig, Marcie
890 McMahon, Brian
567 Joanos, James
567 Jepsen, Sarah

You can do this using window functions:
select t.account_number, t.agent_name
from (select t.*, min(agent_name) over (partition by account_number) as minan,
max(agent_name) over (partition by account_number) as maxan
from table t
) t
where minan <> maxan;
If you know the agent names are never duplicated, you could just do:
select t.account_number, t.agent_name
from (select t.*, count(*) over (partition by account_number) as cnt
from table t
) t
where cnt > 1;

Assuming your table name is test, this should pull all the records with duplicate ACCOUNT_NUMBER:
select * from test where ACCOUNT_NUMBER in
(select ACCOUNT_NUMBER from test
group by ACCOUNT_NUMBER having
count(ACCOUNT_NUMBER)>1)
order by ACCOUNT_NUMBER

Using the COUNT function, you can get the result:
CREATE TABLE #TEMP
(
AGENT_NAME VARCHAR(100),
ACCOUNT_NUMBER INT
)
INSERT INTO #TEMP
VALUES ('CLEMONS, TONY',123),
('CIPOLLO, MICHAEL',123),
('JEPSEN, SARAH',567),
('JOANOS, JAMES',567),
('MCMAHON, BRIAN',890),
('NOVAK, JASON',437),
('RALPH, MELISSA',197),
('REITWIESNER, JOHN',221),
('ROMAN, MARLO',123),
('ROSENZWEIG, MARCIE',890)
SELECT a.ACCOUNT_NUMBER,a.AGENT_NAME
FROM #TEMP A
JOIN(SELECT COUNT(1) CNT,
ACCOUNT_NUMBER
FROM #TEMP
GROUP BY ACCOUNT_NUMBER) B
ON A.ACCOUNT_NUMBER = B.ACCOUNT_NUMBER
WHERE B.CNT != 1
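The window-function and GROUP BY/HAVING variants above should agree on the sample data. Here is a minimal sketch that runs both, using Python's sqlite3 module with an in-memory database. The table name agents is hypothetical, and SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

# In-memory SQLite database; the table name "agents" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (agent_name TEXT, account_number INTEGER)")
conn.executemany(
    "INSERT INTO agents VALUES (?, ?)",
    [
        ("Clemons, Tony", 123), ("Cipollo, Michael", 123),
        ("Jepsen, Sarah", 567), ("Joanos, James", 567),
        ("McMahon, Brian", 890), ("Novak, Jason", 437),
        ("Ralph, Melissa", 197), ("Reitwiesner, John", 221),
        ("Roman, Marlo", 123), ("Rosenzweig, Marcie", 890),
    ],
)

# Variant 1: attach the per-account row count to every row, then filter.
window_sql = """
SELECT account_number, agent_name
FROM (
    SELECT a.*, COUNT(*) OVER (PARTITION BY account_number) AS cnt
    FROM agents AS a
) AS counted
WHERE cnt > 1
ORDER BY account_number, agent_name
"""

# Variant 2: find the multi-agent accounts first, then select their rows.
having_sql = """
SELECT account_number, agent_name
FROM agents
WHERE account_number IN (
    SELECT account_number
    FROM agents
    GROUP BY account_number
    HAVING COUNT(*) > 1
)
ORDER BY account_number, agent_name
"""

window_rows = conn.execute(window_sql).fetchall()
having_rows = conn.execute(having_sql).fetchall()
for row in window_rows:
    print(row)
```

Both variants count rows, so they assume an agent is not listed twice for the same account. If that can happen, the min/max comparison shown earlier is the safer choice.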

Retrieve all distinct records from table and if any changes happen between two similar distinct record then need to consider both. Using select query

I want to convert table1 into table2. I need to find all distinct records in the table, ignoring mis_date. The most important condition is that if a change occurs between two occurrences of the same distinct record, I want both occurrences kept as separate records.
Example:
Input:
empId Empname Pancard MisDate
123 alex ads234 31/11/2012
123 alex ads234 31/12/2012
123 alex ads234 31/01/2013
123 alex dds124 29/02/2013
123 alex ads234 31/03/2013
123 alex ads234 31/04/2013
123 alex dds124 30/05/2013
Expected output:
empId Empname Pancard MisDate
123 alex ads234 31/11/2012
123 alex dds124 29/02/2013
123 alex ads234 31/03/2013
123 alex dds124 30/05/2013

Assuming there's only one row for each MisDate (otherwise you'll have to find another way to specify ordering):
SELECT t1.empId, t1.Empname, t1.Pancard, t1.MisDate
FROM Table1 t1
LEFT OUTER JOIN Table1 t2
ON t2.MisDate = (SELECT MAX(MisDate) FROM Table1 t3 WHERE t3.MisDate < t1.MisDate)
WHERE t2.empId IS NULL
OR t2.empId <> t1.empId OR t2.Empname <> t1.Empname OR t2.Pancard <> t1.Pancard
SQL Fiddle example
This performs a self-join on the previous record, as ordered by MisDate, outputting if it is different or if there is no previous record (it is the first row).
Note: You've got some funky dates. I assume these are just transcription errors and have corrected them in the fiddle.
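The same previous-row comparison can also be expressed with the LAG window function instead of a self-join. This is a different technique from the query above, sketched here with Python's sqlite3 module and an in-memory database (SQLite 3.25 or later is assumed). The dates are stored as ISO strings, and the specific values are assumed corrections of the invalid dates in the question, chosen only to preserve the original ordering.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (empId INTEGER, Empname TEXT, Pancard TEXT, MisDate TEXT)"
)
# ISO dates are assumed stand-ins for the question's invalid dates.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        (123, "alex", "ads234", "2012-11-30"),
        (123, "alex", "ads234", "2012-12-31"),
        (123, "alex", "ads234", "2013-01-31"),
        (123, "alex", "dds124", "2013-02-28"),
        (123, "alex", "ads234", "2013-03-31"),
        (123, "alex", "ads234", "2013-04-30"),
        (123, "alex", "dds124", "2013-05-30"),
    ],
)

# Keep a row when it is the first for its employee (no previous row)
# or when Pancard differs from the previous row's value.
# Assumes Pancard is never NULL, so a NULL prev_pancard marks the first row.
changes_sql = """
SELECT empId, Empname, Pancard, MisDate
FROM (
    SELECT *,
           LAG(Pancard) OVER (PARTITION BY empId ORDER BY MisDate) AS prev_pancard
    FROM Table1
) AS t
WHERE prev_pancard IS NULL OR prev_pancard <> Pancard
ORDER BY empId, MisDate
"""

change_rows = conn.execute(changes_sql).fetchall()
for row in change_rows:
    print(row)
```

Only Pancard is compared here because it is the only column that changes in the sample. Other columns could be tracked with additional LAG expressions.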

SQL Join Ignore multiple matches (fuzzy results ok)

I don't even know what my problem is called, so I'm just going to put up some sample data. I don't mind fuzzy results on this; that's the best way I can think to express it. I don't mind if I overlook some data, since this is for approximate evaluation, not detailed accounting, if that makes sense. But I do need every record in TABLE 1, and I would like to avoid the nulls case indicated below.
IS THIS POSSIBLE?
TABLE 1
acctnum sub fname lname phone
12345 1 john doe xxx-xxx-xxxx
12346 0 jane doe xxx-xxx-xxxx
12347 0 rob roy xxx-xxx-xxxx
12348 0 paul smith xxx-xxx-xxxx
TABLE 2
acctnum sub division
12345 1 EAST
12345 2 WEST
12345 3 NORTH
12346 1 TOP
12346 2 BOTTOM
12347 2 BALLOON
12348 1 NORTH
So if we do a "regular outer" join, we'd get some results like this, since the sub 0's don't match the second table:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx null
12347 0 rob roy xxx-xxx-xxxx null
12348 0 paul smith xxx-xxx-xxxx null
But I would rather get
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx TOP
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
And I'm trying to avoid:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12345 1 john doe xxx-xxx-xxxx WEST
12345 1 john doe xxx-xxx-xxxx NORTH
12346 0 jane doe xxx-xxx-xxxx TOP
12346 0 jane doe xxx-xxx-xxxx BOTTOM
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
So I decided to go with using a union and two if conditions. I'll accept a null for conditions where the sub account is defined in table 1 but not in table 2, and for everything else, I'll just match against the min.

If I'm understanding correctly, it looks like you're trying to join on the sub column if it matches. If there's no match on sub, then you want it to select the "first" row for that acctnum. Is this correct?
If so, you'll need to left join on the full match, then perform another left join on a select statement that determines the division that corresponds to the lowest sub value for that acctnum. The row_number() function can help you with this, like this:
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
isnull(t2_match.division, t2_first.division) as division
from table1 t1
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
left join
(
select
acctnum,
sub,
division,
row_number() over (partition by acctnum order by sub) as rownum
from table2
) t2_first on t2_first.acctnum = t1.acctnum and t2_first.rownum = 1
EDIT
If you don't care at all about which record you get back from table 2 when a matching sub doesn't exist, you could combine two different queries (one that matches the sub and one that just takes the min or max division) with a union.
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
t2.division
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum and t2.sub = t1.sub
union
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
min(t2.division)
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
where t2_match.acctnum is null
group by t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone
Personally, I don't find the union syntax any more compelling and you now have to maintain the query in two places. For this reason, I'd favor the row_number() approach.

Try using:
SELECT MIN(Table_1.acctnum) AS acctnum, MIN(Table_1.sub) AS sub, MIN(Table_1.fname) AS fname, MIN(Table_1.lname) AS lname, MIN(Table_1.phone) AS phone, MIN(Table_2.division) AS division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum AND Table_1.sub = Table_2.sub
WHERE Table_1.sub > 0
GROUP BY Table_1.acctnum
UNION
SELECT MIN(Table_1.acctnum) AS acctnum, MIN(Table_1.sub) AS sub, MIN(Table_1.fname) AS fname, MIN(Table_1.lname) AS lname, MIN(Table_1.phone) AS phone, MIN(Table_2.division) AS division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum
WHERE Table_1.sub = 0
GROUP BY Table_1.acctnum
This is the result:
12345 1 john doe xxxxxxxxxx EAST
12346 0 jane doe xxxxxxxxxx BOTTOM
12347 0 rob roy xxxxxxxxxx BALLOON
12348 0 paul smith xxxxxxxxxx NORTH
If you change MIN to MAX, TOP will appear instead of BOTTOM on the second row.

It may also work for you:
SELECT t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone,
ISNULL(MAX(t2.division),MAX(t3.division)) as division
FROM table_1 t1
LEFT JOIN table_2 t2 ON (t2.acctnum = t1.acctnum AND t1.sub = t2.sub)
LEFT JOIN table_2 t3 ON (t3.acctnum = t1.acctnum)
GROUP BY t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone

This will give your desired result, exactly (for the shown data):
Updated to not assume there is always a sub==1 value:
SELECT
T1.acctnum,
T1.sub,
T1.fname,
T1.lname,
T1.phone,
T2.division
FROM
TABLE_1 T1
LEFT JOIN
TABLE_2 T2 ON T1.acctnum = T2.acctnum
AND
T2.sub = (SELECT MIN(T3.sub) FROM TABLE_2 T3 WHERE T1.acctnum = T3.acctnum)
ORDER BY
T1.lname,
T1.fname,
T1.acctnum
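To see the "exact match first, lowest sub otherwise" behaviour on the question's data, the sketch below combines two left joins with COALESCE. It is one possible combination of the ideas in the answers above, not any single author's query. It uses Python's sqlite3 module with an in-memory database (an assumption for illustration) and assumes each (acctnum, sub) pair occurs at most once in table 2.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (acctnum INTEGER, sub INTEGER, fname TEXT, lname TEXT, phone TEXT)"
)
conn.execute("CREATE TABLE table_2 (acctnum INTEGER, sub INTEGER, division TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)",
    [
        (12345, 1, "john", "doe", "xxx-xxx-xxxx"),
        (12346, 0, "jane", "doe", "xxx-xxx-xxxx"),
        (12347, 0, "rob", "roy", "xxx-xxx-xxxx"),
        (12348, 0, "paul", "smith", "xxx-xxx-xxxx"),
    ],
)
conn.executemany(
    "INSERT INTO table_2 VALUES (?, ?, ?)",
    [
        (12345, 1, "EAST"), (12345, 2, "WEST"), (12345, 3, "NORTH"),
        (12346, 1, "TOP"), (12346, 2, "BOTTOM"),
        (12347, 2, "BALLOON"), (12348, 1, "NORTH"),
    ],
)

# "matched" is the exact (acctnum, sub) match; "fallback" is the row
# with the lowest sub for the account. COALESCE prefers the exact match.
fallback_sql = """
SELECT t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone,
       COALESCE(matched.division, fallback.division) AS division
FROM table_1 AS t1
LEFT JOIN table_2 AS matched
    ON matched.acctnum = t1.acctnum AND matched.sub = t1.sub
LEFT JOIN table_2 AS fallback
    ON fallback.acctnum = t1.acctnum
   AND fallback.sub = (
       SELECT MIN(x.sub) FROM table_2 AS x WHERE x.acctnum = t1.acctnum
   )
ORDER BY t1.acctnum
"""

joined_rows = conn.execute(fallback_sql).fetchall()
for row in joined_rows:
    print(row)
```

Each row of table 1 appears exactly once, and division is NULL only when the account has no rows at all in table 2.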
