SQL Find most rows that match between two tables - sql

I am using SQL Server 2012. I have two tables like the following:
Table1 and Table2 both contain many groups, identified by the Group column. A group's name may or may not be the same in both tables. What matters is finding the group in Table2 that has the most members matching the members of a group in Table1.
I first tried doing this with a VLOOKUP, but VLOOKUP returns the first entry in the Group column that has a match, not the group with the most matches. In the example, VLOOKUP would return BBB, but the correct result is CCC.
Ties may occur: more than one group in Table2 may match a Table1 group with the same number of members. Counting the matches is therefore probably the best approach, but there are thousands of groups, so sorting and sifting through a column of counts is not ideal. I need something like a CASE expression that puts the Table2 group name with MAX(match) into a derived column, BestMatch, on Table1. Ideally the column would list all the Table2 groups that share MAX(match), which may be one or more, perhaps comma separated.
Failing that, the column could simply say "tie" and I could investigate the tie myself. If that is the best option, the word "tie" should repeat beside every member that matches, so that I know which groups to look at, which accounts matched, and how many matched.

We really could do with some expected output to help clarify the question.
If I understand you correctly however, this query will get you close to the results you require:
;WITH cte AS
(
    SELECT t1a.[Group] AS Group1
         , t2a.[Group] AS Group2
         , RANK() OVER (PARTITION BY t1a.[Group]
                        ORDER BY COUNT(t2a.[Group]) DESC) AS MatchRank
    FROM Table1 t1a
    JOIN Table2 t2a
      ON t1a.member = t2a.member
    GROUP BY t1a.[Group], t2a.[Group]
)
SELECT *
FROM cte
WHERE MatchRank = 1
The query doesn't flag ties, but it will display all tied results, because RANK() gives tied groups the same rank.
If you are new to common table expressions (the ;with statement), there is a useful description here.

select *
from Table1 t1
outer apply
(
select top 1 t2.[Group]
from Table2 t2
where t2.Member = t1.Member
group by t2.[Group]
order by count(*) desc
) m

It may not be the most elegant solution but I think it could do the work:
select *
from
(select t1.[group] as t1group, t1.member, t2.[group] as t2group
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)a
where member = (select max(t1.member)
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)
In case of 2 rows from Table2 matching the maximum members in Table1, both results would be displayed
PS: an example of your desired results would have been helpful

Count the member matches per group pair and rank them so that the group pairs with the highest match count get rank #1. Once you have found these, you can select the related records from table1 and table2.
select t1.grp, t1.member, t2.grp
from t1
join
(
select
t1.grp as grp1,
t2.grp as grp2,
rank() over (order by count(*) desc) as rnk
from t1
join t2 on t2.member = t1.member
group by t1.grp, t2.grp
) grps on grps.rnk = 1 and grps.grp1 = t1.grp
left join t2 on t2.grp = grps.grp2 and t2.member = t1.member
order by t1.grp, t1.member, t2.grp;
This gives you ties in separate rows, e.g. for AAA having four different members (123,456,789,555) with two matches both in CCC and DDD:
grp1  member  grp2
AAA   123     CCC
AAA   123     DDD
AAA   456     CCC
AAA   789
AAA   555     DDD
If you want one row per grp1 and member with all matching grp2 in a string then you need some clumsy STUFF trick in SQL Server as far as I am aware. Look up "GROUP_CONCAT in SQL Server" to find the technique needed.
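For anyone who wants to try the count-and-rank idea end to end, here is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). The sample rows are made up, since the original tables are not reproduced in the question, and the column names grp and member are assumptions. It ranks per Table1 group (PARTITION BY) and uses SQLite's GROUP_CONCAT to build the comma-separated list; on SQL Server 2012 you would have to replace that aggregate with the STUFF ... FOR XML PATH('') technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (grp TEXT, member INTEGER);
    CREATE TABLE t2 (grp TEXT, member INTEGER);
    -- Hypothetical data: group AAA has four members.
    INSERT INTO t1 VALUES ('AAA', 123), ('AAA', 456), ('AAA', 789), ('AAA', 555);
    -- BBB matches one member of AAA; CCC and DDD match two each (a tie).
    INSERT INTO t2 VALUES ('BBB', 123),
                          ('CCC', 123), ('CCC', 456),
                          ('DDD', 123), ('DDD', 555);
""")

query = """
WITH pair_counts AS (
    -- One row per (Table1 group, Table2 group) pair with its match count.
    SELECT t1.grp AS grp1, t2.grp AS grp2, COUNT(*) AS matches
    FROM t1
    JOIN t2 ON t2.member = t1.member
    GROUP BY t1.grp, t2.grp
),
ranked AS (
    -- Rank the Table2 groups within each Table1 group; ties share rank 1.
    SELECT grp1, grp2, matches,
           RANK() OVER (PARTITION BY grp1 ORDER BY matches DESC) AS rnk
    FROM pair_counts
)
-- Collapse all rank-1 groups into one comma-separated BestMatch value.
SELECT grp1, GROUP_CONCAT(grp2, ',') AS best_match, MAX(matches) AS matches
FROM ranked
WHERE rnk = 1
GROUP BY grp1
"""
best_matches = conn.execute(query).fetchall()
for grp1, best_match, matches in best_matches:
    print(grp1, best_match, matches)
```

SQLite does not guarantee the order of the names inside the concatenated string, so treat the list as a set rather than relying on which group comes first.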

Related

How to recursively look up value from within a join after getting value from another table?

I have a table that looks like this:
T1
pid reason cid
-----------------------
1 aaa C1
1 bbb C2
I want to select the cid with greatest value to get to the reason. I can select the greatest cid like so:
SELECT MAX(TRIM('C' from cid)) AS id
FROM T1
However when I try to introduce the reason column like so:
SELECT reason, MAX(TRIM('C' from cid)) AS id
FROM T1
GROUP BY reason
I get the following result:
reason cid
---------------
aaa 1
bbb 2
I only need the reason where cid equals 2.
To add more complexity, I only want to extract the reason for the greatest cid, and I want to do it from within a LEFT JOIN:
select t2.*
from t2
left join (select to pick `reason` with greatest `cid`)
on t2.pid = t1.pid
How do I take the greatest cid value in T1, look up its reason, and pull that into the t2 query via the left join?
It looks like reason isn't used at all in your final query, but maybe that's just a typo?
Order by cid descending and use LIMIT 1 to get the row with the greatest cid:
SELECT pid, reason, cid FROM T1 order by cid DESC limit 1;
Then, your final query could be something like this:
select t2.*, reason
from t2
left join (select pid, reason FROM T1 order by cid DESC limit 1) as sub on sub.pid = t2.pid;
If cid is not unique, you may want to add more columns to the order by clause to make your result deterministic.
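If the goal is the reason for the greatest cid per pid (rather than one row overall), a window function inside the joined subquery is one way to sketch it. The example below runs on Python's built-in sqlite3 module (SQLite 3.25 or later). The columns of t2, the extra 'C10' row, and the assumption that cid is always the letter C followed by a number are mine, not from the question; the numeric cast is there so that C10 sorts above C2, which a plain string sort would not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (pid INTEGER, reason TEXT, cid TEXT);
    CREATE TABLE t2 (pid INTEGER, name TEXT);  -- hypothetical columns
    -- First two rows are from the question; the 'C10' row is an added test case.
    INSERT INTO t1 VALUES (1, 'aaa', 'C1'), (1, 'bbb', 'C2'), (1, 'ccc', 'C10');
    -- pid 2 has no rows in t1, to show the LEFT JOIN keeping it.
    INSERT INTO t2 VALUES (1, 'first'), (2, 'second');
""")

query = """
SELECT t2.pid, t2.name, latest.reason
FROM t2
LEFT JOIN (
    -- Number the t1 rows per pid, greatest numeric cid first.
    SELECT pid, reason,
           ROW_NUMBER() OVER (
               PARTITION BY pid
               ORDER BY CAST(SUBSTR(cid, 2) AS INTEGER) DESC
           ) AS rn
    FROM t1
) AS latest
  ON latest.pid = t2.pid AND latest.rn = 1
ORDER BY t2.pid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Putting the rn = 1 test in the ON clause rather than in a WHERE clause keeps the t2 rows that have no match in t1.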
The query would be
SELECT reason,
trim('C' from cid)
FROM t1
ORDER BY trim('C' from cid) DESC
LIMIT 1;
I cannot answer the part about the outer join, because the requirement is too vague.
You can use the HAVING clause:
SELECT reason, MAX(trim('C' from cid)) as id
FROM T1
GROUP BY reason
HAVING MAX(trim('C' from cid)) = 2;

How do I just get the first matching row?

I have a fairly complex SQL query, part of which needs to look up a company_ID value found in the first table to obtain the company_Name from the second table. The second table may have variants of the company name, but that is OK; I just need the first match.
So, tableA looks something like this (approx 2 dozen columns and many rows)
company_ID (CHAR(12))
161012348876
561254435253
103929478273
141567643542
tableB looks something like this
company_ID (Integer) Company_name
161012348876 Watson & Jones Ltd
161012348876 Watson and Jones
561254435253 Fictional Co. plc
103929478273 Made Up Corp.
161012348876 Watson Jones Ltd
141567643542 Thingymajig Gmbh.
This query will return multiple rows for 161012348876. What're good ways just to get one row returned for each matching company_id (i.e. 4 rows instead of 6)?
SELECT *, t2.company_name
FROM tableA t1
JOIN tableB t2 ON t1.company_id = cast(t2.company_id as CHAR(12))
I am using Teradata SQL.
Any help much appreciated.
SELECT *, t2.company_name
FROM tableA t1
JOIN tableB t2 ON t1.company_id = cast(t2.company_id as CHAR(12))
GROUP BY t1.company_id
Will return 1 row for each unique t1.company_id
The following query gets one name for each company ID. Grouping by t2.company_id and taking MAX(t2.company_name) yields a single name per ID, and that result is then joined to tableA.
SELECT t1.company_id, t3.aName AS company_name
FROM tableA t1
JOIN (SELECT t2.company_id, MAX(t2.company_name) AS aName
FROM tableB t2 GROUP BY t2.company_id) AS t3
ON t1.company_id = CAST(t3.company_id AS CHAR(12))
Instead of user2989408's MAX subquery you can also do a
SELECT company_id , company_Name
FROM tableB
QUALIFY ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY company_name) = 1
--if you don't care about MIN/MAX or want a more random result:
QUALIFY COUNT(*) OVER (PARTITION BY company_id ROWS UNBOUNDED PRECEDING) = 1
But assuming that company_id is the primary index (PI) of tableB, the MAX will probably perform better.
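To see the ROW_NUMBER idea produce four rows instead of six, here is a small sketch that runs the question's sample data through Python's built-in sqlite3 module (SQLite 3.25 or later). SQLite has no QUALIFY clause, so the row-number filter is applied in the join condition of a derived table instead; the Teradata QUALIFY form above expresses the same thing more compactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (company_id TEXT);
    CREATE TABLE tableB (company_id INTEGER, company_name TEXT);
    INSERT INTO tableA VALUES ('161012348876'), ('561254435253'),
                              ('103929478273'), ('141567643542');
    INSERT INTO tableB VALUES
        (161012348876, 'Watson & Jones Ltd'),
        (161012348876, 'Watson and Jones'),
        (561254435253, 'Fictional Co. plc'),
        (103929478273, 'Made Up Corp.'),
        (161012348876, 'Watson Jones Ltd'),
        (141567643542, 'Thingymajig Gmbh.');
""")

query = """
SELECT a.company_id, b.company_name
FROM tableA a
JOIN (
    -- Number the name variants per company; rn = 1 is the one we keep.
    SELECT company_id, company_name,
           ROW_NUMBER() OVER (PARTITION BY company_id
                              ORDER BY company_name) AS rn
    FROM tableB
) b
  ON a.company_id = CAST(b.company_id AS TEXT) AND b.rn = 1
ORDER BY a.company_id
"""
one_name_per_company = conn.execute(query).fetchall()
for company_id, company_name in one_name_per_company:
    print(company_id, company_name)
```

Which variant counts as "first" depends entirely on the ORDER BY inside the window; ordering by company_name is just one arbitrary but repeatable choice.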

How to compare tables and find duplicates and also find columns with different value

I have the following tables in Oracle 10g:
Table1
Name Status
a closed
b live
c live
Table2
Name Status
a final
b live
c live
There are no primary keys in either table, and I am trying to write a query which will return identical rows without looping through both tables and comparing rows/columns. If the status column is different, then the row in Table2 takes precedence.
So in the above example my query should return this:
Name Status
a final
b live
c live
Since you mentioned that there is no primary key on either table, I'm assuming that a row may exist in Table1, in Table2, or in both. The query below uses a common table expression and a window function to get that result.
WITH unionTable
AS
(
SELECT Name, Status, 1 AS ordr FROM Table1
UNION
SELECT Name, Status, 2 AS ordr FROM Table2
),
ranks
AS
(
SELECT Name, Status,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ordr DESC) rn
FROM unionTable
)
SELECT Name, Status
FROM ranks
WHERE rn = 1
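As a quick check of the union-and-rank approach, the sketch below runs essentially the same query through Python's built-in sqlite3 module (SQLite 3.25 or later) rather than Oracle. The rows for a, b and c come from the question; the row d, present only in Table1, is a made-up addition to show that names missing from Table2 still come through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT, Status TEXT);
    CREATE TABLE Table2 (Name TEXT, Status TEXT);
    INSERT INTO Table1 VALUES ('a', 'closed'), ('b', 'live'), ('c', 'live'),
                              ('d', 'open');  -- hypothetical Table1-only row
    INSERT INTO Table2 VALUES ('a', 'final'), ('b', 'live'), ('c', 'live');
""")

query = """
WITH unionTable AS (
    -- Tag each row with its source so Table2 can be preferred later.
    SELECT Name, Status, 1 AS ordr FROM Table1
    UNION
    SELECT Name, Status, 2 AS ordr FROM Table2
),
ranks AS (
    -- Highest ordr first, so a Table2 row gets rn = 1 whenever one exists.
    SELECT Name, Status,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ordr DESC) AS rn
    FROM unionTable
)
SELECT Name, Status
FROM ranks
WHERE rn = 1
ORDER BY Name
"""
merged = conn.execute(query).fetchall()
for name, status in merged:
    print(name, status)
```

If a name could appear more than once within the same table, the ORDER BY in the window would need a further tie-breaker to make the chosen row repeatable.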
Something like this?
SELECT table1.Name, table2.Status
FROM table1
INNER JOIN table2 ON table1.Name = table2.Name
By always returning table2.Status you've covered both the case when they're the same and when they're different (essentially it doesn't matter what the value of table1.Status is).

How do I limit the number of rows returned by this LEFT JOIN to one?

So I think I've seen solutions to this, but they are all very complicated queries. I'm on Oracle 11g, for reference.
What I have is a simple one to many join which works great however I don't need the many. I just want the left table (the one) to just join any 1 row which meets the join criteria...not many rows.
I need to do this because the query is in a rollup which COUNTS so if I do the normal left join I get 5 rows where I only should be getting 1.
So example data is as follows:
TABLE 1:
-------------
TICKET_ID ASSIGNMENT
5 team1
6 team2
TABLE 2:
-------------
MANAGER_NAME ASSIGNMENT_GROUP USER
joe team1 sally
joe team1 stephen
joe team1 louis
harry team2 ted
harry team2 thelma
What I need to do is join these two tables on ASSIGNMENT = ASSIGNMENT_GROUP but only have one row returned per ticket.
When I do a left join I get three rows returned for team1, because that is the nature of the left join.
If Oracle supports ROW_NUMBER() OVER (PARTITION BY ...), you can create a subquery that selects only the rows where the row number equals 1.
SELECT *
FROM table1 t
LEFT JOIN
(SELECT *
FROM (SELECT t2.*,
ROW_NUMBER()
OVER (PARTITION BY assignment_group ORDER BY assignment_group) AS seq
FROM table2 t2) a
WHERE seq = 1) v
ON t.assignment = v.assignment_group
You could do something like this.
SELECT t1.ticket_id,
t1.assignment,
t2.manager_name,
t2.user
FROM table1 t1
LEFT OUTER JOIN (SELECT manager_name,
assignment_group,
user,
row_number() over (partition by assignment_group
--order by <<something>>
) rnk
FROM table2) t2
ON ( t1.assignment = t2.assignment_group
AND t2.rnk = 1 )
This partitions the data in table2 by assignment_group and then arbitrarily ranks them to pull one arbitrary row per assignment_group. If you care which row is returned (or if you want to make the row returned deterministic) you could add an ORDER BY clause to the analytic function.
I think what you need is to use GROUP BY on the ASSIGNMENT_GROUP field.
http://www.w3schools.com/sql/sql_groupby.asp
In MySQL you could just GROUP BY ASSIGNMENT and be done. Oracle is stricter and refuses to pick (in an undefined way) which of the three rows' values to return. That means every returned column needs to be part of the GROUP BY or be wrapped in an aggregate function (COUNT, MIN, MAX...).
You can of course choose not to care and apply some aggregate function to the returned columns.
select TICKET_ID, ASSIGNMENT, MAX(MANAGER_NAME), MAX(USER)
from T1
left join T2 on T1.ASSIGNMENT=T2.ASSIGNMENT_GROUP
group by TICKET_ID, ASSIGNMENT
If you do that I would seriously doubt that you need the JOIN in the first place.
MySQL could also help with GROUP_CONCAT in the case that you want a string concatenation of group values for a column (humans often like that), but with Oracle that is staggeringly complex.
Using a subquery as already suggested is an option, look here for an example. It also allows you to sort the subquery before selecting the top row.
In Oracle, if you want one result, you can use the ROWNUM pseudocolumn to get the first N rows of a query, e.g.:
SELECT *
FROM TABLEX
WHERE
ROWNUM = 1 --gets the first value of the result
The problem with this query on its own is that Oracle does not guarantee the order of the rows without an ORDER BY. So you must order your data before using ROWNUM:
SELECT *
FROM
(SELECT * FROM TABLEX ORDER BY COL1)
WHERE
ROWNUM = 1
For your case, looks like you only need 1 result, so your query should look like:
SELECT *
FROM
TABLE1 T1
LEFT JOIN
(SELECT *
FROM TABLE2 T2 WHERE T1.ASSIGNMENT = T2.ASSIGNMENT_GROUP
AND
ROWNUM = 1) T3 ON T1.ASSIGNMENT = T3.ASSIGNMENT_GROUP
you can use subquery - select top 1

Counts for distinct values in different tables where columns are common to separate tables

I have no idea if that title conveys what I want it to.
I have two tables containing phone records (one for each account) and I'd like to get call counts for the numbers that are common to each account. In other words:
Table 1
Number ...
8675309
8675309
8675310
8675310
8675312
Table 2
Number ...
8675309
8675309
8675309
8675310
8675311
Querying with something like:
SELECT DISTINCT table1.number, COUNT(table1.number), COUNT(table2.number) FROM table1, table2 WHERE table1.number = table2.number GROUP BY table1.number
would hopefully produce:
8675309|2|3
8675310|2|1
Instead, it currently produces something like:
8675309|6|6
8675310|2|2
It appears to be multiplying the count from each table. Presumably, this is because I'm not joining the tables the way I should for this goal. Or because by the time I ask for COUNT(table1.number) the tables have already been joined in some multiplicative way. Should I not be doing a JOIN and instead something that would read like: "where table2.number CONTAINS(table1.number)"?
Any tips?
One way is with subqueries:
SELECT t1.number, t1.table1Count, t2.table2Count
from (select number, count(*) table1Count
from table1
group by number) t1
inner join (select number, count(*) table2Count
from table2
group by number) t2
on t2.number = t1.number
This assumes that you only want to list numbers that appear in both tables. If you want to list all numbers that appear in one table and optionally the other, you'd use a left or right outer join; if you wanted all numbers that appeared in either or both tables, you'd use a full outer join.
Another and potentially more efficient way requires the presence of a single column that uniquely identifies each row in each table:
SELECT
t1.number
,count(distinct t1.PrimaryKeyValue) table1Count
,count(distinct t2.PrimaryKeyValue) table2Count
from table1 t1
inner join table2 t2
on t2.number = t1.number
group by t1.number
This makes the same assumptions as before, and can also be adjusted via outer joins.
One way is to use a couple of derived tables to compute your counts separately and then join them to produce your final summary:
select t1.number, t1.count1, t2.count2
from (select number, count(number) as count1 from table1 group by number) as t1
join (select number, count(number) as count2 from table2 group by number) as t2
on t1.number = t2.number
There are probably other ways but that should work and it is the first thing that came to mind.
You're getting your "multiplicative" effect pretty much for the reasons you suspect. If you have this:
table1(id,x)   table2(id,x)
------------   ------------
1, a           4, a
2, a           5, a
3, b           6, b
Then joining them on x will give you this:
1,a, 4,a
1,a, 5,a
2,a, 4,a
2,a, 5,a
...
Usually you could use a GROUP BY to sort out the duplicates but you can't do that because it would mess up your per-table counts.
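The multiplication is easy to reproduce with the numbers from the question. The sketch below uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; it shows the naive join giving 6|6 for 8675309 and the pre-aggregated version (count each table separately, then join the counts) giving the hoped-for 2|3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (number INTEGER);
    CREATE TABLE table2 (number INTEGER);
    INSERT INTO table1 VALUES (8675309), (8675309), (8675310), (8675310), (8675312);
    INSERT INTO table2 VALUES (8675309), (8675309), (8675309), (8675310), (8675311);
""")

# Naive version: every table1 row pairs with every matching table2 row,
# so both counts become (rows in table1) * (rows in table2) per number.
naive = conn.execute("""
    SELECT t1.number, COUNT(t1.number), COUNT(t2.number)
    FROM table1 t1
    JOIN table2 t2 ON t1.number = t2.number
    GROUP BY t1.number
    ORDER BY t1.number
""").fetchall()

# Pre-aggregated version: count within each table first, then join the
# one-row-per-number results, so nothing gets multiplied.
fixed = conn.execute("""
    SELECT c1.number, c1.count1, c2.count2
    FROM (SELECT number, COUNT(*) AS count1 FROM table1 GROUP BY number) AS c1
    JOIN (SELECT number, COUNT(*) AS count2 FROM table2 GROUP BY number) AS c2
      ON c1.number = c2.number
    ORDER BY c1.number
""").fetchall()

print(naive)  # [(8675309, 6, 6), (8675310, 2, 2)]
print(fixed)  # [(8675309, 2, 3), (8675310, 2, 1)]
```

Numbers that appear in only one table (8675311 and 8675312 here) drop out of both results because of the inner join.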
Try this:
select tab1.number,tab1.num1,tab2.num2
from
(SELECT number, COUNT(number) as num1 from table1 group by number) as tab1
left join
(SELECT number, COUNT(number) as num2 from table2 group by number) as tab2
on tab1.number = tab2.number