How to compare tables and find duplicates and also find columns with different value - sql

I have the following tables in Oracle 10g:
Table1
Name Status
a closed
b live
c live
Table2
Name Status
a final
b live
c live
Neither table has a primary key. I am trying to write a query that returns the matching rows without looping over both tables and comparing rows/columns. If the Status values differ, the row in Table2 takes precedence.
So in the above example my query should return this:
Name Status
a final
b live
c live

Since you mentioned that neither table has a primary key, I'm assuming a row may exist in Table1, in Table2, or in both. The query below uses a common table expression (CTE) and a window function (ROW_NUMBER) to get that result: for each Name, the row from Table2 is ranked ahead of the row from Table1, and only the top-ranked row is kept.
WITH unionTable
AS
(
SELECT Name, Status, 1 AS ordr FROM Table1
UNION
SELECT Name, Status, 2 AS ordr FROM Table2
),
ranks
AS
(
SELECT Name, Status,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ordr DESC) rn
FROM unionTable
)
SELECT Name, Status
FROM ranks
WHERE rn = 1

Something like this?
SELECT table1.Name, table2.Status
FROM table1
INNER JOIN table2 ON table1.Name = table2.Name
By always returning table2.Status you've covered both the case when they're the same and when they're different (essentially it doesn't matter what the value of table1.Status is).
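The two answers differ when a name exists in only one table. Here is a minimal sketch that runs both queries in SQLite through Python's built-in sqlite3 module. SQLite standing in for Oracle 10g is an assumption, and ROW_NUMBER needs SQLite 3.25 or later. The row d is hypothetical and exists only in Table1:

```python
import sqlite3

# SQLite stands in for Oracle here; row 'd' is a hypothetical Table1-only row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT, Status TEXT);
    CREATE TABLE Table2 (Name TEXT, Status TEXT);
    INSERT INTO Table1 VALUES ('a', 'closed'), ('b', 'live'), ('c', 'live'), ('d', 'live');
    INSERT INTO Table2 VALUES ('a', 'final'), ('b', 'live'), ('c', 'live');
""")

# First answer: union both tables, then keep the highest-priority row per Name
# (ordr = 2 marks Table2, so it wins whenever both tables have the Name).
row_number_sql = """
    WITH unionTable AS (
        SELECT Name, Status, 1 AS ordr FROM Table1
        UNION
        SELECT Name, Status, 2 AS ordr FROM Table2
    ),
    ranks AS (
        SELECT Name, Status,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ordr DESC) AS rn
        FROM unionTable
    )
    SELECT Name, Status FROM ranks WHERE rn = 1 ORDER BY Name
"""

# Second answer: inner join, always taking Table2's Status.
inner_join_sql = """
    SELECT Table1.Name, Table2.Status
    FROM Table1
    INNER JOIN Table2 ON Table1.Name = Table2.Name
    ORDER BY Table1.Name
"""

ranked = conn.execute(row_number_sql).fetchall()
joined = conn.execute(inner_join_sql).fetchall()
print(ranked)  # [('a', 'final'), ('b', 'live'), ('c', 'live'), ('d', 'live')]
print(joined)  # [('a', 'final'), ('b', 'live'), ('c', 'live')]
```

The ROW_NUMBER version keeps d and the inner join drops it. Which one you want depends on whether rows present in only one table should appear in the result.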

Related

Cross joining tables to see which partners in one table have a report from another table

table1 (id, name)
table2 (id, name)
Query:
SELECT name
FROM table2
-- that are not in table1 already
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Q: What is happening here?
A: Conceptually, we select all rows from table1 and, for each row, we try to find a row in table2 with the same value in the name column. If there is no such row, we leave the table2 portion of our result empty for that row. Then we keep only those rows in the result where the matching row does not exist. Finally, we ignore all fields in our result except the name column (the one we are sure exists, from table1).
While it may not be the most performant method in all cases, it should work in essentially every database engine that implements ANSI SQL-92. Note that, as written, it returns the names in table1 that are missing from table2; to get the names in table2 that are not in table1 (as the question asks), swap table1 and table2.
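Here is a minimal sketch of the LEFT JOIN ... IS NULL anti-join, run in SQLite through Python's sqlite3 module with hypothetical rows. It shows that the order of the tables decides which difference you get:

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for whichever engine you use.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO table2 VALUES (1, 'alpha'), (2, 'gamma');
""")

def anti_join(left, right):
    """Names in `left` with no row of the same name in `right`."""
    # Table names are trusted constants here; never format user input into SQL.
    sql = f"""
        SELECT l.name
        FROM {left} l
        LEFT JOIN {right} r ON r.name = l.name
        WHERE r.name IS NULL          -- no partner row was found
        ORDER BY l.name
    """
    return [row[0] for row in conn.execute(sql)]

print(anti_join("table1", "table2"))  # ['beta']  (the query as written)
print(anti_join("table2", "table1"))  # ['gamma'] (table2 names not in table1)
```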
You can either do
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
or
SELECT name
FROM table2
WHERE NOT EXISTS
(SELECT *
FROM table1
WHERE table1.name = table2.name)
See this question for 3 techniques to accomplish this
I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT name
FROM table1)
Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.
SELECT <column_list>
FROM TABLEA a
LEFT JOIN TABLEB b
ON a.Key = b.Key
WHERE b.Key IS NULL;
https://www.cloudways.com/blog/how-to-join-two-tables-mysql/
This is pure set theory which you can achieve with the minus operation.
select id, name from table1
minus
select id, name from table2
Here's what worked best for me.
SELECT *
FROM #T1
EXCEPT
SELECT a.*
FROM #T1 a
JOIN #T2 b ON a.ID = b.ID
This was more than twice as fast as any other method I tried.
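Watch out for pitfalls. If the field Name in Table1 contains NULLs, you are in for surprises: x NOT IN (subquery) evaluates to UNKNOWN instead of TRUE whenever the subquery returns a NULL and x matches nothing, so the NOT IN query returns no rows at all.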
Better is:
SELECT name
FROM table2
WHERE name NOT IN
(SELECT ISNULL(name ,'')
FROM table1)
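The NULL trap is easy to reproduce. Here is a minimal sketch in SQLite (through Python's sqlite3 module, with hypothetical data) that contrasts plain NOT IN, NOT IN with the NULLs filtered out of the subquery, and NOT EXISTS. It filters with WHERE name IS NOT NULL in place of ISNULL, which is SQL Server-specific (Oracle uses NVL; COALESCE is standard). This approach also avoids treating a genuine empty-string name in table2 as already matched:

```python
import sqlite3

# Hypothetical data: table1 contains a NULL name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('alpha'), (NULL);
    INSERT INTO table2 VALUES ('alpha'), ('gamma');
""")

# 'gamma' NOT IN ('alpha', NULL) is UNKNOWN, so no row qualifies.
not_in = conn.execute(
    "SELECT name FROM table2 WHERE name NOT IN (SELECT name FROM table1)"
).fetchall()

# Removing NULLs from the subquery restores the expected result.
not_in_filtered = conn.execute(
    "SELECT name FROM table2 "
    "WHERE name NOT IN (SELECT name FROM table1 WHERE name IS NOT NULL)"
).fetchall()

# NOT EXISTS compares row by row; a NULL in table1 simply never matches.
not_exists = conn.execute(
    "SELECT name FROM table2 t2 "
    "WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.name = t2.name)"
).fetchall()

print(not_in)           # []
print(not_in_filtered)  # [('gamma',)]
print(not_exists)       # [('gamma',)]
```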
You can use EXCEPT in SQL Server or MINUS in Oracle; according to the article below, they are equivalent:
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/
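EXCEPT and MINUS compare whole rows, and both return distinct rows. Here is a minimal sketch in SQLite, which supports EXCEPT, run through Python's sqlite3 module with hypothetical data:

```python
import sqlite3

# SQLite supports EXCEPT (as SQL Server does); Oracle spells it MINUS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta'), (2, 'beta');
    INSERT INTO table2 VALUES (1, 'alpha'), (3, 'beta');
""")

# Whole-row comparison: (2, 'beta') is absent from table2 even though 'beta' is not.
rows_diff = conn.execute(
    "SELECT id, name FROM table1 EXCEPT SELECT id, name FROM table2"
).fetchall()

# Select only the column you care about to get a name-based difference.
names_diff = conn.execute(
    "SELECT name FROM table1 EXCEPT SELECT name FROM table2"
).fetchall()

print(rows_diff)   # [(2, 'beta')]  (the duplicate row is collapsed)
print(names_diff)  # []
```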
This works well for me:
SELECT *
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL
You can use the following query structure:
SELECT t1.name FROM table1 t1 JOIN table2 t2 ON t2.fk_id != t1.id;
table1:
id name
1 Amit
2 Sagar
table2:
id fk_id email
1 1 amit#ma.com
Output:
name
Sagar
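The != join produces the expected output here only because table2 has a single row: it pairs each table1 row with every table2 row it does not match. Here is a minimal sketch in SQLite (through Python's sqlite3 module) that uses the sample data plus one hypothetical extra table2 row. It compares the != join with the LEFT JOIN ... IS NULL anti-join from earlier answers:

```python
import sqlite3

# The sample data above; SQLite stands in for your engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, fk_id INTEGER, email TEXT);
    INSERT INTO table1 VALUES (1, 'Amit'), (2, 'Sagar');
    INSERT INTO table2 VALUES (1, 1, 'amit#ma.com');
""")

NOT_EQUAL_JOIN = ("SELECT t1.name FROM table1 t1 JOIN table2 t2 "
                  "ON t2.fk_id != t1.id ORDER BY t1.name")
ANTI_JOIN = ("SELECT t1.name FROM table1 t1 LEFT JOIN table2 t2 "
             "ON t2.fk_id = t1.id WHERE t2.id IS NULL ORDER BY t1.name")

before_ne = conn.execute(NOT_EQUAL_JOIN).fetchall()
before_anti = conn.execute(ANTI_JOIN).fetchall()

# Hypothetical second table2 row that belongs to Sagar.
conn.execute("INSERT INTO table2 VALUES (2, 2, 'sagar#ma.com')")
after_ne = conn.execute(NOT_EQUAL_JOIN).fetchall()
after_anti = conn.execute(ANTI_JOIN).fetchall()

print(before_ne, before_anti)  # [('Sagar',)] [('Sagar',)]
print(after_ne, after_anti)    # [('Amit',), ('Sagar',)] []
```

Once Sagar also has a table2 row, the != join reports both names as unmatched, while the anti-join correctly reports none.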
All the above queries are incredibly slow on big tables, so a change of strategy is needed. Here is the code I used for one of my databases; you can adapt it by changing the field and table names.
This is the strategy: you build two implicit temporary (derived) tables and take their union.
The first derived table selects all the rows of the first original table, keeping the field you want to check.
The second derived table contains the rows of the first original table that have a match, on the same value of that field, in the second original table.
In the union, a value that exists in both original tables appears more than once (once from the first select and at least once from the second), while a value from the first table with no match in the second table appears exactly once.
So you group and count: a count of 1 means there is no match, and finally you select just the rows whose count equals 1.
It may not look elegant, but in my case it was orders of magnitude faster than all the above solutions.
IMPORTANT NOTE: make sure the columns being checked are indexed.
-- MAX() is a no-op on single-row groups; it keeps the query valid where every
-- selected column must be grouped or aggregated (e.g. MySQL ONLY_FULL_GROUP_BY)
SELECT name, MAX(source) AS source, MAX(id) AS id
FROM
(
SELECT name, 'active_ingredients' AS source, active_ingredients.id AS id
FROM active_ingredients
UNION ALL
SELECT active_ingredients.name AS name, 'UNII_database' AS source, temp_active_ingredients_aliases.id AS id
FROM active_ingredients
INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name
) tbl
GROUP BY name
HAVING count(*) = 1
ORDER BY name
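Here is a minimal sketch of the union-and-count idea. It runs in SQLite through Python's sqlite3 module instead of MySQL, with made-up rows, and keeps only the name column:

```python
import sqlite3

# Made-up rows; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE active_ingredients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE temp_active_ingredients_aliases (id INTEGER PRIMARY KEY, alias_name TEXT);
    INSERT INTO active_ingredients (name) VALUES ('aspirin'), ('caffeine'), ('menthol');
    INSERT INTO temp_active_ingredients_aliases (alias_name) VALUES ('aspirin'), ('menthol');
    -- Index the compared columns, as the answer recommends.
    CREATE INDEX idx_ai_name ON active_ingredients (name);
    CREATE INDEX idx_alias_name ON temp_active_ingredients_aliases (alias_name);
""")

unmatched = conn.execute("""
    SELECT name, COUNT(*) AS n
    FROM (
        -- every row of the first table ...
        SELECT name FROM active_ingredients
        UNION ALL
        -- ... plus one extra row per match in the second table
        SELECT a.name
        FROM active_ingredients a
        INNER JOIN temp_active_ingredients_aliases t ON t.alias_name = a.name
    ) tbl
    GROUP BY name
    HAVING COUNT(*) = 1   -- seen only once, so no match
    ORDER BY name
""").fetchall()

print(unmatched)  # [('caffeine', 1)]
```

One caveat: if the first table itself contains duplicate values in the checked column, an unmatched value is counted more than once and slips past the HAVING filter. The approach therefore assumes that column is unique in the first table.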
See query:
SELECT * FROM Table1 WHERE
id NOT IN (SELECT
e.id
FROM
Table1 e
INNER JOIN
Table2 s ON e.id = s.id);
Conceptually: the subquery fetches the matching records, and the main query then fetches the records whose id is not among them.
First, give the tables aliases such as t1 and t2.
Then select the records from the second table.
Finally, keep only those with no match in the first table, using a WHERE NOT EXISTS condition:
SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)
I'm going to repost the correct answer (since I don't have enough reputation to comment yet), in case anyone else thought it needed a better explanation.
SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL
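I had seen FROM clauses with commas between table names in MySQL, while SQLite seemed to prefer a space. In fact the two mean different things: a comma separates two different tables (an old-style implicit join), whereas in original_table_1 temp_table_1 the second name is simply an alias for the first (equivalent to original_table_1 AS temp_table_1), in both engines.
The bottom line is that bad variable names leave questions; the names above should make more sense.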
I tried all solutions above but they did not work in my case. The following query worked for me.
SELECT NAME
FROM table_1
WHERE NAME NOT IN
(SELECT a.NAME
FROM table_1 AS a
LEFT JOIN table_2 AS b
ON a.NAME = b.NAME
WHERE any further condition);

Join table on Count

I have two tables in Access: one contains IDs (not unique) and names, and the other contains IDs (not unique) and locations. I would like to return a third table that contains only the IDs that appear more than once in either the Name table or the Location table.
Table 1
ID Name
1 Max
1 Bob
2 Jack
Table 2
ID Location
1 A
2 B
Basically, in this setup it should return only ID 1, because 1 appears twice in Table 1:
ID
1
I have tried to do a JOIN on the tables and then apply a COUNT but nothing came out.
Thanks in advance!
Here is one method that I think will work in MS Access:
(select id
from table1
group by id
having count(*) > 1
) union -- note: NOT union all
(select id
from table2
group by id
having count(*) > 1
);
MS Access does not allow union/union all in the from clause. Nor does it support full outer join. Note that the union will remove duplicates.
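Here is a minimal sketch of the same GROUP BY ... HAVING plus UNION query on the question's sample rows. SQLite, run through Python's sqlite3 module, stands in for Access. The parentheses around each SELECT are dropped because SQLite does not accept them around UNION operands:

```python
import sqlite3

# The question's sample rows; SQLite stands in for MS Access here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, location TEXT);
    INSERT INTO table1 VALUES (1, 'Max'), (1, 'Bob'), (2, 'Jack');
    INSERT INTO table2 VALUES (1, 'A'), (2, 'B');
""")

# Find duplicated IDs in each table separately; UNION (not UNION ALL)
# collapses an ID that happens to be duplicated in both tables.
dup_ids = conn.execute("""
    SELECT id FROM table1 GROUP BY id HAVING COUNT(*) > 1
    UNION
    SELECT id FROM table2 GROUP BY id HAVING COUNT(*) > 1
""").fetchall()

print(dup_ids)  # [(1,)]
```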
Simple Group By and Having clause should help you
select ID
From Table1
Group by ID
having count(1)>1
union
select ID
From Table2
Group by ID
having count(1)>1
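Based on your description, you do not need to join the tables to find duplicate records. If your table is as shown above, simply use the query below (note that MS Access does not support WITH, so this form is for databases such as SQL Server):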
With A
as
(
select ID, count(*) as Times From Table1 group by ID
)
select * From A where A.Times>1
Not sure I understand what query you already tried, but this should work:
select table1.ID
from table1 inner join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1
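Or, if some IDs exist in one table but not the other (grouping on COALESCE keeps IDs that exist only in table2, whose table1.ID would be NULL):
select coalesce(table1.ID, table2.ID) as ID
from table1 full outer join table2 on table1.id = table2.id
group by coalesce(table1.ID, table2.ID)
having count(*) > 1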

SQL Find most rows that match between two tables

I am using SQL Server 2012. I have two tables like the following.
Table1 and Table2 both have many groups, indicated by the group column. A group name may or may not appear in both tables. What matters is finding the group in Table2 that has the most members matching the members of a group in Table1.
I first tried doing this with VLOOKUP, but VLOOKUP returns the first entry in the Group column that matches, not the group with the most matches. In the example below, VLOOKUP would pull BBB, but the correct result is CCC.
Ties may occur: more than one group in Table2 might match a Table1 group with the same number of members. Counting the matches is probably the way to go, but there are thousands of groups, so sorting and sifting through a column of counts is not practical. I need something like a CASE expression so that Table1 shows the group name with MAX(match) in a derived column, BestMatch. Ideally, the column would list all the Table2 groups that reach MAX(match), whether that is one group or several, perhaps comma-separated.
If that is not possible, the column could just say "tie" so I can look into it. In that case, ideally the word "tie" would be repeated beside every matching member, so I know which groups to look at, which accounts matched, and how many.
We really could do with some expected output to help clarify the question.
If I understand you correctly however, this query will get you close to the results you require:
;with cte as
( SELECT t1a.[group] AS Group1
, t2a.[Group] AS Group2
, RANK() OVER(PARTITION BY t1a.[group]
ORDER BY COUNT(t2a.[Group]) DESC) AS MatchRank
FROM Table1 t1a
JOIN Table2 t2a
ON t1a.member = t2a.member
GROUP BY t1a.[group], t2a.[GRoup])
SELECT *
FROM cte
WHERE MatchRank=1
The query doesn't identify ties, but it will display any tied results...
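The question's tables are not shown, so here is a minimal sketch of the RANK approach on hypothetical data. It runs in SQLite through Python's sqlite3 module, which is an assumption; the syntax is close to SQL Server's, and square-bracket quoting of [group] works in both:

```python
import sqlite3

# Hypothetical data: AAA shares 1 member with BBB and 2 members with CCC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 ([group] TEXT, member INTEGER);
    CREATE TABLE Table2 ([group] TEXT, member INTEGER);
    INSERT INTO Table1 VALUES ('AAA', 123), ('AAA', 456), ('AAA', 789);
    INSERT INTO Table2 VALUES ('BBB', 123), ('CCC', 123), ('CCC', 456), ('DDD', 999);
""")

best = conn.execute("""
    WITH cte AS (
        SELECT t1a.[group] AS Group1,
               t2a.[group] AS Group2,
               COUNT(t2a.[group]) AS Matches,
               -- rank Table2 groups by match count within each Table1 group
               RANK() OVER (PARTITION BY t1a.[group]
                            ORDER BY COUNT(t2a.[group]) DESC) AS MatchRank
        FROM Table1 t1a
        JOIN Table2 t2a ON t1a.member = t2a.member
        GROUP BY t1a.[group], t2a.[group]
    )
    SELECT Group1, Group2, Matches FROM cte WHERE MatchRank = 1
""").fetchall()

print(best)  # [('AAA', 'CCC', 2)]
```

Because RANK gives tied pairs the same rank, a tie would show up as several rows for the same Group1.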
If you are new to common table expressions (the ;with statement), there is a useful description here.
select *
from Table1 t1
outer apply
(
select top 1 t2.[Group]
from Table2 t2
where t2.Member = t1.Member
group by t2.[Group]
order by count(*) desc
) m
It may not be the most elegant solution but I think it could do the work:
select *
from
(select t1.[group] as t1group, t1.member, t2.[group] as t2group
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)a
where member = (select max(t1.member)
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)
In case of 2 rows from Table2 matching the maximum members in Table1, both results would be displayed
PS: an example of your desired results would have been helpful
Count member matches per group pair and rank them so that the group pairs with the highest match count get rank 1. Once you have found these, you can select the related records from table1 and table2.
select t1.grp, t1.member, t2.grp
from t1
join
(
select
t1.grp as grp1,
t2.grp as grp2,
rank() over (order by count(*) desc) as rnk
from t1
join t2 on t2.member = t1.member
group by t1.grp, t2.grp
) grps on grps.rnk = 1 and grps.grp1 = t1.grp
left join t2 on t2.grp = grps.grp2 and t2.member = t1.member
order by t1.grp, t1.member, t2.grp;
This gives you ties in separate rows, e.g. for AAA having four different members (123,456,789,555) with two matches both in CCC and DDD:
grp1 member grp2
AAA 123 CCC
AAA 123 DDD
AAA 456 CCC
AAA 789
AAA 555 DDD
If you want one row per grp1 and member with all matching grp2 in a string then you need some clumsy STUFF trick in SQL Server as far as I am aware. Look up "GROUP_CONCAT in SQL Server" to find the technique needed.
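SQL Server 2012 has no built-in string aggregate (STRING_AGG arrived in SQL Server 2017), hence the STUFF trick. For comparison, here is a minimal sketch in SQLite (through Python's sqlite3 module, with hypothetical data) that uses SQLite's group_concat to put all tied groups in one column, plus a "tie" flag. It ranks within each t1 group (PARTITION BY grp1), not across all pairs as in the answer; that is an assumption about what "best match" should mean:

```python
import sqlite3

# Hypothetical data with a tie: AAA shares 2 members with both CCC and DDD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (grp TEXT, member INTEGER);
    CREATE TABLE t2 (grp TEXT, member INTEGER);
    INSERT INTO t1 VALUES ('AAA', 123), ('AAA', 456), ('AAA', 789), ('AAA', 555);
    INSERT INTO t2 VALUES ('CCC', 123), ('CCC', 456), ('DDD', 123), ('DDD', 555), ('EEE', 789);
""")

best_match = conn.execute("""
    WITH counts AS (
        SELECT t1.grp AS grp1, t2.grp AS grp2, COUNT(*) AS matches
        FROM t1 JOIN t2 ON t2.member = t1.member
        GROUP BY t1.grp, t2.grp
    ),
    ranked AS (
        SELECT grp1, grp2,
               RANK() OVER (PARTITION BY grp1 ORDER BY matches DESC) AS rnk
        FROM counts
    )
    SELECT grp1,
           group_concat(grp2, ',') AS best_match,            -- all tied groups
           CASE WHEN COUNT(*) > 1 THEN 'tie' ELSE '' END AS tie_flag
    FROM (SELECT grp1, grp2 FROM ranked WHERE rnk = 1)
    GROUP BY grp1
""").fetchall()

print(best_match)
```

group_concat does not guarantee the order of the concatenated values, so the list for AAA may come out as CCC,DDD or DDD,CCC.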

SQL: is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

left join
(
SELECT my_number, MAX(id) as id
FROM table1
GROUP BY location
) newNum
on newNum.Part = c.OtherPart
left join table2 t2 on t2.id = newNum.id and t2.site = a.site
Situation: I have the data fields (among others) my_number, location, and id in table1. In table2 I have the data fields (among others) id, site, date. I am joining those to other views/tables (c and a) that have some of the same data fields and my_numbers.
My goal: each my_number has multiple ids, and I want the greatest id value for each site. That is why I used group by site.
Then I need to get the 'date' of the my_number based on the id, because the second table does not contain the my_number, just its associated id.
There are 3 sites in total, so I need the greatest id value for each of the 3 sites, and then the 'date' of those 3 id values.
Output table ex:
a.num a.site a.date c.OtherPart T2.date
15 TN 1.1.16 17 3.19.16
15 FL 2.21.16 17 4.22.16
15 TX 1.7.15 17 3.21.16
When you put something like max(column) in a SQL query, the max function is operating on a set of values of column from a group. If you've defined your query with a group by, such that the results are grouped, then every column (other than the one on which you are grouping) has multiple values.
In your case, location has one value (it's what you're grouping by), but my_number and id have multiple values. If my_number is (1,2,3,4) and id is (5,6,7,8), you can display sum(my_number) or max(my_number) but obviously you can't display on a single row the 'number' my_number. It is not a number, but a list.
This is what the error message means by "is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause". If you put the column my_number in an aggregate function (like sum), or add it to the GROUP BY clause, the query will work.
I'm not sure exactly what you want your SQL to do, but you should only select columns that appear in the GROUP BY clause or are wrapped in an aggregate function. Anyway, please try this ;)
left join
(
SELECT my_number, id
FROM table1 T1,
(SELECT location, MAX(id) as id
FROM table1
GROUP BY location) TMP
WHERE T1.id= TMP.id AND T1.location = TMP.location
) newNum
on newNum.Part = c.OtherPart
left join table2 t2 on t2.id = newNum.id and t2.site = a.site
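Here is a minimal sketch of just the inner derived table: first find the greatest id per location, then join back to pick up the other columns of that row. It runs in SQLite through Python's sqlite3 module with hypothetical rows, and leaves out c, a and table2 from the question. It uses an explicit JOIN ... ON, which is equivalent to the comma join with WHERE above:

```python
import sqlite3

# Hypothetical rows for table1 (my_number, location, id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (my_number INTEGER, location TEXT, id INTEGER);
    INSERT INTO table1 VALUES
        (15, 'TN', 101), (15, 'TN', 107),
        (15, 'FL', 102), (15, 'FL', 110),
        (15, 'TX', 103);
""")

latest = conn.execute("""
    SELECT T1.my_number, T1.location, T1.id
    FROM table1 T1
    -- step 1: greatest id per location (only grouped/aggregated columns)
    JOIN (SELECT location, MAX(id) AS id
          FROM table1
          GROUP BY location) TMP
      -- step 2: join back to recover the rest of that row
      ON T1.id = TMP.id AND T1.location = TMP.location
    ORDER BY T1.location
""").fetchall()

print(latest)  # [(15, 'FL', 110), (15, 'TN', 107), (15, 'TX', 103)]
```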
You can also fix the error with the following SQL, but it may not be what you want:
left join
(
SELECT location, MAX(id) as id
FROM table1
GROUP BY location
) newNum
on newNum.Part = c.OtherPart
left join table2 t2 on t2.id = newNum.id and t2.site = a.site

select and display empty column in case one condition at where clause is not met (because of non-existent element at other table)

I have two tables, table1 and table2.
Table1 has columns P_KEY, ID, some_value, some_value2, ....
Table2 has columns P2_KEY, RELATED_DATA, F_KEY, DATA_VALUE.
P_KEY and ID of table1 are unique pairs.
For each P_KEY (and thus each ID) of table1 there are multiple F_KEY entries of table2 with various data of each ID.
The RELATED_DATA rows have specific values (from a defined range elsewhere), which define the kind of Data stored for each ID.
I need to select the DATA_VALUE of ID where RELATED_DATA is 1 and where it is 500. If one or both of them do not exist, I still want to display the row, for example:
ID, DATA_VALUE (where RELATED_DATA=500), <empty column>
ID, <empty column>, DATA_VALUE (where RELATED_DATA=1)
ID, <empty column>, <empty column>
ID, DATA_VALUE (where RELATED_DATA=500), DATA_VALUE (where RELATED_DATA=1)
I have SQL like the query below, and I need to display a row for A.ID even when the B1.RELATED_DATA and/or B2.RELATED_DATA rows do NOT exist in table2 (or contain an empty string), with the B1.DATA_VALUE and/or B2.DATA_VALUE cells shown as empty:
select A.ID, B1.DATA_VALUE, B2.DATA_VALUE
from table1 A, table2 B1, table2 B2
where B1.F_KEY = A.P_KEY
and B2.F_KEY = A.P_KEY
and B1.RELATED_DATA = 500
and B2.RELATED_DATA = 1
and A.ID='OneValue';
The purpose is to find, for an ID, the cases in which the RELATED_DATA=500 and/or RELATED_DATA=1 rows do not exist or are empty strings.
Thank you.
What you'd need to do is to pivot the rows in table2 to get the two data_values as columns, rather than rows. Then you can join that back to table1 like so:
select a.id,
b.data_value_500,
b.data_value_1
from table1 a
inner join (select f_key,
max(case when related_data = 500 then data_value end) data_value_500,
max(case when related_data = 1 then data_value end) data_value_1
from table2
group by f_key) b on (a.p_key = b.f_key)
where a.id = 'OneValue';
N.B. I've used an inner join under the assumption that there will always be a row present in table2 for each p_key in table1. If that's not the case, switch to a left outer join instead.
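Here is a minimal sketch of the conditional-aggregation pivot with the LEFT JOIN variant from the note. It runs in SQLite through Python's sqlite3 module with hypothetical rows. SQLite has no PIVOT, so only the CASE form is shown, and the WHERE a.id filter is dropped so every ID is visible. The ID NoRows has no table2 rows at all, which is exactly the case an inner join would drop:

```python
import sqlite3

# Hypothetical rows: OneValue has only RELATED_DATA = 500, Other has neither,
# and NoRows has no table2 rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (p_key INTEGER, id TEXT);
    CREATE TABLE table2 (p2_key INTEGER, related_data INTEGER, f_key INTEGER, data_value TEXT);
    INSERT INTO table1 VALUES (1, 'OneValue'), (2, 'Other'), (3, 'NoRows');
    INSERT INTO table2 VALUES (10, 500, 1, 'x'), (11, 7, 1, 'y'), (12, 7, 2, 'z');
""")

rows = conn.execute("""
    SELECT a.id, b.data_value_500, b.data_value_1
    FROM table1 a
    LEFT JOIN (SELECT f_key,
                      -- one column per RELATED_DATA value; NULL when absent
                      MAX(CASE WHEN related_data = 500 THEN data_value END) AS data_value_500,
                      MAX(CASE WHEN related_data = 1   THEN data_value END) AS data_value_1
               FROM table2
               GROUP BY f_key) b ON a.p_key = b.f_key
    ORDER BY a.id
""").fetchall()

print(rows)  # [('NoRows', None, None), ('OneValue', 'x', None), ('Other', None, None)]
```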
Also, if you're on Oracle 11g or above, it's possible to use the built in PIVOT functionality to do the pivoting of table2:
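select a.id,
b.data_value_500,
b.data_value_1
from table1 a
inner join (select f_key,
data_value_1,
data_value_500
-- pivot only the needed columns: PIVOT implicitly groups by every column
-- not referenced in the pivot clause, so P2_KEY must be left out
from (select f_key, related_data, data_value
from table2)
pivot (max(data_value) for related_data in (1 as data_value_1,
500 as data_value_500))) b
on (a.p_key = b.f_key)
where a.id = 'OneValue';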