I currently have two tables with the same structure. Table A (as an example) has 10,000 rows, table B has 100,000 rows. I need to obtain the rows that are in Table B that are not in Table A, but only if certain fields are the same (and one is not).
Right now, the query is something like:
select *
from tableA A
where (A.field1, A.field2) in (select field1, field2 from tableB B)
and A.field3 not in (select field3 from tableB)
This works, but a JOIN-based solution would probably perform better. I have tried to write one, but all I get is a huge list of duplicated rows. Could someone point me in the right direction?
Based on your current query this is what it translates to as joins:
select *
from tableA A
inner join tableB B on A.field1 = B.field1 and A.field2 = B.field2
left outer join tableB C on A.field3 = C.field3
where c.field3 is null
A faster query would be:
select A.pk
from tableA A
inner join tableB B on A.field1 = B.field1 and A.field2 = B.field2
left outer join tableB C on A.field3 = C.field3
where c.field3 is null
group by A.pk
This would give you the rows you need to add to tableB because they aren't found.
Or you can just get the fields you want to pull over:
select A.field1, A.field2, A.field3
from tableA A
inner join tableB B on A.field1 = B.field1 and A.field2 = B.field2
left outer join tableB C on A.field3 = C.field3
where c.field3 is null
group by A.field1, A.field2, A.field3
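To see where the duplicated rows come from and what the GROUP BY does about them, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the pk column values are made up for illustration; only the join query itself comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (pk INTEGER PRIMARY KEY, field1 INT, field2 INT, field3 INT);
    CREATE TABLE tableB (pk INTEGER PRIMARY KEY, field1 INT, field2 INT, field3 INT);
    INSERT INTO tableA VALUES (1, 10, 20, 300), (2, 11, 21, 301);
    -- two tableB rows share (field1, field2) with tableA row 1
    INSERT INTO tableB VALUES (1, 10, 20, 900), (2, 10, 20, 901), (3, 11, 21, 301);
""")

join_sql = """
    SELECT A.pk
    FROM tableA A
    INNER JOIN tableB B ON A.field1 = B.field1 AND A.field2 = B.field2
    LEFT OUTER JOIN tableB C ON A.field3 = C.field3
    WHERE C.field3 IS NULL
"""

# Without GROUP BY, tableA row 1 appears once per matching tableB row.
plain = [row[0] for row in conn.execute(join_sql)]
# With GROUP BY, each qualifying tableA row appears once.
grouped = [row[0] for row in conn.execute(join_sql + " GROUP BY A.pk")]

print(plain)
print(grouped)
```

The inner join multiplies each tableA row by the number of tableB rows sharing its (field1, field2) pair, which is why the grouped variants are needed.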
[NOT] EXISTS is your friend:
SELECT *
FROM tableA A
WHERE EXISTS ( SELECT * FROM tableB B
WHERE A.field1 = B.field1
AND A.field2 = B.field2
)
AND NOT EXISTS ( SELECT * FROM tableB B
WHERE A.field3 = B.field3
);
Note: if the joined columns are NOT NULLable, the [NOT] EXISTS() version will behave exactly the same as the [NOT] IN version
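The NULL caveat is easy to check. The sketch below (Python's sqlite3, with invented sample rows) puts a NULL into tableB.field3 and compares only the field3 condition in its NOT IN and NOT EXISTS forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (field1 INT, field2 INT, field3 INT);
    CREATE TABLE tableB (field1 INT, field2 INT, field3 INT);
    INSERT INTO tableA VALUES (10, 20, 300);
    -- the second row has a NULL in field3
    INSERT INTO tableB VALUES (10, 20, 900), (10, 20, NULL);
""")

# 300 NOT IN (900, NULL) evaluates to NULL, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT * FROM tableA A
    WHERE A.field3 NOT IN (SELECT field3 FROM tableB)
""").fetchall()

# No tableB row satisfies A.field3 = B.field3, so NOT EXISTS is true.
not_exists_rows = conn.execute("""
    SELECT * FROM tableA A
    WHERE NOT EXISTS (SELECT 1 FROM tableB B WHERE A.field3 = B.field3)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

So with nullable columns the two versions can disagree, and NOT EXISTS is usually the one that matches what people expect.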
Reading the question text again (and again):
I need to obtain the rows that are in Table B that are not in Table A, but only if certain fields are the same (and one is not).
SELECT *
FROM tableB B
WHERE EXISTS ( SELECT * FROM tableA A
WHERE A.field1 = B.field1
AND A.field2 = B.field2
AND A.field3 <> B.field3
);
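A quick way to check this reading of the question is to run it on a few rows. This is only a sketch with made-up data, using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (field1 INT, field2 INT, field3 INT);
    CREATE TABLE tableB (field1 INT, field2 INT, field3 INT);
    INSERT INTO tableA VALUES (1, 1, 5);
    INSERT INTO tableB VALUES
        (1, 1, 5),   -- identical to the tableA row: not wanted
        (1, 1, 7),   -- same field1/field2, different field3: wanted
        (2, 2, 7);   -- no counterpart in tableA at all: not wanted
""")

rows = conn.execute("""
    SELECT *
    FROM tableB B
    WHERE EXISTS (SELECT * FROM tableA A
                  WHERE A.field1 = B.field1
                    AND A.field2 = B.field2
                    AND A.field3 <> B.field3)
    ORDER BY B.field3
""").fetchall()

print(rows)
```

Only the tableB row that shares field1 and field2 with a tableA row but differs in field3 comes back.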
My current SQL statement is as follows:
SELECT result FROM tableA WHERE
field1 = (SELECT x FROM tableB WHERE field3 = "XXX") OR
field2 = (SELECT x FROM tableB WHERE field3 = "XXX");
Can the two identical subqueries, SELECT x FROM tableB WHERE field3 = "XXX", be simplified into one?
I have tried using a JOIN, and the statement currently looks like this:
SELECT resultB from tableA a join
tableB b on
b.field4 = a.field1 or
b.field4 = a.field2 where
b.field3 = "XXX";
Furthermore, "XXX" comes from the following SQL statement:
SELECT DISTINCT(resultA) FROM tableC c WHERE c.field5 is false;
How can I combine these to get a result like the following?
resultA(1), resultB(1)
resultA(2), resultB(2)
resultA(3), resultB(3)
...
Thanks.
You can join the tables.
SELECT
a.result
FROM tableA a
join tableB b on b.x = a.field1 or b.x = a.field2
where b.field3 = 'XXX'
https://www.w3schools.com/sql/sql_join.asp
You can do it with EXISTS:
SELECT a.result
FROM tableA a
WHERE EXISTS (
SELECT 1
FROM tableB b
WHERE b.field3 = 'XXX' AND b.x IN (a.field1, a.field2)
)
Although you can "simplify" the logic, if you care about performance that is probably not a good route. Instead, I would suggest using exists twice:
SELECT a.result
FROM tableA a
WHERE EXISTS (SELECT 1
FROM tableB b
WHERE b.x = a.field1 AND b.field3 = 'XXX'
) OR
EXISTS (SELECT 1
FROM tableB b
WHERE b.x = a.field2 AND b.field3 = 'XXX'
);
This can make use of an index on tableB(x, field3).
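For comparison, here is a small sketch that runs the JOIN version and both EXISTS versions from the answers above against the same made-up data (Python's sqlite3). The index name idx_tableB_x_field3 is just a label chosen for this example, and whether a given engine actually uses the index depends on its planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (result TEXT, field1 INT, field2 INT);
    CREATE TABLE tableB (x INT, field3 TEXT);
    CREATE INDEX idx_tableB_x_field3 ON tableB (x, field3);
    INSERT INTO tableA VALUES ('r1', 1, 2), ('r2', 3, 4), ('r3', 5, 5);
    INSERT INTO tableB VALUES (1, 'XXX'), (2, 'XXX'), (4, 'YYY'), (5, 'XXX');
""")

def results(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# JOIN with OR: 'r1' matches two tableB rows, so it is returned twice.
join_rows = results("""
    SELECT a.result FROM tableA a
    JOIN tableB b ON b.x = a.field1 OR b.x = a.field2
    WHERE b.field3 = 'XXX'
    ORDER BY a.result
""")

# Single EXISTS with IN.
exists_in = results("""
    SELECT a.result FROM tableA a
    WHERE EXISTS (SELECT 1 FROM tableB b
                  WHERE b.field3 = 'XXX' AND b.x IN (a.field1, a.field2))
    ORDER BY a.result
""")

# Two EXISTS, each a simple equality lookup on (x, field3).
exists_twice = results("""
    SELECT a.result FROM tableA a
    WHERE EXISTS (SELECT 1 FROM tableB b
                  WHERE b.x = a.field1 AND b.field3 = 'XXX')
       OR EXISTS (SELECT 1 FROM tableB b
                  WHERE b.x = a.field2 AND b.field3 = 'XXX')
    ORDER BY a.result
""")

print(join_rows)
print(exists_in)
print(exists_twice)
```

Both EXISTS forms return each tableA row at most once, while the JOIN can repeat a row when field1 and field2 match different tableB rows.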
I have two tables and I want to select all values from "TABLE A" that have a different value in a column from "TABLE B".
I tried this
SELECT A.* FROM tableA A
left join tableB B ON A.id = B.id WHERE B.column <> 1;
But this just returns the value that I want to ignore.
SELECT A.*
FROM tableA A
INNER JOIN tableB B
ON A.id = B.id
WHERE B.column != 1;
or
SELECT A.* FROM tableA A WHERE A.Id NOT IN (SELECT B.Id FROM tableB B WHERE B.column = 1)
Depending on your SQL dialect, you can use <> or !=.
I would suggest not exists:
SELECT A.*
FROM tableA A
WHERE NOT EXISTS (SELECT 1 FROM tableB B WHERE A.id = B.id AND B.column = 1);
Using a JOIN can result in duplicated rows, if more than one row in B matches the JOIN condition.
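The duplication is easy to reproduce. In this sketch (Python's sqlite3, made-up rows) the placeholder column is called status instead of column, purely because COLUMN is an SQL keyword; that name is an assumption of the example, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INT);
    CREATE TABLE tableB (id INT, status INT);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES
        (1, 1),           -- id 1 only has the unwanted value
        (2, 2), (2, 3),   -- id 2 never has it
        (3, 1), (3, 2);   -- id 3 has it alongside another value
""")

# JOIN: one output row per matching tableB row.
join_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM tableA A
    INNER JOIN tableB B ON A.id = B.id
    WHERE B.status <> 1
    ORDER BY A.id
""")]

# NOT EXISTS: each tableA row at most once, and only if no tableB row has status = 1.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM tableA A
    WHERE NOT EXISTS (SELECT 1 FROM tableB B WHERE A.id = B.id AND B.status = 1)
    ORDER BY A.id
""")]

print(join_ids)
print(not_exists_ids)
```

Note that the two queries also answer slightly different questions: the JOIN keeps id 3 because one of its tableB rows has another value, while NOT EXISTS drops it because one of its rows has the unwanted value.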
I'm dealing with a pretty basic SQL query, but I cannot understand why the non-matching records are not shown with NULL values for the right table.
I have table A and table B with a composite key, and table B contains some rows that I know do not match any key in table A. However, the result set returns only rows with matching keys and none of the non-matching (NULL) records.
SELECT *
FROM TableA a LEFT JOIN
TableB b
ON a.Field1 = b.Field1 AND
a.Field2 = b.Field2
WHERE b.Field1 IS NULL
I was expecting to see records from table A and those records from table B that do not match to be represented by Nulls.
EDIT:
Link with sample data and tables:
https://drive.google.com/file/d/1PNlyqO4mwMBOGgQnWVlduiDKaDjSaE8v/view?usp=sharing
Last record in TableB should be seen because value for Field5 differs from value in TableA.
The problem with your attempt is that you start with the records in TableA and then LEFT JOIN against TableB. This makes the engine return every record from TableA, with the TableB columns filled in where they match, but never records from TableB that have no match in TableA.
Either you want to reverse the join order:
SELECT
*
FROM
TableB b
LEFT JOIN TableA a ON
a.Field1 = b.Field1 AND
a.Field2 = b.Field2
WHERE
a.Field1 IS NULL -- records from A table shouldn't exist
Or as RIGHT JOIN
SELECT
*
FROM TableA a RIGHT JOIN
TableB b
ON a.Field1 = b.Field1 AND
a.Field2 = b.Field2
WHERE a.Field1 IS NULL -- records from A table shouldn't exist
Or a FULL JOIN if you want records from both displayed, even if no match on the other table (no WHERE clause):
SELECT
*
FROM TableA a FULL JOIN
TableB b
ON a.Field1 = b.Field1 AND
a.Field2 = b.Field2
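The effect of the join direction can be checked with a couple of rows. This sketch uses Python's sqlite3 with invented data and sticks to LEFT JOIN, since RIGHT and FULL JOIN are only available in recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Field1 INT, Field2 INT);
    CREATE TABLE TableB (Field1 INT, Field2 INT);
    INSERT INTO TableA VALUES (1, 1), (2, 2);
    INSERT INTO TableB VALUES (1, 1), (3, 3);
""")

# Starting from TableA: finds TableA rows that have no partner in TableB.
a_only = conn.execute("""
    SELECT a.Field1, a.Field2
    FROM TableA a
    LEFT JOIN TableB b ON a.Field1 = b.Field1 AND a.Field2 = b.Field2
    WHERE b.Field1 IS NULL
""").fetchall()

# Starting from TableB: finds TableB rows that have no partner in TableA.
b_only = conn.execute("""
    SELECT b.Field1, b.Field2
    FROM TableB b
    LEFT JOIN TableA a ON a.Field1 = b.Field1 AND a.Field2 = b.Field2
    WHERE a.Field1 IS NULL
""").fetchall()

print(a_only)
print(b_only)
```

The table named first in a LEFT JOIN is the one whose unmatched rows survive, so the unmatched TableB row only shows up in the second query.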
A filter on TableB columns in the WHERE clause is applied after the join, which can make a LEFT JOIN behave like an inner join. Try moving the IS NULL check into the ON clause:
SELECT *
FROM TableA a LEFT JOIN
TableB b
ON a.Field1 = b.Field1 AND
a.Field2 = b.Field2 AND
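b.Field1 IS NULL AND b.Field2 IS NULL -- added
-- Note: b.Field1 cannot both equal a.Field1 and be NULL, so no TableB row ever
-- matches and every TableA row is returned with NULLs in the b columns.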
Just wondering.... what is the difference between
select
a.field1, b.field1
from
table1 a
inner join
table2 b on a.field2 = b.field2
and
select
a.field1, b.field1
from
table1 a
inner join
(select field1, field2
from table2 ) b on a.field2 = b.field2
I've seen this SQL query in one of the legacy systems that I am currently handling. I checked the execution plans immediately to compare them, but the results seem the same.
Sorry for being so ignorant. :)
I think this is done to optimise performance.
Consider your first query:
select a.field1, b.field1
from table1 a
inner join table2 b
on a.field2 = b.field2
You actually use only 2 columns of table2
(b.field1 to display and b.field2 to join the tables),
so there is no point in retrieving all the fields in table2.
If table2 contains heavy columns (e.g. image, blob), the query response time may be slower.
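As a rough check that the two forms are interchangeable in terms of results, here is a sketch in Python's sqlite3. The payload column is a hypothetical "heavy" column added for illustration. Whether the derived-table form is any faster depends on the optimizer; this only shows that both return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 TEXT, field2 INT);
    CREATE TABLE table2 (field1 TEXT, field2 INT, payload BLOB);
    INSERT INTO table1 VALUES ('a1', 1), ('a2', 2);
    INSERT INTO table2 VALUES
        ('b1', 1, zeroblob(1024)),
        ('b2', 2, zeroblob(1024)),
        ('b3', 3, zeroblob(1024));
""")

# Join directly against table2.
direct = conn.execute("""
    SELECT a.field1, b.field1
    FROM table1 a
    INNER JOIN table2 b ON a.field2 = b.field2
    ORDER BY a.field1
""").fetchall()

# Join against a derived table that exposes only the two needed columns.
derived = conn.execute("""
    SELECT a.field1, b.field1
    FROM table1 a
    INNER JOIN (SELECT field1, field2 FROM table2) b ON a.field2 = b.field2
    ORDER BY a.field1
""").fetchall()

print(direct)
print(derived)
```

Since the outer query only references b.field1 and b.field2 in either form, many optimizers produce the same plan for both, which matches what the asker saw.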
I've got a query that is:
SELECT DISTINCT a.field1, a.field2, a.field3, b.field1, b.field2, a.field4
FROM table1 a
JOIN table2 b ON b.fielda = a.fieldb
WHERE a.field1 = 'xxxx'
I run this and it returns three xxxx rows. I need all of the information listed above with the first field being distinct. Do I have the correct syntax for this?
In Postgres, you can use distinct on:
select distinct on (a.field1) a.field1, a.field2, a.field3, b.field1, b.field2, a.field4
from table1 a join
table2 b
on b.fielda = a.fieldb
where a.field1 = 'xxxx'
order by a.field1;
In either Postgres or SQL Server, you can use row_number():
select ab.*
from (select a.field1, a.field2, a.field3,
b.field1 as b_field1, b.field2 as b_field2, a.field4, -- aliased so ab.* has unique column names
row_number() over (partition by a.field1 order by a.field1) as seqnum
from table1 a join
table2 b
on b.fielda = a.fieldb
where a.field1 = 'xxxx'
) ab
where seqnum = 1;
Or, since you only want one row, you can use limit/top:
select a.field1, a.field2, a.field3, b.field1, b.field2, a.field4
from table1 a join
table2 b
on b.fielda = a.fieldb
where a.field1 = 'xxxx'
limit 1;
In SQL Server:
select top 1 a.field1, a.field2, a.field3, b.field1, b.field2, a.field4
from table1 a join
table2 b
on b.fielda = a.fieldb
where a.field1 = 'xxxx';
One option is to use row_number():
with cte as (
select distinct a.field1, a.field2, a.field3,
b.field1 as b_field1, b.field2 as b_field2, a.field4, -- aliased so the cte has unique column names
row_number() over (partition by a.field1 order by a.field1) rn
from table1 a
join table2 b on b.fielda = a.fieldb
where a.field1 = 'xxxx'
)
select *
from cte
where rn = 1
But you need to define which record to take. Because this orders by field1, the same column it partitions by, the row that gets rn = 1 is effectively arbitrary.
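One way to make the choice deterministic is to order the window by a column that actually distinguishes the rows. The sketch below (Python's sqlite3, which needs SQLite 3.25 or later for window functions) assumes "the row with the smallest field2" is the one wanted; that rule, the trimmed-down column list and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 TEXT, field2 INT, fieldb INT);
    CREATE TABLE table2 (field1 TEXT, fielda INT);
    INSERT INTO table1 VALUES ('xxxx', 3, 1), ('xxxx', 1, 2), ('xxxx', 2, 3);
    INSERT INTO table2 VALUES ('p', 1), ('q', 2), ('r', 3);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT a.field1, a.field2, b.field1 AS b_field1,
               -- order by field2 so rn = 1 is always the smallest field2
               ROW_NUMBER() OVER (PARTITION BY a.field1 ORDER BY a.field2) AS rn
        FROM table1 a
        JOIN table2 b ON b.fielda = a.fieldb
        WHERE a.field1 = 'xxxx'
    )
    SELECT field1, field2, b_field1
    FROM cte
    WHERE rn = 1
""").fetchall()

print(rows)
```

If field2 can have ties within a partition, add further columns to the ORDER BY until the ordering is unique.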
As you can read from your comments, DISTINCT cannot work for you. It gives you distinct rows. What you need instead is an aggregation, so as to get from three records to only one.
So the first comment you got (by sgeddes) is already the answer you need: "what values should the other columns have?". How should the DBMS know? You didn't tell it.
One row per field1 usually means GROUP BY field1. Then, for every other field, decide what you want to see: the maximum of field2, maybe? The minimum of field3? The average of field4?
select a.field1, max(a.field2), min(a.field3), count(b.field1), sum(b.field2), avg(a.field4)
from table1 a
join table2 b on b.fielda = a.fieldb
where a.field1 = 'xxxx'
group by a.field1;
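Here is a small sketch of the aggregation approach on made-up data, using Python's sqlite3. The column list is shortened and the choice of aggregates is arbitrary; the point is only that three joined rows collapse into one row per field1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 TEXT, field2 INT, fieldb INT);
    CREATE TABLE table2 (field1 TEXT, fielda INT);
    INSERT INTO table1 VALUES ('xxxx', 3, 1), ('xxxx', 1, 2), ('xxxx', 2, 3);
    INSERT INTO table2 VALUES ('p', 1), ('q', 2), ('r', 3);
""")

rows = conn.execute("""
    SELECT a.field1, MAX(a.field2), MIN(a.field2), COUNT(b.field1)
    FROM table1 a
    JOIN table2 b ON b.fielda = a.fieldb
    WHERE a.field1 = 'xxxx'
    GROUP BY a.field1
""").fetchall()

print(rows)
```

Each non-grouped column has to go through an aggregate, which is exactly where you tell the DBMS which value you want for it.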