PostgreSQL condition: one table equals another - sql

I have seen commands like DELETE FROM A USING B WHERE A=B in PostgreSQL scripts. Can anyone point me to a reference and explain the logic behind the A=B? Is it a good way to match all columns of two tables? Many thanks!

This is a form of JOIN. Postgres does not let you JOIN the target table directly in a DELETE, so the second table goes in USING and the join condition goes in WHERE.
The equivalent SELECT would be:
select . . .
from a join b
on a.a = b.b;
You can also express this using a correlated subquery:
delete from a
where exists (select 1 from b where b.b = a.a);
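A minimal sketch of the correlated-subquery form, run against an in-memory SQLite database only so that it works without a server. The table contents are made up for illustration, and only the EXISTS variant is executed here; the USING variant is PostgreSQL syntax and appears only in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Delete every row of a whose value also appears in b.
# PostgreSQL spelling of the same delete: DELETE FROM a USING b WHERE a.a = b.b
conn.execute("DELETE FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.b = a.a)")

remaining = [row[0] for row in conn.execute("SELECT a FROM a ORDER BY a")]
print(remaining)  # [1]
```

Only the value 1 survives, because 2 and 3 both have a match in b.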

Related

SAS SQL - Two tables (A and B) where the only shared field is the key - want everything in A that is NOT in B

I have two tables where the fields are different except for a shared key. I need to only keep the records with keys that are in A and NOT in B. I don't want records that are only in B or records that are in both A and B (so to exclude anything in the inner join).
I see SAS SQL references to "EXCEPT" but it seems that can only be used if all fields are shared across the two tables since a key is not used. Is there another way?
Do you have to use SQL?
data want ;
merge A (in=in1) B(keep=id in=in2);
by id;
if in1 and not in2 ;
run;
Just use NOT EXISTS:
proc sql;
select a.*
from a
where not exists (select 1 from b where a.key = b.key);
You could use the exists operator:
SELECT *
FROM a
WHERE NOT EXISTS (SELECT *
FROM b
WHERE a.id = b.id)
One more approach with EXCEPT is to get all ids (or whatever the key column is) in A that are not in B, then use those ids to get all records from A.
select a.*
from a
inner join (select id from a except select id from B) t
on a.id = t.id
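To check that the NOT EXISTS form and the EXCEPT-on-key form agree, here is a small sketch using SQLite from Python rather than SAS. The column names other than id and all of the data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);      -- hypothetical columns
    CREATE TABLE b (id INTEGER, amount REAL);    -- only id is shared
    INSERT INTO a VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO b VALUES (2, 10.0), (4, 99.5);
""")

# Anti-join: keep rows of a that have no partner in b.
not_exists = conn.execute("""
    SELECT a.* FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.id = b.id)
    ORDER BY a.id
""").fetchall()

# EXCEPT works here because it is applied to the key column only.
except_on_key = conn.execute("""
    SELECT a.* FROM a
    INNER JOIN (SELECT id FROM a EXCEPT SELECT id FROM b) t ON a.id = t.id
    ORDER BY a.id
""").fetchall()

print(not_exists)     # [(1, 'ann'), (3, 'cat')]
print(except_on_key)  # [(1, 'ann'), (3, 'cat')]
```

The row with id 4 exists only in b, so it never shows up in either result.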

SQL JOIN that uses OR in the ON statement

I’m running a SQL query on Google BigQuery and want to do this kind of SQL command:
SELECT ... FROM A JOIN B
ON A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
This fails though with the error:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.
Is there a way to rewrite the SQL to get this kind of functionality?
Turns out this works; the comma-separated subqueries are equivalent to a UNION ALL statement in Google BigQuery. Not sure how to do it if you just want a UNION, since DISTINCT is actually not supported in BigQuery. Luckily it's enough for me as is.
SELECT ... FROM
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col2=B.col2),
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col3=B.col3)
This should work:
SELECT ... FROM A CROSS JOIN B
WHERE A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
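One thing worth checking before swapping the OR for a UNION ALL of two joins: a pair of rows that satisfies both conditions comes back twice. The sketch below shows this with SQLite from Python, not BigQuery, on made-up data where row 1 matches on both col2 and col3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    CREATE TABLE b (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO a VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);
    -- row 1 matches on col2 and col3, row 2 on col2 only, row 3 on col3 only
    INSERT INTO b VALUES (1, 10, 100), (2, 20, 999), (3, 0, 300);
""")

or_join = conn.execute("""
    SELECT a.col1 FROM a JOIN b
      ON a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3)
    ORDER BY 1
""").fetchall()

union_all = conn.execute("""
    SELECT a.col1 FROM a JOIN b ON a.col1 = b.col1 AND a.col2 = b.col2
    UNION ALL
    SELECT a.col1 FROM a JOIN b ON a.col1 = b.col1 AND a.col3 = b.col3
    ORDER BY 1
""").fetchall()

union_distinct = conn.execute("""
    SELECT a.col1 FROM a JOIN b ON a.col1 = b.col1 AND a.col2 = b.col2
    UNION
    SELECT a.col1 FROM a JOIN b ON a.col1 = b.col1 AND a.col3 = b.col3
    ORDER BY 1
""").fetchall()

print(or_join)         # [(1,), (2,), (3,)]
print(union_all)       # [(1,), (1,), (2,), (3,)]
print(union_distinct)  # [(1,), (2,), (3,)]
```

UNION removes the extra copy here, but it would also collapse genuinely distinct matches that happen to produce identical output rows, so it is not a drop-in replacement for the OR join either.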

Difference between tables with the same structure

I have two tables with the same structure and with slightly different rows - Table A, and Table B.
I would like to extract all the rows that are contained in table A but not in Table B.
Can you help me do that?
By the way, Table A exists only as a definition; it has not been created yet.
Additionally, I have 15 SQL scripts to analyse.
I would like to find some software that can help me visualize the entire process (composed of the 15 SQL scripts).
Can you suggest something good?
try
SELECT * FROM Table_A
EXCEPT
SELECT * FROM Table_B
See http://en.wikipedia.org/wiki/Set_operations_%28SQL%29#EXCEPT_operator
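A short sketch of how EXCEPT behaves on two same-structure tables, using SQLite from Python. The column names f1 and f2 and the rows are assumptions made for the example. Two details are easy to miss: the result is de-duplicated, and for set operations SQLite treats two NULLs as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (f1 INTEGER, f2 TEXT);   -- hypothetical columns
    CREATE TABLE Table_B (f1 INTEGER, f2 TEXT);
    INSERT INTO Table_A VALUES (1, 'x'), (2, NULL), (3, 'z'), (3, 'z');
    INSERT INTO Table_B VALUES (1, 'x'), (2, NULL);
""")

only_in_a = conn.execute("""
    SELECT * FROM Table_A
    EXCEPT
    SELECT * FROM Table_B
    ORDER BY 1
""").fetchall()

print(only_in_a)  # [(3, 'z')]
```

The (2, NULL) row is treated as present in both tables, and the duplicated (3, 'z') row is returned once.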
One way is to use a left outer join: it selects all rows in the first table and then matches them against the second. If the extra columns coming from the second table are NULL, then there is no matching record in the second table.
Suppose columns a to c are unique in both tables:
select a.*
from tableA a
left outer join tableB b on a.a = b.a and ... and a.c = b.c
where b.a is null and ... and b.c is null
I faced the same problem on a regular basis, so I wrote my own software that can handle large databases (dozens of columns, tens of thousands of lines) efficiently. I imagine you have solved your problem by now, but I am posting here in case anybody else faces the same one.
The software is written in R and can query and save to a MySQL server. To test it out, though, it may be easier to export your tables to two CSV files, as configuring the MySQL link (via RMySQL) may take a little time. Check it out on GitHub.
We use it on a very regular basis in my team and are happy with it.
A pain in the butt to write the query manually, so there are tools (like RedGate's SQL Compare) that do it for you. But...
SELECT
A.*
,B.*
FROM
A LEFT OUTER JOIN B
ON A.Field1 = B.Field1
AND A.Field2 = B.Field2
... -- join on each field
WHERE
B.Field1 IS NULL OR
B.Field2 IS NULL OR
... -- check for any NULL fields in B
If you're not interested in all data differences and only key differences then just change the list of fields you join on and filter on to the key fields.
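A caveat with joining on every field: a plain = never matches NULL to NULL, so a row with a NULL in a joined column is reported as missing from B even when an identical row exists there. The sketch below shows this with SQLite from Python. The data is invented, Field1 is assumed to be a non-nullable key, and the null-safe IS operator used in the second query is SQLite's spelling (PostgreSQL would use IS NOT DISTINCT FROM).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Field1 INTEGER NOT NULL, Field2 TEXT);
    CREATE TABLE B (Field1 INTEGER NOT NULL, Field2 TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, NULL), (3, 'z');
    INSERT INTO B VALUES (1, 'x'), (2, NULL);
""")

# Plain equality: (2, NULL) fails to match its twin in B.
plain = conn.execute("""
    SELECT A.* FROM A LEFT OUTER JOIN B
      ON A.Field1 = B.Field1 AND A.Field2 = B.Field2
    WHERE B.Field1 IS NULL
    ORDER BY A.Field1
""").fetchall()

# Null-safe comparison: NULL IS NULL is true, so the twin is found.
null_safe = conn.execute("""
    SELECT A.* FROM A LEFT OUTER JOIN B
      ON A.Field1 IS B.Field1 AND A.Field2 IS B.Field2
    WHERE B.Field1 IS NULL
    ORDER BY A.Field1
""").fetchall()

print(plain)      # [(2, None), (3, 'z')]
print(null_safe)  # [(3, 'z')]
```

Because Field1 is declared NOT NULL, testing B.Field1 IS NULL alone is enough to detect an unmatched row.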

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (produces 0 or 1 copies of the source table row)? Assert as in causes the query to fail or otherwise indicate that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join", but most RDBMSs only support this in the form of EXISTS.
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? Don't have Postgres but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.
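This is still a separate verification query rather than the one-step assertion the question asks for, but it is easy to wrap. A sketch with SQLite from Python follows. The helper name rows_that_fan_out, the note column and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER);
    CREATE TABLE TableB (id INTEGER, note TEXT);   -- note is a made-up column
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 'only'), (2, 'first'), (2, 'second');
""")

def rows_that_fan_out(conn):
    """Return (id, match_count) for each TableA row joined to 2+ TableB rows."""
    return conn.execute("""
        SELECT a.id, COUNT(b.id)
        FROM TableA a
        JOIN TableB b ON a.id = b.id
        GROUP BY a.id
        HAVING COUNT(b.id) > 1
        ORDER BY a.id
    """).fetchall()

fanout = rows_that_fan_out(conn)
print(fanout)  # [(2, 2)]

# A caller that expects a 1:1 join could fail loudly instead of continuing.
is_one_to_one = not fanout
print(is_one_to_one)  # False
```

An empty result means the join produces at most one copy of each TableA row.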

SQL (any) Request for insight on a query optimization

I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).
I want to know if there is any gain from the following, and more pressing, will it even give the desired results.
select a.* from a where a.id in (select id from b where b.id = a.id)
as an alternative to:
select a.* from a where a.id in (select id from b)
Update:
MySQL
Can't be more specific sorry
table a is effectively a join between 7 different tables.
use of * is for examples
Edit, b doesn't get selected
Your question was about the difference between these two:
select a.* from a where a.id in (select id from b where b.id = a.id)
select a.* from a where a.id in (select id from b)
The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.
The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.
I would use the latter.
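Both queries you list are the equivalent of the following, as long as b.id is unique (if it is not, the join returns a row of a once per matching row of b):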
select a.*
from a
inner join b on b.id = a.id
Almost all optimizers will execute them in the same way.
You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.
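As a rough illustration of comparing plans for the three formulations, here is a sketch that uses SQLite's EXPLAIN QUERY PLAN from Python. This is SQLite, not MySQL, so the plans say nothing about what the asker's server does. The schema and index are assumptions, and the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, payload TEXT);  -- assumed schema
    CREATE TABLE b (id INTEGER);
    CREATE INDEX b_id ON b (id);
""")

queries = {
    "in": "SELECT a.* FROM a WHERE a.id IN (SELECT id FROM b)",
    "exists": "SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)",
    "join": "SELECT a.* FROM a INNER JOIN b ON b.id = a.id",
}

plans = {}
for name, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable description.
    plans[name] = [row[-1] for row in rows]
    print(name, plans[name])
```

On MySQL the corresponding step would be to prefix each query with EXPLAIN and compare the output.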
YMMV, but I've often found using EXISTS instead of IN makes queries run faster.
SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
Of course, without seeing the rest of the query and the context, this may not make the query any faster.
JOINing may be preferable, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you would more than likely go backwards in terms of optimization.
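The duplicate issue is easy to see on a tiny example. This sketch uses SQLite from Python with made-up data in which id 2 appears twice in b. It is meant to show the difference in results, not in speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);   -- id 2 is duplicated in b
""")

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

with_in = ids("SELECT a.id FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id")
with_correlated_in = ids(
    "SELECT a.id FROM a WHERE a.id IN (SELECT id FROM b WHERE b.id = a.id) ORDER BY a.id"
)
with_exists = ids(
    "SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
)
with_join = ids("SELECT a.id FROM a JOIN b ON a.id = b.id ORDER BY a.id")
with_distinct_join = ids("SELECT DISTINCT a.id FROM a JOIN b ON a.id = b.id ORDER BY a.id")

print(with_in)             # [2, 3]
print(with_exists)         # [2, 3]
print(with_join)           # [2, 2, 3]
print(with_distinct_join)  # [2, 3]
```

IN and EXISTS only test for membership, so each row of a appears at most once, while the plain join repeats id 2 once per matching row of b.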
I would never use a subquery like this. A join would be much faster.
select a.*
from a
join b on a.id = b.id
Of course, don't use select * either (especially not when doing a join, as at least one field is repeated); it wastes network resources to send unneeded data.
Have you looked at the execution plan?
How about
select a.*
from a
inner join b
on a.id = b.id
presumably the id fields are primary keys?
Select a.* from a
inner join (Select distinct id from b) c
on a.id = c.id
I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)
Since you are not selecting any other fields from B, I prefer to use WHERE ... IN (SELECT ...). Anyone looking at the query will know what you are trying to do (only show rows in a if they are in b).
Your problem is most likely in the seven tables within "a".
Make the FROM table contain the "a.id".
Make the next join: inner join b on a.id = b.id
Then join in the other six tables.
You really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help.