PostgreSQL query to check if a row is referenced from multiple tables - sql

I have one master table A and two sub-tables (B, C), each of which references table A through a foreign key. I want to check whether a row of A with key fk-1 is referenced from table B or C.
I tried selecting the row from A with an EXISTS clause on each of B and C, both filtered on fk-1 and OR'ed together, and got the expected result.
SELECT A.id FROM A where A.id = fk-1 AND
(
EXISTS (select B.id from B where B.fk_1 = fk-1)
OR EXISTS (select C.id from C where C.fk_1 = fk-1)
);
Can this be optimised, or is there a better way to do this?
Thanks in advance.

For a single check, that is the fastest approach, provided A.id, B.fk_1 and C.fk_1 are indexed.
A common pitfall is running this SQL once for every row you want to check. Checking all rows in one query can be much faster per row checked.
So if you want to check a batch of them at the same time, you could do:
SELECT A.id FROM A WHERE A.id IN (
SELECT B.fk_1 FROM B [WHERE xxx]
UNION SELECT C.fk_1 FROM C [WHERE xxx])
Replace [WHERE xxx] with a WHERE clause that filters the rows you are interested in.
One recommended filter is "WHERE B.fk_1 IS NOT NULL", to exclude records without an FK.
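As a rough illustration of the batch check, here is a small Python sketch that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. Only the table and column names (A.id, B.fk_1, C.fk_1) come from the question; the sample rows and index names are made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption:
# the query below uses only syntax common to both).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY, fk_1 INTEGER REFERENCES A(id));
    CREATE TABLE C (id INTEGER PRIMARY KEY, fk_1 INTEGER REFERENCES A(id));
    CREATE INDEX b_fk_1 ON B (fk_1);
    CREATE INDEX c_fk_1 ON C (fk_1);

    -- Made-up sample data: 1 is referenced from B, 2 from C,
    -- 3 from both, 4 from neither.
    INSERT INTO A (id) VALUES (1), (2), (3), (4);
    INSERT INTO B (fk_1) VALUES (1), (3), (NULL);
    INSERT INTO C (fk_1) VALUES (2), (3);
""")

# One query checks every row of A at once instead of one query per id.
referenced_ids = [row[0] for row in conn.execute("""
    SELECT A.id FROM A WHERE A.id IN (
        SELECT B.fk_1 FROM B WHERE B.fk_1 IS NOT NULL
        UNION
        SELECT C.fk_1 FROM C WHERE C.fk_1 IS NOT NULL)
    ORDER BY A.id
""")]
print(referenced_ids)
```

Whether this beats repeated single-row checks on a real PostgreSQL instance depends on data volume and indexes, so it is worth comparing the plans with EXPLAIN ANALYZE.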

Related

SQL Inner Join w/ Unique Vals

Questions similar to this one about using DISTINCT values in an INNER JOIN have been asked a few times, but I don't see my (simple) use case.
Problem Description:
I have two tables Table A and Table B. They can be joined via a variable ID. Each ID may appear on multiple rows in both Table A and Table B.
I would like to INNER JOIN Table A and Table B on the distinct values of ID which appear in Table B and select all rows of Table A with a Table A.ID which appears matching some condition in Table B.
What I want:
I want to make sure I get only one copy of each row of Table A with a Table A.ID matching a Table B.ID which satisfies [some condition].
What I would like to do:
SELECT * FROM TABLE A
INNER JOIN (
SELECT DISTINCT ID FROM TABLE B WHERE [some condition]
) ON TABLE A.ID=TABLE B.ID
Additionally:
As a further (really dumb) constraint, I can't say anything about the SQL standard in use, since I'm executing the SQL query through Stata's odbc load command on a database I have no information about beyond the variable names and the fact that "it does accept SQL queries," ( <- this is the extent of the information I have).
If you want all rows in a that match an id in b, then use exists:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
Trying to use join just complicates matters, because it both filters and generates duplicates.
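The difference can be seen on a tiny made-up data set. The sketch below uses Python's sqlite3 module purely as a convenient test bed (the question does not say which database is behind the ODBC connection); the val column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ids repeat in both tables, as described in the question
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'a1'), (1, 'a2'), (2, 'a3'), (3, 'a4');
    INSERT INTO b VALUES (1), (1), (2);
""")

# A plain inner join repeats each row of a once per matching row of b.
join_vals = [r[0] for r in conn.execute("""
    SELECT a.val FROM a INNER JOIN b ON a.id = b.id ORDER BY a.val
""")]

# EXISTS only filters: each row of a appears at most once.
exists_vals = [r[0] for r in conn.execute("""
    SELECT a.val FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    ORDER BY a.val
""")]

print(join_vals)    # rows with id 1 are duplicated
print(exists_vals)  # one copy of each matching row
```

Any extra condition on b would go inside the EXISTS subquery, next to the b.id = a.id correlation.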

SQL joining without common keys

If I have tables with the following attributes:
A: id, race, key1
B: key1, driving_id
C: driving_id, fines
why would it be possible for us to have the following queries:
select A.id, A.race, B.key1, B.driving_id, C.fines
from A
left join B on A.key1=B.key1
left join C on B.driving_id= C.driving_id
even though there are no common keys for A and C in the last line of the SQL query?
The query that you have written is parsed as:
select A.id, A.race, B.key1, B.driving_id, C.fines
from (A left join
B
on A.key1 = B.key1
) left join
C
on B.driving_id = C.driving_id;
That is, C is -- logically -- being joined to the result of A and B. Any keys from those tables would be valid.
Although your original query is the preferable way to write it, you could also write:
select ab.id, ab.race, ab.key1, ab.driving_id, C.fines
from (select . . . -- whatever columns you need
from A left join
B
on A.key1 = B.key1
) ab left join
C
on ab.driving_id = C.driving_id;
The three versions are all equivalent, but the last one may help you better understand what is going on with joins between multiple tables.
Without seeing sample data from the three tables, we cannot know for sure whether the query makes sense or would even run. Assuming it does run, there is nothing wrong with the join logic. For example, it is perfectly possible for table B to have a key key1 which relates to the A table, while at the same time having another key driving_id which relates to the C table. Note that either of these keys (but not both) could be a primary key in the B table, and if not then each key would be a foreign key.
The LEFT JOIN keyword returns all records from the left table (A) and the matched records from the right table (B). Similarly, the second LEFT JOIN returns all records from the result of the first join and the matched records from the right table (C). Where there is no match, the columns from the right side are NULL.
So A & C have a link through table B.
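To make the chaining concrete, here is a minimal sketch in Python with sqlite3 (used only as a handy SQL engine; the question does not name a database). The table and column names follow the question, while the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, race TEXT, key1 INTEGER);
    CREATE TABLE B (key1 INTEGER, driving_id INTEGER);
    CREATE TABLE C (driving_id INTEGER, fines INTEGER);

    -- Invented data: A.id 1 reaches C through B, A.id 2 reaches B only,
    -- A.id 3 has no match in B at all.
    INSERT INTO A VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30);
    INSERT INTO B VALUES (10, 100), (20, 200);
    INSERT INTO C VALUES (100, 50);
""")

rows = conn.execute("""
    SELECT A.id, A.race, B.key1, B.driving_id, C.fines
    FROM A
    LEFT JOIN B ON A.key1 = B.key1
    LEFT JOIN C ON B.driving_id = C.driving_id
    ORDER BY A.id
""").fetchall()

for row in rows:
    print(row)
```

Every row of A survives both joins; the columns taken from B and C come back as NULL (None in Python) wherever the chain through B breaks.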

SAS SQL - Two tables (A and B) where the only shared field is the key - want everything in A that is NOT in B

I have two tables where the fields are different except for a shared key. I need to only keep the records with keys that are in A and NOT in B. I don't want records that are only in B or records that are in both A and B (so to exclude anything in the inner join).
I see SAS SQL references to "EXCEPT" but it seems that can only be used if all fields are shared across the two tables since a key is not used. Is there another way?
Do you have to use SQL?
data want ;
merge A (in=in1) B(keep=id in=in2);
by id;
if in1 and not in2 ;
run;
Just use NOT EXISTS:
proc sql;
select a.*
from a
where not exists (select 1 from b where a.key = b.key);
quit;
You could use the exists operator:
SELECT *
FROM a
WHERE NOT EXISTS (SELECT *
FROM b
WHERE a.id = b.id)
One more approach with except is to get all id's (or the key column) in A that are not in B. Then use those ids to get all records from A.
select a.*
from a
inner join (select id from a except select id from B) t
on a.id = t.id
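The NOT EXISTS and EXCEPT variants should return the same rows. The sketch below checks that on made-up data using Python's sqlite3 module rather than SAS, so it only illustrates the SQL logic; the columns x and y and the sample rows are assumptions, and SAS-specific behaviour is not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the key column id is shared between the two tables.
    CREATE TABLE a (id INTEGER, x TEXT);
    CREATE TABLE b (id INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'only in a'), (2, 'in both'), (3, 'only in a');
    INSERT INTO b VALUES (2, 'in both'), (4, 'only in b');
""")

# Anti-join with NOT EXISTS: rows of a whose key is absent from b.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT a.* FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.id = b.id)
    ORDER BY a.id
""")]

# EXCEPT on the key column only, joined back to a for the other columns.
except_ids = [r[0] for r in conn.execute("""
    SELECT a.* FROM a
    INNER JOIN (SELECT id FROM a EXCEPT SELECT id FROM b) t
        ON a.id = t.id
    ORDER BY a.id
""")]

print(not_exists_ids, except_ids)
```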

Delete Query using Inner joins on more than two tables

I want to delete records from a table using inner joins on more than two tables. Say if I have tables A,B,C,D with A's pk shared in all other mentioned tables. Then how to write a delete query to delete records from table D using inner joins on table B and A since the conditions are fetched from these two tables. I need this query from DB2 perspective. I am not using IN clause or EXISTS because of their limitations.
From your description, I take the schema as:
A(pk_A, col1, col2, ...)
B(pk_B, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
C(pk_c, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
D(pk_d, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
As you say, DB2 will allow only 1000 rows to be deleted if an IN clause is used. I don't know about DB2, but Oracle allows only 1000 literal values inside an IN clause. There is no such limit on subquery results, in Oracle at least. EXISTS should not be a problem, as any database, including Oracle and DB2, checks only for the existence of rows, be it one or a million.
There are three scenarios on deleting data from table D:
You want to delete data from table D in which fk_A (naturally) refers to a record in table A using column A.pk_A:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
WHERE a.pk_A = d.fk_A
);
You want to delete data from table D in which fk_A refers to a record in table A, and that record in table A is also referred to by column B.fk_A. We do not want to delete the data from D that is in A but not in B. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
WHERE a.pk_A = d.fk_A
);
The third scenario is when we have to delete data in table D that refers to a record in table A, and that record in A is also referred by columns B.fk_A and table C.fk_A. We want to delete only that data from table D which is common in all the four tables - A, B, C and D. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
INNER JOIN c ON a.pk_A = c.fk_A
WHERE a.pk_A = d.fk_A
);
Depending upon your requirement you can incorporate one of these queries.
Note that the "=" operator would return an error if the subquery retrieved more than one row. Also, I don't know whether DB2 supports the ANY or ALL keywords, so I used the simple but powerful EXISTS keyword, which typically performs at least as well as IN, ANY and ALL.
Also, you can observe here that the subqueries inside the EXISTS clause use "SELECT 1", not "SELECT a.pk" or some other column. This is because EXISTS, in any database, looks for only existence of rows, not for any particular values inside the columns.
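As a quick sanity check of the second scenario, here is a sketch in Python with sqlite3 standing in for DB2 (an assumption: the DELETE ... WHERE EXISTS form is standard SQL, but DB2 itself was not used here). Table and column names follow the schema above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk_A INTEGER PRIMARY KEY);
    CREATE TABLE b (pk_B INTEGER PRIMARY KEY, fk_A INTEGER);
    CREATE TABLE d (pk_d INTEGER PRIMARY KEY, fk_A INTEGER);

    INSERT INTO a (pk_A) VALUES (1), (2), (3);
    -- b refers to a-rows 1 (twice) and 2, but not 3.
    INSERT INTO b (fk_A) VALUES (1), (1), (2);
    -- d rows 1 and 2 point at a-rows that b also refers to.
    INSERT INTO d (pk_d, fk_A) VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
""")

# Scenario 2: delete from d only where the referenced a-row is also in b.
cur = conn.execute("""
    DELETE FROM d
    WHERE EXISTS (
        SELECT 1
        FROM a
        INNER JOIN b ON a.pk_A = b.fk_A
        WHERE a.pk_A = d.fk_A
    )
""")
deleted = cur.rowcount

remaining = [r[0] for r in conn.execute("SELECT pk_d FROM d ORDER BY pk_d")]
print(deleted, remaining)
```

The duplicate reference to a-row 1 in b makes the subquery return two rows for d row 1, which EXISTS tolerates but a scalar "=" comparison would not.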
Based on 'Using SQL to delete rows from a table using INNER JOIN to another table'
The key is that you specify the name of the table to be deleted from
as the SELECT. So, the JOIN and WHERE do the selection and limiting,
while the DELETE does the deleting. You're not limited to just one
table, though. If you have a many-to-many relationship (for instance,
Magazines and Subscribers, joined by a Subscription) and you're
removing a Subscriber, you need to remove any potential records from
the join model as well.
DELETE subscribers
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
delete from D
where fk in (select d.fk from D d, A a, B b where a.pk = b.fk and b.fk = d.fk)
This should work. Note the IN rather than "=": the subquery can return more than one row.

PostgreSQL libpq: PQfnumber and column aliases

Postgres libpq has a function PQfnumber, which returns the column number associated with the given column name.
Let's say I have a select:
select a.*, b.* from a, b where a.id = b.id
Now if I call
number = PQfnumber(pgresult, "a.id");
it will return -1.
The correct way is to call:
number = PQfnumber(pgresult, "id");
which returns the position of a.id. So how would I call the function to get the column number of b.id?
The only way around it seems to be to write a different select:
select a.id as a_id, a.*, b.id as b_id, b.* from a, b where a.id = b.id
number = PQfnumber(pgresult, "b_id");
Any other way around this?
No, you've found the right way.
Of course, with a.id = b.id in an inner join (as in the example code), why would you care which column you were looking at? Also, there are good reasons not to have just an id column as the primary key of every table. Even if a lot of tables have single-column integer keys, if you consistently name columns which hold a primary key to a given table, terser and more efficient syntax like JOIN ... USING is available.
If you use a construct like this:
number = PQfnumber(pgresult, "a.id");
then your query should contain a column alias like this:
SELECT a.id AS "a.id", b.* FROM a, b WHERE a.id = b.id;
You do have ambiguity in your code. Had you tried such a query in the PL/pgSQL language, you would have received the 42702: ambiguous_column exception.
I see only one way out here: you should give unique aliases to all the ambiguous columns of your query. In fact, it is good practice to give aliases to all columns; I always do so.