What does it mean to INNER JOIN before an INSERT? - sql

I have the following case where I'm doing an insert into a table; however, before I can do that, I need to grab a foreign key ID that's associated with another table. That foreign key ID is not a simple lookup, but rather requires an INNER JOIN of two other tables to be able to get that ID.
So, what I'm currently doing is the following:
1. Inner joining A and B and grabbing the ID that I need.
2. Once I resolve the value from above, I insert into table C with the foreign key that I got from step 1.
Now, I was wondering if there is a better way for doing this. Could I do the join of table A and B and insert into table C all in one statement? This is where I was getting confused on what it means to INNER JOIN across tables and then INSERT. Are you potentially inserting into multiple tables?

You can use the insert-select syntax to insert the results of a query (which may or may not involve a join) to another table. E.g.:
INSERT INTO C
SELECT col_from_a, col_from_b
FROM a
JOIN b ON a.id = b.id
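As a minimal sketch of the insert-select pattern above, the following runs the whole lookup-and-insert in one statement against an in-memory SQLite database. The schema (a.name, b.a_id, b.code, c.b_id, c.note) and the sample values are hypothetical, invented only for illustration; SQLite is an assumption too, since the question does not name a database. It also shows that only table c receives rows, while a and b are merely read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, code TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER, note TEXT);
    INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO b VALUES (10, 1, 'x'), (20, 2, 'y');
""")

# One statement: the join resolves the foreign key, and only c is written to.
conn.execute("""
    INSERT INTO c (b_id, note)
    SELECT b.id, ?
    FROM a
    JOIN b ON b.a_id = a.id
    WHERE a.name = ? AND b.code = ?
""", ("first note", "alpha", "x"))

rows = conn.execute("SELECT b_id, note FROM c").fetchall()
a_count = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
b_count = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(rows, a_count, b_count)
```

Listing the target columns explicitly (b_id, note) keeps the statement valid even if c later gains columns.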

Related

SQL Inner Join w/ Unique Vals

Questions similar to this one about using DISTINCT values in an INNER JOIN have been asked a few times, but I don't see my (simple) use case.
Problem Description:
I have two tables Table A and Table B. They can be joined via a variable ID. Each ID may appear on multiple rows in both Table A and Table B.
I would like to INNER JOIN Table A and Table B on the distinct values of ID that appear in Table B, and select all rows of Table A whose Table A.ID matches a Table B.ID satisfying some condition.
What I want:
I want to make sure I get only one copy of each row of Table A with a Table A.ID matching a Table B.ID which satisfies [some condition].
What I would like to do:
SELECT A.* FROM A
INNER JOIN (
SELECT DISTINCT ID FROM B WHERE [some condition]
) b_ids ON A.ID = b_ids.ID
Additionally:
As a further (really dumb) constraint, I can't say anything about the SQL standard in use, since I'm executing the SQL query through Stata's odbc load command on a database I have no information about beyond the variable names and the fact that "it does accept SQL queries," ( <- this is the extent of the information I have).
If you want all rows in a that match an id in b, then use exists:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
Trying to use join just complicates matters, because it both filters and generates duplicates.
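To make the duplicate problem concrete, here is a small sketch using an in-memory SQLite database. The sample rows and the flag column (standing in for "[some condition]") are hypothetical. It compares a plain join, the EXISTS form, and a join against a DISTINCT derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, flag INTEGER);
    INSERT INTO a VALUES (1, 'a1'), (1, 'a2'), (2, 'a3'), (3, 'a4');
    INSERT INTO b VALUES (1, 1), (1, 1), (2, 0), (3, 1);
""")

# Plain join: each row of a is repeated once per matching row of b.
join_rows = conn.execute("""
    SELECT a.id, a.val FROM a
    JOIN b ON b.id = a.id
    WHERE b.flag = 1
    ORDER BY a.val
""").fetchall()

# EXISTS: each row of a appears at most once.
exists_rows = conn.execute("""
    SELECT a.id, a.val FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.flag = 1)
    ORDER BY a.val
""").fetchall()

# Join against the distinct ids of b: same rows as EXISTS here.
distinct_rows = conn.execute("""
    SELECT a.id, a.val FROM a
    JOIN (SELECT DISTINCT id FROM b WHERE flag = 1) d ON d.id = a.id
    ORDER BY a.val
""").fetchall()

print(len(join_rows), exists_rows, distinct_rows)
```

With this data the plain join returns more rows than the other two forms, because id 1 matches two rows in b.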

SQL joining without common keys

If I have tables with the following attributes:
A: id, race, key1
B: key1, driving_id
C: driving_id, fines
why would it be possible for us to have the following queries:
select A.id, A.race, B.key1, B.driving_id, C.fines
from A
left join B on A.key1=B.key1
left join C on B.driving_id= C.driving_id
even though there are no common keys for A and C in the last line of the SQL query?
The query that you have written is parsed as:
select A.id, A.race, B.key1, B.driving_id, C.fines
from (A left join
B
on A.key1 = B.key1
) left join
C
on B.driving_id = C.driving_id;
That is, C is -- logically -- being joined to the result of A and B. Any keys from those tables would be valid.
Although your original query is the preferable way to write it, you could also write:
select ab.id, ab.race, ab.key1, ab.driving_id, C.fines
from (select . . . -- whatever columns you need
from A left join
B
on A.key1 = B.key1
) ab left join
C
on ab.driving_id = C.driving_id;
The three versions are all equivalent, but the last one may help you better understand what is going on with joins between multiple tables.
Without seeing sample data from the three tables, we can't know for sure whether the query makes sense or would even run. Assuming it does run, there should be nothing wrong with the join logic. For example, it is perfectly possible for table B to have a key key1 which relates to the A table, while at the same time having another key driving_id which relates to the C table. Note that either of these keys (but not both) could be a primary key in the B table, and if neither is, then each would be a foreign key.
The LEFT JOIN keyword returns all records from the left table (A) and the matched records from the right table (B). Similarly, the second LEFT JOIN returns all records from the result of the first join and the matched records from the right table (C). Where there is no match, the columns from the right side are NULL.
So A & C have a link through table B.
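The chained left joins discussed above can be tried on a few rows. This is only a sketch: the sample data is made up, and SQLite is used as a convenient stand-in since the question names no database. It runs the original query and the derived-table rewrite and shows that rows of A survive even when B or C has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, race TEXT, key1 INTEGER);
    CREATE TABLE B (key1 INTEGER, driving_id INTEGER);
    CREATE TABLE C (driving_id INTEGER, fines INTEGER);
    INSERT INTO A VALUES (1, 'r1', 100), (2, 'r2', 200), (3, 'r3', 300);
    INSERT INTO B VALUES (100, 7), (200, 8);
    INSERT INTO C VALUES (7, 50);
""")

# C is joined to the result of (A LEFT JOIN B), so B.driving_id is in scope.
chained = conn.execute("""
    SELECT A.id, B.driving_id, C.fines
    FROM A
    LEFT JOIN B ON A.key1 = B.key1
    LEFT JOIN C ON B.driving_id = C.driving_id
    ORDER BY A.id
""").fetchall()

# The same logic with the first join wrapped in a derived table.
nested = conn.execute("""
    SELECT ab.id, ab.driving_id, C.fines
    FROM (SELECT A.id, B.driving_id
          FROM A LEFT JOIN B ON A.key1 = B.key1) ab
    LEFT JOIN C ON ab.driving_id = C.driving_id
    ORDER BY ab.id
""").fetchall()

print(chained)
print(nested)
```

Row 2 has a match in B but none in C, and row 3 has no match in B at all, so their missing columns come back as NULL (None in Python).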

How to drop one join key when joining two tables

I have two tables. Both have a lot of columns. They share a common column called ID, on which I would join.
Now since this variable ID is present in both the tables if I do simply this
select a.*,b.*
from table_a as a
left join table_b as b on a.id=b.id
This will give an error as id is duplicate (present in both the tables and getting included for both).
I don't want to write down separately each column of b in the select statement. I have lots of columns and that is a pain. Can I rename the ID column of b in the join statement itself similar to SAS data merge statements?
I am using Postgres.
Postgres would not give you an error for duplicate output column names, but some clients do. (Duplicate names are also not very useful.)
Either way, use the USING clause as join condition to fold the two join columns into one:
SELECT *
FROM tbl_a a
LEFT JOIN tbl_b b USING (id);
If you join a table to itself (self-join), there will be more duplicate column names, and the query would hardly make sense to begin with. It starts to make sense for different tables, as you stated in your question: I have two tables ...
To avoid all duplicate column names, you have to list them in the SELECT clause explicitly - possibly dealing out column aliases to get both instances with different names.
Or you can use a NATURAL join - if that fits your unexplained use case:
SELECT *
FROM tbl_a a
NATURAL LEFT JOIN tbl_b b;
This joins on all columns that share the same name and folds those automatically - exactly the same as listing all common column names in a USING clause. You need to be aware of rules for possible NULL values ...
Details in the manual.
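The column folding described above can be observed by inspecting the result column names. The sketch below uses SQLite as a stand-in (an assumption; the question is about Postgres, where the answer describes the same folding), with hypothetical columns a_val and b_val.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER, a_val TEXT);
    CREATE TABLE tbl_b (id INTEGER, b_val TEXT);
    INSERT INTO tbl_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO tbl_b VALUES (1, 'b1');
""")

def column_names(sql):
    """Return the output column names of a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# ON keeps both id columns; USING and NATURAL fold them into one.
on_cols = column_names(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b ON a.id = b.id")
using_cols = column_names(
    "SELECT * FROM tbl_a a LEFT JOIN tbl_b b USING (id)")
natural_cols = column_names(
    "SELECT * FROM tbl_a a NATURAL LEFT JOIN tbl_b b")

print(on_cols, using_cols, natural_cols)
```

NATURAL matches on every shared column name, so it silently changes meaning if the tables later gain another common column; USING (id) is the more explicit choice.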

insert multiple records into multiple columns of a table from many tables

I want to insert multiple records into multiple columns of a table from many tables. Below is my query, but it only inserts the records into the first column. The other columns are populated with nulls. Can you let me know what I am doing wrong?
INSERT INTO [dbo].[dim_one_staging] ([Parent], [Child], [Child_Alias], [Operator])
SELECT
p.[Parent], c.[Child], a.[Child_Alias], o.[Child_Operator]
FROM
[dbo].[Staging_Parent] AS p
INNER JOIN
[dbo].[Staging_Child] AS c ON p.[id] = c.[id]
INNER JOIN
[dbo].[Staging_Child_Alias] AS a ON c.[id] = a.[id]
INNER JOIN
[dbo].[Staging_Operator] AS o ON a.[id] = o.[id]
Your query is syntactically correct. That doesn't mean it does what you want it to do.
It could be that you have no values in c.[Child], a.[Child_Alias], and o.[Child_Operator] for the records that meet the rest of the query conditions, and thus null is the correct value.
It could be that you have no values in the join tables for those fields but you should have values, in which case there is a bug in the way the data is being entered into these tables.
Or it could be that you are trying to get values from a table where the value is not required and put them into a table where it is, and thus need to use coalesce (or default values) to define what should go in there if the value is null.
Yet another possibility is that there is a trigger on the table that is nulling the values out.
Only you can determine what the problem is from the data structure you have and the meaning attached to the data. I don't know how to fix your problem because I don't actually understand your data model as far as meaning (as opposed to structure).
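Two of the checks suggested above can be sketched as follows: first count the joined rows whose source column is already NULL, then use COALESCE in the insert-select to supply a default. The tables are cut down to two columns, the 'UNKNOWN' default is a made-up placeholder, and SQLite stands in for SQL Server here, so treat this as an illustration rather than a fix for the actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_parent (id INTEGER, parent TEXT);
    CREATE TABLE staging_child (id INTEGER, child TEXT);
    CREATE TABLE dim_one_staging (parent TEXT, child TEXT);
    INSERT INTO staging_parent VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO staging_child VALUES (1, 'C1'), (2, NULL);
""")

# Diagnostic: how many joined rows already carry a NULL in the source column?
null_children = conn.execute("""
    SELECT COUNT(*)
    FROM staging_parent AS p
    INNER JOIN staging_child AS c ON p.id = c.id
    WHERE c.child IS NULL
""").fetchone()[0]

# COALESCE substitutes a default when the source value is NULL.
conn.execute("""
    INSERT INTO dim_one_staging (parent, child)
    SELECT p.parent, COALESCE(c.child, 'UNKNOWN')
    FROM staging_parent AS p
    INNER JOIN staging_child AS c ON p.id = c.id
""")

rows = conn.execute(
    "SELECT parent, child FROM dim_one_staging ORDER BY parent").fetchall()
print(null_children, rows)
```

If the diagnostic count is zero but the target still ends up with NULLs, the source data is not the cause, and the trigger possibility mentioned above is worth checking.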

Delete Query using Inner joins on more than two tables

I want to delete records from a table using inner joins on more than two tables. Say if I have tables A,B,C,D with A's pk shared in all other mentioned tables. Then how to write a delete query to delete records from table D using inner joins on table B and A since the conditions are fetched from these two tables. I need this query from DB2 perspective. I am not using IN clause or EXISTS because of their limitations.
From your description, I take the schema as:
A(pk_A, col1, col2, ...)
B(pk_B, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
C(pk_c, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
D(pk_d, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
You say DB2 will allow only 1000 rows to be deleted if an IN clause is used. I don't know about DB2, but Oracle allows only 1000 literal values inside an IN clause. There is no such limit on subquery results, in Oracle at least. EXISTS should not be a problem, as any database, including Oracle and DB2, checks only for the existence of rows, be it one or a million.
There are three scenarios on deleting data from table D:
You want to delete data from table D in which fk_A (naturally) refers to a record in table A using column A.pk_A:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
WHERE a.pk_A = d.fk_A
);
You want to delete data from table D in which fk_A refers to a record in table A, and that record in table A is also referred to by column B.fk_A. We do not want to delete the data from D that is in A but not in B. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
WHERE a.pk_A = d.fk_A
);
The third scenario is when we have to delete data in table D that refers to a record in table A, and that record in A is also referred to by both B.fk_A and C.fk_A. We want to delete only the data from table D that is common to all four tables: A, B, C and D. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
INNER JOIN c ON a.pk_A = c.fk_A
WHERE a.pk_A = d.fk_A
);
Depending upon your requirement you can incorporate one of these queries.
Note that the "=" operator would return an error if the subquery retrieves more than one row. Also, I don't know if DB2 supports the ANY or ALL keywords, hence I used the simple but powerful EXISTS keyword, which typically performs at least as well as IN, ANY and ALL.
Also, you can observe here that the subqueries inside the EXISTS clause use "SELECT 1", not "SELECT a.pk" or some other column. This is because EXISTS, in any database, looks for only existence of rows, not for any particular values inside the columns.
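As a quick check of the second scenario, the sketch below runs the DELETE ... WHERE EXISTS form on a handful of rows. It uses SQLite rather than DB2 (an assumption made so the example is self-contained), and the sample data is invented. Only rows of d whose fk_a points to a record in a that is also referenced from b are removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk_a INTEGER PRIMARY KEY);
    CREATE TABLE b (pk_b INTEGER PRIMARY KEY, fk_a INTEGER);
    CREATE TABLE d (pk_d INTEGER PRIMARY KEY, fk_a INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 3);
    INSERT INTO d VALUES (100, 1), (101, 2), (102, 3), (103, 1);
""")

# The join inside EXISTS may match several rows of b; that is harmless,
# because EXISTS only asks whether at least one row is present.
cur = conn.execute("""
    DELETE FROM d
    WHERE EXISTS (
        SELECT 1
        FROM a
        INNER JOIN b ON a.pk_a = b.fk_a
        WHERE a.pk_a = d.fk_a
    )
""")
deleted = cur.rowcount

remaining = conn.execute(
    "SELECT pk_d, fk_a FROM d ORDER BY pk_d").fetchall()
print(deleted, remaining)
```

The row of d that references a.pk_a = 2 stays, since nothing in b references that record.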
Based on 'Using SQL to delete rows from a table using INNER JOIN to another table'
The key is that you specify the name of the table to be deleted from as the SELECT. So, the JOIN and WHERE do the selection and limiting, while the DELETE does the deleting. You're not limited to just one table, though. If you have a many-to-many relationship (for instance, Magazines and Subscribers, joined by a Subscription) and you're removing a Subscriber, you need to remove any potential records from the join model as well.
DELETE subscribers
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
delete from D
where fk in (select d.fk from D d, A a, B b where a.pk = b.fk and b.fk = d.fk)
This should work; IN is used rather than "=" because the subquery can return more than one row.