How to find differences between 2 tables with different key names - sql

So say I have 2 tables. TableA has the columns FirstName, LastName, Age, Location. TableB has the columns 1FirstName, 1LastName, 1Age, 1Location. TableA is a working table, and TableB is a reference table. How could I go about finding which records exist in TableB that DO NOT exist in TableA?

A LEFT OUTER JOIN keeps every row of the left table and pairs it either with a matching row of the right table or, when there is no match, with NULLs.
Keeping only the rows where the right-hand side is NULL then gives you what you need.
With TableB as the left side:
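-- Column names that start with a digit may need quoting:
-- "1FirstName" (standard SQL), `1FirstName` (MySQL), [1FirstName] (SQL Server)
SELECT b.*
FROM TableB b
LEFT OUTER JOIN TableA a
ON
b.1FirstName = a.FirstName
AND
b.1LastName = a.LastName
AND
b.1Age = a.Age
AND
b.1Location = a.Location
-- no TableA partner was found, so every a.* column is NULL
WHERE a.FirstName IS NULL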

Related

Full Outer Join failing to return all records from both tables

I have a pair of tables I need to join. I want to return any record that's in tableA, tableB or both, so I think I need a FULL OUTER JOIN.
This query returns 1164 records:
SELECT name FROM tableA
WHERE reportDay = '2022-Apr-05'
And this one returns 3339 records:
SELECT name FROM tableB
WHERE reportDay = '2022-Apr-05'
And this one returns 3369 records (so there must be 30 records in tableA that aren't in tableB)
select distinct name FROM tableA where reportDay = '2022-Apr-05'
union distinct
select distinct name FROM tableB where reportDay = '2022-Apr-05'
I want to obtain a list of all records in either table, matched up where possible. The query above returns 3369 records, so a FULL OUTER JOIN should also return 3369 rows (I think). My best effort so far is shown below. It returns 1164 rows and looks to me like a left join between tableA and tableB.
SELECT tableA.name, tableB.name
FROM tableA
FULL OUTER JOIN tableB
ON (tableA.name = tableB.name and tableB.reportDay = '2022-Apr-05')
WHERE tableA.reportDay = '2022-Apr-05'
Help appreciated. (If this question looks familiar, it's a follow-on to this one.)
UPDATE - Sorry (@forpas) to keep moving the goalposts - I'm trying to match the test data to my real-data scenarios.
DROP TABLE tableA;
DROP TABLE tableB;
CREATE TABLE tableA (name VARCHAR(10),
reportDay DATE,
val1 INTEGER,
val2 INTEGER);
CREATE TABLE tableB (name VARCHAR(10),
reportDay DATE,
test1 INTEGER,
test2 INTEGER);
INSERT INTO tableA values ('A','2022-Apr-05',1,2),
('B','2022-Apr-05',3,4), ('C','2022-Apr-05',5,6),
('A','2022-Apr-06',1,2), ('B','2022-Apr-06',3,4),
('C','2022-Apr-06',5,6), ('Z','2022-Apr-04',5,6),
('Z','2022-Apr-06',5,6) ;
INSERT INTO tableB values ('A','2022-Apr-03',5,6),
('B','2022-Apr-04',11,22), ('B','2022-Apr-05',11,22),
('C','2022-Apr-05',33,44), ('D','2022-Apr-05',55,66),
('B','2022-Apr-06',11,22), ('C','2022-Apr-06',33,44),
('D','2022-Apr-06',55,66), ('Q','2022-Apr-06',5,6);
SELECT tableA.*, tableB.*
FROM tableA
FULL OUTER JOIN tableB
ON (tableA.name = tableB.name and tableB.reportDay = '2022-Apr-05'
AND tableA.reportDay = '2022-Apr-05' )
For this data, I'd hope to see 4 rows: 'A' from tableA only, 'B' and 'C' from both tables, and 'D' from tableB only. I'm after the 5th April records only! The query shown above, suggested by @forpas, works except that the 'A' record in tableA doesn't get returned.
UPDATE - FINAL EDIT AND ANSWER!
OK, the solution seems to be to concatenate the two fields together before joining:
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON (b.name || b.reportDay) = (a.name || a.reportDay)
WHERE (a.reportDay = '2022-Apr-05' OR a.reportDay IS NULL)
AND (b.reportDay = '2022-Apr-05' OR b.reportDay IS NULL);
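In a FULL OUTER JOIN the ON clause only decides which rows pair up; it never removes rows from either side. So the condition for the date should be placed in a WHERE clause: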
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON b.name = a.name AND a.reportDay = b.reportDay
WHERE '2022-Apr-05' IN (a.reportDay, b.reportDay);
or:
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON b.name = a.name AND b.reportDay = a.reportDay
WHERE (a.reportDay = '2022-Apr-05' OR a.reportDay IS NULL)
AND (b.reportDay = '2022-Apr-05' OR b.reportDay IS NULL);
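The sketch below runs the first query's logic on the question's sample data with Python's sqlite3 module. SQLite only gained FULL OUTER JOIN in version 3.39.0, so to stay runnable on older builds it emulates the full join as a LEFT JOIN plus the unmatched tableB rows (UNION ALL). Dates are stored as plain TEXT here, which is an assumption of the sketch. The point is the same: join on name and reportDay, then filter the date in WHERE on the joined result.

```python
import sqlite3

# Sample data from the question, with reportDay kept as TEXT in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (name TEXT, reportDay TEXT, val1 INTEGER, val2 INTEGER);
CREATE TABLE tableB (name TEXT, reportDay TEXT, test1 INTEGER, test2 INTEGER);
INSERT INTO tableA VALUES ('A','2022-Apr-05',1,2), ('B','2022-Apr-05',3,4),
  ('C','2022-Apr-05',5,6), ('A','2022-Apr-06',1,2), ('B','2022-Apr-06',3,4),
  ('C','2022-Apr-06',5,6), ('Z','2022-Apr-04',5,6), ('Z','2022-Apr-06',5,6);
INSERT INTO tableB VALUES ('A','2022-Apr-03',5,6), ('B','2022-Apr-04',11,22),
  ('B','2022-Apr-05',11,22), ('C','2022-Apr-05',33,44), ('D','2022-Apr-05',55,66),
  ('B','2022-Apr-06',11,22), ('C','2022-Apr-06',33,44), ('D','2022-Apr-06',55,66),
  ('Q','2022-Apr-06',5,6);
""")

# FULL OUTER JOIN emulated as: every tableA row (LEFT JOIN) plus the tableB
# rows that found no tableA partner. Both branches join on name AND reportDay.
FULL_JOIN = """
SELECT a.name AS a_name, a.reportDay AS a_day, b.name AS b_name, b.reportDay AS b_day
FROM tableA a
LEFT JOIN tableB b ON b.name = a.name AND b.reportDay = a.reportDay
UNION ALL
SELECT a.name, a.reportDay, b.name, b.reportDay
FROM tableB b
LEFT JOIN tableA a ON b.name = a.name AND b.reportDay = a.reportDay
WHERE a.name IS NULL
"""

# The date filter is applied only after the (emulated) full join.
rows = conn.execute(
    f"SELECT * FROM ({FULL_JOIN}) fj "
    "WHERE '2022-Apr-05' IN (a_day, b_day) "
    "ORDER BY COALESCE(a_name, b_name)"
).fetchall()
for row in rows:
    print(row)
```

This yields the four rows the question asks for: 'A' from tableA only, 'B' and 'C' from both tables, and 'D' from tableB only.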

How to join two tables on multiple columns using OR condition in bigquery SQL

Let's say I have two tables, tableA and tableB.
Now I want to write a query that will join these two tables on either name OR email OR phone.
Something like:
SELECT * FROM tableA
LEFT JOIN tableB
ON
(tableA.name_A = tableB.name_B OR tableA.email_A = tableB.email_B OR tableA.phone_A = tableB.phone_B)
The result should pair the rows like this:
John matches rows between tableA and tableB on name.
Ally/allie matches rows between tableA and tableB on email.
Sam/Samual matches rows between tableA and tableB on phone.
When I try this query, though, I receive an error that says: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I am using BigQuery.
Please help, cheers
Try an INNER JOIN (note that, unlike a LEFT JOIN, it drops tableA rows that have no match):
SELECT * FROM tableA
INNER JOIN tableB
ON
(tableA.name_A = tableB.name_B OR tableA.email_A = tableB.email_B OR tableA.phone_A = tableB.phone_B)
or a CROSS JOIN with the condition in the WHERE clause (same result as the INNER JOIN):
SELECT * FROM tableA
CROSS JOIN tableB
WHERE
tableA.name_A = tableB.name_B
OR tableA.email_A = tableB.email_B
OR tableA.phone_A = tableB.phone_B
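or a UNION DISTINCT of three LEFT JOINs (a tableA row that matches on only some of the columns will also come back once with NULLs for tableB, from the branches where it has no match):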
SELECT * FROM tableA
LEFT JOIN tableB
ON tableA.name_A = tableB.name_B
UNION DISTINCT
SELECT * FROM tableA
LEFT JOIN tableB
ON tableA.email_A = tableB.email_B
UNION DISTINCT
SELECT * FROM tableA
LEFT JOIN tableB
ON tableA.phone_A = tableB.phone_B
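One way to keep LEFT JOIN semantics (unmatched tableA rows survive with NULLs) while the outer join itself uses only an equality is to find the OR-matches first with the CROSS JOIN + WHERE form above, then LEFT JOIN them back to tableA on a key. The sketch below runs on SQLite through Python's sqlite3 module; the a_id/b_id key columns and all rows are hypothetical, and BigQuery itself is not exercised here.

```python
import sqlite3

# Hypothetical schema and rows; a_id / b_id are assumed row keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (a_id INTEGER, name_A TEXT, email_A TEXT, phone_A TEXT);
CREATE TABLE tableB (b_id INTEGER, name_B TEXT, email_B TEXT, phone_B TEXT);
INSERT INTO tableA VALUES (1, 'John', 'john@a.example', '111'),
                          (2, 'Ally', 'ally@a.example', '222'),
                          (3, 'Sam',  'sam@a.example',  '333'),
                          (4, 'Kim',  'kim@a.example',  '444');
INSERT INTO tableB VALUES (10, 'John',   'jd@b.example',   '999'),
                          (20, 'Allie',  'ally@a.example', '888'),
                          (30, 'Samual', 'sb@b.example',   '333');
""")

# Step 1: OR-matches via CROSS JOIN + WHERE (no outer join involved).
# Step 2: LEFT JOIN them back on an equality of keys, so tableA rows
#         without any match are kept with NULLs.
QUERY = """
WITH matches AS (
  SELECT a.a_id, b.b_id, b.name_B
  FROM tableA a
  CROSS JOIN tableB b
  WHERE a.name_A = b.name_B
     OR a.email_A = b.email_B
     OR a.phone_A = b.phone_B
)
SELECT a.a_id, a.name_A, m.b_id, m.name_B
FROM tableA a
LEFT JOIN matches m ON m.a_id = a.a_id
ORDER BY a.a_id, m.b_id
"""
for row in conn.execute(QUERY):
    print(row)
```

John pairs on name, Ally/Allie on email, Sam/Samual on phone, and Kim (no match) is still returned with NULLs for tableB.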
Could you try using parentheses?
SELECT * FROM tableA
LEFT JOIN tableB
ON
(tableA.name_A = tableB.name_B) OR
(tableA.email_A = tableB.email_B) OR
(tableA.phone_A = tableB.phone_B)

Optimizing SQL Query - Joining 4 tables

I am trying to join 4 tables. Currently I've achieved it by doing this.
SELECT columns
FROM tableA
LEFT OUTER JOIN tableB ON tableB.address_id = tableA.address_id
INNER JOIN tableC ON tableC.company_id = tableA.company_id AND tableC.client_id = ?
UNION
SELECT columns
FROM tableA
LEFT OUTER JOIN tableB ON tableB.address_id = tableA.gaddress_id
INNER JOIN tableD ON tableD.company_id = tableA.company_id AND tableD.branch_id = ?
The structure of tableC and tableD is very similar: tableC contains data for clients, and tableD contains data for clients' branches. tableA holds companies and tableB holds addresses. My goal is to get the rows of tableA joined to tableB (all companies that have addresses), together with the matching data from tableD and also from tableC.
This works nicely, but I am afraid it will be very slow.
I think you can restructure it like this:
first UNION tableC and tableD, and only then join the result to the rest of the query. This should improve the query significantly:
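SELECT columns
FROM TableA
LEFT OUTER JOIN tableB ON tableB.address_id = tableA.address_id
INNER JOIN (SELECT company_id, client_id AS New_Col_ID, '1' AS ind_where FROM tableC
UNION ALL
SELECT company_id, branch_id, '2' FROM tableD) joined_Table
ON joined_Table.company_id = tableA.company_id
-- ind_where tells client rows from branch rows, so each id gets its own parameter
AND ((joined_Table.ind_where = '1' AND joined_Table.New_Col_ID = ?)   -- client_id
OR (joined_Table.ind_where = '2' AND joined_Table.New_Col_ID = ?))    -- branch_id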
New_Col_ID: just select client_id (from tableC) and branch_id (from tableD) into the same column and alias it as New_Col_ID, or whatever you like.
In addition, you can index the tables (if the indexes don't exist yet):
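A small check that the single-join rewrite returns the same rows as the original two-branch UNION, run on SQLite through Python's sqlite3 module. The schema is trimmed to the columns the joins need, the city column and all rows are hypothetical, and both branches use address_id. This sketch uses ind_where to test the client id and the branch id separately, with one parameter each, and adds DISTINCT to mirror the duplicate removal that UNION did in the original.

```python
import sqlite3

# Hypothetical schema/data: companies (tableA), addresses (tableB),
# client links (tableC) and branch links (tableD).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (company_id INTEGER, address_id INTEGER);
CREATE TABLE tableB (address_id INTEGER, city TEXT);
CREATE TABLE tableC (company_id INTEGER, client_id INTEGER);
CREATE TABLE tableD (company_id INTEGER, branch_id INTEGER);
INSERT INTO tableA VALUES (1, 100), (2, 200), (3, NULL);
INSERT INTO tableB VALUES (100, 'Oslo'), (200, 'Rome');
INSERT INTO tableC VALUES (1, 7), (2, 8);
INSERT INTO tableD VALUES (2, 7), (3, 9);
""")
client_id, branch_id = 7, 9

# Original shape: two full A-B joins glued together with UNION.
ORIGINAL = """
SELECT a.company_id, b.city
FROM tableA a
LEFT OUTER JOIN tableB b ON b.address_id = a.address_id
INNER JOIN tableC c ON c.company_id = a.company_id AND c.client_id = ?
UNION
SELECT a.company_id, b.city
FROM tableA a
LEFT OUTER JOIN tableB b ON b.address_id = a.address_id
INNER JOIN tableD d ON d.company_id = a.company_id AND d.branch_id = ?
ORDER BY 1
"""

# Rewritten shape: UNION ALL tableC and tableD first, tagging each row with
# ind_where, then join the combined rows to tableA once.
REWRITTEN = """
SELECT DISTINCT a.company_id, b.city
FROM tableA a
LEFT OUTER JOIN tableB b ON b.address_id = a.address_id
INNER JOIN (SELECT company_id, client_id AS New_Col_ID, '1' AS ind_where FROM tableC
            UNION ALL
            SELECT company_id, branch_id, '2' FROM tableD) joined_Table
  ON joined_Table.company_id = a.company_id
 AND ((joined_Table.ind_where = '1' AND joined_Table.New_Col_ID = ?)
   OR (joined_Table.ind_where = '2' AND joined_Table.New_Col_ID = ?))
ORDER BY a.company_id
"""

original = conn.execute(ORIGINAL, (client_id, branch_id)).fetchall()
rewritten = conn.execute(REWRITTEN, (client_id, branch_id)).fetchall()
print(original)
print(rewritten)
```

The ind_where test matters: with one shared parameter of 7 and no indicator check, the branch link (2, 7) would also match and company 2 would wrongly appear.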
TableA(address_id,company_id)
TableB(address_id)
TableC(company_id,client_id)
TableD(company_id,branch_id)
Why should that be slow? You select client addresses and branch addresses and show the complete result. That seems straightforward.
You join on IDs, and this should be fast (as there should be indexes available accordingly). You may want to introduce composite indexes:
create index idx_c on tableC(client_id, company_id)
and
create index idx_d on tableD(branch_id, company_id)
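However, UNION is a lot of work for the DBMS, because it has to look for and eliminate duplicates. Can there even be any? If not, use UNION ALL.
Try a CTE so that you don't have to go through tableA and tableB twice for the union (whether this actually avoids the second pass depends on the DBMS: some expand the CTE at every reference instead of computing it once).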
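The difference between UNION and UNION ALL in a tiny sketch, using Python's sqlite3 module and hypothetical rows in which company 2 is linked both as a client and as a branch:

```python
import sqlite3

# Hypothetical rows: company 2 appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableC (company_id INTEGER);
CREATE TABLE tableD (company_id INTEGER);
INSERT INTO tableC VALUES (1), (2);
INSERT INTO tableD VALUES (2), (3);
""")

# UNION has to find and drop duplicate rows ...
deduped = conn.execute(
    "SELECT company_id FROM tableC UNION SELECT company_id FROM tableD ORDER BY 1"
).fetchall()
# ... while UNION ALL simply appends the second result to the first.
appended = conn.execute(
    "SELECT company_id FROM tableC UNION ALL SELECT company_id FROM tableD ORDER BY 1"
).fetchall()
print(deduped)   # [(1,), (2,), (3,)]
print(appended)  # [(1,), (2,), (2,), (3,)]
```

If the two branches can never produce the same row, both forms return the same result and UNION ALL skips the duplicate check.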
; WITH TempTable (company_id, Column2, ...)
AS ( SELECT tableA.company_id, columns
FROM tableA
LEFT OUTER JOIN tableB
ON tableB.address_id = tableA.gaddress_id
)
-- outside the CTE only TempTable is in scope, so join on TempTable.company_id
SELECT Columns
FROM TempTable
INNER JOIN tableC
ON tableC.company_id = TempTable.company_id AND tableC.client_id = ?
UNION
SELECT Columns
FROM TempTable
INNER JOIN tableD ON tableD.company_id = TempTable.company_id AND tableD.branch_id = ?

Cartesian join two tables with no records

I have to join Table A (tax related) to Table B (customer related)
I pull at most 1 record, but sometimes none.
Now I need to return the combined record to the user.
I thought a simple Cartesian product would work:
SELECT * FROM TableA, TableB
but that returns no rows if TableA or TableB is empty.
I would do a full outer join, but right now I don't have anything to join on. I could create temp tables with identity columns and then join on them (since 1 = 1),
but is there a different way?
Thank you
Per your own suggestion, you could use a full outer join to guarantee a row (as long as at least one of the tables has a row):
select *
from TableA a
full outer join
TableB b
on 1=1
To always return at least one row, even if both TableA and TableB are empty, you could use a one-row derived table as a fake table:
select *
from (
select 1 as col1
) fake
left join
TableA a
on 1=1
left join
TableB b
on 1=1
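A runnable sketch of the fake-table approach with Python's sqlite3 module; the single-column tables and their rows are hypothetical. It contrasts the plain Cartesian product from the question with the one-row derived table.

```python
import sqlite3

# Hypothetical one-column tables: TableA has one tax row, TableB is empty.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (tax_code TEXT);
CREATE TABLE TableB (customer_name TEXT);
INSERT INTO TableA VALUES ('T1');
""")

CARTESIAN = "SELECT * FROM TableA, TableB"

# The one-row derived table drives the query; each real table is LEFT JOINed
# on an always-true condition, so a missing row becomes NULLs instead of
# wiping out the whole result.
FAKE_TABLE = """
SELECT *
FROM (SELECT 1 AS col1) fake
LEFT JOIN TableA a ON 1=1
LEFT JOIN TableB b ON 1=1
"""

print(conn.execute(CARTESIAN).fetchall())   # [] because TableB is empty
print(conn.execute(FAKE_TABLE).fetchall())  # [(1, 'T1', None)]

conn.execute("DELETE FROM TableA")
print(conn.execute(FAKE_TABLE).fetchall())  # [(1, None, None)]
```

This relies on each table holding at most one row, as in the question; with more rows, the ON 1=1 joins would multiply them.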

Multiple counts in a multi table query SQL

I would like to count the different requests by survey ID, grouped by SubjectValue.
I have done this on just the one table with a subquery, but I'm not sure how to do it with several. Could anyone help me out?
This is how the 3 tables are joined. The only values of note are:
subjectValue - Table A
Request_Id - Table A
Survey_Id - Table C
SELECT TableA.SubjectValue
FROM TableB INNER JOIN
TableA ON TableB.ID = TableA.Request_ID INNER JOIN
TableC ON TableB.Details_ID = TableC.ID
All counts should also be returned in the same row.
There are 3 different survey IDs, so each count will need a condition on Survey_Id.
Hope that makes sense.
Many thanks in advance.
You can use the generic cross-tab method:
select
TableA.SubjectValue,
-- somecol: the column to pivot on, e.g. TableC.Survey_Id
SUM(case when somecol='request1' then 1 else 0 end) as request1,
SUM(case when somecol='request2' then 1 else 0 end) as request2
-- ...one SUM(case ...) column per value to count
from
TableB INNER JOIN
TableA ON TableB.ID = TableA.Request_ID INNER JOIN
TableC ON TableB.Details_ID = TableC.ID
group by
TableA.SubjectValue
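A runnable sketch of the cross-tab with Python's sqlite3 module. It assumes the pivot column is TableC.Survey_Id with hypothetical survey ids 1, 2 and 3; the table shapes follow the question's join, and all rows are made up.

```python
import sqlite3

# Hypothetical data shaped after the question: TableA (requests with a subject),
# TableB (request details) and TableC (details -> survey).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Request_ID INTEGER, SubjectValue TEXT);
CREATE TABLE TableB (ID INTEGER, Details_ID INTEGER);
CREATE TABLE TableC (ID INTEGER, Survey_Id INTEGER);
INSERT INTO TableA VALUES (1, 'Maths'), (2, 'Maths'), (3, 'Art'), (4, 'Maths'), (5, 'Maths');
INSERT INTO TableB VALUES (1, 10), (2, 20), (3, 10), (4, 30), (5, 10);
INSERT INTO TableC VALUES (10, 1), (20, 2), (30, 3);
""")

# Cross-tab: one SUM(CASE ...) per survey id, so all counts share one row
# per SubjectValue.
CROSS_TAB = """
SELECT TableA.SubjectValue,
       SUM(CASE WHEN TableC.Survey_Id = 1 THEN 1 ELSE 0 END) AS survey1,
       SUM(CASE WHEN TableC.Survey_Id = 2 THEN 1 ELSE 0 END) AS survey2,
       SUM(CASE WHEN TableC.Survey_Id = 3 THEN 1 ELSE 0 END) AS survey3
FROM TableB
INNER JOIN TableA ON TableB.ID = TableA.Request_ID
INNER JOIN TableC ON TableB.Details_ID = TableC.ID
GROUP BY TableA.SubjectValue
ORDER BY TableA.SubjectValue
"""
for row in conn.execute(CROSS_TAB):
    print(row)  # ('Art', 1, 0, 0) then ('Maths', 2, 1, 1)
```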
You probably don't need to join to table C (surveys), assuming you have a foreign key to it on table B (requests).
Try this:
SELECT TableA.SubjectValue, COUNT(TableB.SurveyID)
FROM TableB
INNER JOIN TableA ON TableB.ID = TableA.Request_ID
Group by TableA.SubjectValue
EDIT: To include SurveyID, use this:
SELECT TableC.SurveyID, TableA.SubjectValue, COUNT(TableB.RequestId)
FROM TableA
INNER JOIN TableB ON TableB.SurveyID = TableA.SurveyID
INNER JOIN TableC ON TableC.RequestID = TableB.RequestID
Group by TableA.SubjectValue, TableC.SurveyID
(Hope I didn't get my As, Bs and Cs mixed up.)