SQL: Combine two tables, into same columns, based on relation table - sql

Not being well versed in complex SQL, I am trying to figure out how I can write a query that will return (almost) the same columns from two tables, based on a "relationship" table. I have tried using UNION, but the number of columns are different between the three tables. I also tried IF...ELSE, but could not get that to function. I have also looked at INCLUDE and EXCLUDE.
Here is my current query:
SELECT
/* Relation Table */
[data_Related_Asset].[ID_Related_Asset]
,[data_Related_Asset].[BIOMED_Tag]
,[data_Related_Asset].[Related_BIOMED_Tag]
/* Lab Table */
,[data_Lab_Asset].[Room]
,[Lab_Area].[Work_Area]
,[data_Lab_Asset].[Pet_Name_Bench]
,[data_Lab_Asset].[BGL_ID]
,[data_Lab_Asset].[BIOMED_Tag] AS LAB_BIOMED
,[data_Lab_Asset].[Endpoint_Tag]
,[Lab_Class].[Class]
,[Lab_Class].[Subclass]
,[Lab_Class].[Subcategory]
/* IT Table */
,[data_IT_Asset].[Room]
,[IT_Area].[Work_Area]
,[data_IT_Asset].[Bench_Instrument]
,[data_IT_Asset].[BIOMED_Tag] AS IT_BIOMED
,[data_IT_Asset].[Endpoint_Tag]
,[IT_Class].[Class]
,[IT_Class].[Subclass]
,[IT_Class].[Subcategory]
FROM [data_Related_Asset]
LEFT JOIN [data_Lab_Asset] ON [data_Lab_Asset].[BIOMED_Tag] = [data_Related_Asset].[Related_BIOMED_Tag]
LEFT JOIN [data_IT_Asset] ON [data_IT_Asset].[BIOMED_Tag] = [data_Related_Asset].[Related_BIOMED_Tag]
LEFT JOIN [tbl_Class] Lab_Class ON [Lab_Class].[ID_Class] = [data_Lab_Asset].[Class_ID]
LEFT JOIN [tbl_Class] IT_Class ON [IT_Class].[ID_Class] = [data_IT_Asset].[Class_ID]
LEFT JOIN [tbl_Work_Area] Lab_Area ON [Lab_Area].[ID_Work_Area] = [data_Lab_Asset].[Work_Area_ID]
LEFT JOIN [tbl_Work_Area] IT_Area ON [IT_Area].[ID_Work_Area] = [data_IT_Asset].[Work_Area_ID]
ORDER BY ID_Related_Asset
The query is being used in a custom app and is set up to search for an "ID" in the [data_Related_Asset].[BIOMED_Tag] column, and return all [Related_BIOMED_Tag] records.
When I run the above query I get all the results I need, but across a lot of columns. If the item being return is in the LAB table, then the LAB_Asset columns are populated, but the IT_Asset columns are all NULL. And if the item is in the IT table, the opposite is true - the LAB_Asset columns are all NULL and the IT_Asset columns are populated. For example, below you can see where rows 2 & 12 returned the IT_Asset information.
I'd like to be able to return everything in the same set of NINE columns to condense the viewed table. (Room, Work_Area, Bench, BGL_ID, BIOMED_Tag, Endpoint_Tag, Class, Subclass, Subcategory) For example, below you can see where I moved the info from the IT_Asset table over to the first columns.
I'm sure I'm missing a simple solution/function here. Any help is greatly appreciated!

You can use UNION but you just have to ensure that you have the same columns in the same order in each statement being union'd.
So for missing columns just use nulls (or any suitable dummy data) e.g.
SELECT col1, col2, null, col4
from tableA
UNION
SELECT col1, null, col3, null
from tableB

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables in a way that will remove duplicate records based on email with a priority of replacing any duplicates with the values in "Table 2", I have considered full outer join and UNION ALL but Union all will be too large as each table has several 1000 columns. I want to create this combination table as my full reference table and save as a view so I can reference it without always adding a union or something to that effect in my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2
try using a FULL OUTER JOIN between the two tables and then a COALESCE function on each resultset column to determine from which table/column the resultset column is populated

How to merge data from two tables with some common fields

So I have two tables that have similar data. There are columns in table A that match columns in table B, but with different naming conventions, there are also columns from each that have no equivalent in the other table, no rows should be merged, I need a view (I think) with all the rows from both tables, but with some columns merged so that data from table A.columnB and data from table B.columnF both end up in view C.columnD. There would be columns in the view that only had sources in one of the tables and would be null in rows from the other table. I can't change any of the existing table structure as the database is shared across multiple apps. I think I need to use a bunch of FULL OUTER JOIN statements in the view but I'm having trouble wrapping my mind around how to really go about it. If anyone can provide a generic example of how this should look I should be able to take it from there.
Here's an example of what doesn't work (there are a lot more columns on each side of the JOIN in the actual db, truncated for readability):
SELECT
schedule_block.id as vid,
schedule_block.reason as vreason,
schedule_block.when_ts as vwhen,
schedule_block.duration as vduration,
schedule_block.note as vnote,
schedule_block.deleted_ts as vdeleted_when,
schedule_block.deleted_user_id as vdeleted_user_id,
schedule_block.lastmodified_ts as vlastmodified_ts,
schedule_block.lastmodified_user_id as vlastmodified_user_id
FROM schedule_block
FULL OUTER JOIN appointment.appt_when as vwhen ON 1 = 1
FULL OUTER JOIN appointment.patient_id as vpatient ON 1 = 1
FULL OUTER JOIN appointment.duration as vduration on 1 = 1
FULL OUTER JOIN appointment.deleted_when as vdeleted_when ON 1 = 1
Correct me if I'm wrong but I think I can't use a UNION because there are different numbers of columns on each side
You could do something like:
SELECT ColA, CONVERT(DATE, NULL) AS ColB
FROM T1
UNION ALL
SELECT CONVERT(VARCHAR(10), NULL) AS ColA, ColB
FROM T2
Just make sure to match the datatypes.

Multiple rows from Left Join in SQL were rows are uniquely matched

I have two views that I am trying to join. I am joining on three elements, date, case number and surgeon id number. Each should only have one match for the previous case out value, but I am getting multiple rows after my left join.
Here is my code:
CREATE VIEW [dbo].[OR]
AS
SELECT DISTINCT
[ID].*,
[BYSURG].[PREV_PAT_OUT] AS PrevPtOut
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Any insights are much appreciated.
Thanks!
M
Replace your select block with one that retrieves all columns:
SELECT
*
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Run it and look at your "duplicate" rows - something about them will no longer be a duplicate - perhaps you've forgotten to include some other criteria in your where clause
Putting DISTINCT in the select block is not the answer - find out what data element about the "duplicate" rows is different and then filter out the rows you don't want

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the `id' column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all ID's are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.

Number of Records don't match when Joining three tables

Despite going through every material I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. This yields the right columns but gives me more or less number of records with interchanging different join options.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing your tables values it is hard to give a complete answer to your question. The issue that is causing you a problem based on how you described it. Is more then likely based on the type of joins you are using.
The best way I found to understand what type of joins you should be using would referencing a Venn diagram explaining the different type of joins that you can use.
Jeff Atwood also has a really good explanation of SQL joins on his site using the above method as well.
Best to just use the query builder. Drop in your main table. Choose the columns you want. Now for any of the other lookup values then simply drop in the other tables, draw the join line(s), double click and use a left join. You can do this for 2 or 30 columns that need to "grab" or lookup other values from other tables. The number of ORIGINAL rows in the base table returned should ALWAYS remain the same.
So just use the query builder and follow the above.
The problem with your posted SQL is you NESTED the joins inside (). Don't do that. (or let the query builder do this for you – they tend to be quite messy but will also work).
Just use this:
FROM source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are "repeating" the conditions again in the where clause.