Table Left Join - sql

I'm currently trying to get an output from two tables that I want to join and it seems like I have a block in my mind on how to resolve this.
Table 1 has products with unique IDs.
ID | (other info)
-----------------
AA |
BB |
CC |
Table 2 has the unique ID of Table 1 as FK as well as a model number and a part-code that I would like to join onto Table 1. Table 2 has a multitude of other information resulting in the following possible constellation:
ID | FK | model number | part-code
-----------------------------------
01 | AA | model0001 | part923
02 | AA | model0001 |
03 | AA | | part923
04 | BB | model0002 |
05 | BB | | part876
06 | CC | | part551
Information in Table 2 is therefore very scattered and not necessarily complete. I also do not want to assume that for a given FK the model number and the part-code remain the same across all entries (if there are multiple variants for a given FK, I only want one entry, even if it is at random).
The result I am trying to achieve is to get all the information I extract from Table 1, and it is given that there will always be a unique ID (=FK in Table 2), and add the model number and part-code, if existing, to the table without creating any duplicates. The example above should therefore give the following output.
ID | model number | part-code | (other info from table 1)
---------------------------------------------------------
AA | model0001 | part923 |
BB | model0002 | part876 |
CC | | part551 |
I should also mention that Table 2 is extremely large (millions of entries) and I have no way to match the data except with the IDs from Table 1. This table is also quite big - an efficient way of approaching this is therefore necessary.
Thank you for your time in reading this and helping me understand how to approach this.
Best,
Jonas

You are right, you need an OUTER JOIN to get all the records in table1 with whatever matching records are in table2.
Getting only one record per hit from table2 is tricky. This aggregating subquery will produce your desired output. Note that this solution can produce a combination of (model_number, part_code) which does not exist in any single record in table2; I guess that's okay, as that is what your sample result set shows for BB. Performance across "millions of entries" may be slow, but that is a separate tuning issue.
select t1.id
     , t2.model_number
     , t2.part_code
     , t1.whatever
     , t1.blah
     , t1.etc
from table1 t1
left outer join ( select fk
                       , max(model_number) as model_number
                       , max(part_code) as part_code
                  from table2
                  group by fk ) t2
    on t1.id = t2.fk
order by t1.id
/
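To see the aggregating subquery at work on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It rests on several assumptions: the blank cells are stored as NULL, the columns are named model_number and part_code, product DD is a made-up row with no match in table2, and the index name table2_fk is hypothetical. Whether the index helps on a table with millions of rows depends on your database and is not something this sketch measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT PRIMARY KEY, whatever TEXT);
    CREATE TABLE table2 (id TEXT PRIMARY KEY, fk TEXT,
                         model_number TEXT, part_code TEXT);
    -- 'DD' is a made-up product without any rows in table2.
    INSERT INTO table1 (id) VALUES ('AA'), ('BB'), ('CC'), ('DD');
    -- Blank cells from the question are assumed to be NULL.
    INSERT INTO table2 VALUES
        ('01', 'AA', 'model0001', 'part923'),
        ('02', 'AA', 'model0001', NULL),
        ('03', 'AA', NULL,        'part923'),
        ('04', 'BB', 'model0002', NULL),
        ('05', 'BB', NULL,        'part876'),
        ('06', 'CC', NULL,        'part551');
    -- Hypothetical index on the join key.
    CREATE INDEX table2_fk ON table2 (fk);
""")

# MAX() ignores NULLs, so each fk collapses to one row holding
# whatever non-NULL model number and part code exist for it.
rows = conn.execute("""
    SELECT t1.id, t2.model_number, t2.part_code
    FROM table1 t1
    LEFT OUTER JOIN (
        SELECT fk,
               MAX(model_number) AS model_number,
               MAX(part_code)    AS part_code
        FROM table2
        GROUP BY fk
    ) t2 ON t1.id = t2.fk
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

Every row of table1 appears exactly once, and products without any match in table2 come back with NULL in both joined columns.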

You can try this SQL
SELECT t1.ID, MAX(t2.model_number) AS model_number, MAX(t2.part_code) AS part_code
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.FK
GROUP BY t1.ID
Hope that helps!

Related

Deleting duplicate rows with primary keys that are connected to other tables

A process was causing duplicate rows in a table where there were not supposed to be any. There are several great answers to deleting duplicate rows online. But, what if those duplicates with ID primary keys all have data in other tables tied to them?
Is there a way to delete all duplicates in the first table and migrate all data tied to those keys to the single PK ID that wasn't deleted?
For example:
TABLE 1
+-------+----------+----------+------------+
| ID(PK)| Model | ItemType | Color |
+-------+----------+----------+------------+
| 1 | 4 | B | Red |
| 2 | 4 | B | Red |
| 3 | 5 | A | Blue |
+-------+----------+----------+------------+
TABLE 2
+-------+----------+---------+
| ID(PK)| OtherID | Type |
+-------+----------+---------+
| 1 | 1 | Type1 |
| 2 | 1 | Type2 |
| 3 | 2 | Type3 |
| 4 | 2 | Type4 |
| 5 | 2 | Type5 |
+-------+----------+---------+
So I would theoretically want to delete the entry with ID: 2 from TABLE 1, and then have the OtherID fields in TABLE 2 switch to 1. This would actually be needed for X number of tables. This particular situation has 4 tables connected to its ID PK.
You cannot do this automatically. But you can do this with some queries. First, you set all the foreign keys to the correct id, which is presumably the smallest one:
with ids as (
select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
from table1 t1
)
update t2
set t2.otherid = ids.min_id
from table2 t2 join
ids
on t2.otherid = ids.id
where ids.id <> ids.min_id;
Then delete the ids that are either duplicated or not referenced in table2 (depending on which you actually want):
with ids as (
select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
from table1 t1
)
delete from ids
where id <> min_id;
Note: If the database has concurrent users, you might want to put it in single user mode for this operation or lock the tables so they are not modified during these two operations.
To do this right, you want to wrap everything in a single transaction and perform this during a regular maintenance period. Anything else could leave things as inconsistent as they are now.
1. Decide which "key" you will keep.
2. Update all of the child tables to use the new "key" wherever the value is the old "key".
3. There should now be no FK dependencies on the duplicate records, so delete them.
4. Once all ambiguities are resolved, place a unique constraint on (ItemType, Color) (or whatever the real columns are).
If there are a lot of instances, you may need to write a script to handle this and use the information in sys.foreign_keys and sys.foreign_key_columns to determine which records to update and in which order.
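The steps above can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under assumptions: it uses SQLite rather than SQL Server, the snake_case column names (item_type, other_id) and the function name merge_duplicates are hypothetical, and it swaps the UPDATE ... FROM syntax for correlated subqueries, which behave the same way on this data. It handles a single child table; each further child table would need its own UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK during the demo
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, model INTEGER,
                         item_type TEXT, color TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY,
                         other_id INTEGER REFERENCES table1 (id),
                         type TEXT);
    INSERT INTO table1 VALUES (1, 4, 'B', 'Red'), (2, 4, 'B', 'Red'),
                              (3, 5, 'A', 'Blue');
    INSERT INTO table2 VALUES (1, 1, 'Type1'), (2, 1, 'Type2'),
                              (3, 2, 'Type3'), (4, 2, 'Type4'),
                              (5, 2, 'Type5');
""")

def merge_duplicates(conn):
    # "with conn" commits on success and rolls back if any step fails.
    with conn:
        # Steps 1 and 2: point every child row at the smallest id
        # among the parents that share the same attributes.
        conn.execute("""
            UPDATE table2
            SET other_id = (
                SELECT MIN(keep.id)
                FROM table1 dup
                JOIN table1 keep
                  ON keep.model = dup.model
                 AND keep.item_type = dup.item_type
                 AND keep.color = dup.color
                WHERE dup.id = table2.other_id
            )
            WHERE other_id IN (SELECT id FROM table1)
        """)
        # Step 3: the duplicates are no longer referenced, so delete them.
        conn.execute("""
            DELETE FROM table1
            WHERE id NOT IN (SELECT MIN(id) FROM table1
                             GROUP BY model, item_type, color)
        """)
        # Step 4: prevent the duplicates from coming back.
        conn.execute("""
            CREATE UNIQUE INDEX table1_no_dupes
            ON table1 (model, item_type, color)
        """)

merge_duplicates(conn)
parent_ids = [r[0] for r in conn.execute("SELECT id FROM table1 ORDER BY id")]
child_refs = [r[0] for r in conn.execute("SELECT other_id FROM table2 ORDER BY id")]
print(parent_ids, child_refs)
```

Because the foreign key is enforced, the DELETE would fail if any child row still referenced a duplicate, which gives a cheap check that the UPDATE covered everything.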

New column referencing second table - do I need a join?

I have two tables (first two shown) and need to make a third from the first two - do I need to do a join or can you reference a table without joining?
The third table shown is the desired output. Thanks for any help!
+-----+-----------+
| ACC | CALL DATE |
+-----+-----------+
| 1 1 | 2/1/18    |
+-----+-----------+
+-----+---------------+
| ACC | PURCHASE DATE |
+-----+---------------+
| 1 1 | 1/1/18        |
+-----+---------------+
+-----+-----------+----------------------+
| ACC | CALL DATE | PRIOR MONTH PURCHASE |
+-----+-----------+----------------------+
| 1 1 | 2/1/18    | YES                  |
+-----+-----------+----------------------+
Of course you can have a query that references multiple tables without joining. union all is an example of an operator that does that.
There is also the question of what you mean by "joining" in the question. If you mean explicit joins, there are ways around that -- such as correlated subqueries. However, these are implementing some form of "join" in the database engine.
As for your query, you would want to use exists with a correlated subquery:
select t1.*,
(case when exists (select 1
from table2 t2
where t2.acc = t1.acc and
datediff(month, t2.purchase_date, t1.call_date) = 1
)
then 'Yes' else 'No'
end) as prior_month_purchase
from table1 t1;
This is "better" than a join because it does not multiply or remove rows. The result set has exactly the rows in the first table, with the additional column.
The syntax assumes SQL Server (which was an original tag). Similar logic can be expressed in other databases, although date functions are notoriously database-dependent.
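As an illustration of how the same exists logic might look in another database, here is a minimal sketch using SQLite through Python's sqlite3 module. Everything specific in it is an assumption: the sample rows are made up, dates are stored as ISO text (YYYY-MM-DD), and since SQLite has no datediff, the month difference is computed from strftime as year * 12 + month, which counts month boundaries in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (acc INTEGER, call_date TEXT);
    CREATE TABLE table2 (acc INTEGER, purchase_date TEXT);
    -- Made-up sample rows with ISO dates.
    INSERT INTO table1 VALUES (1, '2018-02-01'), (2, '2018-02-01');
    INSERT INTO table2 VALUES (1, '2018-01-01'), (1, '2018-01-20'),
                              (2, '2017-11-15');
""")

# EXISTS only answers yes or no, so account 1 still yields a single
# row even though it has two purchases in the prior month.
rows = conn.execute("""
    SELECT t1.acc, t1.call_date,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM table2 t2
                    WHERE t2.acc = t1.acc
                      AND (strftime('%Y', t1.call_date) * 12
                           + strftime('%m', t1.call_date))
                        - (strftime('%Y', t2.purchase_date) * 12
                           + strftime('%m', t2.purchase_date)) = 1
                )
                THEN 'Yes' ELSE 'No'
           END AS prior_month_purchase
    FROM table1 t1
    ORDER BY t1.acc
""").fetchall()

for row in rows:
    print(row)
```

A plain join on acc would have returned account 1 twice here, once per matching purchase.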
Let's check the options.
Say you were to create a new third table based on the data in the first two; then every update, insert, or delete on either table would also have to be propagated to the third table.
Say you instead have a view that does what you need; there isn't a need to maintain a third table, and it gets you the data from the first two each time you query it.
create view third_table as
select a.acc, a.call_date, case when dateadd(mm, -1, a.call_date) = b.purchase_date then 'Yes' else 'No' end as prior_month_purchase
from first_table a
left join second_table b
on a.acc=b.acc

SQL Query : Facing issues to get desired records from different tables

I have two tables
Calendar (Calname, CCode, PCode)
Lookup (LCode, Name)
Calendar table contains records like,
Calname | CCode | PCode
abc | O_R | P_R
xyz | C_R | P_C
Lookup table contains records like,
LCode | Name
O_R | Reporting
C_R | Cross
P_R | Process
P_C | ProcessCross
I need to fetch each Calendar record together with the names of both of its codes from the Lookup table, all on the same row.
Desired Output,
Calname | CCode | PCode | CCodeName | PCodeName
abc | O_R | P_R | Reporting | Process
xyz | C_R | P_C | Cross | ProcessCross
I cannot simply apply an inner join on the code, as it does not give me the desired output.
I also tried a subquery, but somehow it did not work.
Can anyone help me out with this issue?
Thanks
You can try joining the Calendar table to the Lookup table twice, using each of the two codes.
SELECT
c.Calname,
c.CCode,
c.PCode,
COALESCE(t1.Name, 'NA') AS CCodeName,
COALESCE(t2.Name, 'NA') AS PCodeName
FROM Calendar c
LEFT JOIN Lookup t1
ON c.CCode = t1.LCode
LEFT JOIN Lookup t2
ON c.PCode = t2.LCode
An alternative to Tim's answer would be to use scalar subqueries, which may or may not give you some performance benefit due to scalar subquery caching:
SELECT
c.Calname,
c.CCode,
c.PCode,
COALESCE((SELECT l1.name FROM lookup l1 WHERE c.ccode = l1.lcode), 'NA') AS CCodeName,
COALESCE((SELECT l2.name FROM lookup l2 WHERE c.pcode = l2.lcode), 'NA') AS PCodeName
FROM Calendar c;
I would test both answers to see which one works best for your data.
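One way to run that comparison on a small scale is the following sketch, which uses Python's built-in sqlite3 module and the sample data from the question. The extra calendar row pqr with the unknown code X_R is made up to show the 'NA' fallback. It only checks that both queries return the same rows; timing them on real data is left to you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Calendar (Calname TEXT, CCode TEXT, PCode TEXT);
    CREATE TABLE Lookup (LCode TEXT PRIMARY KEY, Name TEXT);
    INSERT INTO Calendar VALUES ('abc', 'O_R', 'P_R'),
                                ('xyz', 'C_R', 'P_C'),
                                ('pqr', 'X_R', 'P_R');  -- made-up row
    INSERT INTO Lookup VALUES ('O_R', 'Reporting'), ('C_R', 'Cross'),
                              ('P_R', 'Process'), ('P_C', 'ProcessCross');
""")

# Variant 1: join the lookup table twice, once per code column.
join_rows = conn.execute("""
    SELECT c.Calname, c.CCode, c.PCode,
           COALESCE(t1.Name, 'NA') AS CCodeName,
           COALESCE(t2.Name, 'NA') AS PCodeName
    FROM Calendar c
    LEFT JOIN Lookup t1 ON c.CCode = t1.LCode
    LEFT JOIN Lookup t2 ON c.PCode = t2.LCode
    ORDER BY c.Calname
""").fetchall()

# Variant 2: one scalar subquery per code column.
subquery_rows = conn.execute("""
    SELECT c.Calname, c.CCode, c.PCode,
           COALESCE((SELECT l1.Name FROM Lookup l1
                     WHERE c.CCode = l1.LCode), 'NA') AS CCodeName,
           COALESCE((SELECT l2.Name FROM Lookup l2
                     WHERE c.PCode = l2.LCode), 'NA') AS PCodeName
    FROM Calendar c
    ORDER BY c.Calname
""").fetchall()

for row in join_rows:
    print(row)
print(join_rows == subquery_rows)
```

The two variants agree as long as LCode is unique in Lookup. If it were not, the join would return extra rows, and many databases would reject the scalar subquery for returning more than one row.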

Using self join on a table to compare two columns based on a linked column in the same table

I have the following:
TableA
ID | DocumentType | DocumentCode | DocumentDate | Warehouse | ReferenceCode
---+--------------+--------------+--------------+-----------+--------------
1 | DeliveryNote | DOC-001 | 2017-04-21 | 1 | NULL
2 | Invoice | DOC-002 | 2017-04-21 | 2 | DOC-001
As you can see, the warehouse is different on each document, and DOC-002 is related to DOC-001 through the ReferenceCode column (which means it was created from DOC-001 as a source document).
DOC-002 is supposed to carry the same information, but sometimes it differs. I tried to write a query (I think a self join applies here) that checks which information in DOC-002 differs from DOC-001, based on the reference code, but I couldn't manage to do it.
If someone could give me a hand, I'll be very grateful.
This is the SQL query:
select *
from TableA tbl
inner join TableA tbla on tbl.id = tbla.id
where tbla.ReferenceCode = tbl.DocumentCode
You indeed want to join the table to itself. But joining on the ID column won't work, because that column doesn't relate records to each other. Instead, you need to join on the DocumentCode and ReferenceCode fields. Then only include the records that have some difference (in this case, I'm only comparing the DocumentDate and Warehouse fields).
select tbla.*
from TableA tbl
join TableA tbla on tbl.DocumentCode = tbla.ReferenceCode
where tbla.DocumentDate != tbl.DocumentDate
or tbla.Warehouse != tbl.Warehouse
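To check the behaviour of this self join, here is a small sketch with Python's built-in sqlite3 module. The third row (DOC-003, an invoice that matches its source document exactly) is made up, and the query selects the old and new warehouse side by side instead of tbla.*, purely to make the difference visible. Note that != never evaluates to true when either side is NULL, so this form will not flag a value that is missing on one document only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, DocumentType TEXT,
                         DocumentCode TEXT, DocumentDate TEXT,
                         Warehouse INTEGER, ReferenceCode TEXT);
    INSERT INTO TableA VALUES
        (1, 'DeliveryNote', 'DOC-001', '2017-04-21', 1, NULL),
        (2, 'Invoice',      'DOC-002', '2017-04-21', 2, 'DOC-001'),
        -- Made-up row: identical to its source, so it must not be reported.
        (3, 'Invoice',      'DOC-003', '2017-04-21', 1, 'DOC-001');
""")

# tbl is the source document, tbla the document created from it.
rows = conn.execute("""
    SELECT tbla.DocumentCode,
           tbl.Warehouse  AS source_warehouse,
           tbla.Warehouse AS derived_warehouse
    FROM TableA tbl
    JOIN TableA tbla ON tbl.DocumentCode = tbla.ReferenceCode
    WHERE tbla.DocumentDate != tbl.DocumentDate
       OR tbla.Warehouse != tbl.Warehouse
    ORDER BY tbla.DocumentCode
""").fetchall()

for row in rows:
    print(row)
```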

Join two tables juxtaposing columns with same name sql

I have two sqlite3 tables with same column names and I want to compare them. To do that, I need to join the tables and juxtapose the columns with same name.
The tables share an identical column which I want to put as the first column.
Let's imagine I have table t1 and table t2
Table t1:
SharedColumn | Height | Weight
A | 2 | 70
B | 10 | 100
Table t2:
SharedColumn | Height | Weight
A | 5 | 25
B | 32 | 30
What I want to get as a result of my query is:
SharedColumn | Height_1 | Height_2 | Weight_1 | Weight_2
A | 2 | 5 | 70 | 25
B | 10 | 32 | 100 | 30
In my real case I have a lot of columns, so I would like to avoid writing each column name twice to specify the order.
Renaming the columns is not my main concern, what interests me the most is the juxtaposition of columns with same name.
There is no way to do that directly in SQL, especially because you also want to rename the columns to identify their source. You would have to use dynamic SQL, and honestly? Don't!
Simply write out the column names. Most SQL tools provide a way to generate the select; just copy the names and put them in the right places:
SELECT t1.sharedColumn,t1.height as height_1,t2.height as height_2 ...
FROM t1
JOIN t2 ON (t1.sharedColumn = t2.sharedColumn)
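If typing the list by hand is impractical, the select can also be generated outside the database. The sketch below is one possible way to do that for sqlite3 from Python, reading the column names with PRAGMA table_info. The helper name juxtaposed_select is hypothetical, it assumes both tables have the same columns, and because table and column names are pasted into the SQL text it should only be used with names you trust.

```python
import sqlite3

def juxtaposed_select(conn, left, right, key):
    """Build a SELECT that pairs same-named columns of two tables."""
    # PRAGMA table_info returns one row per column; index 1 is its name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{left}")')]
    parts = [f'"{left}"."{key}" AS "{key}"']
    for col in columns:
        if col != key:
            parts.append(f'"{left}"."{col}" AS "{col}_1"')
            parts.append(f'"{right}"."{col}" AS "{col}_2"')
    select_list = ", ".join(parts)
    return (
        f'SELECT {select_list} FROM "{left}" '
        f'JOIN "{right}" ON "{left}"."{key}" = "{right}"."{key}" '
        f'ORDER BY "{left}"."{key}"'
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
    CREATE TABLE t2 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
    INSERT INTO t1 VALUES ('A', 2, 70), ('B', 10, 100);
    INSERT INTO t2 VALUES ('A', 5, 25), ('B', 32, 30);
""")

cursor = conn.execute(juxtaposed_select(conn, "t1", "t2", "SharedColumn"))
headers = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(headers)
for row in rows:
    print(row)
```

The generated statement is ordinary static SQL, so it can be printed once and pasted into a script rather than rebuilt on every run.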
Try the following query to get the desired result!!
SELECT t1.Height AS Height_1, t1.Weight AS Weight_1, t1.sharedColumn AS SharedColumn,
t2.Height AS Height_2, t2.Weight AS Weight_2
FROM t1 INNER JOIN t2
ON t1.sharedColumn = t2.sharedColumn
ORDER By t1.sharedColumn ASC
After that, you can fetch the result by following lines:
$result['SharedColumn'];
$result['Height_1'];
$result['Height_2'];
$result['Weight_1'];
$result['Weight_2'];