Joining Two SQL Tables on Different Databases [duplicate]

Joining Two SQL Tables on Different Databases [duplicate] - sql

This question already has answers here:
Can we use join for two different database tables?
(2 answers)
Closed 4 years ago.
I have two tables on two separate databases. We will call them Table1 and Table2.
Table1:
+----------+----------+----------+---------+------+----------+
| UniqueID | Date1 | Date2 | Fruit | Cost | Duration |
+----------+----------+----------+---------+------+----------+
| 1 | 09/10/18 | 10/20/18 | Apples | 1.50 | 7 |
| 2 | 09/18/18 | 10/25/18 | Oranges | 1.75 | 10 |
| 3 | 10/01/18 | 10/30/18 | Bananas | 2.00 | 10 |
+----------+----------+----------+---------+------+----------+
Table2:
+----------+---------+------+----------+-----------+
| Date1 | Fruit | Cost | Duration | New Price |
+----------+---------+------+----------+-----------+
| 09/10/18 | Savory | 1.50 | 7 | 1.90 |
| 09/18/18 | Citrusy | 1.75 | 10 | 2.50 |
| 10/01/18 | Mealy | 2.00 | 10 | 2.99 |
| 10/20/18 | Savory | 1.50 | 7 | 3.90 |
| 10/25/18 | Citrusy | 1.75 | 10 | 4.50 |
| 10/30/18 | Mealy | 2.00 | 10 | 5.99 |
+----------+---------+------+----------+-----------+
What I need the output to look like:
+----------+----------+--------------------+----------+--------------------+
| UniqueID | Date1 | New Price on Date1 | Date2 | New Price on Date2 |
+----------+----------+--------------------+----------+--------------------+
| 1 | 09/10/18 | 1.90 | 10/20/18 | 3.90 |
| 2 | 09/18/18 | 2.50 | 10/25/18 | 4.50 |
| 3 | 10/01/18 | 2.99 | 10/30/18 | 5.99 |
+----------+----------+--------------------+----------+--------------------+
I need to first convert table1.fruit to the representation of table2.fruit (apples-->savory, oranges-->citrusy, bananas-->mealy) then join on table1.fruit = table2.fruit, table1.duration = table2.duration, table1.cost = table2.cost, table1.date1 = table2.date1, and table1.date2 = table2.date1.
I don't know where to start on writing the statement. I looked over some previous questions posted here, but they really just go over the basics of linking two tables from different databases. Do I convert the table1.fruit first in the select statement, then join, or do I convert table1.fruit in the join statement? How do I join table2.date1 on both table1.date1 and table1.date2 to get the price associated with both dates?
If I can provide any more information for you, please let me know.
I am on SQL Server 2017 using Management Studio.
Thanks for any help in advance!

Create a mapping table to bridge between the different codes for fruits.
IF OBJECT_ID('tempdb..#FruitMappings') IS NOT NULL
DROP TABLE #FruitMappings
CREATE TABLE #FruitMappings (
Table1Fruit VARCHAR(100),
Table2Fruit VARCHAR(100))
INSERT INTO #FruitMappings (
Table1Fruit,
Table2Fruit)
VALUES
('Apples', 'Savory'),
('Oranges', 'Citrusy'),
('Bananas', 'Mealy')
SELECT
T1.*
--, whichever columns you need
FROM
Database1.Schema1.Table1 AS T1
INNER JOIN #FruitMappings AS F ON T1.Fruit = F.Table1Fruit
INNER JOIN Database2.Schema2.Table2 AS T2 ON
F.Table2Fruit = T2.Fruit AND
T1.Cost = T2.Cost AND
T1.Duration = T2.Duration
-- AND any other matches you need
You can use LEFT JOIN or even FULL JOIN, depending if you might have some fruits on a table that aren't available on the other (careful with NULL values if FULL JOIN).

If both databases are on the same SQL Server instance and your SQL Server login has access to both databases you can just use the full form of the object names:
select * -- Whatever...
from Database1.dbo.Table1 t1
inner join Database2.dbo.Table2 t2
on t1,UniqueId = t2.UniqueId -- Or whatever your join condition is
(adding where etc. clauses as required.)
This assumes both databases are using the default schema, otherwise replace dbo as necessary.
If the databases are on different servers you can use linked servers, but there are performance implications (the whole remote table may be read because the optimiser can't do much to filter it).

Related

Query Optimization - subselect in Left Join

I'm working on optimizing a sql query, and I found a particular line that appears to be killing my queries performance:
LEFT JOIN anothertable lastweek
AND lastweek.date>= (SELECT MAX(table.date)-7 max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
AND lastweek.date< (SELECT MAX(table.date) max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
I'm working on a way of optimizing these lines, but I'm stumped. If anyone has any ideas, please let me know!
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1908654 | 145057704 | 720461 | 00:00:29 |
| * 1 | HASH JOIN RIGHT OUTER | | 1908654 | 145057704 | 720461 | 00:00:29 |
| 2 | VIEW | VW_DCL_880D8DA3 | 427487 | 7694766 | 716616 | 00:00:28 |
| * 3 | HASH JOIN | | 427487 | 39328804 | 716616 | 00:00:28 |
| 4 | VIEW | VW_SQ_2 | 7174144 | 193701888 | 278845 | 00:00:11 |
| 5 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 6 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 7 | HASH JOIN | | 8549735 | 555732775 | 429294 | 00:00:17 |
| 8 | VIEW | VW_SQ_1 | 7174144 | 172179456 | 278845 | 00:00:11 |
| 9 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 10 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| 11 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 12 | TABLE ACCESS STORAGE FULL | TASK | 1908654 | 110701932 | 2520 | 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("SYS_ID"(+)="TASK"."PARENT")
* 3 - access("ITEM_2"="TASK_LWEEK"."SYS_ID")
* 3 - filter("TASK_LWEEK"."SNAPSHOT_DATE"<"MAX_DATE_LWEEK")
* 7 - access("ITEM_1"="TASK_LWEEK"."SYS_ID")
* 7 - filter("TASK_LWEEK"."SNAPSHOT_DATE">=INTERNAL_FUNCTION("MAX_DATE_LWEEK"))
* 12 - storage("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)
* 12 - filter("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)

Well, you are not even showing the select. As I can see that the select is done over Exadata ( Table Access Storage Full ) , perhaps you need to ask yourself why do you need to make 4 access to the same table.
You access fourth times ( lines 6, 10, 11, 12 ) to the main table TASK with 170994691 rows ( based on estimation of the CBO ). I don't know whether the statistics are up-to-date or it is optimizing sampling kick in due to lack of good statistics.
A solution could be use WITH for generating intermediate results that you need several times in your outline query
with my_set as
(SELECT MAX(table.date)-7 max_date_lweek ,
max(table.date) as max_date,
id from FROM table )
select
.......................
from ...
left join anothertable lastweek on ( ........ )
left join myset on ( anothertable.id = myset.id )
where
lastweek.date >= myset.max_date_lweek
and
lastweek.date < myset.max_date
Please, take in account that you did not provide the query, so I am guessing a lot of things.

Since complete information is not available I will suggest:
You are using the same query twice then why not use CTE such as
with CTE_example as (SELECT MAX(table.date), max_date_lweek, ID
FROM table table)

Looking at your explain plan, the only table being accessed is TASK. From that, I infer that the tables in your example: ANOTHERTABLE and TABLE are actually the same table and that, therefore, you are trying to get the last week of data that exists in that table for each id value.
If all that is true, it should be much faster to use an analytic function to get the max date value for each id and then limit based on that.
Here is an example of what I mean. Note I use "dte" instead of "date", to remove confusion with the reserved word "date".
LEFT JOIN ( SELECT lastweek.*,
max(dte) OVER ( PARTITION BY id ) max_date
FROM anothertable lastweek ) lastweek
ON 1=1 -- whatever other join conditions you have, seemingly omitted from your post
AND lastweek.dte >= lastweek.max_date - 7;
Again, this only works if I am correct in thinking that table and anothertable are actually the same table.

SQL - Join with default if it's not possible to match

I have two tables which look like this:
Table 1:
+---------+---------+-------------+
| Activity| Area | Responsible |
+---------+---------+-------------+
| Cooking | Meat | Peter |
| Cooking | Vegan | Sia |
| Cleaning| Kitchen | Paul |
| Cleaning| Toilets | Selina |
+---------+---------+-------------+
Table 2:
+---------+---------+-------------+
| Activity| Area | Day |
+---------+---------+-------------+
| Cooking | Meat | Monday |
| Cooking | Vegan | Monday |
| Cleaning| Garden | Friday |
| Cleaning| Toilets | Friday |
+---------+---------+-------------+
Now I want an SQL to join them, so that I can see the responsible persons for each day.
I think the standard SQL would look something like this:
SELECT DAY, ACTIVITY, RESPONSIBLE
FROM TABLE_2 2
LEFT JOIN TABLE_1 1
ON 1.ACTIVITY = 2.ACTIVITY AND 1.AREA = 2.AREA
But now there are some rows which can not be joint (e.g. Cleaning Garden).
In that case (if it is not possible to join) I want always Peter to be responsible for it.
Can I do that in one join (maybe with a CASE statement?) or how would you do this?

Don't use numbers for table names, even if DB2 allows it. Numbers should be numbers.
You are looking for COALESCE():
SELECT t2.DAY, t2.ACTIVITY, COALESCE(t1.RESPONSIBLE, 'Peter') as Responsible
FROM TABLE_2 t2 LEFT JOIN
TABLE_1 t1
ON t1.ACTIVITY = t2.ACTIVITY AND t1.AREA = t2.AREA;

Four Table Join in BigQuery

Okay, so I'm trying to link together four different tables, and its getting very difficult. I provided snippets of each table in the hopes you all could help out
Table 1: data
+--------+--------+-----------+
| charge | amount | date |
+--------+--------+-----------+
| 123 | 10000 | 2/10/2016 |
| 456 | 10000 | 1/28/2016 |
| 789 | 10000 | 3/30/2016 |
+--------+--------+-----------+
Table 2: data_metadata
+--------+------------+------------+
| charge | key | value |
+--------+------------+------------+
| 123 | identifier | trrkfll212 |
| 456 | code | test |
| 789 | ID | 123xyz |
+--------+------------+------------+
Table 3: buyer
+-----+-----------+----------+----------+
| id | date | discount | plan |
+-----+-----------+----------+----------+
| ABC | 2/13/2016 | yes | option a |
| DEF | 2/1/2016 | yes | option a |
| GHI | 1/22/2016 | no | option a |
+-----+-----------+----------+----------+
Table 4: buyer_metadata
+--------------+-----------+--------+
| id | |key| | value |
+--------------+-----------+--------+
| ABC | migration | TRUE |
| DEF | emid | foo |
| GHI | ID | 123xyz |
+--------------+-----------+--------+
Okay, so the tables data and data_metadata are obviously connected by the charge column.
The tables buyer and buyer_metadata are connected by the id column.
But I want to link all of them together. I'm pretty sure the way to accomplish this is through linking the metadata tables together through the common field in the "value" column (in this example: 123xyz).
Could anyone help?

This might look like something like that if all "link" columns are unique :
SELECT *
FROM data d
JOIN data_metadata dm ON d.charge = dm.charge
JOIN buyer_metada bm ON dm.value = bm.value
JOIN buyer b ON bm.id = b.id
If not, I think you'll have to use something like GROUP BY clause

Let's take it in two steps, first create composite tables for data and buyer. Composite table for data:
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge
And composite table for buyer:
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id
And then let's join the two composite tables
SELECT composite_data.*, composite_buyer.*
FROM (
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge) AS composite_data
JOIN (
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id) AS composite_buyer
ON composite_data.value = composite_buyer.value
I haven't tested it but it's probably close.
For reference, here is the page on BigQuery JOINs. And have you seen this SO?

SQL query for many-to-many self-join

I have a database table that has a companion many-to-many self-join table alongside it. The primary table is part and the other table is alternate_part (basically, alternate parts are identical to their main part with different #s). Every record in the alternate_part table is also in the part table. To illustrate:
`part`
| part_id | part_number | description |
|---------|-------------|-------------|
| 1 | 00001 | wheel |
| 2 | 00002 | tire |
| 3 | 00003 | window |
| 4 | 00004 | seat |
| 5 | 00005 | wheel |
| 6 | 00006 | tire |
| 7 | 00007 | window |
| 8 | 00008 | seat |
| 9 | 00009 | wheel |
| 10 | 00010 | tire |
| 11 | 00011 | window |
| 12 | 00012 | seat |
`alternate_part`
| main_part_id | alt_part_id |
|--------------|-------------|
| 1 | 5 | // Wheel
| 5 | 1 | // |
| 5 | 9 | // |
| 9 | 5 | // |
| 2 | 6 | // Tire
| 6 | 2 | // |
| ... | ... | // |
I am trying to produce a simple SQL query that will give me a list of all alternates for a main part. The tricky part is: some alternates are only listed as alternates of alternates, it is not guaranteed that every viable alternate for a part is listed as a direct alternate. e.g., if 'Part 3' is an alternate of 'Part 2' which is an alternate of 'Part 1', then Part 3 is an alternate of Part 1 (even if the alternate_part table doesn't list a direct link). The reverse is also true (Part 1 is an alternate of Part 3).
Basically, right now I'm pulling alternates and iterating through them
SELECT p.*, ap.*
FROM part p
INNER JOIN alternate_part ap ON p.part_id = ap.main_part_id
And then going back and doing the same again on those alternates. But, I think there's got to be a better way.
The SQL query I'm looking for will basically give me:
| part_id | alt_part_id |
|---------|-------------|
| 1 | 5 |
| 1 | 9 |
For part_id = 1, even when 1 & 9 are not explicitly linked in the alternates table.
Note: I have no control whatever over the structure of the DB, it is a distributed software solution.
Note 2: It is an Oracle platform, if that affects syntax.

You have to create hierarchical tree , probably you have to use connect by prior , nocycle query
something like this
select distinct p.part_id,p.part_number,p.description,c.main_part_id
from part p
left join (
select main_part_id,connect_by_root(main_part_id) real_part_id
from alternate_part
connect by NOCYCLE prior main_part_id = alternate_part_id
) c
on p.part_id = c.real_part_id and p.part_id != c.main_part_id
order by p.part_id
You can read full documentation about Hierarchical queries at http://docs.oracle.com/cd/B28359_01/server.111/b28286/queries003.htm

PostgreSQL Inner Join on the same table + second table?

If this is a stupid question, forgive me, I'm not very familiar with PostgreSQL.
I've collected inventory data from used car dealerships in my area and stored it in a postgreSQL table. I've got a second table with particular details regarding certain makes and models. For example:
The dealership table is structured like so:
-----------------------------------------
| Dealership | Make | Model | Year | ID |
----------------------------------------|
| A | Ford | F250 | 2003 | 1 |
| A | Chevy| Cobalt| 2005 | 2 |
| B | Ford | F250 | 2003 | 1 |
| B | Dodge| Chrgr | 2012 | 3 |
-----------------------------------------
The details table is structured like so:
-----------------------------------------
| ID | DetailA| DetailB| DetailC|
-----------------------------------------
| 1 | data | data | data |
| 2 | data | data | data |
| 3 | data | data | data |
| 4 | data | data | data |
-----------------------------------------
My goal is to retrieve vehicle matches from multiple dealerships and display the appropriate details. In the above example, I would like to see:
-----------------------------------------------------
| Make | Model | Year | DetailA | DetailB | DetailC |
-----------------------------------------------------
| Ford | F250 | 2003 | data | data | data |
-----------------------------------------------------
With this result, I will know that both A and B havea 2003 Ford F250 for sale, and can view the related details of the vehicle.
I've tried many different queries, but most are variations on something like this:
SELECT DISTINCT
dealership_table.make,
dealership_table.model,
dealership_table.year
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM
dealership_table
INNER JOIN
details_table
ON
dealership_table.id = details_table.id
WHERE
dealership_table.dealership = 'A'
OR
dealership_table.dealership = 'B'
However this returns all of the distinct matches from the table where dealership is either A or B. I've tried multiple inner-joins, but I an error complaining details_table is specified multiple times.
If I'm doing something really silly, I apologize. Like I said before, I'm pretty much an SQL noob.
What am I doing wrong? How should I go about retrieving the desired results? Any suggestions, solutions, or advice is greatly appreciated!

You can write:
SELECT dealership_table1.make,
dealership_table1.model,
dealership_table1.year,
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM dealership_table dealership_table1
JOIN dealership_table dealership_table2
ON dealership_table1.make = dealership_table2.make
AND dealership_table1.model = dealership_table2.model
AND dealership_table1.year = dealership_table2.year
JOIN details_table
ON dealership_table.id = details_table.id
WHERE dealership_table1.dealership = 'A'
AND dealership_table1.dealership = 'B'
;
(Note that the FROM dealership_table dealership_table1 and JOIN dealership_table dealership_table2 set up distinct "aliases", so you can use the same table multiple different times in the same query without getting name-conflicts.)

I may be misunderstanding your table layout, but I think you should consider changing to a different structure. Here's what I would propose:
Vehicle:
----------------------------
| ID | Make | Model | Year |
----------------------------
| 1 | Ford | F250 | 2003 |
| 2 | Chevy| Cobalt| 2005 |
| 3 | Dodge| Chrgr | 2012 |
----------------------------
Dealership:
----------------------------
| Dealership | ID | Detail |
----------------------------
| A | 1 | data |
| A | 2 | data |
| B | 1 | data |
| B | 3 | data |
----------------------------
This way you're not storing vehicle information (make/model/year) in more than one place.
Here's how you would write your desired query given the above schema:
SELECT Make, Model, Year, A.Detail, B.Detail, C.Detail
FROM Vehicle V
LEFT OUTER JOIN Dealership A on A.Dealership = 'A' and A.id = V.id
LEFT OUTER JOIN Dealership B on B.Dealership = 'B' and B.id = V.id
LEFT OUTER JOIN Dealership C on C.Dealership = 'C' and C.id = V.id

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Joining Two SQL Tables on Different Databases [duplicate] - sql

Related

Query Optimization - subselect in Left Join

SQL - Join with default if it's not possible to match

Four Table Join in BigQuery

SQL query for many-to-many self-join

PostgreSQL Inner Join on the same table + second table?

Categories

Resources