SQL - Combining two databases

I'm trying to connect two active databases, both running SQL, for reporting. We recently migrated several customer groups to the new database, but now we need to run comparative historical reports. How do I get all the data from both databases? Joins filter out newly added customers on one side or the other, depending on the join used. Is this possible?

If they are on two separate servers, you will need to link the servers to each other first. Then you need to reference the tables using four-part names (server, database, schema, table):
SELECT T1.*
FROM [Server1].[Database1].[dbo].[Table1] T1
LEFT JOIN [Server2].[Database2].[dbo].[Table1] T2
ON T1.MyField = T2.MyField
If they are on the same server, you only need to add the database name (and schema) to your SQL code. So, if you're trying to link data from Table1 in Database1 to Table1 in Database2, you would do this:
SELECT T1.*
FROM [Database1].[dbo].[Table1] T1
LEFT JOIN [Database2].[dbo].[Table1] T2
ON T1.MyField = T2.MyField
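To try the same-server case without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It assumes SQLite's ATTACH DATABASE can play the role of a second database on the same server; SQLite has no dbo schema level, so tables are referenced as database.table. The Name column and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; "db2" is a second, attached database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db2")

# Hypothetical sample data: customer 1 exists only in the old database,
# customer 3 only in the new one.
conn.execute("CREATE TABLE main.Table1 (MyField INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE db2.Table1 (MyField INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO main.Table1 VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO db2.Table1 VALUES (?, ?)", [(2, "Bob"), (3, "Cy")])

# Same shape as the answer's query: database-qualified names on both sides.
left_join_rows = conn.execute(
    """
    SELECT T1.MyField, T1.Name, T2.MyField
    FROM main.Table1 AS T1
    LEFT JOIN db2.Table1 AS T2
      ON T1.MyField = T2.MyField
    ORDER BY T1.MyField
    """
).fetchall()
print(left_join_rows)
```

The LEFT JOIN keeps every row of the first table, so the row that exists only in db2 (MyField 3) does not appear. That is the filtering effect the question describes.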

Related

SQL join on multiple columns or on single calculated column

I'm migrating the backend of a budget database from Access to SQL Server, and I ran into an issue.
I have 2 tables (let's call them t1 and t2) that share many fields in common: Fund, Department, Object, Subcode, TrackingCode, Reserve, and FYEnd.
If I want to join the tables to find records where all 7 fields match, I can create an inner join using each field:
SELECT *
FROM t1
INNER JOIN t2
ON t1.Fund = t2.Fund
AND t1.Department = t2.Department
AND t1.Object = t2.Object
AND t1.Subcode = t2.Subcode
AND t1.TrackingCode = t2.TrackingCode
AND t1.Reserve = t2.Reserve
AND t1.FYEnd = t2.FYEnd;
This works, but it runs very slowly. When the backend was in Access, I was able to solve the problem by adding a calculated column to both tables. It simply concatenated the seven fields using "-" as a delimiter. The revised query is as follows:
SELECT *
FROM t1 INNER JOIN t2
ON t1.CalculatedColumn = t2.CalculatedColumn
This looks cleaner and runs much faster. The problem is when I moved t1 and t2 to SQL Server, the same query gives me an error message:
I'm new to SQL Server. Can anyone explain what's going on here? Is there a setting I need to change for the calculated column?
Posting this as an answer from my comment.
Usually, this is an issue with mismatched data types between the two columns referenced. Check that the data types of the two fields (CompositeID) are the same.
You have to calculate the columns before joining them, because the ON clause can only access columns that belong to the tables being joined.
Having two identical tables is not a good design anyway, so you should reconsider it.
SELECT t1a.*, t2a.*
FROM (SELECT CalculatedColumn, * FROM t1) t1a
INNER JOIN (SELECT CalculatedColumn, * FROM t2) t2a
ON t1a.CalculatedColumn = t2a.CalculatedColumn
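The two approaches discussed here (joining on every field versus joining on a key computed in a derived table) can be compared with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server, only three of the seven fields, and invented sample rows; the Amount column and the JoinKey alias are hypothetical. SQLite concatenates with ||, whereas SQL Server would use + or CONCAT, and the explicit CAST reflects the data-type point made above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced schema: three of the seven shared fields plus an Amount.
for name in ("t1", "t2"):
    conn.execute(
        f"CREATE TABLE {name} (Fund TEXT, Department TEXT, FYEnd INTEGER, Amount REAL)"
    )
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [("100", "10", 2020, 5.0), ("100", "20", 2020, 7.5)],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?, ?)",
    [("100", "10", 2020, 6.0), ("200", "10", 2021, 9.0)],
)

# Join on each field separately.
multi_column = conn.execute(
    """
    SELECT t1.Fund, t1.Department, t1.FYEnd, t1.Amount, t2.Amount
    FROM t1
    INNER JOIN t2
      ON t1.Fund = t2.Fund
     AND t1.Department = t2.Department
     AND t1.FYEnd = t2.FYEnd
    """
).fetchall()

# Join on a key computed in a derived table; the integer is cast to text
# so that both sides of the comparison have the same type.
computed_key = conn.execute(
    """
    SELECT a.Fund, a.Department, a.FYEnd, a.Amount, b.Amount
    FROM (SELECT Fund || '-' || Department || '-' || CAST(FYEnd AS TEXT) AS JoinKey, *
          FROM t1) AS a
    INNER JOIN (SELECT Fund || '-' || Department || '-' || CAST(FYEnd AS TEXT) AS JoinKey, *
                FROM t2) AS b
      ON a.JoinKey = b.JoinKey
    """
).fetchall()

print(multi_column)
print(computed_key)
```

On this sample data both queries return the same single matching row. The sketch says nothing about relative speed, which depends on the engine and its indexes.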

BigQuery how to automatically handle "duplicate column names" on left join

I am working with a dataset of tables that (a) often requires joining tables together, but also (b) frequently has duplicate column names. Any time I write a query along the lines of:
SELECT
t1.*, t2.*
FROM t1
LEFT JOIN t2 ON t1.this_id = t2.matching_id
...I get the error Duplicate column names in the result are not supported. Found duplicate(s): this_col, that_col, another_col, more_cols, dupe_col, get_the_idea_col
I understand that with BigQuery it is better to avoid using * when selecting from tables. However, my data tables aren't too big, my BigQuery budget is high, and doing these joins with all columns helps significantly with data exploration.
Is there any way BigQuery can automatically handle or rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
Thanks!
The simplest way is to select records rather than columns:
SELECT t1, t2
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
This is pretty much what I do for ad hoc queries.
If you want the results as columns and not records (they don't look much different in the results), you can use EXCEPT:
SELECT t1.* EXCEPT (duplicate_column_name),
t2.* EXCEPT (duplicate_column_name),
t1.duplicate_column_name as t1_duplicate_column_name,
t2.duplicate_column_name as t2_duplicate_column_name
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
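When many columns collide, writing the aliases by hand gets tedious. One possible workaround, sketched below, is to generate the select list outside BigQuery. This is plain string building in Python, not a BigQuery feature; the function name prefixed_select_list is hypothetical, and the column lists are assumed to be known already (for example, copied from the table schemas).

```python
def prefixed_select_list(tables):
    """Build 'alias.col AS alias_col' items for every column of every table.

    `tables` maps a table alias to its list of column names (hypothetical input).
    """
    items = []
    for alias, columns in tables.items():
        for col in columns:
            items.append(f"{alias}.{col} AS {alias}_{col}")
    return ",\n  ".join(items)


# Assumed column lists for the two tables in the question.
columns = {
    "t1": ["this_id", "this_col", "that_col"],
    "t2": ["matching_id", "this_col", "that_col"],
}

query = (
    "SELECT\n  " + prefixed_select_list(columns) + "\n"
    "FROM t1\n"
    "LEFT JOIN t2 ON t1.this_id = t2.matching_id"
)
print(query)
```

Every output column gets a unique, table-prefixed name, so the generated query no longer contains duplicate column names.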
Is there any way BigQuery can automatically handle or rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
This is possible with BigQuery Legacy SQL, which can be handy for data exploration unless you are dealing with data types or using functions or features specific to Standard SQL.
So the query below
#legacySQL
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.this_id = t2.matching_id
will produce output where every column name is prefixed with the respective alias, like t1_this_id and t2_matching_id.

driving_site hint for multiple remote tables

I have a query of the following format. It uses two remote tables and a local table.
SELECT *
FROM table1@db2 t1 INNER JOIN table2@db2 t2 -- two large remote tables on the same DB
ON t1.id = t2.id
WHERE t1.prop = '1'
AND t2.prop = '2'
AND t1.prop2 IN (SELECT val FROM tinylocaltable)
I'm wondering how to properly use the DRIVING_SITE query hint to push the bulk of the work to db2 (i.e. ensure the join and conditions are applied on db2). Most of the examples I see of DRIVING_SITE reference only one remote table. Is SELECT /*+DRIVING_SITE(t1)*/ * sufficient or do I need to list both remote tables (t1 and t2) in the hint? If the latter, what is the proper syntax?
(If you're wondering why this isn't being executed on db2 to start with, it's because this is actually one UNION ALL section of a larger query, where the other UNION ALL sections use the local DB).
The DRIVING_SITE hint instructs the optimizer to execute the query at a different site than that selected by the database.
Your query uses
FROM table1@db2 t1 INNER JOIN table2@db2 t2
where both tables are on the same "different site", so
SELECT /*+ DRIVING_SITE(t1) */
should be OK, in my opinion. I can't find anything in the documentation that suggests otherwise.

JOIN on table on another database server vs. JOIN on a SELECT on table on another server

I’m curious whether there is any difference in performance between JOINing on a table residing on another SQL Server instance and JOINing on a subset of that same table on the other server instance. In other words, would performance be the same for the following two queries:
SELECT t1.CustomerName, t2.Address, t2.Phone
FROM Table1 t1
LEFT JOIN [Server X].dbo.Table2 t2 on t2.CustomerID = t1.CustomerID
And
SELECT t1.CustomerName, t2.Address, t2.Phone
FROM Table1 t1
LEFT JOIN (SELECT CustomerID, Address, Phone FROM [Server X].dbo.Table2) t2
ON t2.CustomerID = t1.CustomerID
We can assume that Table2 contains more than just these columns. I’m wondering whether SELECTing only the columns I need, rather than JOINing on the entire table, will make any difference, especially given that this is a cross-server query.
Off the top of my head I'm not sure, but without testing it, it looks to me like SQL Server would execute these the same way. You can check this if you have SQL Server Management Studio by viewing the execution plan for each query.
I believe the first form would be the more efficient one (as it would be on the same server) if the inner SELECT were more complex. It's really up to the optimizer.
IMO, both of the queries will have the same performance in this case.
If you look at the execution plans for these queries, both plans will be identical.
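The method suggested in these answers (run both forms and compare their plans) can be practiced with a small sketch. It uses Python's sqlite3 module, with an attached database named serverx standing in for the remote server. SQLite's optimizer is not SQL Server's, so only the method carries over, not the plans themselves. The Notes column and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: an attached SQLite database stands in for the other server.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS serverx")
conn.execute("CREATE TABLE main.Table1 (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE serverx.Table2 "
    "(CustomerID INTEGER PRIMARY KEY, Address TEXT, Phone TEXT, Notes TEXT)"
)
conn.executemany("INSERT INTO main.Table1 VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO serverx.Table2 VALUES (1, '1 Main St', '555-0100', 'unused')")

whole_table = """
    SELECT t1.CustomerName, t2.Address, t2.Phone
    FROM main.Table1 AS t1
    LEFT JOIN serverx.Table2 AS t2 ON t2.CustomerID = t1.CustomerID
    ORDER BY t1.CustomerID
"""
column_subset = """
    SELECT t1.CustomerName, t2.Address, t2.Phone
    FROM main.Table1 AS t1
    LEFT JOIN (SELECT CustomerID, Address, Phone FROM serverx.Table2) AS t2
      ON t2.CustomerID = t1.CustomerID
    ORDER BY t1.CustomerID
"""

rows_whole = conn.execute(whole_table).fetchall()
rows_subset = conn.execute(column_subset).fetchall()

# Print each plan; the last column of every EXPLAIN QUERY PLAN row is the description.
for label, sql in (("whole table", whole_table), ("column subset", column_subset)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", step[-1])
```

Both forms return the same rows here. Whether the printed plans match depends on the engine and version, which is why the answers recommend inspecting the actual plan rather than guessing.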

SQL query to join two tables from different databases on two servers

I have two tables: tableA in databaseA on ServerA, and tableB in databaseB on ServerB.
I just want to perform a full outer join on these tables based on a field name they have in common.
In SQL Server, you can create a linked server (in Management Studio, that's under Server Objects.) Then you can use a four part name to join the tables:
select *
from localdb.dbo.localtable as t1
full outer join
linkedserver.remotedb.dbo.remotetable as t2
on t1.col1 = t2.col1
If you're using another database, please edit your question to say which.
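For a database without FULL OUTER JOIN, the same result can be sketched as a LEFT JOIN plus the unmatched rows from the other side. The example below uses Python's sqlite3 module (older SQLite versions lack FULL OUTER JOIN), with an attached database standing in for the linked server. The val column and the sample rows are invented for illustration. In SQL Server, the FULL OUTER JOIN shown in the answer does this directly.

```python
import sqlite3

# Assumption: an attached SQLite database stands in for the linked server.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS remotedb")
conn.execute("CREATE TABLE main.localtable (col1 INTEGER, val TEXT)")
conn.execute("CREATE TABLE remotedb.remotetable (col1 INTEGER, val TEXT)")
conn.executemany("INSERT INTO main.localtable VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO remotedb.remotetable VALUES (?, ?)", [(2, "B"), (3, "C")])

# Part 1: every local row, with the remote match if there is one.
# Part 2: the remote rows that have no local match.
rows = conn.execute(
    """
    SELECT COALESCE(t1.col1, t2.col1) AS col1, t1.val AS local_val, t2.val AS remote_val
    FROM main.localtable AS t1
    LEFT JOIN remotedb.remotetable AS t2 ON t1.col1 = t2.col1
    UNION ALL
    SELECT t2.col1, t1.val, t2.val
    FROM remotedb.remotetable AS t2
    LEFT JOIN main.localtable AS t1 ON t1.col1 = t2.col1
    WHERE t1.col1 IS NULL
    ORDER BY 1
    """
).fetchall()
print(rows)
```

Rows that exist on only one side come back with NULL (None in Python) in the other side's column, which is what a full outer join returns.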