Join in linked server or join in host server?

Here's the situation: we have an Oracle database we need to connect to in order to pull some data. Since getting access to that Oracle database is a real pain (mainly a bureaucratic obstacle more than anything else), we're just planning on linking it to our SQL Server and using the link to access the data as we need it.
For one of our applications, we're planning on making a view to get the data we need. Now the data we need is joined from two tables. If we do this, which would be preferable?
This (in pseudo-SQL if such a thing exists):
SELECT [cols] FROM OPENQUERY(Oracle, 'SELECT [cols] FROM table1 INNER JOIN table2 ON [join condition]')
or this:
SELECT [cols] FROM OPENQUERY(Oracle, 'SELECT [cols1] FROM table1') t1
INNER JOIN OPENQUERY(Oracle, 'SELECT [cols2] FROM table2') t2 ON [join condition]
Is there any reason to prefer one over the other? One thing to keep in mind: we are under a limit on how long a query against the Oracle server can run.

I'd go with your first option, especially if your query contains a WHERE clause that selects a subset of the data in the tables.
It will require less work on both servers, assuming there are indexes on the Oracle tables that support the join.
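For illustration, here's a minimal sketch of option 1 wrapped in a view, with the filter pushed into the OPENQUERY text so that Oracle does both the joining and the filtering. The view name, columns, and join condition are placeholders, not from the original question:
-- All object and column names below are placeholders.
CREATE VIEW dbo.vw_OracleData AS
SELECT *
FROM OPENQUERY(Oracle,
    'SELECT t1.col1, t2.col2
     FROM table1 t1
     INNER JOIN table2 t2 ON t1.id = t2.id
     WHERE t1.some_col = 1');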

If the inner join significantly reduces the total number of rows, then option 1 will result in much less network traffic (since you won't have all the rows from table1 going across the DB link).

What hamishmcn said applies.
Also, SQL Server doesn't really know anything about the indexes, statistics, or cache kept by the Oracle server, so the Oracle server can probably do a much more efficient job with the join than SQL Server can.


How can I perform a SQL query against different database systems?

I have a question about how to perform database queries against different database systems.
For example, I have the following SQL query string:
SELECT A.F1, A.F2, B.F3, B.F4
FROM TableA A
INNER JOIN TableB B ON A.ID = B.ID
WHERE B.ID = xyz
Is there any solution that lets me perform the above query when:
1. TableA is from an Oracle database, and TableB is from an Oracle database on another instance
2. TableA is from a SQL Server database, and TableB is from an Oracle database
3. TableA is from an Oracle database, and TableB is from a SQL Server database
4. TableA is from a SQL Server database, and TableB is from a SQL Server database on another SQL Server instance
I know that for scenario #1 I can use the Oracle DATABASE LINK feature (and maybe for #3 as well). But is there any common solution that can address all of the scenarios above? For instance, there might be another scenario where I want to join two tables from MySQL and SQL Server databases.
For coding I am using C#/.NET; any recommendation is welcome, including joining the data in code.
Thanks!
This is known as a federated query.
You can use SQL Server's federation ability (linked servers) and run the query in SQL Server, or you can use Oracle's federation ability (Oracle Heterogeneous Services) and run the query in Oracle.
Either way you need to pick a database server, make the other (foreign) database known to that server, then execute the query on that server.
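As a minimal sketch of the SQL Server route (the linked server name ORA1, the TNS alias, the Oracle schema SCOTT, and the filter value are all assumptions for illustration):
-- Register the Oracle server as a linked server (names are assumptions).
EXEC sp_addlinkedserver
    @server = N'ORA1',
    @srvproduct = N'Oracle',
    @provider = N'OraOLEDB.Oracle',
    @datasrc = N'OracleTnsAlias';

-- Four-part naming: linkedserver..schema.object (no catalog part for Oracle).
SELECT A.F1, A.F2, B.F3, B.F4
FROM dbo.TableA A
INNER JOIN ORA1..SCOTT.TABLEB B ON A.ID = B.ID
WHERE B.ID = 42;  -- illustrative literal in place of xyz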
Keep in mind you cannot expect good performance from this as in most cases the executing server is loading all the source records locally and joining locally.
Otherwise you need to write your own 'federation' ability in your app. This is non-trivial: for example, data types don't match exactly between vendors, so you need to build some smarts. Also, if you are joining particularly large tables, you'd better make sure your algorithm is optimised or you'll run out of memory. Bear in mind that the federated query ability in existing products (SQL Server, Oracle, Cognos, etc.) is the result of very large companies doing a lot of development.
Can you tell us how many records you expect to join, and whether the various source database servers are mostly fixed, or whether this is for an ad hoc query tool?

Access slow when joining on Teradata SQL connections?

I have a simple Access database I use to create a number of reports. I've linked to a Teradata database server in our organization to pull some additional employee-level details. There is a simple left join on employee number, and all I pull is the name and the role.
The query without the Teradata connection takes maybe a minute or so to run and is very quick once loaded. Left joining on the Teradata connection slows everything down to a crawl: it can take 10 minutes or so to run the query through Excel, and when the query is loaded in Access, scrolling through it is very slow.
I should note there are no performance issues with the Teradata server itself. I pull unrelated reports from the same and different tables, with complex joins, and the speed is very quick.
I tried creating an even simpler query that does almost nothing, and the performance issues are still there. Here is the code:
SELECT EMPL_DETAILS_CURR.NM_PREFX, EMPL_DETAILS_CURR.NM_GIVEN,
       MC.DT_APP_ENTRY, MC.CHANNEL_IND
FROM MC
LEFT JOIN EMPL_DETAILS_CURR ON MC.EMP_ID = EMPL_DETAILS_CURR.EMP_ID;
There are only 7000 records in MC.
If you are joining data between MS Access tables and Teradata tables, the join has to be completed by the Microsoft JET engine on your local machine. That means the data in your Teradata tables is being brought down to your local machine so that it can be joined.
If the tables are all on Teradata and accessed via linked tables in MS Access, the join may still be occurring locally. I would suggest running the query as a pass-through query, so that the SQL is sent to Teradata to be executed and only the results are returned to MS Access when the query completes.
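A pass-through query contains plain Teradata SQL that runs entirely on the server, with only the result set coming back to Access. A sketch, assuming (hypothetically) that both tables live in a Teradata database named HR:
-- The HR database qualifier is hypothetical.
SELECT m.DT_APP_ENTRY, m.CHANNEL_IND, e.NM_PREFX, e.NM_GIVEN
FROM HR.MC m
LEFT JOIN HR.EMPL_DETAILS_CURR e ON m.EMP_ID = e.EMP_ID;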

Create query using two databases

I am developing a web application (using classic asp for now) that needs to query two tables on two separate databases on the same server.
I have been given separate application-level login/passwords for each database by the IT team which will be used in my code to connect.
I want to create a query that links two tables (one from each database) together, but I'm not too sure how, and Google hasn't helped me much either.
I'm assuming I have to create two connections to do this, as I have two separate logins, but how then would I create a query that uses tables from each?
Just a note - our IT team will not at this stage link the two databases together. I'm pretty much stuck with what I have.
Hope you can help.
Mat
If the IT team wants the databases to be separate, you obviously can't use the database to do the join for you.
You'd have to execute the join client side. Linq is an easy way to do that, but a loop with two datasets also works.
The typical syntax for a cross-database query would be something like:
SELECT <columns>
FROM [schema].[table] t1
INNER JOIN [databaseName].[schema].[table] t2 ON <condition>
However, if you're using SQL Server 2005 or above, it's well worth creating a synonym in the first database so that the reference to the second database isn't peppered throughout your code (what if the database name changes later?):
CREATE SYNONYM [schema].[synonymName] FOR [databaseName].[schema].[table]
Then your query becomes:
SELECT <columns>
FROM [schema].[table] t1
INNER JOIN [schema].[synonymName] t2 ON <condition>
You may need to set permissions appropriately so that the user you connect to the first database with also has access to the second. If the IT team simply grants the relevant access to both databases from the same server principal, that is the best way to achieve it. If you are stuck with having two totally separate logins then that sucks, but there isn't any need to 'link the databases together' in order to run a cross-database query.
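For what it's worth, a minimal sketch of what granting that access might look like on the second database (the database and login names are hypothetical):
-- Hypothetical names: DatabaseB is the second database, appLoginA is the
-- login already used for the first database.
USE DatabaseB;
CREATE USER [appLoginA] FOR LOGIN [appLoginA];
GRANT SELECT ON [schema].[table] TO [appLoginA];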

SQL Server 2008, Sybase - large select queries over low bandwidth

I need to pull a large amount of data from various tables across a line that has very low bandwidth. I need to minimize the amount of data that gets sent to and fro.
On that side is a Sybase database, on this side SQL Server 2008.
What I need is to pull all the tables from the Sybase database that have to do with this office. Let's say I have the following tables as an example:
Farm
Tree
Branch
etc.
(one farm has many trees, one tree has many branches etc.)
Let's say the "Farm" table has a field called "CountryID", and I only want the data where CountryID = 12. The actual table structures I am looking at are very complex (and I am also not very familiar with them), so I want to keep the queries simple.
So I am thinking of setting up a series of views:
CREATE VIEW vw_Farm AS
SELECT * FROM Farm WHERE CountryID = 12
CREATE VIEW vw_Tree AS
SELECT * FROM Tree WHERE FarmID IN (SELECT FarmID FROM vw_Farm)
CREATE VIEW vw_Branch AS
SELECT * FROM Branch WHERE TreeID IN (SELECT TreeID FROM vw_Tree)
etc.
To then pull the actual data across I would then do:
SELECT * INTO localDb.dbo.Farm FROM vw_Farm
SELECT * INTO localDb.dbo.Tree FROM vw_Tree
SELECT * INTO localDb.dbo.Branch FROM vw_Branch
etc.
Simple enough to set up. I am wondering how this will perform, though. Will it perform all the SELECT statements on the Sybase side and then just send back the result? Also, since this will be an iterative process, is it possible to index the views for subsequent calls?
Any other optimisation suggestions would also be welcome!
Thanks
Karl
EDIT: Just to clarify, the views will be set up in SQL Server, using a linked server to Sybase ASE. What worries me in particular is whether the fact that the views live in SQL Server on this side, rather than in Sybase on that side, means that for each iteration the data from the preceding view will get pulled across to SQL Server before the calculations are executed. I want Sybase to do all the calcs and just pass the results across.
It's difficult to be certain without testing, but my somewhat-relevant experience (using linked servers to platforms other than Sybase, and on SQL Server 2005) has been that using subqueries (such as your code for vw_Tree and vw_Branch) more or less guarantees that SQL Server will pull all the data for the outer table into a local temp table, then match it to the results of the inner query.
The problem is that SQL Server has no access to the linked server's table statistics, so it can make no meaningful decisions about how to optimise the query.
If you want to be sure to have the work done on the Sybase server, your best bet will be to write code (could be views or stored procedures) on the Sybase side and reference them from SQL Server.
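For example, a minimal sketch of that approach, assuming the filtering views have already been created on the Sybase side and the linked server is named SYBASE1 (both assumptions):
-- OPENQUERY sends the text to Sybase verbatim, so the filtering and joining
-- happen remotely and only the final rows cross the link.
SELECT *
INTO localDb.dbo.Farm
FROM OPENQUERY(SYBASE1, 'SELECT * FROM dbo.vw_Farm');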
Linked server connections are, in my experience, not particularly resilient over flaky networks. If it's available, you could consider using Integration Services rather than linked-server queries - but even that may not be much better. You may need to consider falling back on moving text files with robocopy and bcp.
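If you do end up moving files, the load on the SQL Server side could be as simple as this sketch (the file path and format options are assumptions):
-- Load a tab-delimited extract produced on the Sybase side.
BULK INSERT localDb.dbo.Farm
FROM 'C:\transfer\farm.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');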

Oracle and SQL Dataset

Problem: I have to pull data from a SQL Server database and an Oracle database and put it together in one dataset. The difficulty is that the SQL Server query requires an ID that is only found in the results of the Oracle query.
What I am wondering is: How can I approach this problem in a way such that performance is not harmed?
You can do this with linked servers or by transferring the data all to one side. It's all going to depend on the volume of data on each side.
A general rule of thumb is to execute the query on the side which has the most data.
For instance, if the set of Oracle IDs is small, but the SQL Server set is large, you make a linked server to the Oracle side and execute this on the SQL Server side:
SELECT *
FROM sqlservertable
INNER JOIN linkedserver.oracletable
ON whatever
Conversely, if the Oracle side is large (or cannot be prefiltered before the join with the SQL Server side), the performance of the linked-server join will normally be pretty poor. It will improve a lot if you pull the entire Oracle table (or the minimal subset you can determine) into a SQL Server table first and do the JOIN entirely on the SQL Server side.
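A minimal sketch of that staging approach (the linked server name ORA1, the Oracle owner, and the column names are assumptions):
-- Stage the remote rows locally in one pass, then join locally.
SELECT *
INTO #oracle_stage
FROM OPENQUERY(ORA1, 'SELECT ID, COL1 FROM OWNER1.ORACLETABLE');

SELECT s.*, o.COL1
FROM sqlservertable s
INNER JOIN #oracle_stage o ON s.ID = o.ID;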