How to query a large table using update command with multiple joins? - sql

I have a big table on an external server, and I am trying to do an update on another table on a local machine. The external server table has more than a couple of billion rows. When I run the following update as a stored procedure, it takes a very long time to give results. Is there a way to tweak this so that it can be broken into smaller datasets to query?
update C set C.[P_Flag]='Y'
from [LocalMachine].dbo.O_C_Data C
inner join [ExternalServer].dbo.E_Extract E
on C.[O_ID]=E.[B_ID] and C.[P_Option]=E.[P_OPTID] and C.[P]=E.[PD_G]
where C.[O_Flag]='Y'

EDIT: Just as @usr pointed out, what I said below is not correct. You should find out why it is slow by checking the execution plan, which I had not mastered when I faced the same problem. Your query might trigger too many executions on the remote server, that is, it may query the remote server separately for every single row in the local table. However, I suggest trying the first approach below. It solved a similar problem for me before, and I suspect that you have the same root cause.
When you join a local table to a table on a remote server, the SQL Server engine will pull all the data from the remote table and do the JOIN in local memory. That's why it's extremely slow.
I suggest creating a local temp table, filling it with data pulled from the remote server, and then updating the local table from the temp table. To fill the temp table you still need to relate the local table to the remote table. There are 2 approaches:
1. Embed the local table's values in the query. In your case, you can write
INSERT INTO #TempTable
SELECT E.B_ID
FROM [ExternalServer].dbo.E_Extract E
WHERE E.[B_ID] IN (C_ID1, C_ID2, ...) AND E.[P_OPTID] IN (P_Option1, P_Option2) AND E.[PD_G] IN (P1, P2, ...)
Note that the values after the IN operator are constants.
2. Use the REMOTE join hint. Take a look at MSDN:
REMOTE
Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.
If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.
REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.
REMOTE can be used only for INNER JOIN operations.
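As a rough illustration of how the temp-table idea and the REMOTE hint might be combined for the update in the question, here is a minimal T-SQL sketch. The temp table #E_Keys, the variable names, and the batch size are assumptions made for this example and do not come from the original post. The object names are copied as written in the question; a real linked-server reference would normally need a four-part name (server.database.schema.table). Whether the hint actually helps depends on the execution plan, so it is worth checking the plan before relying on it.

```sql
-- Sketch only. #E_Keys, @BatchSize and @Rows are hypothetical names.
-- Step 1: stage only the matching key combinations from the remote table.
-- The REMOTE hint asks SQL Server to perform the join at the remote site.
SELECT DISTINCT E.[B_ID], E.[P_OPTID], E.[PD_G]
INTO #E_Keys
FROM [LocalMachine].dbo.O_C_Data C
INNER REMOTE JOIN [ExternalServer].dbo.E_Extract E
    ON C.[O_ID] = E.[B_ID]
   AND C.[P_Option] = E.[P_OPTID]
   AND C.[P] = E.[PD_G]
WHERE C.[O_Flag] = 'Y';

-- Step 2: update the local table from the staged keys in small batches.
DECLARE @BatchSize int = 50000;  -- assumed value, tune for your system
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) C
    SET C.[P_Flag] = 'Y'
    FROM [LocalMachine].dbo.O_C_Data C
    INNER JOIN #E_Keys K
        ON C.[O_ID] = K.[B_ID]
       AND C.[P_Option] = K.[P_OPTID]
       AND C.[P] = K.[PD_G]
    WHERE C.[O_Flag] = 'Y'
      -- skip rows already flagged so each pass picks up new rows
      AND (C.[P_Flag] IS NULL OR C.[P_Flag] <> 'Y');

    SET @Rows = @@ROWCOUNT;  -- loop ends when a pass updates nothing
END;

DROP TABLE #E_Keys;
```

The remote table is read once in step 1, and each pass of the loop in step 2 touches only local data, which keeps every individual update transaction small.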

Related

Slow Access query when joining SQL table with Access table

I am using a SQL database and MS Access 2019 as the front end. The SQL database tables are linked to the Access db using an ODBC connection.
All my queries (they have multiple joined linked tables) run just fine, but as soon as I add a join to a table stored in the Access app (for example, a small table just for mapping values) the query will slow to a crawl. Doesn't matter if the joined fields are indexed or what type of join I'm using.
If anyone has seen this behaviour and found a solution I would much appreciate hearing it.
Joining tables from two separate databases requires the client app to retrieve both tables in their entirety in order to determine the rows needed. That's why it's slow.
If your Access table is small, try using a stored procedure on the SQL side with the data from Access moved to a temporary table. (Or better yet, move the Access table to SQL).

join the local DB inside a pass-through query(openquery) on a linked server

I am querying a linked Oracle server from a local SQL Server instance using openquery. The issue is that I want to limit the results returned based on a table in the local DB (about 10k rows) while executing in the linked instance. Normally I would do it the other way around and do something like:
SELECT *
FROM OPENQUERY(DBlinked, 'SELECT…') link_serv
INNER JOIN localdb.table1 local_1
ON local_1.column_1 = link_serv.column_1
but because of the size of the linked tables being hit during the openquery (multiple tables with hundreds of millions of rows), executing against the entire dataset and then subsetting locally with a join isn't a good idea.
I’ve tried to reference back to the local server inside openquery but since the table is located on the local file it won't find the table, and I do not have write permission on the linked server so moving the local table over to the linked server would be problematic.
Any guidance on how to limit the results prior to all the joins would be appreciated. Using openquery is not a requirement; it's just the only elegant solution I've found for dealing with resource-intensive queries against large linked tables.
Essentially what I’m trying to do is
SELECT * FROM OPENQUERY(oracleDB, '
SELECT *
FROM dbo.table1 Ot
INNER JOIN Local_SQLServ.DB1.DBO.Table1 St
ON St.Column1 = Ot.column1')

I'd like to merge data sets using an SQL query from different servers (one Sybase the other MS)

Is that possible? I'm using Aquadesk and I can't get it to work. The tables have a matching unique identifier and I'm wondering if I can match them up in some way.
What you need, I think, are "Federated Servers" (databases); you can look this up.
The basic idea is that you can create (catalog) a table in your local database that actually resides in another database (or on another server, or even in another DB system, but that depends on your SQL system and version). That is definitely a question for your DBAs.
You get a table like 'MYSQL'.'PERSONS' that resides remotely (e.g. 'BASE','PERSDATA'), so you can use it in a
SELECT *
FROM LOCALNAME.USERS usr
JOIN MYSQL.PERSONS pers
ON usr.user_id = pers.id
So you can select and join across different databases (and servers).
I have only used that with IBM/UDB, but it works really well and has fair performance (although that depends heavily on your statement).

Access slow when joining on Teradata SQL connections?

I have a simple Access database I use to create a number of reports. I've linked to a Teradata database server in our organization to pull some additional employee-level details. There is a simple left join on employee number, and all I pull is the name and the role.
The query without the Teradata connection takes maybe a minute or so to run and is very quick once loaded. Left joining on the Teradata connection slows everything down to a crawl. It can take 10 minutes or so to run the query through Excel. When the query is loaded in Access, scrolling through it is very slow.
I should note there are no performance issues with the Teradata server. I pull unrelated reports from the same and different tables, with complex joins, and the speed is very quick.
I tried creating an even simpler query that does almost nothing, and the performance issues are still there. Here is the code:
SELECT EMPL_DETAILS_CURR.NM_PREFX, EMPL_DETAILS_CURR.NM_GIVEN,
MC.DT_APP_ENTRY, MC.CHANNEL_IND
FROM MC LEFT JOIN EMPL_DETAILS_CURR ON MC.EMP_ID = EMPL_DETAILS_CURR.EMP_ID;
There are only 7000 records in MC.
If you are joining data between MS Access tables and Teradata tables, the join has to be completed by the Microsoft JET engine on your local machine. That means the data in your Teradata tables is brought down to your local machine so that it can be joined.
If the tables are all on Teradata and accessed via linked tables in MS Access, the join may still be occurring locally. I would suggest running the query as an ODBC Direct query (I forget the exact term; Access calls this a pass-through query) so that the SQL is passed on to Teradata to be executed and the results are returned to MS Access when the query completes.

Join in linked server or join in host server?

Here's the situation: we have an Oracle database we need to connect to to pull some data. Since getting access to said Oracle database is a real pain (mainly a bureaucratic obstacle more than anything else), we're just planning on linking it to our SQL Server and using the link to access data as we need it.
For one of our applications, we're planning on making a view to get the data we need. Now the data we need is joined from two tables. If we do this, which would be preferable?
This (in pseudo-SQL if such a thing exists):
SELECT [cols] FROM OPENQUERY(Oracle, 'SELECT [cols] FROM table1 INNER JOIN table2 ON ...')
or this:
SELECT [cols] FROM OPENQUERY(Oracle, 'SELECT [cols1] FROM table1') t1
INNER JOIN OPENQUERY(Oracle, 'SELECT [cols2] FROM table2') t2 ON ...
Is there any reason to prefer one over the other? One thing to keep in mind: there is a limit on how long a query against the Oracle server is allowed to run.
I'd go with your first option especially if your query contains a where clause to select a sub-set of the data in the tables.
It will require less work on both servers, assuming there are indices on the tables in the Oracle server that support the join operation.
If the inner join significantly reduces the total number of rows, then option 1 will result in much less network traffic (since you won't have all the rows from table1 having to go across the db link).
What hamishmcn said applies.
Also, SQL Server doesn't really know anything about the indexes, statistics, or cache kept by the Oracle server. Therefore, the Oracle server can probably do a much more efficient job with the join than SQL Server can.
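To make the first option concrete, here is a minimal sketch of a view that pushes the join down to Oracle. It assumes a linked server named Oracle as in the question; the view name and every column name (ID, NAME, TABLE1_ID, AMOUNT) as well as the WHERE filter are hypothetical, since the question only gives [cols] placeholders.

```sql
-- Sketch only: vw_OracleData and all column names are hypothetical.
-- The quoted text is sent to Oracle as-is, so the join and the filter
-- run on the Oracle side and only the joined rows cross the link.
-- Oracle reports unquoted identifiers in uppercase, hence the uppercase
-- column names in the outer SELECT.
CREATE VIEW dbo.vw_OracleData
AS
SELECT q.ID, q.NAME, q.AMOUNT
FROM OPENQUERY(Oracle, '
    SELECT t1.ID, t1.NAME, t2.AMOUNT
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.TABLE1_ID = t1.ID
    WHERE t2.AMOUNT > 0
') AS q;
```

With the two-OPENQUERY form, each inner query would return its whole table to SQL Server before the join, which is the extra network traffic the answers above warn about.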