linked server with cross join - sql

I have two servers. One is mine and the other belongs to another company. On the second server I can't create databases or add any functions or stored procedures, but I need to pull information from it to join against my database.
For example:
select fieldA, fieldB from localTBL l
left join linkedserver.remoteDB.dbo.remoteTBL r on l.ID = r.ID
or
select fieldA, fieldB from linkedserver.remoteDB.dbo.remoteTBL r
where r.ID in (select l.ID from localTBL l)
I tried both, but the performance was horrible.
Is it possible to do this with better performance?

For better performance with linked servers, use OPENQUERY. Otherwise you bring back all the data from the remote server first and apply the WHERE clause afterwards.
In your situation, run the subquery first and store the list of values in a variable, then use that variable in your OPENQUERY.
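A sketch of that, using the names from the question: OPENQUERY only accepts a literal string, so the call has to be assembled with dynamic SQL, and STRING_AGG assumes SQL Server 2017+ (on older versions build the list with FOR XML PATH).
-- Build the list of IDs locally first.
DECLARE @ids nvarchar(max);
SELECT @ids = STRING_AGG(CAST(ID AS nvarchar(20)), ',') FROM localTBL;

-- OPENQUERY requires a string literal, so wrap the call in dynamic SQL.
DECLARE @sql nvarchar(max) = N'
SELECT r.ID, r.fieldB
FROM OPENQUERY(linkedserver,
    ''SELECT ID, fieldB FROM remoteDB.dbo.remoteTBL
      WHERE ID IN (' + @ids + N')'') AS r;';

EXEC sp_executesql @sql;
This only pays off while the list of values stays small; with thousands of IDs you are better off caching the remote table locally.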

A CTE can be used to bring only the information you require across the wire and then perform the join against the calling server. Something like:
DECLARE @Id int;
SET @Id = 45;

WITH cte (ID, fieldB)
AS
(
    SELECT ID, fieldB
    FROM linkedserver.remoteDB.dbo.remoteTBL
    WHERE ID = @Id
)
SELECT lt.fieldA, cte.fieldB
FROM localTBL lt
INNER JOIN cte ON lt.ID = cte.ID
ORDER BY lt.ID;

Yep. Performance will be horrible. It's down to the network between you and the other company, and any authentications and authorisations that have to be done on the way.
This is why Linked Servers aren't used very much, even within a single company: performance is usually bad. (I've never seen a Linked Server to a separate company before and can only sympathise!)
Unless you can upgrade the network link between you, there's not much you can do when querying across a linked server.
This setup sounds like a short-term solution to a problem which needed a fast fix, and which has lasted longer than expected. If you can get a business case for spending money on it, two alternatives are:
Cheapest alternative: cache the data locally. Have a background service running which pulls the latest version of the data out of the Linked Server tables into a set of tables in the local database, then run your queries against the local tables (see the sketch after these two alternatives). This depends on how changeable the remote data is, and how up-to-date your queries have to be. For example, if you're doing things like getting yesterday's sales data, you might be able to do an overnight pull. If you need more up-to-date data, maybe an hourly pull. You can get quite picky sometimes, and if the data structures support it, only pull data which has changed since the last pull: that makes each pull much smaller and allows more frequent ones.
More expensive, involving work by both you and the other company: re-architect it so that the other company pushes changes to you as they happen, via a WCF service (or something similar) that you expose. This can then update your local copy as the data comes in.
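A minimal sketch of the caching pull from the first alternative, assuming a hypothetical local cache table dbo.remoteTBL_cache shaped like the remote table, scheduled via SQL Server Agent or the background service:
-- Refresh the local cache in one transaction so readers never see it half-empty.
BEGIN TRANSACTION;
DELETE FROM dbo.remoteTBL_cache;
INSERT INTO dbo.remoteTBL_cache (ID, fieldB)
SELECT ID, fieldB
FROM linkedserver.remoteDB.dbo.remoteTBL;
COMMIT TRANSACTION;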


Slow INNER JOIN Query in MS-Access 2016, Workaround?

This is my first question here, please be gentle.
At work, I inherited responsibility for a MS Access database, which is crucial for my department.
That database has grown over 20 years, with things added, removed and changed. In short, it's a convoluted mess. The VBA code contains great stuff like this, I kid you not:
Dim p, strText, A, B, C, d, E, F, G, H, i, j, K, L, M, N, O, Z, R, Q, kd, AfGb, T, LN, DC, EntBez, TP, pack, Press, Fehler, ksoll, Y, zeileninhalt, dateipfad, auslesezeile As String
I'm slowly cleaning it all up, but... anyways:
The Problem
It is slow when opening some forms (7-10 seconds loading time). I was able to narrow it down to the recordsource of these forms, which all use basically the same query or a variation of it.
The user enters a job number in the Main form and hits Enter. The underlying query then pulls data from two tables based on the unique key JobNr. The result is a single row containing all the info for this job, which is displayed in an Editor form that uses the query as its record source.
The database is split into frontend and backend; t1 and t2 are backend tables, each with about 20k entries. The backend sits somewhere on the company servers; the frontend is saved locally on each user's computer.
This is the query:
SELECT *
FROM t1
INNER JOIN t2 ON t1.JobNr = t2.JobNr
WHERE t1.JobNr = [Forms]![Main]![JobNr];
t1 has JobNr as its primary key; t2 has an ID as primary key, and its JobNr is not indexed. I want to try indexing it in hope of better performance, but I currently can't make changes to the backend during busy work days...
This simple query is stupidly slow for what it is. The problem seems to be the order of execution. Instead of getting the single entries from t1 and t2 and joining those into a single dataset, Access seems to first join both friggin tables as a whole and only after that look up the single dataset the user is interested in.
I was not able to find a solution to dictate the execution order. I tried different ways, like rewriting the SQL code with nested Selects, something like:
SELECT *
FROM
(SELECT * FROM t1
WHERE t1.JobNr = [Forms]![Main]![JobNr]) AS q1
INNER JOIN
(SELECT * FROM t2
WHERE t2.JobNr = [Forms]![Main]![JobNr]) AS q2 ON q1.JobNr = q2.JobNr;
Still slow...
I wanted to try WITH to partition the SQL code, but that's apparently not supported by MS Access SQL.
I tried splitting the query into two queries q1 and q2 in Access, which pull the data from t1 and t2 respectively, with a third query q3 that joins these supposed subsets... to no avail. q1 and q2 individually run blazingly fast with the expected results, but q3 takes the usual 7-10 seconds.
The current approach I'm working on is running q1 and q2, saving the acquired data to two temp tables tq1 and tq2, and then joining those in a last query. This works very well, as it rapidly loads the data and displays it in the editor (<0.5 seconds, hurray!). The problem I'm facing now is pushing any changes the user makes in the editor form back to the backend tables t1 and t2... Right now, user changes don't take and are lost when closing and reopening the job/editor.
Soooo, what am I missing/doing wrong? Is there any way to make this INNER JOIN query fast without the whole temp table workaround?
If not, how would I go about updating the backend tables from the local temp tables? Changes in the Editor are saved in the temp tables until overwritten by reopening the editor.
I already added intermediate queries that add the respective primary keys to the temp tables (this cannot be done directly in the Create Table queries...) but...
I also tried using an Update query when closing the Editor, which doesn't seem to work either, but I might have to debug that one; I'm not sure it even does anything right now...
Sorry for the long text!
Kind regards and thanks for any help in advance!
The most obvious rework is to move the filter into the join:
SELECT *
FROM t1
INNER JOIN t2 ON (t1.JobNr = t2.JobNr AND t2.JobNr = [Forms]![Main]![JobNr])
My guess is that it's irrelevant whether you filter on t1 or t2, but then I would also have guessed that Access is smart enough to filter while joining, and that appears to be untrue, so check both.
For more detailed performance analysis, a query plan tends to help. See How to get query plans (showplan.out) from Access 2010?
Of course, adjust the 14 (Office 2010's internal version number) to your own version.
You need to add a unique index to t2.JobNr; even better, make it the primary key.
Everything else is just a waste of time at this point.
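In Access DDL that's a one-liner against the backend, sketched with the names from the question; note it will fail if t2 already contains duplicate JobNr values, so clean those up first.
CREATE UNIQUE INDEX idx_JobNr ON t2 (JobNr);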
Set a date and time for the users to quit their frontends, kick them out if necessary: Force all users to disconnect from 2010 Access backend database
In the long run, moving from an Access backend to a server backend (like the free SQL Server Express) will be a good idea.
Edit: have you tried what happens if you don't use a JOIN at all?
SELECT *
FROM t1, t2
WHERE t1.JobNr = [Forms]![Main]![JobNr]
AND t2.JobNr = [Forms]![Main]![JobNr]
Normally you want to avoid this, but it might help in this case.

How to eliminate multiple server calls | MS SQL Server

There is a stored procedure that needs to be modified to eliminate a call to another server.
What is the easiest feasible way to do this so that the final SP's execution time is faster, with a preference for solutions that don't involve much change to the application?
Eg:
select *
from dbo.table1 a
inner join server2.dbo.table2 b on a.id = b.id
Cross server JOINs can be problematic as the optimiser doesn't always pick the most effective solution, which may even result in the entire remote table being dragged over your network to be queried for a single row.
Replication is by far the best option, if you can justify it. This will mean you need to have a primary key on the table you want to replicate, which seems a reasonable constraint (ha!), but might become an issue with a third-party system.
If the remote table is small then it might be better to take a temporary local copy, e.g. SELECT * INTO #temp FROM server2.<database>.dbo.table2;. Then you can change your query to something like this: select * from dbo.table1 a inner join #temp b on a.id = b.id;. The temporary table will be dropped automatically when your session ends, so there's no need to tidy up after yourself.
If the table is larger then you might want to do the above, but also add an index to your temporary table, e.g. CREATE INDEX ix$temp ON #temp (id);. Note that if you use a named index then you will have issues if you run the same procedure twice simultaneously, as the index name won't be unique. This isn't a problem if execution is always in series.
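Pulled together inside the procedure, the temp-table variant might look like this sketch (with <database> still to be filled in):
-- Take a local copy of the remote table once per execution.
SELECT * INTO #temp FROM server2.<database>.dbo.table2;

-- Index it if the table is big enough for that to matter.
CREATE INDEX ix$temp ON #temp (id);

-- Join against the local copy instead of the linked server.
SELECT *
FROM dbo.table1 a
INNER JOIN #temp b ON a.id = b.id;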
If you have a small number of ids that you want to include, then OPENQUERY might be the way to go, e.g. SELECT * FROM OPENQUERY(server2, 'SELECT * FROM table2 WHERE id IN (''1'', ''2'')');. The advantage here is that the query now runs on the remote server, so it's more likely to use an efficient query plan.
The bottom line is that if you expect to be able to JOIN a remote and local table then you will always have some level of uncertainty; even if the query runs well one day, it might suddenly decide to run a LOT slower the following day. Small things, like adding a single row of data to the remote table, can completely change the way the query is executed.

Single SELECT with linked server makes multiple SELECT by ID

This is my issue. I defined a linked server, let's call it LINKSERV, which has a database called LINKDB. In my server (MYSERV) I've got the MYDB database.
I want to perform the query below.
SELECT *
FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE
INNER JOIN MYSERV.MYDB.MYSCHEMA.MYTABLE ON MYKEYFIELD = LINKKEYFIELD
The problem is that if I take a look at the profiler, I see that lots of SELECTs are made against the LINKSERV server. They look similar to:
SELECT *
FROM LINKTABLE WHERE LINKKEYFIELD = #1
Where #1 is a parameter that changes for every SELECT.
This is, of course, unwanted because it does not appear to perform well. I could be wrong, but I suppose the problem is related to the use of different servers in the JOIN. In fact, if I avoid that, the problem disappears.
Am I right? Is there a solution? Thank you in advance.
What you see may well be the optimal solution, as you have no filter statements that could be used to limit the number of rows returned from the remote server.
When you execute a query that draws data from two or more servers, the query optimizer has to decide what to do: pull a lot of data to the requesting server and do the joins there, or somehow send parts of the query to the linked server for evaluation? Depending on the filters and the availability or quality of the statistics on both servers, the optimizer may pick different operations for the join (merge or nested loop).
In your case, it has decided that the local table has fewer rows than the target and requests the target row that corresponds to each of the local rows.
This behavior and ways to improve performance are described in Linked Server behavior when used on JOIN clauses
The obvious optimizations are to update your statistics and to add a WHERE clause that filters the rows returned from the remote table.
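For the statistics half of that, the local side would be something like the sketch below (table name from the question); the remote table's statistics have to be maintained on the linked server itself.
-- Refresh statistics on the local table with a full scan.
UPDATE STATISTICS MYSCHEMA.MYTABLE WITH FULLSCAN;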
Another optimization is to return only the columns you need from the remote server, instead of selecting *.
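A sketch of that, with the names from the question; the derived table asks the remote server for just the key column in one statement (add any other remote columns you actually use). Check the plan afterwards, as the optimizer may still choose per-row lookups.
SELECT m.*, r.LINKKEYFIELD
FROM MYDB.MYSCHEMA.MYTABLE m
INNER JOIN (
    -- Only the columns you actually need from the remote side.
    SELECT LINKKEYFIELD
    FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE
) r ON m.MYKEYFIELD = r.LINKKEYFIELD;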

INNER JOIN on Linked Server Table much slower than Sub-Query

I came across this very odd situation, and I thought I would throw it up to the crowd to find out the WHY.
I have a query that was joining a table on a linked server:
select a.*, b.phone
from table_a a
join remote.table_b b on b.id = a.id
(lots of data in A, but very few rows in B)
this query was taking forever (I never even found out the actual run time), and that is when I noticed B had no index, so I added one, but that didn't fix the issue. Finally, out of desperation, I tried:
select a.*, b.phone
from table_a a
join (select id, phone from remote.table_b) as b on b.id = a.id
This version of the query, in my mind at least, should produce the same results, but lo and behold, it responds immediately!
Any ideas why one would hang and the other process quickly? And yes, I did wait to make sure the index had been built before running both.
It's because sometimes (very often, in fact) the execution plans automatically generated by the SQL Server engine are not as good as we would like. You can look at the execution plan in both situations. I suggest using a join hint in the first query, something like INNER MERGE JOIN.
Here is some more information about that:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
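Applied to the first query, the hint would look like the sketch below. Bear in mind a hint forces the optimizer's hand and can backfire; for linked servers the REMOTE join hint is sometimes the relevant one instead.
-- Force a merge join instead of a nested-loop plan that round-trips per row.
SELECT a.*, b.phone
FROM table_a a
INNER MERGE JOIN remote.table_b b ON b.id = a.id;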
For linked servers, the 2nd variant prefetches all the data locally and does the join there, whereas the 1st variant may do a nested-loop join with a round trip to the linked server for every row in A.
Remote table, as in not on that server? Is it possible that the join is actually making multiple calls out to the remote table, while the subquery makes a single request for a copy of the table data, thus resulting in less time waiting on the network?
I'm just going to have a guess here. When you access remote.b, is it a table on another server?
If it is, the reason the second query is faster is that you do one query to the other server and get all the fields you need from b before processing the data. In the first query, you are processing data and at the same time making several requests to the other server.
Hope this helps.

TSQL Join efficiency

I'm developing an ASP.NET/C#/SQL application. I've created a query for a specific grid view that involves a lot of joins to get the data needed. On the hosted server, the query has randomly started taking up to 20 seconds to run. I'm sure it's partly an overloaded host server (because sometimes the query takes <1s), but I don't think the query (which is actually a view referenced via a stored procedure) is at all optimal regardless.
I'm unsure how to improve the efficiency of the below query:
(There are about 1500 matching records to those joins, currently)
SELECT dbo.ca_Connections.ID,
dbo.ca_Connections.Date,
dbo.ca_Connections.ElectricityID,
dbo.ca_Connections.NaturalGasID,
dbo.ca_Connections.LPGID,
dbo.ca_Connections.EndUserID,
dbo.ca_Addrs.LotNumber,
dbo.ca_Addrs.UnitNumber,
dbo.ca_Addrs.StreetNumber,
dbo.ca_Addrs.Street1,
dbo.ca_Addrs.Street2,
dbo.ca_Addrs.Suburb,
dbo.ca_Addrs.Postcode,
dbo.ca_Addrs.LevelNumber,
dbo.ca_CompanyConnectors.ConnectorID,
dbo.ca_CompanyConnectors.CompanyID,
dbo.ca_Connections.HandOverDate,
dbo.ca_Companies.Name,
dbo.ca_States.State,
CONVERT(nchar, dbo.ca_Connections.Date, 103) AS DateView,
CONVERT(nchar, dbo.ca_Connections.HandOverDate, 103) AS HandOverDateView
FROM dbo.ca_CompanyConnections
INNER JOIN dbo.ca_CompanyConnectors ON dbo.ca_CompanyConnections.CompanyID = dbo.ca_CompanyConnectors.CompanyID
INNER JOIN dbo.ca_Connections ON dbo.ca_CompanyConnections.ConnectionID = dbo.ca_Connections.ID
INNER JOIN dbo.ca_Addrs ON dbo.ca_Connections.AddressID = dbo.ca_Addrs.ID
INNER JOIN dbo.ca_Companies ON dbo.ca_CompanyConnectors.CompanyID = dbo.ca_Companies.ID
INNER JOIN dbo.ca_States ON dbo.ca_Addrs.StateID = dbo.ca_States.ID
It may have nothing to do with your query and everything to do with the data transfer.
How fast does the query run in query analyzer?
How does this compare to the web page?
If you are bringing back the entire data set you may want to introduce paging, say 100 records per page.
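If paging is the fix, OFFSET/FETCH (SQL Server 2012+) is the simplest way to express it; a sketch against the main table from the query:
-- Page 1 of 100 rows; for page n, offset (n - 1) * 100 rows.
SELECT c.ID, c.Date, c.EndUserID
FROM dbo.ca_Connections c
ORDER BY c.ID
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY;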
The first thing I normally suggest is to profile and look for potential indexes to help out. But when the problem is sporadic like this and the normal case is for the query to run in <1 sec, it's more likely due to lock contention than a missing index. That means the cause is something else in the system making this query take longer. Perhaps an insert or update. Perhaps another select query, one that you would normally expect to take a little longer, so the extra time on its end isn't noticed.
I would normally start with indexing, but my database belongs to a third-party application, so creating my own indexes is not an option. I read an article (sorry, can't find the reference) recommending breaking the query up into table variables or temp tables (depending on the number of records) when you have multiple tables in your query (not sure what the magic number is).
Start with dbo.ca_CompanyConnections, dbo.ca_CompanyConnectors and dbo.ca_Connections. Include the fields you need, and then substitute the temp table for those three joined tables; see the sketch below.
Not sure what the issue is (I would like to hear recommendations), but it seems like performance drops once you get over 5 tables.
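A sketch of that split, using the table names from the query above (column lists abbreviated; include the fields you actually need):
-- Stage the first three joined tables into a temp table.
SELECT co.ConnectorID,
       co.CompanyID,
       c.ID,
       c.Date,
       c.HandOverDate,
       c.AddressID
INTO #conn
FROM dbo.ca_CompanyConnections cc
INNER JOIN dbo.ca_CompanyConnectors co ON cc.CompanyID = co.CompanyID
INNER JOIN dbo.ca_Connections c ON cc.ConnectionID = c.ID;

-- Then substitute the temp table for those three tables in the original join.
SELECT t.*, a.Street1, a.Suburb, comp.Name, s.State
FROM #conn t
INNER JOIN dbo.ca_Addrs a ON t.AddressID = a.ID
INNER JOIN dbo.ca_Companies comp ON t.CompanyID = comp.ID
INNER JOIN dbo.ca_States s ON a.StateID = s.ID;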