I have a view (View A) that pulls in columns from a number of tables. It also pulls in a column from another view (View B) which gets its data from a Linked Server table.
Now, View B runs fine, pulling back 11,000 rows in about a second. View A also runs fine on its own. However, if I INNER JOIN from View A to View B on a column that comes from the Linked Server, the entire query runs so slowly that it times out.
If I INNER JOIN from View A to View B on a column that does NOT come from the Linked Server, it runs fine.
So I traced the issue to joining on a column which resides on the Linked Server. I just have no idea how to fix it.
Can anyone give me any pointers?
The circumstances were slightly different, but both my co-workers and I have seen evidence that if you have something like this:
select something
from LinkedServer.DataBase.Owner.Table
where whatever
then SQL Server will pull the entire table back from the other server first and apply the WHERE clause afterwards. That might be what is happening to you.
We solve the problem by using openquery instead of the fully qualified method shown above, and by putting the openquery results into a temp table. Then we join to the temp table.
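A minimal sketch of that pattern (the server, database, and column names here are placeholders, not from the question):

```sql
-- Pull only the rows and columns you need across the wire. The WHERE
-- clause runs on the remote server because it is inside the OPENQUERY text.
SELECT *
INTO #RemoteRows
FROM OPENQUERY(LinkedServer,
    'SELECT KeyCol, SomeCol FROM DataBase.Owner.Table WHERE SomeCol IS NOT NULL');

-- Join locally against the temp table instead of the linked server.
SELECT a.*, r.SomeCol
FROM dbo.LocalTable AS a
JOIN #RemoteRows AS r ON r.KeyCol = a.KeyCol;
```

Because the pass-through query is a literal string, the remote server can optimize it on its own, and the local join only ever sees the already-filtered temp table.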
Okay guys, I'm literally screwed. My professor is on indefinite leave and I have an assignment due next Friday which I'm completely lost on. I've entered all my data into my tables and I'm creating the views. Our assignment is to create 5 reports for a business in SQL and transfer them to Excel to create a frontend.
Basically, can someone describe to me how I would utilise views and joins to create a report for this?
A join matches up rows in two tables on a column they share, combining the data from both tables into what is essentially one big table. You can create a view from code like the example below. A view gives you one thing to call, and it contains all of the join logic used to create it, so you don't have to re-code and re-validate every time you want to use those joins. This isn't the place where we can just give you what you'd learn in your course, but I hope that helps.
Example:
select *
from tableSales a
join tableStaff b on a.Staff_ID = b.Staff_ID
join tableNext c on b.Column = c.Column -- you can also join back to table a
This will give you the data from all the tables in one place, matched on the staff ID. You can then join a column from tableStaff to another table, and so on.
Run this one statement and see how it puts all the columns into one result. If you put this code into a view, you can then query the view directly. Furthermore, Excel has built-in functionality to read the views you have created, and lets you refresh the reports by connecting to the database and then to the view.
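For example, wrapping the join above in a view could look like this (the view name and column list are made up for illustration; use your own schema):

```sql
-- One named object that contains the join logic.
CREATE VIEW vSalesReport AS
SELECT a.Sale_ID, a.Amount, b.Staff_Name
FROM tableSales a
JOIN tableStaff b ON a.Staff_ID = b.Staff_ID;
```

Afterwards, `SELECT * FROM vSalesReport` is all Excel (or anything else) needs to call to get the joined data.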
Good luck!
Watch out for duplicates!
I have a question about how to properly use SQL JOIN queries.
Imagine that I have 3 tables. I want to make a RIGHT JOIN between two of them. That is, I want to show all the records from the left table and only those records from the right table where the join columns match. Once I have this, I want to make another JOIN (inner or whatever) between the table that was on the right (now the LEFT table) and the third table (the RIGHT table), so that I have 3 tables connected. My problem is that I get this error message from Access:
The SQL statement could not be executed because it contains ambiguous
outer joins. To force one of the joins to be performed first, create a
separate query that performs the first join and then include that
query in your SQL statement.
So, Access is forcing me to use two separate queries, but I don't want to use two. I think this must be possible in just one. Am I right? Do you know a method for this?
Thank you all.
Can you try this? Put the INNER JOIN first.
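Something along these lines, with placeholder table and column names; isolating the INNER JOIN in its own subquery removes the ambiguity:

```sql
-- All rows from Table1; matching rows from the (Table2 ⋈ Table3) pair.
SELECT t1.*, q.*
FROM Table1 AS t1
LEFT JOIN (
    SELECT t2.ID, t2.T1_ID, t3.Name
    FROM Table2 AS t2
    INNER JOIN Table3 AS t3 ON t2.T3_ID = t3.ID
) AS q ON q.T1_ID = t1.ID;
```

If your Access version rejects the inline subquery, save the inner join as a separate saved query and join to that instead, which is exactly what the error message suggests.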
This is my issue. I defined a linked server, let's call it LINKSERV, which has a database called LINKDB. In my server (MYSERV) I've got the MYDB database.
I want to perform the query below.
SELECT *
FROM LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE
INNER JOIN MYSERV.MYDB.MYSCHEMA.MYTABLE ON MYKEYFIELD = LINKKEYFIELD
The problem is that if I look at the profiler, I see that lots of SELECTs are issued against the LINKSERV server. They look similar to:
SELECT *
FROM LINKTABLE WHERE LINKKEYFIELD = #1
Where #1 is a parameter that is changed for every SELECT.
This is, of course, unwanted because it performs poorly. I could be wrong, but I suppose the problem is related to the use of different servers in the JOIN. In fact, if I avoid that, the problem disappears.
Am I right? Is there a solution? Thank you in advance.
What you see may well be the optimal solution, as you have no filter statements that could be used to limit the number of rows returned from the remote server.
When you execute a query that draws data from two or more servers, the query optimizer has to decide what to do: pull a lot of data to the requesting server and do the joins there, or somehow send parts of the query to the linked server for evaluation? Depending on the filters and the availability or quality of the statistics on both servers, the optimizer may pick different operations for the join (merge or nested loop).
In your case, it has decided that the local table has fewer rows than the target, so it requests the target row that corresponds to each of the local rows.
This behavior, and ways to improve performance, are described in Linked Server behavior when used on JOIN clauses.
The obvious optimizations are to update your statistics and add a WHERE clause that filters the rows returned from the remote table.
Another optimization is to return only the columns you need from the remote server, instead of SELECT *.
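If the optimizer still picks the row-by-row plan, you can also try the REMOTE join hint, which asks SQL Server to perform the join on the linked server's side (valid only for INNER JOIN, with the remote table on the right; names taken from the question):

```sql
-- REMOTE: evaluate the join at the site of the right-hand (linked) table
-- instead of round-tripping once per local row.
SELECT m.MYKEYFIELD, l.LINKKEYFIELD
FROM MYDB.MYSCHEMA.MYTABLE AS m
INNER REMOTE JOIN LINKSERV.LINKDB.LINKSCHEMA.LINKTABLE AS l
    ON l.LINKKEYFIELD = m.MYKEYFIELD;
```

As with any hint, compare the actual plans and timings with and without it before committing to it.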
I came across this very odd situation, and i thought i would throw it up to the crowd to find out the WHY.
I have a query that was joining a table on a linked server:
select a.*, b.phone
from table_a a
join remote.table_b b on b.id = a.id
-- lots of data in A, but very few rows in B
This query was taking forever (I never even found out the actual run time), and that is when I noticed B had no index, so I added one, but that didn't fix the issue. Finally, out of desperation, I tried:
select a.*, b.phone
from table_a a
join (select id, phone from remote.B) as b on b.id = a.id
This version of the query should, in my mind at least, produce the same results, but lo and behold, it responds immediately!
Any ideas why one would hang and the other process quickly? And yes, I did wait to make sure the index had been built before running both.
It's because sometimes (very often, in fact) the execution plans automatically generated by the SQL Server engine are not as good as we would like. Look at the execution plan in both situations. I suggest using a join hint in the first query, something like INNER MERGE JOIN.
Here is some more information about that:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
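A sketch of the hinted version, reusing the tables from the question:

```sql
-- MERGE forces a merge join; the optimizer may add sorts to satisfy it,
-- so verify against your data that the plan actually improves.
SELECT a.*, b.phone
FROM table_a AS a
INNER MERGE JOIN remote.table_b AS b ON b.id = a.id;
```

Note that forcing a join strategy also fixes the join order, so only keep the hint if the measured plan is genuinely better.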
For linked servers, the 2nd variant prefetches all the remote data locally and does the join there, whereas the 1st variant may perform a nested-loop join that makes a round trip to the linked server for every row in A.
Remote table, as in not on that server? Is it possible that the join is making multiple calls out to the remote table, while the subquery makes a single request for a copy of the table data, resulting in less time waiting on the network?
I'm just going to have a guess here. When you access remote.b is it a table on another server?
If it is, the reason the second query is faster is that you issue one query to the other server and get all the fields you need from b before processing the data. In the first query, you are processing data and at the same time making several requests to the other server.
Hope this helps.
I'm getting some strange behaviour from an Oracle instance I'm working on. This is 11gR1 on Itanium, no RAC, nothing fancy. Ultimately I'm moving data from one Oracle instance to another in a data warehouse scenario.
I have a semi-complex view running over a DB link; 4 inner joins over large-ish tables and 5 left joins over mid-size tables.
Here's the problem: when I test the view in SQL Developer (or SQL*Plus) it seems fine, no duplication whatsoever. However, when I actually use the view to insert data into a table I get a large number of dupes.
EDIT: - The data is going into an empty table. All of the tables in the query are on the database link. The only thing passed into the query is a date (e.g. INSERT INTO target SELECT * FROM view WHERE view.datecol = dQueryDate) -
I've tried adding a ROW_NUMBER() function to the select statement, partitioned by the PK for the view. All rows come back numbered as 1. Again though, the same statement run as an insert generates the same dupes as before and now conveniently numbered. The number of duped rows is not the same per key. Some records exist 4 times some only exist once.
I find this behaviour extremely perplexing. :) It reminds me of working with Teradata, where you have SET tables (unique rows only) and MULTISET tables (duplicate rows allowed), but Oracle has no such functionality.
A select that returns rows to the client should behave identically to one that inserts those rows to another location. I can't imagine a legitimate reason for this to happen, but maybe I'm suffering from a failure of imagination. ;)
I wonder if anyone else has experienced this or if it's a bug on this platform.
SOLUTION
Thanks to @Gary, I was able to get to the bottom of this by using "EXPLAIN PLAN FOR {my query};" and "SELECT * FROM TABLE(dbms_xplan.display);". The plan that actually gets used for the INSERT is very different from the one for the SELECT.
For the SELECT most of the plan operations are 'TABLE ACCESS BY INDEX ROWID' and 'INDEX UNIQUE SCAN'. The 'Predicate Information' block contains all of the joins and filters from the query. At the end it says "Note - fully remote statement".
For the INSERT there is no reference to the indexes. The 'Predicate Information' block is just three lines and a new 'Remote SQL' block shows 9 small SQL statements.
The database has split my query into 9 subqueries and then attempts to join them locally. By running the smaller selects I've located the source of the duplicates.
I believe this is a bug in the Oracle compiler around remote links. It creates logical flaws when re-writing the SQL; basically, the compiler is not properly applying the WHERE clause. I was just testing it and gave it an IN list of 5 keys to bring back. The SELECT brings back 5 rows; the INSERT puts 77,000+ rows into the target and totally ignores the IN list.
{Still looking for a way to force the correct behaviour, I may have to ask for the view to be created on the remote database although that is not ideal from a development viewpoint. I'll edit this when I've got it working…}
It seems to be an Oracle bug; we have found the following workaround:
If you want your "insert into select ..." to behave like your "select ...", you can wrap your select in a sub-select.
For example :
select x,y,z from table1, table2 where ...
--> no duplicates
insert into example_table
select x,y,z from table1, table2 where ...
--> duplicates
insert into example_table
select * from (
  select x,y,z from table1, table2 where ...
)
--> no duplicates
Regards
One thing that comes to mind is that generally an optimizer plan for a SELECT will prefer a FIRST_ROWS plan to give rows back to the caller early, but an INSERT...SELECT will prefer an ALL_ROWS plan as it is going to have to deliver the full dataset.
I'd check the query plans using DBMS_XPLAN.DISPLAY_CURSOR (using the sql_id from V$SQL).
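For example (the sql_id below is a placeholder; look up the real one first):

```sql
-- Find the sql_id of the statement you just ran:
SELECT sql_id, sql_text
FROM v$sql
WHERE sql_text LIKE 'INSERT INTO target%';

-- Then display the plan that was actually executed for that cursor:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('an_sql_id', NULL, 'TYPICAL'));
```

Comparing this output for the bare SELECT versus the INSERT...SELECT should show whether the optimizer switched plans between the two.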
I have a semi-complex view running
over a DB link; 4 inner joins over
large-ish tables and 5 left joins over
mid-size tables.
...
All of the tables in the query are on
the database link
Again, a potential trouble-spot. If all the tables in the SELECT were on the other end of the DB link, the whole query would be sent to the remote database and the resultset returned. Once you throw the INSERT in, it is more likely that the local database will take charge of the query and pull all the data from the child tables over. But that may depend on whether the view is defined in the local database or the remote database. In the latter case, as far as the local optimizer is concerned there is just one remote object and it gets data from that, and the remote database will do the join.
What happens if you just go to the remote DB and do the INSERT on a table there ?
This is a bug in Oracle's handling of joins over DB links. I have a simpler situation which does not involve an INSERT versus SELECT. If I run my query remotely, I get duplicate rows, but if I run it locally, I do not. The only difference between the queries is the "@..." appended to the tables in the remote query. I am querying a 9i database from a 10.2 database using Oracle SQL Developer 3.0.
This is even more stupid than the bug in Oracle that prevents you from joining tables with more than 1000 total columns, which is VERY easy to hit when querying an ERP system. And no, the error message says nothing about tables having too many columns.
It's almost as stupid as that other Oracle database bug that prohibits querying tables containing LOB locators using ANSI syntax. Only Oracle syntax works!
Several options occur to me.
The dupes you see were already in the destination table?
If your SELECT references the table you are inserting into, then the INSERT is interacting with the SELECT in your combined
Insert ... Select ... From ...
in such a way (a cartesian product?) as to create the duplicates.
I can't help but think that maybe you are experiencing a side-effect from something else related to the table. Are there any triggers which may be manipulating data?
How did you determine that there are no dupes in the original table?
As others have noted, this seems to be the simplest explanation for this strange behaviour.
Check your JOINs carefully. Potentially you have no duplicates in the individual tables, but underspecified joins can cause inadvertent CROSS JOINs, so that your result set has duplicates due to multiplicity and, when inserted, violates a uniqueness constraint in your destination table.
What I do in this case is to nest the query in a view or CTE and try to detect the duplicates straight from the SELECT:
WITH resultset AS (
-- blah, blah
)
SELECT a, b, c, COUNT(*)
FROM resultset
GROUP BY a, b, c
HAVING COUNT(*) > 1
I would suggest getting a plan on the query you are running and looking for a CARTESIAN JOIN in there. This could indicate a missing condition that is causing duplicated rows.
As @Pop has already suggested, this behaviour could happen if you are using a different login in SQL*Plus from the login your insert runs under (that is, if the other login has a table/view/synonym with the same name).