Using dot notation vs OpenQuery from SQL Server to Oracle

Trying to bring data from Oracle into SQL Server. SQL Server has a linked server defined. I need to filter the data on the Oracle side, so there is a WHERE clause that limits the data based on the value of one column (a time period).
Tried performance with two different methods:
OpenQuery:
select * INTO T2 from OpenQuery(LinkedSrv,'select * from SCHEMA.TAB')
dot notation (LinkedServer..Schema.Table):
select * INTO T2 from LinkedSrv..SCHEMA.TAB
Both perform kind of slow, pushing about 5-6k rows/second. For a 20M row table, this is not ideal. And then I discovered something rather interesting:
select * INTO T2 from LinkedSrv..SCHEMA.TAB WHERE col >= Value
This pushes the throughput up to almost 100k rows/second
Specifying criteria with OpenQuery does not affect the throughput. The explain plan shows
RemoteQuery -> ComputeScalar -> Filter (WHERE) -> TableInsert in the dot-notation scenario with WHERE.
Other than that, the explain plans are the same. So... how does adding a WHERE clause locally (because that is where it is applied) improve throughput by a factor of 10???
... And what can I do to achieve the same fast throughput when using OpenQuery?
Thank you!

The difference between the dot notation and the OpenQuery methods is that the first uses the client cursor engine, with most things evaluated locally, while the second sends the query to the remote server and reads back the output.
Filtering data in the dot-notation query is not always faster than the OpenQuery approach; it depends on the resources of both the local and the remote server.
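To get the same fast path with OpenQuery, push the filter into the pass-through text itself so that Oracle applies it before any rows cross the wire. A minimal sketch, reusing the names from the question (the literal 20200101 is just a placeholder for Value; OPENQUERY accepts only a constant string, so a variable filter value means building the command with dynamic SQL):
select * INTO T2 from OpenQuery(LinkedSrv, 'select * from SCHEMA.TAB where col >= 20200101')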
Check out the following Stack Overflow question; it will give you more information:
SQL 2005 - Linked Server to Oracle Queries Extremely Slow
Additional Information
Best Performer: Distributed query (Four-part) or OPENQUERY when executing linked server queries in SQL Server

Related

SSIS performance vs OpenQuery with Linked Server from SQL Server to Oracle

We have a linked server (OraOLEDB.Oracle) defined in the SQL Server environment. Oracle 12c, SQL Server 2016. There is also an Oracle client (64 bit) installed on SQL Server.
When retrieving data from Oracle (a simple query, getting all columns from a 3M row, fairly narrow table, with varchars, dates and integers), we are seeing the following performance numbers:
sqlplus (select from Oracle, spooled to an OS file on the SQL Server itself): less than 2k rows/sec
SSMS (insert into a SQL Server table, select from Oracle using OpenQuery, i.e. passthrough to Oracle with remote execution): less than 2k rows/sec
SQL Export/Import tool (in essence, SSIS; OLEDB Oracle source, OLEDB SQL Server target): over 30k rows/sec
Looking for ways to improve throughput using OpenQuery/OpenResultSet, to match the SSIS throughput. There is probably some buffer/flag somewhere that would allow us to achieve the same?
Please advise...
Thank you!
--Alex
There is probably some buffer/flag somewhere that would allow us to achieve the same?
You are probably looking for the FetchSize parameter:
FetchSize - specifies the number of rows the provider will fetch at a time (fetch array). It must be set on the basis of data size and the response time of the network. If the value is set too high, this could result in more wait time during the execution of the query. If the value is set too low, this could result in many more round trips to the database. Valid values are 1 to 4,294,967,296. The default is 100.
For example (using named parameters so that FetchSize lands in the provider string, where OraOLEDB reads it):
exec sp_addlinkedserver @server = N'MyOracle', @srvproduct = N'Oracle', @provider = N'ORAOLEDB.Oracle', @datasrc = N'//172.16.8.119/xe', @provstr = N'FetchSize=2000'
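A linked server login is usually added alongside it; a sketch with placeholder credentials:
exec sp_addlinkedsrvlogin @rmtsrvname = N'MyOracle', @useself = N'FALSE', @rmtuser = N'scott', @rmtpassword = N'tiger'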
See, e.g., https://blogs.msdn.microsoft.com/dbrowne/2013/10/02/creating-a-linked-server-for-oracle-in-64bit-sql-server/
I think there are many ways to enhance the performance of the INSERT query; I suggest reading the following article for more information about data loading performance:
The Data Loading Performance Guide
One method you can try is minimizing logging by using a clustered index; check the link below for more information:
New update on minimal logging for SQL Server 2008
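As a rough sketch of the minimal-logging idea (hedged: minimal logging requires the SIMPLE or BULK_LOGGED recovery model, and on SQL Server 2008 a load into a clustered index may additionally need trace flag 610, per the article above):
-- TABLOCK requests a bulk-load lock; into an empty heap (or, conditions
-- permitting, an empty clustered index) the insert can be minimally logged.
INSERT INTO dbo.T2 WITH (TABLOCK)
SELECT * FROM OPENQUERY(LinkedSrv, 'select * from SCHEMA.TAB')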

Openquery vs 4part name

A view that references a remote server can use either:
a 4-part name ([ServerName].[DatabaseName].[Owner].[ObjectName]), or
OpenQuery.
Which gives better performance, and why?
AFAIK, it depends a lot on your remote server type.
With a recent SQL Server version (2016) on both servers (local and remote), I didn't notice any difference.
If your remote server is anything else (Postgres, MySQL...), you really should use OpenQuery, as it executes the query on the remote server and returns only the correct result set. If you use the 4-part name, SQL Server will order and filter locally.
For example, take a 4 million record table and execute a query like:
SELECT * FROM remoteserver.database.schema.table WHERE id = 4
With OpenQuery, SQL Server will fetch only the record with id 4. Without it, it will pull the entire table and then filter it locally to get id 4.
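For comparison, the pass-through form of the same example (same hypothetical names) ships the filter to the remote server:
SELECT * FROM OPENQUERY(remoteserver, 'SELECT * FROM database.schema.table WHERE id = 4')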
Late to the party here, but the difference essentially is that 4-part queries are executed locally, and thus cannot utilise indexes or keys, since the local server doesn't know about them. Instead, it essentially retrieves the entire object and then applies the filter. On a small table you would be unlikely to notice a difference, but on a table with millions of rows you would. OpenQuery essentially tells the remote server to execute the query on its behalf, then pass the result back.
As a general rule, I would say:
NEVER join onto a table using a 4-part name. Only join using OpenQuery, and I would even avoid that where possible, but that's more of a personal preference.
However, 4-part SP execution, i.e. EXEC ServerName.DBName.SchemaName.ObjectName, is essentially the same, since that also tells the remote server to execute the query on its behalf; see the sketch below.
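For instance (a sketch with a hypothetical procedure name and parameter; this form requires the linked server's "RPC Out" option to be enabled):
EXEC ServerName.DBName.SchemaName.usp_LoadData @FromDate = '20200101'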

Generic SQL that both Access and ODBC/Oracle can understand

I have a MS Access query that is based on a linked ODBC table (Oracle).
I'm troubleshooting the poor performance of the query here: Access not properly translating TOP predicate to ODBC/Oracle SQL.
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
select
top 1 ri2.insp_id
from
user1_road_insp ri2
where
ri2.road_id = ri.road_id
and year(insp_date) between [Enter a START year:] and [Enter a END year:]
order by
ri2.insp_date desc,
ri2.length desc,
ri2.insp_id
);
The documentation says:
When you spot a problem, you can try to resolve it by changing the local query. This is often difficult to do successfully, but you may be able to add criteria that are sent to the server, reducing the number of rows retrieved for local processing.
In many cases you will find that, despite your best efforts, Office Access still retrieves some entire tables unnecessarily and performs final query processing locally.
However, it's occurred to me that I don't really understand what sort of SQL I should be writing to make both Access and ODBC/Oracle happy.
Should I be writing some sort of generic SQL that Access can understand in a local query AND that can be easily translated to ODBC/Oracle SQL? Is generic SQL a real thing?
What kind of SQL does the ODBC driver use? It depends: MS Access typically has three types of external data connections, each interfacing with a different SQL dialect via the ODBC API.
Linked tables that act like local tables but are ODBC-connected data sources, not stored locally. Once they are incorporated in an Access app, these tables can only use MS Access' SQL dialect. They can be joined with local tables or even other backend tables from other sources.
Hence why TOP is available in MS Access and not Oracle. You are essentially using Access SQL to manipulate Oracle data. ODBC serves as the origin point of the data, while Access' Jet/ACE SQL engine does the processing and result-set viewing in cached memory.
Pass-through queries that do not see local tables or anything else in the local app's environment. Such queries use the SQL dialect of the connected database, here being Oracle.
Hence why TOP is NOT available in Oracle and double quotes are allowed in column identifiers; such quoting would fail in MS Access. Essentially, you are using Oracle SQL to manipulate Oracle data in an Access app. You can take the output of the sqlout.txt log and run it in a pass-through query ODBC-connected to your Oracle database.
ADO/DAO recordsets that are run entirely via code such as VBA, are direct connections to data sources, and use the connecting database's dialect.
Here, you are using Oracle SQL to manipulate Oracle data in an Access app via the ODBC API.
In each of these types, you will have to connect to a backend ODBC data source. You do not even need to use the GUI, but can use Access' object library to create linked tables (see DoCmd.TransferDatabase) and pass-through querydefs (see QueryDef.Connect or .Execute).
I suspect the sqlout.txt log you see contains translations of the ODBC calls into the native dialect.
To build on @Parfait's point #1:
From Microsoft Access Developer's Guide to SQL Server by Mary Chipman and Andy Baron:
Optimizing Access Queries:
There's a common misconception that the Jet engine always retrieves all the data in linked SQL Server tables and then processes the data locally. This is not usually true. Jet is perfectly capable of sending efficient queries to SQL Server over ODBC and retrieving only the rows required. However, in some cases, Jet will in fact be forced to fetch all the data in certain tables first and then process it. You should be aware of when you are forcing Jet to do this and be sure that it is justified. The following are some general guidelines to follow when creating your Access queries:
Using expressions that can't be evaluated by the server will cause Jet to retrieve all the data required to evaluate those expressions locally. The impact of using Access-specific expressions, such as domain aggregate functions, Access financial functions, or custom VBA functions will vary depending on where in your query the expressions are used. Using such an expression in the SELECT clause will usually not cause a problem because no extra data will be returned. However, if the expression is in the WHERE clause, that criterion cannot be applied on the server, and all the data evaluated by the expression will have to be returned.
With multiple criteria, as many as possible will be processed on the server. This means that even if you use criteria that you know include functions that will need to be processed by Jet, adding other criteria that can be handled by the server will reduce the number of records that Jet has to process. Adding criteria on indexed columns is especially helpful.
Query syntax that includes an Access-specific extension to SQL, not supported by the ODBC driver, may force processing to be done on the client by Access. For example, even though SELECT TOP 5 PERCENT is now supported by SQL Server, it is not supported by the ODBC driver. If you use that syntax in an Access query, Jet will need to retrieve all the records and calculate which ones are in the top 5 percent. On the other hand, even though crosstab queries are specific to Access, Jet will translate them into simple GROUP BY queries and fetch just the required data in one trip to the server unless problematic criteria is used.
Heterogeneous joins between local and remote tables, or between remote tables that are in different data sources, will, of course, have to be processed by Jet after the source data is retrieved. However, if the remote join field is indexed and the table is large, Jet will often use the index to retrieve only the required rows by making multiple calls to the remote table, one for each row required.
Jet allows you to mix data types within [typo - fix later] of UNION queries and within expressions, but SQL Server doesn't. Such mixing of data types will force processing to be done locally.
Multiple outer joins in one query will be processed locally.
The most important factor is reducing the total number of records being fetched. Jet will retrieve multiple batches of records in the background until the result set is complete, so even though you may seem to get results back immediately, a continuing load is being placed on the server for large result sets.
Note: this book is quite old (published in 2000) and is in reference to Jet Engine. I imagine things might be slightly different in newer versions of Access which use ACE, although I don't have a source to back this up.
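To illustrate the WHERE-clause point from the excerpt above, here is a hypothetical sketch in Access SQL (MyVbaFunction is a made-up custom VBA function and OracleTable a linked ODBC table). This form forces Jet/ACE to retrieve every row, because the server cannot evaluate the VBA function:
SELECT * FROM OracleTable WHERE MyVbaFunction(insp_date) = 1
whereas this criterion can be sent to the server, so only matching rows come back:
SELECT * FROM OracleTable WHERE insp_date >= #1/1/2019#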

SQL Server 2008, Sybase - large select queries over low bandwidth

I need to pull a large amount of data from various tables across a line that has very low bandwidth. I need to minimize the amount of data that gets sent to and fro.
On that side is a Sybase database, on this side SQL Server 2008.
What I need is to pull all the tables from the Sybase database that have to do with this office. Let's say I have the following tables as an example:
Farm
Tree
Branch
etc.
(one farm has many trees, one tree has many branches etc.)
Let's say the "Farm" table has a field called "CountryID", and I only want the data where CountryID=12. The actual table structures I am looking at are very complex (and I am also not very familiar with them), so I want to try to keep the queries simple.
So I am thinking of setting up a series of views:
CREATE VIEW vw_Farm AS
SELECT * from Farm where CountryID=12
CREATE VIEW vw_Tree AS
SELECT * from Tree where FarmID in (SELECT FarmID FROM vw_Farm)
CREATE VIEW vw_Branch AS
SELECT * from Branch where TreeID in (SELECT TreeID FROM vw_Tree)
etc.
To then pull the actual data across I would then do:
SELECT * INTO localDb.Farm FROM vw_Farm
SELECT * INTO localDb.Tree FROM vw_Tree
SELECT * INTO localDb.Branch FROM vw_Branch
etc.
Simple enough to set up. I am wondering how this will perform though? Will it perform all the SELECT statements on the Sybase side and then just send back the result? Also, since this will be an iterative process, is it possible to index the views for subsequent calls?
Any other optimisation suggestions would also be welcome!
Thanks
Karl
EDIT: Just to clarify, the views will be set up in SQL Server. I am using a linked server to Sybase ASE to set up those views. What is worrying me in particular is whether the fact that the view is in SQL Server on this side, and not on Sybase on that side, will mean that for each iteration the data from the preceding view will get pulled across to SQL Server first before the calculations get executed. I want Sybase to do all the calcs and just pass the results across.
It's difficult to be certain without testing, but my somewhat-relevant experience (using linked servers to platforms other than Sybase, and on SQL Server 2005) has been that using subqueries (such as your code for vw_Tree and vw_Branch) more or less guarantees that SQL Server will pull all the data for the outer table into a local temp table, then match it to the results of the inner query.
The problem is that SQL Server has no access to the linked server's table statistics, so can make no meaningful decisions about how to optimise the query.
If you want to be sure to have the work done on the Sybase server, your best bet will be to write code (could be views or stored procedures) on the Sybase side and reference them from SQL Server.
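For example, if a filtering view such as vw_Farm is created on the Sybase side (server and object names hypothetical), a pass-through query returns only the finished result set:
SELECT * INTO localDb.Farm FROM OPENQUERY(SybaseSrv, 'SELECT * FROM dbo.vw_Farm')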
Linked server connections are, in my experience, not particularly resilient over flaky networks. If it's available, you could consider using Integration Services rather than linked-server queries - but even that may not be much better. You may need to consider falling back on moving text files with robocopy and bcp.

Oracle and SQL Dataset

Problem: I have to pull data from a SQL Server database and an Oracle database and put it together in one dataset. The problem I have is: the SQL Server query requires an ID that is only available from the results of the Oracle query.
What I am wondering is: How can I approach this problem in a way such that performance is not harmed?
You can do this with linked servers or by transferring the data all to one side. It's all going to depend on the volume of data on each side.
A general rule of thumb is to execute the query on the side which has the most data.
For instance, if the set of Oracle IDs is small, but the SQL Server set is large, you make a linked server to the Oracle side and execute this on the SQL Server side:
SELECT *
FROM sqlservertable
INNER JOIN linkedserver..schema.oracletable
ON whatever
In this case, if the Oracle side is large (or cannot be prefiltered before the join with the SQL Server side), performance will normally be pretty poor. It will improve a lot if you instead pull the entire table (or the minimal subset you can determine) into a SQL Server table and do the JOIN entirely on the SQL Server side.
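Conversely, when the Oracle set is large, a sketch of that staging approach (linked server, schema, and column names hypothetical):
-- Pull only the needed Oracle columns/rows into a local work table...
SELECT * INTO #oracle_stage
FROM OPENQUERY(LinkedOra, 'select id, name from SCHEMA.ORACLETABLE where status = 1')
-- ...then join entirely on the SQL Server side.
SELECT s.*, o.name
FROM sqlservertable s
INNER JOIN #oracle_stage o ON o.id = s.id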