SSIS performance vs OpenQuery with Linked Server from SQL Server to Oracle

We have a linked server (OraOLEDB.Oracle) defined in the SQL Server environment: Oracle 12c, SQL Server 2016. There is also a 64-bit Oracle client installed on the SQL Server machine.
When retrieving data from Oracle (a simple query fetching all columns from a 3M-row, fairly narrow table with varchars, dates, and integers), we are seeing the following performance numbers:
sqlplus: select from Oracle, spooled to an OS file on the SQL Server machine itself
less than 2k rows/sec
SSMS: insert into a SQL Server table, select from Oracle using OpenQuery (pass-through to Oracle, so remote execution)
less than 2k rows/sec
SQL Server Import/Export tool (in essence, SSIS): insert into a SQL Server table, using the Oracle OLE DB provider for the source and the SQL Server OLE DB provider for the target
over 30k rows/sec
Looking for ways to improve throughput using OpenQuery/OpenRowset to match the SSIS throughput. There is probably some buffer/flag somewhere that would let us achieve the same?
Please advise...
Thank you!
--Alex

There is probably some buffer/flag somewhere that would let us achieve the same?

You're probably looking for the FetchSize parameter:
FetchSize - specifies the number of rows the provider will fetch at a time (fetch array). It must be set on the basis of the data size and the response time of the network. If the value is set too high, this could result in more wait time during the execution of the query. If the value is set too low, this could result in many more round trips to the database. Valid values are 1 to 4,294,967,296. The default is 100.
For example:

EXEC sp_addlinkedserver
    @server = N'MyOracle',
    @srvproduct = N'Oracle',
    @provider = N'OraOLEDB.Oracle',
    @datasrc = N'//172.16.8.119/xe',
    @provstr = N'FetchSize=2000';

See, e.g., https://blogs.msdn.microsoft.com/dbrowne/2013/10/02/creating-a-linked-server-for-oracle-in-64bit-sql-server/
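Once the linked server is defined with the larger fetch array, the pass-through insert from the question can be rerun against it. A minimal sketch (dbo.LocalTab and the Oracle schema/table/column names are placeholders):

INSERT INTO dbo.LocalTab (col1, col2, col3)
SELECT col1, col2, col3
FROM OPENQUERY(MyOracle, 'SELECT col1, col2, col3 FROM SCHEMA.TAB');

A bigger fetch array helps exactly this kind of bulk read, because it reduces the number of network round trips to Oracle.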

I think there are many ways to enhance the performance of the INSERT query. I suggest reading the following article for more information about data-loading performance:
The Data Loading Performance Guide
One method you can try is minimizing logging when inserting into a clustered index; check the link below for more information:
New update on minimal logging for SQL Server 2008
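As a rough sketch of what that looks like combined with the OPENQUERY insert above (WITH (TABLOCK) can enable minimal logging when the prerequisites in those articles are met, e.g. the database is in SIMPLE or BULK_LOGGED recovery; object names are placeholders):

INSERT INTO dbo.LocalTab WITH (TABLOCK)  -- TABLOCK is one prerequisite for minimal logging
SELECT col1, col2, col3
FROM OPENQUERY(MyOracle, 'SELECT col1, col2, col3 FROM SCHEMA.TAB');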

Related

Using dot notation vs OpenQuery from SQL Server to Oracle

I'm trying to bring data from Oracle into SQL Server. SQL Server has a linked server defined. I need to filter the data on the Oracle side, so there is a WHERE clause that limits the data based on the value of one column (time period).
I tried two different methods and compared performance:
OpenQuery:
select * INTO T2 from OpenQuery(LinkedSrv,'select * from SCHEMA.TAB')
dot notation (LinkedServer..Schema.Table):
select * INTO T2 from LinkedSrv..SCHEMA.TAB
Both perform rather slowly, pushing about 5-6k rows/second. For a 20M-row table, this is not ideal. And then I discovered something rather interesting:
select * INTO T2 from LinkedSrv..SCHEMA.TAB WHERE col >= Value
This pushes the throughput up to almost 100k rows/second
Specifying criteria with OpenQuery does not affect the throughput. The explain plan shows
RemoteQuery -> ComputeScalar -> Filter (WHERE) -> TableInsert in the dot-notation scenario with WHERE.
Other than that, the explain plans are the same. So... how does adding a WHERE clause locally (because that is where it is applied) improve throughput by a factor of 10?
... And what can I do to achieve the same fast throughput when using OpenQuery?
Thank you!
The difference between the dot-notation and OpenQuery methods is that the first uses the client cursor engine, so most things are evaluated locally, while the second sends the query to the remote server and reads the output.
Filtering data in the dot-notation query is not always faster than the OpenQuery approach; it depends on the resources of each of the local and remote servers.
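If you do need the filter applied on the Oracle side via OpenQuery, it can be embedded directly in the pass-through text, so only matching rows cross the wire. A sketch reusing the names from the question (Value stands in for the actual period literal):

select * INTO T2 from OpenQuery(LinkedSrv, 'select * from SCHEMA.TAB where col >= Value')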
Check out the following Stack Overflow question; it will give you more information:
SQL 2005 - Linked Server to Oracle Queries Extremely Slow
Additional Information
Best Performer: Distributed query (Four-part) or OPENQUERY when executing linked server queries in SQL Server

Measuring MS Access SQL query duration

I'm trying to compare MS Access SQL queries for local table vs linked table
(it is linked to an Oracle and to a SQL Server database).
I can get query duration when running the SQL command directly on Oracle or SQL Server, but when running the SQL in MS Access, I don't know how to capture the query duration.
Is there a way to get the query duration when running a SQL command inside MS Access?
Thanks. :-)
Yes, there is:

1. Record the current time in a variable.
2. Create a recordset with its data source pointing to your query/view/table.
3. Open the recordset (optionally, check the RecordCount, which also forces the full result to be fetched).
4. Record the current time in another variable.
5. DateDiff between the values from steps 1 and 4.
Access does not provide that sort of information, unlike server databases.
You could use a Form Timer and get an idea of the duration, but with linked tables a lot of that depends on the network, server overhead, etc.

SQL Query - Finding Current log file usage for one database

I want to set up some monitoring software that will generate an SNMP trap if a database log file goes beyond about 95% usage. It can only look at the first result in the first column of an SQL query, so what I'm looking for is an SQL query that returns the percentage figure ONLY in the result - e.g., 95.
I've found several different ways of doing similar things, but they all return table headings etc., whereas I just want the figure. It'll be running this query every hour, so nothing too intensive. I'm running SQL Server 2000 (version 8).
Thanks, Mike
You could write a query against the OS DMVs to get just the single value you're looking for.
Not sure if this will work for SQL Server 2000, but I know it works as far back as SQL Server 2005. It also requires that performance counters are enabled on the host server (i.e. OS, not just SQL Server).
This query should do the trick:
SELECT cntr_value as PercentUsed
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Percent Log Used'
AND instance_name = 'your_database_name'
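sys.dm_os_performance_counters does not exist on SQL Server 2000 (version 8); there, the same counters were exposed through the older sysperfinfo system table. A sketch along the same lines, untested on 2000:

SELECT cntr_value AS PercentUsed
FROM master.dbo.sysperfinfo
WHERE counter_name = 'Percent Log Used'
AND instance_name = 'your_database_name'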

Results returned from a view using linked server may vary?

I have a view that uses a linked server to retrieve data from a remote server in SQL Server. Each time the view is queried, the number of rows returned varies. For example, the first execution may return 100 rows, but the second execution returns 120 rows. Any ideas what the cause is?
I have witnessed odd linked-server results that are a product of non-determinism written into the SQL itself, e.g. a TOP query written without an ORDER BY clause.
This problem, for example, where the chap had multiple non-unique foreign keys coming from a table source on the left-hand side of a linked-server INNER JOIN, and wanted 10 rows from a remote sub-query on the right; the end result was restricted to 10 rows itself when it should have been more than 10 rows.
It's definitely worth giving your SQL a quick once-over for such curiosities.
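For example, a pattern like this is free to return a different set of rows on each execution, because without an ORDER BY the engine makes no ordering promise (a sketch; the object names and key column are placeholders):

SELECT TOP (10) * FROM LinkedSrv..SCHEMA.TAB
-- stable alternative: order by a unique key
SELECT TOP (10) * FROM LinkedSrv..SCHEMA.TAB ORDER BY id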
The data on the linked server changed between executions?
Is your SQL Server fully patched? SQL Server 2008 and 2005 both have bug fixes out related to incorrect query results from linked servers.
Here is one example:
969997 FIX: You receive an incorrect result when you query data from a linked server that is created by using an index OLE DB provider in SQL Server 2005 or in SQL Server 2008
Is the linked server also a SQL Server? If not, perhaps a buggy driver? I've seen odd results, for example, due to an old Informix ODBC driver. Are you able to run something akin to SQL Profiler on the linked server to see what command it's receiving?
I'm not sure what the answer is, but (assuming your counts of 100 and 120 are accurate) can you not capture the data from the two runs and compare it? That might give you some clues as to what's going on. For example, is it completely different data, or are there duplicate rows in the 120-row batch?
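A quick way to capture and diff the two runs (a sketch; dbo.MyLinkedView is a placeholder for the actual view):

SELECT * INTO run1 FROM dbo.MyLinkedView;
-- wait until the row count changes, then:
SELECT * INTO run2 FROM dbo.MyLinkedView;
-- rows present in the second run but not in the first:
SELECT * FROM run2 EXCEPT SELECT * FROM run1;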

Oracle and SQL Dataset

Problem: I have to pull data from a SQL Server database and an Oracle database and put it together in one dataset. The problem I have is that the SQL Server query requires an ID that is only found in the results of the Oracle query.
What I am wondering is: How can I approach this problem in a way such that performance is not harmed?
You can do this with linked servers or by transferring the data all to one side. It's all going to depend on the volume of data on each side.
A general rule of thumb is to execute the query on the side which has the most data.
For instance, if the set of Oracle IDs is small, but the SQL Server set is large, you make a linked server to the Oracle side and execute this on the SQL Server side:
SELECT *
FROM sqlservertable
INNER JOIN linkedserver..ORASCHEMA.oracletable o   -- four-part name; schema is a placeholder
    ON sqlservertable.id = o.id                    -- join key depends on your schema
Conversely, if the Oracle side is large (or cannot be pre-filtered before the join with the SQL Server side), the performance of the linked-server join will normally be pretty poor. It improves a lot if you instead pull the entire table (or the minimal subset you can determine) into a SQL Server table and do the JOIN entirely on the SQL Server side.
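A minimal sketch of that pull-then-join approach (all object names, columns, and the filter are placeholders):

SELECT *
INTO #oracle_rows
FROM OPENQUERY(linkedserver, 'SELECT id, somecol FROM ORASCHEMA.ORACLETABLE WHERE somecol IS NOT NULL');

SELECT s.*, o.somecol
FROM sqlservertable s
INNER JOIN #oracle_rows o
    ON o.id = s.id;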