Slow DataReader with ODP.NET

We are using ODP.NET (Oracle.DataAccess version 1.102.3.0) with the Oracle 11g client. I am having some problems reading data using the DataReader: my procedure returns a ref cursor with around 10,000 records, but fetching the data takes around 30 to 40 seconds.
Are there any possibilities to improve the performance?

Try setting the FetchSize.
Since this is a procedure returning a RefCursor, you can perform these operations:
1) ExecuteReader
2) Set the FetchSize (do this before you start reading)
3) Read the results
If you follow the above sequence of actions you will be able to obtain the size of a row.
e.g.
int numberOfRowsToFetchPerRoundTrip = 2000;
using (var reader = cmd.ExecuteReader())
{
    // RowSize is only known after ExecuteReader, so set FetchSize before the first Read
    reader.FetchSize = reader.RowSize * numberOfRowsToFetchPerRoundTrip;
    while (reader.Read())
    {
        // Do something with the current row
    }
}
This will reduce the number of round trips. The fetch size is in bytes, so you could also just use an arbitrary FetchSize, e.g. 1024 * 1024 (1 MB). However, I would recommend basing your fetch size on the size of a row multiplied by the number of rows you want to fetch per round trip.
In addition, I would set the following parameters in the connection string:
Enlist=false;Self Tuning=False
You seem to get more consistent performance with these settings, although there may be some variation from one version of ODP.NET to the next.
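Putting it together, a minimal sketch might look like this (the data source, credentials, procedure name and ref-cursor parameter name are placeholders, not from the question):

using System.Data;
using Oracle.DataAccess.Client;

// Connection string with the suggested settings; data source and credentials are placeholders.
var connectionString =
    "Data Source=MyOracleDb;User Id=scott;Password=tiger;Enlist=false;Self Tuning=False";

using (var connection = new OracleConnection(connectionString))
using (var cmd = new OracleCommand("my_package.get_rows", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;

    // Hypothetical output ref-cursor parameter returned by the procedure.
    var cursor = new OracleParameter("p_cursor", OracleDbType.RefCursor);
    cursor.Direction = ParameterDirection.Output;
    cmd.Parameters.Add(cursor);

    connection.Open();

    using (var reader = cmd.ExecuteReader())
    {
        // Fetch roughly 2000 rows per round trip instead of the small default buffer.
        reader.FetchSize = reader.RowSize * 2000;
        while (reader.Read())
        {
            // Process the current row
        }
    }
}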

Related

Query fast with literal but slow with variable

I am using TypeORM for SQL Server in my application. When I pass a native query like connection.query("select * from user where id = 1"), the performance is really good: it takes less than 0.5 seconds.
If we use the findOne or QueryBuilder method, it takes around 5 seconds to get a response.
On further debugging, we found that passing the value directly into the query like this:
return getConnection("vehicle").createQueryBuilder()
.select("vehicle")
.from(Vehicle, "vehicle")
.where("vehicle.id='" + id + "'").getOne();
is faster than
return getConnection("vehicle").createQueryBuilder()
.select("vehicle")
.from(Vehicle, "vehicle")
.where("vehicle.id =:id", {id:id}).getOne();
Is there any optimization we can do to fix the issue with the parameterized query?
I don't know TypeORM, but the difference seems clear to me: in one case you query the database for the whole table and filter it locally, and in the other you send the filter to the database so it filters the data before sending it back to the client.
Depending on the size of the table this has a big impact. Consider picking one record out of 10 million: just the time to transfer the data to the local machine is 10 million times longer.
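If you want to confirm what SQL each variant actually sends to SQL Server, one option is to enable TypeORM query logging and compare the logged statements; a minimal sketch, assuming the usual connection options (everything except the connection name "vehicle" and the Vehicle entity is a placeholder):

import { createConnection } from "typeorm";
import { Vehicle } from "./entity/Vehicle"; // entity from the question

async function compareQueries(id: number) {
  // Assumed connection options; only the name "vehicle" comes from the question.
  const connection = await createConnection({
    name: "vehicle",
    type: "mssql",
    host: "localhost",
    username: "app",
    password: "secret",
    database: "fleet",
    entities: [Vehicle],
    logging: ["query"], // print every statement and its bound parameters
  });

  // Literal value embedded in the SQL string.
  const byLiteral = await connection.createQueryBuilder()
    .select("vehicle")
    .from(Vehicle, "vehicle")
    .where("vehicle.id='" + id + "'")
    .getOne();

  // Bound parameter; the logged statement shows the placeholder and its value.
  const byParameter = await connection.createQueryBuilder()
    .select("vehicle")
    .from(Vehicle, "vehicle")
    .where("vehicle.id = :id", { id })
    .getOne();

  return { byLiteral, byParameter };
}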

dbGetQuery retrieves fewer rows than expected

I am trying to fetch a large dataset into the R environment using an ODBC connection.
When I try to retrieve data from a large dataset using the dbGetQuery() function, the number of rows is less than what I see in Hive. Sometimes the same code fetches the correct number of rows. Could someone tell me whether I should clear any buffer before fetching the data?
hive_con <- dbConnect(odbc::odbc(), .connection_string = Connection_String)
qry <- "select * from mytable"
rslt <- dbGetQuery(hive_con, qry)
I have tried changing the n parameter of dbGetQuery(), but the problem persists.
Finally, I found that I was using the HTTP protocol for data extraction. This protocol doesn't show any warning if data loss happens.
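As an aside, one way to check how many rows actually arrive is to fetch in chunks with DBI and count as you go; a small sketch reusing the connection and query from the question (the chunk size is arbitrary):

library(DBI)
library(odbc)

hive_con <- dbConnect(odbc::odbc(), .connection_string = Connection_String)

res <- dbSendQuery(hive_con, "select * from mytable")
total_rows <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 10000)        # pull 10,000 rows per call
  total_rows <- total_rows + nrow(chunk)
}
dbClearResult(res)
dbDisconnect(hive_con)

total_rows  # compare against the row count reported by Hive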

Target-based commit point while updating a table

One of my mappings is running for a really long time (2 hours). From the session log I can see the statement "Timeout based commit point", which is taking most of the time, and the busy percentage for the SQL transformation is very high (it is what is taking the time; I ran the SQL query manually in the database and it works fine). Basically, there is a router which splits the records between insert and update, and the update stream is the one taking long. It has a SQL transformation, an Update Strategy and an Aggregator. I added a Sorter before the Aggregator, but no luck.
I also changed the commit interval, Line Sequential Buffer Length and Maximum Memory Allowed based on some other blogs. Could you please help me with this?
If possible, try to avoid transformations which create a cache, because if the input records increase in the future, the cache size will also increase and throughput will decrease.
1) Aggregator: try to do the aggregation in the SQL override itself
2) Sorter: try to do the sorting in the SQL override itself
Generally the SQL transformation is slow for huge data loads, because for each input record an SQL session is invoked, a connection is established to the database and the row is fetched. Say, for example, there are 1 million records: 1 million SQL sessions are initiated in the backend and the database is called each time.
What is the SQL transformation doing? Is it just generating a surrogate key, or is it fetching a value from a table based on a value derived from the stream?
For fetching a value from a table based on a derived value from the stream: try to use a Lookup.
For generating a surrogate key, use an Oracle sequence instead.
Let me know if its purpose is anything other than that.
Also do the following checks: sort the session log on thread and note the start and end times of
1) Lookup cache creation (time between query issued -> first row returned -> cache creation completed)
2) Reader thread first row return time

SQL - when is the data transferred?

I need to get a large amount of data from a remote database. The idea is to do a sort of pagination, like this:
1) Select a first block of data:
SELECT * FROM TABLE LIMIT 1,10000
2) Process that block:
while(mysql_fetch_array()...){
//do something
}
3) Get the next block
and so on.
Assuming 10,000 is an acceptable block size for my system, let us suppose I have 30,000 records to get: I perform 3 calls to the remote system.
But my question is: when executing a SELECT, is the result set transmitted and then stored somewhere locally, so that each fetch is local, or is the result set kept on the remote system, with records coming one by one at each fetch? Because if the real scenario is the second one, I don't perform 3 calls but 30,000 calls, and that is not what I want.
I hope I have explained myself; thanks for the help.
First, it's highly recommended to use MySQLi or PDO instead of the deprecated mysql_* functions:
http://php.net/manual/en/mysqlinfo.api.choosing.php
By default with the mysql and mysqli extensions, the entire result set is loaded into PHP's memory when executing the query, but this can be changed to load results on demand as rows are retrieved if needed or desired.
mysql
mysql_query() buffers the entire result set in PHP's memory
mysql_unbuffered_query() only retrieves data from the database as rows are requested
mysqli
mysqli::query()
The $resultmode parameter determines behaviour.
The default value of MYSQLI_STORE_RESULT causes the entire result set to be transferred to PHP's memory, but using MYSQLI_USE_RESULT will cause the rows to be retrieved as requested.
PDO by default will load data as needed when using PDO::query() or PDO::prepare() to execute the query and retrieving results with PDO::fetch().
To retrieve all data from the result set into a PHP array, you can use PDO::fetchAll()
Prepared statements can also use the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY constant, though PDO::fetchAll() is recommended.
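For example, with mysqli the two modes might look like this (a rough sketch; the host, credentials and database name are placeholders):

$mysqli = new mysqli("localhost", "user", "password", "mydb");

// Buffered (default): the whole result set is copied into PHP's memory first.
$buffered = $mysqli->query("SELECT * FROM TABLE LIMIT 0, 10000");

// Unbuffered: rows are pulled from the server only as you fetch them.
$unbuffered = $mysqli->query("SELECT * FROM TABLE LIMIT 0, 10000", MYSQLI_USE_RESULT);
while ($row = $unbuffered->fetch_assoc()) {
    // do something with $row
}
$unbuffered->free(); // the unbuffered result must be freed before running another query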
It's probably best to stick with the default behaviour and benchmark any changes to determine if they actually have any positive results; the overhead of transferring results individually may be minor, and other factors may be more important in determining the optimal method.
You would be performing 3 calls, not 30,000. That's for sure.
Each batch of 10,000 results is produced on the server (by executing each of the 3 queries). Your while loop iterates through a set of data that has already been returned by MySQL (that's why you don't end up with 30,000 queries).
That is assuming you would have something like this:
$res = mysql_query(...);
while ($row = mysql_fetch_array($res)) {
//do something with $row
}
Anything you do inside the while loop using $row works on data already fetched by your initial query.
Hope this answers your question.
According to the documentation here, all the data is fetched from the server first, and then you go through it.
from the page:
Returns an array of strings that corresponds to the fetched row, or FALSE if there are no more rows.
In addition, it seems this function is deprecated, so you might want to use one of the alternatives suggested there.

Sybase jConnect driver - what is the default fetch size, and can increasing it benefit me?

I am using the Sybase jConnect driver (jconn3) to execute stored procs which return up to a million rows of data. I have learnt here and there that using a greater fetch size can improve the time it takes to fetch all the data.
However, I can't figure out what the default fetch size for the Sybase jConnect driver is. Can you help with what default fetch size Sybase uses?
And given that I have sufficient memory/CPU resources to handle a million rows at once, is it advisable to set the fetch size to fetch everything?
The fetch size sets the number of rows returned in a block from the server. It is normally set to 0, which means return all rows at one time. If you have the memory to accept 1,000,000 rows at one time, then you can just leave the setting alone.
If you want to check it, just call getFetchSize() on the statement object. (You may need to cast the object to a SybStatement to do this.)
After some research, I found that the JDBC driver for Sybase does support streaming. You can use setFetchSize() to constrain the amount of data kept in memory at one time. If you set the value to 0, it will load the whole data set into memory.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc39001.0605/html/prjdbc/X11994.htm
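A minimal JDBC sketch of checking and changing the fetch size (the URL, credentials and procedure name are placeholders; the driver class name is the usual one for jconn3):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        // Register the jconn3 driver (placeholder host, port, database and credentials below).
        Class.forName("com.sybase.jdbc3.jdbc.SybDriver");
        String url = "jdbc:sybase:Tds:dbhost:5000/mydb";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             CallableStatement stmt = conn.prepareCall("{call my_proc()}")) {

            // According to the answers above, 0 (return all rows at once) is the normal default.
            System.out.println("Default fetch size: " + stmt.getFetchSize());

            // Ask the driver to stream rows back in blocks of 10,000 instead.
            stmt.setFetchSize(10000);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // process the current row
                }
            }
        }
    }
}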