Sybase jConnect Driver - What is the default fetch size, and can increasing the fetch size benefit me?

I am using the Sybase jConnect driver (jconn3) to execute stored procs that return up to a million rows of data. I have read here and there that using a larger fetch size can reduce the time it takes to fetch all the data.
However, I can't figure out what the default fetch size for the Sybase jConnect driver is. Can you help with what default fetch size Sybase uses?
And given that I have sufficient memory/CPU resources to handle a million rows at once, is it advisable to set the fetch size large enough to fetch everything in one go?

Fetch size sets the number of rows returned in a block from the server. It's normally set to 0, which means return all rows at one time. If you have the memory to accept 1,000,000 rows at once, then you can just leave the setting alone.
If you want to check it, just call getFetchSize() on the statement object. (You may need to cast the object into a SybStatement to do this.)
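For illustration, a minimal sketch of checking the fetch size from plain JDBC; the connection URL and credentials are placeholders, and the vendor class mentioned in the comment assumes jconn3 (the package name differs between jConnect versions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FetchSizeCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for a jConnect (jconn3) data source.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbhost:5000/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {

            // getFetchSize() is standard JDBC; 0 means "return all rows in one block".
            System.out.println("Current fetch size: " + stmt.getFetchSize());

            // Casting to com.sybase.jdbc3.jdbc.SybStatement is only needed for
            // vendor-specific extensions (class/package assumed for jconn3).
        }
    }
}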

After some research, I found that the Sybase JDBC driver does support streaming. You can use setFetchSize() to constrain the amount of data kept in memory at one time. If you set the value to 0, it will load the whole result set into memory.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc39001.0605/html/prjdbc/X11994.htm
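As a rough illustration of the streaming approach (the proc name, connection details and block size below are made up):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class StreamProcResult {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbhost:5000/mydb", "user", "password");
             CallableStatement cs = conn.prepareCall("{call dbo.my_large_proc()}")) {

            // Ask the driver to fetch rows from the server in blocks of 10000
            // instead of the default (0 = all rows at once).
            cs.setFetchSize(10000);

            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    // process one row at a time; only the current block is held in memory
                }
            }
        }
    }
}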

Related

Camel Sql Consumer Performance for Large DataSets

I am trying to cache some static data in an Ignite cache in order to query it faster, so I need to read the data from the database and insert it into the cache cluster.
But the number of rows is around 3 million, and this normally causes an OutOfMemory error because the SQL component tries to collect and process all of the data at once.
Is there any way to split the result set while reading it (for example, 1000 rows per Exchange)?
You can add a limit in the SQL query depending on what SQL database you use.
Or you can try setting jdbcTemplate.maxRows=1000. Whether that option actually limits the rows depends on the JDBC driver, though.
Also mind that you need some way to mark or delete the rows after processing so they are not selected again by the next query, for example by using the onConsume option.
You can look in the unit tests to find some examples with onConsume etc: https://github.com/apache/camel/tree/master/components/camel-sql/src/test
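For illustration only, a sketch of such a route in the Java DSL; the table, columns, option spelling (template.maxRows here) and the target endpoint are assumptions to verify against the camel-sql documentation for your version:

import org.apache.camel.builder.RouteBuilder;

public class StaticDataRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Poll in chunks and mark rows as processed so the next poll skips them.
        // Table/column names and the target endpoint are hypothetical.
        from("sql:select * from static_data where processed = 0"
                + "?onConsume=update static_data set processed = 1 where id = :#id"
                + "&template.maxRows=1000")
            .to("bean:igniteCacheLoader");
    }
}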

What is the maximum permitted response data size?

In the API Docs section Browsing Table Data, there is a reference to the "permitted response data size"; however, that link is dead. Experimentation revealed that requests with maxResults=50000 are usually successful, but as I near maxResults=100000 I begin to get errors from the BigQuery server.
This is happening while I page through a large table (or set of query results), so after each page is received, I request the next one; it thus doesn't matter to me what the page size is, but it does affect the communication with BigQuery.
What is the optimal value for this parameter?
Here is an explanation from the documentation: https://developers.google.com/bigquery/docs/reference/v2/jobs/query?hl=en
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
To sum up: max size is 10MB, no row count limit.
You can choose the value of the maxResults parameter based on how your app uses the data.
If you want to show the data in a report, set a low value so the first page displays quickly.
If you need to load the data into another app, you can use the maximum possible value (record size * row count < 10 MB).
You say that when you manually set maxResults = 100000 to page through the result set, you get errors from the BigQuery server. What errors do you get? Could you paste the error message?
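As a rough sketch of the paging pattern with a modest page size, using the google-api-services-bigquery client (building the authenticated client is omitted, and the project/dataset/table names are made up):

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;
import java.io.IOException;

public class TablePager {
    // Assumes an already-authenticated Bigquery client is passed in.
    static void readAllRows(Bigquery bigquery) throws IOException {
        String pageToken = null;
        do {
            TableDataList page = bigquery.tabledata()
                    .list("my-project", "my_dataset", "my_table")
                    .setMaxResults(1000L)     // smaller pages tend to be more reliable
                    .setPageToken(pageToken)  // null on the first request
                    .execute();
            if (page.getRows() != null) {
                for (TableRow row : page.getRows()) {
                    // process one row
                }
            }
            pageToken = page.getPageToken();  // absent when there are no more pages
        } while (pageToken != null);
    }
}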

SQL - When is data transferred?

I need to get a large amount of data from a remote database. The idea is to do a sort of pagination, like this:
1. Select a first block of data:
SELECT * FROM TABLE LIMIT 1,10000
2. Process that block:
while ($row = mysql_fetch_array($result)) {
    // do something
}
3. Get the next block
and so on.
Assuming 10000 is an acceptable block size for my system, let us suppose I have 30000 records to get: I perform 3 calls to the remote system.
But my question is: when a SELECT is executed, is the result set transmitted and stored locally, so that each fetch is local, or is it kept on the remote system, with records coming over one by one on each fetch? Because if the second scenario is the real one, I am not performing 3 calls but 30000, and that is not what I want.
I hope that was clear; thanks for the help.
First, it's highly recommended to use MySQLi or PDO instead of the deprecated mysql_* functions:
http://php.net/manual/en/mysqlinfo.api.choosing.php
By default with the mysql and mysqli extensions, the entire result set is loaded into PHP's memory when executing the query, but this can be changed to load results on demand as rows are retrieved if needed or desired.
mysql
mysql_query() buffers the entire result set in PHP's memory
mysql_unbuffered_query() only retrieves data from the database as rows are requested
mysqli
mysqli::query()
The $resultmode parameter determines behaviour.
The default value of MYSQLI_STORE_RESULT causes the entire result set to be transferred to PHP's memory, but using MYSQLI_USE_RESULT will cause the rows to be retrieved as requested.
With PDO and the MySQL driver, queries are buffered by default, so the whole result set is transferred to PHP's memory when the query is executed with PDO::query() or PDO::prepare(); disabling the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute makes rows load on demand as they are retrieved with PDO::fetch().
To retrieve all data from the result set into a PHP array, you can use PDO::fetchAll().
It's probably best to stick with the default behaviour and benchmark any changes to determine if they actually have any positive results; the overhead of transferring results individually may be minor, and other factors may be more important in determining the optimal method.
You would be performing 3 calls, not 30,000. That's for sure.
Each batch of 10,000 results is produced on the server (one per query, 3 queries in total). Your while loop iterates through data that has already been returned by MySQL (that's why you don't end up with 30,000 queries).
That is assuming you would have something like this:
$res = mysql_query(...);
while ($row = mysql_fetch_array($res)) {
//do something with $row
}
Anything you do inside the while loop by making use of $row has to do with already-fetched data from your initial query.
Hope this answers your question.
According to the documentation, all the data is fetched from the server first, and then you go through it.
from the page:
Returns an array of strings that corresponds to the fetched row, or FALSE if there are no more rows.
In addition, it seems this function is deprecated, so you might want to use one of the alternatives suggested there.

Slow Datareader with ODP.NET

We are using ODP.NET (Oracle.DataAccess version 1.102.3.0) with the Oracle 11g client. I am having some problems reading data using the data reader:
my procedure returns a ref cursor with around 10000 records, but fetching the data takes around 30 to 40 seconds.
Are there any possibilities to improve the performance?
Try setting the FetchSize.
Since this is a procedure and a RefCursor, you can perform these operations:
ExecuteReader
Set the FetchSize -> do this before you start reading
Read the Results
If you follow the above sequence of actions you will be able to obtain the size of a row.
e.g.
int NumberOfRowsToFetchPerRoundTrip = 2000;
var reader = cmd.ExecuteReader();
reader.FetchSize = reader.RowSize * NumberOfRowsToFetchPerRoundTrip;
while (reader.Read())
{
    // Do something
}
This will reduce the number of round trips. The fetch size is in bytes, so you could also just use an arbitrary value, e.g. 1024*1024 (1 MB). However, I would recommend basing your fetch size on the size of a row multiplied by the number of rows you want to fetch per round trip.
In addition, I would set these parameters in the connection string:
Enlist=false;Self Tuning=False
You seem to get more consistent performance with these settings, although there may be some variation from one version of ODP.NET to the next.

SQL connection lifetime

I am working on an API to query a database server (Oracle in my case) to retrieve massive amounts of data. (This is actually a layer on top of JDBC.)
The API I created tries to avoid, as much as possible, loading all of the queried data into memory. I mean that I prefer to iterate over the result set and process the returned rows one by one instead of loading every row into memory and processing them later.
But I am wondering if this is the best practice, since it has some issues:
The result set is kept open during the whole processing; if the processing takes as long as retrieving the data, my result set will be open twice as long.
Doing another query inside my processing loop means opening another result set while I am already using one, and it may not be a good idea to open too many result sets simultaneously.
On the other side, it has some advantages:
I never have more than one row of data in memory per result set; since my queries tend to return around 100k rows, that may be worth it.
Since my framework is heavily based on functional programming concepts, I never rely on multiple rows being in memory at the same time.
Starting the processing on the first rows returned while the database engine is still returning other rows is a great performance boost.
In response to Gandalf, I add some more information:
I will always have to process the entire result set
I am not doing any aggregation of rows
I am integrating with a master data management application and retrieving data in order to either validate it or export it in many different formats (to the ERP, to the web platform, etc.)
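A minimal sketch of the row-at-a-time style the question describes, in plain JDBC with hypothetical names:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

public final class RowStreamer {
    // The caller supplies a callback and never sees more than the current row.
    public static void forEachRow(Connection conn, String sql, Consumer<ResultSet> rowHandler)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                rowHandler.accept(rs);   // only the current row is exposed; nothing is accumulated
            }
        }
    }
}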
There is no universal answer; I have personally implemented both solutions dozens of times.
It depends on what matters more to you: memory or network traffic.
If you have a fast network connection (LAN) and a poor client machine, then fetch data row by row from the server.
If you work over the Internet, then batch fetching will help you.
You can set the prefetch count in your database layer's properties and find a golden mean.
The rule of thumb is: fetch as much as you can hold without noticing it.
If you need a more detailed analysis, there are six factors involved:
Row generation response time / rate (how soon Oracle generates the first row / the last row)
Row delivery response time / rate (how soon you can get the first row / the last row)
Row processing response time / rate (how soon you can show the first row / the last row)
One of them will be the bottleneck.
As a rule, rate and response time are antagonists.
With prefetching, you can trade row delivery response time against row delivery rate: a higher prefetch count improves the overall rate but makes the first rows arrive later, while a lower prefetch count does the opposite.
Choose which one is more important to you.
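In JDBC terms, the prefetch count discussed above is the statement fetch size; a rough sketch of that knob (the query and the example sizes are arbitrary):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PrefetchDemo {
    // A small fetch size gets the first row sooner; a large one gives better overall rate.
    static void streamRows(Connection conn, int prefetchCount) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("select * from big_table")) {
            ps.setFetchSize(prefetchCount);   // e.g. 10 for an interactive first page, 5000 for a bulk export
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // hand each row to the processing code as soon as it arrives
                }
            }
        }
    }
}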
You can also do the following: create separate threads for fetching and processing.
Fetch just enough rows to keep the user amused using a low prefetch count (so the first rows show up quickly), then switch into high prefetch mode.
It will fetch the rows in the background and you can process them in the background too, while the user browses over the first rows.
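A rough sketch of that two-thread idea with plain JDBC and a bounded queue; the query, column names and sizes are made up:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FetchAndProcess {
    // Sentinel object marking the end of the data.
    private static final Map<String, Object> END = new HashMap<>();

    public static void run(Connection conn) throws Exception {
        BlockingQueue<Map<String, Object>> queue = new ArrayBlockingQueue<>(10_000);

        // Fetcher thread: pulls rows from the database in blocks and queues them.
        Thread fetcher = new Thread(() -> {
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(1_000);
                try (ResultSet rs = st.executeQuery("select id, name from big_table")) {
                    while (rs.next()) {
                        Map<String, Object> row = new HashMap<>();
                        row.put("id", rs.getObject("id"));
                        row.put("name", rs.getObject("name"));
                        queue.put(row);
                    }
                }
                queue.put(END);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        fetcher.start();

        // Processing (here, the calling thread): consumes rows while more are still being fetched.
        Map<String, Object> row;
        while ((row = queue.take()) != END) {
            // process the row
        }
        fetcher.join();
    }
}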