I am writing a small utility that indexes data from a SQL Server 2005 table into a Lucene data store. I am using a JDBC SQL Server driver for connectivity. The result set returned from the server has around 2 million rows, and this always throws an out-of-memory exception. I have increased the heap size on the client machine to around 1.6 GB, but to no avail. How can I fetch such large data sets over JDBC with limited memory?
Kind Regards
Have you looked at LIMIT and TOP in SQL? Also, try Googling for phrases like 'SQL query pagination'.
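As a rough illustration, here is a minimal sketch of ROW_NUMBER-based paging that works on SQL Server 2005 (the table, columns, and page size are placeholders, not from your schema); the JDBC client would run this once per page, incrementing @PageNum until no rows come back:

-- Fetch page @PageNum (1-based), @PageSize rows at a time, ordered by the key.
DECLARE @PageSize int, @PageNum int;
SET @PageSize = 10000;
SET @PageNum = 1;

SELECT ID, Title, Body
FROM (
    SELECT ID, Title, Body,
           ROW_NUMBER() OVER (ORDER BY ID) AS rn
    FROM dbo.SourceTable
) AS numbered
WHERE rn BETWEEN (@PageNum - 1) * @PageSize + 1
             AND @PageNum * @PageSize
ORDER BY rn;

Separately, many JDBC drivers will stream rows instead of buffering the whole result set if you set a fetch size or enable adaptive buffering on the statement, so it is worth checking your driver's documentation for that option too.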
We have a linked server (OraOLEDB.Oracle) defined in the SQL Server environment. Oracle 12c, SQL Server 2016. There is also an Oracle client (64 bit) installed on SQL Server.
When retrieving data from Oracle (a simple query, getting all columns from a 3M row, fairly narrow table, with varchars, dates and integers), we are seeing the following performance numbers:
sqlplus: select from Oracle, spooled to an OS file on the SQL Server itself - less than 2k rows/sec
SSMS: insert into a SQL Server table, select from Oracle using OPENQUERY (passthrough to Oracle, so remote execution) - less than 2k rows/sec
SQL Export/Import tool (in essence, SSIS): insert into a SQL Server table, using OLEDB Oracle for source and OLEDB SQL Server for target - over 30k rows/sec
Looking for ways to improve throughput using OpenQuery/OpenResultSet, to match SSIS throughput. There is probably some buffer/flag somewhere that would allow us to achieve the same?
Please advise...
Thank you!
--Alex
There is probably some buffer/flag somewhere that would allow us to achieve the same?
You are probably looking for the FetchSize parameter:
FetchSize - specifies the number of rows the provider will fetch at a time (fetch array). It must be set on the basis of data size and the response time of the network. If the value is set too high, then this could result in more wait time during the execution of the query. If the value is set too low, then this could result in many more round trips to the database. Valid values are 1 to 429,496,296. The default is 100.
For example (FetchSize belongs in the provider string):
exec sp_addlinkedserver @server = N'MyOracle', @srvproduct = 'Oracle', @provider = 'ORAOLEDB.Oracle', @datasrc = N'//172.16.8.119/xe', @provstr = N'FetchSize=2000'
See, e.g., https://blogs.msdn.microsoft.com/dbrowne/2013/10/02/creating-a-linked-server-for-oracle-in-64bit-sql-server/
I think there are many ways to enhance the performance of the INSERT query. I suggest reading the following article to get more information about data loading performance.
The Data Loading Performance Guide
One method you can try is minimizing logging when loading into a table with a clustered index; check the link below for more information (a rough sketch follows the link):
New update on minimal logging for SQL Server 2008
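As a rough sketch of the minimal-logging idea from those articles (this assumes the database is in the SIMPLE or BULK_LOGGED recovery model and that the other prerequisites described there are met; the table and column names are placeholders):

-- TABLOCK is one of the prerequisites for a minimally logged INSERT...SELECT
-- into a heap; on SQL Server 2008, trace flag 610 extends this to tables
-- with a clustered index, as the linked article explains.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT ColA, ColB, ColC
FROM dbo.StagingTable;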
Good day.
I need to get records from an Oracle database into a SQL Server database. The data source (ODBC) is read using a SQL command, and I am using every index I can according to my requirement. The process runs fine; the problem is that it takes a long time and I need it to be quick. The process cannot be done with a Lookup or a Merge/Merge Join; it is simply loading a table from Oracle into SQL Server under certain conditions.
Thank you for your help
Check what your limiting factor is. Generally there are three points to check:
1. The source (remote) server is slow. The source DB can run low on memory, read speed, or free CPU. Substitute your query with a straight SELECT statement with no WHERE clause or JOINs and see if your SSIS package runs faster.
2. The target DB. You may have indexes enabled, high write latency on the disks, or not enough CPU. Run a plain INSERT into your target table and see how long it takes.
3. The problem may be in the middle: the transfer between the two servers. The network is usually the main bottleneck. Is SSIS hosted on the same server as SQL Server? If not, you have two network connections plus a possible hardware bottleneck on the dedicated SSIS machine.
Depending on the bottleneck there are different solutions.
If you have network capacity and the bottleneck is one CPU per query on Oracle, then you can partition your data horizontally (IDs 1 to 100, 101 to 200, etc.), establish multiple connections to Oracle, and load the data in several streams; see the sketch below. A good number of streams is one less than the number of CPUs on Oracle, SSIS, or SQL Server (whichever is smaller).
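A minimal sketch of that idea, assuming a linked server to Oracle and hypothetical table, column, and range values; each statement runs in its own session or SSIS data flow so the streams load concurrently:

-- Stream 1
INSERT INTO dbo.TargetTable
SELECT * FROM OPENQUERY(MyOracle, 'SELECT * FROM SRC_TABLE WHERE ID BETWEEN 1 AND 1000000');

-- Stream 2 (run in a separate connection)
INSERT INTO dbo.TargetTable
SELECT * FROM OPENQUERY(MyOracle, 'SELECT * FROM SRC_TABLE WHERE ID BETWEEN 1000001 AND 2000000');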
I've written a SQL query that looks like this:
SELECT * FROM MY_TABLE WHERE ID=123456789;
When I run it in the Query Analyzer in SQL Server Management Studio, the query never returns; instead, after about ten minutes, I get the following error: System.OutOfMemoryException
My server is Microsoft SQL Server (not sure what version).
SELECT * FROM MY_TABLE; -- returns 44258086 rows
SELECT * FROM MY_TABLE WHERE ID=123456789; -- returns 5 rows
The table has over forty million rows! However, I need to fetch five specific rows!
How can I work around this frustrating error?
Edit: The server suddenly started working fine for no discernible reason, but I'll leave this question open in case anyone wants to suggest troubleshooting steps for others with this problem.
According to http://support.microsoft.com/kb/2874903:
This issue occurs because SSMS has insufficient memory to allocate for
large results.
Note SSMS is a 32-bit process. Therefore, it is limited to 2 GB of
memory.
The article suggests trying one of the following:
Output the results as text
Output the results to a file
Use sqlcmd
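For example, a sqlcmd invocation that writes the results to a file instead of the SSMS grid (server, database, and output path are placeholders; -E uses Windows authentication):

sqlcmd -S MyServer -d MyDatabase -E -Q "SELECT * FROM MY_TABLE WHERE ID=123456789" -o C:\temp\results.txt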
You may also want to check the server to see if it's in need of a service restart--perhaps it has gobbled up all the available memory?
Another suggestion would be to select a smaller subset of columns (if the table has many columns or includes large blob columns).
If you need specific data use an appropriate WHERE clause. Add more details if you are stuck with this.
Alternatively, write a small application that operates using a cursor and does not try to load the result set completely into memory.
I have a Linux VM (with CentOS 64 bit) where I installed Oracle Database 11g Express Edition. The database is running, I can use sqlplus and I can create tables and stuff. However, when I run a certain SQL script which inserts a huge amount of random data (~2 million rows) I get this error:
ORA-30009: Not enough memory for CONNECT BY operation
I already tried to increase PGA_AGGREGATE_TARGET with the command below. As far as I have read, this should solve the problem (but it doesn't), since it increases the available memory.
ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 40M scope = both;
However, the problem is that I can not set PGA_AGGREGATE_TARGET higher than ~40M (which seems to be not sufficient). If I try to set it up to 100M or more I get another error:
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-47500: XE edition memory parameter invalid or not specified
Any idea how I can solve this problem? It is perfectly fine for me to re-install the whole database or whatever.
PS.: I asked the same question on dba.stackexchange.com since I wasn't sure whether it is programming or database administration.
You are using Oracle 11g Express Edition, which has certain limitations: the database size cannot grow beyond 11 GB, and the combined SGA + PGA memory cannot grow beyond 1 GB. That means you are running into one of these limits. If you are not able to expand the PGA, you need to decrease the SGA size first and then try again; that should succeed.
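A minimal sketch of that approach (the 600M/300M values are only illustrative; the point is to keep SGA + PGA under XE's 1 GB cap, and if the instance uses automatic memory management you may need to set MEMORY_TARGET to 0 first):

-- Shrink the SGA so the XE memory cap leaves room for a larger PGA,
-- then raise the PGA; both changes go to the SPFILE and need a restart.
ALTER SYSTEM SET SGA_TARGET = 600M SCOPE = SPFILE;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 300M SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP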
I am trying to retrieve around 200 billion rows from a remote SQL Server. To optimize this, I have limited my query to use only an indexed column as a filter and am selecting only a subset of columns to make the query look like this:
SELECT ColA, ColB, ColC FROM <Database> WHERE RecordDate BETWEEN '' AND ''
But it looks like unless I limit my query to a time window of a few hours, the query fails in all cases with the following error:
OLE DB provider "SQLNCLI10" for linked server "<>" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server <>, Line 1
The OLE DB provider "SQLNCLI10" for linked server "<>" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7421, Level 16, State 2, Server <>, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI10" for linked server "<>".
The timeout is probably an issue because of the time it takes to execute the query plan. As I do not have control over the server, I was wondering if there is a good way of retrieving this data beyond the simple SELECT I am using. Are there any SQL Server specific tricks that I can use? Perhaps tell the remote server to paginate the data instead of issuing multiple queries or something else? Any suggestions on how I could improve this?
This is more of the kind of job SSIS is suited for. Even a simple flow like ReadFromOleDbSource->WriteToOleDbSource would handle this, creating the necessary batching for you.
Why read 200 Billion rows all at once?
You should page them, reading say a few thousand rows at a time.
Even if you do genuinely need to read all 200 Billion rows you should still consider using paging to break up the read into shorter queries - that way if a failure happens you just continue reading where you left off.
See efficient way to implement paging for at least one method of implementing paging using ROW_NUMBER
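A minimal sketch of ROW_NUMBER paging applied to the query from the question (the linked-server name, batch size, and ordering key are assumptions; keep your indexed RecordDate filter so each page stays cheap):

DECLARE @PageSize bigint, @PageNum bigint;
DECLARE @From datetime, @To datetime;   -- set these to your time window
SET @PageSize = 100000;
SET @PageNum = 1;                       -- increment per batch

SELECT ColA, ColB, ColC
FROM (
    SELECT ColA, ColB, ColC,
           ROW_NUMBER() OVER (ORDER BY RecordDate) AS rn
    FROM LinkedServerName.SomeDb.dbo.SomeTable
    WHERE RecordDate BETWEEN @From AND @To
) AS t
WHERE rn BETWEEN (@PageNum - 1) * @PageSize + 1
             AND @PageNum * @PageSize;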
If you are doing data analysis then I suspect you are either using the wrong storage (SQL Server isn't really designed for processing of large data sets), or you need to alter your queries so that the analysis is done on the Server using SQL.
Update: I think the last paragraph was somewhat misinterpreted.
Storage in SQL Server is primarily designed for online transaction processing (OLTP) - efficient querying of massive datasets in massively concurrent environments (for example reading / updating a single customer record in a database of billions, at the same time that thousands of other users are doing the same for other records). Typically the goal is to minimise the amount of data read, reducing the amount of IO needed and also reducing contention.
The analysis you are talking about is almost the exact opposite of this - a single client actively trying to read pretty much all records in order to perform some statistical analysis.
Yes SQL Server will manage this, but you have to bear in mind that it is optimised for a completely different scenario. For example data is read from disk a page (8 KB) at a time, despite the fact that your statistical processing is probably only based on 2 or 3 columns. Depending on row density and column width you may only be using a tiny fraction of the data stored on an 8 KB page - most of the data that SQL Server had to read and allocate memory for wasn't even used. (Remember that SQL Server also had to lock that page to prevent other users from messing with the data while it was being read).
If you are serious about processing / analysis of massive datasets then there are storage formats that are optimised for exactly this sort of thing - SQL Server also has an add-on service called Microsoft Analysis Services that adds additional online analytical processing (OLAP) and data mining capabilities, using storage modes more suited to this sort of processing.
Personally I would use a data extraction tool such as BCP to get the data to a local file before trying to manipulate it if I was trying to pull that much data at once.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
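A minimal sketch of a bcp extract (server, database, table, and output path are placeholders; add your RecordDate filter to the query text; -T uses Windows authentication and -c writes character format):

bcp "SELECT ColA, ColB, ColC FROM SomeDb.dbo.SomeTable" queryout C:\extract\rows.dat -S MyServer -T -c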
This isn't a SQL Server specific answer, but even when the RDBMS supports server-side cursors, it's considered poor form to use them. Doing so means that you are consuming resources on the server even though the server is still waiting for you to request more data.
Instead you should reformulate your query usage so that the server can transmit the entire result set as soon as it can, and then completely forget about you and your query to make way for the next one. When the result set is too large for you to process all in one go, keep track of the last row returned by the current batch so that you can fetch another batch starting at that position.
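A minimal sketch of that pattern, assuming a hypothetical table with a unique, indexed ID column and an arbitrary batch size of 10,000:

-- The client remembers the largest ID it has seen and asks for the next batch.
DECLARE @LastId bigint;
SET @LastId = 0;   -- replace with the last ID returned by the previous batch

SELECT TOP (10000) ID, ColA, ColB
FROM dbo.SomeTable
WHERE ID > @LastId
ORDER BY ID;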
Odds are the remote server has the "Remote Query Timeout" set. How long does it take for the query to fail?
I just ran into the same problem; I also got the message at 10:01 after running the query.
Check this link. There's a remote query timeout setting under Connections that is set to 600 seconds by default, and you need to change it to zero (unlimited) or another value you think is right.
Try changing the remote query timeout property on the server.
To do that, open SSMS, connect to the server, right-click the server's name in Object Explorer, select Properties -> Connections, and change the value in the 'Remote query timeout (in seconds, 0 = no timeout)' text box.
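The same setting can also be changed with a short T-SQL sketch (this assumes you have permission to change server-wide options):

-- 0 disables the remote query timeout; any other value is in seconds.
EXEC sp_configure 'remote query timeout', 0;
RECONFIGURE;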