Sybase ASE 15.5: Slow insert with JDBC executeBatch()

I'm using Sybase ASE 15.5 with the jConnect 4 JDBC driver, and I'm experiencing slow inserts with executeBatch() using batches of roughly 40 rows into a large table of 400 million rows with columns (integer, varchar(128), varchar(255)), a primary key and clustered index on columns (1,2), and a nonclustered index on columns (2,1). Each batch of ~40 rows takes about 200 ms. Is the slowness related to the size of the table? I know that dropping the indexes could improve performance, but unfortunately that is not an option. How can I improve the speed of insertion?
NOTE: This is part of the application's live run, not a one-shot migration, so I won't be using the bcp tool.
EDIT: I have checked this answer for MySQL, but I'm not sure it applies to Sybase ASE: https://stackoverflow.com/a/13504946/8315843

There are many reasons why the inserts could be slow, e.g.:
each insert statement has to be parsed/compiled; the ASE 15.x optimizer attempts to do a lot more work than the older ASE 11/12 optimizer, with the net result that compiles (generally) take longer to perform
the batch is not wrapped in a single transaction, so each insert has to wait for a separate write to the log to complete
you've got a slow network connection between the client host and the dataserver host
there's some blocking going on
the table has FK constraints that need to be checked for each insert
there's an insert trigger on the table (with the obvious questions of what the trigger is doing and how long it takes to perform its operations)
Some ideas to consider re: speeding up the inserts:
use prepared statements; the first insert is compiled into a lightweight procedure (think 'temp procedure'); follow-on inserts (using the prepared statement) benefit from not having to be compiled
make sure each batch of inserts is wrapped in a begin/commit tran pair; this tends to defer the log write(s) until the commit tran is issued; fewer writes to the log means less time waiting for the log write to be acknowledged (these first two ideas are illustrated in the sketch just after this list)
if you have a (relatively) slow network connection between the application and dataserver hosts, look at using a larger packet size; fewer packets means less time waiting for round-trip packet processing/waiting
look into if/how JDBC supports the bulk-copy libraries (basically implementing bcp-like behavior via JDBC) [I don't work with JDBC, so I'm only guessing this might be available]
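To make the first two ideas concrete, here is a minimal JDBC sketch, assuming the usual jConnect jdbc:sybase:Tds:host:port/database URL form; the host, database, table and column names are placeholders, not taken from the question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

public class BatchInsertSketch {

    // Stand-in for the (integer, varchar(128), varchar(255)) row in the question.
    static final class Row {
        final int id;
        final String name;
        final String value;
        Row(int id, String name, String value) {
            this.id = id;
            this.name = name;
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder data; in the real application this would be the ~40-row batch.
        List<Row> batch = Arrays.asList(new Row(1, "a", "x"), new Row(2, "b", "y"));

        // Placeholder jConnect-style URL; older jConnect versions may also need an
        // explicit Class.forName(...) call to register the driver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:dbhost:5000/mydb", "user", "password")) {

            conn.setAutoCommit(false);   // wrap the whole batch in one transaction

            String sql = "INSERT INTO my_table (id, name, value) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Row r : batch) {
                    ps.setInt(1, r.id);
                    ps.setString(2, r.name);
                    ps.setString(3, r.value);
                    ps.addBatch();
                }
                ps.executeBatch();       // one round trip for the whole batch
                conn.commit();           // one log-write wait instead of one per insert
            } catch (Exception e) {
                conn.rollback();         // undo the partial batch on failure
                throw e;
            }
        }
    }
}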
Some of the above is covered in these SO threads:
Getting ExecuteBatch to execute faster
JDBC Delete & Insert using batch
Efficient way to do batch INSERTS with JDBC

Related

SQL Server row versioning overhead on INSERT/SELECT workflow

SQL Server 2016
The workflow consists of continuous inserts from one writer and an occasional select from a separate reader, which returns several rows (all from the same table). Insert latency is prioritized over select performance. There are no updates/deletes, and the selects will never need to return rows that have recently been inserted.
Both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set to ON.
The issue is that whenever a select query is sent via SqlCommand.ExecuteReader, there is a significant spike in insert latency until SqlCommand.ExecuteReader returns with a SqlDataReader. Since insert latency is important, this degradation needs to be minimized. The select runs under the read committed isolation level.
Using the NOLOCK table hint in the select query does not show the same spike in insert latency, and given the table's use case, dirty reads aren't a concern since they can't happen.
Using the READPAST table hint gives results similar to using no hint (read committed snapshot).
I haven't found anything online that explains this discrepancy. What overhead is there with read committed snapshot (current state) that impacts insert latency that is not seen when NOLOCK is used?

SQLite batch commit vs. commit in one shot for huge bulk inserts

I need to dump a huge (~10-40 million rows) data set into a SQLite database. Is there an advantage to doing a commit every n inserts (n could be 50,000, 100,000, etc.) vs. doing a single commit only after all 40 million rows are inserted?
Obviously, in theory a single commit will be the fastest way to do it. But is there an advantage to committing in batches? In my case it is all or nothing: either all the data gets inserted or none of it does. Is there any danger in doing an extremely large number of inserts in SQLite before committing (i.e., do I need more disk space for SQLite because it needs larger temp files)?
I'm using Perl DBI to insert the data.
I have gotten some performance improvements by doing the following:
set PRAGMA synchronous = OFF; this prevents the SQLite engine from waiting for OS-level writes to complete.
set PRAGMA journal_mode = MEMORY; this tells the SQLite engine to store the journal in RAM instead of on disk; the only drawback is that the database can't be recovered after an OS crash or power failure.
next, create the indexes after all the inserts are done. Also, you may issue a commit after every 100,000 records (a sketch of this follows below).
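As a rough illustration of the pragmas and periodic commits above: the question uses Perl DBI, but this sketch uses the sqlite-jdbc driver instead, simply because the rest of this page is JDBC-oriented; the file, table and column names are made up.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class SqliteBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:bulk.db")) {
            try (Statement setup = conn.createStatement()) {
                setup.execute("CREATE TABLE IF NOT EXISTS big_table (id INTEGER PRIMARY KEY, payload TEXT)");
                setup.execute("PRAGMA synchronous = OFF");     // don't wait for OS-level writes
                setup.execute("PRAGMA journal_mode = MEMORY"); // keep the journal in RAM
            }

            conn.setAutoCommit(false);
            long rows = 0;
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO big_table (id, payload) VALUES (?, ?)")) {
                for (long i = 0; i < 40_000_000L; i++) {       // stand-in for the real data source
                    ps.setLong(1, i);
                    ps.setString(2, "row " + i);
                    ps.addBatch();
                    if (++rows % 100_000 == 0) {               // commit every 100,000 rows
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();                             // flush the final partial batch
                conn.commit();
            }

            // Build indexes only after all the inserts are done.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE INDEX IF NOT EXISTS idx_big_table_payload ON big_table (payload)");
            }
            conn.commit();
        }
    }
}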

Is it ok to KILL this DELETE query?

I ran a query to delete around 4 million rows from my database. It ran for about 12 hours before my laptop lost the network connection. At that point, I decided to take a look at the status of the query in the database. I found that it was in the suspended state. Specifically:
Start Time SPID Database Executing SQL Status command wait_type wait_time wait_resource last_wait_type
---------------------------------------------------------------------------------------------------------------------------------------------------
2018/08/15 11:28:39.490 115 RingClone *see below suspended DELETE PAGEIOLATCH_EX 41 5:1:1116111 PAGEIOLATCH_EX
*Here is the sql query in question:
DELETE FROM T_INDEXRAWDATA WHERE INDEXRAWDATAID IN (SELECT INDEXRAWDATAID FROM T_INDEX WHERE OWNERID='1486836020')
After reading this:
https://dba.stackexchange.com/questions/87066/sql-query-in-suspended-state-causing-high-cpu-usage
I realize I probably should have broken this up into smaller pieces to delete them (or even delete them one-by-one). But now I just want to know if it is "safe" for me to KILL this query, as the answer in that post suggests. One thing the selected answer states is that "you may run into data consistency problems" if you KILL a query while it's executing. If it causes some issues with the data I am trying to delete, I'm not that concerned. However, I'm more concerned about this causing some issues with other data, or with the table structure itself.
Is it safe to KILL this query?
If you ran the delete from your laptop over the network and it lost the connection with the server, you can either kill the SPID or wait for it to disappear by itself. Depending on the @@version of your SQL Server instance, in particular how well it's patched, the latter might require an instance restart.
Regarding the consistency issues, you seem to misunderstand them. They are possible only if you had multiple statements running in a single batch without being wrapped in a transaction. As I understand it, you ran a single statement; if that's the case, don't worry about consistency. SQL Server wouldn't have become what it is now if it were that easy to corrupt its data.
I would have rewritten the query, however; if the T_INDEX.INDEXRAWDATAID column has NULLs, you can run into issues. It's better to rewrite it via a join, also adding batch splitting:
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) t
    FROM T_INDEXRAWDATA t
    INNER JOIN T_INDEX i ON t.INDEXRAWDATAID = i.INDEXRAWDATAID
    WHERE i.OWNERID = '1486836020';

    IF @@ROWCOUNT = 0
        BREAK;

    CHECKPOINT;
END;
It definitely will not be any slower, but it can boost performance, depending on your schema, data and the state of any indices the tables have.

How to update lots of rows without locking the table in SQL Server 2008?

I have a job which updates a table for 20 minutes, and during that time I naturally can't update any of its rows.
Is there a way or method to do this?
The job may take even longer, but I have to be able to keep updating the table.
On the other hand, the job should roll back if it hits an error.
Thanks.
Split the job into separate transactions. The way locks work in a professional DBMS like SQL Server is that they escalate to higher levels as they are required. Once a query has hit a lot of pages, it's only natural that its locks get escalated to a table lock. The only way to circumvent this while keeping transactional integrity is to split the work into smaller jobs.
As Niels has pointed out, you should attempt to update the table in batches, explicitly committing each batch within its own transaction. If you are creating enough locks to warrant a table lock, chances are you have also ballooned your transaction log. It's probably worth checking its size and shrinking it down to something more reasonable if necessary. (A rough JDBC-side sketch of the batching pattern follows at the end of this thread.)
Alternatively you could try enabling Trace flags 1224 or 1211 which "Disables lock escalation based on the number of locks" - http://msdn.microsoft.com/en-us/library/ms188396.aspx
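The batching advice above can also be driven from the application side. Below is a rough, hypothetical JDBC sketch of a batched update against SQL Server; the connection URL, table name and the Processed flag column are invented for illustration, and note that with this pattern a failure only rolls back the current batch rather than the whole job.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BatchedUpdateSketch {
    public static void main(String[] args) throws Exception {
        // Microsoft JDBC driver URL; server, database, table and column names
        // are all made up for the example.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost:1433;databaseName=MyDb", "user", "password")) {

            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                while (true) {
                    // UPDATE TOP (n) limits each transaction to n rows, so locks are
                    // held briefly and escalation to a table lock is less likely.
                    int updated = stmt.executeUpdate(
                            "UPDATE TOP (1000) dbo.MyTable "
                          + "SET Processed = 1 "
                          + "WHERE Processed = 0");
                    conn.commit();              // release this batch's locks
                    if (updated == 0) {
                        break;                  // nothing left to update
                    }
                }
            } catch (Exception e) {
                conn.rollback();                // note: only the current batch rolls back
                throw e;
            }
        }
    }
}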

Mass Updates and Commit frequency in SQL Server

My database background is mainly Oracle, but I've recently been helping with some SQL Server work. My group has inherited some SQL server DTS packages that do daily loads and updates of large amounts of data. Currently it is running in SQL Server 2000, but will soon be upgraded to SQL Server 2005 or 2008. The mass updates are running too slowly.
One thing I noticed about the code is that some large updates are done in procedural code in loops, so that each statement only updates a small portion of the table in a single transaction. Is this a sound method for doing updates in SQL Server? Locking of concurrent sessions should not be an issue because user access to the tables is disabled while the bulk loading takes place. I've googled around some and found articles suggesting that doing it this way conserves resources, and that resources are released each time an update commits, leading to greater efficiency. In Oracle this is generally a bad approach, and I've used single transactions for very large updates with success in Oracle. Frequent commits slow the process down and use more resources in Oracle.
My question is: for mass updates in SQL Server, is it generally good practice to use procedural code and commit many SQL statements, or to use one big statement to do the whole update?
Sorry Guys,
None of the above answers the question; they are just examples of how you can do things. The answer is: more resources get used with frequent commits; however, the transaction log cannot be truncated until a commit point. Thus, if your single spanning transaction is very big, it will cause the transaction log to grow and possibly fragment, which, if undetected, will cause problems later. Also, in a rollback situation, the duration is generally twice as long as the original transaction. So if your transaction fails after half an hour, it will take an hour to roll back and you can't stop it :-)
I have worked with SQL Server 2000/2005, DB2 and ADABAS, and the above is true for all of them. I don't really see how Oracle could work differently.
You could possibly replace the T-SQL with a bcp command, and there you can set the batch size without having to code it.
Issuing frequent commits in a single table scan is preferable to running multiple scans with small processing numbers, because generally, if a table scan is required, the whole table will be scanned even if you are only returning a small subset.
Stay away from snapshots. A snapshot only increases the number of IOs and competes for IO and CPU.
In general, I find it better to update in batches - typically somewhere between 100 and 1,000 rows. It all depends on how your tables are structured: foreign keys? Triggers? Or just updating raw data? You need to experiment to see which scenario works best for you.
If I am in pure SQL, I will do something like this to help manage server resources:
SET ROWCOUNT 1000

WHILE 1 = 1
BEGIN
    DELETE FROM MyTable WHERE ...
    IF @@ROWCOUNT = 0
        BREAK
END

SET ROWCOUNT 0
In this example, I am purging data. This would only work for an UPDATE if you could restrict or otherwise selectively update rows. (Or only insert xxxx number of rows into an auxiliary table that you can JOIN against.)
But yes, try not to update xx million rows at one time. It takes forever and if an error occurs, all those rows will be rolled back (which takes an additional forever.)
Well, everything depends.
But ... assuming your db is in single-user mode or you have table locks (TABLOCKX) against all the tables involved, batches will probably perform worse, especially if the batches force table scans.
The one caveat is that very complex queries will quite often consume resources in tempdb; if tempdb runs out of space (because the execution plan required a nasty, complicated hash join), you are in deep trouble.
Working in batches is a general practice that is quite often used in SQL Server (when it's not in snapshot isolation mode) to increase concurrency and avoid huge transaction rollbacks caused by deadlocks (you tend to get deadlocks galore when updating a 10-million-row table that is active).
When you move to SQL Server 2005 or 2008, you will need to redo all those DTS packages in SSIS. I think you will be pleasantly surprised to see how much faster SSIS can be.
In general, in SQL Server 2000, you want to run things in batches of records if the whole set ties up the table for too long. If you are running the packages at night when there is no use of the system, you may be able to get away with a set-based insert of the entire dataset. Row-by-row is always the slowest method, so avoid that if possible as well (especially if all the row-by-row inserts are in one giant transaction!). If you have 24-hour access with no downtime, you will almost certainly need to run in batches.