NHibernate: Batching and StatelessSession

I was experimenting with simply setting the batch size in the config file, and I can see a visible benefit in using it: inserting 25,000 entries takes less time than without batching. My question is, what are the contraindications or dangers of using batching? As far as I can tell, there are only benefits to setting a batch size and activating it.
Another question is regarding StatelessSession. I was also testing this, and I've noticed that a scope.Insert takes more time than a scope.Save on a regular Session, but the commit is lightning fast. Is there any reason for an Insert from a StatelessSession to take more time than a Save from a regular Session?
Thanks in advance

I can only speak to the first issue. A possible downside of a large batch size is the amount of SQL being sent across the wire in one go.
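For reference, a hedged sketch of how that batch setting is typically enabled. The value 20 is only an example, and the same adonet.batch_size property can be set in hibernate.cfg.xml or web.config instead of in code:

```csharp
using NHibernate;
using NHibernate.Cfg;

// Sketch only: turn on ADO.NET batching when building the session factory.
// 20 is an arbitrary example value; not every ADO.NET driver supports
// batching, in which case the setting simply has no effect.
Configuration cfg = new Configuration().Configure();   // reads hibernate.cfg.xml
cfg.SetProperty("adonet.batch_size", "20");
ISessionFactory sessionFactory = cfg.BuildSessionFactory();
```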

Related

Aerospike how to write records in batches to avoid inconsistency

How can we write a batch function for Aerospike that will help us store 10 million records without any data loss? Moreover, if the function is called concurrently, the data should be stored correctly: no record should be overwritten and no data should be lost.
Currently there is no batch write API in Aerospike. You have to write each record individually. The only way to never have any data loss is to use Strong Consistency Mode. It covers all sorts of corner cases and ensures committed writes are never lost.
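To make the "one record at a time" point concrete, here is a rough sketch using the Aerospike C# client. The namespace, set, and bin names are invented for the example, and Strong Consistency itself is a server-side namespace setting rather than anything in client code:

```csharp
using Aerospike.Client;

// Sketch only: without a batch-write API, each record is its own Put call.
// "test"/"users" and the bin layout are placeholder values.
AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
try
{
    WritePolicy policy = new WritePolicy();
    for (int i = 0; i < 1000; i++)
    {
        Key key = new Key("test", "users", "user-" + i);
        client.Put(policy, key, new Bin("name", "user " + i), new Bin("seq", i));
    }
}
finally
{
    client.Close();
}
```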

Best practices for batch inserts in Hibernate (large insertions)

I have a job that parses JSON and inserts over 20,000 records; the whole application connects to an Oracle DB using Hibernate. It takes around an hour, which also includes the JSON calls and parsing, whereas just printing the parsed fields to the logs takes a minute or two. My question is: is there a way to optimize the insertion process using Hibernate?
I tried the suggestions from "Hibernate batch size confusion", but it still feels very slow.
I tried increasing batch size.
I tried disabling second level cache.
I also flushed and cleared my session at each batch-size boundary (roughly as in the sketch below).
I am planning to move to JDBC batch insertions, but I want to give Hibernate a proper try first.
I hope this gives a general overview that helps other programmers with best practices.
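For what it's worth, the flush-and-clear pattern described above looks roughly like the sketch below. It is written against NHibernate's C# API (the Hibernate equivalent is session.flush()/session.clear()), and the entity name, parsing step, and batch size are placeholders rather than anything from the original job:

```csharp
// Batched inserts with a regular session: flush and clear every batchSize
// entities so the first-level cache does not grow without bound.
// sessionFactory, ParsedRecord and ParseJson() are assumed placeholders.
const int batchSize = 50;

using (ISession session = sessionFactory.OpenSession())
using (ITransaction tx = session.BeginTransaction())
{
    int i = 0;
    foreach (ParsedRecord record in ParseJson())
    {
        session.Save(record);
        if (++i % batchSize == 0)
        {
            session.Flush();   // push the pending (batched) INSERTs to the database
            session.Clear();   // evict them from the first-level cache
        }
    }
    tx.Commit();
}
```

Wrapping the whole run in a single transaction and letting the driver-level batch size (hibernate.jdbc.batch_size / adonet.batch_size) do its work is usually where most of the gain comes from.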

High disk IO rate

My Rails application keeps hitting the disk I/O rate threshold set by my VPS at Linode. It's set at 3000 (I upped it from 2000), and every hour or so I get a notification that it has reached 4000-5000+.
What methods can I use to minimize the disk I/O rate? I mostly use Sphinx (the Thinking Sphinx plugin) and latitude/longitude distance searches.
What are the methods to avoid?
I'm using Rails 2.3.11 and MySQL.
Thanks.
Did you check whether your server is swapping itself to death? What does "top" say?
Your Linode may have limited RAM, and it could very well be swapping like crazy to keep things running.
If you see red in the I/O graph, that is swapping activity! You either need to upgrade your Linode to more RAM or limit the number and size of the processes that are running. You should also add approximately 2x the RAM size as swap space (a swap partition).
http://tinypic.com/view.php?pic=2s0b8t2&s=7
Your question is too vague to answer concisely, but this is generally a sign of one of a few things:
Your data set is too large because of historical data that you could prune. Delete what is no longer relevant.
Your tables are not indexed properly and you are hitting a lot of table scans. Check each of your slow queries with EXPLAIN.
Your data structure is not optimized for the way you are using it, and you are doing too many joins. Some tactical de-normalization would help here. Make sure all your JOIN queries are strictly necessary.
You are retrieving more data than is required to service the request. It is, sadly, all too common that people load enormous TEXT or BLOB columns from a user table when displaying only a list of user names. Load only what you need.
You're being hit by some kind of automated scraper or spider robot that's systematically downloading your entire site, page by page. You may want to alter your robots.txt if this is an issue, or start blocking troublesome IPs.
Is it going high and staying high for a long time, or is it just spiking temporarily?
There aren't going to be specific methods to avoid (other than not writing to disk).
You could try using a profiler in production, like New Relic, to get more insight into your performance. A profiler will highlight the actions that are taking a long time, and when you examine the specific algorithm used in one of those actions you might discover what's inefficient about it.

NHibernate Batch Size across entire app - any problems?

If I configure NHibernate with a batch size of say 20, am I likely to run into problems in regular, non-batch-update related scenarios?
The majority of updates/inserts performed by my application are one-offs, but in certain cases I do large-scale updates/inserts that would benefit from batching. Should I use a different session configuration for those, or can I safely leave the batch size higher for the entire app?
The reason I ask is that it is a hassle to set up a different session just for the batching scenarios (because this is a web app with per-request sessions).
In the NH documentation, the only negative to batch updates that is mentioned is this:
optimistic concurrency checking may be impaired since ADO.NET 2.0 does not return the number of rows affected by each statement in the batch, only the total number of rows affected by the batch.
And I would not think that would be a problem for small batches. So I doubt that a higher batch size will negatively affect your application's normal functioning.
However, you should consider creating a new session for your batch operations anyway. A normal NHibernate session is going to be inefficient for batch updates/inserts, because the first-level cache tracks every single object. You can control this manually by doing session.Flush(); session.Clear(); regularly, but it is probably easier to use a StatelessSession instead.
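For illustration, a minimal sketch of the stateless approach, assuming an existing ISessionFactory named sessionFactory and a mapped entity called LogEntry (both placeholder names):

```csharp
// Stateless bulk insert: no first-level cache and no dirty checking,
// so memory stays flat no matter how many rows are inserted.
// sessionFactory and LogEntry are assumed/placeholder names.
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction())
{
    for (int i = 0; i < 25000; i++)
    {
        session.Insert(new LogEntry { Message = "row " + i });
    }
    tx.Commit();   // with adonet.batch_size set, the INSERTs go out in batches
}
```

Keep in mind that a stateless session bypasses the caches and does not cascade operations to associations, so it suits bulk loads rather than general request handling.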

When to commit changes?

Using Oracle 10g, accessed via Perl DBI, I have a table with a few tens of millions of rows that is updated a few times per second while being read from much more frequently by another process.
Soon the update frequency will increase by an order of magnitude (maybe two).
Someone suggested that committing every N updates instead of after every update will help performance.
I have a few questions:
Will it be faster, slower, or does it depend? (I plan to benchmark both ways as soon as I can get a decent simulation of the new load.)
Why would it help or hinder performance?
If "it depends...", on what?
If it helps, what's the best value of N?
Why can't my local DBA give me a helpful, straight answer when I need one? (Actually, I know the answer to that one.) :-)
EDIT:
#codeslave: Thanks. By the way, losing uncommitted changes is not a problem; I don't delete the original data used for updating until I am sure everything is fine. (And yes, the cleaning lady did unplug the server, TWICE :-).) Some googling showed that batching commits might help because of issues related to rollback segments, but I still don't know a rule of thumb for N. Every few tens? Hundreds? Thousands?
#diciu: Great info, I'll definitely look into that.
A commit results in Oracle writing data to disk, i.e. to the redo log file, so that whatever the committed transaction has done can be recovered in the event of a power failure, etc.
Writing to a file is slower than writing to memory, so committing after every single operation will be slower than committing once for a set of coalesced updates.
In Oracle 10g there's an asynchronous commit that makes it much faster but less reliable: https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-6158695.html
PS: I know for sure that, in one application scenario I've seen, changing the number of coalesced updates per commit from 5K to 50K made it faster by an order of magnitude (10 times faster).
Reducing the frequency of commits will certainly speed things up; however, as you are reading from and writing to this table frequently, there is the potential for locks. Only you can determine the likelihood of the same data being updated at the same time. If the chance of this is low, commit every 50 rows and monitor the situation. Trial and error, I'm afraid :-)
As well as reducing the commit frequency, you should also consider performing bulk updates instead of individual ones.
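Since the original code is Perl DBI, the following is only a sketch of the commit-every-N idea, written with plain ADO.NET interfaces; the table, column, and type names are placeholders:

```csharp
using System.Collections.Generic;
using System.Data;

// Placeholder row type for the sketch.
sealed class RowUpdate
{
    public int Id;
    public string Val;
}

static class BatchedCommits
{
    // Commit once per n rows instead of once per row, so the database flushes
    // the redo log far less often. n = 1000 is an arbitrary starting point.
    public static void Apply(IDbConnection conn, IEnumerable<RowUpdate> updates, int n = 1000)
    {
        IDbTransaction tx = conn.BeginTransaction();
        int count = 0;
        foreach (RowUpdate u in updates)
        {
            using (IDbCommand cmd = conn.CreateCommand())
            {
                cmd.Transaction = tx;
                cmd.CommandText = "UPDATE big_table SET val = :val WHERE id = :id";

                IDbDataParameter pVal = cmd.CreateParameter();
                pVal.ParameterName = ":val";
                pVal.Value = u.Val;
                cmd.Parameters.Add(pVal);

                IDbDataParameter pId = cmd.CreateParameter();
                pId.ParameterName = ":id";
                pId.Value = u.Id;
                cmd.Parameters.Add(pId);

                cmd.ExecuteNonQuery();
            }

            if (++count % n == 0)
            {
                tx.Commit();                  // one redo write per n rows
                tx = conn.BeginTransaction(); // start the next chunk
            }
        }
        tx.Commit();                          // commit the remainder
    }
}
```

The locking trade-off mentioned above still applies: the longer a chunk runs, the longer its row locks are held, so N is something to tune against the read/update pattern rather than simply maximize.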
If you "don't delete the original data used for updating till [you are] sure everything is fine", then why don't you remove all those incremental commits in between, and rollback if there's a problem? It sounds like you effectively have built a transaction systems on top of transactions.
#CodeSlave: your question is answered by #stevechol. If I remove ALL the incremental commits there will be locks. I guess if nothing better comes along I'll follow his advice: pick a random number, monitor the load, and adjust accordingly, while applying #diciu's tweaks.
PS: The transactions-on-top-of-transactions setup is just accidental. I get the files used for updates by FTP, and instead of deleting them immediately I set a cron job to delete them a week later (if no one using the application has complained). That means that if something goes wrong I have a week to catch the errors.
Faster/Slower?
It will probably be a little faster. However, you run a greater risk of running into deadlocks, losing uncommitted changes should something catastrophic happen (cleaning lady unplugs the server), FUD, Fire, Brimstone, etc.
Why would it help?
Obviously fewer commit operations, which in turn means fewer disk writes, etc.
DBA's and straight answers?
If it were easy, you wouldn't need one.