JPA with Hibernate insert very slow - SQL Server

I am trying to insert some data into SQL Server 2008 R2 using JPA and Hibernate. Everything "works", except that it is very slow: inserting 20000 rows takes about 45 seconds, while a C# script takes less than 1 second.
Can any veteran in this domain offer some help? I would appreciate it a lot.
Update: I got some great advice from the answers below, but it still doesn't work as expected; the speed is the same.
Here is the updated persistence.xml:
<persistence version="2.0"
    xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
    <persistence-unit name="ClusterPersist"
        transaction-type="RESOURCE_LOCAL">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <class>cluster.data.persist.sqlserver.EventResult</class>
        <exclude-unlisted-classes>true</exclude-unlisted-classes>
        <properties>
            <property name="javax.persistence.jdbc.url"
                value="jdbc:sqlserver://MYSERVER:1433;databaseName=MYTABLE" />
            <property name="javax.persistence.jdbc.user" value="USER" />
            <property name="javax.persistence.jdbc.password" value="PASSWORD" />
            <property name="javax.persistence.jdbc.driver"
                value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
            <property name="hibernate.show_sql" value="false" />
            <property name="hibernate.hbm2ddl.auto" value="update" />
            <property name="hibernate.connection.provider_class"
                value="org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider" />
            <property name="hibernate.c3p0.max_size" value="100" />
            <property name="hibernate.c3p0.min_size" value="0" />
            <property name="hibernate.c3p0.acquire_increment" value="1" />
            <property name="hibernate.c3p0.idle_test_period" value="300" />
            <property name="hibernate.c3p0.max_statements" value="0" />
            <property name="hibernate.c3p0.timeout" value="100" />
            <property name="hibernate.jdbc.batch_size" value="50" />
            <property name="hibernate.cache.use_second_level_cache" value="false" />
        </properties>
    </persistence-unit>
</persistence>
And here is the updated code part:
public static void writeToDB(String filePath) throws IOException {
    EntityManager entityManager = entityManagerFactory.createEntityManager();
    Session session = (Session) entityManager.getDelegate();
    Transaction tx = session.beginTransaction();
    int i = 0;
    URL filePathUrl = null;
    try {
        filePathUrl = new URL(filePath);
    } catch (MalformedURLException e) {
        filePathUrl = (new File(filePath)).toURI().toURL();
    }
    String line = null;
    BufferedReader stream = null;
    try {
        InputStream in = filePathUrl.openStream();
        stream = new BufferedReader(new InputStreamReader(in));
        // Read each line in the file and persist one row per line
        while ((line = stream.readLine()) != null) {
            String[] splitted = line.split(",");
            int num1 = Integer.valueOf(splitted[1]);
            float num2 = Float.valueOf(splitted[6]);
            MyRow myRow = new MyRow();
            myRow.setNum1(num1);
            myRow.setNum2(num2);
            session.save(myRow);
            // Flush and clear periodically to keep the first-level cache small
            if (i % 50 == 0) {
                session.flush();
                session.clear();
            }
            i++;
        }
        tx.commit();
    } finally {
        if (stream != null)
            stream.close();
    }
    session.close();
}
Updated, here is the source for MyRow:
@Entity
@Table(name="MYTABLE")
public class MyRow {

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Long id;

    @Basic
    @Column(name = "Num1")
    private int Num1;

    @Basic
    @Column(name = "Num2")
    private float Num2;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public int getNum1() {
        return Num1;
    }

    public void setNum1(int num1) {
        Num1 = num1;
    }

    public float getNum2() {
        return Num2;
    }

    public void setNum2(float num2) {
        Num2 = num2;
    }
}

The problem
One of the major performance hits when you use Hibernate as your ORM is the way its "dirty checking" is implemented (because without byte code enhancement, which is standard in all JDO-based ORMs and some others, dirty checking will always be an inefficient hack).
When flushing, a dirty check needs to be carried out on every object in the session to see if it is "dirty", i.e. whether one of its attributes has changed since it was loaded from the database. For every "dirty" (changed) object, Hibernate has to generate a SQL UPDATE for the record that represents it.
Hibernate's dirty check is notoriously slow on anything but a small number of objects because it performs a field-by-field comparison between each object in memory and a snapshot taken when the object was first loaded from the database. The more objects, say, an HTTP request loads to display a page, the more dirty checks will be required when commit is called.
Technical details of Hibernate's dirty checking mechanism
You can read more about Hibernate's dirty check mechanism implemented as a "field by field" comparison here:
How does Hibernate detect dirty state of an entity object?
How the problem is solved in other ORMs
A much more efficient mechanism used by some other ORMs is an automatically generated "dirty flag" attribute instead of the field-by-field comparison, but this has traditionally only been available in ORMs (typically JDO-based ones) that use and promote byte code enhancement, or byte code "weaving" as it is sometimes called, e.g. http://datanucleus.org and others.
During byte code enhancement, by DataNucleus or any of the other ORMs supporting this feature, each entity class is enhanced to:
add an implicit dirty flag attribute
add the code to each of the setter methods in the class to automatically set the dirty flag when called
Then during a flush, only the dirty flag needs to be checked instead of performing a field by field comparison - which, as you can imagine, is orders of magnitude faster.
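To make this concrete, here is a minimal, purely illustrative sketch of what an enhanced entity effectively ends up doing; the class and field names are hypothetical, and real enhancers inject this bookkeeping into byte code rather than source:
// Illustration only: what byte code enhancement conceptually adds to an entity
public class Customer {

    private String name;

    // added by the (hypothetical) enhancer
    private transient boolean dirty = false;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
        this.dirty = true; // enhancer-injected: record the change as it happens
    }

    // at flush time the ORM only checks this flag,
    // instead of comparing every field against a snapshot
    public boolean isDirty() {
        return dirty;
    }
}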
Other negative consequences of "field by field" dirty checking
The other inefficiency of Hibernate's dirty checking is the need to keep a snapshot of every loaded object in memory, to avoid having to reload it from the database and compare against that during dirty checking.
Each object snapshot is a copy of all of its fields.
In addition to the performance hit of the dirty checking mechanism at flush time, this mechanism also burdens your app with the extra memory and CPU usage required to instantiate and initialize these snapshots for every single object loaded from the database - which can run into the thousands or millions, depending on your application.
Hibernate has since introduced byte code enhancement to address this, but I have worked on many ORM-persisted projects (both Hibernate and non-Hibernate) and I have yet to see a Hibernate-persisted project that uses that feature, possibly for a number of reasons:
Hibernate has traditionally promoted its "no requirement for byte code enhancement" as a feature when people evaluate ORM technologies
Historical reliability issues with Hibernate's byte code enhancement implementation which is possibly not as mature as ORMs that have used and promoted byte code enhancement from the start
Some people are still wary of using byte code enhancement, due to the anti "byte code enhancement" stance promoted, and the fear instilled, by certain groups in the early days of ORMs
These days byte code enhancement is used for many different things - not just persistence. It has almost become mainstream.

To enable JDBC batching, set the property hibernate.jdbc.batch_size to a value between 10 and 50 (an integer):
hibernate.jdbc.batch_size=50
If it's still not as fast as expected, then I'd review the Hibernate batch processing documentation, paying attention to the NOTEs and section 4.1. Especially the NOTE that says, "Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator."
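On top of the batch size, a few related settings are often worth experimenting with; the snippet below is only a sketch of candidate properties for the persistence.xml shown above, not a guaranteed fix:
<!-- group inserts/updates by entity so JDBC batching has contiguous statements to batch -->
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<!-- allow batching of versioned data, assuming the JDBC driver returns correct update counts -->
<property name="hibernate.jdbc.batch_versioned_data" value="true" />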

Old topic, but I came across this today while looking for something else, and I had to post on this common problem, which is unfortunately not very well understood or documented. For too long, Hibernate's documentation had only the brief note posted above.
Starting with version 5, there is a better but still thin explanation: https://docs.jboss.org/hibernate/orm/5.3/userguide/html_single/Hibernate_User_Guide.html#identifiers-generators-identity
The slow insertion of a very large collection is simply the result of a poor choice of Id generation strategy:
@Id
@GeneratedValue(strategy=GenerationType.IDENTITY)
When using the Identity strategy, what needs to be understood is that the database server creates the identity of the row on the physical insert. Hibernate needs to know the assigned Id to keep the object in the persistent state in the session, and the database-generated Id is only known from the insert's response. Hibernate therefore has NO choice but to perform 20000 individual inserts to be able to retrieve the generated Ids. It doesn't work with batching as far as I know, not with Sybase, not with MSSQL. That is why, regardless of how hard you try and with all the batching properties properly configured, Hibernate will do individual inserts.
The only solution that I know of, and have applied many times, is to choose a client-side Id generation strategy instead of the popular database-side Identity strategy.
I often used:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
@GenericGenerator(strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator")
There's a bit more configuration needed to get it to work, but that's the essence of it. When using client-side Id generation, Hibernate will set the Ids of all 20000 objects before hitting the database. And with the proper batching properties, as seen in the previous answers, Hibernate will do the inserts in batches, as expected.
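For reference, a fuller version of that mapping might look roughly like the sketch below. The generator name, sequence name, and sizes are illustrative (@GenericGenerator and @Parameter come from org.hibernate.annotations), and on databases without sequences, such as SQL Server 2008, SequenceStyleGenerator should fall back to a table-based structure:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "myrow_id_gen")
@GenericGenerator(
    name = "myrow_id_gen",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        // illustrative sequence name
        @Parameter(name = "sequence_name", value = "myrow_seq"),
        // hand out ids in blocks so 20000 inserts don't need 20000 round trips
        @Parameter(name = "increment_size", value = "50"),
        @Parameter(name = "optimizer", value = "pooled")
    })
private Long id;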
It is unfortunate that the Identity generator is so convenient and popular that it appears everywhere in examples, without a clear explanation of the consequences of using this strategy. I have read many so-called "advanced" Hibernate books and have yet to see one explain the consequence of Identity for the underlying insert performance on large data sets.

Hibernate "default mode" IS slow.
Its advantages are object-relational mapping and some caching (which is obviously not very useful for bulk insertion).
Use batch processing instead: http://docs.jboss.org/hibernate/core/4.0/devguide/en-US/html/ch04.html
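If batching through a regular Session still isn't fast enough, another option worth trying for pure bulk inserts is a StatelessSession, which bypasses the first-level cache and dirty checking entirely. A minimal sketch, assuming the MyRow entity from the question and that sessionFactory and rows are available in scope:
// Sketch: bulk insert via StatelessSession (no first-level cache, no dirty checking)
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try {
    for (MyRow row : rows) {   // 'rows' is assumed to be prepared elsewhere
        session.insert(row);   // issues the insert directly, nothing is held in the session
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}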

Related

Session Flush not Showing SQL when persisting unsaved entities

The scenario is a (more complex) version of the following:
IList<T> ts = Session.QueryOver<T>().List();
// modify data of multiple objects
ts[0].Foo = "foo0";
ts[1].Foo = "foo1";
using (ITransaction trx = Session.BeginTransaction())
{
    // save only one object
    Session.Save(ts[0]);
    trx.Commit();
}
As NH goes, this will also save ts[1] by default, to prevent stale state (side note: we love control over our SQL, so we turn that off by setting Session.FlushMode = FlushMode.Never).
What really vexes me is the fact that, even though Show_SQL is activated, no SQL is shown for the ts[1] updates that are definitely sent to the database by the flush.
Is there any way I can get those to show up?
As stated in https://stackoverflow.com/a/9403516/1236044, you just need to add the adonet.batch_size setting with a value of 0 to your config:
<property name="adonet.batch_size">0</property>

Why is NHibernate not removing entities from the 2nd level cache when they are updated?

I am using SysCache2 and NHibernate 2.1.2.4.
No matter how hard I try, NHibernate keeps loading previous instances of an entity.
My class is mapped as cacheable ReadWrite.
The cache region is the default, i.e. the full name of the class's types.
I am performing all actions within a transaction.
The database is definitely being updated, and when I manually clear ASP.NET's cache, the problem goes away.
I am doing a simple update, like this:
using (var transaction = NHSession.BeginTransaction())
{
    var foo = Session.Load<Foo>(_fooId);
    foo.Name = "A new name";
    transaction.Commit();
}
Then I reload the entity later on (in a different session within the application), like this:
using (var transaction = NHSession.BeginTransaction())
{
    var foo = Session.Load<Foo>(_fooId);
    Response.Write(foo.Name);
    transaction.Commit();
}
... but Foo's name is still the old name, not the new name I just updated it to!
I would log all the caching related messages and see if the cache is updated when the 2nd transaction is committed. Here is a sample log4net config to log caching messages...
<logger name="NHibernate.Cache.ReadWriteCache" additivity="false">
    <level value="ALL"/>
    <appender-ref ref="Console"/>
</logger>
I worked this one out - for certain reasons I had two Session Factories. I didn't realise that the Save operation was occurring within one factory, and the load was occurring in the other.
Why are you using Load instead of Get? I suspect this has something to do with the problem. This article explains the differences but doesn't relate anything specific to your problem. Still, I would try switching to Get.

NHibernate ISession.Save() - Why is this persisting my entities immediately?

I am creating a large number of entities with NHibernate, attaching them to my ISession, and then using a transaction to commit my changes to the database. Code sample is below:
ISession _context = SessionProvider.OpenSession();

//Create new entities
for (int i = 0; i < 100; i++)
{
    MyEntity entity = new MyEntity(i);
    //Attach new entity to the context
    _context.Save(entity);
}

//Persist all changes to the database
using (var tx = _context.BeginTransaction())
{
    //Flush the session
    tx.Commit();
}
I was under the impression that the line _context.Save() simply makes the ISession aware of the new entity, but that no changes are persisted to the database until I Flush the session via the line tx.Commit().
What I've observed though, is that the database gets a new entity every time I call _context.Save(). I end up with too many individual calls to the database as a result.
Does anyone know why ISession.Save() is automatically persisting changes? Have I misunderstood something about how NHibernate behaves? Thanks.
EDIT - Just to clarify (in light of the two suggested answers): my problem here is that the database IS getting updated as soon as I call _context.Save(). I don't expect this to happen; I expect nothing to be inserted into the database until I call tx.Commit(). Neither of the two suggested answers so far helps with this, unfortunately.
Some good information on identity generators can be found here
Try:
using (ISession _context = SessionProvider.OpenSession())
using (var tx = _context.BeginTransaction())
{
    //Create new entities
    for (int i = 0; i < 100; i++)
    {
        MyEntity entity = new MyEntity(i);
        //Attach new entity to the context
        _context.Save(entity);
    }
    //Flush the session
    tx.Commit();
}
Which identity generator are you using? If you are using post-insert generators like MSSQL/MySQL's Identity or Oracle's sequence to generate the value of your Id fields, that is your problem.
From NHibernate POID Generators Revealed:
Post insert generators, as the name suggest, assigns the id's after the entity is stored in the database. A select statement is executed against database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as NH Team.
Some of the drawbacks are the following:
Unit Of Work is broken with the use of those strategies. It doesn't matter if you're using FlushMode.Commit, each Save results in an insert statement against DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn't do).
Those strategies nullify batcher, you can't take the advantage of sending multiple queries at once (as it must go to database at the time of Save).
You can set your batch size in your configuration:
<add key="adonet.batch_size" value="10" />
Or you can set it in code. And make sure you do your saves within a transaction scope.
Try setting the FlushMode to Commit:
ISession _context = SessionProvider.OpenSession();
_context.FlushMode = FlushMode.Commit;
peer's suggestion to set the batch size is good also.
My understanding is that when using database identity columns, NHibernate will defer inserts until the session is flushed unless it needs to perform the insert in order to retrieve a foreign key or ensure that a query returns the expected results.
Well:
rebelliard's answer is a possibility, depending on your mapping
you are not using explicit transactions (StuffHappens' answer)
the default flush mode is auto, and that complicates things (Jamie Ide's answer)
if by any chance you make any queries using the NHibernate API, the default behaviour is to flush the cache to the database first so that the results of those queries will match the session's entity representation
What about:
ISession _context = SessionProvider.OpenSession();

//Persist all changes to the database
using (var tx = _context.BeginTransaction())
{
    //Create new entities
    for (int i = 0; i < 100; i++)
    {
        MyEntity entity = new MyEntity(i);
        //Attach new entity to the context
        _context.Save(entity);
    }
    //Flush the session
    tx.Commit();
}

Flushing in NHibernate

This question is a bit of a dupe, but I still don't understand the best way to handle flushing.
I am migrating an existing code base, which contains a lot of code like the following:
private void btnSave_Click()
{
    SaveForm();
    ReloadList();
}

private void SaveForm()
{
    var foo = FooRepository.Get(_editingFooId);
    foo.Name = txtName.Text;
    FooRepository.Save(foo);
}

private void ReloadList()
{
    fooRepeater.DataSource = FooRepository.LoadAll();
    fooRepeater.DataBind();
}
Now that I am changing the FooRepository to NHibernate, what should I use for the FooRepository.Save method? Should the FooRepository always flush the session when the entity is saved?
I'm not sure if I understand your question, but here is what I think:
Think in terms of "putting objects into the session" instead of "getting and storing data". NH will store all new and changed objects in the session without any special call to do so.
Consider this scenarios:
Data change:
Get data from the database with any query. The entities are now in the NH session
Change entities by just changing property values
Commit the transaction. Changes are flushed and stored to the database.
Create a new object:
Call a constructor to create a new object
Store it to the database by calling "Save". It is in the session now.
You still can change the object after Save
Commit the changes. The latest state will be stored to the database.
If you work with detached entities, you also need Update or SaveOrUpdate to put detached entities to the session.
Of course you can configure NH to behave differently. But it works best if you follow this default behaviour.
It doesn't matter whether or not you explicitly flush the session between modifying a Foo entity and loading all Foos from the repository. NHibernate is smart enough to auto-flush itself if you have made changes in the session that may affect the results of the query you are trying to run.
Ideally I try to use one session per "unit of work". This means one cohesive piece of work which may involve several smaller steps. If you feel that you do not have a seam in your architecture where you can achieve this, then managing the session inside the repository will also work. Just be aware that you are missing out on some of the power that NHibernate provides you.
I'd vote up Stefan Moser's answer if I could - I'm still getting to grips with NH myself, but I think it's nice to be able to write code like this:
private void SaveForm()
{
    using (var unitofwork = UnitOfWork.Start())
    {
        var foo = FooRepository.Get(_editingFooId);
        var bar = BarRepository.Get(_barId);

        foo.Name = txtName.Text;
        bar.SomeOtherProperty = txtBlah.Text;

        FooRepository.Save(foo);
        BarRepository.Save(bar);

        UnitOfWork.CommitChanges();
    }
}
This way either the whole action succeeds or it fails and rolls back, keeping flushing/transaction management outside of the repositories.

NHibernate update on single property updates all properties in sql

I am performing a standard update in NHibernate to a single property. However, on commit of the transaction the SQL update seems to set all the fields I have mapped on the table, even though they have not changed. Surely this can't be normal behaviour in NHibernate? Am I doing something wrong? Thanks.
using (var session = sessionFactory.OpenSession())
{
    using (var transaction = session.BeginTransaction())
    {
        var singleMeeting = session.Load<Meeting>(10193);
        singleMeeting.Subject = "This is a test 2";
        transaction.Commit();
    }
}
This is the normal behavior. You can try adding dynamic-update="true" to your class definition to override this behavior.
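For example, assuming an hbm.xml mapping for the Meeting class (the names below are illustrative), the attribute goes on the class element:
<class name="Meeting" table="Meeting" dynamic-update="true">
    <id name="Id">
        <generator class="native" />
    </id>
    <property name="Subject" />
</class>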
Well, yes, this is normal behaviour for NHibernate. You can use the generated attribute on your properties to change the behaviour. Details are on Ayende's blog.
The reason this is the default is that with dynamic updates you don't get your query plan cached, and usually you don't mind sending a few more bytes over the high-speed network connection between your application server and the database. Unless you are saving long strings, in which case this setting is perfectly appropriate.