Batch Update in NHibernate - nhibernate

Does a batch update command exist in NHibernate? As far as I am aware it doesn't. So what's the best way to handle this situation? I would like to do the following:
Fetch a list of objects (let's call them a list of users, List<User>) from the database
Change the properties of those objects (Users.ForEach(user => user.Country = "Antarctica"))
Update each item back individually (Users.ForEach(user => NHibernate.Session.Update(user))).
Call Session.Flush to update the database.
Is this a good approach? Will this result in a lot of round trips between my code and the database?
What do you think? Or is there a more elegant solution?

I know I'm late to the party on this, but thought you may like to know this is now possible using HQL in NHibernate 2.1+
session.CreateQuery("update User set Country = 'Antarctica'")
       .ExecuteUpdate();
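The same DML-style HQL also accepts named parameters; here is a minimal sketch, assuming the entity is mapped as User with a Country property:
session.CreateQuery("update User set Country = :country")
       .SetParameter("country", "Antarctica")
       .ExecuteUpdate();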

Starting with NHibernate 3.2, batching has improvements which minimize database roundtrips. More information can be found on the HunabKu blog.
Here is an example from it; these batched saves do only 6 roundtrips:
using (ISession s = OpenSession())
using (s.BeginTransaction())
{
    for (int i = 0; i < 12; i++)
    {
        var user = new User { UserName = "user-" + i };
        var group = new Group { Name = "group-" + i };
        s.Save(user);
        s.Save(group);
        user.AddMembership(group);
    }
    s.Transaction.Commit();
}

You can set the batch size for updates in the nhibernate config file.
<property name="adonet.batch_size">16</property>
And you don't need to call Session.Update(User) there - just flush or commit a transaction and NHibernate will handle things for you.
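If you configure NHibernate in code rather than XML, the same setting can (I believe) be applied on the Configuration object; a minimal sketch, assuming the usual NHibernate.Cfg bootstrap:
// sketch only: set the ADO.NET batch size programmatically
var cfg = new NHibernate.Cfg.Configuration().Configure();
cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "16"); // same key as adonet.batch_size
var sessionFactory = cfg.BuildSessionFactory();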
EDIT: I was going to post a link to the relevant section of the nhibernate docs but the site is down - here's an old post from Ayende on the subject:
As to whether the use of NHibernate (or any ORM) here is a good approach, it depends on the context. If you are doing a one-off update of every row in a large table with a single value, like setting all users to the country 'Antarctica' (which is a continent, not a country, by the way!), then you should probably use a plain SQL UPDATE statement. If updating several records at once with a country is part of your business logic in the general usage of your application, then using an ORM could be a more sensible method. It depends on the number of rows you are updating each time.
Perhaps the most sensible option, if you are not sure, is to tweak the batch_size option in NHibernate and see how that works out. If the performance of the system is not acceptable, then you might look at implementing a straight SQL UPDATE statement in your code.
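If you do go the plain SQL route, the statement can still be issued through the same ISession; a rough sketch, assuming a USERS table with a COUNTRY column (the table and column names are illustrative):
session.CreateSQLQuery("update USERS set COUNTRY = :country")
       .SetParameter("country", "Antarctica")
       .ExecuteUpdate();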

Starting with NHibernate 5.0 it is possible to make bulk operations using LINQ.
session.Query<Cat>()
    .Where(c => c.BodyWeight > 20)
    .Update(c => new { BodyWeight = c.BodyWeight / 2 });
NHibernate will generate a single "update" sql query.
See Updating entities

You don't need to call Update, nor Flush:
IList<User> users = session.CreateQuery(...).List<User>();
foreach (var user in users)
    user.Country = "Antarctica";
session.Transaction.Commit();
I think NHibernate writes a batch for all the changes.
The problem is that your users need to be loaded into memory. If that becomes a problem, you can still use native SQL through NHibernate. But until you have proven that it is a performance problem, stick with the nice solution.

No, it's not a good approach!
Native SQL is many times better for this sort of update.
UPDATE USERS SET COUNTRY = 'Antarctica';
It just could not be simpler, and the database engine will process it a hundred times more efficiently than row-at-a-time application code.

Related

Why is Orchard so slow when executing a content item query?

Let's say I want to query all Orchard user IDs, including users that have been removed (aka soft deleted). The DB contains around 1000 users.
Option A - takes around 2 minutes
Orchard.ContentManagement.IContentManager lContentManager = ...;
lContentManager
    .Query<Orchard.Users.Models.UserPart, Orchard.Users.Models.UserPartRecord>(Orchard.ContentManagement.VersionOptions.AllVersions)
    .List()
    .Select(u => u.Id)
    .ToList();
Option B - executes with almost unnoticeable delay
Orchard.Data.IRepository<Orchard.Users.Models.UserPartRecord> UserRepository = ...;
UserRepository.Fetch(u => true).Select(u => u.Id).ToList();
I don't see any SQL queries being executed in SQL Profiler when using Option A. I guess it has something to do with NHibernate or caching.
Is there any way to optimize Option A?
Could it be because the IContentManager version is accessing the data via the InfoSet (basically an XML representation of the data), whereas the IRepository version uses the actual DB table itself?
I seem to remember reading that though the InfoSet is great in many cases, when you're dealing with larger datasets with sorting/filtering it is more efficient to go directly to the table, as using the InfoSet requires each XML fragment to be parsed and elements extracted before you get to the data.
Since 'the shift', Orchard uses both, so you can use whichever method best suits your needs. I can't find the article that explained it now, but this explains the shift & infosets quite nicely:
http://weblogs.asp.net/bleroy/the-shift-how-orchard-painlessly-shifted-to-document-storage-and-how-it-ll-affect-you
Hope that helps you?

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on neo4j and PHP with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity, like users and companies, which holds the nodes of that type. So inside the users index reside 130K nodes, and the rest are in companies.
With Cypher we query like this:
START u=node:users('id:*')
RETURN count(u)
And the results are
Returned 1 row. Query took 4080 ms
The server is configured with the defaults plus a few tweaks, but 4 seconds is too long for our needs. Consider that the database will grow by about 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know whether it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh, this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and, for good measure, run a refresh with the query above every night?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or if you are using 2.0
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread, by the way; imo one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" in general takes less time than reading a property from a node. It does however require you to create a separate index for the entities.
Your queries would look like this:
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)

Memory leak in Rails 3.0.11 migration

A migration contains the following:
last_service_id = nil
Service.find_by_sql("select
    service_id,
    registrations.regulator_given_id,
    registrations.regulator_id
  from
    registrations
  order by
    service_id, updated_at desc").each do |s|
  this_service_id = s["service_id"]
  if this_service_id != last_service_id
    Service.find(this_service_id).update_attributes!(:regulator_id => s["regulator_id"],
                                                     :regulator_given_id => s["regulator_given_id"])
    last_service_id = this_service_id
  end
end
and it is eating up memory, to the point where it will not run in the 512MB allowed in Heroku (the registrations table has 60,000 items). Is there a known problem? Workaround? Fix in a later version of Rails?
Thanks in advance
Edit following request to clarify:
That is all the relevant source - the rest of the migration creates the two new columns that are being populated. The situation is that I have data about services from multiple sources (regulators of the services) in the registrations table. I have decided to 'promote' some of the data ([prime]regulator_id and [prime]regulator_given_key) into the services table for the prime regulators to speed up certain queries.
This will load all 60,000 items in one go and keep those 60,000 AR objects around, which will consume a fair amount of memory. Rails does provide a find_each method for breaking down a query like that into chunks of 1000 objects at a time, but it doesn't allow you to specify an ordering as you do.
You're probably best off implementing your own paging scheme. Using limit/offset is a possibility; however, large OFFSET values are usually inefficient because the database server has to generate a bunch of results that it then discards.
An alternative is to add conditions to your query that ensure you don't return already-processed items, for example specifying that service_id be less than the previously returned values. This is more complicated if some items compare as equal on that column. With both of these paging-type schemes you probably need to think about what happens if a row gets inserted into your registrations table while you are processing it (probably not a problem with migrations, assuming you run them with access to the site disabled).
(Note: OP reports this didn't work)
Try something like this:
previous = nil
Registration.select('service_id, regulator_id, regulator_given_id')
            .order('service_id, updated_at DESC')
            .each do |r|
  if previous != r.service_id
    service = Service.find r.service_id
    service.update_attributes(:regulator_id => r.regulator_id, :regulator_given_id => r.regulator_given_id)
    previous = r.service_id
  end
end
This is a kind of hacky way of getting the most recent record from regulators -- there's undoubtedly a better way to do it with DISTINCT or GROUP BY in SQL all in a single query, which would not only be a lot faster, but also more elegant. But this is just a migration, right? And I didn't promise elegant. I also am not sure it will work and resolve the problem, but I think so :-)
The key change is that instead of using SQL, this uses AREL, meaning (I think) the update operation is performed once on each associated record as AREL returns them. With SQL, you return them all and store in an array, then update them all. I also don't think it's necessary to use the .select(...) clause.
Very interested in the result, so let me know if it works!

linq to sql insert locks

here is some insert code
gkInfo.data.ToList()
    .ForEach(p => p.hour.ToList()
        .ForEach(r => r.block.ToList()
            .ForEach(q =>
            {
                var v = new VarValues();
                v.dt = DateTime.Parse(p.target_date + " " + (r.value - 1).ToString() + ":00:00");
                v.id_objecttype = config.stations.Where(i => i.text == q.station_name).Single().id_objecttype;
                v.id_object = q.bnum.ToString();
                v.id_param = config.stations.Where(i => i.text == q.station_name).Single().id_param;
                v.pl_lev = 3;
                v.source = 0;
                v.value = q.block_state;
                v.version = version;
                v.description = q.change_type;
                m53500context1.VarValues.InsertOnSubmit(v);
            })));
m53500context1.SubmitChanges();
And this code locks the table.
Can I avoid it, or is that impossible?
Although I don't know all the details of your issue, the pattern seems very familiar. Often you need to do a big update in the database while the database still needs to be available, so that, for example, a web site working off the dataset does not time out while the update operations are in progress.
Sometimes update operations can be regular exports from another database, sometimes, calculating some caches, not unlike the example you have provided.
If you need your update to be transactional (i.e. all or nothing), there is no real way around the lock: while the update is underway the table is locked. If you don't need a transaction, then you can try to break up your update into smaller batches. SubmitChanges wraps all the changes in a single transaction, so you will need several SubmitChanges calls, so that each individual transaction is fast and does not lock the table for long.
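For example, here is a rough sketch of batching the inserts from the question so that each SubmitChanges call (and therefore each transaction) stays short; the batch size of 500 and the allValues collection are assumptions for illustration:
int batchSize = 500;                       // arbitrary; tune for your workload
int pending = 0;
foreach (var v in allValues)               // allValues: the VarValues built in the loops above
{
    m53500context1.VarValues.InsertOnSubmit(v);
    if (++pending % batchSize == 0)
        m53500context1.SubmitChanges();    // each call commits its own short transaction
}
m53500context1.SubmitChanges();            // flush the remainder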
If being transactional is a requirement, you can do your inserts in a staging area, i.e. not in the same table that other processes read from. When the insert is finished, you figure out a way to swap the areas. This could be complicated by the fact that there may be updates to this table that you haven't accounted for, but I do not know if that is true in your case.
In the worst case you will need some application logic that knows an update is in progress and, while it's happening, reads the data from an alternate location. Of course you will have to provide this alternate location (a copy) to read from.
There is no hard and fast answer, but there are a few things (above) that you can try. Also feel free to tell us more about your specific task / requirements.
Please see: LINQ To SQL NO_LOCK. You need to set a different isolation level for your transaction.
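One common way to do that with LINQ to SQL is to wrap the work in a System.Transactions.TransactionScope with a lower isolation level; a minimal sketch, assuming your readers can tolerate dirty reads:
using System.Transactions;

var options = new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted };
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    // reads through the DataContext inside this scope behave roughly like NOLOCK
    var snapshot = m53500context1.VarValues.ToList();
    scope.Complete();
}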

NHibernate performance insert

I'm doing some tests with NHibernate and I'm modifying batch_size to get bulk inserts.
I'm using MSSQL 2005 and the Northwind DB.
I created 1000 objects and inserted them into the database. I've changed the value of batch_size from 5 to 100 but found no change in performance: I still get values of around 300 ms. Using SQL Profiler, I see 1000 SQL insert statements on the server side. Please help.
app.config
<property name="adonet.batch_size">10</property>
Code
public bool MyTestAddition(IList<Supplier> SupplierList)
{
    var SupplierList_ = SupplierList;
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    using (ISession session = dataManager.OpenSession())
    {
        int counter = 0;
        using (ITransaction transaction = session.BeginTransaction())
        {
            foreach (var supplier in SupplierList_)
            {
                session.Save(supplier);
            }
            transaction.Commit();
        }
    }
    stopwatch.Stop();
    Console.WriteLine(string.Format("{0} milliseconds. {1} items added",
        stopwatch.ElapsedMilliseconds,
        SupplierList_.Count));
    return true;
}
The following is a great post on batch processing in Hibernate, which is what NHibernate is based upon and closely follows:
http://relation.to/Bloggers/BatchProcessingInHibernate
As you can see, the suggested actions are to set a reasonable batch size in the config, which you have done, but also to call session.Flush() and session.Clear() every 20 or so records.
We have employed this method ourselves and can now create and save 1000+ objects in seconds.
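Translated to the code in the question, that approach looks roughly like this (reusing dataManager and SupplierList_ from above; the interval of 20 is the figure from the post, so treat it as a starting point):
using (ISession session = dataManager.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
    int counter = 0;
    foreach (var supplier in SupplierList_)
    {
        session.Save(supplier);
        if (++counter % 20 == 0)
        {
            session.Flush();   // push the batched INSERTs to the database
            session.Clear();   // detach the saved objects so the session stays small
        }
    }
    transaction.Commit();
}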
You could load the target type into a List and then use System.Data.SqlClient.SqlBulkCopy to bulk copy (bcp) the data into the target table.
This would allow processing of greater volumes.
According to this nhusers post, seeing 1000 inserts on the SQL Server side should not really matter, because the optimization is done at a different level. If you really see no gain in performance, trying the most recent version of NHibernate might help point toward a resolution.
I have tried similar stuff with NH and never really got great performance. I remember settling on a flush every 10 entries and a commit every 50 entries to get a performance boost, as with each insertion the process got steadily slower. It really depends on the size of the object, so you could play around with those numbers; maybe you can squeeze some performance out of it.
A call to ITransaction.Commit will Flush your Session, effectively writing your changes to the database. You are calling Commit after every Save, so there will be an INSERT for each Supplier.
I'd try to call Commit after every 10 Suppliers or so, or maybe even at the end of your 1000 Suppliers!
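A rough sketch of the per-10 variant, reusing the names from the question (the chunk size of 10 is just the suggestion above and should be tuned):
using (ISession session = dataManager.OpenSession())
{
    for (int i = 0; i < SupplierList_.Count; i += 10)
    {
        using (ITransaction transaction = session.BeginTransaction())
        {
            foreach (var supplier in SupplierList_.Skip(i).Take(10))
                session.Save(supplier);
            transaction.Commit();   // one short transaction per chunk of 10
        }
    }
}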