NHibernate performance insert

NHibernate performance insert - nhibernate

I'm doing some tests with nhibernate and I'm modifying batch_size to get bulk inserts.
I'm using mssql2005 and using the northwind db.
I created 1000 object and insert them to the database. I've changed the values of batch_size from 5 to 100 but found no change in the performance. I'm getting value of around 300ms. Using the sql profiler, I see that 1000 sql insert statements at the sever side. Please help.
app.config
<property name="adonet.batch_size">10</property>
Code
public bool MyTestAddition(IList<Supplier> SupplierList)
{
var SupplierList_ = SupplierList;
var stopwatch = new Stopwatch();
stopwatch.Start();
using (ISession session = dataManager.OpenSession())
{
int counter = 0;
using (ITransaction transaction = session.BeginTransaction())
{
foreach (var supplier in SupplierList_)
{
session.Save(supplier);
}
transaction.Commit();
}
}
stopwatch.Stop();
Console.WriteLine(string.Format("{0} milliseconds. {1} items added",
stopwatch.ElapsedMilliseconds,
SupplierList_.Count));
return true;
}

The following is a great post on batch processing in Hibernate, which is what NHibernate is based upon and closely follows:
http://relation.to/Bloggers/BatchProcessingInHibernate
As you can see, the suggested actions are to set a reasonable batch size in the config, which you have done, but to also call session.flush()and session.clear() every 20 or so records.
We have employed this method ourselves and can now create and save 1000+ objects in seconds.

You could load the target type to a List and then call System.Data.SqlClient.BulkCopy to bcp the data into the target table.
This would allow processing of greater volumes.

According to this nhusers post, you seeing 1000 inserts in SQL server should not really matter, because the optimization is done on a different level. If you really have no gain in performance, trying the most recent version of NHibernate might help pointing to the resolution.

i have tried similar stuff with nh, never really got great performance. i remember settling with doing a flush every 10 entries and a commit every 50 entries to get a performance boost as with each insertion the process got steadily slower. It really depends on the size of the object so you could play arround with those numbers, maybe you can squeeze some performance out of it.

A call to ITransaction.Commit will Flush your Session, effectively writing your changes to the database. You are calling Commit after every Save, so there will be an INSERT for each Supplier.
I'd try to call Commit after every 10 Suppliers or so, or maybe even at the end of your 1000 Suppliers!

Related

Slow Datareader with ODP.NET

we are using (ODP.NET) Oracle.DataAccess version 1.102.3.0 with Oracle 11g client. I am having some problems with reading data using the datareader,
my procedure is returning a ref_cursor around 10000 records. However fetching the data takes around 30 to 40 sec.
Are there any possibilities to improve the performace ?

Try setting the FetchSize.
Since this is a Procedure and a RefCusor you can perform these operations.
ExecuteReader
Set the FetchSize -> do this before you start reading
Read the Results
If you follow the above sequence of actions you will be able to obtain the size of a row.
e.g.
int NumberOfRowsToFetchPerRoundTrip = 2000;
var reader =cmd.ExecuteReader();
reader.FetchSize = reader.RowSize * NumberOfRowsToFetchPerRoundTrip;
while(reader.Read())
{
//Do something
}
This will reduce the amount of round trips. The fetch size is bytes so you could also just use an arbitrary fetchSize eg 1024*1024 (1mb). However I would recommend basing you fetch on the size of a row * the number of rows you want to fetch per request.
In addition I would set the parameters on the connection string
Enlist=false;Self Tuning=False
You seem to get more consistent performance with these settings, although there may be some variation from one version of ODP/NET to the next.

RavenDB - Fastest Insert Performance - What is the benchmark?

I'm working on a prototype, using RavenDB, for my company to evaluate. We will have many threads inserting thousands of rows every few seconds, and many threads reading at the same time. I've done my first simple insert test, and before going much further, I want to make sure I'm using the recommended way of getting the best performance for RavenDB inserts.
I believe there is a bulk insert option. I haven't investigated that yet, as I'm not sure if that's necessary. I'm using the .NET API, and my code looks like this at the moment:
Debug.WriteLine("Number of Marker objects: {0}", markerList.Count);
StopwatchLogger.ExecuteAndLogPerformance(() =>
{
IDocumentSession ravenSession = GetRavenSession();
markerList.ForEach(marker => ravenSession.Store(marker));
ravenSession.SaveChanges();
}, "Save Marker data in RavenDB");
The StopwatchLogger simply invokes the action while putting a stopwatch around it:
internal static void ExecuteAndLogPerformance(Action action, string descriptionOfAction)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
action();
stopwatch.Stop();
Debug.WriteLine("{0} -- Processing time: {1} ms", descriptionOfAction, stopwatch.ElapsedMilliseconds);
}
Here is the output from a few runs. Note, I'm writing to a local instance of RavenDB (build 701). I know performance will be worse over the network, but I'm testing locally first.
One run:
Number of Marker objects: 671
Save Marker data in RavenDB -- Processing time: 1308 ms
Another run:
Number of Marker objects: 670
Save Marker data in RavenDB -- Processing time: 1266 ms
Another run:
Number of Marker objects: 667
Save Marker data in RavenDB -- Processing time: 625 ms
Another run:
Number of Marker objects: 639
Save Marker data in RavenDB -- Processing time: 639 ms
Ha. 639 objects in 639 ms. What are the odds of that? Anyway, that's one insert per millisecond, which would be 1000 every second.
The Marker object/document doesn't have much to it. Here is an example of one that has already been saved:
{
"ID": 14740009,
"SubID": "120403041588",
"ReadTime": "2012-04-03T13:51:45.0000000",
"CdsLotOpside": "163325",
"CdsLotBackside": "163325",
"CdteLotOpside": "167762",
"CdteLotBackside": "167762",
"EquipmentID": "VA_B"
}
Is this expected performance?
Is there a better way (best practice) to insert to gain speed?
Are there insert benchmarks available somewhere that I can target?

First, I would rather make sure that the number of items you save in a single batch doesn't get too big. There is no hard limit, however it hurts performance and eventually will crash if the transaction size gets too big. Using a value like 1024 items is safe, but it really depends on the size of your documents.
1000 documents per seconds is much lower than the number that you can actually reach with a single instance of RavenDB. You should do inserts in parallel and you can do some sort of tweaking with config option. For instance, you could increase the values defined by the settings beginning with Raven/Esent/. It is also a good idea (like in sql server) to put the logs and indexes to different hard drives. Depending on your concrete scenario you may also want to temporarily disable indexing while you're doing the inserts.
However, in most cases you don't want to care about that. If you need really high insert performance you can use multiple sharded instances and theoretically get an unlimited number of inserts/per second (just add more instances).

linq to sql insert locks

here is some insert code
gkInfo.data.ToList()
.ForEach(p => p.hour.ToList()
.ForEach(r => r.block.ToList()
.ForEach(q =>
{
var v = new VarValues();
v.dt = DateTime.Parse(p.target_date + " " + (r.value - 1).ToString() + ":00:00");
v.id_objecttype = config.stations.Where(i => i.text == q.station_name).Single().id_objecttype;
v.id_object = q.bnum.ToString();
v.id_param = config.stations.Where(i => i.text == q.station_name).Single().id_param;
v.pl_lev = 3;
v.source = 0;
v.value = q.block_state;
v.version = version;
v.description = q.change_type;
m53500context1.VarValues.InsertOnSubmit(v);
}
)));
m53500context1.SubmitChanges();
and this code, locks table.
can i avoid it? or its impossible?

Although I don't know all the details regarding your issue the patterns seems very familiar. Often times you need to do a big update in the database, but at the same time you still need the database be available, so for example if there is a web site that is working off the dataset, it does not time-out while update operations in progress.
Sometimes update operations can be regular exports from another database, sometimes, calculating some caches, not unlike the example you have provided.
If you need your update be transactional (i.e. all or nothing), there is no real way around the lock. While the update is underway the table is locked. If you don't need transaction, then you can try and break up you update to smaller batches. SubmitChanges, wrap all the changes in a single transaction, so you gonna need to use several SubmitChanges, so each individual transaction is fast and thus not locking the table for long.
If being transactional is a requirement, you can do you inserts in a staging area, i.e. not in the same table that other processes read from. When the insert is finished you figure out the way, to swap the areas. This could complicated by the fact that there maybe updates into this table that you haven't accounted for, but I do not know if this is true in your case.
In the worst case you will need to have some application logic, that knows that update is in progress and while it's happening it reads the data from an alternate location. Of course you will have to provide this alternate location (a copy) to read from.
There is no hard and fast answer, but there a few things (above) that you can try. Also feel free to tell more about your specific task / requirements.

Please see: LINQ To SQL NO_LOCK. You need to set a different isolation level for your transaction.

Batch Update in NHibernate

Does batch update command exist in NHibernate? As far as I am aware it doesn't. So what's the best way to handle this situation? I would like to do the following:
Fetch a list of objects ( let's call them a list of users, List<User> ) from the database
Change the properties of those objects, ( Users.Foreach(User=>User.Country="Antartica")
Update each item back individually ( Users.Foreach(User=>NHibernate.Session.Update(User)).
Call Session.Flush to update the database.
Is this a good approach? Will this resulted in a lot of round trip between my code and the database?
What do you think? Or is there a more elegant solution?

I know I'm late to the party on this, but thought you may like to know this is now possible using HQL in NHibernate 2.1+
session.CreateQuery(#"update Users set Country = 'Antarctica'")
.ExecuteUpdate();

Starting NHibernate 3.2 batch jobs have improvements which minimizes database roundtrips. More information can be found on HunabKu blog.
Here is example from it - these batch updates do only 6 roundtrips:
using (ISession s = OpenSession())
using (s.BeginTransaction())
{
for (int i = 0; i < 12; i++)
{
var user = new User {UserName = "user-" + i};
var group = new Group {Name = "group-" + i};
s.Save(user);
s.Save(group);
user.AddMembership(group);
}
s.Transaction.Commit();
}

You can set the batch size for updates in the nhibernate config file.
<property name="hibernate.adonet.batch_size">16</property>
And you don't need to call Session.Update(User) there - just flush or commit a transaction and NHibernate will handle things for you.
EDIT: I was going to post a link to the relevant section of the nhibernate docs but the site is down - here's an old post from Ayende on the subject:
As to whether the use of NHibernate (or any ORM) here is a good approach, it depends on the context. If you are doing a one-off update of every row in a large table with a single value (like setting all users to the country 'Antarctica' (which is a continent, not a country by the way!), then you should probably use a sql UPDATE statement. If you are going to be updating several records at once with a country as part of your business logic in the general usage of your application, then using an ORM could be a more sensible method. This depends on the number of rows you are updating each time.
Perhaps the most sensible option here if you are not sure is to tweak the batch_size option in NHibernate and see how that works out. If the performance of the system is not acceptable then you might look at implementing a straight sql UPDATE statement in your code.

Starting with NHibernate 5.0 it is possible to make bulk operations using LINQ.
session.Query<Cat>()
.Where(c => c.BodyWeight > 20)
.Update(c => new { BodyWeight = c.BodyWeight / 2 });
NHibernate will generate a single "update" sql query.
See Updating entities

You don't need to update, nor flush:
IList<User> users = session.CreateQuery (...).List<User>;
users.Foreach(u=>u.Country="Antartica")
session.Transaction.Commit();
I think NHibernate writes a batch for all the changes.
The problem is, that your users need to be loaded into memory. If it gets a problem, you can still use native SQL using NHibernate. But until you didn't prove that it is a performance problem, stick with the nice solution.

No it's not a good approach!
Native SQL is many times better for this sort of update.
UPDATE USERS SET COUNTRY = 'Antartica';
Just could not be simpler and the database engine will process this one hundred times more efficiently than row at a time Java code.

How to change slow parametrized inserts into fast bulk copy (even from memory)

I had someting like this in my code (.Net 2.0, MS SQL)
SqlConnection connection = new SqlConnection(#"Data Source=localhost;Initial
Catalog=DataBase;Integrated Security=True");
connection.Open();
SqlCommand cmdInsert = connection.CreateCommand();
SqlTransaction sqlTran = connection.BeginTransaction();
cmdInsert.Transaction = sqlTran;
cmdInsert.CommandText =
#"INSERT INTO MyDestinationTable" +
"(Year, Month, Day, Hour, ...) " +
"VALUES " +
"(#Year, #Month, #Day, #Hour, ...) ";
cmdInsert.Parameters.Add("#Year", SqlDbType.SmallInt);
cmdInsert.Parameters.Add("#Month", SqlDbType.TinyInt);
cmdInsert.Parameters.Add("#Day", SqlDbType.TinyInt);
// more fields here
cmdInsert.Prepare();
Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(stream);
char[] delimeter = new char[] {' '};
String[] records;
while (!reader.EndOfStream)
{
records = reader.ReadLine().Split(delimeter, StringSplitOptions.None);
cmdInsert.Parameters["#Year"].Value = Int32.Parse(records[0].Substring(0, 4));
cmdInsert.Parameters["#Month"].Value = Int32.Parse(records[0].Substring(5, 2));
cmdInsert.Parameters["#Day"].Value = Int32.Parse(records[0].Substring(8, 2));
// more here complicated stuff here
cmdInsert.ExecuteNonQuery()
}
sqlTran.Commit();
connection.Close();
With cmdInsert.ExecuteNonQuery() commented out this code executes in less than 2 sec. With SQL execution it takes 1m 20 sec. There are around 0.5 milion records. Table is emptied before. SSIS data flow task of similar functionality takes around 20 sec.
Bulk Insert was not an option (see below). I did some fancy stuff during this import.
My test machine is Core 2 Duo with 2 GB RAM.
When looking in Task Manager CPU was not fully untilized. IO seemed also not to be fully utilized.
Schema is simple like hell: one table with AutoInt as primary index and less than 10 ints, tiny ints and chars(10).
After some answers here I found that it is possible to execute bulk copy from memory! I was refusing to use bulk copy beacuse I thought it has to be done from file...
Now I use this and it takes aroud 20 sec (like SSIS task)
DataTable dataTable = new DataTable();
dataTable.Columns.Add(new DataColumn("ixMyIndex", System.Type.GetType("System.Int32")));
dataTable.Columns.Add(new DataColumn("Year", System.Type.GetType("System.Int32")));
dataTable.Columns.Add(new DataColumn("Month", System.Type.GetType("System.Int32")));
dataTable.Columns.Add(new DataColumn("Day", System.Type.GetType("System.Int32")));
// ... and more to go
DataRow dataRow;
object[] objectRow = new object[dataTable.Columns.Count];
Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(stream);
char[] delimeter = new char[] { ' ' };
String[] records;
int recordCount = 0;
while (!reader.EndOfStream)
{
records = reader.ReadLine().Split(delimeter, StringSplitOptions.None);
dataRow = dataTable.NewRow();
objectRow[0] = null;
objectRow[1] = Int32.Parse(records[0].Substring(0, 4));
objectRow[2] = Int32.Parse(records[0].Substring(5, 2));
objectRow[3] = Int32.Parse(records[0].Substring(8, 2));
// my fancy stuf goes here
dataRow.ItemArray = objectRow;
dataTable.Rows.Add(dataRow);
recordCount++;
}
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "MyDestinationTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();

Instead of inserting each record individually, Try using the SqlBulkCopy class to bulk insert all the records at once.
Create a DataTable and add all your records to the DataTable, and then use SqlBulkCopy.WriteToServer to bulk insert all the data at once.

Is required the transaction? Using transaction need much more resources than simple commands.
Also If you are sure, that inserted values are corect, you can use a BulkInsert.

1 minute sounds pretty reasonable for 0.5 million records. That's a record every 0.00012 seconds.
Does the table have any indexes? Removing these and reapplying them after the bulk insert would improve performance of the inserts, if that is an option.

It doesn't seem unreasonable to me to process 8,333 records per second...what kind of throughput are you expecting?

If you need better speed, you might consider implementing bulk insert:
http://msdn.microsoft.com/en-us/library/ms188365.aspx

If some form of bulk insert isn't an option, the other way would be multiple threads, each with their own connection to the database.
The issue with the current system is that you have 500,000 round trips to the database, and are waiting for the first round trip to complete before starting the next - any sort of latency (ie, a network between the machines) will mean that most of your time is spent waiting.
If you can split the job up, perhaps using some form of producer/consumer setup, you might find that you can get much more utilisation of all the resources.
However, to do this you will have to lose the one great transaction - otherwise the first writer thread will block all the others until its transaction is completed. You can still use transactions, but you'll have to use a lot of small ones rather than 1 large one.
The SSIS will be fast because it's using the bulk-insert method - do all the complicated processing first, generate the final list of data to insert and give it all at the same time to bulk-insert.

I assume that what is taking the approximately 58 seconds is the physical inserting of 500,000 records - so you are getting around 10,000 inserts a second. Without knowing the specs of your database server machine (I see you are using localhost, so network delays shouldn't be an issue), it is hard to say if this is good, bad, or abysmal.
I would look at your database schema - are there a bunch of indices on the table that have to be updated after each insert? This could be from other tables with foreign keys referencing the table you are working on. There are SQL profiling tools and performance monitoring facilities built into SQL Server, but I've never used them. But they may show up problems like locks, and things like that.

Do the fancy stuff on the data, on all records first. Then Bulk-Insert them.
(since you're not doing selects after an insert .. i don't see the problem of applying all operations on the data before the BulkInsert

If I had to guess, the first thing I would look for are too many or the wrong kind of indexes on the tbTrafficLogTTL table. Without looking at the schema definition for the table, I can't really say, but I have experienced similar performance problems when:
The primary key is a GUID and the primary index is CLUSTERED.
There's some sort of UNIQUE index on a set of fields.
There are too many indexes on the table.
When you start indexing half a million rows of data, the time spent to create and maintain indexes adds up.
I will also note that if you have any option to convert the Year, Month, Day, Hour, Minute, Second fields into a single datetime2 or timestamp field, you should. You're adding a lot of complexity to your data architecture, for no gain. The only reason I would even contemplate using a split-field structure like that is if you're dealing with a pre-existing database schema that cannot be changed for any reason. In which case, it sucks to be you.

I had a similar problem in my last contract. You're making 500,000 trips to SQL to insert your data. For a dramatic increase in performance, you want to investigate the BulkInsert method in the SQL namespace. I had "reload" processes that went from 2+ hours to restore a couple of dozen tables down to 31 seconds once I implemented Bulk Import.

This could best be accomplished using something like the bcp command. If that isn't available, the suggestions above about using BULK INSERT are your best bet. You're making 500,000 round trips to the database and writing 500,000 entries to the log files, not to mention any space that needs to be allocated to the log file, the table, and the indexes.
If you're inserting in an order that is different from your clustered index, you also have to deal with the time require to reorganize the physical data on disk. There are a lot of variables here that could possibly be making your query run slower than you would like it to.
~10,000 transactions per second isn't terrible for individual inserts coming roundtripping from code/

BULK INSERT = bcp from a permission
You could batch the INSERTs to reduce roundtrips
SQLDataAdaptor.UpdateBatchSize = 10000 gives 50 round trips
You still have 500k inserts though...
Article
MSDN

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas