Querying over large data under NHibernate transaction - nhibernate

I understand that explicit transactions should be used even for reading data, but I am unable to understand why the code below runs much slower inside an NHibernate transaction than without one:
session.BeginTransaction();
var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Take(100).ToList();
session.Transaction.Commit();
I can post more detailed unit-test code if needed, but when querying over 50,000 Order records it takes about 1 second for this query to run inside NHibernate's explicit transaction, and only about 15-20 ms without one.
Update 1/15/2019
Here is the detailed code
[Test]
public void TestQueryLargeDataUnderTransaction()
{
    int count = 50000;
    using (var session = _sessionFactory.OpenSession())
    {
        Order order;

        // write a large amount of data
        session.BeginTransaction();
        for (int i = 0; i < count; i++)
        {
            order = new Order {OrderNumber = i, OrderDate = DateTime.Today};
            OrderLine ol1 = new OrderLine {Amount = 1 + i, ProductName = $"sun screen {i}", Order = order};
            OrderLine ol2 = new OrderLine {Amount = 2 + i, ProductName = $"banjo {i}", Order = order};
            order.OrderLines = new List<OrderLine> {ol1, ol2};
            session.Save(order);
            session.Save(ol1);
            session.Save(ol2);
        }
        session.Transaction.Commit();

        Stopwatch s = Stopwatch.StartNew();

        // read the same data
        session.BeginTransaction();
        var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Skip(0).Take(100).ToList();
        session.Transaction.Commit();

        s.Stop();
        Console.WriteLine(s.ElapsedMilliseconds);
    }
}

Your for-loop iterates 50,000 times and each iteration creates 3 objects. So by the time you reach the first call to Commit(), the session knows about 150,000 objects that it will flush to the database at commit time, or earlier, depending on your id generator policy and flush mode.
So far, so good. NHibernate is not necessarily optimised to handle this many objects in the session, but it can be acceptable provided one is careful.
On to the problem...
It's important to realize that committing the transaction does not remove the 150000 objects from the session.
When you later perform the query, it will notice that it is inside a transaction, in which case, by default, "auto-flushing" will be performed. This means that before sending the SQL query to the database, NHibernate will check if any of the objects known to the session has changes that might affect the outcome of the query (this is somewhat simplified). If such changes are found, they will be transmitted to the database before performing the actual SQL query. This ensures that the executed query will be able to filter based on changes made in the same session.
The extra second you notice is the time it takes NHibernate to iterate over the 150,000 objects known to the session and check each of them for changes. The primary use cases for NHibernate rarely involve more than tens or a few hundred objects, in which case the time needed to check for changes is negligible.
You can use a new session for the query to avoid this effect, or you can call session.Clear() immediately after the first commit. (Note that for production code, session.Clear() can be dangerous.)
Additional: the auto-flushing happens when querying, but only inside a transaction. This behaviour can be controlled via session.FlushMode. During auto-flush NHibernate will aim to flush only objects that may affect the outcome of the query (i.e. objects mapped to the database tables the query touches). Both workarounds are sketched below.
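A minimal sketch of those two options, based on the test code above (not production guidance):
// Option 1: evict everything the session is tracking right after the bulk insert.
session.Transaction.Commit();
session.Clear();

// Option 2: keep the tracked objects but tell the session not to auto-flush before queries.
session.FlushMode = FlushMode.Commit;

session.BeginTransaction();
var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Take(100).ToList();
session.Transaction.Commit();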
There is an additional effect to be aware of with regards to keeping sessions around. Consider this code:
using (var session = _sessionFactory.OpenSession())
{
    Order order;
    session.BeginTransaction();
    for (int i = 0; i < count; i++)
    {
        // Your code from above.
    }
    session.Transaction.Commit();

    // The order variable references the last order created. Let's modify it.
    order.OrderDate = DateTime.Today.AddDays(4);

    session.BeginTransaction();
    var result = session.Query<Order>().Skip(0).Take(100).ToList();
    session.Transaction.Commit();
}
What happens to the change of the order date made after the first call to Commit()? That change will be persisted to the database when the query is executed in the second transaction, despite the fact that the modification itself happened before that transaction was started. Conversely, if you remove the second transaction, that modification will of course not be persisted.
There are multiple ways to manage sessions and transactions that suit different purposes. However, by far the easiest is to always follow this simple unit-of-work pattern (sketched in code after the list):
Open session.
Immediately open transaction.
Perform a reasonable amount of work.
Commit or rollback transaction.
Dispose transaction.
Dispose session.
Discard all objects loaded using the session. At this point they can still be used in memory, but any changes will not be persisted. It is safer to just get rid of them.
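A minimal sketch of that pattern, assuming the same Order entity and _sessionFactory as in the test above:
using (var session = _sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var orders = session.Query<Order>()
        .Where(o => o.OrderNumber > 0)
        .Take(100)
        .ToList();

    // ... do a reasonable amount of work with the orders ...

    tx.Commit();
}
// The session is disposed here; treat the loaded orders as detached snapshots
// and do not expect further changes to them to be persisted.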

Related

Using a SQL DB column as a lock for concurrent operations in Entity Framework

We have a long-running user operation that is handled by a pool of worker processes. Data input and output is from Azure SQL.
The master Azure SQL table structure columns are approximated to
[UserId, col1, col2, ... , col N, beingProcessed, lastTimeProcessed ]
beingProcessed is a boolean and lastTimeProcessed is a DateTime. The logic in every worker role is shown below, and with multiple workers processing (each with its own Entity Framework layer), beingProcessed is in essence being used as a lock for mutual-exclusion purposes.
Question: How can I deal with concurrency issues on the beingProcessed "lock" itself under the above load? I think the read-modify-write operation on beingProcessed needs to be atomic, but I'm open to other strategies. Open to other code refinements too.
[Update]: I wonder if TransactionScope is what's needed here: http://msdn.microsoft.com/en-US/library/system.transactions.transactionscope(v=vs.110).aspx
Code:
public void WorkerRoleMain()
{
    while (true)
    {
        try
        {
            dbContext db = new dbContext();
            // Read
            foreach (UserProfile user in db.UserProfile
                .Where(u => DateTime.UtcNow.Subtract(u.lastTimeProcessed)
                                > TimeSpan.FromHours(24)
                            && u.beingProcessed == false))
            {
                user.beingProcessed = true; // Modify
                db.SaveChanges();           // Write

                // Do some long drawn-out processing here
                ...
                ...
                ...

                user.lastTimeProcessed = DateTime.UtcNow;
                user.beingProcessed = false;
                db.SaveChanges();
            }
        }
        catch (Exception ex)
        {
            LogException(ex);
            Sleep(TimeSpan.FromMinutes(5));
        }
    } // while ()
}
What we usually do is this:
At the beginning of a long operation we start a transaction:
BEGIN TRANSACTION
Then we select a row from the table we would like to update/delete using these hints:
SELECT * FROM Table WITH (ROWLOCK, NOWAIT) WHERE ID = 123;
Then we check that we got the row. If the row is locked by another process, there will be a SQL error; in that case we roll back the transaction and advise the user.
If we acquired the lock, we process the record and perform the required updates, using the same transaction we used to lock the record:
UPDATE Table SET Col1='value' WHERE ID = 123;
Then we COMMIT the transaction.
COMMIT;
This is just pseudo-code for the process; you will have to implement it in your program.
One small note regarding the above process: when you lock the record in SQL Server (or Azure SQL), use the primary key in your WHERE clause, otherwise SQL Server may decide to use a page lock or a table lock.
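As a hedged illustration only (table, column and variable names are assumptions based on the question, and UPDLOCK is added so the row lock is actually held until the commit), the pattern could look like this in ADO.NET:
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var tx = connection.BeginTransaction())
    {
        try
        {
            // NOWAIT makes SQL Server raise an error immediately if another
            // worker already holds the lock; UPDLOCK keeps the lock until COMMIT.
            var select = new SqlCommand(
                "SELECT beingProcessed FROM UserProfile WITH (ROWLOCK, UPDLOCK, NOWAIT) WHERE UserId = @id",
                connection, tx);
            select.Parameters.AddWithValue("@id", userId);
            select.ExecuteScalar();

            // ... long-running processing ...

            var update = new SqlCommand(
                "UPDATE UserProfile SET beingProcessed = 0, lastTimeProcessed = SYSUTCDATETIME() WHERE UserId = @id",
                connection, tx);
            update.Parameters.AddWithValue("@id", userId);
            update.ExecuteNonQuery();

            tx.Commit();
        }
        catch (SqlException)
        {
            tx.Rollback(); // another worker holds the row; skip it or retry later
        }
    }
}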

Allocating integers from sequence with transaction in NHibernate

I have a sequence in the database, represented by a range. It has three fields: beginId, nextId and endId.
The task is to acquire the nextId from this range and to ensure that it is unique. The code may run in a highly parallel environment, with many threads, on many machines.
What I need to do:
lock (database)
{
    var seq = GetSequence();
    var acquiredId = seq.NextId;
    seq.NextId++;
    Save(seq);
}
So I use this code:
using (ISession session = GetSessionFactory().OpenSession())
using (ITransaction transaction = session.BeginTransaction(IsolationLevel.RepeatableRead))
{
    var sequence = session.CreateCriteria<Sequence>().Single(); // This line is simplified
    var allocatedId = sequence.NextId;
    sequence.NextId++;
    session.SaveOrUpdate(sequence);
    transaction.Commit();
    return allocatedId;
}
But for some reason, when I run this code in a multi-threaded test, I get the same id assigned several times. I'm using a transaction with the RepeatableRead isolation level, but that doesn't help.
P.S. Id doesn't mean the Id of the table; it's just the naming convention we use.
I've used .Lock().Upgrade when the data is read from the DB, and everything worked. Thanks to hazzik for the link.
Now my code has this:
session.QueryOver<Sequence>().Lock().Upgrade.DoSomeFiltering().Single()
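For completeness, a hedged sketch of the whole allocation under that upgrade lock (the actual filtering behind DoSomeFiltering() is omitted):
using (ISession session = GetSessionFactory().OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
    // Lock().Upgrade issues SELECT ... FOR UPDATE (or WITH (UPDLOCK) on SQL Server),
    // so concurrent allocators block here until the transaction commits.
    var sequence = session.QueryOver<Sequence>()
        .Lock().Upgrade
        .SingleOrDefault();

    var allocatedId = sequence.NextId;
    sequence.NextId++;
    session.SaveOrUpdate(sequence);
    transaction.Commit(); // the row lock is released here
    return allocatedId;
}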

Check for duplicates when saving

How do I check for duplicates when saving a new object?
Scenario:
check for duplicates with some query -> when there are no duplicates, perform the save
This is not good because between the check and the save there is plenty of time in which another user can insert a new object with the same data (user activity is high).
Should I just handle the exception when saving, or is there a better way?
using (var tx = session.BeginTransaction(IsolationLevel.Serializable))
{
    bool alreadyExists = session.Query<MyEntity>()
        .Any(x => x.UniqueProp == newEntity.UniqueProp);
    if (!alreadyExists)
        session.Save(newEntity);
    tx.Commit();
}
The Serializable isolation level guarantees nobody can insert a matching row between the query and the insert. Of course the downside is reduced concurrency because of the range locks.
Handling the exception is an alternative (sketched below).
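A hedged sketch of that alternative, assuming the database enforces a unique constraint on UniqueProp:
using (var tx = session.BeginTransaction())
{
    try
    {
        session.Save(newEntity);
        tx.Commit(); // the INSERT hits the unique constraint if a duplicate exists
    }
    catch (NHibernate.Exceptions.GenericADOException ex)
    {
        tx.Rollback();
        // Inspect ex.InnerException (e.g. SqlException numbers 2601/2627 on SQL Server)
        // to confirm it is a duplicate-key violation before reporting it as such.
    }
}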

NHibernate - pessimistic locking not working

Follow-up to this other question.
I'm trying to implement pessimistic locking for the concurrency issue I described in the question above (please feel free to add to that one), but it's not working for me.
I do a very simple test: I have two separate sites running that both increase a counter 500 times, and I run them simultaneously. In the end, I expect that a certain column in my table has, you guessed it, a value of 1000.
Here is the code. It's not production code of course, but test code or not, it should still work, right?
for (int i = 0; i < 500; i++)
{
    var tx = this.userRepo.Session.BeginTransaction();
    var user = this.userRepo.GetById(42);
    user.Counter++;
    userRepo.Save(user);
    tx.Commit();
}
The GetById method uses LockMode.Upgrade:
public T GetById(int id)
{
    T obj = Session.Get<T>(id, LockMode.Upgrade);
    return obj;
}
Now, using NHProfiler I see the following SQL statement:
SELECT Id FROM 'User' WHERE Id = 42 for update
but the result is a value of around 530, so about half of the updates are lost due to concurrency. What am I doing wrong? I disabled the second-level cache in this test. Am I using the wrong lock mode? Should I specify an isolation level? Anything else? Thanks in advance.
EDIT: FluentNhibernate config:
Fluently.Configure()
    .Database(MySQLConfiguration.Standard.ConnectionString(connectionstring))
    .Mappings(m => assemblyTypes.Select(t => t.Assembly).ToList().ForEach(a => m.FluentMappings.AddFromAssembly(a)))
    .ExposeConfiguration(c => c.Properties.Add("hbm2ddl.keywords", "none"));
For LockMode.Upgrade to work, all statements have to be enclosed in a transaction, because what LockMode.Upgrade does is lock the row for the duration of the current transaction.
Your problem is most likely that the statements are not all enclosed in a transaction.
Optimistic locking does not apply to a single statement, but to multiple transactions that are separated from each other. An example:
Begin a transaction;
Get record by Id = 42;
End the transaction.
Then, outside of the transaction, increase Counter.
After that:
Begin a transaction;
Get record by Id = 42;
Check whether the counter is unchanged from the value read in the first transaction;
a. If it hasn't changed, update the counter with the increased value;
b. If it has changed, handle the changed value.
End the transaction.
Optimistic locking means that you "hope" the Counter hasn't changed between the two transactions and handle the case where it has changed. With pessimistic locking, you ensure that all changes are done within a single transaction with all required records locked.
By the way: the checking mechanism (whether Counter has changed in the meantime) can be handled automatically by NHibernate, for example with a mapped version property; a sketch follows.
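A hedged FluentNHibernate sketch (the Version property is an assumed addition to the question's User entity):
public class UserMap : ClassMap<User>
{
    public UserMap()
    {
        Id(x => x.Id);
        Version(x => x.Version); // NHibernate appends "WHERE Version = ?" to every UPDATE
        Map(x => x.Counter);
    }
}

// On Commit(), a concurrent change then surfaces as a StaleObjectStateException,
// which the caller can catch and retry instead of silently losing the update.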

Transaction timeout expired while using Linq2Sql DataContext.SubmitChanges()

Please help me resolve this problem:
There is an ambient MSMQ transaction. I'm trying to use a new transaction for logging, but I get the following error when I attempt to submit changes: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." Here is the code:
public static void SaveTransaction(InfoToLog info)
{
    using (TransactionScope scope =
        new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        using (TransactionLogDataContext transactionDC =
            new TransactionLogDataContext())
        {
            transactionDC.MyInfo.InsertOnSubmit(info);
            transactionDC.SubmitChanges();
        }
        scope.Complete();
    }
}
Please help me.
Thx.
You could consider increasing the timeout or eliminating it altogether.
Something like:
using (TransactionLogDataContext transactionDC = new TransactionLogDataContext())
{
    transactionDC.CommandTimeout = 0; // No timeout.
}
Be careful
You said:
Thank you, but this solution raises a new question: if the transaction scope was changed, why does the submit operation become so time consuming? The database and application are on the same machine.
That is because you are creating a new DataContext right there:
TransactionLogDataContext transactionDC = new TransactionLogDataContext()
With a new data context, ADO.NET opens a new connection (even if the connection strings are the same, unless you do some clever connection pooling).
When you try to work with more than one connection inside a transaction context (which you just did), ADO.NET automatically promotes the transaction to a distributed transaction and tries to enlist it in MSDTC. Enlisting the very first transaction per connection in MSDTC takes time (for me it takes 30+ seconds); consecutive transactions are fast, however (in my case 60 ms). Take a look at this: http://support.microsoft.com/Default.aspx?id=922430
What you can do is reuse the transaction and connection string (if possible) when you create the new DataContext.
TransactionLogDataContext tempDataContext =
    new TransactionLogDataContext(ExistingDataContext.Transaction.Connection);
tempDataContext.Transaction = ExistingDataContext.Transaction;
Where ExistingDataContext is the one which started ambient transaction.
Or attempt to speed up your MS DTC.
Also, do use the SQL Profiler suggested by billb and look at the SessionId across the different commands (save and savelog in your case). If the SessionId changes, you are in fact using two different connections, and in that case you will have to reuse the transaction (if you don't want it to be promoted to MS DTC).