NHibernate: Saving different types of objects in the same session breaks batching

Newbie here, sorry if this is an obvious question.
It seems that saving different types of objects in the same session breaks batching, causing a significant performance drop.
The ID generator is set to Increment (as Diego Mijelshon advised, I tried hilo("100"), but unfortunately the issue remains; Test1() is still about 5 times slower than Test2()):
public class CustomIdConvention : IIdConvention
{
    public void Apply(IIdentityInstance instance)
    {
        instance.GeneratedBy.Increment();
    }
}
AdoNetBatchSize is set to 1000:
MsSqlConfiguration.MsSql2008
    .ConnectionString(connectionString)
    .AdoNetBatchSize(1000)
    .Cache(x => x
        .UseQueryCache()
        .ProviderClass<HashtableCacheProvider>())
    .ShowSql();
These are the models:
public class TestClass1
{
    public virtual int Id { get; private set; }
}

public class TestClass2
{
    public virtual int Id { get; private set; }
}
These are the test methods. Test1() takes 62 seconds, Test2() takes only 11 seconds (as Phill advised, I tried stateless sessions, but unfortunately the issue remains):
[TestMethod]
public void Test1()
{
    int count = 50 * 1000;
    using (var session = SessionFactory.OpenSession())
    {
        using (var transaction = session.BeginTransaction())
        {
            for (int i = 0; i < count; i++)
            {
                var x = new TestClass1();
                var y = new TestClass2();
                session.Save(x);
                session.Save(y);
            }
            transaction.Commit();
        }
    }
}
[TestMethod]
public void Test2()
{
    int count = 50 * 1000;
    using (var session = SessionFactory.OpenSession())
    {
        using (var transaction = session.BeginTransaction())
        {
            for (int i = 0; i < count; i++)
            {
                var x = new TestClass1();
                session.Save(x);
            }
            transaction.Commit();
        }
    }
    using (var session = SessionFactory.OpenSession())
    {
        using (var transaction = session.BeginTransaction())
        {
            for (int i = 0; i < count; i++)
            {
                var y = new TestClass2();
                session.Save(y);
            }
            transaction.Commit();
        }
    }
}
Any ideas?
Thanks!
Update:
The test project can be downloaded from here. You need to change the connectionString in the Main method. I changed all sessions to stateless sessions.
My results: Test1 = 59.11 s, Test2 = 7.60 s, Test3 = 7.72 s. Test1 is 7.7 times slower than Test2 and Test3!

Do not use increment. It's the worst possible generator.
Try changing it to HiLo.
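For example, a minimal sketch of the same convention switched to HiLo (assuming Fluent NHibernate's GeneratedBy.HiLo overload that takes a maxLo value; the "100" is just an illustration):
public class CustomIdConvention : IIdConvention
{
    public void Apply(IIdentityInstance instance)
    {
        // HiLo hands out id blocks client-side, so NHibernate can assign ids
        // and batch INSERTs without a database round trip per row.
        instance.GeneratedBy.HiLo("100");
    }
}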
Update:
It looks like the problem occurs when alternating saves of different entities, regardless of whether the sessions/transactions are separated or not.
This produces similar results to the second test method:
[TestMethod]
public void Test3()
{
    int count = 50 * 1000;
    using (var session = SessionFactory.OpenSession())
    {
        using (var transaction = session.BeginTransaction())
        {
            for (int i = 0; i < count; i++)
            {
                var x = new TestClass1();
                session.Save(x);
            }
            for (int i = 0; i < count; i++)
            {
                var y = new TestClass2();
                session.Save(y);
            }
            transaction.Commit();
        }
    }
}
My guess, without looking at NH's sources, is that it preserves the order because of possible relationships between the entities, even when there are none.

When you run Test2 and Test3, the inserts are batched together.
When you run Test1, where you alternate the inserts, the inserts are issued as separate statements and are not batched together.
I found this out by profiling all three tests.
So, as per Diego's answer, NHibernate must preserve the order in which you save, and only batch consecutive inserts of the same type.
I wrote a 4th test: I set the batch size to 10, then alternated between TestClass1 and TestClass2 so that I was doing 5 of TestClass1 and then 5 of TestClass2, to hit the batch size.
This pushed out batches of 5 in the order they were processed.
[TestMethod]
public void Test4()
{
    int count = 10;
    using (var session = SessionFactory.OpenSession())
    using (var transaction = session.BeginTransaction())
    {
        for (int i = 0; i < count; i++)
        {
            if (i % 2 == 0)
            {
                for (int j = 0; j < 5; j++)
                {
                    var x = new TestClass1();
                    session.Save(x);
                }
            }
            else
            {
                for (int j = 0; j < 5; j++)
                {
                    var y = new TestClass2();
                    session.Save(y);
                }
            }
        }
        transaction.Commit();
    }
}
Then I changed it to insert 3 at a time instead of 5. The batches came out in multiples of 3, so what must be happening is that the batch size allows a batch of one type to grow up to the specified amount, but only groups statements of the same type together, while alternating causes separate insert statements.
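Based on that, a workaround sketch (itemsOfType1/itemsOfType2 are hypothetical collections, not from the original post): save all entities of one type before moving on to the next, so each run of identical INSERTs can be batched:
using (var session = SessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
    // Each homogeneous run produces identical INSERT statements,
    // which NHibernate can send as one ADO.NET batch.
    foreach (var x in itemsOfType1)
        session.Save(x);
    foreach (var y in itemsOfType2)
        session.Save(y);
    transaction.Commit();
}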

Related

Rate limiting with redis

I'm using ElastiCache Redis for rate limiting, with Redisson as the client; the related code is:
public CompletableFuture<List<Long>> incrementSingleKeys(
    List<String> keys, List<Long> increments, List<Long> ttls) {
  RBatch batch = redissonClient.createBatch(BatchOptions.defaults());
  for (int i = 0; i < keys.size(); i++) {
    batch.getAtomicLong(keys.get(i)).addAndGetAsync(increments.get(i));
  }
  return batch
      .executeAsync()
      .thenCompose(
          (counters) -> {
            List<String> keysToSet = Lists.newArrayList();
            List<Long> TTLsToSet = Lists.newArrayList();
            for (int i = 0; i < counters.size(); i++) {
              if (counters.get(i) == increments.get(i)) { // only set ttl for new keys
                keysToSet.add(keys.get(i));
                TTLsToSet.add(ttls.get(i));
              }
            }
            if (!keysToSet.isEmpty()) { // call setTTLs
              return setTTLs(keysToSet, TTLsToSet).thenApply((r) -> counters);
            } else {
              return CompletableFuture.completedFuture(counters);
            }
          });
}
public CompletableFuture<List<Boolean>> setTTLs(List<String> keys, List<Long> TTLs) {
  CompletableFuture<List<Boolean>> future = new CompletableFuture<>();
  Stopwatch timer = Stopwatch.createStarted();
  RBatch batch = redissonClient.createBatch(BatchOptions.defaults());
  for (int i = 0; i < keys.size(); i++) {
    batch.getBucket(keys.get(i)).expireAsync(TTLs.get(i), TimeUnit.MILLISECONDS);
  }
  batch
      .executeAsync()
      .whenComplete(
          (list, ex) -> {
            if (ex != null) {
              future.complete(Collections.nCopies(keys.size(), false)); // fail open
            } else {
              future.complete(
                  list.stream()
                      .map(entry -> (entry instanceof Boolean ? (Boolean) entry : false))
                      .collect(Collectors.toList()));
            }
          });
  return future;
}
Basically, I'll set a TTL only for new keys. The issue is that sometimes the increment batch call succeeds but the setTTLs call times out, which results in a permanent key and could lead to incorrect rate limiting. One workaround is to always get and set the TTL whenever an increment happens, but this would hurt performance. Is there any other solution?

How to get TransactionScope to work on large sql transactions?

I have an app that needs to write to a table in a SQL database. The IDs used in the IN clause of the SQL string need to be chunked, because there could potentially be 300k IDs in the IN and this overflows the max string length in the web config. I get this error when I test the scenario with 200+k IDs, but not with, say, around 50k IDs:
The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements.
Here is the code:
public int? MatchReconDetail(int ReconRuleNameID, List<int> participants, int? matchID, string userID, int FiscalPeriodID)
{
    DataContext context = new DataContext();
    context.CommandTimeout = 600;
    using (TransactionScope transaction = new TransactionScope())
    {
        Match dbMatch = new Match() { AppUserIDMatchedBy = userID, FiscalPeriodID = FiscalPeriodID, DateCreated = DateTime.Now };
        context.Matches.InsertOnSubmit(dbMatch);
        context.SubmitChanges();
        //string ids = string.Concat(participants.Select(rid => string.Format("{0}, ", rid))).TrimEnd(new char[] { ' ', ',' });
        int listCount = participants.Count;
        int listChunk = listCount / 1000;
        int count = 0;
        int countLimit = 1000;
        for (int x = 0; x <= listChunk; x++)
        {
            count = 1000 * x;
            countLimit = 1000 * (x + 1);
            List<string> chunkList = new List<string>();
            for (int i = count; i < countLimit; i++)
            {
                if (listCount - count < 1000 && listCount - count != 0)
                {
                    int remainder = listCount - count;
                    for (int r = 0; r < remainder; r++)
                    {
                        chunkList.Add(participants[count].ToString());
                        count++;
                    }
                }
                else if (listCount - count >= 1000)
                {
                    chunkList.Add(participants[i].ToString());
                }
            }
            string ids = string.Concat(chunkList.Select(rid => string.Format("{0}, ", rid))).TrimEnd(new char[] { ' ', ',' });
            string sqlMatch = string.Format("UPDATE [Cars3]..[ReconDetail] SET [MatchID] = {0} WHERE [ID] IN ({1})", dbMatch.ID, ids);
            context.ExecuteCommand(sqlMatch);
        }
        matchID = dbMatch.ID;
        context.udpUpdateSummaryCache(FiscalPeriodID, ReconRuleNameID, false);
        transaction.Complete();
    }
    return matchID;
}
I have read some articles suggesting timeout issues, so I added context.CommandTimeout at the top of this function. The error seems to occur about a minute after the event fires.
Any thoughts will be greatly appreciated.
You need to set the timeout on the transaction too, not only on the command.
You might also have to raise the maximum transaction timeout. The default is 10 minutes; it can be changed in the machine.config file.
Read this blog post for more details.
I solved this by using:
using (TransactionScope transaction = new TransactionScope(TransactionScopeOption.Required, new TimeSpan(0,30,0)))
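For reference, a fuller sketch of the same fix using TransactionOptions (the isolation level here is an assumption; pick whatever your queries need):
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted, // assumed; not from the original post
    Timeout = TimeSpan.FromMinutes(30)             // must stay within machine.config's maxTimeout
};
using (TransactionScope transaction = new TransactionScope(TransactionScopeOption.Required, options))
{
    // ... database work ...
    transaction.Complete();
}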

Roslyn - replace node and fix the whitespaces

In my program I use Roslyn and I need to replace a node with a new node. For example, if I have code like
public void Foo()
{
    for (var i = 0; i < 5; i++)
        Console.WriteLine("");
}
and I want to insert braces for the for statement, I get
public void Foo()
{
    for (var i = 0; i < 5; i++)
    {
        Console.WriteLine("");
    }
}
I tried to use NormalizeWhitespace, but if I use it on the for statement, I get
public void Foo()
{
for (var i = 0; i < 5; i++)
{
    Console.WriteLine("");
}
}
However, I'd like to have the for statement formatted correctly. Any hints on how to do it?
EDIT:
I solved it by using:
var blockSyntax = SyntaxFactory.Block(
    SyntaxFactory.Token(SyntaxKind.OpenBraceToken)
        .WithLeadingTrivia(forStatementSyntax.GetLeadingTrivia())
        .WithTrailingTrivia(forStatementSyntax.GetTrailingTrivia()),
    syntaxNodes,
    SyntaxFactory.Token(SyntaxKind.CloseBraceToken)
        .WithLeadingTrivia(forStatementSyntax.GetLeadingTrivia())
        .WithTrailingTrivia(forStatementSyntax.GetTrailingTrivia())
);
However, the answer from Sam is also correct.
You need to use .WithAdditionalAnnotations(Formatter.Annotation), but only on the specific element you want to format. Here's an example from the NullParameterCheckRefactoring project.
IfStatementSyntax nullCheckIfStatement = SyntaxFactory.IfStatement(
    SyntaxFactory.Token(SyntaxKind.IfKeyword),
    SyntaxFactory.Token(SyntaxKind.OpenParenToken),
    binaryExpression,
    SyntaxFactory.Token(SyntaxKind.CloseParenToken),
    syntaxBlock,
    null).WithAdditionalAnnotations(Formatter.Annotation, Simplifier.Annotation);
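A minimal sketch of then running the formatter over only the annotated nodes (root and the workspace are assumptions here; any Workspace works, AdhocWorkspace is just the simplest):
// Format only nodes carrying Formatter.Annotation; the rest of the tree is untouched.
var workspace = new AdhocWorkspace();
SyntaxNode formattedRoot = Formatter.Format(root, Formatter.Annotation, workspace);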

How can I do this all in one query? Nhibernate

I have a list of IDs and I want to get all the rows back in one query, as a list of objects (so a List of Products or whatever).
I tried
public List<TableA> MyMethod(List<string> keys)
{
    var query = "SELECT * FROM TableA WHERE Keys IN (:keys)";
    var a = session.CreateQuery(query).SetParameter("keys", keys).List();
    return a; // a is an IList but not of TableA. So what do I do now?
}
but I can't figure out how to return it as a list of objects. Is this the right way?
List<TableA> result = session.CreateQuery(query)
    .SetParameterList("keys", keys)
    .List<TableA>();
However, there could be a limitation with this query if the number of ":keys" exceeds 1000 (in the case of Oracle; not sure about other DBs), so I would recommend using ICriteria instead of CreateQuery with native SQL.
Do something like this:
[TestFixture]
public class ThousandIdsNHibernateQuery
{
    [Test]
    public void TestThousandIdsNHibernateQuery()
    {
        //Keys contains 1000 ids, not included here.
        var keys = new List<decimal>();
        using (ISession session = SessionFactory.OpenSession())
        {
            var tableCrit = session.CreateCriteria(typeof(TableA));
            if (keys.Count > 1000)
            {
                var listsList = new List<List<decimal>>();
                //Get the first 1000.
                var first1000List = keys.GetRange(0, 1000);
                //Split the remaining keys into chunks of 1000.
                for (int i = 1000; i < keys.Count; i++)
                {
                    if ((i + 1) % 1000 == 0)
                    {
                        var newList = new List<decimal>();
                        newList.AddRange(keys.GetRange(i - 999, 1000));
                        listsList.Add(newList);
                    }
                }
                ICriterion firstExp = Expression.In("Key", first1000List);
                ICriterion postExp = null;
                foreach (var list in listsList)
                {
                    postExp = Expression.In("Key", list);
                    tableCrit.Add(Expression.Or(firstExp, postExp));
                    firstExp = postExp;
                }
                tableCrit.Add(postExp);
            }
            else
            {
                tableCrit.Add(Expression.In("Key", keys));
            }
            var results = tableCrit.List<TableA>();
        }
    }
}
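A more compact sketch of the same chunking idea (hypothetical, using Restrictions.Disjunction, the newer name for the Expression API; it handles any key count while keeping each page under Oracle's 1000-item limit):
var criteria = session.CreateCriteria(typeof(TableA));
var disjunction = Restrictions.Disjunction();
for (int i = 0; i < keys.Count; i += 1000)
{
    // OR together one In() restriction per page of at most 1000 keys.
    disjunction.Add(Restrictions.In("Key", keys.GetRange(i, Math.Min(1000, keys.Count - i))));
}
criteria.Add(disjunction);
var results = criteria.List<TableA>();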

NHibernate does not seems doing Bulk Inserting into PostgreSQL

I am interfacing with a PostgreSQL database with NHibernate.
Background
I made some simple tests... it seems it's taking 2 seconds to persist 300 records.
I have a Perl program with identical functionality that issues direct SQL instead; it takes only 70% of the time.
I am not sure if this is expected. I thought C#/NHibernate would be faster, or at least on par.
Questions
One of my observations is that (with show_sql turned on) NHibernate is issuing INSERTs a few hundred times, instead of doing a bulk INSERT that takes care of multiple rows. And note I am assigning the primary key myself, not using the "native" generator.
Is that expected? Is there any way I could make it issue bulk INSERT statements instead? It seems to me that this could be one of the areas where I could speed up the performance.
As stachu found out correctly: NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql).
As stachu asks: did anybody manage to force NHibernate to do batch inserts to PostgreSQL?
I wrote a Batcher that doesn't use any Npgsql batching stuff, but manipulates the SQL string "old-school style" (INSERT INTO [..] VALUES (...), (...), ...)
using System;
using System.Collections;
using System.Data;
using System.Diagnostics;
using System.Text;
using Npgsql;

namespace NHibernate.AdoNet
{
    public class PostgresClientBatchingBatcherFactory : IBatcherFactory
    {
        public virtual IBatcher CreateBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
        {
            return new PostgresClientBatchingBatcher(connectionManager, interceptor);
        }
    }

    /// <summary>
    /// Summary description for PostgresClientBatchingBatcher.
    /// </summary>
    public class PostgresClientBatchingBatcher : AbstractBatcher
    {
        private int batchSize;
        private int countOfCommands = 0;
        private int totalExpectedRowsAffected;
        private StringBuilder sbBatchCommand;
        private int m_ParameterCounter;
        private IDbCommand currentBatch;

        public PostgresClientBatchingBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
            : base(connectionManager, interceptor)
        {
            batchSize = Factory.Settings.AdoBatchSize;
        }

        private string NextParam()
        {
            return ":p" + m_ParameterCounter++;
        }

        public override void AddToBatch(IExpectation expectation)
        {
            if (expectation.CanBeBatched && !(CurrentCommand.CommandText.StartsWith("INSERT INTO") && CurrentCommand.CommandText.Contains("VALUES")))
            {
                //NonBatching behavior
                IDbCommand cmd = CurrentCommand;
                LogCommand(CurrentCommand);
                int rowCount = ExecuteNonQuery(cmd);
                expectation.VerifyOutcomeNonBatched(rowCount, cmd);
                currentBatch = null;
                return;
            }

            totalExpectedRowsAffected += expectation.ExpectedRowCount;
            log.Info("Adding to batch");

            int len = CurrentCommand.CommandText.Length;
            int idx = CurrentCommand.CommandText.IndexOf("VALUES");
            int endidx = idx + "VALUES".Length + 2;

            if (currentBatch == null)
            {
                // begin new batch.
                currentBatch = new NpgsqlCommand();
                sbBatchCommand = new StringBuilder();
                m_ParameterCounter = 0;
                string preCommand = CurrentCommand.CommandText.Substring(0, endidx);
                sbBatchCommand.Append(preCommand);
            }
            else
            {
                //only append Values
                sbBatchCommand.Append(", (");
            }

            //append values from CurrentCommand to sbBatchCommand
            string values = CurrentCommand.CommandText.Substring(endidx, len - endidx - 1);
            //get all values
            string[] split = values.Split(',');
            ArrayList paramName = new ArrayList(split.Length);
            for (int i = 0; i < split.Length; i++)
            {
                if (i != 0)
                    sbBatchCommand.Append(", ");
                string param = null;
                if (split[i].StartsWith(":")) //first named parameter
                {
                    param = NextParam();
                    paramName.Add(param);
                }
                else if (split[i].StartsWith(" :")) //other named parameter
                {
                    param = NextParam();
                    paramName.Add(param);
                }
                else if (split[i].StartsWith(" ")) //other fix parameter
                {
                    param = split[i].Substring(1, split[i].Length - 1);
                }
                else
                {
                    param = split[i]; //first fix parameter
                }
                sbBatchCommand.Append(param);
            }
            sbBatchCommand.Append(")");

            //rename & copy parameters from CurrentCommand to currentBatch
            int iParam = 0;
            foreach (NpgsqlParameter param in CurrentCommand.Parameters)
            {
                param.ParameterName = (string)paramName[iParam++];
                NpgsqlParameter newParam = /*Clone()*/new NpgsqlParameter(param.ParameterName, param.NpgsqlDbType, param.Size, param.SourceColumn, param.Direction, param.IsNullable, param.Precision, param.Scale, param.SourceVersion, param.Value);
                currentBatch.Parameters.Add(newParam);
            }

            countOfCommands++;

            //check for flush
            if (countOfCommands >= batchSize)
            {
                DoExecuteBatch(currentBatch);
            }
        }

        protected override void DoExecuteBatch(IDbCommand ps)
        {
            if (currentBatch != null)
            {
                //Batch command now needs its terminator
                sbBatchCommand.Append(";");
                countOfCommands = 0;
                log.Info("Executing batch");
                CheckReaders();

                //set prepared batchCommandText
                string commandText = sbBatchCommand.ToString();
                currentBatch.CommandText = commandText;
                LogCommand(currentBatch);
                Prepare(currentBatch);

                int rowsAffected = 0;
                try
                {
                    rowsAffected = currentBatch.ExecuteNonQuery();
                }
                catch (Exception e)
                {
                    if (Debugger.IsAttached)
                        Debugger.Break();
                    throw;
                }
                Expectations.VerifyOutcomeBatched(totalExpectedRowsAffected, rowsAffected);
                totalExpectedRowsAffected = 0;
                currentBatch = null;
                sbBatchCommand = null;
                m_ParameterCounter = 0;
            }
        }

        protected override int CountOfStatementsInCurrentBatch
        {
            get { return countOfCommands; }
        }

        public override int BatchSize
        {
            get { return batchSize; }
            set { batchSize = value; }
        }
    }
}
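To wire it in, a hypothetical registration sketch (assuming NHibernate's adonet.factory_class property, exposed as NHibernate.Cfg.Environment.BatchStrategy; configuration is your NHibernate.Cfg.Configuration instance):
// Point NHibernate's ADO.NET batching at the custom factory above.
configuration.SetProperty(
    NHibernate.Cfg.Environment.BatchStrategy, // "adonet.factory_class"
    typeof(NHibernate.AdoNet.PostgresClientBatchingBatcherFactory).AssemblyQualifiedName);
configuration.SetProperty(NHibernate.Cfg.Environment.BatchSize, "100"); // adonet.batch_size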
I also found that NHibernate is not doing batch inserts into PostgreSQL.
I identified two possible reasons:
1) the Npgsql driver does not support batch inserts/updates (see forum)
2) NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql). I tried using the Devart dotConnect driver with NHibernate (I wrote a custom driver for NHibernate) but it still did not work.
I suppose this driver should also implement the IEmbeddedBatcherFactoryProvider interface, but it seems non-trivial to me (using the one for Oracle did not work ;) )
Did anybody manage to force NHibernate to do batch inserts to PostgreSQL, or can you confirm my conclusion?