Is it possible to insert only 100 records at a time per TransactionScope?
I would like to do it that way to avoid timeouts in my application.
using (var scope = new TransactionScope())
{
    foreach (object x in list)
    {
        // doing InsertOnSubmit here
    }
    context.SubmitChanges();
    scope.Complete();
}
So I would like to insert only 100 or even 50 rows inside that transaction just to avoid timeouts.
Something like this?
TransactionScope scope = null;
int i = 0;
const int batchSize = 100; // or even 50
try
{
    foreach (object x in list)
    {
        if (i % batchSize == 0)
        {
            if (scope != null)
            {
                // flush the pending inserts, then mark the scope as complete
                context.SubmitChanges();
                scope.Complete(); // TransactionScope has Complete(), not Commit()
                scope.Dispose();
                scope = null;
            }
            scope = new TransactionScope();
        }
        // doing InsertOnSubmit here
        ++i;
    }
    if (scope != null)
    {
        // flush and complete the final partial batch
        context.SubmitChanges();
        scope.Complete();
    }
}
finally
{
    if (scope != null)
    {
        scope.Dispose();
    }
}
Sure. Just start a new transaction every 50-100 records. What is your exact problem?
From what I have read from multiple sources online, batch query execution enables grouping multiple statements together and executing them at once, thereby eliminating multiple back-and-forth communications with the database.
Some sources that claim this are:
https://www.tutorialspoint.com/jdbc/jdbc-batch-processing.htm#:~:text=Batch%20Processing%20allows%20you%20to,communication%20overhead%2C%20thereby%20improving%20performance.
http://tutorials.jenkov.com/jdbc/batchupdate.html
https://www.baeldung.com/jdbc-batch-processing
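For context, the pattern these tutorials describe is the standard JDBC addBatch()/executeBatch() API. A minimal sketch (the table, columns, and in-memory H2 URL are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test")) {
            conn.createStatement().execute("CREATE TABLE users (id INT, name VARCHAR(50))");
            conn.setAutoCommit(false); // batches are typically run with autocommit off
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO users (id, name) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "user-" + i);
                    ps.addBatch();                      // queue the statement client-side
                }
                int[] updateCounts = ps.executeBatch(); // hand the whole batch to the driver
            }
            conn.commit();
        }
    }
}

Note that the JDBC API itself only defines this client-side contract (queue with addBatch(), execute with executeBatch()); how much of it actually travels over the wire in a single round trip is driver-specific.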
All of these talk about a single network trip, etc. However, going through the source code of H2 or SQLite, it looks like the statements are executed one by one, albeit with autocommit disabled.
E.g. SQLite:
final synchronized int[] executeBatch(long stmt, int count, Object[] vals, boolean autoCommit) throws SQLException {
if (count < 1) {
throw new SQLException("count (" + count + ") < 1");
}
final int params = bind_parameter_count(stmt);
int rc;
int[] changes = new int[count];
try {
for (int i = 0; i < count; i++) {
reset(stmt);
for (int j = 0; j < params; j++) {
rc = sqlbind(stmt, j, vals[(i * params) + j]);
if (rc != SQLITE_OK) {
throwex(rc);
}
}
rc = step(stmt);
if (rc != SQLITE_DONE) {
reset(stmt);
if (rc == SQLITE_ROW) {
throw new BatchUpdateException("batch entry " + i + ": query returns results", changes);
}
throwex(rc);
}
changes[i] = changes();
}
}
finally {
ensureAutoCommit(autoCommit);
}
reset(stmt);
return changes;
}
E.g. H2:
public int[] executeBatch() throws SQLException {
try {
debugCodeCall("executeBatch");
if (batchParameters == null) {
// Empty batch is allowed, see JDK-4639504 and other issues
batchParameters = Utils.newSmallArrayList();
}
batchIdentities = new MergedResult();
int size = batchParameters.size();
int[] result = new int[size];
SQLException first = null;
SQLException last = null;
checkClosedForWrite();
for (int i = 0; i < size; i++) {
Value[] set = batchParameters.get(i);
ArrayList<? extends ParameterInterface> parameters =
command.getParameters();
for (int j = 0; j < set.length; j++) {
Value value = set[j];
ParameterInterface param = parameters.get(j);
param.setValue(value, false);
}
try {
result[i] = executeUpdateInternal();
// Cannot use own implementation, it returns batch identities
ResultSet rs = super.getGeneratedKeys();
batchIdentities.add(((JdbcResultSet) rs).result);
} catch (Exception re) {
SQLException e = logAndConvert(re);
if (last == null) {
first = last = e;
} else {
last.setNextException(e);
}
result[i] = Statement.EXECUTE_FAILED;
}
}
batchParameters = null;
if (first != null) {
throw new JdbcBatchUpdateException(first, result);
}
return result;
} catch (Exception e) {
throw logAndConvert(e);
}
}
From the above code I see that there are multiple calls to the database, with each having its own result set. How does batch execution actually work?
I'm using ElastiCache Redis for rate limiting and Redisson as the client; the relevant code is:
public CompletableFuture<List<Long>> incrementSingleKeys(
List<String> keys, List<Long> increments, List<Long> ttls) {
RBatch batch = redissonClient.createBatch(BatchOptions.defaults());
for (int i = 0; i < keys.size(); i++) {
batch.getAtomicLong(keys.get(i)).addAndGetAsync(increments.get(i));
}
return batch
.executeAsync()
.thenCompose(
(counters) -> {
List<String> keysToSet = Lists.newArrayList();
List<Long> TTLsToSet = Lists.newArrayList();
for (int i = 0; i < counters.size(); i++) {
if (increments.get(i).equals(counters.get(i))) { // only set TTL for new keys; compare boxed Longs with equals, not ==
keysToSet.add(keys.get(i));
TTLsToSet.add(ttls.get(i));
}
}
if (!keysToSet.isEmpty()) { // Call setTTLS
return setTTLs(keysToSet, TTLsToSet)
.thenApply(
(r) -> counters
);
} else {
return CompletableFuture.completedFuture(counters);
}
});
}
public CompletableFuture<List<Boolean>> setTTLs(List<String> keys, List<Long> TTLs) {
CompletableFuture<List<Boolean>> future = new CompletableFuture<>();
Stopwatch timer = Stopwatch.createStarted();
RBatch batch = redissonClient.createBatch(BatchOptions.defaults());
for (int i = 0; i < keys.size(); i++) {
batch.getBucket(keys.get(i)).expireAsync(TTLs.get(i), TimeUnit.MILLISECONDS);
}
batch
.executeAsync()
.whenComplete(
(list, ex) -> {
if (ex != null) {
// fail open: report every TTL set as unsuccessful
future.complete(Collections.nCopies(keys.size(), false));
} else {
future.complete(
list.stream()
.map(entry -> (entry instanceof Boolean ? (Boolean) entry : false))
.collect(Collectors.toList()));
}
});
return future;
}
Basically, I'll set the TTL only for new keys. The issue is that sometimes the increment batch call succeeds but the setTTLs call times out, which results in a permanent key and could lead to incorrect rate limiting. One workaround is to always get and set the TTL whenever an increment happens, but this would affect performance. Is there any other solution?
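For reference, the "always set the TTL" workaround mentioned above could at least be folded into the same batch as the increments, so it does not need a second round trip. This is only a rough, hypothetical sketch (the method name incrementWithTtl is made up; it reuses the same redissonClient field, Guava Lists, and Redisson calls as the snippets above, and assumes the batch responses come back in the order the commands were queued, as the code above already relies on):

public CompletableFuture<List<Long>> incrementWithTtl(
        List<String> keys, List<Long> increments, List<Long> ttls) {
    RBatch batch = redissonClient.createBatch(BatchOptions.defaults());
    for (int i = 0; i < keys.size(); i++) {
        batch.getAtomicLong(keys.get(i)).addAndGetAsync(increments.get(i));
        // unconditionally refresh the TTL in the same batch; this avoids the
        // "permanent key" case at the cost of one extra command per key
        batch.getBucket(keys.get(i)).expireAsync(ttls.get(i), TimeUnit.MILLISECONDS);
    }
    return batch.executeAsync()
            .thenApply(responses -> {
                // responses alternate: counter value, expire result, counter value, ...
                List<Long> counters = Lists.newArrayList();
                for (int i = 0; i < responses.size(); i += 2) {
                    counters.add((Long) responses.get(i));
                }
                return counters;
            })
            .toCompletableFuture();
}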
Is HibernateTemplate's bulkUpdate actually doing a bulk update? I looked at the code, and it doesn't seem to be. Or am I missing something?
public int bulkUpdate(final String queryString, final Object... values) throws DataAccessException {
return executeWithNativeSession(new HibernateCallback<Integer>() {
public Integer doInHibernate(Session session) throws HibernateException {
Query queryObject = session.createQuery(queryString);
prepareQuery(queryObject);
if (values != null) {
for (int i = 0; i < values.length; i++) {
queryObject.setParameter(i, values[i]);
}
}
return queryObject.executeUpdate();
}
});
}
whereas JdbcTemplate's batchUpdate, by the looks of it, really is doing a batch update:
public int[] batchUpdate(final String[] sql) throws DataAccessException {
Assert.notEmpty(sql, "SQL array must not be empty");
if (logger.isDebugEnabled()) {
logger.debug("Executing SQL batch update of " + sql.length + " statements");
}
class BatchUpdateStatementCallback implements StatementCallback<int[]>, SqlProvider {
private String currSql;
public int[] doInStatement(Statement stmt) throws SQLException, DataAccessException {
int[] rowsAffected = new int[sql.length];
if (JdbcUtils.supportsBatchUpdates(stmt.getConnection())) {
for (String sqlStmt : sql) {
this.currSql = sqlStmt;
stmt.addBatch(sqlStmt);
}
rowsAffected = stmt.executeBatch();
}
else {
for (int i = 0; i < sql.length; i++) {
this.currSql = sql[i];
if (!stmt.execute(sql[i])) {
rowsAffected[i] = stmt.getUpdateCount();
}
else {
throw new InvalidDataAccessApiUsageException("Invalid batch SQL statement: " + sql[i]);
}
}
}
return rowsAffected;
}
public String getSql() {
return this.currSql;
}
}
return execute(new BatchUpdateStatementCallback());
}
Yes, it is doing a bulk update. As you can see, the DELETE and INSERT queries executed through the bulkUpdate method can affect multiple rows; that's why they are called bulk operations.
The point is to have a handy method that executes the update and returns the number of rows affected by the bulk operation. Additionally, it wraps exceptions into DataAccessException.
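To make the distinction concrete, here is a rough, hypothetical sketch (the User entity, users table, and column names are invented; the HibernateTemplate import is the hibernate3-era one matching the code above): bulkUpdate issues one HQL statement that happens to touch many rows, while batchUpdate queues several SQL statements and hands them to the driver as one JDBC batch.

import java.util.Date;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.orm.hibernate3.HibernateTemplate;

public class BulkVsBatchExample {
    private final HibernateTemplate hibernateTemplate;
    private final JdbcTemplate jdbcTemplate;

    public BulkVsBatchExample(HibernateTemplate hibernateTemplate, JdbcTemplate jdbcTemplate) {
        this.hibernateTemplate = hibernateTemplate;
        this.jdbcTemplate = jdbcTemplate;
    }

    public void deactivateStaleUsers(Date cutoff) {
        // "Bulk": a single HQL UPDATE that affects many rows in one statement.
        int rowsAffected = hibernateTemplate.bulkUpdate(
                "update User u set u.active = ? where u.lastLogin < ?", Boolean.FALSE, cutoff);

        // "Batch": several independent SQL statements queued client-side via
        // Statement.addBatch() and sent together by executeBatch().
        int[] counts = jdbcTemplate.batchUpdate(new String[] {
                "update users set active = 0 where id = 1",
                "update users set active = 0 where id = 2",
                "delete from users where active = 0"
        });
    }
}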
I have a list of IDs and I want to get all the rows back in one query, as a list of objects (so a List of Products or whatever).
I tried:
public List<TableA> MyMethod(List<string> keys)
{
var query = "SELECT * FROM TableA WHERE Keys IN (:keys)";
var a = session.CreateQuery(query).SetParameter("keys", keys).List();
return a; // a is a IList but not of TableA. So what do I do now?
}
but I can't figure out how to return it as a list of objects. Is this the right way?
List<TableA> result = session.CreateQuery(query)
.SetParameterList("keys", keys)
.List<TableA>();
However, there could be a limitation in this query if the number of ":keys" values exceeds 1000 (in the case of Oracle; not sure about other DBs), so I would recommend using ICriteria instead of CreateQuery / native SQL.
Do something like this:
[TestFixture]
public class ThousandIdsNHibernateQuery
{
[Test]
public void TestThousandIdsNHibernateQuery()
{
//Keys contains 1000 ids not included here.
var keys = new List<decimal>();
using (ISession session = new Session())
{
var tableCirt = session.CreateCriteria(typeof(TableA));
if (keys.Count > 1000)
{
var listsList = new List<List<decimal>>();
//Get first 1000.
var first1000List = keys.GetRange(0, 1000);
//Split the remaining keys into chunks of 1000.
for (int i = 1000; i < keys.Count; i++)
{
if ((i + 1)%1000 == 0)
{
var newList = new List<decimal>();
newList.AddRange(keys.GetRange(i - 999, 1000));
listsList.Add(newList);
}
}
ICriterion firstExp = Expression.In("Key", first1000List);
ICriterion postExp = null;
foreach (var list in listsList)
{
postExp = Expression.In("Key", list);
tableCirt.Add(Expression.Or(firstExp, postExp));
firstExp = postExp;
}
tableCirt.Add(postExp);
}
else
{
tableCirt.Add(Expression.In("key", keys));
}
var results = tableCirt.List<TableA>();
}
}
}
Newbie here, sorry if this is an obvious question.
It seems that saving different types of objects in the same session breaks batching, causing a significant performance drop.
The ID generator is set to Increment (as Diego Mijelshon advised, I tried hilo("100"), but unfortunately it's the same issue: Test1() is still about 5 times slower than Test2()):
public class CustomIdConvention : IIdConvention
{
public void Apply(IIdentityInstance instance)
{
instance.GeneratedBy.Increment();
}
}
AdoNetBatchSize is set to 1000:
MsSqlConfiguration.MsSql2008
.ConnectionString(connectionString)
.AdoNetBatchSize(1000)
.Cache(x => x
.UseQueryCache()
.ProviderClass<HashtableCacheProvider>())
.ShowSql();
These are the models:
public class TestClass1
{
public virtual int Id { get; private set; }
}
public class TestClass2
{
public virtual int Id { get; private set; }
}
These are the test methods. Test1() takes 62 seconds, Test2() takes only 11 seconds (as Phill advised, I tried stateless sessions, but unfortunately the same issue):
[TestMethod]
public void Test1()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
var y = new TestClass2();
session.Save(x);
session.Save(y);
}
transaction.Commit();
}
}
}
[TestMethod]
public void Test2()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
session.Save(x);
}
transaction.Commit();
}
}
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var y = new TestClass2();
session.Save(y);
}
transaction.Commit();
}
}
}
Any ideas?
Thanks!
Update:
The test project can be downloaded from here. You need to change the connectionString in the Main method. I changed all sessions to stateless sessions.
My results: Test1 = 59.11, Test2 = 7.60, Test3 = 7.72. Test1 is 7.7 times slower than Test2 & Test3!
Do not use increment. It's the worst possible generator.
Try changing it to HiLo.
Update:
It looks like the problem occurs when alternating saves of different entities, regardless of whether the session/transaction are separated or not.
This produces similar results to the second test method:
[TestMethod]
public void Test3()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
session.Save(x);
}
for (int i = 0; i < count; i++)
{
var y = new TestClass2();
session.Save(y);
}
transaction.Commit();
}
}
}
My guess, without looking at NH's sources, is that it preserves the order because of possible relationships between the entities, even when there are none.
When you run Test2 and Test3, the inserts are batched together.
When you run Test1, where you alternate the inserts, the inserts are issued as separate statements and are not batched together.
I found this out by profiling all three tests.
So as per Diego's answer, it must preserve the order in which you insert, and batch them together.
I wrote a 4th test: I set the batch size to 10, then alternated between TestClass1 and TestClass2 so that I was doing 5 of TestClass1 and then 5 of TestClass2, to hit the batch size.
This pushed out batches of 5 in the order they were processed.
public void Test4()
{
int count = 10;
using (var session = SessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
if (i%2 == 0)
{
for (int j = 0; j < 5; j++)
{
var x = new TestClass1();
session.Save(x);
}
}
else
{
for (int j = 0; j < 5; j++)
{
var y = new TestClass2();
session.Save(y);
}
}
}
transaction.Commit();
}
}
Then I changed it to insert 3 at a time instead of 5. The batches were in multiples of 3, so what must be happening is that the batch size allows a batch of one entity type to grow up to the specified amount, but only statements of the same type are grouped into a batch, while alternating types causes separate insert statements.