Java ExecutorService: how to return the total successful and total failed record counts at once

I am inserting records into a database using an ExecutorService. Let's say I am saving 100 records by creating 5 batches, each containing 20 records.
ExecutorService e = Executors.newFixedThreadPool(3);
Collection<Callable<String>> c = new ArrayList<>();
while (someLoop) {
    c.add(mySaveMethod());
}
List<Future<String>> list = e.invokeAll(c);
Iterator<Future<String>> i = list.iterator();
Future<String> f = null;
while (i.hasNext()) {
    f = i.next();
}
String str = f.get();
While processing, some records may fail and others will be processed successfully.
Once the process finishes, I want to collect the total number of successfully processed and failed records at once.
Can anybody let me know how I can achieve this?
Thanks.

Assuming that you know whether an INSERT of a record was successful immediately after executing the SQL, you can simply use two AtomicIntegers. Declare them and set them to 0 before running the batch insert jobs, and increment them inside those jobs. Operations on AtomicInteger are thread-safe, so you don't need to worry about synchronization. For example:
public static void main(String[] args) throws InterruptedException {
    AtomicInteger nSuccess = new AtomicInteger(0);
    AtomicInteger nFailed = new AtomicInteger(0);
    // add batch insert jobs, passing both counters to each BatchInserter
    // wait for the jobs to finish
    System.out.println("succeeded: " + nSuccess.get() + " failed: " + nFailed.get());
}

class BatchInserter implements Runnable {
    private final AtomicInteger nSuccess;
    private final AtomicInteger nFailed;

    BatchInserter(AtomicInteger nSuccess, AtomicInteger nFailed) {
        this.nSuccess = nSuccess;
        this.nFailed = nFailed;
    }

    public void run() {
        for (int i = 0; i < 20; i++) {
            if (insertRecord(i)) { // your own INSERT logic; true on success
                nSuccess.getAndIncrement();
            } else {
                nFailed.getAndIncrement();
            }
        }
    }
}
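Alternatively, since you are already using invokeAll, each batch can report its own counts through its Future and you can sum them once all batches are done. A minimal sketch, assuming 5 batches of 20 records and a placeholder saveRecord method standing in for your real INSERT:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchCounter {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<int[]>> batches = new ArrayList<>();
        for (int b = 0; b < 5; b++) { // 5 batches of 20 records
            batches.add(() -> {
                int ok = 0, failed = 0;
                for (int r = 0; r < 20; r++) {
                    if (saveRecord(r)) ok++; else failed++;
                }
                return new int[] { ok, failed }; // per-batch counts
            });
        }
        int totalOk = 0, totalFailed = 0;
        for (Future<int[]> f : pool.invokeAll(batches)) { // invokeAll blocks until all batches finish
            int[] counts = f.get();
            totalOk += counts[0];
            totalFailed += counts[1];
        }
        pool.shutdown();
        System.out.println("succeeded: " + totalOk + " failed: " + totalFailed);
    }

    // placeholder for the real INSERT; returns true on success
    private static boolean saveRecord(int r) {
        return r % 7 != 0; // dummy outcome for the sketch
    }
}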

Related

Some confusion about safepoint and -XX:GuaranteedSafepointInterval

My env:
JDK: temurin-1.8.0_332
System: macOS Big Sur
VM: HotSpot
GuaranteedSafepointInterval = 1000 ms (default)
Q1: Is the VM param GuaranteedSafepointInterval = 1000 ms really accurate?
import java.util.concurrent.atomic.AtomicInteger;

public class Test {
    public static AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 1000000000; i++) {
                count.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName() + " done");
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("count result = " + count);
    }
}
As expected, because of the safepoint, the result is
count result = 2000000000
Thread-1 done
Thread-0 done
But if I change the main thread's sleep time to 800 ms (or any value below 800 ms), the printed result is different.
So, is the default GuaranteedSafepointInterval = 1000 ms really accurate?
Q2: If I keep the sleep time at 1000 ms but change the loop body from count.getAndAdd(1) to count.getAndAdd(i), the "count result" line prints after about 1 s. As far as I know, the HotSpot VM (JDK 8) applies certain optimizations to 'counted loops' when placing safepoints (which is what the Q1 scenario shows), so why doesn't that take effect here?
Can anybody help me? Many thanks.
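A common way to check whether the counted-loop optimization is what delays the print is to make the induction variable long: HotSpot only treats int-indexed loops as counted loops, so the long-indexed version keeps the safepoint poll inside the loop. A sketch of that variant (the class name is made up, and the expected behavior is an assumption to verify, e.g. with -XX:+PrintSafepointStatistics):

import java.util.concurrent.atomic.AtomicInteger;

// Same test, but with a long induction variable; the loop is no longer a
// counted loop, so a safepoint poll remains inside it and the main thread's
// println should not have to wait for the loops to finish (assumption).
public class TestUncounted {
    public static AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (long i = 0; i < 1000000000L; i++) {
                count.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName() + " done");
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("count result = " + count);
    }
}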

With Android Room, how to delete more than 1000 items?

I got a crash when the number of ids is > 999:
android.database.sqlite.SQLiteException: too many SQL variables (code 1): ,
while compiling: delete from data where ids in (?,?,...)
From what I saw, there seems to be a max limit of 999 variables.
How can I delete more than 1000 items with Room?
Probably you have a list of ids to delete. Open a transaction, split the list into sublists, and execute the SQL delete operation once per sublist.
For more information, see the official Room documentation about transactions.
I didn't test the following code, but I think it accomplishes what you need.
@Dao
public interface DataDao {
    @Query("DELETE FROM data WHERE ids IN (:filterValues)")
    int delete(List<String> filterValues);

    @Transaction
    default void deleteData(List<Data> dataToDelete) {
        // split the list into chunks of 100 elements (change as you prefer, but keep it < 999)
        List<List<Data>> subLists = DeleteHelper.chopped(dataToDelete, 100);
        for (List<Data> list : subLists) {
            List<String> ids = new ArrayList<>();
            for (Data item : list) {
                ids.add(item.getId());
            }
            delete(ids);
        }
    }
}
public abstract class DeleteHelper {
// chops a list into non-view sublists of length L
public static <T> List<List<T>> chopped(List<T> list, final int L) {
List<List<T>> parts = new ArrayList<List<T>>();
final int N = list.size();
for (int i = 0; i < N; i += L) {
parts.add(new ArrayList<T>(
list.subList(i, Math.min(N, i + L)))
);
}
return parts;
}
}
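For illustration, a quick standalone check of the helper (the id values are made up):

import java.util.ArrayList;
import java.util.List;

public class ChoppedDemo {
    public static void main(String[] args) {
        // 250 ids chopped into chunks of 100 -> sizes 100, 100, 50,
        // each safely below SQLite's 999-variable limit
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add("id-" + i);
        }
        for (List<String> chunk : DeleteHelper.chopped(ids, 100)) {
            System.out.println(chunk.size());
        }
    }
}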
I hope this helps.
I think there are two ways to solve it.
First, chop your list and run the delete method multiple times (just like @xcesco answered).
Second, you can write a very long query and run it with @RawQuery:
@RawQuery
abstract int simpleRawQuery(SupportSQLiteQuery sqliteQuery);

@Transaction
public int deleteData(List<Long> pkList) {
    SimpleSQLiteQuery query = new SimpleSQLiteQuery(
            "DELETE FROM tb WHERE _id IN (" + StringUtils.join(pkList, ",") + ")");
    return simpleRawQuery(query);
}

What happens if corePoolSize of ThreadPoolExecutor is 0

I'm reading Efficient Android Threading.
It says,
With zero-core threads and a bounded queue that can hold 10 tasks, no tasks actually run until the 11th task is inserted, triggering the creation of a thread.
But when I try code such as,
int N = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        0,
        N * 2,
        60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(10));
for (int i = 1; i <= 5; ++i) {
    final int j = i;
    executor.execute(new Runnable() {
        @Override
        public void run() {
            Log.d("Debug", "Executed : " + j);
            SystemClock.sleep(1000);
        }
    });
    Log.d("Debug", "Queued : " + i);
}
The tasks are executed correctly even though there are only 5 tasks in the queue. What am I missing?
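For what it's worth, the quoted behavior matches very old JDKs. In current JDKs, ThreadPoolExecutor.execute() rechecks the pool after enqueuing a task and starts a worker if none are running, so tasks submitted to a zero-core pool run before the queue is full. A plain-Java sketch of the same experiment outside Android (the class name is made up; output interleaving will vary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ZeroCoreDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                0, 2, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(10));
        for (int i = 1; i <= 5; i++) {
            final int j = i;
            // execute() enqueues the task, then rechecks the pool: with zero
            // workers it starts one anyway, so the task is not stranded
            executor.execute(() -> System.out.println("Executed : " + j));
            System.out.println("Queued : " + i + ", pool size = " + executor.getPoolSize());
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}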

NHibernate: Saving different types of objects in the same session breaks batching

Newbie here, sorry if this is an obvious question.
It seems that saving different types of objects in the same session breaks batching, causing a significant performance drop.
The ID generator is set to Increment (as Diego Mijelshon advised, I tried hilo("100"), but unfortunately the same issue: Test1() is still about 5 times slower than Test2()):
public class CustomIdConvention : IIdConvention
{
public void Apply(IIdentityInstance instance)
{
instance.GeneratedBy.Increment();
}
}
AdoNetBatchSize is set to 1000:
MsSqlConfiguration.MsSql2008
.ConnectionString(connectionString)
.AdoNetBatchSize(1000)
.Cache(x => x
.UseQueryCache()
.ProviderClass<HashtableCacheProvider>())
.ShowSql();
These are the models:
public class TestClass1
{
public virtual int Id { get; private set; }
}
public class TestClass2
{
public virtual int Id { get; private set; }
}
These are the test methods. Test1() takes 62 seconds, Test2() takes only 11 seconds (as Phill advised, I tried stateless sessions, but unfortunately the same issue):
[TestMethod]
public void Test1()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
var y = new TestClass2();
session.Save(x);
session.Save(y);
}
transaction.Commit();
}
}
}
[TestMethod]
public void Test2()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
session.Save(x);
}
transaction.Commit();
}
}
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var y = new TestClass2();
session.Save(y);
}
transaction.Commit();
}
}
}
Any ideas?
Thanks!
Update:
The test project can be downloaded from here. You need to change the connectionString in the Main method. I changed all sessions to stateless sessions.
My results: Test1 = 59.11 s, Test2 = 7.60 s, Test3 = 7.72 s. Test1 is 7.7 times slower than Test2 and Test3!
Do not use increment. It's the worst possible generator.
Try changing it to HiLo.
Update:
It looks like the problem occurs when alternating saves of different entities, regardless of whether the sessions/transactions are separated or not.
This produces similar results to the second test method:
[TestMethod]
public void Test3()
{
int count = 50 * 1000;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
var x = new TestClass1();
session.Save(x);
}
for (int i = 0; i < count; i++)
{
var y = new TestClass2();
session.Save(y);
}
transaction.Commit();
}
}
}
My guess, without looking at NH's sources, is that it preserves the order because of possible relationships between the entities, even when there are none.
When you run Test2 and Test3, the inserts are batched together.
When you run Test1, where you alternate the inserts, they are issued as separate statements and are not batched together.
I found this out by profiling all three tests.
So, as per Diego's answer, NHibernate must preserve the order in which you insert and only batch adjacent statements together.
I wrote a 4th test: I set the batch size to 10, then alternated between TestClass1 and TestClass2 in groups of 5, so that I was doing 5 of TestClass1 and then 5 of TestClass2, to hit the batch size.
This pushed out batches of 5 in the order they were processed.
[TestMethod]
public void Test4()
{
int count = 10;
using (var session = SessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
for (int i = 0; i < count; i++)
{
if (i%2 == 0)
{
for (int j = 0; j < 5; j++)
{
var x = new TestClass1();
session.Save(x);
}
}
else
{
for (int j = 0; j < 5; j++)
{
var y = new TestClass2();
session.Save(y);
}
}
}
transaction.Commit();
}
}
Then I changed it to insert 3 at a time instead of 5. The batches came out in multiples of 3, so what must be happening is that the batch size lets a run of one entity type grow up to the specified amount, but only statements of the same type are grouped together; alternating types causes separate insert statements.
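The grouping rule described above (grow a batch up to the configured size, but flush whenever the entity type changes) can be pictured with a small sketch. This is conceptual Java, not NHibernate's actual implementation; the class and method names are made up:

import java.util.ArrayList;
import java.util.List;

// Conceptual model of the observed behavior: statements accumulate until
// either the batch size is reached or the entity type changes.
class OrderPreservingBatcher {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    private Class<?> currentType;

    OrderPreservingBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    void add(Class<?> entityType, String insertSql) {
        if (currentType != null && currentType != entityType) {
            flush(); // alternating types -> batches of 1 (Test1)
        }
        currentType = entityType;
        batch.add(insertSql);
        if (batch.size() >= batchSize) {
            flush(); // runs of the same type -> full batches (Test2/Test3/Test4)
        }
    }

    void flush() {
        if (!batch.isEmpty()) {
            System.out.println("executing batch of " + batch.size());
            batch.clear();
        }
        currentType = null;
    }
}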

NHibernate does not seem to be doing bulk inserts into PostgreSQL

I am interfacing with a PostgreSQL database with NHibernate.
Background
I made some simple tests... it seems it's taking 2 seconds to persist 300 records.
I have a Perl program with identical functionality that issues direct SQL instead; it takes only 70% of the time.
I am not sure if this is expected. I thought C#/NHibernate would be faster, or at least on par.
Questions
One of my observations is that (with show_sql turned on) NHibernate is issuing INSERTs a few hundred times, instead of doing a bulk INSERT that takes care of multiple rows. Note that I am assigning the primary key myself, not using the "native" generator.
Is that expected? Is there any way I could make it issue bulk INSERT statements instead? It seems to me that this could be one area where I could speed up performance.
As stachu found out correctly: NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql).
As stachu asks: did anybody manage to force NHibernate to do batch inserts to PostgreSQL?
I wrote a Batcher that doesn't use any of Npgsql's batching machinery, but instead manipulates the SQL string "old-school style" (INSERT INTO [..] VALUES (...),(...), ...):
using System;
using System.Collections;
using System.Data;
using System.Diagnostics;
using System.Text;
using Npgsql;
namespace NHibernate.AdoNet
{
public class PostgresClientBatchingBatcherFactory : IBatcherFactory
{
public virtual IBatcher CreateBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
{
return new PostgresClientBatchingBatcher(connectionManager, interceptor);
}
}
/// <summary>
/// Summary description for PostgresClientBatchingBatcher.
/// </summary>
public class PostgresClientBatchingBatcher : AbstractBatcher
{
private int batchSize;
private int countOfCommands = 0;
private int totalExpectedRowsAffected;
private StringBuilder sbBatchCommand;
private int m_ParameterCounter;
private IDbCommand currentBatch;
public PostgresClientBatchingBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
: base(connectionManager, interceptor)
{
batchSize = Factory.Settings.AdoBatchSize;
}
private string NextParam()
{
return ":p" + m_ParameterCounter++;
}
public override void AddToBatch(IExpectation expectation)
{
if(expectation.CanBeBatched && !(CurrentCommand.CommandText.StartsWith("INSERT INTO") && CurrentCommand.CommandText.Contains("VALUES")))
{
//NonBatching behavior
IDbCommand cmd = CurrentCommand;
LogCommand(CurrentCommand);
int rowCount = ExecuteNonQuery(cmd);
expectation.VerifyOutcomeNonBatched(rowCount, cmd);
currentBatch = null;
return;
}
totalExpectedRowsAffected += expectation.ExpectedRowCount;
log.Info("Adding to batch");
int len = CurrentCommand.CommandText.Length;
int idx = CurrentCommand.CommandText.IndexOf("VALUES");
int endidx = idx + "VALUES".Length + 2;
if (currentBatch == null)
{
// begin new batch.
currentBatch = new NpgsqlCommand();
sbBatchCommand = new StringBuilder();
m_ParameterCounter = 0;
string preCommand = CurrentCommand.CommandText.Substring(0, endidx);
sbBatchCommand.Append(preCommand);
}
else
{
//only append Values
sbBatchCommand.Append(", (");
}
//append values from CurrentCommand to sbBatchCommand
string values = CurrentCommand.CommandText.Substring(endidx, len - endidx - 1);
//get all values
string[] split = values.Split(',');
ArrayList paramName = new ArrayList(split.Length);
for (int i = 0; i < split.Length; i++ )
{
if (i != 0)
sbBatchCommand.Append(", ");
string param = null;
if (split[i].StartsWith(":")) //first named parameter
{
param = NextParam();
paramName.Add(param);
}
else if(split[i].StartsWith(" :")) //other named parameter
{
param = NextParam();
paramName.Add(param);
}
else if (split[i].StartsWith(" ")) //other fix parameter
{
param = split[i].Substring(1, split[i].Length-1);
}
else
{
param = split[i]; //first fix parameter
}
sbBatchCommand.Append(param);
}
sbBatchCommand.Append(")");
//rename & copy parameters from CurrentCommand to currentBatch
int iParam = 0;
foreach (NpgsqlParameter param in CurrentCommand.Parameters)
{
param.ParameterName = (string)paramName[iParam++];
NpgsqlParameter newParam = /*Clone()*/new NpgsqlParameter(param.ParameterName, param.NpgsqlDbType, param.Size, param.SourceColumn, param.Direction, param.IsNullable, param.Precision, param.Scale, param.SourceVersion, param.Value);
currentBatch.Parameters.Add(newParam);
}
countOfCommands++;
//check for flush
if (countOfCommands >= batchSize)
{
DoExecuteBatch(currentBatch);
}
}
protected override void DoExecuteBatch(IDbCommand ps)
{
if (currentBatch != null)
{
//Batch command now needs its terminator
sbBatchCommand.Append(";");
countOfCommands = 0;
log.Info("Executing batch");
CheckReaders();
//set prepared batchCommandText
string commandText = sbBatchCommand.ToString();
currentBatch.CommandText = commandText;
LogCommand(currentBatch);
Prepare(currentBatch);
int rowsAffected = 0;
try
{
rowsAffected = currentBatch.ExecuteNonQuery();
}
catch (Exception)
{
if(Debugger.IsAttached)
Debugger.Break();
throw;
}
Expectations.VerifyOutcomeBatched(totalExpectedRowsAffected, rowsAffected);
totalExpectedRowsAffected = 0;
currentBatch = null;
sbBatchCommand = null;
m_ParameterCounter = 0;
}
}
protected override int CountOfStatementsInCurrentBatch
{
get { return countOfCommands; }
}
public override int BatchSize
{
get { return batchSize; }
set { batchSize = value; }
}
}
}
I also found that NHibernate is not doing batch inserts into PostgreSQL.
I identified two possible reasons:
1) the Npgsql driver does not support batch inserts/updates (see forum)
2) NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql). I tried using the Devart dotConnect driver with NHibernate (I wrote a custom driver for NHibernate), but it still did not work.
I suppose this driver should also implement the IEmbeddedBatcherFactoryProvider interface, but that seems non-trivial to me (using the one for Oracle did not work ;) ).
Did anybody manage to force NHibernate to do batch inserts to PostgreSQL, or can you confirm my conclusion?