Tuple concurrently updated when granting permissions - sql

Struggling with database queries - not a db expert by any means, any help would be appreciated.
When dynamically creating databases and schemas, once in a while I get this error:
Unable to apply database grants.
io.vertx.core.impl.NoStackTraceThrowable: Error granting permission.
io.vertx.pgclient.PgException:
ERROR: tuple concurrently updated (XX000)
The role names, database names and schema names are substituted into the query strings in a separate place; I modified the code to pass the query strings directly to the transaction for simplicity.
The permissions being granted are as follows:
private static final String ERR_PERMISSION_GRANT_ERROR_MESSAGE = "Error granting permission. ";
private static final String ADVISORY_LOCK = "SELECT pg_try_advisory_lock("
+ String.valueOf(BigInteger.valueOf(Double.valueOf(Math.random()).longValue())) + ")";
private static final String CREATE_USER = "CREATE ROLE <role-name> LOGIN PASSWORD <pwd>;";
private static final String GRANT_PERMISSION1 = "GRANT CREATE, CONNECT ON DATABASE <db-name> TO <role-name>;";
private static final String GRANT_PERMISSION2 = "GRANT USAGE ON SCHEMA <schema-name> TO <role-name>;";
private static final String GRANT_PERMISSION3 = "GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA <schema-name> TO <role-name>";
private static final String GRANT_PERMISSION5 = "ALTER DEFAULT PRIVILEGES IN SCHEMA <schema-name> GRANT ALL ON SEQUENCES TO <role-name>;";
private static Promise<Boolean> grantDatabase(PgPool pool, String databaseName, String userName, String schemaName,
        Vertx vertx) {
    Promise<Boolean> promise = Promise.promise();
    pool.getConnection()
            // Transaction must use a connection
            .onSuccess(conn -> {
                // Begin the transaction
                conn.begin().compose(tx -> conn
                        // Various statements
                        .query(updateQueryString(ADVISORY_LOCK, databaseName, userName)).execute()
                        .compose(res1 -> conn
                                .query(updateQueryString(GRANT_PERMISSION1, databaseName, userName)).execute()
                                .compose(res2 -> conn
                                        .query(updateQueryString(GRANT_PERMISSION2, schemaName, userName)).execute()
                                        .compose(res3 -> conn
                                                .query(updateQueryString(GRANT_PERMISSION3, schemaName, userName)).execute()
                                                .compose(res4 -> conn
                                                        .query(updateQueryString(GRANT_PERMISSION5, schemaName, userName)).execute()))))
                        // Commit the transaction
                        .compose(res5 -> tx.commit()))
                        // Return the connection to the pool
                        .eventually(v -> conn.close())
                        .onSuccess(v -> promise.complete(Boolean.TRUE))
                        .onFailure(err -> promise.fail(ERR_PERMISSION_GRANT_ERROR_MESSAGE + err.getMessage()));
            });
    return promise;
}
How do I fix the tuple concurrently updated error in this case? I only have a single instance of my service running.
PostgreSQL v14.6 (Homebrew)
vertx-pg-client 4.3.8

You've probably already established that the error is caused by the non-zero chance of two of your threads trying to run those queries at the same time. They could also be competing with something else - as you mention, the code shown here isn't 1:1 what you're running, and you could have more grant/revoke/alter statements elsewhere.
I think your plan to use advisory locks is better than the alternative of establishing a separate "grant queue" or trying to track and lock system tables.
Locking approach
private static final String ADVISORY_LOCK = "SELECT pg_try_advisory_lock("
/* ... */
.query(updateQueryString(ADVISORY_LOCK, databaseName, userName)).execute()
.compose( /* ... */
You might want to change your advisory lock function.
pg_advisory_lock() would make it wait for the lock if it's not available.
pg_try_advisory_lock(), instead of making the client wait for the lock to become available, returns false. I don't see the code responding in any way to that result - true if it got the lock, false if it didn't - which means it just tries to acquire the lock and ignores the outcome, continuing regardless.
Both of the above obtain a session-level lock, so it won't be released unless you call pg_advisory_unlock() on the same ID. A lock obtained from pg_advisory_xact_lock() or pg_try_advisory_xact_lock() would be released automatically at commit/rollback.
With a standalone connection, conn.close() would end the session, which makes the database release both the session- and any transaction-level locks it held. With a pool, the underlying connection can live on after being released back to the pool, still holding its session-level locks, unless the pool happens to be configured to clean it up.
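If you do stick with session-level locks, one option is to release them all explicitly before the connection goes back to the pool - a sketch, not needed if you switch to the xact variants:
// Sketch: drop any session-level advisory locks this connection still holds before it is
// returned to the pool. pg_advisory_unlock_all() is a built-in Postgres function.
conn.query("SELECT pg_advisory_unlock_all()").execute()
        .eventually(v -> conn.close());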
ID used for locking
Your use of Math.random() seems to always result in 0: Math.random() returns a double in [0.0, 1.0), and the narrowing conversion in Double.longValue() truncates it to 0.
String.valueOf(             // BigInteger to String
    BigInteger.valueOf(     // long to BigInteger
        Double.valueOf(     // double boxed to Double
            Math.random()   // returns a value between 0.0 and 1.0
        ).longValue()       // Double to long, truncating it to 0
    )
)
Which means you're effectively already re-using a single, static ID.
But if the intent was to randomise the ID so that each thread uses a different, unique lock ID, they wouldn't be able to block each other. Threads need to use the same lock ID for the same "action" that could interfere with the other threads if they attempted it at the same time.
private static final String ADVISORY_LOCK = "SELECT pg_try_advisory_lock("
+ String.valueOf(BigInteger.valueOf(Double.valueOf(Math.random()).longValue()))
+ ")";
--Random lock ID generated as 99:
/*1st agent:*/ SELECT pg_try_advisory_lock(99);
--lock acquired, 1st agent proceeds to run its queries
--In parallel, 2nd agent gets a random ID of 77:
/*2nd agent:*/ SELECT pg_try_advisory_lock(77);
--77 wasn't locked, so it immediately proceeds to attempt the same action
--as the 1st agent, disregarding the fact that it can make them compete
--and result in `ERROR: tuple concurrently updated`
Aside from swapping pg_try_advisory_lock() for pg_advisory_xact_lock(), I think replacing that Math.random() with a static, arbitrary number will be enough:
private static final String ADVISORY_LOCK = "SELECT pg_advisory_xact_lock("
+ "123456789"
+ ")";
--now everyone trying to run those particular queries checks the same ID
/*1st agent:*/ SELECT pg_advisory_xact_lock(123456789);
--no one called dibs on that ID so far, so it's allowed to proceed
--In parallel, 2nd agent enters the same subroutine and asks about the same ID:
/*2nd agent:*/ SELECT pg_advisory_xact_lock(123456789);
--1st agent hasn't released the lock on that ID yet, so 2nd agent waits
If competing parts of your app were initialising their own Random() with the same, shared seed, or re-starting a shared Random(), they'd get the same ID - but that's only trading a predefined, static ID for a predefined seed.
Random, unique lock IDs could be useful to avoid accidental ID re-use for some unrelated action and to free you from having to keep track of what ID was used where. However, those IDs would have to be generated ahead of runtime or during each initialisation.
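Putting it together, here is a minimal sketch of how the grant routine could look with a transaction-level lock on a fixed ID. It assumes the same updateQueryString() helper and GRANT_PERMISSION constants from your code, uses Pool.withTransaction() from vertx-sql-client 4.x, and returns a Future<Void> just to keep the sketch short - treat it as an outline rather than a drop-in replacement:
// Sketch only - assumes the updateQueryString() helper and GRANT_PERMISSION* constants from the question.
// pg_advisory_xact_lock() blocks until the lock is free and releases it automatically at commit/rollback,
// so every caller serialises on the same arbitrary ID.
private static final String ADVISORY_XACT_LOCK = "SELECT pg_advisory_xact_lock(123456789)";

private static Future<Void> grantDatabase(PgPool pool, String databaseName, String userName, String schemaName) {
    // withTransaction() begins, commits/rolls back and returns the connection to the pool for us
    return pool.withTransaction(conn -> conn
            .query(ADVISORY_XACT_LOCK).execute()
            .compose(r1 -> conn.query(updateQueryString(GRANT_PERMISSION1, databaseName, userName)).execute())
            .compose(r2 -> conn.query(updateQueryString(GRANT_PERMISSION2, schemaName, userName)).execute())
            .compose(r3 -> conn.query(updateQueryString(GRANT_PERMISSION3, schemaName, userName)).execute())
            .compose(r4 -> conn.query(updateQueryString(GRANT_PERMISSION5, schemaName, userName)).execute())
            .mapEmpty());
}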

Related

Querying over large data under NHibernate transaction

I understand that explicit transactions should be used even for reading data, but I am unable to understand why the code below runs much slower under an NHibernate transaction (as opposed to running without one):
session.BeginTransaction();
var result = session.Query<Order>().Where(o=>o.OrderNumber > 0).Take(100).ToList();
session.Transaction.Commit();
I can post more detailed UT code if needed but if I am querying over 50,000 Order records, it takes about 1 sec for this query to run under NHibernate's explicit transaction, and it takes only about 15/20 msec without one.
Update 1/15/2019
Here is the detailed code
[Test]
public void TestQueryLargeDataUnderTransaction()
{
int count = 50000;
using (var session = _sessionFactory.OpenSession())
{
Order order;
// write large amount of data
session.BeginTransaction();
for (int i = 0; i < count; i++)
{
order = new Order {OrderNumber = i, OrderDate = DateTime.Today};
OrderLine ol1 = new OrderLine {Amount = 1 + i, ProductName = $"sun screen {i}", Order = order};
OrderLine ol2 = new OrderLine {Amount = 2 + i, ProductName = $"banjo {i}", Order = order};
order.OrderLines = new List<OrderLine> {ol1, ol2};
session.Save(order);
session.Save(ol1);
session.Save(ol2);
}
session.Transaction.Commit();
Stopwatch s = new Stopwatch();
s.Start();
// read the same data
session.BeginTransaction();
var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Skip(0).Take(100).ToList();
session.Transaction.Commit();
s.Stop();
Console.WriteLine(s.ElapsedMilliseconds);
}
}
Your for-loop iterates 50000 times and for each iteration it creates 3 objects. So by the time you reach the first call to Commit(), the session knows about 150000 objects that it will flush to the database at Commit time (or earlier) (subject to your id generator policy and flush mode).
So far, so good. NHibernate is not necessarily optimised to handle so many objects in the session, but it can be acceptable providing one is careful.
On to the problem...
It's important to realize that committing the transaction does not remove the 150000 objects from the session.
When you later perform the query, it will notice that it is inside a transaction, in which case, by default, "auto-flushing" will be performed. This means that before sending the SQL query to the database, NHibernate will check if any of the objects known to the session has changes that might affect the outcome of the query (this is somewhat simplified). If such changes are found, they will be transmitted to the database before performing the actual SQL query. This ensures that the executed query will be able to filter based on changes made in the same session.
The extra second you notice is the time it takes for NHibernate to iterate over the 150000 objects known to the session to check for any changes. The primary use cases for NHibernate rarely involve more than tens or a few hundred objects, in which case the time needed to check for changes is negligible.
You can use a new session for the query to not see this effect, or you can call session.Clear() immediately after the first commit. (Note that for production code, session.Clear() can be dangerous.)
Additional: the auto-flushing happens when querying, but only inside a transaction. This behaviour can be controlled using session.FlushMode. During auto-flush NHibernate will aim to flush only the objects that may affect the outcome of the query (i.e. it considers which database tables the query touches).
There is an additional effect to be aware of with regards to keeping sessions around. Consider this code:
using (var session = _sessionFactory.OpenSession())
{
Order order;
session.BeginTransaction();
for (int i = 0; i < count; i++)
{
// Your code from above.
}
session.Transaction.Commit();
// The order variable references the last order created. Let's modify it.
order.OrderDate = DateTime.Today.AddDays(4);
session.BeginTransaction();
var result = session.Query<Order>().Skip(0).Take(100).ToList();
session.Transaction.Commit();
}
What will happen with the change to the order date done after the first call to Commit()? That change will be persisted to the database when the query is performed in the second transaction despite the fact that the object modification itself happened before the transaction was started. Conversely, if you remove the second transaction, that modification will not be persisted of course.
There are multiple ways to manage sessions and transactions that can be used for different purposes. However, by far the easiest is to always follow this simple unit-of-work pattern (sketched in code after the list):
Open session.
Immediately open transaction.
Perform a reasonable amount of work.
Commit or rollback transaction.
Dispose transaction.
Dispose session.
Discard all objects loaded using the session. At this point they can still be used in memory, but any changes will not be persisted. Safer to just get rid of them.
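For illustration only, here is that pattern sketched against Hibernate's Java API, which NHibernate closely mirrors; sessionFactory and the mapped Order entity are assumed from the question, so adapt freely:
// Unit-of-work sketch: one short-lived session, one transaction, then discard everything.
Session session = sessionFactory.openSession();              // 1. open session
Transaction tx = session.beginTransaction();                 // 2. immediately open transaction
try {
    List<Order> result = session
            .createQuery("from Order o where o.orderNumber > 0", Order.class)
            .setMaxResults(100)
            .list();                                         // 3. a reasonable amount of work
    tx.commit();                                             // 4. commit ...
} catch (RuntimeException e) {
    tx.rollback();                                           // ... or roll back
    throw e;
} finally {
    session.close();                                         // 5-6. dispose transaction and session
}
// 7. the loaded Order objects can still be read, but further changes won't be persisted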

Data is not properly stored to hsqldb when using pooled data source by dbcp

I'm using hsqldb to create cached tables and indexed tables.
The data being stored has pretty high frequency so I need to use a connection pool.
Also because there is a lot of data I do not call checkpoint on every commit, but rather expect the data to be flushed after 50,000 rows are inserted.
So the thing is that I can see the .data file is growing but when I connect with hsqldb client I don't see the tables and the data.
So I had 2 simple tests, one inserted single row and one inserted 60,000 rows to new table. In both cases I couldn't see the result in any hsqldb client.
(Note that I use shutdown=true)
So when I add a checkpoint after each commit, it solves the problem.
Also, if I specify in the connection string to use the log, it solves the problem (I don't want the log in production though). Not using a pooled connection also solves the problem, and so does using the pooled data source but explicitly closing it before shutdown.
So I guess that some connections in the connection pool are not being closed, preventing the db from somehow committing the changes and making them available for the client. But then, why couldn't I see the result even with 60,000 rows?
I also would expect the pool to be closed automatically...
What am I doing wrong? What is happening behind the scene?
The code to get the data source looks like this:
Class.forName("org.hsqldb.jdbcDriver");
String url = "jdbc:hsqldb:" + m_dbRoot + dbName + "/db" + ";hsqldb.log_data=false;shutdown=true;hsqldb.nio_data_file=false";
ConnectionFactory connectionFactory = new DriverManagerConnectionFactory(url, user, password);
GenericObjectPool connectionPool = new GenericObjectPool();
KeyedObjectPoolFactory stmtPool = new GenericKeyedObjectPoolFactory(null);
new PoolableConnectionFactory(connectionFactory, connectionPool, stmtPool, null, false, true);
DataSource ds = new PoolingDataSource(connectionPool);
And I'm using this Pooled data source to create table:
Connection c = m_dataSource.getConnection();
Statement st = c.createStatement();
String script = String.format("CREATE CACHED TABLE IF NOT EXISTS %s (id %s NOT NULL, entity %s NOT NULL, PRIMARY KEY (id));", m_tableName, m_idGenerator.getIdType(), TABLE_ENTITY_TYPE);
st.execute(script);
st.close();
c.close();
And insert rows:
Connection c = m_dataSource.getConnection();
c.setAutoCommit(false);
PreparedStatement stmt = c.prepareStatement(m_sqlInsert);
stmt.setObject(1, id);
stmt.setBinaryStream(2, Serializer.Helper.serialize(m_serializer, entity));
stmt.executeUpdate();
c.commit();
stmt.close();
c.close();
So the above seems to add data, but it cannot be seen.
When I explicitly called
connectionPool.close();
Then and only then I could see the result.
I also tried to use JDBCDataSource and it worked as well.
So what is going on? And what is the right way to do this?
Your method of accessing the database from outside your application process is simply wrong.
Only one Java process is supposed to connect to a file: database.
In order to achieve your aim, launch an HSQLDB server within your application, using exactly the same JDBC URL. Then connect to this server from the external client.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/listeners-chapt.html#lsc_app_start
Update: The OP commented that the external client was used after the application had stopped. Because you have turned the log off with hsqldb.log_data=false, nothing is persisted permanently. You need to perform an explicit CHECKPOINT or SHUTDOWN when your application completes its work. You cannot rely on shutdown=true at all, even without connection pooling.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_bulk_operations
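As a minimal sketch (plain JDBC, reusing the ds and connectionPool objects built in your setup code), the end-of-work step could look like this:
// Sketch only - ds and connectionPool are the objects created in the question's setup code.
// With hsqldb.log_data=false there is no redo log, so an explicit CHECKPOINT (or SHUTDOWN)
// is what actually makes the inserted rows durable in the database files.
try (Connection c = ds.getConnection();
     Statement st = c.createStatement()) {
    st.execute("CHECKPOINT");   // or "SHUTDOWN" once the application is completely done
} catch (SQLException e) {
    // handle/log - without this statement the latest changes may not survive a process exit
}

// Close the DBCP pool so no pooled sessions linger and keep the database files open.
try {
    connectionPool.close();
} catch (Exception e) {
    // log and move on - we're shutting down anyway
}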

Using SQL dB column as a lock for concurrent operations in Entity Framework

We have a long running user operation that is handled by a pool of worker processes. Data input and output is from Azure SQL.
The master Azure SQL table structure columns are approximated to
[UserId, col1, col2, ... , col N, beingProcessed, lastTimeProcessed ]
beingProcessed is a boolean and lastTimeProcessed is a DateTime. The logic in every worker role is as shown below, and with multiple workers processing (each with their own Entity Framework layer), beingProcessed is in essence being used as a lock for mutual-exclusion purposes.
Question: How can I deal with concurrency issues on the beingProcessed "lock" itself under the above load? I think the read-modify-write operation on beingProcessed needs to be atomic, but I'm open to other strategies. Open to other code refinements too.
[Update]: I wonder if TransactionScope is what's needed here ... http://msdn.microsoft.com/en-US/library/system.transactions.transactionscope(v=vs.110).aspx
Code:
public void WorkerRoleMain()
{
while(true)
{
try
{
dbContext db = new dbContext();
// Read
foreach (UserProfile user in db.UserProfile
.Where(u => DateTime.UtcNow.Subtract(u.lastTimeProcessed)
> TimeSpan.FromHours(24) &
u.beingProcessed == false))
{
user.beingProcessed = true; // Modify
db.SaveChanges(); // Write
// Do some long drawn processing here
...
...
...
user.lastTimeProcessed = DateTime.UtcNow;
user.beingProcessed = false;
db.SaveChanges();
}
}
catch(Exception ex)
{
LogException(ex);
Sleep(TimeSpan.FromMinutes(5));
}
} // while ()
}
What we usually do is this:
At the beginning of a long operation we start a transaction:
BEGIN TRANSACTION
Then we select a row from the table we would like to update/delete using these hints:
SELECT * FROM Table WITH (ROWLOCK, NOWAIT) Where ID = 123;
Then we check that we have the row. If the row is locked by another process there will be an SQL Error. In this case we rollback the transaction and advise the user.
If we did get the row (it is not locked by another process), we process the record and do the required updates, using the same transaction object we used to lock the record:
UPDATE Table SET Col1='value' WHERE ID = 123;
Then we COMMIT the transaction.
COMMIT;
This is just the Pseudo-code of the process. You will have to implement it in your program.
One small note regarding the above process: when you lock the record in SQL Server (or Azure SQL), use the primary key in your WHERE clause, otherwise SQL Server may decide to use a page lock or a table lock.
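If it helps to see that flow end to end, here is the same pattern sketched with plain JDBC - illustration only, since the question uses Entity Framework; the method name, DataSource parameter and the int UserId type are stand-ins, while the table, columns and hints come from the answer above:
// Illustration of the pattern above with plain JDBC; table/column names come from the question,
// everything else (method, DataSource, id type) is a stand-in.
static boolean tryProcess(DataSource ds, int userId) throws SQLException {
    try (Connection con = ds.getConnection()) {
        con.setAutoCommit(false);                                    // BEGIN TRANSACTION
        try (PreparedStatement lock = con.prepareStatement(
                 "SELECT UserId FROM UserProfile WITH (ROWLOCK, NOWAIT) WHERE UserId = ?");
             PreparedStatement upd = con.prepareStatement(
                 "UPDATE UserProfile SET beingProcessed = 1 WHERE UserId = ?")) {
            lock.setInt(1, userId);
            lock.executeQuery();                                     // fails fast if another worker holds the row lock
            upd.setInt(1, userId);
            upd.executeUpdate();
            con.commit();                                            // COMMIT
            return true;
        } catch (SQLException rowBusy) {
            con.rollback();                                          // could not claim the row: advise the user / retry later
            return false;
        }
    }
}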

SELECT through oledbcommand in vb.net not picking up recent changes

I'm using the following code to work out the next unique Order Number in an access database. ServerDB is a "System.Data.OleDb.OleDbConnection"
Dim command As New OleDb.OleDbCommand("", serverDB)
command.CommandText = "SELECT max (ORDERNO) FROM WORKORDR"
iOrder = command.ExecuteScalar()
NewOrderNo = (iOrder + 1)
If I subsequently create a WORKORDR (using a different DB connection), the code will not pick up the new "next order number."
e.g.
iFoo = NewOrderNo
CreateNewWorkOrderWithNumber(iFoo)
iFoo2 = NewOrderNo
will return the same value to both iFoo and iFoo2.
If I Close and then reopen serverDB, as part of the "NewOrderNo" function, then it works. iFoo and iFoo2 will be correct.
Is there any way to force a "System.Data.OleDb.OleDbConnection" to refresh the database in this situation without closing and reopening the connection.
e.g. Is there anything equivalent to serverdb.refresh or serverdb.FlushCache
How I create the order.
I wondered if this could be caused by not updating my transactions after creating the order. I'm using an XSD for the order creation, and the code I use to create the record is ...
Sub CreateNewWorkOrderWithNumber(ByVal iNewOrder As Integer)
Dim OrderDS As New CNC
Dim OrderAdapter As New CNCTableAdapters.WORKORDRTableAdapter
Dim NewWorkOrder As CNC.WORKORDRRow = OrderDS.WORKORDR.NewWORKORDRRow
NewWorkOrder.ORDERNO = iNewOrder
NewWorkOrder.name = "lots of fields filled in here."
OrderDS.WORKORDR.AddWORKORDRRow(NewWorkOrder)
OrderAdapter.Update(NewWorkOrder)
OrderDS.AcceptChanges()
End Sub
From MSDN:
Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread to main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.
If you possibly can, just use one connection.
Back in VB6 you could force the connection to refresh itself using ADO. I don't know whether it's possible with VB.NET. My Google-fu seems to be weak today.
You can change the PageTimeout value in the registry but that will affect all programs on the computer that use the Jet engine (i.e. programmatic use of Access databases)
I always throw away a Connection object after I've used it. Due to connection pooling, getting a new connection is cheap.

Transaction timeout expired while using Linq2Sql DataContext.SubmitChanges()

please help me resolve this problem:
There is an ambient MSMQ transaction. I'm trying to use a new transaction for logging, but I get the following error when attempting to submit changes: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." Here is the code:
public static void SaveTransaction(InfoToLog info)
{
using (TransactionScope scope =
new TransactionScope(TransactionScopeOption.RequiresNew))
{
using (TransactionLogDataContext transactionDC =
new TransactionLogDataContext())
{
transactionDC.MyInfo.InsertOnSubmit(info);
transactionDC.SubmitChanges();
}
scope.Complete();
}
}
Please help me.
Thx.
You could consider increasing the timeout or eliminating it altogether.
Something like:
using(TransactionLogDataContext transactionDC = new TransactionLogDataContext())
{
transactionDC.CommandTimeout = 0; // No timeout.
}
Be careful
You said:
thank you. but this solution makes new question - if transaction scope was changed why submit operation becomes so time consuming? Database and application are on the same machine
That is because you are creating a new DataContext right there:
TransactionLogDataContext transactionDC = new TransactionLogDataContext())
With a new data context, ADO.NET opens up a new connection (even if the connection strings are the same, unless you do some clever connection pooling).
When you try to work with more than one connection inside a transaction scope (which you just did), ADO.NET automatically promotes the transaction to a distributed transaction and tries to enlist it in MSDTC. Enlisting the very first transaction per connection into MSDTC takes time (for me it takes 30+ seconds); consecutive transactions are fast, however (in my case 60 ms). Take a look at this: http://support.microsoft.com/Default.aspx?id=922430
What you can do is reuse the transaction and connection (if possible) when you create the new DataContext:
TransactionLogDataContext tempDataContext =
new TransactionLogDataContext(ExistingDataContext.Transaction.Connection);
tempDataContext.Transaction = ExistingDataContext.Transaction;
Where ExistingDataContext is the one which started ambient transaction.
Or attempt to speed up your MSDTC.
Also, do use the SQL Profiler suggested by billb and look at the SessionId across the different commands (save and savelog in your case). If the SessionId changes, you are in fact using two different connections, and in that case you will have to reuse the transaction (if you don't want it to be promoted to MSDTC).