enq: TX - row lock contention during update in RAC

My first posting on stackoverflow :)
Yesterday I observed a lot of latency on a 4-node RAC. The update statement:
UPDATE ACT_RU_JOB
SET REV_ = :1,
LOCK_EXP_TIME_ = :2,
LOCK_OWNER_ = :3,
DUEDATE_ = :4,
PROCESS_INSTANCE_ID_ = :5,
EXCLUSIVE_ = :6
WHERE ID_ = :7 AND REV_ = :8
There were 8 sessions, each running the same update and waiting on the event enq: TX - row lock contention.
The AWR report shows it was the top SQL statement by elapsed time.
This is a third-party application, so changing the code is not practical. How do I resolve this wait event? It happens when the application is busy and does not happen in quiet periods.
For the table in question, the AWR report shows some ITL waits, with a VALUE of 7 and a %CAPTURE of 9.86%. Does that justify increasing INITRANS of the table from 1 to a higher value?
Any help will be appreciated.
There was a thought of killing one of the sessions, but simply waiting resolved the issue.

SQL queries returning incorrect results during High Load

I have a table that receives inserts at the start of a job during performance runs. While the inserts are running, parallel GET/UPDATE queries also run against the table. The GET operation also updates a column to mark the record as picked. However, the next GET on the table returns the same record again, even though it was already marked as in progress.
P.S. Both operations are performed by the same single thread. In the logs below, the record is marked in progress on the first line at 20:36:42,864, yet it is returned in the result set of the query executed after 20:36:42,891 by the same thread.
We also observed that under high load (usually in the same scenario as above) some update operations intermittently did not take effect on the table, even though the update executed successfully without throwing an exception (validated using the returned result and then doing a GET right afterwards to check the updated value).
13 Apr 2020 20:36:42,864 [SHT-4083-initial] FINEST - AbstractCacheHelper.markContactInProgress:2321 - Action state after mark in progresss contactId.ATTR=: 514409 for jobId : 4083 is actionState : 128
13 Apr 2020 20:36:42,891 [SHT-4083-initial] FINEST - CacheAdvListMgmtHelper.getNextContactToProcess:347 - Query : select priority, contact_id, action_state, pim_contact_store_id, action_id
, retry_session_id, attempt_type, zone_id, action_pos from pim_4083 where handler_id = ? and attempt_type != ? and next_attempt_after <= ? and action_state = ? and exclude_flag = ? order
by attempt_type desc, priority desc, next_attempt_after asc,contact_id asc limit 1
This usually happens during performance runs, when parallel jobs working on Ignite are started. Can anyone suggest what can be done to avoid this situation?
We have 2 Ignite data nodes, deployed as Spring Boot services in the cluster, which are accessed by 3 client nodes.
Ignite version: 2.7.6. The cache configuration is as follows:
IgniteConfiguration cfg = new IgniteConfiguration();
CacheConfiguration cachecfg = new CacheConfiguration(CACHE_NAME);
cachecfg.setRebalanceThrottle(100);
cachecfg.setBackups(1);
cachecfg.setCacheMode(CacheMode.REPLICATED);
cachecfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
cachecfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cachecfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
// Defining and creating a new cache to be used by Ignite Spring Data repository.
CacheConfiguration ccfg = new CacheConfiguration(CACHE_TEMPLATE);
ccfg.setStatisticsEnabled(true);
ccfg.setCacheMode(CacheMode.REPLICATED);
ccfg.setBackups(1);
DataStorageConfiguration dsCfg = new DataStorageConfiguration();
dsCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
dsCfg.setStoragePath(storagePath);
dsCfg.setWalMode(WALMode.FSYNC);
dsCfg.setWalPath(walStoragePath);
dsCfg.setWalArchivePath(archiveWalStoragePath);
dsCfg.setWriteThrottlingEnabled(true);
cfg.setAuthenticationEnabled(true);
dsCfg.getDefaultDataRegionConfiguration()
.setInitialSize(Long.parseLong(cacheInitialMemSize) * 1024 * 1024);
dsCfg.getDefaultDataRegionConfiguration().setMaxSize(Long.parseLong(cacheMaxMemSize) * 1024 * 1024);
cfg.setDataStorageConfiguration(dsCfg);
cfg.setClientConnectorConfiguration(clientCfg);
// Run the command to alter the default user credentials
// ALTER USER "ignite" WITH PASSWORD 'new_passwd'
cfg.setCacheConfiguration(cachecfg);
cfg.setFailureDetectionTimeout(Long.parseLong(cacheFailureTimeout));
ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
ccfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
ccfg.setRebalanceThrottle(100);
int pool = cfg.getSystemThreadPoolSize();
cfg.setRebalanceThreadPoolSize(2);
cfg.setLifecycleBeans(new MyLifecycleBean());
logger.info(methodName, "Starting ignite service");
ignite = Ignition.start(cfg);
ignite.cluster().active(true);
// Get all server nodes that are already up and running.
Collection<ClusterNode> nodes = ignite.cluster().forServers().nodes();
// Set the baseline topology that is represented by these nodes.
ignite.cluster().setBaselineTopology(nodes);
ignite.addCacheConfiguration(ccfg);

Synchronization between 2 applications polling a SQL table

I have 2 instances of a VB.NET application, each running on its own dedicated server. The application runs a While True loop with a 5 s sleep when idle (idle means the table doesn't have any ProcessQuery to be treated). On each iteration, the application queries a table in the SQL database to find out whether there is anything it can process.
The problem is that both instances sometimes "take" the same ProcessQuery.
I'm using EntityFramework6. I have looked into EntityState, but I don't think it does exactly what I'm trying to accomplish.
I was wondering what my solution would be to have perfectly parallel instances. It's not impossible that at some point I will have 12 instances running on 12 machines.
Thanks!
Dim conn As New Info_IndusEntities()
' Pick the oldest request still waiting to be processed for this site
Dim DemandeWilma As WilmaDemandes = conn.WilmaDemandes.Where(Function(x) x.Site = "LONDON" AndAlso x.Statut = "toProcess").OrderBy(Function(x) x.RequestDate).FirstOrDefault()
If DemandeWilma IsNot Nothing Then
    ' Mark the request as taken by this server
    DemandeWilma.Statut = Statuts.EnTraitement.ToString()
    DemandeWilma.ServerName = Environment.MachineName
    DemandeWilma.ProcessDate = DateTime.Now
    conn.SaveChanges()
    Return DemandeWilma
End If
UPDATE (21/06/19)
I found an article that I find interesting.
I started by adding a column to my Table :
UPDATED (21/06/19)
I then refreshed my model and changed the Concurrency Check property of RowVersion column in my ORM :
When I tested the update, here's the log of EF6 :
UPDATE [dbo].[WilmaDemandes] SET [Statut] = @0, [ServerName] = @1, [DateDebut] = @2
WHERE (([ID] = @3) AND ([RowVersion] = @4))
SELECT [RowVersion] FROM [dbo].[WilmaDemandes] WHERE @@ROWCOUNT > 0 AND [ID] = @3
-- @0: 'EnTraitement' (Type = String, Size = 20)
-- @1: 'TRB5995' (Type = String, Size = 20)
-- @2: '2019-06-25 7:31:01 AM' (Type = DateTime2)
-- @3: '124373' (Type = Int32)
-- @4: 'System.Byte[]' (Type = Binary, Size = 8)
-- Executing at 2019-06-25 7:31:24 AM -04:00
-- Completed in 95 ms with result: SqlDataReader
Closed connection at 2019-06-25 7:31:24 AM -04:00
Exception thrown:
'System.Data.Entity.Infrastructure.DbUpdateConcurrencyException' in
EntityFramework.dll
UPDATED (25/06/19)
The problems, as explained in this post, start when you are using DB-First instead of Code-First. Your property will get silently overwritten as soon as you update the model. Some people back then coded a console app workaround that they run as a pre-build step. I'm not sure I'm ready to accept this as the final solution.
Interesting tutorial on how to test optimistic concurrency and ways to resolve such an exception.
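The UPDATE that EF6 logged above carries the row version in its WHERE clause, so a stale writer matches zero rows. Here is a minimal sketch of that check in Python with an in-memory SQLite database. The table name demandes, the integer row_version column and the function try_claim are made up for illustration; this is not the EF6 implementation, only the same idea.

```python
import sqlite3

# Hypothetical table: an integer row_version stands in for SQL Server's rowversion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demandes (id INTEGER PRIMARY KEY, statut TEXT, "
    "server_name TEXT, row_version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO demandes (id, statut) VALUES (1, 'toProcess')")
conn.commit()


def try_claim(conn, row_id, seen_version, server_name):
    """Return True only if the row still has the version we read earlier."""
    cur = conn.execute(
        "UPDATE demandes SET statut = 'EnTraitement', server_name = ?, "
        "row_version = row_version + 1 "
        "WHERE id = ? AND row_version = ?",
        (server_name, row_id, seen_version),
    )
    conn.commit()
    # Zero affected rows means someone else changed the row first.
    return cur.rowcount == 1


# Both workers read version 1 before either of them writes.
first = try_claim(conn, 1, 1, "SERVER-A")
second = try_claim(conn, 1, 1, "SERVER-B")
print(first, second)
```

The second caller gets False where EF6 would raise DbUpdateConcurrencyException; the worker can then simply look for another row.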
1) Add an "owner" column to your queue table
2) Your application updates one record (TOP 1) and sets the owner value to its identifier (WHERE Owner IS NULL)
3) Now your application goes back and reads the rows it owns and processes them
It's a simple pattern and it works great. If any processes happen to take ownership 'simultaneously', only one will actually get the reservation.
I'm not very good at LINQ, so here's a brute-force method, multiline for clarity:
' First try reserving a row
conn.Database.ExecuteSqlCommand(
    "WITH UpdateTop1 AS
     (SELECT TOP 1 * FROM WilmaDemandes
      WHERE Owner IS NULL
      AND Site = 'LONDON'
      ORDER BY RequestDate)
     UPDATE UpdateTop1 SET Owner = 'ThisApplication'"
)
' See if we got one
Dim DemandeWilma As WilmaDemandes =
    conn.WilmaDemandes.
    Where(Function(x) x.Owner = "ThisApplication").FirstOrDefault()
' If we got a row, process it. Otherwise idle and repeat
There's also no reason that you must reserve only one row. You could reserve all the free rows and work your way through them. Meanwhile, other processes will pick up any rows that arrive later.
Personally, I would refactor your status column: make it NULL for new records that are ready to be processed, and otherwise store the ID of the worker that has reserved the record.
It also helps to add things like timestamp columns to record when the row was reserved etc.
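The reservation pattern described in this answer can be sketched in Python with SQLite standing in for SQL Server. The table name queue, the helper names reserve_next and owned_rows, and the worker IDs are assumptions for illustration only; the point is that the claim is a single UPDATE guarded by owner IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, site TEXT, "
    "request_date TEXT, owner TEXT)"
)
conn.executemany(
    "INSERT INTO queue (id, site, request_date) VALUES (?, ?, ?)",
    [(1, "LONDON", "2019-06-20"), (2, "LONDON", "2019-06-21")],
)
conn.commit()


def reserve_next(conn, worker_id, site):
    """Try to claim the oldest unowned row; return True if a row was claimed."""
    cur = conn.execute(
        "UPDATE queue SET owner = ? "
        "WHERE owner IS NULL AND id = ("
        "  SELECT id FROM queue WHERE owner IS NULL AND site = ? "
        "  ORDER BY request_date LIMIT 1)",
        (worker_id, site),
    )
    conn.commit()
    return cur.rowcount == 1


def owned_rows(conn, worker_id):
    """Read back the ids of the rows this worker has reserved."""
    rows = conn.execute(
        "SELECT id FROM queue WHERE owner = ? ORDER BY request_date", (worker_id,)
    )
    return [row[0] for row in rows]


got_a = reserve_next(conn, "worker-A", "LONDON")
got_b = reserve_next(conn, "worker-B", "LONDON")
got_again = reserve_next(conn, "worker-A", "LONDON")  # queue is now empty
print(owned_rows(conn, "worker-A"), owned_rows(conn, "worker-B"), got_again)
```

When reserve_next returns False there is nothing free, so the worker idles and retries later.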

Using multiple threads for DB updates results in higher write time per update

So I have a script that is supposed to update a giant table (Postgres). Since the table has about 150m rows and I want to complete this as fast as possible, using multiple threads seemed like a perfect answer. However, I'm seeing something very weird.
When I use a single thread, the write time to an update is much much lower than when I use multiple threads.
require 'sequel'
.....
DB = Sequel.connect(DB_CREDS)
queue = Queue.new
read_query = query = DB["
SELECT id, extra_fields
FROM objects
WHERE XYZ IS FALSE
"]
read_query.use_cursor(:rows_per_fetch => 1000).each do |row|
queue.push(row)
end
Up until this point, IMO it shouldn't matter because we're just reading stuff from the DB and it has nothing to do with writing. From here, I've tried two approaches. Single-threaded and Multi-threaded.
NOTE - This is not the actual UPDATE query that I want to execute, it's just a pseudo one for demonstration purposes. The actual query is a lot longer and plays with JSON and stuff so I can't really update the entire table using a single query.
Single-threaded
until queue.empty?
photo = queue.shift
id = photo[:id]
update_query = DB["
UPDATE objects
SET XYZ = TRUE
WHERE id = #{id}
"]
result = update_query.update
end
If I execute this, I see in my DB logs that each update query takes less than 0.01 seconds:
I, [2016-08-15T10:45:48.095324 #54495] INFO -- : (0.001441s) UPDATE
objects SET XYZ = TRUE WHERE id = 84395179
I, [2016-08-15T10:45:48.103818 #54495] INFO -- : (0.008331s) UPDATE
objects SET XYZ = TRUE WHERE id = 84395181
I, [2016-08-15T10:45:48.106741 #54495] INFO -- : (0.002743s) UPDATE
objects SET XYZ = TRUE WHERE id = 84395182
Multi-threaded
MAX_THREADS = 5
num_threads = 0
all_threads = []
until queue.empty?
if num_threads < MAX_THREADS
photo = queue.shift
num_threads += 1
all_threads << Thread.new {
id = photo[:id]
update_query = DB["
UPDATE objects
SET XYZ = TRUE
WHERE id = #{id}
"]
result = update_query.update
num_threads -= 1
Thread.exit
}
end
end
all_threads.each do |thread|
thread.join
end
Now, in theory it should be faster, right? But each update takes about 0.5 seconds. I'm surprised that this is the case.
I, [2016-08-15T11:02:10.992156 #54583] INFO -- : (0.414288s)
UPDATE objects
SET XYZ = TRUE
WHERE id = 119498834
I, [2016-08-15T11:02:11.097004 #54583] INFO -- : (0.622775s)
UPDATE objects
SET XYZ = TRUE
WHERE id = 119498641
I, [2016-08-15T11:02:11.097074 #54583] INFO -- : (0.415521s)
UPDATE objects
SET XYZ = TRUE
WHERE id = 119498826
Any ideas on:
Why is this happening?
How can I increase the update speed for the multi-threaded approach?
1) Have you configured Sequel so that it has a connection pool of 5 connections?
2) Have you considered doing multiple updates per call via an IN clause?
If you haven't done 1, you have N threads fighting over N-n connections, which equates to resource starvation, which is a classic concurrency issue.
Your example can be reduced to: DB[:objects].where(:XYZ=>false).update(:XYZ=>true)
I'm guessing your actual need is not that simple. But the same approach may still work. Instead of issuing a query per row, use a single query to update all related rows.
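As a rough illustration of the "multiple updates per call via an IN clause" idea, here is a sketch in Python with an in-memory SQLite table. The table layout, the chunks helper and the chunk size of 500 are assumptions for the example, not values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, xyz INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO objects (id) VALUES (?)", [(i,) for i in range(1, 2501)])
conn.commit()


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


ids = [row[0] for row in conn.execute("SELECT id FROM objects WHERE xyz = 0")]
statements = 0
for batch in chunks(ids, 500):
    # One UPDATE per batch of ids instead of one UPDATE per row.
    placeholders = ",".join("?" * len(batch))
    conn.execute(f"UPDATE objects SET xyz = 1 WHERE id IN ({placeholders})", batch)
    statements += 1
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM objects WHERE xyz = 0").fetchone()[0]
print(statements, remaining)
```

With 2,500 rows and batches of 500, this issues 5 statements instead of 2,500.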
I went through something similar on a project ("import all history from a legacy database into a new one with completely different structure and organization"). Unless you managed to shoot yourself in the foot somewhere else, you have 2 basic bottlenecks to look for:
the database's disk IO
the ruby process' CPU
Some suggestions,
database IO: use DB transactions and update about 1000 records per transaction (you can tweak the exact number, but 1000 is usually good). A huge DB table usually means a lot of indexes too, and without an explicit transaction every single UPDATE is committed on its own, so each row pays the commit overhead (a WAL flush to disk) on top of the index maintenance. A transaction lets you push 1000 updated records and pay the commit cost once per batch; the result is MUCH faster (something like an order of magnitude)
database IO: change indexes, drop every index you can live without during the update process, ideally you will have only 1 very streamlined index which allows unique row lookups for update purposes
ruby CPU: unless you are using JRuby or Rubinius, or REALLY paying the price of network latency to your DB, threads will do you no big benefit, use fork/processes (see GIL). You did a great job choosing Sequel over AR for this
ruby CPU: if you decide to go threads + JRuby with this don't forget to try and plug in jProfiler, it's amazing at tracing bottlenecks in Java and author of SideKiq swears it is amazing for JRuby too - unfortunately, afaik, there is no equivalent of jProfiler for C Ruby (there are profiling tools, but nowhere as useful)
After you implement these suggestions you know you did all you could when:
all of the CPUs on the Ruby box are on 100% load
the hard disk IO of the DB is on 100% throughput
Find this sweet spot and don't add additional ruby update threads/processes after that (or add more hardware) and that's that
PS check out https://github.com/ruby-concurrency/concurrent-ruby - it's a great parallelization lib
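The "update 1000 records per transaction" suggestion can be sketched as follows in Python, again with an in-memory SQLite table as a stand-in for Postgres. The table layout and row count are made up, and BATCH_SIZE is only the starting point suggested above.

```python
import sqlite3

BATCH_SIZE = 1000  # suggested starting point; tune for your setup

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, xyz INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO objects (id) VALUES (?)", [(i,) for i in range(1, 3501)])
conn.commit()

ids = [row[0] for row in conn.execute("SELECT id FROM objects WHERE xyz = 0")]
commits = 0
for position, row_id in enumerate(ids, start=1):
    # sqlite3 opens a transaction implicitly before the first UPDATE of a batch.
    conn.execute("UPDATE objects SET xyz = 1 WHERE id = ?", (row_id,))
    if position % BATCH_SIZE == 0:
        conn.commit()  # one commit per full batch, not one per row
        commits += 1
if len(ids) % BATCH_SIZE:
    conn.commit()  # commit the final partial batch
    commits += 1

updated = conn.execute("SELECT COUNT(*) FROM objects WHERE xyz = 1").fetchone()[0]
print(commits, updated)
```

Here 3,500 row updates are committed in 4 transactions rather than 3,500.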

Does HAWQ reuse QE processes after a query finished?

Query Executor (QE) processes are created on segments to execute queries. When I run a query, I can see the working QEs. But when the query has finished, they are still alive in an idle state. Does HAWQ reuse QE processes after a query has finished?
Yes, HAWQ keeps QE processes at the session level. If you have already finished a query but the session is still alive, the next query you send through the same session will reuse the already started QEs.
There are two cases:
1) The number of cached QE processes is less than the number of QEs needed for the new query on the same host. In this case, HAWQ will reuse the cached QEs and also start new QEs to make up the shortfall.
2) The number of cached QE processes is more than the number of QEs needed for the new query on the same host. In this case, HAWQ will choose some of the cached QEs. You'll see some QEs still idle.
The number of QEs needed is decided by the resource manager.
Moreover, if you run the "SET" command and there are cached QEs on the segment hosts, all the QEs will be reused. But if there are no cached QEs, the "SET" command will not start any QEs on the segments.
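The two cases above boil down to simple arithmetic per host. The following Python sketch is only an illustration of that arithmetic; plan_qes is a hypothetical helper and has nothing to do with HAWQ's actual source code.

```python
def plan_qes(cached, needed):
    """Return (reused, started, left_idle) for one host.

    cached: QEs kept alive by the session from earlier queries
    needed: QEs the resource manager decided the new query needs
    """
    reused = min(cached, needed)         # cached QEs are used first
    started = max(0, needed - cached)    # case 1: fork the shortfall
    left_idle = max(0, cached - needed)  # case 2: the surplus stays idle
    return reused, started, left_idle


print(plan_qes(cached=4, needed=6))  # case 1
print(plan_qes(cached=6, needed=4))  # case 2
```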
The cache of QEs in HAWQ is designed for two purposes:
Reusing the QEs between consecutive queries, so as to avoid forking them every time we run a query and thus improve query performance, especially for small queries.
Debugging during feature development and bug fixing.
The QEs of the current query are released if the current session is closed or if they have been idle for gp_vmem_idle_resource_timeout ms. By default this is 10 minutes in a debug build and 18 seconds in a release build. You may refer to guc.c for details:
{
{"gp_vmem_idle_resource_timeout", PGC_USERSET, CLIENT_CONN_OTHER,
gettext_noop("Sets the time a session can be idle (in milliseconds) before we release gangs on the segment DBs to free resources."),
gettext_noop("A value of 0 turns off the timeout."),
GUC_UNIT_MS | GUC_GPDB_ADDOPT
},
&IdleSessionGangTimeout,
#ifdef USE_ASSERT_CHECKING
600000, 0, INT_MAX, NULL, NULL /* 10 minutes by default on debug builds.*/
#else
18000, 0, INT_MAX, NULL, NULL
#endif
}
Yes. If another query arrives within the interval, the QEs can be reused. If the interval times out, the QEs quit.
Moreover, quitting the session quits all the forked QEs, regardless of the interval.
The interval GUC is gp_vmem_idle_resource_timeout; you can set it in your session.

Transactions and watch statement in Redis

Could you please explain the following example from "The Little Redis Book":
With the code above, we wouldn't be able to implement our own incr
command since they are all executed together once exec is called. From
code, we can't do:
redis.multi()
current = redis.get('powerlevel')
redis.set('powerlevel', current + 1)
redis.exec()
That isn't how Redis transactions work. But, if we add a watch to
powerlevel, we can do:
redis.watch('powerlevel')
current = redis.get('powerlevel')
redis.multi()
redis.set('powerlevel', current + 1)
redis.exec()
If another client changes the value of powerlevel after we've called
watch on it, our transaction will fail. If no client changes the
value, the set will work. We can execute this code in a loop until it
works.
Why can't we execute the increment in a transaction that can't be interrupted by other commands? Why do we need to iterate instead and wait until nobody changes the value before the transaction starts?
There are several questions here.
1) Why we can't execute increment in transaction that can't be interrupted by other command?
Please note first that Redis "transactions" are completely different than what most people think transactions are in classical DBMS.
# Does not work
redis.multi()
current = redis.get('powerlevel')
redis.set('powerlevel', current + 1)
redis.exec()
You need to understand what is executed on server-side (in Redis), and what is executed on client-side (in your script). In the above code, the GET and SET commands will be executed on Redis side, but assignment to current and calculation of current + 1 are supposed to be executed on client side.
To guarantee atomicity, a MULTI/EXEC block delays the execution of Redis commands until the exec. So the client will only pile up the GET and SET commands in memory, and execute them in one shot and atomically in the end. Of course, the attempt to assign current to the result of GET and incrementation will occur well before. Actually the redis.get method will only return the string "QUEUED" to signal the command is delayed, and the incrementation will not work.
In MULTI/EXEC blocks you can only use commands whose parameters can be fully known before the beginning of the block. You may want to read the documentation for more information.
2) Why we need to iterate instead and wait until nobody changes value before transaction starts?
This is an example of the optimistic concurrency pattern.
If we used no WATCH/MULTI/EXEC, we would have a potential race condition:
# Initial arbitrary value
powerlevel = 10
session A: GET powerlevel -> 10
session B: GET powerlevel -> 10
session A: current = 10 + 1
session B: current = 10 + 1
session A: SET powerlevel 11
session B: SET powerlevel 11
# In the end we have 11 instead of 12 -> wrong
Now let's add a WATCH/MULTI/EXEC block. With a WATCH clause, the commands between MULTI and EXEC are executed only if the value has not changed.
# Initial arbitrary value
powerlevel = 10
session A: WATCH powerlevel
session B: WATCH powerlevel
session A: GET powerlevel -> 10
session B: GET powerlevel -> 10
session A: current = 10 + 1
session B: current = 10 + 1
session A: MULTI
session B: MULTI
session A: SET powerlevel 11 -> QUEUED
session B: SET powerlevel 11 -> QUEUED
session A: EXEC -> success! powerlevel is now 11
session B: EXEC -> failure, because powerlevel has changed and was watched
# In the end, we have 11, and session B knows it has to attempt the transaction again
# Hopefully, it will work fine this time.
So you do not have to iterate to wait until nobody changes the value, but rather to attempt the operation again and again until Redis is sure the values are consistent and signals it is successful.
In most cases, if the "transactions" are fast enough and the probability of contention is low, the updates are very efficient. If there is contention, some "transactions" will need extra operations (because of the iteration and retries). But the data will always be consistent and no locking is required.
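To make the retry loop concrete without needing a running server, here is a small pure-Python simulation. TinyStore is a made-up in-memory stand-in that only mimics the WATCH/EXEC check-and-set behaviour with a per-key version counter; it is not a Redis client, and the interfere hook exists only to force one conflicting write from "another session".

```python
class TinyStore:
    """In-memory stand-in mimicking WATCH/EXEC check-and-set (not a Redis client)."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, key):
        # Remember the version of the key at WATCH time.
        return self.versions.get(key, 0)

    def exec_set(self, key, value, watched_version):
        # Apply the queued SET only if the key is unchanged since WATCH.
        if self.versions.get(key, 0) != watched_version:
            return False
        self.set(key, value)
        return True


def increment(store, key, interfere=None):
    """Optimistic increment: retry until no one else touched the key."""
    attempts = 0
    while True:
        attempts += 1
        token = store.watch(key)
        current = int(store.get(key))
        if interfere is not None and attempts == 1:
            interfere()  # another session writes between our GET and EXEC
        if store.exec_set(key, current + 1, token):
            return attempts


store = TinyStore()
store.set("powerlevel", 10)
attempts = increment(
    store, "powerlevel", interfere=lambda: store.set("powerlevel", 11)
)
print(store.get("powerlevel"), attempts)
```

The first attempt fails because the other session moved the value to 11; the second attempt reads 11 and writes 12, so neither increment is lost.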