As per the Apache Ignite Spring Data documentation, there are two methods to save data in the Ignite cache:
1. org.apache.ignite.springdata.repository.IgniteRepository.save(key, value)
and
2. org.apache.ignite.springdata.repository.IgniteRepository.save(Map<ID, S> entities)
So I just want to understand the transactional behavior of the 2nd method. Suppose we save 100 records using save(Map<ID, S>) and, for some reason, some nodes go down after 70 records have been written. In this case, will all 70 records be rolled back?
Note: with the 1st method, if we use @Transactional at the method level, then the particular entity will be rolled back.
First of all, you should read about the transaction mechanism used in Apache Ignite. It is described very well in the articles presented here:
https://apacheignite.readme.io/v1.0/docs/transactions#section-two-phase-commit-2pc
The most interesting parts for you are "Backup Node Failures" and "Primary Node Failures":
Backup Node Failures
If a backup node fails during either "Prepare" phase or "Commit" phase, then no special handling is needed. The data will still be committed on the nodes that are alive. GridGain will then, in the background, designate a new backup node and the data will be copied there outside of the transaction scope.
Primary Node Failures
If a primary node fails before or during the "Prepare" phase, then the coordinator will designate one of the backup nodes to become primary and retry the "Prepare" phase. If the failure happens before or during the "Commit" phase, then the backup nodes will detect the crash and send a message to the Coordinator node to find out whether to commit or rollback. The transaction still completes and the data within distributed cache remains consistent.
In your case, all updates for all values in the map should be done in one transaction or rolled back. I hope these articles answer your question.
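If you want to be explicit about it rather than rely on defaults, you can also wrap the bulk save in an Ignite transaction yourself. Below is a minimal sketch, not the documented contract of save(Map) itself: it assumes the underlying cache is configured as TRANSACTIONAL, and Person, PersonRepository and the cache name are placeholders.

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.springdata.repository.IgniteRepository;
import org.apache.ignite.springdata.repository.config.RepositoryConfig;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

class Person implements java.io.Serializable {
    Long id;
    String name;
}

// Placeholder Spring Data Ignite repository bound to a (placeholder) cache name.
@RepositoryConfig(cacheName = "PersonCache")
interface PersonRepository extends IgniteRepository<Person, Long> {
}

class BulkSave {
    private final Ignite ignite;                      // injected Ignite instance
    private final PersonRepository personRepository;  // injected Spring Data repository

    BulkSave(Ignite ignite, PersonRepository personRepository) {
        this.ignite = ignite;
        this.personRepository = personRepository;
    }

    /** Saves the whole map as one unit: either all entries commit or none do. */
    void saveAllOrNothing(Map<Long, Person> batch) {
        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
            personRepository.save(batch);  // IgniteRepository.save(Map<ID, S>)
            tx.commit();                   // commit all entries together
        }                                  // rolled back automatically if commit() was not reached
    }
}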
We are trying to implement caching for our multi-tenant application. We are planning to create new Redis DB for each tenant.
We have one scenario where we need to use Redis Transactions. While going through this post https://redis.io/topics/transactions, we found that
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
Does this blocking apply only at the database level, or at the full instance level?
The guarantee you quoted applies to the instance, not the database. A command for DB 2 will not run in the middle of a transaction for DB 1.
You can find more information about multiple databases (including an argument by the creator of Redis against using them at all) in this question.
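To make that concrete, here is a small sketch using the Jedis client (DB indexes and key names are arbitrary): the MULTI/EXEC block queued on DB 1 runs as one isolated unit on the instance, so the SET issued by a second client on DB 2 executes either before or after it, never in the middle.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class MultiDbTransactionDemo {
    public static void main(String[] args) {
        // Tenant 1 works in DB 1 and runs an atomic MULTI/EXEC block.
        try (Jedis tenant1 = new Jedis("localhost", 6379)) {
            tenant1.select(1);
            Transaction tx = tenant1.multi();
            tx.set("order:42:status", "PAID");
            tx.incr("order:42:version");
            tx.exec();   // both commands are applied as a single isolated operation
        }

        // Tenant 2 works in DB 2; the instance serves this SET strictly before
        // or after the EXEC above, never inside it.
        try (Jedis tenant2 = new Jedis("localhost", 6379)) {
            tenant2.select(2);
            tenant2.set("order:42:status", "NEW");
        }
    }
}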
Actually I have a couple of questions here.
1) When I call INSERT from my application using the MySQL connector, it is answered by one of the master nodes. Does that master node wait until the insert has been applied on all the nodes before replying to the client? If it does wait for all the nodes, then how does wsrep_sst_method=xtrabackup help? Will it make the node reply to the client immediately, or does it make no difference? Maybe I have understood this variable wrong.
2) What about reads? I guess a read is just answered by one of the master nodes, and only if wsrep_sync_wait is set does it wait for the other nodes to catch up.
Thanks
"How synchronous"? Synchronous enough, but with one exception: "Critical read".
The "fix" is during reading, not writing.
When writing, the heavyweight checking is done during COMMIT. At this point, all other nodes are contacted to see whether "this transaction will eventually commit successfully". That is, the other nodes say "yes" but don't actually finish the work enough for a subsequent SELECT to see the results of the write. The guarantee here is that the cluster is in a consistent state and will stay that way, even if any one node dies.
"Critical read" is, for example, when a user posts something, then immediately reads the database and expects to see the posting. But if the read (SELECT) hits a different node, the "almost" synchronous nature of Galera may not have committed the data to the reading node yet. The data is there, and will be successfully written to disk, but maybe not yet on that node. The workaround is to use wsrep_sync_wait when reading to ensure that replication has caught up before the SELECT. No action is taken when writing.
(I don't see the relevance of wsrep_sst_method=xtrabackup. That relates to recovering from a dead node.)
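For completeness, here is a minimal JDBC sketch of such a critical read. wsrep_sync_wait is the real Galera variable; the connection details and the table/column names are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CriticalRead {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://galera-node:3306/app";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement()) {

            // Make this session wait until replication has caught up before reads.
            st.execute("SET SESSION wsrep_sync_wait = 1");

            // The "critical read": the user expects to see the row they just posted.
            try (ResultSet rs = st.executeQuery("SELECT body FROM posts WHERE id = 42")) {
                if (rs.next()) {
                    System.out.println(rs.getString("body"));
                }
            }

            // Back to the default (no causality wait) for ordinary reads.
            st.execute("SET SESSION wsrep_sync_wait = 0");
        }
    }
}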
I need to design a distributed system where a scheduler sends tasks to workers on multiple nodes. Each task is assigned an id and can be executed more than once, scheduled by the scheduler (usually once per hour).
My only requirement is that a task with a specific id must not be executed twice at the same time by the cluster. I can think of a design where the scheduler holds a lock for each task id and sends the task to an appropriate worker. Once the worker has finished, the lock should be released and the scheduler may schedule the task again.
What should my design include to ensure this? I'm concerned about cases where a task is sent to a worker which starts the task but then fails to inform the scheduler about it.
What would be the best practice in this scenario to ensure that only a single instance of a job is always executed at a time?
You could use a solution that implements a consensus protocol. Say - for example - that all your nodes in the cluster can communicate using the Raft protocol. As such, whenever a node X would want to start working on a task Y it would attempt to commit a message X starts working on Y. Once such messages are committed to the log, all the nodes will see all the messages in the log in the same order.
When node X finishes or aborts the task it would attempt to commit X no longer works on Y so that another node can start/continue working on it.
It could happen that two nodes (X and Z) try to commit their start messages concurrently, and the log would then look something like this:
...
N-1: ...
N+0: "X starts working on Y"
...
N+k: "Z starts working on Y"
...
But since there is no X no longer works on Y message between the N+0 and N+k entry, every node (including Z) would know that Z must not start the work on Y.
The only remaining problem would be if node X got partitioned from the cluster before it could commit its X no longer works on Y message, for which I believe there is no perfect solution.
A work-around could be that X would try to periodically commit a message X still works on Y at time T and if no such message was committed to the log for some threshold duration, the cluster would assume that no one is working on that task anymore.
With this work-around however, you'd be allowing the possibility that two or more nodes will work on the same task (the partitioned node X and some new node that picks up the task after the timeout).
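To illustrate the decision rule described above, here is a hypothetical sketch. The ReplicatedLog interface stands in for whatever Raft library you use (a committed, totally ordered sequence of entries); it is not a real API, and the lease duration is arbitrary.

import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical entry kinds matching the messages described above.
enum Kind { START, HEARTBEAT, STOP }

record Entry(Kind kind, String node, String task, Instant at) {}

// Stand-in for the Raft library: the committed, totally ordered log plus
// a way to propose a new entry (returns true once the entry is committed).
interface ReplicatedLog {
    List<Entry> committed();
    boolean tryAppend(Entry e);
}

class TaskGuard {
    private static final Duration LEASE = Duration.ofMinutes(5);

    /** Replays the committed log and claims the task only if no live owner exists. */
    static boolean mayStart(ReplicatedLog log, String self, String task, Instant now) {
        String owner = null;
        Instant lastSeen = null;
        for (Entry e : log.committed()) {
            if (!e.task().equals(task)) {
                continue;
            }
            switch (e.kind()) {
                case START, HEARTBEAT -> { owner = e.node(); lastSeen = e.at(); }  // "X starts/still works on Y"
                case STOP -> { owner = null; lastSeen = null; }                    // "X no longer works on Y"
            }
        }
        // Free, or the previous owner's lease expired (the imperfect work-around above).
        boolean free = (owner == null) || now.isAfter(lastSeen.plus(LEASE));
        return free && log.tryAppend(new Entry(Kind.START, self, task, now));
    }
}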
After some thorough search, I came to the conclusion that this problem can be solved through a method called fencing.
In essence, when you suspect that a node (worker) has failed, the only way to ensure that it will not corrupt the rest of the system is to provide a fence that will stop the node from accessing the shared resource you need to protect. That must be a radical method, like resetting the machine that runs the failed process or setting up a firewall rule that will prevent the process from accessing the shared resource. Once the fence is in place, you can safely break the lock that was being held by the failed process and start a new process.
Another possibility is to use a relational database to store task metadata + proper isolation level (can't go wrong with serializable if performance is not your #1 priority).
SERIALIZABLE
This isolation level specifies that all transactions occur in a completely isolated fashion; i.e., as if all transactions in the system had executed serially, one after the other. The DBMS may execute two or more transactions at the same time only if the illusion of serial execution can be maintained.
Using either optimistic or pessimistic locking should work too. https://learning-notes.mistermicheels.com/data/sql/optimistic-pessimistic-locking-sql/
In case you need a rerun of the task, simply update the metadata (or, better, create a new task with different metadata to keep track of its execution history).
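A minimal JDBC sketch of the pessimistic variant: the tasks table with id/status/started_at columns is a placeholder schema, and the SELECT ... FOR UPDATE row lock ensures only one node can move a task from IDLE to RUNNING at a time.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TaskClaimer {

    /** Returns true if this process claimed the task and may run it now. */
    public static boolean tryClaim(Connection conn, long taskId) throws SQLException {
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT status FROM tasks WHERE id = ? FOR UPDATE")) {
                lock.setLong(1, taskId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next() || !"IDLE".equals(rs.getString("status"))) {
                        conn.rollback();   // missing, or another node is already running it
                        return false;
                    }
                }
            }
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE tasks SET status = 'RUNNING', started_at = NOW() WHERE id = ?")) {
                upd.setLong(1, taskId);
                upd.executeUpdate();
            }
            conn.commit();                 // the row lock is released here
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}

When the task finishes (or a watchdog decides the worker is dead), a similar transaction moves the row back to IDLE so the scheduler can run it again.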
I am trying to use Redis as a cache that sits in front of an SQL database. At a high level I want to implement these operations:
Read value from Redis; if it's not there, then generate the value by querying SQL, and push it into Redis so we don't have to compute it again.
Write value to Redis, because we just made some change to our SQL database and we know that we might have already cached it and it's now invalid.
Delete value, because we know the value in Redis is now stale, we suspect nobody will want it, but it's too much work to recompute now. We're OK letting the next client who does operation #1 compute it again.
My challenge is understanding how to implement #1 and #3 if I attempt to do it with StackExchange.Redis. If I naively implement #1 with a simple read of the key and a push, it's entirely possible that between my computing the value from SQL and pushing it in, any number of other SQL operations have happened and also tried to push their values into Redis via #2 or #3. For example, consider this ordering:
Client #1 wants to do operation #1 [Read] from above. It tries to read the key, sees it's not there.
Client #1 calls to SQL database to generate the value.
Client #2 does something to SQL and then does operation #2 [Write] above. It pushes some newly computed value into Redis.
Client #3 comes along, does some other operation in SQL, and wants to do operation #3 [Delete] to Redis, knowing that if there's something cached there, it's no longer valid.
Client #1 pushes its (now stale) value to Redis.
So how do I implement my operation #1? Redis offers a WATCH primitive that makes this fairly easy to do against the bare metal, where Client #1 would be able to observe that other things have happened to the key, but it's not supported by StackExchange.Redis because of how it multiplexes commands. Its conditional operations aren't quite sufficient here, since if I try saying "push only if the key doesn't exist", that doesn't prevent the race I explained above. Is there a pattern/best practice that is used here? This seems like a fairly common pattern that people would want to implement.
One idea I do have is I can use a separate key that gets incremented each time I do some operation on the main key and then can use StackExchange.Redis' conditional operations that way, but that seems kludgy.
This looks like a question about the right cache invalidation strategy rather than a question about Redis. Why I think so: Redis WATCH/MULTI is a kind of optimistic locking strategy, and this kind of locking is not suitable for most caching cases, where the expensive DB read query is exactly the problem the cache is meant to solve. In your operation #3 description you write:
It's too much work to recompute now. We're OK letting the next client who does operation #1 compute it again.
So we can continue with the read-update case as the update strategy. Here are some more questions before we continue:
What happens when 2 clients start to perform operation #1? Both of them may find no value in Redis, both perform the SQL query, and then both write the result to Redis. So shouldn't we guarantee that just one client updates the cache?
How can we be sure about the right sequence of writes (operation #3)?
Why not optimistic locking
Optimistic concurrency control assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted.
You can read about the phases of OCC transactions on Wikipedia, but in a few words:
If there is no conflict, you update your data. If there is a conflict, you resolve it, typically by aborting the transaction and restarting it if you still need to update the data.
Redis WATCH/MULTI is a kind of optimistic locking, so it can't help you here - you do not know whether your cache key was modified before you tried to work with it.
What works?
Every time you hear someone talk about locking, a few words later you hear about compromises: performance, and consistency vs. availability. The last pair is the most important.
In most highly loaded systems, availability is the winner. What does this mean for caching? Usually something like this:
Each cache key holds some metadata about the value: state, version and lifetime. The last one is not the Redis TTL - usually, if your key should be in the cache for time X, the lifetime in the metadata is X + Y, where Y is some extra time to guarantee the update process can complete.
You never delete a key directly - you just update its state or lifetime.
Each time your application reads data from the cache, it should make a decision: if the data has state "valid", use it. If the data has state "invalid", try to update it or use the obsolete data.
How to update on read (quite important: this is a "hand-made" mix of optimistic and pessimistic locking; a code sketch follows this list):
Try to take a pessimistic lock (in Redis, SET the lock key with NX and EX).
If that fails, return the obsolete data (remember, we still need availability).
If it succeeds, perform the SQL query and write the result into the cache.
Read the version from Redis again and compare it with the version read previously.
If the version is the same, mark the state as "valid".
Release the lock.
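Here is a rough sketch of those read-update steps with the Jedis client. Key names, the 30-second lock TTL and the SQL call are placeholders; note the lock uses SET ... NX EX rather than a plain SETEX so that only one client can hold it.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class ReadUpdateCache {

    /** Returns the cached value, refreshing it if this caller wins the lock. */
    public static String read(Jedis redis, String key) {
        String state = redis.hget("meta:" + key, "state");
        String value = redis.get(key);
        if ("valid".equals(state) && value != null) {
            return value;                                    // fast path: data is fresh
        }

        // Try to take the pessimistic lock; only one client refreshes the key.
        String locked = redis.set("lock:" + key, "1", SetParams.setParams().nx().ex(30));
        if (locked == null) {
            return value;                                    // lost the race: serve stale data
        }
        try {
            String versionBefore = redis.hget("meta:" + key, "version");

            String fresh = loadFromSql(key);                 // placeholder for the SQL query
            redis.set(key, fresh);

            String versionAfter = redis.hget("meta:" + key, "version");
            boolean unchanged = versionBefore == null
                    ? versionAfter == null
                    : versionBefore.equals(versionAfter);
            if (unchanged) {
                redis.hset("meta:" + key, "state", "valid"); // nobody invalidated it meanwhile
            }
            return fresh;
        } finally {
            redis.del("lock:" + key);                        // release the lock
        }
    }

    private static String loadFromSql(String key) {
        return "value-for-" + key;                           // stand-in for the real query
    }
}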
How to invalidate (your operations #2, #3; a matching sketch follows this list):
Increment the cache version and set the state to "invalid".
Update the lifetime/TTL if needed.
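And the matching invalidation sketch, using the same placeholder metadata layout:

import redis.clients.jedis.Jedis;

public class CacheInvalidator {
    /** Marks the cached entry stale; refreshers comparing versions will not re-validate it. */
    public static void invalidate(Jedis redis, String key, int newLifetimeSeconds) {
        redis.hincrBy("meta:" + key, "version", 1);      // operations #2/#3: bump the version
        redis.hset("meta:" + key, "state", "invalid");   // readers now treat the value as stale
        redis.expire("meta:" + key, newLifetimeSeconds); // optionally extend the metadata lifetime
    }
}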
Why so difficult?
We can always get and return a value from the cache, and rarely have a cache-miss situation. So we do not get the cache-invalidation cascade hell where many processes try to update one key.
We still have ordered key updates.
Only one process at a time can update a key.
I have a queue!
Sorry, you did not say so before - otherwise I would not have written all of the above. If you have a queue, everything becomes simpler (see the sketch after this list):
Each modification operation should push a job to the queue.
Only the async worker should execute the SQL and update the key.
You still need to use a "state" (valid/invalid) on the cache key to separate the application logic from the cache.
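A tiny sketch of that flow, using a plain Redis list as the queue (queue and key names are placeholders; any real message broker would do the same job):

import redis.clients.jedis.Jedis;

public class CacheUpdateQueue {
    // Producer side: every write to the SQL database also enqueues a refresh job.
    public static void enqueue(Jedis redis, String cacheKey) {
        redis.hset("meta:" + cacheKey, "state", "invalid"); // readers fall back to stale data
        redis.lpush("cache-refresh-jobs", cacheKey);
    }

    // Single async worker: the only place that runs SQL and rewrites cache keys.
    public static void workerLoop(Jedis redis) {
        while (true) {
            String cacheKey = redis.brpop(0, "cache-refresh-jobs").get(1); // blocks for next job
            String fresh = loadFromSql(cacheKey);                          // placeholder SQL call
            redis.set(cacheKey, fresh);
            redis.hset("meta:" + cacheKey, "state", "valid");
        }
    }

    private static String loadFromSql(String key) {
        return "value-for-" + key;   // stand-in for the real query
    }
}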
Is this the answer?
Actually, yes and no at the same time. This is one possible solution. Cache invalidation is a much more complex problem with many possible solutions - some of them may be simple, others complex. In most cases it depends on the real business requirements of the concrete application.
Consider the following scenario.
There are 2 Hazelcast nodes. One is stopped, another is running under quite heavy load.
Now, the second node comes up. The application starts up and its Hazelcast instance hooks up to the first. Hazelcast starts data repartitioning. For 2 nodes, it essentially means
that each entry in IMap gets copied to the new node and two nodes are assigned to be master/backup arbitrarily.
PROBLEM:
If the first node is brought down during this process, and the replication is not done completely, part of the IMap contents and ITopic subscriptions may be lost.
QUESTION:
How to ensure that the repartitioning process has finished, and it is safe to turn off the first node?
(The whole setup is made to enable software updates without downtime, while preserving current application state).
I tried using getPartitionService().addMigrationListener(...), but the listener does not seem to be hooked up to the complete migration process. Instead, I get tens to hundreds of calls to migrationStarted()/migrationCompleted(), one for each chunk of the replication.
1- When you gracefully shut down the first node, the shutdown process should wait (block) until the data is safely backed up (see the sketch after this answer).
hazelcastInstance.getLifecycleService().shutdown();
2- If you use Hazelcast Management Center, it shows the ongoing migration/repartitioning operation count on the home screen.
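For the original question (how to know it is safe to turn the first node off), here is a small sketch of the calling side, assuming Hazelcast 3.x packages; isClusterSafe() and forceLocalMemberToBeSafe() are part of PartitionService, and the timeouts are arbitrary.

import java.util.concurrent.TimeUnit;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.PartitionService;

public class SafeShutdown {
    /** Blocks until all partitions have their backups, then shuts this member down. */
    public static void shutdownWhenSafe(HazelcastInstance instance) throws InterruptedException {
        PartitionService partitions = instance.getPartitionService();

        // Ask the cluster to sync this member's partitions and wait until the whole
        // cluster reports a safe state (all backups in place, no ongoing migrations).
        while (!partitions.isClusterSafe()) {
            partitions.forceLocalMemberToBeSafe(30, TimeUnit.SECONDS);
            Thread.sleep(1_000);   // re-check; migrations may still be chunking through
        }

        // Graceful shutdown also waits for backups of the partitions owned by this member.
        instance.getLifecycleService().shutdown();
    }
}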