ElastiCache URL I can hit that always uses primary node - amazon-elasticache

This morning I ran into an issue where the primary node in my replication group was changed. I still need to investigate why this happened.
The upshot was lots of failures in a Rails application, because it was still trying to write to what had been the primary node but had since become a read replica.
Is there a URL I can use that basically says "write to the primary node of this replication group, I don't care which node that is"?
Right now I am using something similar to:
name-002.aaaaa.0001.use1.cache.amazonaws.com
My "fix" for now was changing what was name-001 to name-002 but until I know the reason why the primary node was changed I have to assume this will break again.

I think I have answered my own question.
In the admin section for the replication group there is a Primary Endpoint, which seems to do the job: it always resolves to whichever node is currently the primary.
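For what it's worth, here is a minimal sketch of using that endpoint from application code (shown in Java with the Jedis client only to illustrate the idea; the hostname below is made up):

import redis.clients.jedis.Jedis;

public class PrimaryEndpointExample {
    public static void main(String[] args) {
        // Replication group Primary Endpoint (hypothetical hostname). ElastiCache
        // keeps this DNS name pointed at whichever node is currently the primary,
        // so writes keep working after a failover.
        String primaryEndpoint = "name.aaaaa.ng.0001.use1.cache.amazonaws.com";

        try (Jedis jedis = new Jedis(primaryEndpoint, 6379)) {
            jedis.set("some:key", "value"); // always lands on the current primary
        }
    }
}

The same idea applies in Rails: point the Redis client at the replication group's Primary Endpoint rather than at an individual node endpoint.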

Related

Apache Ignite node startup error - Joining node doesn't have stored group keys

I made some changes in my dev Ignite cluster to enable persistence. Now when I start my cluster (2 nodes, version 2.9.0), the first one starts just fine but the second one doesn't seem to, and the first one shows the error below in its log:
[14:18:59] Joining node doesn't have stored group keys [node=4f20534b-1e44-46af-b81a-34d35807abd8]
I saw a similar question whose answer mentions TDE, or transparent data encryption. But I have not enabled data encryption anywhere in my config.
What could be the problem? Please help.
It's a known issue that Ignite checks a list of known encryption keys for cache groups even when TDE is turned off.
It doesn't actually affect node startup; only an info message is printed.
You can find the discussion regarding that here.

Changed data for a key in Redis: how to figure out what changed it

The web application I am working with is written in Django and uses Redis to cache some data from Elasticsearch. Yesterday everything was working fine, but today it started to give me an error. I looked over the structure of the data Redis is storing for the key, and some of the deeply nested values were changed to lists instead of the dicts they are supposed to be. So, overnight the Redis data was modified by someone or something. Now I need a way to figure out which code changed it. If I launch the app after doing redis-cli flushdb and start using it, navigating here and there, everything works fine, and I can't find any obviously wrong code this way. The data for Redis is set in only one place in the app code, and it is set correctly. I looked into redis.log, but it does not say which key was modified and when, and that data could be crucial here.
So, I need to find out which code mistakenly modified the key. It could be code someone ran separately, some hidden corner of the app (I doubt it is the case), or some bug within Redis itself. Maybe I would need to introduce some kind of observer that runs each time a key is changed and records when and which key was modified in Redis. I am stuck and not sure what I could do here. Any suggestions would be greatly appreciated.
A few things you may try with Redis:
MONITOR is a debugging command that streams back every command processed by the Redis server. You can then see which command is modifying your key, and from which client connection.
Redis Keyspace Notifications allow you to subscribe to Pub/Sub channels in order to receive events affecting the Redis data set in some way. You can subscribe to the key of interest.
The CLIENT LIST command returns information and statistics about the client connections to the server in a mostly human-readable format.
Since you suspect that other code or another person is modifying your data, you may also want to use Redis 6 with ACLs, to control which clients can run which operations on which keys.
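As a rough illustration of the keyspace-notification approach, here is a sketch in Java using the Jedis client; the host and key name are assumptions, and notify-keyspace-events can also be set in redis.conf instead of via CONFIG SET:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class KeyWatcher {
    public static void main(String[] args) {
        String host = "localhost";           // assumed local Redis instance
        String watchedKey = "search:cache";  // hypothetical key to watch

        // Enable keyspace notifications: K = keyspace channel, E = keyevent channel,
        // A = all command classes.
        try (Jedis admin = new Jedis(host, 6379)) {
            admin.configSet("notify-keyspace-events", "KEA");
        }

        // Every command that touches the key (set, del, expire, ...) is published
        // as an event name on the __keyspace@<db>__:<key> channel.
        Jedis subscriber = new Jedis(host, 6379);
        subscriber.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String message) {
                System.out.println(channel + " -> " + message);
            }
        }, "__keyspace@0__:" + watchedKey);
    }
}

MONITOR is quicker to try but is verbose and adds load on a busy server, so a notification scoped to the suspect key is usually the gentler option in a shared environment.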

How can I check in a RemoteFilter if the current node is primary or backup?

I have two nodes in Partitioned mode and I use a Continuous Query. When I put a value into the cache I see the RemoteFilter running twice (on the primary node and on the backup node). How can I check in the filter whether the current node is the primary or a backup?
Well, there are several methods on the Affinity API to help you detect whether a node is a primary or a backup for a key. However, if the topology changes while you are checking the Affinity API, you may end up on a primary node that has become a backup, or vice versa.
There is a way to check this deterministically, which is described in the IGNITE-3878 ticket. This should come in the next release.
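For illustration, a sketch of that Affinity check inside a continuous query remote filter; the cache name and the Integer/String key/value types are assumptions, and as noted above the answer can be stale if the topology changes concurrently:

import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryListenerException;

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.CacheEntryEventSerializableFilter;
import org.apache.ignite.resources.IgniteInstanceResource;

// Remote filter that only passes events evaluated on the key's primary node.
public class PrimaryOnlyFilter implements CacheEntryEventSerializableFilter<Integer, String> {
    @IgniteInstanceResource
    private Ignite ignite; // injected on the remote node

    private static final String CACHE_NAME = "myCache"; // hypothetical cache name

    @Override
    public boolean evaluate(CacheEntryEvent<? extends Integer, ? extends String> evt)
            throws CacheEntryListenerException {
        // Affinity API: is the local node currently the primary for this key?
        return ignite.affinity(CACHE_NAME)
                     .isPrimary(ignite.cluster().localNode(), evt.getKey());
    }
}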

Apache Ignite Replicated Cache race conditions?

I'm quite new to Apache Ignite so please be gentle. My question is simple.
Suppose I have a replicated cache in Apache Ignite and I write key 123 to it. My cluster has 10 nodes.
First question is:
Does a replicated cache mean that key 123 must be written to all 10 nodes before the "put" call returns? Or does the call return immediately and the replication happens behind the scenes?
Second question is:
Let's say key 123 is written on Node 1 and is now being replicated to all other nodes. However, a few microseconds later Node 2 tries to write key 123 with a different value. Do I now have a race condition? Or does Ignite somehow handle this situation so that Node 2's attempt to write key 123 won't happen until Node 1's "put" has replicated across all nodes?
For some context, what I'm trying to build is a de-duplication system across a cluster of API machines. I was hoping that I would be able to create a hash of my API request (with only values that make the request unique) and write it to the Ignite Cache. The API request would only proceed if the cache does not already contain the unique hash (possibly created by a different API instance). Of course the cache would have an eviction policy to evict these cache keys after a few seconds because they won't be needed anymore.
A REPLICATED cache is the same as a PARTITIONED cache with an infinite number of backups and some optimizations. So it has primary partitions that are distributed across the nodes according to the affinity function.
Now, when you perform an update, the request goes to the primary node, and the primary node, in turn, updates all backups. The CacheConfiguration.setWriteSynchronizationMode() property controls the way entries are updated. By default it's PRIMARY_SYNC, which means the thread that calls put() waits only for the primary partition update, and backups are updated asynchronously. If you set it to FULL_SYNC, the thread is released only when all backups have been updated.
Answering your second question: there will not be a race condition, because all update requests for a given key go to its primary node.
To add to that, if a backup node hasn't been updated yet, the get() request will go to the primary node, so even in PRIMARY_SYNC mode you'll never get null if the primary partition has a value.
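To make that concrete, here is a rough sketch of the de-duplication idea from the question on top of a REPLICATED cache with FULL_SYNC; the cache name, expiry duration and request hash are placeholders:

import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class DedupCacheExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Replicated cache; FULL_SYNC makes put() wait until all backups are updated.
        CacheConfiguration<String, Boolean> cfg = new CacheConfiguration<>("requestDedup");
        cfg.setCacheMode(CacheMode.REPLICATED);
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

        // Entries expire a few seconds after creation, so old request hashes disappear.
        IgniteCache<String, Boolean> dedup = ignite.getOrCreateCache(cfg)
            .withExpiryPolicy(new CreatedExpiryPolicy(new Duration(TimeUnit.SECONDS, 10)));

        String requestHash = "sha256-of-request"; // hypothetical request hash

        // putIfAbsent() is atomic cluster-wide: only the first caller gets 'true',
        // so a concurrent duplicate arriving on another node is rejected.
        boolean firstTime = dedup.putIfAbsent(requestHash, Boolean.TRUE);
        if (firstTime) {
            // process the request
        } else {
            // duplicate request: skip
        }
    }
}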

Passive Replication in Distributed Systems - Replacing the Primary Server

In a passive-replication-based distributed system, if the primary server fails, one of the backups is promoted to primary. However, suppose the original primary server recovers; how do we switch the primary role back to it from the current backup?
My thinking is this: if the failed primary server recovers, it must be incorporated into the system as a secondary and updated to reflect the most accurate information at that point in time. To restore it as the primary, it can be promoted when the current primary (which was originally a backup) fails; otherwise, if required, the current primary can be blocked for a while, the original primary promoted again, and the blocked node reintroduced as a backup.
I could not find an answer to this question elsewhere; this is just what I feel. Please suggest any better alternatives.
It depends on what system you're looking at. Usually there's no immediate need to replace the backup when the original primary server recovers; if there is, you'd need to synchronize the two and promote the original primary.
Distributed synchronization (or consensus) is a hard problem. There's a lot of literature out there and I recommend that you read up. An example of a passively replicated system (with Leaders/Followers/Candidates) is Raft, which you could start with. A good online visualization can be found here, and the paper is here.
ZAB and Paxos are worth a read as well!
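Just to sketch the rejoin idea from the question, here is a toy illustration (not a real consensus implementation; the election and catch-up mechanics are exactly what protocols like Raft formalize, and all names here are hypothetical):

// Toy sketch: a recovered node always rejoins as a backup, catches up on
// missed updates, and is only promoted if the membership layer later elects it.
enum Role { PRIMARY, BACKUP }

class ReplicaNode {
    private Role role = Role.BACKUP;
    private long lastAppliedUpdateId;

    // Called when this node restarts after a crash.
    void recover(long primaryLatestUpdateId) {
        role = Role.BACKUP;              // never come back directly as primary
        catchUp(primaryLatestUpdateId);  // replay updates missed while down
    }

    // Called by the membership/election layer if the current primary fails
    // and this node is chosen as the new primary.
    void promote() {
        role = Role.PRIMARY;
    }

    private void catchUp(long primaryLatestUpdateId) {
        lastAppliedUpdateId = primaryLatestUpdateId;
    }
}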