Latency metrics for ElastiCache (Redis)

We are moving some of our caching from an in-memory cache (the Caffeine library) to ElastiCache (Redis). In the course of running performance numbers, we observe high latency. From the application-side code, we push a metric around the call into Redisson's library:
long startTimeMillis = System.currentTimeMillis();
String redisKey = key.getKeyForRedis();
.....
resultRBucket.get();                     // Redisson RBucket lookup against ElastiCache
REDIS_CACHE_HIT_COUNT.inc();
REDIS_CACHE_HIT_LATENCY.update(System.currentTimeMillis() - startTimeMillis);   // end-to-end latency as seen by the app
What AWS-side metrics are useful to see what Redis thinks its latency for key lookups is?
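If your engine version publishes them, the per-command-class latency metrics in the AWS/ElastiCache CloudWatch namespace (for example StringBasedCmdsLatency or GetTypeCmdsLatency, reported in microseconds) are the closest thing to what the engine itself thinks a lookup costs, alongside host metrics like EngineCPUUtilization. A minimal sketch for pulling one of them with the AWS SDK for Java v2; the metric name, cluster id, and one-minute period are assumptions to adapt to your setup:
import java.time.Duration;
import java.time.Instant;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.Datapoint;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

// Illustrative only: metric name, cluster id and period are placeholders.
CloudWatchClient cw = CloudWatchClient.create();
GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder()
        .namespace("AWS/ElastiCache")
        .metricName("StringBasedCmdsLatency")   // engine-reported latency for GET/SET-style string commands, in microseconds
        .dimensions(Dimension.builder().name("CacheClusterId").value("my-redis-cluster-001").build())
        .startTime(Instant.now().minus(Duration.ofHours(1)))
        .endTime(Instant.now())
        .period(60)
        .statistics(Statistic.AVERAGE, Statistic.MAXIMUM)
        .build();
for (Datapoint dp : cw.getMetricStatistics(request).datapoints()) {
    System.out.printf("%s avg=%.1fus max=%.1fus%n", dp.timestamp(), dp.average(), dp.maximum());
}
Comparing those engine-side numbers with your application-side REDIS_CACHE_HIT_LATENCY shows how much of the observed latency is the engine itself versus network and client overhead.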

Related

Replication across AZ

We have a 6-node cluster setup in which 3 server nodes are spread across 3 availability zones, and each zone also has a client node. Everything is set up as a Kubernetes-based service.
Important configurations (a rough Java equivalent is sketched after this list):
storeKeepBinary = true
cacheMode = Partitioned
AtomicityMode = Atomic (about 5-8 of the 25 caches have this as TRANSACTIONAL)
backups = 1, readFromBackups = false
no persistence for these tests
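For reference, a minimal sketch of how those settings map onto an Ignite CacheConfiguration; the cache name, key/value types, and the 'ignite' instance are placeholders, not from the original setup:
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

// Placeholder cache name/types; mirrors the settings listed above.
CacheConfiguration<String, byte[]> cfg = new CacheConfiguration<>("exampleCache");
cfg.setCacheMode(CacheMode.PARTITIONED);
cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);   // TRANSACTIONAL for the 5-8 caches mentioned
cfg.setBackups(1);
cfg.setReadFromBackup(false);
cfg.setStoreKeepBinary(true);
IgniteCache<String, byte[]> cache = ignite.getOrCreateCache(cfg);   // 'ignite' is an existing Ignite instance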
When we run it locally on physical boxes, we get decent throughput. However, when we deploy this in the cloud in an AZ-based setup on Kubernetes, we see a steep drop. We can only get performance comparable to the on-prem cluster tests when we keep a single cache node without any backups (backups=0).
I get that different hardware and network latency in the cloud come into play. While I investigate all of that with respect to the cloud differences, I want to understand whether there are some obvious behavioral issues under the covers with Ignite. Specifically, I am trying to understand a few things outlined below:
Why should cache get calls be slower? The data is partitioned, so lookup is by key, and since we have turned off 'readFromBackup', a get should always go to the primary partition. So adding more cache server nodes should not change any of the get-call latencies (one way to verify where gets actually land is sketched below).
Similarly for inserts/puts: other than the caches where the atomicity is TRANSACTIONAL, everything else should behave the same when we go from one cache node to three.
Are there any other areas anyone can suggest I look at, configuration or otherwise?
TIA
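One way to sanity-check the assumption that gets always hit the primary copy is the affinity API, which reports which node owns a key's primary partition. A minimal sketch; the 'ignite' instance, cache name, and key are placeholders:
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

// 'ignite' is an existing Ignite instance; "exampleCache" and the key are placeholders.
Affinity<String> aff = ignite.affinity("exampleCache");
String key = "someKey";
ClusterNode primary = aff.mapKeyToNode(key);   // node owning the primary partition for this key
System.out.printf("key=%s partition=%d primaryNode=%s local=%b%n",
        key, aff.partition(key), primary.id(), primary.isLocal());
If the primary is rarely local to the node issuing the get, every lookup pays a cross-AZ network hop, which by itself can explain a large drop versus a single-node setup.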

Is Redis deployment on a separate node really performant?

Based on the latency comparisons given at https://gist.github.com/jboner/2841832, an SSD read is almost as expensive as a network read within the same datacenter.
I am trying to understand whether a Redis deployment on a separate node/cluster will be performant, given the network latency introduced. Wouldn't deploying Redis on the app nodes themselves be a better option? This assumes the app nodes use SSD disks and the data is sharded across the app nodes.
This is for a large deployment with more than 10 app nodes.
Obviously, if you can run Redis on the same node as your app you'll get better latency than over the network (and you can also use a Unix socket to reduce it further; see the sketch after the questions below).
But these are the questions you need to ask yourself:
How are you going to shard the data between the app nodes?
What about high availability?
Are there cases where one app node will need data from another node?
Can you be sure the load will be evenly distributed between the nodes so no Redis node will get out of memory?
What about scale out? How are you going to reshard the data?
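On the Unix-socket point, a minimal sketch assuming the Lettuce client with netty's native epoll transport on the classpath, and a hypothetical socket path:
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;

// Requires redis.conf to expose a unix socket (unixsocket /var/run/redis/redis.sock)
// and the netty-transport-native-epoll dependency for domain-socket support on Linux.
RedisClient client = RedisClient.create(RedisURI.Builder.socket("/var/run/redis/redis.sock").build());
StatefulRedisConnection<String, String> connection = client.connect();
String value = connection.sync().get("some-key");   // same API as a TCP connection, minus the network stack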

Redis Streams vs Kafka Streams/NATS

The Redis team introduced the new Streams data type in Redis 5.0. Since Streams looks like Kafka topics at first glance, it seems difficult to find real-world examples for using it.
In the Streams intro we have a comparison with Kafka streams:
Runtime consumer group handling. For example, if one of three consumers fails permanently, Redis will continue to serve the first and second, because now we would have just two logical partitions (consumers).
Redis Streams are much faster. They are stored in and operated from memory, so this point stands as-is.
We have some projects with Kafka, RabbitMQ, and NATS. Now we are looking deeply into Redis Streams, trying to use it as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:
Store all data in memory with AOF replication.
By default, asynchronous replication will not guarantee that XADD commands or consumer-group state changes are replicated: after a failover, something can be missing depending on the ability of the followers to receive the data from the master. This looks like the point that kills any interest in trying Streams under high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
And the capping strategy. The real "capped resource" with Redis Streams is memory, so it's not really that important how many items you want to store or which capping strategy you use. So each time your consumer fails, you would either see peak memory consumption or lose messages because of the cap.
We use Kafka as an RTB bidder frontend which handles ~1,100,000 messages per second with ~120-byte payloads. With Redis we have ~170 MB/sec of memory consumption on writes, and with a 512 GB RAM server we have a write "reserve" of about 50 minutes of data. So if the processing system were offline for that long, we would crash.
Could you please tell us more about real-world Redis Streams usage, and maybe some cases where you have tried to use it yourself? Or maybe Redis Streams should only be used with smaller amounts of data?
Long time no see. This feels like a discussion that belongs on the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement - they provide different properties and capabilities despite the similarities. You are of course correct with regards to the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partitioning your streams across period-based key names.
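A minimal sketch of the period-based key idea, here with the Lettuce client, one stream key per day, and an XADD MAXLEN cap; the key prefix, cap size, and field names are placeholders:
import io.lettuce.core.RedisClient;
import io.lettuce.core.XAddArgs;
import io.lettuce.core.api.sync.RedisCommands;
import java.time.LocalDate;
import java.util.Map;

// One stream key per day: in a Redis Cluster, different day keys hash to different slots,
// so write load and memory spread across nodes as days roll over.
RedisCommands<String, String> redis = RedisClient.create("redis://localhost:6379").connect().sync();
String streamKey = "bids:" + LocalDate.now();          // e.g. "bids:2018-11-05"
XAddArgs cap = XAddArgs.Builder.maxlen(10_000_000);    // rough per-key cap; tune for your payload size
String entryId = redis.xadd(streamKey, cap, Map.of("payload", "...bid bytes..."));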

Ignite servers go down if large data is put into the cluster

I deployed an Ignite cluster on YARN. The cluster has 5 servers; each server has 10 GB of memory and an 8 GB heap. I was trying to write a lot of data to an Ignite cache. Each item is an integer array of length 100K, and backups is set to 2. After writing 3,980 items to the Ignite cache, the cluster's heap is almost full, but instead of rejecting writes, the servers go down one by one.
My questions are:
Is there a configuration or way to control how much of the servers' heap the cache may use, so the heap won't fill up and the servers won't go down?
Servers going down when too much is written into the cache does not seem good for users. I'm wondering why Ignite lets this happen when the user runs with the default configuration.
Apache Ignite, like the Java Virtual Machine, is NOT responsible for managing or controlling the size of the data sets placed on the Java heap. This is the reason OutOfMemoryError exists in the Java API: it's the responsibility of the application to handle its data sets and make sure they fit into the heap.
You can set up an eviction policy, and Ignite may either move data to an off-heap region or swap space, or remove it from memory completely.
Refer to my foreword above. This is the responsibility of the application. Ignite can assist here with its eviction policy, off-heap mode, and ability to scale out.
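A minimal sketch of the eviction-policy suggestion, assuming Ignite 2.x where on-heap caching is opt-in; the cache name, value type, and size limit are placeholders:
import org.apache.ignite.cache.eviction.lru.LruEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

// Cap how many entries are kept in the on-heap layer; entries beyond the limit are
// evicted from heap (in Ignite 2.x they still live in off-heap page memory) instead
// of accumulating until the heap fills up.
CacheConfiguration<Long, int[]> cfg = new CacheConfiguration<>("largeArrays");
cfg.setOnheapCacheEnabled(true);                        // enable the on-heap layer in front of off-heap storage
cfg.setEvictionPolicy(new LruEvictionPolicy<>(2_000));  // LRU eviction once the on-heap entry count exceeds the limit
cfg.setBackups(2);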

Google Compute Engine Load Balancer limits

I'm thinking of using Google Compute Engine to run a LOT of instances in a target pool behind a network load balancer. Each of those instances will end up processing many large data streams in real time, so at full scale and peak times there might be multiple terabytes per second going through.
Question:
Is there a quota or limit on the data you can push through those load balancers? Is there a limit on the number of instances you can have in a target pool? (The documentation does not seem to specify this.)
It seems like load balancers have a dedicated IP (does that mean it's a single machine?).
There's no limit on the amount of data that you can push through a LB. As for instances, there are default limits on CPUs and persistent or SSD disks; you can see those quotas in the Developers Console under 'Compute' > 'Compute Engine' > 'Quotas', and you can always request a quota increase at this link. You can have as many instances as you need in a target pool. Take a look at the Compute Engine Autoscaler, which will help you spin up machines as your service needs them. The single IP provided for your LB is in charge of distributing incoming traffic across your multiple instances.