Data loss in AWS ElastiCache for Redis

We are using AWS ElastiCache for Redis in cluster mode with 3 nodes.
We don't use persistence with ElastiCache, so we have no backup/restore of the stored data.
So I would like to understand the probability of losing data in my setup. I assume this will only happen if all the nodes go down?
Best Regards,
Saurav
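As a rough way to reason about this: the outcome depends on how the 3 nodes are laid out. If they are 3 shards with no replicas, losing any single node loses that shard's keys. If they are 1 shard with a primary and 2 replicas, every copy has to be lost. The sketch below assumes nodes fail independently with the same probability over some time window and ignores asynchronous replication lag. The value 0.01 is a made-up illustration, not an AWS figure, and the function names are hypothetical.

```python
def p_loss_unreplicated_shards(p_node: float, shards: int) -> float:
    """Probability that at least one shard is lost when no shard has a replica.

    Assumes independent node failures (an assumption, not a measured fact).
    """
    return 1.0 - (1.0 - p_node) ** shards


def p_loss_replicated_shard(p_node: float, copies: int) -> float:
    """Probability that every copy (primary plus replicas) of one shard is lost.

    Assumes independent node failures and ignores replication lag.
    """
    return p_node ** copies


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability in the window
    print("3 shards, no replicas:", p_loss_unreplicated_shards(p, 3))
    print("1 shard, 3 copies:    ", p_loss_replicated_shard(p, 3))
```

Under these assumptions the two layouts differ by several orders of magnitude, so "only if all the nodes go down" holds only for the replicated layout.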

Related

Delta Table transactional guarantees when loading using Autoloader from AWS S3 to Azure Datalake

I'm trying to use Auto Loader where AWS S3 is the source and the Delta Lake is in Azure Datalake Gen. When I try to read files, it gives me the following error:
Writing to Delta table on AWS from non-AWS is unsafe in terms of providing transactional guarantees. If you can guarantee that no one else will be concurrently modifying the same Delta table, you may turn this check off by setting the SparkConf: "spark.databricks.delta.logStore.crossCloud.fatal" to false when launching your cluster.
I tried setting this at the cluster level and it works fine. My question is: is there any way to ensure transactional guarantees while loading data from AWS S3 to Azure Datalake (Datalake is the backend storage for our Delta Lake)? We don't want to set "spark.databricks.delta.logStore.crossCloud.fatal" at the cluster level. Will there be any issue if we do, and would it be a good solution for a production ETL pipeline?
This warning appears when Databricks detects that you're doing multicloud work.
But the warning is meant for the case where you're writing into AWS S3 using Delta, because S3 doesn't have an atomic write operation (like put-if-absent), so it requires some kind of coordinator process that is available only on AWS.
In your case you can ignore this message, because you're only reading from AWS S3 and writing into a Delta table that is on Azure Datalake.
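A minimal sketch of the read-from-S3, write-to-Azure direction described above, assuming a Databricks runtime where `spark` is the active session. The function name, the bucket, container, and path names are hypothetical placeholders, and JSON as the input format is an assumption.

```python
def start_s3_to_adls_stream(spark, source_path, target_path, state_path):
    """Start an Auto Loader stream from S3 into a Delta table on Azure storage."""
    # Auto Loader source: this side only lists and reads files from S3.
    incoming = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumed input format
        .option("cloudFiles.schemaLocation", f"{state_path}/schema")
        .load(source_path)
    )
    # Delta sink: the table and its transaction log live under the Azure path.
    return (
        incoming.writeStream.format("delta")
        .option("checkpointLocation", f"{state_path}/checkpoint")
        .start(target_path)
    )


# Hypothetical usage on a Databricks cluster:
# query = start_s3_to_adls_stream(
#     spark,
#     "s3://example-bucket/landing/",
#     "abfss://lake@exampleaccount.dfs.core.windows.net/delta/events",
#     "abfss://lake@exampleaccount.dfs.core.windows.net/_state/events",
# )
```

Nothing in this pipeline writes Delta files to S3, which is the situation the warning is about.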

Is there a way or tool to push Cassandra data into AWS for backup purposes?

I'm working as a Cassandra cluster DevOps engineer and want to know whether there is a way or tool to push Cassandra data into AWS for backup purposes. I have a Cassandra cluster that is not in AWS. I explored netflix-priam, but as I understand it, it needs Cassandra to be hosted on AWS itself and then takes backups on EBS. My question is: why would I need to install a Cassandra cluster on AWS if I already have a working on-premise Cassandra? I have also read about cassandra-snapshotter and table-snap on GitHub, but I don't want to use those. So again: is there any such tool other than tablesnap, cassandra-snapshotter and Netflix-priam?
Please help.
Thanks

Redis on Azure VM vs Azure Redis Cache

We have tested both Redis installed on an Azure VM and Azure Redis Cache. Both work the same, and I can't see a difference in performance. Has anyone used both in a large-scale application? If so, can you share how they compare in performance and durability?
I have analysed the following:
Monitoring
In-zone replication
Multi-zone replication
Auto fail-over
Data persistence
Backup
Pricing
SSL Authentication & Encryption
In all of the above, Azure Redis Cache has the upper hand.
Still, I want to make sure which one is best.
Does using a VM have any bottlenecks?
I would go for Azure Redis Cache, mainly because it's fully managed. At the end of the day you do have nodes under the hood, but why should you care about maintaining a VM? Hotfixes, patches, security updates, etc.
I would ask the question the other way around. Why should you use VMs at all?
MG

Ignite data backup to hard disk

I'm totally new to Ignite. Is there any configuration or strategy to export all the data present in cache memory to the local hard disk in Ignite?
Basically, what I'm hoping for is some kind of logger/snapshot that shows the change in data whenever any kind of SQL update operation is performed on the data in the caches.
If someone could suggest a solution, I'd appreciate it a lot.
You can create and configure a persistent store for any cache [1]. If the cluster is restarted, all the data will still be there and can be reloaded into memory using the IgniteCache#loadCache(..) method. Out of the box, Ignite provides integration with RDBMS [2] and Cassandra [3].
Additionally, in one of the future versions (most likely the next one, 2.1) Ignite will provide local disk persistence storage, which will allow you to run with a cold cache, i.e. without explicit reloading after a cluster restart. I would recommend monitoring the Apache Ignite dev and user mailing lists for more details.
[1] https://apacheignite.readme.io/docs/persistent-store
[2] https://apacheignite-tools.readme.io/docs/automatic-rdbms-integration
[3] https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra

Database vs cache management in deepstream

I was wondering how deepstream decides whether to store data in the cache or the database if both of them are configured. Can this be decided by the clients?
Also, when using Redis, will it provide both cache and database functionality? I would be using Amazon ElastiCache with a Redis backend for this.
It stores it in both: first in the cache, in a blocking way, and then in the database, in a non-blocking way outside the critical path.
Here's an animation illustrating this.
You can also find more information here: https://deepstream.io/tutorials/core/storing-data/
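The write order described above can be sketched roughly as follows. This is only an illustration of the pattern in Python with asyncio, not deepstream's actual code or API; `InMemoryStore` and `write_record` are hypothetical names, and the 0.05 second delay just stands in for a slower database.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for a cache or database connector."""

    def __init__(self, delay: float = 0.0):
        self.data = {}
        self.delay = delay  # simulated write latency in seconds

    async def set(self, key, value):
        await asyncio.sleep(self.delay)
        self.data[key] = value


async def write_record(cache, database, key, value, background_tasks):
    # Critical path: wait until the cache write has completed.
    await cache.set(key, value)
    # Outside the critical path: schedule the database write and return.
    task = asyncio.create_task(database.set(key, value))
    background_tasks.add(task)  # keep a reference so the task is not dropped
    task.add_done_callback(background_tasks.discard)


async def demo():
    cache = InMemoryStore()
    database = InMemoryStore(delay=0.05)
    pending = set()
    await write_record(cache, database, "user/1", {"name": "a"}, pending)
    in_cache = "user/1" in cache.data
    in_db_yet = "user/1" in database.data  # database write still pending
    await asyncio.gather(*pending)
    in_db_later = "user/1" in database.data
    return in_cache, in_db_yet, in_db_later


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The caller only ever waits for the cache, so a slow database does not slow down writes; the trade-off is a short window in which the record exists in the cache but not yet in the database.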