Apache Ignite persist to disk

Is there any easy way in Ignite to persist to disk after the Ignite servers are up and running and already filled with data?
I have seen https://apacheignite.readme.io/docs/distributed-persistent-store#section-usage but it seems you need to supply the XML property at startup of your Ignite topology in order to persist to disk.

There's no easy way I can think of. You will need to start new nodes with a persistent data region and somehow transfer the data into newly created persistent caches on those nodes. The easiest way is to create them as part of a new cluster.
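For reference, a minimal sketch of what the startup configuration of such a persistent node might look like in code (assuming Ignite 2.x native persistence; the class name and region settings are illustrative):

    // A minimal sketch, assuming Apache Ignite 2.x native persistence.
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class PersistentNodeStartup {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Enable persistence on the default data region. This has to be set
            // before the node starts, which is why it cannot simply be switched
            // on for an already running in-memory cluster.
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
            cfg.setDataStorageConfiguration(storageCfg);

            Ignite ignite = Ignition.start(cfg);

            // With persistence enabled the cluster starts inactive and must be activated.
            ignite.cluster().active(true);
        }
    }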

Related

Apache Ignite: Control data

I want to ask a question:
How does Apache Ignite distribute data?
How can I control the distribution in Apache Ignite?
For example, I want to distribute more data to some nodes (because they have more memory and can hold more data), and less data to other nodes.
Thank you!!
If you want to do this for one cache you can implement your own affinity function (https://apacheignite.readme.io/docs/affinity-collocation#section-affinity-function), but this is not recommended because it will not scale. If you just want to control which nodes a new cache is mapped to, you can try nodeFilter in CacheConfiguration (see the sketch below).
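A minimal sketch of the nodeFilter approach (assuming Ignite 2.x; the user attribute name "big.memory" is an illustrative assumption and would have to be set via setUserAttributes() on the larger nodes):

    // A minimal sketch, assuming Ignite 2.x; the attribute name "big.memory" is illustrative.
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class NodeFilterExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("bigCache");

            // Only nodes started with the user attribute big.memory=true will hold
            // partitions of this cache; all other nodes are filtered out.
            cacheCfg.setNodeFilter(node -> Boolean.TRUE.equals(node.attribute("big.memory")));

            ignite.getOrCreateCache(cacheCfg);
        }
    }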

Apache Ignite with Kudu

I am trying to position Ignite as a query grid for databases such as Kudu, HBase, etc., so that all data silos can be queried through Ignite with read/write-through. How is this possible? Are there any integrations with them?
The first time a SQL query runs, it will need to pull the data from such databases and create the key/value entries in Ignite.
Then, if one, two, or three nodes go down, the data stored in memory will eventually be lost. How is recovery done, or is it not possible?
Thanks
CK
Ignite SQL is unable to load specific data by query from an external store; that is only possible with API get()/getAll() operations. To be able to query the data you need to load it into Ignite first, for example with loadCache(). Internally this function runs a query against the target database and transforms the response into key-value pairs.
BTW, if you enable persistence in Ignite, it will know the structure of the data and will be able to query it even if not all entries are loaded into memory.
In the case of a node crash, data replication between nodes is traditionally used. In Ignite these replicas are called backups. If you lose more nodes than the number of backups configured, you'll need to preload the data from the store again.
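A rough sketch of the loadCache()/backups setup described above (assuming a CacheStore backed by the external database is already implemented; the cache name and backup count are illustrative):

    // A minimal sketch, assuming a CacheStore (e.g. JDBC- or Kudu-backed) is
    // configured elsewhere; names and the backup count are illustrative.
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class LoadCacheExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("externalData");
            cfg.setReadThrough(true);   // get()/getAll() fall through to the external store
            cfg.setBackups(1);          // one extra copy, so one node can be lost without data loss
            // cfg.setCacheStoreFactory(...) would point at the CacheStore for the external database.

            IgniteCache<Long, String> cache = ignite.getOrCreateCache(cfg);

            // Warm up the cache so SQL queries see the data; loadCache() delegates to
            // CacheStore.loadCache(), which typically queries the underlying database
            // and streams the results in as key-value pairs.
            cache.loadCache(null);
        }
    }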

Can we copy Apache Ignite Cluster to another Ignite cluster?

I want to back up an entire Ignite cluster so that the backup cluster can be used if the original (active) cluster goes down. Is there any approach for this?
If you need two separate clusters with replication across data centers, it would be better to look at GridGain, which supports Data Center Replication.
Unfortunately, Ignite does not support DR.
With Apache Ignite you can logically divide your cluster into two zones to guarantee that every zone contains a full copy of the data. However, there is no way to choose the primary node for partitions manually. See AffinityFunction and the affinityBackupFilter() method of the standard implementations.
As answered above, a ready-made solution is only available in the paid version. Open-source Apache Ignite provides the ability to take a cluster-wide snapshot. You can add a cron job in your Ignite cluster to take this snapshot and another job to copy the snapshot data to object storage such as S3.
On the restore side, you download this data node-wise into the work directories of the respective nodes, following the manual restore procedure, and start the cluster. It should activate automatically once all baseline nodes have started successfully, and your cluster is ready to use.
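For the snapshot part, a minimal sketch using the public snapshot API (assuming Apache Ignite 2.11 or later; the snapshot name is illustrative, and copying the snapshot directory to S3 would still be a separate job):

    // A minimal sketch, assuming Apache Ignite 2.11+ where cluster-wide snapshots are available.
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class SnapshotExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Creates a consistent snapshot of all persistent caches across the cluster.
            // createSnapshot() returns a future; get() blocks until the snapshot completes.
            ignite.snapshot().createSnapshot("backup_" + System.currentTimeMillis()).get();
        }
    }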

Redis: how to save to disk only 1 database?

I'd like to install N redis server instances in master-slave mode.
The idea is that they should save database-0 to disk but not save database-1, since it holds security-sensitive data that should stay in memory only.
The same goes for replication: all databases should replicate, and each slave node must persist database-0 only, not database-1.
Is it possible to do this?
It is not possible to do this. This level of fine-grained control requires multiple Redis instances, one per persistence and replication level.
This is perfectly fine and is the recommended way to do it with Redis; in fact it will give you better performance, as Redis is single-threaded.
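To illustrate the "one instance per persistence level" idea from the application side, here is a sketch using the Jedis client (hosts, ports, and key names are assumptions; each instance would have its own redis.conf with persistence enabled or disabled):

    // A minimal sketch, assuming two separately configured Redis instances
    // (one with RDB/AOF persistence on port 6379, one memory-only with
    // persistence disabled on port 6380) and the Jedis client.
    import redis.clients.jedis.Jedis;

    public class SplitPersistenceExample {
        public static void main(String[] args) {
            try (Jedis persistent = new Jedis("localhost", 6379);
                 Jedis memoryOnly = new Jedis("localhost", 6380)) {

                // Durable data goes to the instance that saves to disk.
                persistent.set("user:42:profile", "...");

                // Security-sensitive data goes to the instance with persistence disabled.
                memoryOnly.set("user:42:session-token", "...");
            }
        }
    }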

Can Infinispan be forced to fully replicate to a new cluster member

Looking through the Infinispan getting started guide it states [When in replication mode]
Infinispan only replicates data to nodes which are already in the cluster. If a node is added to the cluster after an entry is added, it won't be replicated there.
Which I read as any cluster member will always be ignorant of any data that existed in the cluster before it became a cluster member.
Is there a way to force Infinispan to replicate all existing data to a new cluster member?
I see two options currently but I'm hoping I can just get Infinispan to do the work.
Use a distributed cache and live with the increase in access times inherent in the model, but this at least leaves Infinispan to handle its own state.
Create a Listener to listen for a new cache member joining and iterate through the existing data, pushing it into the new member. Unfortunately this would in effect cause every entry to replicate out to the existing cluster members again. I don't think this option will fly.
This information sounds misleading or outdated. When a node joins the cluster, a rebalance (state transfer) process is initiated; if you query that data on the new node during the rebalance, before it has been delivered there, the entry is fetched via a remote RPC.
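A minimal sketch of a replicated cache with state transfer enabled, so a newly joined member pulls the existing entries (assuming Infinispan embedded mode; cache and key names are illustrative):

    // A minimal sketch, assuming Infinispan embedded mode; names are illustrative.
    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class ReplicatedCacheExample {
        public static void main(String[] args) {
            DefaultCacheManager manager =
                new DefaultCacheManager(GlobalConfigurationBuilder.defaultClusteredBuilder().build());

            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.clustering()
                   .cacheMode(CacheMode.REPL_SYNC)   // synchronous replication
                   .stateTransfer()
                   .fetchInMemoryState(true);        // new members pull existing entries on join

            manager.defineConfiguration("replicated", builder.build());
            Cache<String, String> cache = manager.getCache("replicated");
            cache.put("k", "v");
        }
    }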