GridGain as high-speed data storage - Ignite

I deployed a GridGain server in Kubernetes with native persistence enabled. Read speed is at the expected level. It is also used as the primary data store, so data must not be lost. I have run into issues several times, and each time the solution I was given was to clean the work directory.
Is GridGain suitable as a primary database?

Related

Can I replace Redis cache with Cosmos DB?

Can I use Azure Cosmos DB instead of Redis cache for server-side caching? It seems to me that Cosmos DB also provides key-value storage, geo-replication, read/write access, and lower latency than Redis cache.
If you're still reading this 2 years later, note the following. The answer is yes, but the real story is that they work better together. Azure Cache for Redis now has an Enterprise Tier through the same Marketplace tile. This lets you deploy Redis in an Active-Active model across multiple regions, where all instances are readable and writeable, with conflict resolution built into the different datatypes that Redis supports. Combined with higher performance through the Redis Enterprise proxy and up to five nines of availability, that gives you additional options to choose from. Azure Cache for Redis Enterprise (ACRE) in front of Cosmos is a real option, as ACRE has sub-millisecond latency capabilities. Note: I work for Redis Labs and have seen this work and deployed it myself.
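As a rough illustration of putting a cache in front of a database, here is a minimal cache-aside sketch. It is not tied to any Azure SDK: the dictionary stands in for Redis, and `load_from_db` is a hypothetical callable standing in for a Cosmos DB point read. The class name and TTL default are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]],
                 ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db   # hypothetical stand-in for a database read
        self._ttl = ttl_seconds
        # Stand-in for Redis: key -> (expiry time, value)
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                  # cache hit
        value = self._load_from_db(key)      # cache miss: read from the database
        if value is not None:
            self._cache[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, for example after the database record changes."""
        self._cache.pop(key, None)
```

With a real deployment the dictionary operations would become cache GET/SET calls with an expiry, and the loader would query the database; the control flow stays the same.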
Redis is an in-memory datastore, hence its primary use case is in-memory caching. Since it is a key-value store, it generally has limited query ability, only allowing queries by primary key.
Cosmos DB, by contrast, is a globally distributed, horizontally scalable, multi-model database service. It comes in handy in scenarios where you need the ability to query over heterogeneous data.
The two serve entirely different purposes. Even Microsoft offers Redis cache as a service separate from Cosmos DB, precisely to serve the caching purpose.
Cosmos is probably going to be more expensive, from a cost perspective, than using Redis - depending on your throughput.
The one big benefit you can get with Cosmos is multiple read regions: your availability could increase, and latency could improve for users who read from a Cosmos region closer to them.

REDIS Server Configuration

I am working on a migration project from Oracle to Redis. My Oracle DB is 1 TB in size. Can you please suggest a hardware configuration for Redis? I am planning to have a master with 2 slaves for the Redis server.
What is the best option for making Redis highly available?
Is a master-slave architecture fine? If so, can I run the master and all the slaves on the same server, and what disadvantages would that have?
Please suggest the best high-availability option for my Redis server.
Considering the data size, you can use Redis Cluster to store your data.
When designed properly, it should provide high availability and partition your data among multiple masters in the cluster.
To assess its suitability, you need to run benchmarks with the real data and the real queries expected from your application.
You can use the redis-benchmark utility that ships with Redis to simulate the expected data and calls and get a picture of what to expect.

Redis: Efficient cluster of servers for large key set

I have a very large set of keys, 200M keys, with small values, <100 bytes, to store and I'm trying to use Redis. The problem is such that I have 10 Redis DB to split the keys over, but currently I'm on a single server with those 10 Redis DB. By a Redis DB I mean using SELECT. From my calculations it looks like I'm going to blow out memory. I think I'll need over 4TB of memory for this case! What are my options? First, my calculation is based on 10000 keys with 100 byte values taking 220MB of RAM (this is from a table I found). So simply put (2*10^8 / 10^4) * 220MB = 4.4TB.
If my calculation looks correct, what are my options? I've read on different posts that Redis VM is no longer an option. Can I use a Redis cluster? This still appears to require too many servers to be practical. I understand I could switch to another DB, but I'd like that to be the last resort option.
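The estimate in the question can be reproduced with a few lines of arithmetic. The inputs are the asker's own figures (the 220 MB per 10,000 keys comes from the table they mention and is not verified here), and decimal units are assumed.

```python
# Figures taken from the question; the reference measurement is unverified.
total_keys = 200_000_000   # 2 * 10^8 keys
sample_keys = 10_000       # keys in the reference measurement
sample_mb = 220            # MB of RAM reported for that sample

# Scale the sample linearly to the full key count.
estimate_mb = (total_keys // sample_keys) * sample_mb
estimate_tb = estimate_mb / 1_000_000          # decimal units: 1 TB = 10^6 MB

# Implied memory cost per key under the same assumption.
kb_per_key = sample_mb * 1000 / sample_keys    # decimal units: 1 MB = 1000 KB

print(f"{estimate_tb} TB total, {kb_per_key} KB per key")
```

The arithmetic itself holds, but it implies roughly 22 KB per key for values under 100 bytes, so the per-key input is worth re-checking against a measurement on real data.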
Firstly, using shared databases (i.e. the SELECT command) isn't a recommended practice, since all of these databases are essentially managed by the same Redis process. It is preferable to have 10 separate Redis processes (even on the same server) in order to avoid contention (more info here).
Next, there are ways to reduce the memory footprint of your database. You could, for example, perform client-side compression (see here) or consider other optimizations such as using Hashes to keep multiple values (as described here).
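A minimal sketch of the Hashes idea: instead of one top-level key per value, many small values are grouped as fields of a shared hash. The bucket count, the `bucket:` prefix, and the helper names are hypothetical, and a plain dictionary stands in for the server so the example runs on its own.

```python
import zlib
from typing import Dict, Optional, Tuple

# Hypothetical bucket count: about 200 fields per hash for 200M keys.
NUM_BUCKETS = 1_000_000


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> Tuple[str, str]:
    """Map a flat key to (hash_name, field) so many small values share one hash."""
    bucket_id = zlib.crc32(key.encode("utf-8")) % num_buckets
    return f"bucket:{bucket_id}", key


# In-memory stand-in for the server: hash name -> {field: value}
store: Dict[str, Dict[str, str]] = {}


def put(key: str, value: str) -> None:
    name, field = bucket_for(key)
    store.setdefault(name, {})[field] = value    # with a client: HSET name field value


def get(key: str) -> Optional[str]:
    name, field = bucket_for(key)
    return store.get(name, {}).get(field)        # with a client: HGET name field
```

Any saving depends on each hash staying small enough for the server's compact small-hash encoding, so the bucket count would need tuning against the server configuration and measured on real data.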
That said, a Redis server is ultimately bound by the amount of RAM that the host provides. Once you've reached that limit you'll need to shard your database and use a Redis cluster. Since you're already using multiple databases this shouldn't pose a big challenge as your code should already be compatible with that to a degree. Sharding can be done in one of three approaches: client, proxy or Redis Cluster. Client-side sharding can be implemented in your code or by the Redis client that you're using (if the client library that you're using supports that). Redis Cluster (v3) is expected to be released in the very near future and already has a stable release candidate. As for proxy-based sharding, there are several open source solutions out there, including Twitter's twemproxy, Netflix's dynomite and codis. Additional information about sharding and partitioning can be found here.
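To make the client-side option concrete, here is a minimal sketch of hash-based routing in application code. The class and shard names are hypothetical, and each shard is a plain dictionary standing in for a connection to a separate Redis process.

```python
import zlib
from typing import Dict, List, Optional


class ShardedStore:
    """Route each key to one of several shards by hashing the key."""

    def __init__(self, shard_names: List[str]):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)
        # One dict per shard stands in for one connection per Redis process.
        self._shards: Dict[str, Dict[str, str]] = {n: {} for n in self.shard_names}

    def shard_for(self, key: str) -> str:
        """Pick a shard deterministically from the key's CRC32."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.shard_names)
        return self.shard_names[index]

    def set(self, key: str, value: str) -> None:
        self._shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self._shards[self.shard_for(key)].get(key)
```

Plain modulo routing like this remaps most keys whenever the number of shards changes, which is one reason to fix the shard count up front or to use a scheme with a fixed slot space.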
Disclaimer: I work at Redis Labs. Lastly, AFAIK there's only one Redis-as-a-Service provider that already provides built-in support for clustering Redis. Redis Labs' Redis Cloud is a fully-managed service that can scale seamlessly to any required capacity. Our clusters support both the '{}' hashtag standard as well as sharding by RegEx - more about this can be found here.
You can use LMDB with Dynomite to store data beyond your memory capacity. LMDB uses both disk and memory to store data. Dynomite makes LMDB distributed.
We have done a POC with this combo and they work nicely together.
For more information, please check out our open issue here:
https://github.com/Netflix/dynomite/issues/254

IBM Worklight 6.2. Analytics topology. Master and data Nodes

I'm reading about production topology for the Analytics part of Worklight 6.2.
https://www-01.ibm.com/support/knowledgecenter/api/content/SSZH4A_6.2.0/com.ibm.worklight.monitor.doc/monitor/t_setting_up_production_cluster.html
It explains that a node can act as a Master Node, as a Data Node, or as both.
My question is why we should configure dedicated nodes (Master OR Data) instead of configuring all the nodes as both Master AND Data?
I assume that the one node acting as master will perform worse in its Data role, but on the other hand the configuration will be simpler and availability will be higher.
Thank you.
Your assumption is correct.
A master node is responsible for handling communication between the data nodes. The data nodes are responsible for indexing data. Having dedicated master and data nodes allows each to focus its processing time and memory on its specific task. However, as you mentioned, in some cases it's not worth complicating the configuration to do this.
Another reason is that it's not necessary to put a master node on a high-performing machine. You can reserve the better machines for the data nodes.
The analytics console uses Elasticsearch under the covers. It would be worth looking up the benefits and drawbacks of choosing master and data nodes in Elasticsearch since it is an open source library and there are several resources available for it.
Edit:
As you can imagine, there is no one size fits all configuration. The configuration depends on several factors such as:
How long you wish to keep data stored
How many machines you have to dedicate to analytics
How verbose your client logs have been set
Your preferences between availability and performance
In my personal tests, I typically keep each node as both a data and a master node. It's possible that in the future we will document how the different configurations affect performance.

Is Infinispan an improvement of JBoss Cache?

According to this link, which belongs to the JBoss documentation, I understand that Infinispan is a better product than JBoss Cache and a kind of improvement on it, which is why they recommend migrating from JBoss Cache to Infinispan, which is also supported by JBoss. Am I right in what I understood? Otherwise, what are the differences?
One more question: talking about replication and distribution, can either of them be better than the other depending on the need?
Thank you
Question:
Talking about replication and distribution, can any one of them be better than the other according to the need?
Answer:
I am taking a reference directly from Clustering modes - Infinispan
Distributed:
Number of copies represents the tradeoff between performance and durability of data
The more copies you maintain, the lower performance will be, but also the lower the risk of losing data due to server outages
A consistent hash algorithm determines where in the cluster each entry is stored
Data does not have to be replicated to every node, which would take more time than just communicating a hash code
Best suited to clusters with a large number of nodes
Best suited when the amount of data stored in the cache is large
Replicated:
Entries added to any of these cache instances will be replicated to all other cache instances in the cluster
This clustered mode provides a quick and easy way to share state across a cluster
Replication in practice only performs well in small clusters (under 10 servers), because the number of replication messages grows as the cluster size increases
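The consistent-hash placement that distributed mode relies on can be sketched as a simple hash ring. This is a generic illustration and not Infinispan's actual implementation; the class, the virtual-node count, and the `num_owners` parameter (standing for the number of copies) are assumptions made for the example.

```python
import bisect
import hashlib
from typing import List


def _hash(value: str) -> int:
    """Stable 64-bit hash derived from MD5 (chosen only for determinism)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], virtual_nodes: int = 100):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def owners(self, key: str, num_owners: int = 2) -> List[str]:
        """Return the distinct nodes holding copies of `key`, primary first."""
        wanted = min(num_owners, self._node_count)
        start = bisect.bisect_right(self._points, _hash(key))
        found: List[str] = []
        # Walk clockwise from the key's position until enough distinct nodes are found.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found
```

In this sketch each entry lives on `num_owners` nodes rather than on all of them, and adding a node moves only the keys that fall next to its ring positions, which is why the approach scales to larger clusters better than full replication.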
Practical Experience:
I am using Infinispan cache in my live application on a JBoss server with 8 nodes. Initially I used a replicated cache, but it took much longer to respond because of the large amount of data. In the end we went back to distributed mode, and now it's working fine.
Use a replicated or distributed cache only for data specific to a user session. If the data is common to all users, then prefer a local cache that is created separately on each node.