What exactly is GemFire?

I have been studying 'in-memory data grids' and came across the term 'GemFire'. I'm confused: it seems that GemFire refers to technology that stores and manipulates data like a database, but in the computer's memory. Is that right? What exactly is GemFire?
Which technologies can I use to work with 'in-memory data grids' in Node.js?
I saw some applications, like 'Apache Geode' and 'Pivotal GemFire'. How do I work with them? Is it like working with caching technologies (like Redis or Memcached)? In Geode's case, is the data only accessible through an API, or are there other ways to access it?

There are many products that qualify as an "in-memory data grid"; GemFire is one of the leading ones. According to this article, the main ones are:
VMware GemFire (Java)
Oracle Coherence (Java)
Alachisoft NCache (.Net)
Gigaspaces XAP Elastic Caching Edition (Java)
Hazelcast (Java)
Scaleout StateServer (.Net)
Most of these products have drivers for many languages. You can access data in GemFire over REST, or through the native Node.js client.
Apache Geode is the open source version of GemFire. It is much more powerful than memcached and Redis; you can use Geode not only as a cache, but as a store of record (it has native persistence). It has a built-in Object Query Language (OQL) engine that allows you to query nested objects, and it has powerful features like Continuous Queries and replication over WAN, among others. Geode also has protocol adapters for memcached and Redis, allowing your memcached and Redis clients to connect to Geode.
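As a rough illustration, here is a hedged sketch of an OQL query against nested object fields from a plain Java Geode client (the /Customers Region, its fields, and the locator host/port are assumptions for illustration):

    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.query.SelectResults;

    public class OqlExample {
        public static void main(String[] args) throws Exception {
            // Connect to a Geode cluster via a locator (host/port are assumptions).
            ClientCache cache = new ClientCacheFactory()
                    .addPoolLocator("localhost", 10334)
                    .create();

            // OQL can reach into nested object fields, unlike plain key lookups.
            SelectResults<?> results = (SelectResults<?>) cache.getQueryService()
                    .newQuery("SELECT * FROM /Customers c WHERE c.address.city = 'London'")
                    .execute();

            results.forEach(System.out::println);
            cache.close();
        }
    }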

I would add to the list of "in-memory data grid" solutions:
Apache Ignite
Infinispan
They also provide powerful features.
For a feature comparison, you can use this website: https://db-engines.com/en/system/Hazelcast%3BIgnite.
Last note: GemFire is now a Pivotal solution.

GemFire is a high-performance, distributed data management infrastructure that sits between an application cluster and back-end data sources.
With GemFire, data can be managed in-memory, which makes access faster.
Kindly check the link below for further details:
https://www.baeldung.com/spring-data-gemfire
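That article covers Spring Data GemFire; as a rough sketch of what that approach looks like (the Region name, entity, and fields are hypothetical, and the annotation packages assume a recent Spring Data GemFire release):

    import org.springframework.data.annotation.Id;
    import org.springframework.data.gemfire.mapping.annotation.Region;
    import org.springframework.data.repository.CrudRepository;

    // Maps this class to a GemFire Region named "Customers" (name is an assumption).
    @Region("Customers")
    class Customer {
        @Id
        Long id;
        String name;
    }

    // Spring Data GemFire generates the Region-backed implementation at runtime.
    interface CustomerRepository extends CrudRepository<Customer, Long> {
    }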

Related

Synchronize multiple instances of Spring Cache with a Redis lock

I'm building a Spring Boot application that uses Spring Cache with a Redis backing store and needs to synchronize the updates made to the cache.
The caching is not done on the fly, but by a scheduled process that updates the cache periodically.
The algorithm I came up with is:
periodically the instances will check if the Redis cache is older than some predetermined time
if that's the case, the instance will try to acquire a lock on some Redis key
if the instance successfully locks the key, it will then proceed with the update
if some other instance already locked the key, move on
all instances can still read the cache
Everything is more or less already built, all I need is to implement the locking/releasing mechanism.
Spring Cache is using Lettuce to interact with Redis; what is the best way to get a connection to Redis and manage the locking mechanism?
As you may already be aware, Spring's Cache Abstraction provides simple coordination amongst multiple Threads in a single Spring [Boot] application process using the sync attribute on the @Cacheable annotation (see the reference documentation).
NOTE: Despite the comment ("... use the sync attribute to instruct the underlying cache provider to lock the cache entry while the value is being computed. As a result, only one thread is busy computing the value, while the others are blocked until the entry is updated in the cache.") in the documentation, the locking mechanics are handled by the core framework itself, and in most cases, not the provider. Anyway...
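For reference, a minimal sketch of that per-process coordination (the cache name, service, and method are hypothetical):

    import java.math.BigDecimal;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class PriceService {

        // sync = true: within this single JVM, only one thread computes a missing
        // entry; other threads block until the value lands in the cache.
        @Cacheable(cacheNames = "prices", sync = true)
        public BigDecimal lookupPrice(String sku) {
            // stand-in for the expensive computation being guarded
            return BigDecimal.valueOf(Math.random());
        }
    }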
However, this "coordination" is only per-process and will not work across multiple Spring [Boot] application instances, or (OS) JVM processes. In that case, you need some form of distributed locking across your multiple Spring [Boot] application instances to coordinate access to shared cache entries stored in the single Redis server (cluster) shared by your Spring [Boot] application instances.
I am no Redis expert (I am still learning), but I am familiar with similar NoSQL stores (Apache Geode/VMware GemFire, Hazelcast, etc) and distributed locking mechanisms. I see that distributed locking is possible to achieve with Redis as well. In a quick search, I found "Distributed Locking" in Redis, and specifically, "Building a lock in Redis". This is probably the best way to go.
In addition, if you want to make this distributed locking automatically/transparently available through Spring's Cache Abstraction, then you could possibly create a custom AOP Aspect and weave this Aspect together with the framework-provided Caching Aspect (Interceptor), being conscious of ordering, as one idea.
Alternatively, you could implement wrapper implementations for the Spring Cache and CacheManager SPI interfaces that implement distributed locking on top of the core Redis Cache and CacheManager provider implementations provided by Spring Boot/Spring Data Redis.
Of course, there are multiple ways to go about this. Just tossing out more ideas, but have a look at the distributed locking information in the book.
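Since you asked specifically about the Lettuce-backed setup, here is a minimal, hedged sketch of the lock pattern those references describe (an atomic SET with NX and an expiry to acquire, and a token-checked Lua script to release), written against Spring Data Redis's StringRedisTemplate, which Spring Boot backs with Lettuce by default; the class and key names are made up for illustration:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.UUID;
    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.data.redis.core.script.DefaultRedisScript;

    public class RedisLock {

        // Delete the key only if it still holds our token, so we never release
        // a lock that expired and was re-acquired by another instance.
        private static final DefaultRedisScript<Long> UNLOCK_SCRIPT = new DefaultRedisScript<>(
                "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
                Long.class);

        private final StringRedisTemplate redis; // Lettuce-backed in Spring Boot

        public RedisLock(StringRedisTemplate redis) {
            this.redis = redis;
        }

        // Returns a token on success, or null if another instance holds the lock.
        // setIfAbsent maps to SET key value NX PX <ttl>, which is atomic, so
        // there is no check-then-set race between instances.
        public String tryLock(String key, Duration ttl) {
            String token = UUID.randomUUID().toString();
            Boolean acquired = redis.opsForValue().setIfAbsent(key, token, ttl);
            return Boolean.TRUE.equals(acquired) ? token : null;
        }

        public void unlock(String key, String token) {
            redis.execute(UNLOCK_SCRIPT, Collections.singletonList(key), token);
        }
    }

Your scheduled refresh would call tryLock, update the cache only when it receives a token, and unlock in a finally block; the TTL frees the lock if an instance dies mid-update, and all instances can keep reading the cache throughout.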

How to achieve multi-tenancy in Redis?

Since I am fairly new to Redis, I am trying to explore options and see how I can achieve multi-tenancy with Redis.
I read some documentation on the Redis Labs official page, and it looks like Redis cluster mode supports multi-tenancy out of the box with Redis Enterprise.
I am wondering if such a solution for multi-tenancy is available in sentinel mode as well?
I may be completely confused about the multi-tenancy that Redis Enterprise provides. Maybe it works in sentinel mode too, but nothing seems very clear to me.
Can someone throw some light on multi-tenancy in Redis and which mode supports it?
If you are going to use redis-cluster, then only one DB is supported.
Redis Cluster does not support multiple databases like the stand alone version of Redis. There is just database 0 and the SELECT command is not allowed.
If you are not going to use cluster mode, then you may take a look at the message posted by the creator of Redis about multiple databases (years ago):
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not. With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
Salvatore's message
Redis cluster documentation
What I may suggest is prefixing. We use this method in a SaaS application, and all the different data types are prefixed with the related customer name. We handle some of the operations at the application layer.
If you want to go single instance/multiple databases, then you need to manage them in your codebase via the SELECT command. There may be some libraries to manage them. One critical thing to note is that all databases are still persisted in the same RDB / append-only file.
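As a sketch of the prefixing approach (the tenant, type, and key names are made up; StringRedisTemplate is just one convenient client):

    import org.springframework.data.redis.core.StringRedisTemplate;

    public class TenantKeys {

        private final StringRedisTemplate redis;

        public TenantKeys(StringRedisTemplate redis) {
            this.redis = redis;
        }

        // "acme:user:42"-style keys keep tenants apart in one shared database
        // (and keep working under Redis Cluster, unlike SELECT).
        private String key(String tenant, String type, String id) {
            return tenant + ":" + type + ":" + id;
        }

        public void saveUserName(String tenant, String userId, String name) {
            redis.opsForValue().set(key(tenant, "user", userId), name);
        }

        public String findUserName(String tenant, String userId) {
            return redis.opsForValue().get(key(tenant, "user", userId));
        }
    }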

Can I replace Redis cache with Cosmos DB?

Can I use Azure Cosmos DB instead of Redis Cache for server-side caching? I feel that Cosmos DB also provides key-value storage, has geo-replication, read/write access, and lower latency than Redis Cache.
If you're still reading this two years later, note the following: the answer is yes, but the real story is that they work better together. Azure Cache for Redis now has an Enterprise Tier through the same Marketplace tile. This gives you the ability to deploy Redis in an Active-Active model across multiple regions, where all instances are readable and writeable, with conflict resolution built into the different data types that Redis supports. Couple that with higher performance through the Redis Enterprise proxy and up to five nines of availability, and you have additional options to choose from. Azure Cache for Redis Enterprise (ACRE) in front of Cosmos is a real option, as ACRE has sub-millisecond latency capabilities. Note: I work for Redis Labs and have seen this work and deployed it myself.
Redis is an in-memory datastore, hence its primary use case is in-memory caching. Since it is a key-value store, it generally has limited query ability, only allowing queries by primary key.
CosmosDB, meanwhile, is a globally distributed, horizontally scalable, multi-model database service. It comes in handy in scenarios where you need the ability to query over heterogeneous data.
The two serve totally different purposes; even Microsoft offers Redis Cache as a service apart from CosmosDB precisely for this reason.
Cosmos is probably going to be more expensive than Redis, depending on your throughput.
The one big benefit you can achieve with Cosmos is multi-read regions, so your availability could increase, and the latency for users reading from a nearby Cosmos region could decrease.

How to handle data from an external, independent data source with Pivotal GemFire?

I am new to GemFire.
Currently we are using a MySQL DB and would like to move to GemFire.
How do we move the existing data stored in MySQL over to GemFire? I.e., is there any way to import existing MySQL data into GemFire?
There are many different options available for migrating data from one data store (e.g. an RDBMS like MySQL) to an IMDG (e.g. Pivotal GemFire). Pivotal GemFire does not provide any tools for this purpose out of the box (OOTB).
However, you could...
A) Write a Spring Batch application to migrate all your data from MySQL to Pivotal GemFire in one fell swoop. This is typical of most large-scale conversion processes, moving from one data store to another, either as part of an upgrade or a migration.
The advantage of using Pivotal GemFire as your target data store is that it stores Java objects. So if you are, say, using an ORM tool (e.g. Hibernate) to map the data stored in your MySQL database tables to your application domain objects, you can then immediately turn around and store those same objects directly in a corresponding Region in Pivotal GemFire. No additional mapping is required to store an object in GemFire.
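A minimal sketch of what such a Spring Batch writer might look like (this assumes Spring Batch 4's ItemWriter signature and a hypothetical Customer type that your reader maps MySQL rows onto; the Region is keyed by the MySQL primary key):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.geode.cache.Region;
    import org.springframework.batch.item.ItemWriter;

    // Hypothetical domain type your reader maps MySQL rows onto (e.g. via Hibernate).
    class Customer {
        Long id;
        String name;
        Long getId() { return id; }
    }

    // Writes each Spring Batch chunk into a GemFire Region with one putAll call.
    class CustomerRegionWriter implements ItemWriter<Customer> {

        private final Region<Long, Customer> customers;

        CustomerRegionWriter(Region<Long, Customer> customers) {
            this.customers = customers;
        }

        @Override
        public void write(List<? extends Customer> items) {
            Map<Long, Customer> batch = new HashMap<>();
            for (Customer customer : items) {
                batch.put(customer.getId(), customer); // key = MySQL primary key
            }
            customers.putAll(batch);
        }
    }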
Although, if you need something less immediate, then you can also...
B) Take advantage of Pivotal GemFire's CacheLoader, and maybe even the CacheWriter mechanisms. The CacheLoader and CacheWriter are implementations of the "Read-Through" and "Write-Through" design patterns.
More details of this approach can be found here.
In a nutshell, you implement a CacheLoader to load data from some external data source on a cache miss. You attach, or register, the CacheLoader with a GemFire Region when the Region is created. When a Key (which can correspond to your MySQL table's primary key) is requested (Region.get(key)) and an entry does not exist, GemFire will consult the CacheLoader to resolve the value, provided you actually registered a CacheLoader with the Region.
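A hedged sketch of such a CacheLoader backed by MySQL via JDBC (the table, columns, and value type are assumptions; packages assume Apache Geode / GemFire 9+):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;
    import org.apache.geode.cache.CacheLoader;
    import org.apache.geode.cache.CacheLoaderException;
    import org.apache.geode.cache.LoaderHelper;

    // Called by GemFire on a cache miss: Region.get(key) finds no entry, so the
    // value is read through from MySQL, cached in the Region, and returned.
    public class CustomerLoader implements CacheLoader<Long, String> {

        private final DataSource dataSource; // pooled connections to MySQL

        public CustomerLoader(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        @Override
        public String load(LoaderHelper<Long, String> helper) throws CacheLoaderException {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM customers WHERE id = ?")) {
                ps.setLong(1, helper.getKey());
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("name") : null;
                }
            } catch (Exception e) {
                throw new CacheLoaderException("Failed to load " + helper.getKey(), e);
            }
        }
    }

You would register it when creating the Region, e.g. regionFactory.setCacheLoader(new CustomerLoader(dataSource)).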
In this way, you slowly build up Pivotal GemFire from the MySQL RDBMS based on need.
Clearly, it is quite likely Pivotal GemFire will not be able to hold all the data from your RDBMS in memory. So you can enable both Persistence and Overflow [to Disk] capabilities. With Persistence enabled, GemFire will load the data from its own DiskStores the next time the nodes come online, assuming you brought them down beforehand.
The CacheWriter mechanism is nice if you want to run both Pivotal GemFire and MySQL in parallel for a while, until you can shift enough of MySQL's responsibilities over to GemFire, for instance. The CacheWriter will write back to your underlying MySQL DB each time an entry is written or updated in the GemFire Region. You can even do this asynchronously (i.e. "Write-Behind") using GemFire's AsyncEventQueues and Listeners; see here.
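A corresponding write-through sketch, extending the CacheWriterAdapter convenience class so only the callbacks of interest are overridden (again, Geode/GemFire 9+ packages and hypothetical types):

    import org.apache.geode.cache.EntryEvent;
    import org.apache.geode.cache.util.CacheWriterAdapter;

    // Write-through: GemFire invokes this synchronously before an entry is
    // created or updated, so MySQL stays in step with the Region.
    public class CustomerWriter extends CacheWriterAdapter<Long, String> {

        @Override
        public void beforeCreate(EntryEvent<Long, String> event) {
            upsert(event.getKey(), event.getNewValue());
        }

        @Override
        public void beforeUpdate(EntryEvent<Long, String> event) {
            upsert(event.getKey(), event.getNewValue());
        }

        private void upsert(Long id, String name) {
            // A JDBC INSERT ... ON DUPLICATE KEY UPDATE against MySQL goes here;
            // throwing CacheWriterException from a callback aborts the Region operation.
        }
    }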
Obviously, you have many options at your disposal. You need to carefully weigh your options and choose an approach that best meets your application requirements and needs.
If you have additional questions, let me know.

What are the options to bulk/batch load data into Apache Geode(Gemfire)?

We need to load millions of key/values into Apache Geode and we'd like to know what some of the available options are. Our values happen to be in the 256 KB range.
There are several options depending on your application requirements/SLAs or whether you need to perform conversion or other transformations, etc.
Out of the box, Apache Geode provides the Cache and Region Snapshot Service. This is useful when you want to migrate data from one existing Apache Geode cluster to another, for instance. It is not so useful if your data is coming from an external source, like an RDBMS.
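For illustration, a hedged sketch of the Region Snapshot Service's Java API (the file name and Region types are assumptions; gfsh's export data / import data commands are the command-line equivalent):

    import java.io.File;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.snapshot.RegionSnapshotService;
    import org.apache.geode.cache.snapshot.SnapshotOptions.SnapshotFormat;

    public class SnapshotExample {

        // Export one Region to a snapshot file on the local filesystem.
        public static void export(Region<Long, String> region) throws Exception {
            RegionSnapshotService<Long, String> snapshots = region.getSnapshotService();
            snapshots.save(new File("customers.gfd"), SnapshotFormat.GEMFIRE);
        }

        // Import the snapshot file into a Region in another cluster.
        public static void restore(Region<Long, String> region) throws Exception {
            region.getSnapshotService().load(new File("customers.gfd"), SnapshotFormat.GEMFIRE);
        }
    }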
Another option is to lazily load the data based on need. This can be accomplished by implementing the CacheLoader interface and registering the CacheLoader with a Region. Obviously, you could create a CacheLoader implementation that intelligently loads a block of data based on some rules/criteria, in addition to loading and returning the single value of interest for the current request.
A lot of the time, users create an external, custom conversion process or tool to extract, transform, and bulk load (ETL) a bunch of data into Apache Geode. This is typical in complex use cases or requirements. However, it is highly advisable to use a framework/tool like...
Spring XD (now Spring Cloud Data Flow on Pivotal's Cloud Foundry (PCF)) is a great ETL tool and pipeline for creating stream-based applications. Spring XD / SCDF provides many different options for "sources" and "sinks" (e.g. a GemFire Server). In addition to sources and sinks, you can even "tap" the stream to process the data with "Processors". So whether you are doing real-time stream or batch-oriented data operations (e.g. bulk loads), Spring XD is a great option.
I am sure Google can provide other answers on how to perform ETL with a key-value store like Apache Geode.
Hope this helps get you going.
Cheers,
John
We have very limited options to load GemFire Regions; see the putAll sketch after this list for the basic write pattern:
1) Spring Batch:
Create a GemFire writer to load data and remove data
Create a batch configuration and load it
2) Apache Spark:
https://www.linkedin.com/pulse/fast-data-access-using-gemfire-apache-spark-part-vaquar-khan-/
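Whichever route you take, the write side typically boils down to batched Region.putAll calls; here is a minimal, hedged sketch (batch size and key/value types are assumptions, with the batch kept small given the ~256 KB values mentioned in the question):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.geode.cache.Region;

    public class BulkLoader {

        private static final int BATCH_SIZE = 50; // keep batches modest for ~256 KB values

        // Pushes entries in fixed-size putAll batches, which is far cheaper than
        // millions of individual put() network round trips.
        public static void load(Region<String, byte[]> region, Map<String, byte[]> source) {
            Map<String, byte[]> batch = new HashMap<>();
            for (Map.Entry<String, byte[]> entry : source.entrySet()) {
                batch.put(entry.getKey(), entry.getValue());
                if (batch.size() == BATCH_SIZE) {
                    region.putAll(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                region.putAll(batch);
            }
        }
    }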