Apache Ignite Cache - get key from value - ignite

Problem
Need to implement a bi-directional key value store. Cache entry is populated into the cache using a key (String) mapping to a value(Long). However, the the cache needs to handle both key->value and value->key lookup. What is the most efficient way to do this?
Note: value->key mapping does not need to be synchronized with key->value in real time, but cannot fall behind key->value mapping for too long (10-15 seconds max).
Naive Ignite Implementation
Use ignite continuous query to monitor changes, upon modification, update another (value->key) cache
Are there better way of achieving this goal? Is ignite the wrong tool to use?

You can use SQL queries and have secondary index on value. Your proposed approach should also work.

Related

Is there any way to record all keys of cache misses in Ignite?

I am facing a problem with hit-rate of Ignite through the below metrics exposed by Ignite
org_apache_dmp_user_mask_CacheHitPercentage{name="\"org.apache.ignite.internal.processors.cache.CacheClusterMetricsMXBeanImpl\"",} 98.48369598388672 org_apache_dmp_user_mask_CacheHitPercentage{name="\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\"",} 99.72936248779297
Although I tried to fulfill all data in a cache that will be used for serving requests, however somehow at the midnight (from 2 am to 5 am), the hit-rate reduced significantly.
hit-rate chart
So I'd like to record those cases for troubleshooting and understand what is happening at the midnight.
Anyone can help, please?
Write your own implementation of CacheStore (or rather extend the one that you're using currently). The method load or loadAll will be called whenever the system reads a key from the 3rd party storage. You can log the key there before reading the external key.

Making sure (distributed) caches only store the latest value in a distributed system

Let's say I want to use a built-in solution such as Redis or Memcached to cache database rows (as an example), to avoid recurrent costly trips to the database.
For the sake of the argument, let's assume I have a TABLE(id, x, y) and that I want to cache all rows so I never have to read directly from the database.
Questions:
Consider the following case: NodeA tries to update a given row's field x while NodeB tries to update y, then both simultaneously try to update the cache line. If they try to "manually" update the field they just changed to the row in the cache, if we follow the typical last-write-wins, one of the fields is going to be discarded, which is catastrophic. This makes me think I need to always fill the cache's rows with a full row read from the database.
But this by itself won't necessarily help me. If NodeA writes to x and loads the entire row in memory and then NodeB writes to y and reads the entire row in memory, if NodeB writes to the cache before NodeA then NodeB's changes will be overwritten! This makes me believe I need to always somehow version the rows both in the DB and in the cache. Is this the case? Memcached seems to have a compare and set primitive, but I see no such thing in Redis.
Even if 1. and 2. are not an issue, I still need to guarantee that my write / read has read-after-write consistency, otherwise it may happen that what I'm reading and intending to put in the cache is not necessarily the most up-to-date version. If that's the case, how can I make sure of this? By requiring w + r > n?
This seems to be a very common use-case, I'd guess it's pretty much a solved problem. What can I try to resolve this?
Key value stores as redis support advance data structures, such as HASHs.
If you're doing partial updates to cached entities (only a set of fields is updated as part of the super set), and given your goal is to avoid time-consuming database reads, simply save the table entry as a HASH K/V pairs (using HSET) and the use HGETALL for reading.
Redis OPS are atomic by nature, so that should solve your problems, if I got them right.
On a side note, if you're caching an entire entity yet doing partial updates, you should consider a simpler caching approach, such as read-through (making cache validity a reader-only concern).
As opposed to Database accesses. Redis cache access from different location unless somehow serialized, will always have the potential of being out of order when it comes to distributed systems, as there's always the execution environment (network, threading) to introduce possible delays.
Doing read-through caching will ensure data is always updated after the most recent write without the need to synchronize anything else.
This is how Facebook solved the issue with Memcached: http://nil.csail.mit.edu/6.824/2020/papers/memcache-faq.txt.
The idea is to use the concept of a lease: when a request for a cached value is received and there is no data for such key, a lease token (64 bits id) is returned.
When the webserver fetches the data from the database it can then store the data in the cache with that token. Every time an invalidation request is invoked on a key, a new lease token is created, and as such, if a put is attempted for an old token, the put ends up rejected.
As far as I understand, it's not really possible to (easily) replicate this behavior with Redis without resorting to LUA scripts.

Auto Syncing for Keys in Apache Geode

I have an Apache Geode setup, connected with external Postgres datasource. I have a scenario where I define an expiration time for a key. Let's say after T time the key is going to expire. Is there a way so that the keys which are going to expire can make a call to an external datasource and update the value incase the value has been changed? I want a kind of automatic syncing for my keys which are there in Apache Geode. Is there any interface which i can implement and get the desired behavior?
I am not sure I fully understand your question. Are you saying that the values in the cache may possibly be more recent than what is currently stored in the database?
Whether you are using Look-Aside Caching, Inline Caching, or even Near Caching, Apache Geode combined with Spring would take care of ensuring the cache and database are kept in sync, to some extent depending on the caching pattern.
With Look-Aside Caching, if used properly, the database (i.e. primary System of Record (SOR), e.g. Postgres in your case) should always be the most current. (Look-Aside) Caching is secondary.
With Synchronous Inline Caching (using a CacheLoader/CacheWriter combination for Read/Write-Through) and in particular, with emphasis on CacheWriter, during updates (e.g. Region.put(key, value) cache operations), the DB is written to first, before the entry is stored (or overwritten) in the cache. If the DB write fails, then the cache entry is not written or updated. This is true each time a value for a key is updated. If the key has not be updated recently, then the database should reflect the most recent value. Once again, the database should always be the most current.
With Asynchronous Inline Caching (using AEQ + Listener, for Write-Behind), the updates for a cache entry are queued and asynchronously written to the DB. If an entry is updated, then Geode can guarantee that the value is eventually written to the underlying DB regardless of whether the key expires at some time later or not. You can persist and replay the queue in case of system failures, conflate events, and so on. In this case, the cache and DB are eventually consistent and it is assumed that you are aware of this, and this is acceptable for your application use case.
Of course, all of these caching patterns and scenarios I described above assume nothing else is modifying the SOR/database. If another external application or process is also modifying the database, separate from your Geode-based application, then it would be possible for Geode to become out-of-sync with the database and you would need to take steps to identify this situation. This is rather an issue for reads, not writes. Of course, you further need to make sure that stale cache entries does not subsequently overwrite the database on an update. This is easy enough to handle with optimistic locking. You could even trigger a cache entry remove on an DB update failure to have the cache refreshed on read.
Anyway, all of this is to say, if you applied 1 of the caching patterns above correctly, the value in the cache should already be reflected in the DB (or will be in the Async, Write-Behind Caching UC), even if the entry eventually expires.
Make sense?

Sql query over Ignite CacheStore or over database

I am a beginner for Ignite, so I have some puzzles, one of which is as follows:when I try to query cache, whether it can look if memory contains or not. If not, then whether it will query database? If not,how to achieve such way?
Please help me if you know.Thx.
Queries work over in-memory data only. You can either use key access (operations like get(), getAll(), etc.) and utilize automatic read-through from the persistence store, or manually preload the data before running queries. For information on how effectively load large data set into the cache, see this page: https://apacheignite.readme.io/docs/data-loading

Redis full text search : reverse indexing or sunspot?

I have 3,5 millions records (readonly) actually stored in a MySQL DB that I would want to pull out to Redis for performance reasons. Actually, I've managed to store things like this into Redis :
1 {"type":"Country","slug":"albania","name_fr":"Albanie","name_en":"Albania"}
2 {"type":"Country","slug":"armenia","name_fr":"Arménie","name_en":"Armenia"}
...
The key I use here is the legacy MySQL id, so with some Ruby glue, I can break as less things as possible in this existing app (and this is a serious concern here).
Now the problem is when I need to perform a search on the keyword "Armenia", inside the value part. Seems like there's only two ways out :
Either I multiplicate Redis index :
id => JSON values (as shown above)
slug => id (reverse indexing based on the slug, that could do the basic search trick)
finally, another huge index specifically for autocomplete, as shown in this post : http://oldblog.antirez.com/post/autocomplete-with-redis.html
Either I use sunspot or some full text search engine (unfortunatly, I actually use ThinkingSphinx which is too much tied to MySQL :-(
So, what would you do ? Do you think the MySQL to Redis move of a single table is even a good idea ? I'm afraid of the Memory footprint those gigantic Redis key/values could take on a 16GB RAM Server.
Any feedback on a similar Redis usage ?
Before I start with a real answer, I wanted to mention that I don't see a good reason for you to be using Redis here. Based on what types of use cases it sounds like you're trying to do, it sounds like something like elasticsearch would be more appropriate for you.
That said, if you just want to be able to search for a few different fields within your JSON, you've got two options:
Auxiliary index that points field_key -> list_of_ids (in your case, "Armenia" -> 1).
Use Lua on top of Redis with JSON encoding and decoding to get at what you want. This is way more flexible and space efficient, but will be slower as your table grows.
Again, I don't think either is appropriate for you because it doesn't sound like Redis is going to be a good choice for you, but if you must, those should work.
Here's my take on Redis.
Basically I think of it as an in-memory cache that can be configured to only store the least recently used data (LRU). Which is the role I made it to play in my use case, the logic of which may be applicable to helping you think about your use case.
I'm currently using Redis to cache results for a search engine based on some complex queries (slow), backed by data in another DB (similar to your case). So Redis serves as a cache storage for answering queries. All queries either get served the data in Redis or the DB if it's a cache-miss in Redis. So, note that Redis is not replacing the DB, but merely being an extension via cache in my case.
This fit my specific use case, because the addition of Redis was supposed to assist future scalability. The idea is that repeated access of recent data (in my case, if a user does a repeated query) can be served by Redis, and take some load off of the DB.
Basically my Redis schema ended up looking somewhat like the duplication of your index you outlined above. I used sets and sortedSets to create "batches / sets" of redis-keys, each of which pointed to specific query results stored under a particular redis-key. And in the DB, I still had the complete data set and an index.
If your data set fits on RAM, you could do the "table dump" into Redis, and get rid of the need for MySQL. I could see this working, as long as you plan for persistent Redis storage and plan for the possible growth of your data, if this "table" will grow in the future.
So depending on your actual use case and how you see Redis fitting into your stack, and the load your DB serves, don't rule out the possibility of having to do both of the options you outlined above (which happend in my case).
Hope this helps!
Redis does provide Full Text Search with RediSearch.
Redisearch implements a search engine on top of Redis. This also enables more advanced features, like exact phrase matching, auto suggestions and numeric filtering for text queries, that are not possible or efficient with traditional Redis search approaches.