SQL query over Ignite CacheStore or over database - sql

I am a beginner with Ignite, so I have some questions, one of which is as follows: when I query the cache, does it only look at what is in memory? If the data is not in memory, will it query the database? If not, how can I achieve that?
Please help me if you know. Thanks.

Queries work over in-memory data only. You can either use key access (operations like get(), getAll(), etc.) and utilize automatic read-through from the persistence store, or manually preload the data before running queries. For information on how to effectively load a large data set into the cache, see this page: https://apacheignite.readme.io/docs/data-loading
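To illustrate both approaches, here is a rough sketch (the Person class, the PersonCacheStore implementation and the cache name are placeholders, not part of the original answer):

    CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("personCache");
    cfg.setReadThrough(true);                       // get()/getAll() fall through to the CacheStore
    cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(PersonCacheStore.class)); // your store impl
    cfg.setIndexedTypes(Long.class, Person.class);  // make Person visible to SQL

    Ignite ignite = Ignition.start();
    IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

    // Read-through: a cache miss triggers CacheStore.load() against the database.
    Person p = cache.get(42L);

    // SQL only sees in-memory data, so preload before querying:
    cache.loadCache(null);                          // invokes CacheStore.loadCache() to pull data in
    List<List<?>> rows = cache.query(new SqlFieldsQuery("SELECT name FROM Person")).getAll();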

Related

Apache Ignite: Cache API vs SQL

What should I use: cache.put(key, value) or cache.query("INSERT INTO Table ...")?
If you have properly configured queryable fields for your cache, you can use both ways to insert data into the cache:
The Key-Value API (cache.put()).
SqlFieldsQuery with an INSERT statement.
Also, if you would like to upload a large amount of data, you can use a Data Streamer, which automatically buffers the data and groups it into batches for better performance.
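To make the options concrete, a rough sketch (the Person class, its fields and the cache name are assumptions; it presumes the cache was created with queryable fields, e.g. setIndexedTypes(Long.class, Person.class)):

    // 1. Key-Value API:
    cache.put(1L, new Person("John", 30));

    // 2. SQL INSERT through SqlFieldsQuery (the implicit _key column carries the cache key):
    cache.query(new SqlFieldsQuery(
        "INSERT INTO Person (_key, name, age) VALUES (?, ?, ?)").setArgs(2L, "Jane", 28)).getAll();

    // 3. Bulk upload with a data streamer, which buffers entries and sends them in batches:
    try (IgniteDataStreamer<Long, Person> streamer = ignite.dataStreamer("Person")) {
        for (long i = 3; i < 1_000_000; i++)
            streamer.addData(i, new Person("p" + i, (int) (i % 100)));
    }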
Any. Or both.
One of the powers of Ignite is that it's truly multi-model - the same data is accessible via different interfaces. If you migrate a legacy app from an RDBMS, you'll use SQL. If you have something simple and don't care about the schema or queries, you'll use key-value.
In my experience, non-trivial systems based on Apache Ignite tend to use different kinds of access simultaneously. A perfectly normal example of an app:
Use key-value to insert the data from an upstream source
Use SQL to read and write data in batch processing and analytics
Use Compute with both SQL and key-value inside the tasks for colocated processing and fast analytics (a sketch follows below)
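As an illustration of that last item, a hedged sketch of colocated processing (the Person cache and key 1 are placeholders and assume data was inserted as in the snippet above):

    // Run the closure on the node that owns key 1, so the lookup and processing are purely local.
    ignite.compute().affinityRun("Person", 1L, () -> {
        Person p = Ignition.localIgnite().<Long, Person>cache("Person").localPeek(1L);
        System.out.println("Colocated processing of " + p);
    });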

JBoss Data Grid for a clustered enterprise application - what is the efficient way?

We have a clustered enterprise application deployed on JBoss EAP that uses JTA transactions and Hibernate for database operations.
To increase system performance, we are planning to use JBoss Data Grid. This is how I plan to use it:
Add/replace the object in the cache whenever it is inserted/updated in the database, using cache.put
Delete the object from the cache with cache.remove whenever it is deleted from the database
When retrieving, first try to get the data from the cache using a key or a query. If the data is not present, load it from the database.
However, I have the following questions about the data grid:
To query objects we use Hibernate Criteria, but the data grid uses its own query builder. Can we avoid writing separate queries for Hibernate and the data grid?
I want a list of objects matching a criterion to be returned. If one of the matching objects has been evicted from the cache, is it reloaded automatically from the database?
If the transaction is rolled back, is it rolled back in the data grid cache as well?
Are there any examples I can refer to for my data grid implementation?
Which is the better choice for my requirement: Infinispan as a second-level cache, or the data grid in library or remote mode?
Galder's comment is right: the best practice is to use Infinispan as the second-level cache provider. Trying to implement it on your own is very prone to timing issues (you'd have stale/non-updated entries in the cache).
Regarding queries: with 2LC query caching on, the cache keeps a map of 'SQL query' -> 'list of results'. However, once you update any type that's used in a query, all such queries are invalidated (e.g. if the query lists people with age > 60, updating a newborn still invalidates that query). Therefore this should be enabled only when queries prevail over updates.
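For what it's worth, this is roughly what the Hibernate side looks like; the entity, its age field and the property names are illustrative (exact 2LC settings depend on your Hibernate/EAP version), not something prescribed by the answer:

    // Entity cached in the second-level cache (Infinispan is the default 2LC provider on EAP):
    @Entity
    @Cacheable
    @org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
    public class Person { /* id, age, ... */ }

    // Existing Criteria queries can stay as they are; marking them cacheable puts their result
    // lists into the query cache (requires hibernate.cache.use_query_cache=true):
    List<Person> result = session.createCriteria(Person.class)
        .add(Restrictions.gt("age", 60))
        .setCacheable(true)
        .list();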
Infinispan has its own query support but this is not exposed when using it as 2LC provider. It is assumed that the cache will hold only a (most frequently accessed) subset of the entities in the database and therefore the results of such queries would not be correct.
If you want to go for Infinispan but keep the DB persistence, an option might be using JPA cache store (and indexing). Note though that updates to DB that don't go through Infinispan would not be reflected in the cache, and the indexing may lag a bit (since it's asynchronous). You can split your dataset and use JPA for one part and Infinispan + JPA cache store for the other, too.
A third option is using Hibernate Search, which keeps the data in database but index is in Lucene (possibly stored in Infinispan caches, too) and you don't use the Criteria API but Hibernate Search API.

Apache Ignite with Kudu

I am trying to position Ignite as a query grid for databases such as Kudu, HBase, etc. Thus, all data silos would be queried through Ignite with read/write-through. How is this possible? Are there any integrations with them?
The first time a SQL query runs, it will need to pull the data from these databases and create the key/value entries in Ignite.
Then, if one, two, or three nodes go down, the data stored in memory will eventually be lost. How is recovery done, or is it not possible?
Thanks
CK
Ignite SQL cannot load specific data from an external store by query; that is only possible with API get()/getAll() operations. To be able to query the data, you need to load it into Ignite first, for example with loadCache(). Internally this function runs a query against the target database and transforms the response into key-value form.
BTW, if you enable native persistence in Ignite, it will know the structure of the data and will be able to query it even if not all entries are loaded into memory.
For node crashes, data replication between nodes is traditionally used; in Ignite these replicas are called backups. If you lose more nodes than the number of backups configured, you'll need to preload the data from the store again.
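A minimal configuration sketch for those last two points (the cache name and a single backup are just example values):

    IgniteConfiguration cfg = new IgniteConfiguration();

    // Keep one extra copy of every partition so a single node crash loses no data.
    cfg.setCacheConfiguration(new CacheConfiguration<>("personCache").setBackups(1));

    // Optional: enable native persistence so data and SQL indexes survive restarts and
    // queries can reach entries that are not currently in memory.
    DataStorageConfiguration storage = new DataStorageConfiguration();
    storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
    cfg.setDataStorageConfiguration(storage);

    Ignite ignite = Ignition.start(cfg);
    ignite.cluster().active(true);   // with persistence enabled the cluster must be activated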

Caching temporary data - PostgreSQL and Mongo

I have some data from an API I need to cache. This data I want invalidated after X days, but I want it available locally to save time querying and compiling things for the end user.
Presently I have a PostgreSQL database. I want to keep this around because there's permanent data like user records I don't want to put in Mongo (unless you guys can convince me otherwise). I really have nothing against Mongo, but I can normalize some things with users and the only way I could think to do it without massive amounts of duplication is via PostgreSQL.
Now my API data is flat, and in JSON. I don't need to create any sort of link to any other table, and it has a field that I can use as a key pretty easily. My idea is to literally "throw" the data into a Mongo instance and query as needed, invalidating every X days. This also offers some persistence should the server go down for whatever reason.
So my questions are these: Is this a good use case for Mongo over memcached? Should I just put the raw data in memcached instead? If you do suggest Mongo, should I move my users table and the relations over to Mongo as well?
Thanks!
This is the sort of thing Redis is really good for. Redis, possibly with selective cache invalidation via PostgreSQL's LISTEN and NOTIFY, is a pretty low pain way to manage caching.
Another option is to use UNLOGGED tables in PostgreSQL.
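If you go the Redis route, here is a rough sketch of LISTEN/NOTIFY-driven invalidation (the channel name, key prefix and connection details are made up, and it assumes a PostgreSQL trigger issues NOTIFY cache_invalidation, '<key>' when a row changes):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;
    import org.postgresql.PGNotification;
    import redis.clients.jedis.Jedis;

    public class CacheInvalidator {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app", "app", "secret");
                 Jedis redis = new Jedis("localhost", 6379);
                 Statement st = conn.createStatement()) {

                st.execute("LISTEN cache_invalidation");            // subscribe to the channel
                PGConnection pg = conn.unwrap(PGConnection.class);

                while (true) {
                    PGNotification[] notes = pg.getNotifications(); // non-blocking poll
                    if (notes == null) { Thread.sleep(500); continue; }
                    for (PGNotification n : notes)
                        redis.del("api:" + n.getParameter());       // drop the stale entry; it gets
                                                                    // rebuilt on the next read
                }
            }
        }
    }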

Redis full-text search: reverse indexing or Sunspot?

I have 3.5 million read-only records currently stored in a MySQL DB that I would like to pull out to Redis for performance reasons. So far, I've managed to store things like this in Redis:
1 {"type":"Country","slug":"albania","name_fr":"Albanie","name_en":"Albania"}
2 {"type":"Country","slug":"armenia","name_fr":"Arménie","name_en":"Armenia"}
...
The key I use here is the legacy MySQL id, so with some Ruby glue, I can break as few things as possible in this existing app (and this is a serious concern here).
Now the problem is when I need to search for the keyword "Armenia" inside the value part. It seems like there are only two ways out:
Either I multiply the Redis indexes:
id => JSON values (as shown above)
slug => id (reverse indexing based on the slug, that could do the basic search trick)
finally, another huge index specifically for autocomplete, as shown in this post : http://oldblog.antirez.com/post/autocomplete-with-redis.html
Or I use Sunspot or some full-text search engine (unfortunately, I currently use Thinking Sphinx, which is too tied to MySQL) :-(
So, what would you do? Do you think moving a single table from MySQL to Redis is even a good idea? I'm afraid of the memory footprint those gigantic Redis key/values could take on a 16 GB RAM server.
Any feedback on similar Redis usage?
Before I start with a real answer, I wanted to mention that I don't see a good reason for you to be using Redis here. Based on the kinds of use cases you seem to be aiming for, something like Elasticsearch would be more appropriate.
That said, if you just want to be able to search for a few different fields within your JSON, you've got two options:
An auxiliary index that points field_key -> list_of_ids (in your case, "Armenia" -> 2; a sketch follows below).
Use Lua on top of Redis with JSON encoding and decoding to get at what you want. This is way more flexible and space-efficient, but will be slower as your table grows.
Again, I don't think either is appropriate, because it doesn't sound like Redis is going to be a good choice for you, but if you must, those should work.
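A rough sketch of the first option with Jedis (the country:/idx: key naming is my own, not something from the question):

    try (Jedis jedis = new Jedis("localhost", 6379)) {
        // Primary data: legacy MySQL id -> JSON blob, as in the question.
        jedis.set("country:2", "{\"type\":\"Country\",\"slug\":\"armenia\",\"name_en\":\"Armenia\"}");

        // Auxiliary index: searchable term -> set of matching ids.
        jedis.sadd("idx:name_en:armenia", "2");

        // Lookup: resolve the term to ids, then fetch the JSON values in one round trip.
        Set<String> ids = jedis.smembers("idx:name_en:armenia");
        List<String> records = jedis.mget(ids.stream().map(id -> "country:" + id).toArray(String[]::new));
    }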
Here's my take on Redis.
Basically I think of it as an in-memory cache that can be configured to evict the least recently used data (LRU). That is the role I made it play in my use case, and the logic may help you think about yours.
I'm currently using Redis to cache results for a search engine based on some complex (slow) queries, backed by data in another DB (similar to your case). So Redis serves as cache storage for answering queries. All queries are served either from Redis or, on a cache miss, from the DB. So note that Redis is not replacing the DB, but merely extending it as a cache in my case.
This fit my specific use case because the addition of Redis was meant to assist future scalability. The idea is that repeated access to recent data (in my case, a user repeating a query) can be served by Redis and take some load off the DB.
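In code, that cache-aside flow looks roughly like this (the key scheme, the TTL and the runExpensiveDbQuery() stand-in are made up):

    String fetch(Jedis jedis, String queryHash) {
        String cacheKey = "search:" + queryHash;
        String cached = jedis.get(cacheKey);
        if (cached != null)
            return cached;                               // cache hit: Redis answers the query

        String result = runExpensiveDbQuery(queryHash);  // cache miss: fall back to the slow DB query
        jedis.setex(cacheKey, 3600, result);             // expire after an hour; with
                                                         // maxmemory-policy allkeys-lru Redis also
                                                         // evicts least-recently-used keys under pressure
        return result;
    }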
Basically my Redis schema ended up looking somewhat like the duplicated index you outlined above. I used sets and sorted sets to create "batches / sets" of Redis keys, each of which pointed to specific query results stored under a particular Redis key. And in the DB, I still had the complete data set and an index.
If your data set fits in RAM, you could do the "table dump" into Redis and get rid of the need for MySQL. I could see this working, as long as you plan for persistent Redis storage and for the possible growth of your data, if this "table" will grow in the future.
So depending on your actual use case, how you see Redis fits into your stack, and the load your DB serves, don't rule out the possibility of having to do both of the options you outlined above (which happened in my case).
Hope this helps!
Redis does provide full-text search via RediSearch.
RediSearch implements a search engine on top of Redis. It also enables more advanced features, such as exact phrase matching, auto-suggestions, and numeric filtering for text queries, which are not possible or efficient with traditional Redis search approaches.