jboss data grid for clustered enterprise application - what is the efficient way

jboss data grid for clustered enterprise application - what is the efficient way - infinispan

we are having a clustered enterprise application using JTA transaction and hibernate for database operations deployed on JBoss EAP.
To increase system performance we are planning to use Jboss data grid. This is how I plan to use jboss data grid:
I am adding/replacing the object is cache whenever its inserted/updated in database using cache.put
when object is deleted from database its deleted from cache using cache.remove
while retrieving, first try to get the data from cache using key or query. If data is not present, load the data from database.
However, I have below questions on data grid:
To query objects we are using hibernate criteria however data grid uses its own query builder. Can we avoid writing separate query for hibernate and datagrid?
I want a list of objects to be returned matching a criteria. If one of the objects matching the criteria is evicted from cache, is it reloaded automatically from database?
If the transactions is rolled back is it rolled back from data grid cache as well
Are there any examples which I can refer for my implementation of data grid?
which is better choice for my requirement infinispan as 2nd level cache or data grid in library or remote mode?

Galder's comment is right, the best practice is using Infinispan as the second-level cache provider. Trying to implement it on your own is very prone to timing issues (you'd have stale/non-updated entries in the cache).
Regarding queries: With 2LC query caching on the cache keeps a map of 'sql query' -> 'list of results'. However once you update any type that's used in a query, all such queries are invalidated (e.g. if the query lists people with age > 60, updating a newborn still invalidates that query). Therefore this should be on only when the queries prevail over updates.
Infinispan has its own query support but this is not exposed when using it as 2LC provider. It is assumed that the cache will hold only a (most frequently accessed) subset of the entities in the database and therefore the results of such queries would not be correct.
If you want to go for Infinispan but keep the DB persistence, an option might be using JPA cache store (and indexing). Note though that updates to DB that don't go through Infinispan would not be reflected in the cache, and the indexing may lag a bit (since it's asynchronous). You can split your dataset and use JPA for one part and Infinispan + JPA cache store for the other, too.
A third option is using Hibernate Search, which keeps the data in database but index is in Lucene (possibly stored in Infinispan caches, too) and you don't use the Criteria API but Hibernate Search API.

Related

Cons of using MemoryCache as a temporary copy of DB table

I have a site where you can list your car for sale. There is a list and a map with filtering on car types and other car specifications. My idea was to cache cars table and use that to filter on when user is searching for a car on the website. Currently, especially when zooming in/out on the map, each time user does that, http request is made and it's querying the database, and that can be slow and heavy on the server.
As an experiment with 1 000 items, I have cached map data (trimmed data with only basic info) and it's working fine. I was thinking of doing a basically copy of cars table instead with all needed joins added in Memory Cache and use that instead of querying the DB every request for both list and the map. I would have Cron Job every 5 minutes (as data can change, but it doesn't have to be immediate) to update Memory Cache with latest cars data from DB.
What would be the cons of using this approach in long term and for using it for example storing 100 000 records? Beside server needing more RAM, would there be any concerns about scalability or usability of this approach? Would it be better to use Redis instead?
I do have in place now "search as you type" service, but I don't really need that functionality as filtering is pretty exact, I have added it more as a caching server but I think I would be better off just using Memory Cache until a real need for that kind of service is required.
Thank you

Since memory isn’t infinite, we need to limit the number of items stored in the In-Memory cache.
MemoryCache VS Redis
MemoryCache
MemoryCache is embedded in the process , hence can only be used as a plain key-value store from that process.
Redis
Redis is a remote data structure server. It is certainly slower than just storing the data in local memory.
I conclude that MemoryCache is running in the web server of the current application, and it is limited by the performance of the web server. Of course, it will be very fast under the same configuration. I think the disadvantage is that the stored data cannot be shared with other applications.
If redis is used, reading data directly from memory is not as fast as memorycache, but it has high reliability and high scalability.
Related Post:
1. How to update redis after updating database?
2. how to keep caching up to date
3. How can MySQL update data in real time in redis cache?

Apache Ignite with Kudu

I am trying to position Ignite as Query Grid for databases such as Kudu, Hbase, etc.. Thus, all data silos will be queried over Ignite with read/write through. How this is possible? Are there any integrations with them?
The first time, SQL query runs, it will need to pull the data from such databases and create the key/value on Ignite.
Then, if one/two/three node goes down, eventually the data stored in memory will be lost. How the recovery is done or not possible?
Thanks
CK

Ignite SQL is unable to load specific data by query from external store, it's only possible on API get()/getAll() operations. To be able querying data you need load them into Ignite at first, for example, with loadCache(). Internally this function does a query to target database and transforms response into key-value manner.
BTW, if you enable persistence in Ignite, it will know the structure of data and will be able to query them, even if not all entries loaded into memory.
In case of node crash traditionally used data replication between nodes. In Ignite it's named backups. If you loose more nodes than backups set, then you'll need to preload data from store again.

Sql query over Ignite CacheStore or over database

I am a beginner for Ignite, so I have some puzzles, one of which is as follows:when I try to query cache, whether it can look if memory contains or not. If not, then whether it will query database? If not,how to achieve such way?
Please help me if you know.Thx.

Queries work over in-memory data only. You can either use key access (operations like get(), getAll(), etc.) and utilize automatic read-through from the persistence store, or manually preload the data before running queries. For information on how effectively load large data set into the cache, see this page: https://apacheignite.readme.io/docs/data-loading

Is Nhibernate Shards production ready?

At the company I work we have a single database schema but with each of our clients using their own dedicated database, with one central database that stores client contact details and what database the client is using so we can connect to the appropriate database. I've looked at using NHibernate Shards but it seems to have gone very quiet and doesn't look complete.
Does anyone know the status of this project? Has anyone used it in production?
If it's not yet at a point that is considered usable in production, what are the alternatives? The two main ones seem to be:
Create a session factory per database and then a wrapper to select the appropriate factory to generate the correct session - this seems to me to have redundant session factories and not too efficient
Create just one session factory but when calling opensession pass it an IDbConnection - which would allow the session to have a different database connection.
My concern with 2 is how will NHibernate cope with a 2nd level cache as I believe it is controlled by the session factory - also the HiLo generator uses the session factory I believe. In these cases will having sessions attach to different dbs cause problems? For example we will end up with a MyCompany.Model.User class that has an id of 2 in both databases will this cause conflicts within the cache?

You could have a look at Enzo SQL Shard a sharding library for SQL Server. If you are already using NHibernate there might be a few changes required in the code though

NHibernate Shards is up-to-date with the latest NHibernate API changes and now supports all query models of NHibrrnate, including Linq. Complex scalar queries are currently not supported.
We use it in production for a multi-tenant environment, but there are a few things to be mindful of.
NHibernate Shards has a session factory for each shard, but only uses a single NHibernate Configuration instance to generate the session factories. This approach likely won't scale well to large numbers of shards.
Querying across shards does not play well with paging. It works, but can involve considerable client-side processing. It is best to keep result sets as small as possible and lock queries to single shards where feasible.

My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.
Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.
Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.
In my ideal fantasy world, my services would not have to change at all:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
// execute HQL/criteria call and have it automatically use the cache where possible
}
There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.
However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:
Configure it to always have the last 7 days worth of data in the cache. eg. "For this table, cache all records where this field is between 7 days ago and now."
Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all. eg. If the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.
Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
Assuming #3 is asking too much, I would require some logic in my services like this:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
if (CanBeServicedFromCache(starting, ending, filter))
{
// execute some LINQ to object code or whatever to determine the aggregation results
}
else
{
// execute HQL/criteria call to determine the aggregation results
}
}
This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.
I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...
Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?
Thanks
PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.

Stop using your transactional ( OLTP ) datasource for analytical ( OLAP ) queries and the problem goes away.
When a domain significant event occurs (eg a new entity enters the system or is updated), fire an event ( a la domain events ). Wire up a handler for the event that takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed to allow reporting of the aggregates you desire ( most likely push the data into a star schema ). Now your reporting is simply the querying of aggregates ( which may even be precalculated ) along predefined axes requiring nothing more than a simple select and a few joins. Querying can be carried out using something like L2SQL or even simple parameterised queries and datareaders.
Performance gains should be significant as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and reduced index load on write.
Additional performance and scalability is also gained as once you have migrated to this approach, you can then physically separate your read and write stores such that you can run n read stores for every write store thereby allowing your solution to scale out to meet increased read demands while write demands increase at a lower rate.

Define 2 cache regions "aggregation" and "aggregation.today" with a large expiry time. Use these for your aggregation queries for previous days and today respectively.
In DoIt(), make 1 NH query per day in the requested range using cacheable queries. Combine the query results in C#.
Prime the cache with a background process which calls DoIt() periodically with the date range that you need to be cached. The frequency of this process must be lower than the expiry time of the aggregation cache regions.
When today's data changes, clear cache region "aggregation.today". If you want to reload this cache region quickly, either do so immediately or have another more frequent background process which calls DoIt() for today.
When you have query caching enabled, NHibernate will pull the results from cache if possible. This is based on the query and parameters values.

When analyzing the NHibernate cache details i remember reading something that you should not relay on the cache being there, witch seems a good suggestion.
Instead of trying to make your O/R Mapper cover your applications needs i think rolling your own data/cache management strategy might be more reasonable.
Also the 7 days caching rule you talk about sounds like something business related, witch is something the O/R mapper should not know about.
In conclusion make your app work without any caching at all, than use a profiler (or more - .net,sql,nhibernate profiler ) to see where the bottlenecks are and start improving the "red" parts by eventually adding caching or any other optimizations.
PS: about caching in general - in my experience one caching point is fine, two caches is in the gray zone and you should have a strong reason for the separation and more than two is asking for trouble.
hope it helps

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas