JBoss TreeCache vs PojoCache when using invalidation rather than replication

We are setting up a JBoss cluster and are building our own distributed cache solution on top of JBoss Cache (we can't use it as a 2nd-level cache for the ORM layer in our case). We want to use invalidation rather than replication as the cache mode. As far as I can see after (very) little testing, both solutions seem to work: objects are put into the cache, and objects seem to be evicted when they are updated on any of the servers.
This leads me to believe that PojoCache with AOP instrumentation is only needed when using replication, so that you can replicate only updated field values and not whole objects. Am I correct here, or are there other advantages to using PojoCache over TreeCache in our scenario? And if PojoCache does have advantages, do we still need AOP instrumentation and to annotate our entities with @PojoCacheable (yes, we are using JBoss Cache 1.4.1), given that we are not using replication?
Regards
Jonas Heineson

PojoCache has the ability, through AOP, to:
- replicate only changed fields rather than whole objects. This makes a difference if, for example, your person object contains a huge image of the person and you only change the password.
- detect changes and thus automatically put them on the list to be replicated.
Plain TreeCache does not need AOP, but consequently cannot replicate individual fields or detect what has changed, so you need to trigger replication yourself.
If you don't replicate, those points are probably irrelevant.
IIRC, you don't need the @PojoCacheable annotation for PojoCache - without it, you need to specify the classes to be enhanced in a different way.
I have the feeling that if you are not replicating, the plain TreeCache will be enough.


Cloudflare Durable Objects: update object value

Hello! I've recently been diving into Cloudflare Workers, especially Durable Objects. I was able to make a simple request that puts a JS object under an assigned key. Let's say the key is key0 and the stored value is {"fieldA": "val0", "fieldB": "val1"}. In this case, how can I update the value of fieldA without removing fieldB? I've tried simply executing put("key0", {"fieldA": "newVal0"}), but it keeps removing {"fieldB": "val1"}.
Of course that is normal behaviour for JS objects, but I can't find anything like ["key0"]["fieldA"] = "newVal0" in the docs (maybe I'm missing something).
Hope this question reaches the gurus in the community! Thanks in advance.
EDIT after the answers:
In theory, it would be wonderful if Durable Objects supported working just like a normal JS object. Such a Workers feature feels like it could be a killer app among cloud DB services, since the average CPU time is quite fast and Cloudflare also has very low pricing compared to the bigger providers. If that happens, I would be eager to migrate everything to the Cloudflare platform.
Durable Objects' KV storage only supports get and put operations -- it doesn't have any sort of "update". So you have two options:
1. get() the key, modify the value, and write the modified version back with put(). This may sound inefficient, but keep in mind that commonly accessed keys will likely be in the in-memory cache. In fact, this get/modify/put implemented in your JavaScript is probably about as fast as any modification operation that Durable Objects itself could implement built-in. That said, you probably don't want to use this approach with large objects, since the whole object has to be written to disk again after every update.
2. Split your object across multiple keys. E.g. instead of having the key foo map to {"fieldA": "val0", "fieldB": "val1"}, you could have separate keys foo:fieldA and foo:fieldB. Note that you can fetch all the keys at once using storage.list({prefix: "foo:"}). This approach is not as convenient, but it allows each field to be written to disk separately. Both options are sketched below.
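A rough TypeScript sketch of both options, assuming the Workers runtime storage types (DurableObjectStorage from @cloudflare/workers-types); the helper names and the key0 keys are illustrations only, not part of the Durable Objects API:

```ts
// Option 1: read-modify-write the whole stored object under one key.
async function updateField(
  storage: DurableObjectStorage,
  field: string,
  value: string
): Promise<void> {
  // get() the current object (or start from an empty one), change one field,
  // then put() the entire object back.
  const current = (await storage.get<Record<string, string>>("key0")) ?? {};
  current[field] = value;
  await storage.put("key0", current);
}

// Option 2: one storage key per field, e.g. "key0:fieldA", "key0:fieldB".
async function putField(
  storage: DurableObjectStorage,
  field: string,
  value: string
): Promise<void> {
  await storage.put(`key0:${field}`, value); // only this field's key is rewritten
}

async function getAllFields(
  storage: DurableObjectStorage
): Promise<Record<string, string>> {
  // list() with a prefix returns a Map of all the per-field keys at once.
  const entries = await storage.list<string>({ prefix: "key0:" });
  const result: Record<string, string> = {};
  for (const [key, value] of entries) {
    result[key.slice("key0:".length)] = value;
  }
  return result;
}
```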
get and put deal with whole JS objects, so if you want to change part of the object you should get it, update it using normal JS, and then put the entire object back.

Is it okay to have more than one repository for an aggregate in DDD?

I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need any complicated mapper that checked if we had the data in the cache and if not fell back to the API. That was my first design, but I realized that wasn't very practical when the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation used for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation.
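As a minimal sketch of that single cached-repository idea (TypeScript purely for illustration; every name below -- CustomerRepository, MySqlClient, and so on -- is hypothetical, not taken from the question):

```ts
// Hypothetical domain entity and a minimal stand-in for the MySQL client.
interface Customer {
  id: string;
  name: string;
}

interface MySqlClient {
  queryOne(sql: string, params: unknown[]): Promise<Record<string, unknown> | null>;
}

// The domain only ever sees this interface.
interface CustomerRepository {
  // Resolving to null means "data not available (yet)" rather than
  // falling back to the slow, limited API.
  findById(id: string): Promise<Customer | null>;
}

// The single implementation the cached read use cases need.
class CachedCustomerRepository implements CustomerRepository {
  constructor(private readonly db: MySqlClient) {}

  async findById(id: string): Promise<Customer | null> {
    const row = await this.db.queryOne(
      "SELECT id, name FROM customers WHERE id = ?",
      [id]
    );
    return row ? { id: String(row.id), name: String(row.name) } : null;
  }
}
```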
If you are using an external API to read and modify data, you can cache it locally to make reads faster, but I would avoid having a domain repository for it.
From the domain perspective, it seems you need a service to query for some data (or just a Query in a CQRS implementation); that service can internally call the remote API or read from a local cache (MySQL, whatever).
When you read from your local cache you can build a repository to decouple your logic from the DB implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation and has nothing to do with your domain.
If the remote service starts offering the queries you need, you would change how the query is executed, calling the remote API instead of the DB, but your domain model should not change.
A domain repository is used to load and persist your aggregates, whereas if you are working with external aggregates (in a different bounded context or subdomain) you interact with them through services.
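A short sketch of that service idea (TypeScript, hypothetical names): the domain depends on a query abstraction, and whether it is answered by the MySQL cache or by the remote API stays an implementation detail.

```ts
// Hypothetical read model returned to the application layer.
interface CustomerSummary {
  id: string;
  region: string;
}

// The query abstraction the rest of the code depends on.
interface CustomerQueries {
  searchByRegion(region: string): Promise<CustomerSummary[]>;
}

// Today: answered from the local MySQL cache.
class MySqlCustomerQueries implements CustomerQueries {
  async searchByRegion(region: string): Promise<CustomerSummary[]> {
    // ...run a granular SQL query against the cache tables...
    return [];
  }
}

// Later, if the remote API grows the needed query, swap this in;
// the domain model itself does not change.
class RemoteApiCustomerQueries implements CustomerQueries {
  async searchByRegion(region: string): Promise<CustomerSummary[]> {
    // ...call the remote API...
    return [];
  }
}
```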

What's the Point of Multiple Redis Databases?

So, I've come to a place where I want to segment the data I store in Redis into separate databases, as I sometimes need to use the KEYS command on one specific kind of data and want to separate it to make that faster.
If I segment into multiple databases, everything is still single threaded, and I still only get to use one core. If I just launch another instance of Redis on the same box, I get to use an extra core. On top of that, I can't name Redis databases, or give them any sort of more logical identifier. So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want? And relatedly, why doesn't Redis try to utilize an extra core for each extra database I add? What's the advantage of being single threaded across databases?
You don't want to use multiple databases in a single Redis instance. As you noted, multiple instances let you take advantage of multiple cores. If you use database selection you will have to refactor when upgrading. Monitoring and managing multiple instances is neither difficult nor painful.
Indeed, you would get far better metrics on each db by segregation based on instance. Each instance would have stats reflecting that segment of data, which can allow for better tuning and more responsive and accurate monitoring. Use a recent version and separate your data by instance.
As Jonaton said, don't use the KEYS command. You'll find far better performance if you simply create a key index: whenever you add a key, add the key name to a set. The KEYS command is not terribly useful once you scale up, since it takes significant time to return.
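A minimal sketch of that key-index idea, using the ioredis client in TypeScript (the session:* and idx:sessions names are assumptions for illustration):

```ts
import Redis from "ioredis";

const redis = new Redis();

// Record every key of this kind in a set, so enumerating them never needs KEYS.
async function addSession(id: string, data: string): Promise<void> {
  const key = `session:${id}`;
  await redis.set(key, data);
  await redis.sadd("idx:sessions", key); // index the key
}

async function removeSession(id: string): Promise<void> {
  const key = `session:${id}`;
  await redis.del(key);
  await redis.srem("idx:sessions", key); // keep the index in sync
}

async function listSessionKeys(): Promise<string[]> {
  // One O(set size) read instead of a server-wide KEYS scan.
  return redis.smembers("idx:sessions");
}
```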
Let the access pattern determine how to structure your data, rather than storing it the way you think works and then figuring out how to access and munge it later. You will see far better performance, and you'll often find that the data-consuming code is much cleaner and simpler.
Regarding single threaded, consider that redis is designed for speed and atomicity. Sure actions modifying data in one db need not wait on another db, but what if that action is saving to the dump file, or processing transactions on slaves? At that point you start getting into the weeds of concurrency programming.
By using multiple instances you turn multi threading complexity into a simpler message passing style system.
In principle, Redis databases on the same instance are no different than schemas in RDBMS database instances.
So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want?
There's one clear advantage of using redis databases in the same redis instance, and that's management. If you spin up a separate instance for each application, and let's say you've got 3 apps, that's 3 separate redis instances, each of which will likely need a slave for HA in production, so that's 6 total instances. From a management standpoint, this gets messy real quick because you need to monitor all of them, do upgrades/patches, etc. If you don't plan on overloading redis with high I/O, a single instance with a slave is simpler and easier to manage provided it meets your SLA.
Even Salvatore Sanfilippo (creator of Redis) thinks it's a bad idea to use multiple DBs in Redis. See his comment here:
https://groups.google.com/d/topic/redis-db/vS5wX8X4Cjg/discussion
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not. With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
I don't really know any benefits of having multiple databases on a single instance. I guess it's useful if multiple services use the same database server(s), so you can avoid key collisions.
I would not recommend building around using the KEYS command, since it's O(n) and that doesn't scale well. What are you using it for that you can accomplish in another way? Maybe redis isn't the best match for you if functionality like KEYS is vital.
I think they mention the benefits of a single-threaded server in their FAQ, but the main thing is simplicity -- you don't have to bother with concurrency in any real way. Every action is blocking, so no two things can alter the database at the same time. Ideally you would have one (or more) instances per core on each server, and use a consistent hashing algorithm (or a proxy) to divide the keys among them. Of course, you'll lose some functionality -- piping will only work for things on the same server, sorts become harder, etc.
Redis databases can be used in the rare cases of deploying a new version of the application, where the new version requires working with different entities.
I know this question is years old, but there's another reason multiple databases may be useful.
If you use a "cloud Redis" from your favourite cloud provider, you probably have a minimum memory size and will pay for what you allocate. If however your dataset is smaller than that, then you'll be wasting a bit of the allocation, and so wasting a bit of money.
Using databases you could use the same Redis cloud-instance to provide service for (say) dev, UAT and production, or multiple instances of your application, or whatever else - thus using more of the allocated memory and so being a little more cost-effective.
A use-case I'm looking at has several instances of an application which use 200-300K each, yet the minimum allocation on my cloud provider is 1M. We can consolidate 10 instances onto a single Redis without really making a dent in any limits, and so save about 90% of the Redis hosting cost. I appreciate there are limitations and issues with this approach, but thought it worth mentioning.
I am using Redis to implement a blacklist of email addresses, and I have different TTL values for different levels of blacklisting, so having different DBs on the same instance helps me a lot.
Using multiple databases in a single instance may be useful in the following scenario:
Different copies of the same database can be used for production, development, or testing with real-time data. People may use replication to clone a Redis instance to achieve the same purpose. However, the former approach makes it easier for existing, running programs to simply select the right database to switch to the intended mode.
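A small sketch of that select-the-right-database idea, using ioredis in TypeScript (the host name and the environment-to-database mapping are assumptions for illustration):

```ts
import Redis from "ioredis";

// One shared Redis server, one numbered database per environment.
const DB_BY_ENV: Record<string, number> = {
  production: 0,
  development: 1,
  testing: 2,
};

function redisForEnv(env: string) {
  return new Redis({
    host: "redis.example.internal", // hypothetical shared instance
    db: DB_BY_ENV[env] ?? 1,        // ioredis issues SELECT on connect
  });
}

// An existing program only has to pick the right database to switch modes.
const redis = redisForEnv(process.env.APP_ENV ?? "development");
```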
Our motivation has not been mentioned above. We use multiple databases because we routinely need to delete a large set of a certain type of data, and FLUSHDB makes that easy. For example, we can clear all cached web pages, using FLUSHDB on database 0, without affecting all of our other use of Redis.
There is some discussion here but I have not found definitive information about the performance of this vs scan and delete:
https://github.com/StackExchange/StackExchange.Redis/issues/873
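For comparison, a small ioredis/TypeScript sketch of the two approaches discussed above (the database numbers and the page:* pattern are assumptions): FLUSHDB on a dedicated database versus SCAN-and-DEL on a shared one.

```ts
import Redis from "ioredis";

// Cached web pages live alone in database 0, so one FLUSHDB clears just them.
async function clearPageCacheDb(): Promise<void> {
  const pageCache = new Redis({ db: 0 });
  await pageCache.flushdb();
  pageCache.disconnect();
}

// The alternative when pages share a database with other data:
// iterate with SCAN and delete matching keys in batches.
async function clearPagesFromSharedDb(): Promise<void> {
  const shared = new Redis({ db: 1 });
  const stream = shared.scanStream({ match: "page:*", count: 500 });
  for await (const keys of stream as AsyncIterable<string[]>) {
    if (keys.length > 0) {
      await shared.del(...keys);
    }
  }
  shared.disconnect();
}
```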

Infinispan keySet() not suitable for production

I decided to use the Infinispan distributed grid to extend my application to support clustering, but I encountered a limitation when using this kind of shared resource.
How can I retrieve all the values or keys in the distributed cache? I'm asking because, in the documentation, the collection methods (keySet() included) are not recommended for use in production.
Right now I have a local bucket/cache with the key/value pairs, but in order to process the values I need to retrieve the keys and iterate through the set.
Set set = cache.keySet();
With a large number of entries in the local cache, keySet() returns a copy, and this is a heavy load on memory.
I tried the query feature, but it involves network calls just to find the values, which I don't need, and it does not support complex filters.
Do you know which is the best approach when using infinispan in production?
As this is an experimental phase, I'm using the latest Infinispan version.
Thanks a lot.
Map/Reduce functionality allows you to iterate over all the stored entries and also moves the logic to where the data is, so it doesn't add much of a burden.
We are using keySet() in production for informational purposes only. Performance does not seem to be a big issue under low data loads, but of course you should use such methods with great care because they can have a large performance impact, depending on how you are using the cache. Remote cache queries seem like a pretty handy feature to me.

Is there any reason I shouldn't cache in nHibernate?

I've just discovered the joy of Cache.ReadWrite() in Fluent NHibernate and have been analyzing the results extensively with NHProf.
It seems to be quite useful, but that seems a bit deceptive. Is there any particular reason I wouldn't want to cache a very frequently used object from a query? I mean, I have to presume I should not just go around decorating every single Mapping with a Cache property ... or should I?
As usual, it depends :)
If something has the potential to be updated by background processes that don't use the second-level cache, or to be changed directly in the database, caching will cause problems.
Entities that are infrequently accessed may not be good candidates for second level caching either, as they will just take up space.
Also, you may see some weirdness if you have collections mapped as Inverse - the changes will not be picked up by the second level cache correctly and you'll need to manually evict the collection.
As sJhonny points out below, if you have a web farm scenario (or anywhere your app runs on several servers) you'll need to use a distributed cache (like memcached) instead of the built-in ASP.NET cache.