Ignite query on local node potential issue? - ignite

New to Ignite, I have a use case: I need to run a cleanup job. Ignite is embedded in our Spring Boot application, which runs as multiple instances. I am thinking of having the job run on each instance and then just query the local data and clean up those entries. Do you see any issue with this? I am not sure how often Ignite reshuffles data.
Thanks
Shannon

You can surely do that.
With regards to data reshuffling, it will only happen when a node is added to or removed from the cluster. However, the ignite.compute().affinityRun() family of calls guarantees that the code is run near the data.
Otherwise, you could do ignite.compute().broadcast() and iterate only over each affected cache's local entries. You don't have the aforementioned guarantee then, though.
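For illustration, here is a minimal sketch of the broadcast-plus-local-entries approach; the cache name, value type and cleanup condition are invented for the example, not taken from the question:

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class LocalCleanupJob implements IgniteRunnable {
    // Injected with the local Ignite instance on whichever node the closure runs.
    @IgniteInstanceResource
    private transient Ignite ignite;

    @Override public void run() {
        IgniteCache<Long, String> cache = ignite.cache("myCache");

        // Iterate only over entries this node is primary for, so each entry
        // is visited exactly once across the cluster even with backups configured.
        for (Cache.Entry<Long, String> e : cache.localEntries(CachePeekMode.PRIMARY)) {
            if (shouldRemove(e.getValue()))
                cache.remove(e.getKey());
        }
    }

    private boolean shouldRemove(String value) {
        return value == null || value.isEmpty(); // placeholder cleanup rule
    }
}

// Trigger it on every server node, e.g. from a scheduled Spring method:
// ignite.compute(ignite.cluster().forServers()).broadcast(new LocalCleanupJob());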

Related

Hot Rod Java client removeCache (replicated cache)

I'm in this situation: I have an Infinispan cluster (12.1) with two nodes and a replicated cache configured via XML.
I also have a Hot Rod client, and when I call the removeCache method, the first time the cache is not removed, but if I call removeCache a second time, the cache is deleted correctly.
I need the cache to be removed correctly on the first attempt.
Can anyone help me?
If you know beforehand you may need to remove caches, it's best to create them with CacheContainerAdmin.createCache() (or via the REST API/CLI/console) instead of the server XML configuration.
CacheContainerAdmin.removeCache() is under-specified: the javadoc doesn't say what it does when the cache was not created with CacheContainerAdmin.createCache(). As you've discovered, the current implementation only removes the cache on the server that processed the client request.
I have created ISPN-13048 to improve the documentation and maybe change the behaviour.
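For reference, a rough sketch of managing the cache through the Hot Rod admin API instead of the server XML; host, port, cache name and template are illustrative, and security configuration is omitted:

import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class AdminManagedCache {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222); // auth config omitted for brevity

        RemoteCacheManager rcm = new RemoteCacheManager(builder.build());
        try {
            // Created via the admin API, so the definition is propagated cluster-wide
            // and removeCache() later removes it from every node, not just the one
            // that happened to handle the client request.
            rcm.administration().getOrCreateCache("myReplicatedCache", "org.infinispan.REPL_SYNC");

            // ... use the cache ...

            rcm.administration().removeCache("myReplicatedCache");
        } finally {
            rcm.stop();
        }
    }
}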

Using SQL Queries in Apache Ignite without a database

I'm using Apache Ignite as a distributed cache whose configuration I generated from an existing database using the Ignite Web Console. It's a write-through cache that periodically persists cached data to the Postgres database. However, I want to write unit tests in Java for my project and do not have a reliable test database to use.
Part of what I want to test are the cache queries I occasionally run on my Ignite cache, and I wanted to use SQL queries to do this. However, I can't figure out how to preserve the queryEntities from my cache configuration without also having the database. I tried making a new XML file for test purposes that only configures the caches I need and only sets the query entities (not the data store or any DB information), but when I run the test I get a "Failed to initialize DB connection" error, even though there is no DB defined in my config.
Is there a way to leverage these query entities without actually connecting the cache to a database? If not, is there a good way to spin up a postgres database as a part of a unit test?
You need to check the persistence store configuration and disable it first, so that everything stays in memory.
Next, make sure you are not initializing any DB connection in your test cache configuration (you already said you checked this).
cacheCfg.setWriteThrough(false).setReadThrough(false) should do the trick when defining a cache (note that the configuration can't be changed after the cache is started).
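As a rough sketch (cache name, key/value types and fields are invented for the example), a test-only configuration that keeps the query entities but drops the store entirely might look like this:

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class InMemoryCacheTestSetup {
    static class Person {
        Long id;
        String name;
    }

    public static void main(String[] args) {
        // Keep the query entity so SQL still works, but define no cache store.
        QueryEntity entity = new QueryEntity(Long.class, Person.class);
        entity.addQueryField("id", Long.class.getName(), null);
        entity.addQueryField("name", String.class.getName(), null);

        CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("personCache");
        cacheCfg.setQueryEntities(Collections.singletonList(entity));
        cacheCfg.setCacheStoreFactory(null); // no store, so no DB connection is attempted
        cacheCfg.setReadThrough(false);
        cacheCfg.setWriteThrough(false);

        IgniteConfiguration igniteCfg = new IgniteConfiguration().setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(igniteCfg)) {
            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg.getName());
            // SQL queries against the query entity now work purely in memory, e.g.:
            // cache.query(new SqlFieldsQuery("select name from Person")).getAll();
        }
    }
}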

Redis cache in a clustered web farm? Sync between two member nodes?

Ok, so what I have are 2 web servers running inside of a Windows NLB clustered environment. The servers are identical in every respect, and as you'd expect in an NLB clustered environment, everybody is hitting the cluster name and not the individual members. We also have affinity turned off on the members in the cluster.
But, what I'm trying to do is to turn on some caching for a few large files (MP3s). It's easy enough to dial up a Redis node on one particular member and hit it, everything works like you'd expect. I can pull the data from the cache and serve it up as needed.
Now, let's add the overhead of the NLB. With an NLB in play, you may not be hitting the same web server each time. You might make your first hit to member 01, and the second hit to 02. So, I'd need a way to sync between the two servers. That way it doesn't matter which cluster member you hit, you are going to get the same data.
I don't need to worry about one cache being out of date, the only thing I'm storing in there is read only data from an internal web service.
I've only got 2 servers and it looks like redis clusters need 3. So I guess that's out.
Is this the best approach? Or perhaps there is something else better?
Reasons for Redis: we want the cache to be in-memory only, with no writes to the database. I thought this would be a good fit, but I need to make sure the data is available to both servers.
It's not possible to have Redis multi-master (writing to both). That said, its replication is blazing fast (check Redis's slaveof command).
But why do you need it on the same server? Access it as a service, so every node will access the same actual data. If the main server goes down, a slave can be promoted to master (automatically, if you run Redis Sentinel).
One observation: Redis does use the disk, but asynchronously, via an append-only file that it rewrites from time to time depending on its size.
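To illustrate the "access it as a service" point, here is a rough sketch using the Jedis client; the host name, key naming and loader method are made up, and a real deployment would use a connection pool rather than creating a client per request:

import redis.clients.jedis.Jedis;

public class SharedMp3Cache {
    private static final String REDIS_HOST = "redis.internal"; // one shared Redis service, not per web server
    private static final int REDIS_PORT = 6379;
    private static final int TTL_SECONDS = 3600;

    public byte[] getMp3(String id) {
        try (Jedis jedis = new Jedis(REDIS_HOST, REDIS_PORT)) {
            byte[] key = ("mp3:" + id).getBytes();

            byte[] cached = jedis.get(key);
            if (cached != null)
                return cached; // same result no matter which NLB member served the request

            byte[] fresh = loadFromWebService(id);  // read-only source of truth
            jedis.setex(key, TTL_SECONDS, fresh);    // now visible to both cluster members
            return fresh;
        }
    }

    private byte[] loadFromWebService(String id) {
        return new byte[0]; // placeholder for the internal web service call
    }
}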

Running multiple Kettle transformation on single JVM

We want to use pan.sh to execute multiple Kettle transformations. After exploring the script I found that it internally calls the spoon.sh script, which runs PDI. The problem is that every time a new transformation starts, it creates a separate JVM for its execution (invoked via a .bat file); however, I want to group them into a single JVM to overcome the memory constraints that multiple JVMs put on the batch server.
Could somebody guide me on how I can achieve this, or share documentation/resources with me?
Thanks for the good work.
Use Carte. This is exactly what it is for. You can start up a server (on the local box if you like) and then submit your jobs to it. One JVM, one heap, shared resources.
The benefit of that is scalability: when your box becomes too busy, just add another one, also running Carte, and start sending some of the jobs to that other server.
There's an old but still current blog post here:
http://diethardsteiner.blogspot.co.uk/2011/01/pentaho-data-integration-remote.html
As well as docs on the Pentaho website.
Starting the server is as simple as:
carte.sh <hostname> <port>
There is also a status page, which you can use to query your Carte servers, so if you have a cluster of servers, you can pick a quiet one to send your job to.
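If it helps, here is a rough sketch of polling that status page from Java; it assumes Carte's default credentials (cluster/cluster), the standard /kettle/status endpoint, and an illustrative host and port:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class CarteStatusCheck {
    public static void main(String[] args) throws Exception {
        // Carte started with e.g.: carte.sh localhost 8081
        URL url = new URL("http://localhost:8081/kettle/status/?xml=Y");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Default Carte credentials; change them for anything beyond local testing.
        String auth = Base64.getEncoder().encodeToString("cluster:cluster".getBytes());
        conn.setRequestProperty("Authorization", "Basic " + auth);

        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null)
                System.out.println(line); // XML listing running/finished jobs and transformations
        }
    }
}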

Redis active-active replication

I am using Redis version 2.8.3. I want to build a Redis cluster, but in this cluster there should be multiple masters. This means I need multiple nodes that have write access and replicate their writes to all other nodes.
I could build a cluster with a master and multiple slaves. I just configured the slaves' redis.conf files and added this:
slaveof myMasterIp myMasterPort
That's all. Then I tried to write something into the DB via the master. It was replicated to all slaves, and I really like that.
But when I tried to write via a slave, it told me that slaves have no right to write. After that, I set the slave's read-only setting in redis.conf to false, so I could write something into the DB.
But then I realized that the write is not replicated to my master, so it is not replicated to any of the other slaves either.
This means I could not build an active-active cluster.
I tried to find out whether Redis has active-active cluster capability, but I could not find an exact answer.
Is it possible to build an active-active cluster with Redis?
If it is, how can I do it?
Thank you!
Redis v2.8.3 does not support multi-master setups. The real question, however, is why you want to set one up. Put differently, what challenge/problem are you trying to solve?
It looks like the challenge you're trying to solve is how to reduce the network load (more on that below) by eliminating over-the-net reads. Since Redis isn't multi-master (yet), the only way to do it is by setting up each app server with a master and a slave (to the other master) - i.e. grand total of 4 Redis instances (and twice the RAM).
The simple scenario is when each app updates only a mutually-exclusive subset of the database's keys. In that scenario this kind of setup may actually be beneficial (at least in the short term). If, however, both apps can touch all keys or if even just one key is "shared" for writes between the apps, then you'll need to bake locking/conflict resolution/etc... logic into your apps to consolidate local master and slave differences (and that may be a bit of an overkill). In either case, however, you'll end up with too many (i.e. more than 1) Redises, which means more admin effort at the very least.
Also note that by colocating the app and the database on the same server you're setting yourself up for near-certain scalability failure. What will happen when you need more compute resources for your apps or Redis? How will you add yet another app server to the mix?
Which brings me back to the actual problem you are trying to solve: network load. Why exactly is that an issue? Are your apps so throughput-heavy, or is the network so thin, that you are willing to go to such lengths? Or maybe latency is the issue that you want to resolve? Be that as it may, I recommend that you consider a time-proven design instead, namely separating Redis from the apps and putting it on its own resources. True, the network will hit you in the face and you'll have to work around/with it (which is what everybody else does). On the other hand, you'll have more flexibility and control over your much simpler setup, and that, in my book, is a huge gain.
Redis Enterprise has had this feature for quite a while, but if you are looking for an open source solution, KeyDB is a fork with Active-Active support (called Active Replica).
Setting it up is just a little more work than standard replication:
Both servers must have "active-replica yes" in their respective configuration files
On server B execute the command "replicaof [A address] [A port]"
Server B will drop its database and load server A's dataset
On server A execute the command "replicaof [B address] [B port]"
Server A will drop its database and load server B's dataset (including the data it just transferred in the prior step)
Both servers will now propagate writes to each other. You can test this by writing to a key on Server A and ensuring it is visible on B and vice versa.
https://github.com/JohnSully/KeyDB/wiki/KeyDB-(Redis-Fork):-Active-Replica-Support