I am building a UI POC for Apache Ignite and want it to be as light as possible. It is a live/real-time UI which will get, update, and delete cache entries, and it should also listen to any changes in the cache and always display the latest data.
I have learnt that Thin Clients do almost all of that but cannot listen to changes, and that Thick Clients are my only option if I want to do that. But Thick Clients also participate in data storage and compute grid functionality, which is too much for a simple UI application running on a desktop. Can I make it lightweight, so that it behaves like a Thin Client but with live/listener functionality? What options do I have for this scenario?
That's what the Ignition.setClientMode() method is for: it turns off data storage. And typically when running a compute job you run it on a ClusterGroup of server nodes, for example:
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterGroup;

Ignition.setClientMode(true);          // this node joins the cluster as a client: no data storage
Ignite ignite = Ignition.start();      // start the client node
...
ClusterGroup x = ignite.cluster().forServers();  // target only the server nodes
ignite.compute(x).run(...)             // compute jobs execute on the servers, not on this client
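For the live/listener part of the question, a continuous query registered from such a client node is the usual approach. Here is a minimal sketch, assuming a cache named "myCache" with Integer keys and String values (the names are placeholders):

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

Ignition.setClientMode(true);
Ignite ignite = Ignition.start();

IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

// Deliver the current cache contents first, then every subsequent change
qry.setInitialQuery(new ScanQuery<>());

// Called on this client node whenever an entry is created, updated or removed
qry.setLocalListener(events ->
    events.forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue())));

try (QueryCursor<Cache.Entry<Integer, String>> cursor = cache.query(qry)) {
    cursor.forEach(e -> System.out.println("initial: " + e.getKey() + " -> " + e.getValue()));
    // keep the cursor open for as long as the UI needs to receive updates
}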
We are developing a Web API using .Net Core. To perform background tasks we have used Hosted Services.
The system is hosted in an AWS Elastic Beanstalk environment behind a load balancer, so based on the load Beanstalk creates/removes instances of the system.
Our problem is,
Since the background services also run inside the API, when the load balancer adds instances, the number of background services increases as well, and there is a possibility of executing the same task multiple times. Ideally there should be only one instance of the background services.
One way to tackle this is to stop executing background services in the load-balanced environment and have a dedicated, non-load-balanced, single-instance environment for the background services only.
That is a bit of an ugly solution. So:
1) Is there a better solution for this?
2) Is there a way to identify the primary instance while in a load-balanced environment? If so, I can conditionally register the Hosted Services.
Any help is really appreciated.
Thanks
I am facing the same scenario and am thinking of implementing a custom service architecture that runs normally on all of the instances but takes advantage of a pub/sub broker and a distributed memory service, so that those small services can contact each other and coordinate what's to be done. It's complicated to develop, yes, but a very robust solution IMO.
You'll "have to" use a distributed "lock" system. You'll have to use, for example, a distributed memory cache who put a lock when someone (a node of your cluster) is working on background. If another node is trying to do the same job, he'll be locked by the first lock if the work isn't done yet.
What i mean, if all your nodes doesn't have a "sync handler" you can't handle this kind of situation. It could be SQL app lock, distributed memory cache or other things ..
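For illustration, a minimal sketch of the distributed-cache lock idea, assuming a Redis instance shared by all nodes and the Jedis client (shown in Java for brevity; the same pattern applies from a .NET hosted service, and the host/key names are placeholders):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

try (Jedis redis = new Jedis("shared-cache-host", 6379)) {
    // SET NX EX: succeeds only if the key does not exist yet, and expires automatically after 60s
    String reply = redis.set("background-job:lock", "instance-id", SetParams.setParams().nx().ex(60));
    if ("OK".equals(reply)) {
        // this node acquired the lock: run the background task
    } else {
        // another node holds the lock: skip this run
    }
}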
There is something called a Mutex, but even that won't control this in a multi-instance environment. However, there are ways to control it to some level (maybe even 100%). One way would be to keep a tracker in the database. E.g. if the job has to run daily, before starting the job in the background service you might want to query the database to see if there is any entry for today; if not, you insert an entry and start your job.
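A minimal sketch of that tracker idea, assuming a job_runs table with a unique key on (job_name, run_date); shown in Java/JDBC for brevity, but the same pattern applies from a .NET hosted service:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;

boolean acquired;
try (Connection con = DriverManager.getConnection("jdbc:mysql://db-host/jobs", "user", "password");
     PreparedStatement ps = con.prepareStatement(
             "INSERT INTO job_runs (job_name, run_date) VALUES (?, ?)")) {
    ps.setString(1, "daily-report");
    ps.setObject(2, LocalDate.now());
    ps.executeUpdate();
    acquired = true;            // this instance inserted today's row first, so it runs the job
} catch (SQLException duplicateRow) {
    acquired = false;           // another instance already claimed today's run
}

if (acquired) {
    // execute the background task
}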
Here is a use case:
I have version 1 of a web app deployed.
It uses a couple of Ignite-powered distributed Maps, Sets and other data structures (configured for replication).
I'm going to deploy v2 of this application, and once the data is replicated I'm going to shut down v1 of the app and re-route users (using nginx) to the new instance (v2).
I can see that Ignite on v1 and v2 can discover each other and automatically perform replication of data structures.
My intention: I don't want to shut down the 1st instance (v1) before all data is replicated to the 2nd instance (v2).
Question is: how do I know if initial replication is completed? Is there any event that is fired in such cases, or maybe some other way to accomplish this task?
If you configure your caches to use synchronous rebalancing [1], the second node will not complete its start process before rebalancing is completed. This way you guarantee that all the data is replicated to the second node (assuming, of course, that you're using fully replicated caches).
[1] https://apacheignite.readme.io/docs/rebalancing#section-rebalance-modes
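For reference, a minimal sketch of that configuration (the cache name is a placeholder):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();

CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("myReplicatedCache");
ccfg.setCacheMode(CacheMode.REPLICATED);        // fully replicated cache
ccfg.setRebalanceMode(CacheRebalanceMode.SYNC); // node start does not complete until rebalancing finishes

cfg.setCacheConfiguration(ccfg);
// Ignition.start(cfg);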
I really love Apache Ignite's shared RDD for Spark. However, due to a limitation, I cannot deploy Ignite onto the cluster nodes. The only way I can use Ignite is through embedded mode with Spark.
I would like to know: in embedded mode, can the RDD be shared across different Spark applications?
I have two Spark jobs:
Job 1: produces the data and stores it into the shared RDD
Job 2: retrieves the data from the shared RDD and does some calculation.
Can this be done using Ignite's embedded mode?
Thanks
In embedded mode, Ignite nodes are started within executors, which are under the control of Spark. Having said that, this mode is more for testing purposes in my opinion: there is no need to deploy and start Ignite separately, while you still have the ability to try the basic functionality. But in real scenarios it would be very hard to achieve consistency and failover guarantees, because Spark can start and stop executors, which in embedded mode are actually holding the data. I would recommend working around your limitation and making sure Ignite can be installed separately in standalone mode.
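If standalone mode becomes possible, the sharing you ask about is straightforward, because the data lives in an Ignite cluster that outlives any single Spark application. A rough sketch using the Java Spark API (the config path, cache name and sample data are placeholders):

import java.util.Arrays;
import java.util.List;

import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Job 1: write into the shared cache (Ignite cluster started separately, in standalone mode)
SparkConf conf = new SparkConf().setAppName("job-1");
JavaSparkContext sc = new JavaSparkContext(conf);

JavaIgniteContext<Integer, String> ic = new JavaIgniteContext<>(sc, "ignite-config.xml");
JavaIgniteRDD<Integer, String> sharedRdd = ic.fromCache("sharedData");

List<Tuple2<Integer, String>> data = Arrays.asList(new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));
sharedRdd.savePairs(sc.parallelizePairs(data));

// Job 2 (a separate Spark application) later reads the same cache:
// JavaIgniteRDD<Integer, String> rdd =
//     new JavaIgniteContext<Integer, String>(sc2, "ignite-config.xml").fromCache("sharedData");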
My app will work as follows:
I'll have a bunch of replica servers and a load balancer. The data updates will be managed outside CometD. EDIT: I still intend to notify each CometD server of those updates, if necessary, so they can respond back to clients.
The clients are only subscribing to those updates (i.e. read only), so the CometD server nodes don't need to know anything about each other's behavior.
Am I right in thinking I could have server side "client" instances on the load balancer, per client connection, where each instance listens on the same channel as its respective client and forwards any messages back to it? If so, are there any disadvantages to this approach, instead of using Oort?
Reading the docs about Oort, it seems that the nodes "know" about each other, which I don't need. Would it be better, then, for me to avoid using Oort altogether in my case? My concern is that if I ended up adding many, many nodes, the fact that they communicate with each other could mean unnecessary processing.
The description of the issue specifies that the data updates are managed outside CometD, but it does not detail how the CometD servers are notified of these data updates.
The two common solutions are A) to notify each CometD server or B) to use Oort.
In solution A) you have an event that triggers a data update, and some external application performs the data update on, say, a database. At this point the external application must notify the CometD servers that there was a data update. If the external application runs on a JVM, it can use the CometD Java client to send a message to each CometD server, notifying them of the data update; in turn, the CometD servers will notify the remote clients.
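For solution A), a rough sketch of such a notifier using the CometD Java client (the node URLs, channel name and payload are made up for the example):

import java.util.List;
import java.util.Map;

import org.cometd.client.BayeuxClient;
import org.cometd.client.transport.LongPollingTransport;
import org.eclipse.jetty.client.HttpClient;

HttpClient httpClient = new HttpClient();
httpClient.start();

// All CometD nodes behind the load balancer
List<String> nodes = List.of("http://node1:8080/cometd", "http://node2:8080/cometd");

Map<String, Object> update = Map.of("entity", "order", "id", 42, "status", "shipped");

for (String url : nodes) {
    BayeuxClient client = new BayeuxClient(url, new LongPollingTransport(null, httpClient));
    client.handshake();
    if (client.waitFor(5000, BayeuxClient.State.CONNECTED)) {
        // Each server then broadcasts the message to its own subscribed remote clients
        client.getChannel("/data/updates").publish(update);
    }
    client.disconnect();
}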
In solution B) the external application must notify just one CometD server that there was a data update; the Oort cluster will do the rest, broadcasting that message across the cluster, and then to remote clients.
Solution A) does not require the Oort cluster, but it requires the external application to know all the nodes and send a message to each one.
Solution B) uses Oort, so the external application needs only to know one Oort node.
Oort requires a bit of additional processing because the nodes are interconnected, but depending on the case this processing may be negligible, or the complications of notifying each CometD server "manually" (as in solution A) may be greater than running Oort.
I don't understand exactly what you mean by having "server side client instances on the load balancer". Typically load balancers don't run a JVM, so it is not possible to run CometD clients on them; that part of the question does not sound right.
Besides the CometD documentation, you can also look at these slides.
I am currently developing a system that makes heavy use of redis for a series of web services.
One of the key criteria of this system is fast responses.
At present the layout (ignoring load balancers etc) is as follows:
2 x Front End Play Framework 2.x Servers
2 x Job Handling/Persistence Play Framework 2.x Servers
1 x MySQL Server
2 x Redis Servers, 1 master, 1 slave
In this setup, redis serves 2 tasks - as a shared cache and also as a message bus.
Currently the front end servers host a service which interacts in its entirety with Redis.
The front end servers try to balance reads across the pool of read servers (currently the master and 1 slave), but being Redis they need to make their writes to the master server. They handle cache updates etc by sending messages on the queues, which are picked up by the job handling servers.
The job handling servers do blocking listens (BLPOP) against the Redis write server and process tasks when necessary. They have the only connection to MySQL.
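To make the queue part concrete, a minimal sketch of a job-handling worker using Jedis (the host and queue names are invented for the example); the front-end side would RPUSH its messages onto the same list on the master:

import java.util.List;
import redis.clients.jedis.Jedis;

try (Jedis master = new Jedis("redis-master-host", 6379)) {
    while (!Thread.currentThread().isInterrupted()) {
        // BLPOP with timeout 0 blocks until a message is available; returns [queueName, payload]
        List<String> msg = master.blpop(0, "cache-update-queue");
        String payload = msg.get(1);
        // process the task: update MySQL, refresh the cache, etc.
        System.out.println("processing " + payload);
    }
}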
At present the read replica is a dedicated server, mainly there so it can be promoted to write master if the current master fails.
I was thinking of putting a Redis read replica (slave) on each of the front-end servers, which means read latency would be even lower, while writes (messages for the queues) are pushed to the write master on a separate connection.
If I need to scale, I could just add more front end servers with read slaves.
It sounds like a win/win to me as even if the write server temporarily drops out, the front end servers can still read data at least from their local slave and act accordingly.
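Roughly, the proposed split on a front-end server would look like this (hosts and keys are placeholders): reads go to the co-located replica, writes and queue messages go to the master.

import redis.clients.jedis.Jedis;

try (Jedis localReplica = new Jedis("localhost", 6379);        // replica running on this front-end box
     Jedis master = new Jedis("redis-master-host", 6379)) {    // single write master

    // Reads are served locally, avoiding a network hop
    String cached = localReplica.get("product:42");

    // Writes and queue messages always go to the master
    master.set("product:42", "{\"price\": 99.5}");
    master.rpush("cache-update-queue", "refresh:product:42");
}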
Can anyone think of reasons why this might not be such a good idea?
I understand the advantages of this approach... but consider this: what happens when you need to scale just one component (i.e. FE server or Redis) but not the other? For example, more traffic could mean you'll need more app servers to handle it while the Redises will be considerably less loaded. On the other hand, if your dataset grows and/or more load is put on the Redises - you'll need to scale these and not the app.
The design should fit your requirements, and the simplicity of your suggested setup has a definite appeal (i.e. to scale, just add another identical lego block), but from my meager experience, anything that sounds too good to be true usually is. In the longer run, even if this works for you now, you may find yourself in a jam down the road. My advice: separate your Redis(es) from your app servers, deal with and/or work around the network, and make sure each layer is available and scalable in its own right.