When to use Apache Helix and when to use Apache Mesos


Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other frameworks on a dynamically shared pool of nodes.
Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.
Both are cluster managers. Which one should I choose, and why?

Here is what I wrote on Apache Helix vs YARN, which also applies to Mesos vs Helix: YARN/Mesos and Helix are complementary to each other.
You use Helix to build your system and manage its internal state. Once the system is built, it can be deployed either independently or using YARN/Mesos.
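To make "internal state" a little more concrete, here is a minimal sketch of the kind of bookkeeping a Helix-style framework takes over: deciding which node hosts which partition, and recomputing that mapping when a node drops out. This is not the Helix API. The `rebalance` function, the partition names, and the node names are all hypothetical, and the round-robin strategy is a simplification chosen for illustration.

```python
def rebalance(partitions, live_nodes):
    """Assign each partition to a live node, round-robin over sorted node names.

    Hypothetical illustration only: a real cluster manager would also track
    replica states and try to minimize how many partitions move.
    """
    nodes = sorted(live_nodes)
    if not nodes:
        raise ValueError("no live nodes to host partitions")
    return {p: nodes[i % len(nodes)] for i, p in enumerate(partitions)}


partitions = [f"db_{i}" for i in range(6)]

# Healthy cluster: three nodes share six partitions.
before = rebalance(partitions, {"node1", "node2", "node3"})

# node2 fails: its partitions are reassigned to the survivors.
after = rebalance(partitions, {"node1", "node3"})

for p in partitions:
    print(p, before[p], "->", after[p])
```

Where the resulting processes actually run (which machine, with how much CPU and RAM) is the separate question that YARN or Mesos answers.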
@janisz, Helix is widely used and actively developed.

Related

Service grid in a micro services environment

We are using Apache Ignite as an IMDG (in-memory data grid) in our microservices environment.
For scalability and load balancing, we are considering a service registry such as Eureka or Consul, both of which are supported by Spring Cloud, for the deployed microservices.
Apache Ignite has the concept of a service grid, which provides support for node singletons and cluster singletons.
I also see that WCF, WebLogic and JBoss have similar features.
I am trying to understand what these service grids are, and whether I can use them to achieve the same benefits as the Eureka service registry provided by Netflix and supported by Spring Cloud.
Can someone tell me whether I can achieve the same thing using the service grid in Apache Ignite?

No, you cannot use Apache Ignite Service Grid for the same purposes as Eureka. Eureka is used for load balancing and service discovery over WAN. Using Ignite clusters spanning over multiple AWS zones and remote client machines is not the most efficient way of using it.
More information on Ignite Service Grid can be found here - http://apacheignite.gridgain.org/docs/service-grid
Thanks!
UPD (for the 1st comment):
In most cases you cannot span Ignite over WAN networks with high latency and lower throughput and still use it effectively.
As for local clusters in non-cloud environments, go ahead! This is the best environment for systems of this kind.

Using Kubernetes or Apache Mesos

We have a product that is described in a set of Dockerfiles, which create the necessary Docker containers. Some containers will just run basic apps, while others will run clusters (Hadoop).
The question now is which cluster manager I should use:
Kubernetes, Apache Mesos, or both?
I read that Kubernetes is good for 100% containerized environments, while Apache Mesos is better for environments that are only partly containerized. But Apache Mesos is better for running Hadoop in Docker (?).
Our environment consists only of Docker containers, some running a Hadoop cluster and some running apps.
Which would be best?

Functionally, both do the same thing: orchestrate Docker containers. They go about it in different ways, though, and what is easy to achieve with one may prove difficult with the other, and vice versa.
In my opinion, Mesos is more complex and has a steeper learning curve. Kubernetes is relatively simpler and easier to grasp. You can literally spawn your own Kube master and minions by running one command and specifying the provider: Vagrant, AWS, etc. Kubernetes can also be integrated into Mesos, so you could try both.
For the Hadoop-specific use case you mention, Mesos might have an edge. It may integrate better with the Apache ecosystem, and Mesos and Spark were created by the same minds.
Final thoughts: start with Kube and progressively explore how to make it work for your use case. Then, once you have a good grasp of it, do the same with Mesos. You might end up liking pieces of each and have them coexist, or find that Kube is enough for what you need.

Weblogic cache replication with clusters

Is it possible to implement a cache in WebLogic (10.3.5.0) that is accessible from every instance of a cluster?
Does WebLogic offer an API, for example over RMI, that makes this possible?
Is there a framework like Ehcache that makes this possible?

Oracle Coherence can handle this situation and it comes bundled with WebLogic 10.3.5.
http://docs.oracle.com/cd/E21764_01/apirefs.1111/e13952/taskhelp/coherence/CreateCoherenceServers.html
"Coherence servers (also known as Coherence data nodes) are stand-alone cache servers, dedicated JVM instances responsible for maintaining and managing cached data."

Creating AMQ network of broker clusters on JBoss Fuse 6.2, without fabric

I want to create two broker clusters connected by a network of brokers in JBoss Fuse 6.2; each cluster has 2 master/slave pairs.
It's a small cluster, so we don't intend to use Fabric/Zookeeper; everything will be statically configured, with no auto discovery.
Questions:
Is it possible to use fabric profiles to build the topology, but avoid using fabric at runtime?
Can we use Git, or something similar, for centrally managing container config files, again without fabric?
We tried creating profiles using fabric:mq-create, but the command is not available unless a fabric is first created, which defeats the purpose.

No, fabric profiles require using fabric. You can use Git to store the files, but you cannot have JBoss Fuse use it automatically the way it does with fabric. You would need to use Git manually.
The AMQ broker in JBoss Fuse is just standard Apache ActiveMQ, so you can configure it manually/statically as a network of brokers. It's just not very easy to do if you haven't done it before.
See the JBoss A-MQ documentation, as that covers the broker: http://www.jboss.org/products/amq/overview/
For example: https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_A-MQ/6.2/html/Using_Networks_of_Brokers/index.html

Real world example of Apache Helix, Zookeeper, Mesos and Erlang?

I am new to the following:
Apache ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Apache Mesos: Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.
Apache Helix: Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
Erlang language: Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability.
It sounds to me as though Helix and Mesos are both useful for cluster management. How are they related to ZooKeeper? It would be great if someone could give me a real-world example of their usage.
I am also curious how BOINC distributes tasks to its clients. Does it use any of the above technologies? (Forget about Erlang.)
I just need a brief overview :)

Erlang was built by Ericsson and designed for use in phone systems. By design, it runs hundreds, thousands, or even tens of thousands of small processes that handle tasks by sending information between them instead of sharing memory or state. This enables all sorts of interesting features that are great for highly available distributed systems, such as:
Hot code reloading. Each process is paused, its relevant module code is swapped out, and it is resumed where it left off, so deploys can happen without restarting or causing significant interruption.
Easy distributed messaging and clustering. Sending a message to a local process or a remote one is fairly seamless in most instances.
Process-local GC. Garbage collection happens in each process independently instead of as a global stop-the-world event as in Java, which helps keep latency low.
Supervision trees, with complex process hierarchies and monitoring/management.
A few concrete real-world examples that make great use of Erlang:
MongooseIM: a highly performant and incredibly scalable distributed XMPP/chat server.
Riak: a distributed key/value store.
Mesos, on the other hand, can be thought of as a way of turning a datacenter of servers into a platform for teams and developers. Say I, as a company, own a datacenter with 10,000 physical servers and have 1,000 engineers developing hundreds of services. Mesos is a good way to let those engineers deploy and manage services across that hardware without needing to worry about the servers directly. It's an abstraction layer on top of the physical servers that allows you to share and intelligently allocate resources.
As a user of Mesos, I might say that I have Service X. It's an executable bundle that lives in location Y. Each instance of Service X needs 4 GB of RAM and 2 cores, and I need 8 instances, which will be attached to a load balancer. You can specify this in configuration and deploy based on that config. Mesos will find hardware that has enough RAM and CPU capacity available to handle each instance of that service and start it running in each of those locations.
It can handle a lot of other, more complex aspects of orchestrating those instances as well, but that's probably a bit too in-depth for this :)
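The Service X example can be sketched as a toy placement loop. This is not how Mesos is implemented: Mesos hands resource offers to frameworks, which then decide what to launch, whereas the sketch below uses a plain first-fit search for readability. The `Host` class, `place_instances`, and the host names and capacities are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_ram_gb: int
    free_cores: int
    tasks: list = field(default_factory=list)


def place_instances(hosts, service, ram_gb, cores, count):
    """First-fit placement: put each instance on the first host with room."""
    placements = []
    for i in range(count):
        for host in hosts:
            if host.free_ram_gb >= ram_gb and host.free_cores >= cores:
                # Reserve the resources so later instances see less capacity.
                host.free_ram_gb -= ram_gb
                host.free_cores -= cores
                task_id = f"{service}-{i}"
                host.tasks.append(task_id)
                placements.append((task_id, host.name))
                break
        else:
            # The inner loop found no host with enough free RAM and cores.
            raise RuntimeError(f"no host has capacity for {service}-{i}")
    return placements


# Hypothetical hardware pool.
hosts = [Host("host-a", 16, 8), Host("host-b", 8, 4), Host("host-c", 32, 16)]

# Service X: 8 instances, each needing 4 GB of RAM and 2 cores.
placements = place_instances(hosts, "service-x", ram_gb=4, cores=2, count=8)
for task_id, host_name in placements:
    print(task_id, "->", host_name)
```

The point is only that the engineer states requirements (RAM, cores, instance count) and something else picks the machines.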
ZooKeeper's most common use cases are service discovery and configuration management. You can think of it, fundamentally, as a bit like a nested key-value store, where services can look at predefined paths to see where other services currently live.
A simple example: I have a web service using a shared database cluster. I know a simple name for that database cluster and where its configuration lives in ZooKeeper. I can look up (or repeatedly poll) that path in ZooKeeper to check the addresses of the active database hosts. On the other side, if I take a database node out of rotation and replace it with a new one, the config in ZooKeeper gets updated with the new address, and anything continually watching it will detect the change and switch where it connects.
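The database lookup described above can be sketched with an in-memory stand-in for the path store. Nothing here talks to a real ZooKeeper server: `PathStore`, `WebService`, the path `/services/orders-db/hosts`, and the addresses are all hypothetical. One simplification to be aware of: standard ZooKeeper watches fire once and must be re-registered, while the callbacks below stay registered.

```python
class PathStore:
    """In-memory stand-in for a hierarchical store with change notifications."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def set(self, path, value):
        self._data[path] = value
        # Notify everyone watching this path about the new value.
        for callback in self._watchers.get(path, []):
            callback(value)

    def get(self, path):
        return self._data[path]

    def watch(self, path, callback):
        self._watchers.setdefault(path, []).append(callback)


class WebService:
    """Reads the database hosts once, then follows changes to the path."""

    def __init__(self, store, path):
        self.db_hosts = store.get(path)
        store.watch(path, self._on_change)

    def _on_change(self, hosts):
        # A real service would reconnect here; the sketch just records it.
        self.db_hosts = hosts


store = PathStore()
DB_PATH = "/services/orders-db/hosts"  # hypothetical well-known path

store.set(DB_PATH, ["10.0.0.1:5432", "10.0.0.2:5432"])
web = WebService(store, DB_PATH)

# An operator swaps one database node for a new one.
store.set(DB_PATH, ["10.0.0.1:5432", "10.0.0.3:5432"])
print(web.db_hosts)
```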
A more complex use case for ZooKeeper is how Kafka uses it (or did at the time I last used Kafka). Kafka has topics (streams), and each topic has many partitions (shards). Each consumer of a topic uses ZooKeeper to save a checkpoint in each partition after it has read and processed up to a certain point. That way, if the consumer crashes or is restarted, it knows where to pick up in the stream.
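The checkpointing idea is independent of any particular client library, so here is a minimal in-memory sketch of it. It does not use Kafka or ZooKeeper; `CheckpointStore`, `consume`, the consumer name `billing`, and the sample records are all hypothetical, and the store is a plain dictionary standing in for whatever durable service holds the checkpoints.

```python
class CheckpointStore:
    """Stand-in for a durable store holding one checkpoint per (consumer, shard)."""

    def __init__(self):
        self._offsets = {}

    def save(self, consumer, shard, offset):
        self._offsets[(consumer, shard)] = offset

    def load(self, consumer, shard):
        # A consumer with no checkpoint starts from the beginning.
        return self._offsets.get((consumer, shard), 0)


def consume(store, consumer, shard, log, limit):
    """Process up to `limit` records, saving a checkpoint after each one."""
    start = store.load(consumer, shard)
    processed = []
    for offset in range(start, min(start + limit, len(log))):
        processed.append(log[offset])
        # The checkpoint is the offset of the next unread record.
        store.save(consumer, shard, offset + 1)
    return processed


log = ["a", "b", "c", "d", "e"]  # records in one shard of the stream
store = CheckpointStore()

# First run handles two records, then the consumer "crashes".
first_run = consume(store, "billing", 0, log, limit=2)

# After a restart, the consumer resumes from the saved checkpoint.
second_run = consume(store, "billing", 0, log, limit=10)

print(first_run, second_run)
```

Saving the checkpoint after processing, as done here, means a crash between the two steps would cause a record to be processed again on restart rather than skipped.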

I don't know about Mesos or the Erlang language, but this article might help you with Helix and ZooKeeper.
The article says:
ZooKeeper is responsible for gluing all the parts together, while Helix is the cluster management component that registers all cluster details (the cluster itself, nodes, resources).
The article is about clustering in jBPM using Helix and ZooKeeper, but it will give you a basic idea of what Helix and ZooKeeper are used for.
From most of the articles I have read online, it seems that ZooKeeper and Helix are used together.

Apache Zookeeper can be installed on a single machine or on a cluster.
It can be used to keep track of logs. It can provide various services on a distributed platform.
Storm and Kafka rely on Zookeeper.
Storm uses Zookeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
Kafka queue consumers can use Zookeeper to store information on what has been consumed from the queue.
