Can we use DataStax Enterprise OpsCenter with an existing Apache Cassandra cluster?

We just joined the DataStax startup program and have an existing cluster of Apache Cassandra nodes. We'd like to use some of the enterprise features of OpsCenter, such as continuous repair and data rebalancing.
In order to use those, do we need to be running a DataStax Enterprise cluster, or can we just run DataStax Enterprise OpsCenter against our existing nodes?

Related

Ambari DB is damaged without Ambari DB backup

We have an Ambari HDP cluster (HDP version 2.6.4) with 420 Linux worker machines (each worker runs a DataNode and a NodeManager service).
Unfortunately the Ambari DB is damaged and we do not have an Ambari DB dump, so we cannot recover the Ambari DB; in effect we no longer have Ambari or the Ambari GUI.
However, the HDFS disks on the worker machines still contain the HDFS data, and the NameNode is still working with all its data (journal/hdfsha/current/ and namenode/current).
So HDFS works without Ambari.
Given all of the above, is it possible to install a new Ambari setup and then add the existing, working HDFS to it?
Do Hortonworks / Cloudera have a procedure for this process?
An "Ambari cluster" is not something. You install Ambari agents and point the server at them.
If you wipe the database, then the agents may attempt to reconfigure your services, however, so you better take a backup of at least core-site.xml, hdfs-site.xml, yarn-site.xml, etc.
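A minimal Python sketch of that backup step, assuming the usual HDP config location /etc/hadoop/conf (adjust the paths and file list to your layout):

    # Sketch: copy the key Hadoop client configs to a safe location before
    # letting a fresh Ambari server/agents touch the hosts.
    # Assumes the standard HDP layout under /etc/hadoop/conf; adjust as needed.
    import shutil
    from pathlib import Path

    CONF_DIR = Path("/etc/hadoop/conf")            # assumed HDP config directory
    BACKUP_DIR = Path("/root/hadoop-conf-backup")  # any safe location; off-node is better

    FILES = ["core-site.xml", "hdfs-site.xml", "yarn-site.xml"]

    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        src = CONF_DIR / name
        if src.exists():
            shutil.copy2(src, BACKUP_DIR / name)   # copy2 preserves timestamps/permissions
            print(f"backed up {src}")
        else:
            print(f"missing {src}, skipping")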

Redis Enterprise Cluster aware client

Can anyone explain to me what an "aware client" is in Redis Enterprise?
I found this post: Redis Enterprise Clustering Command Error 'CLUSTER'.
I am trying to use Redis Enterprise Cluster with Docker.
I created 3 Redis nodes in Docker with two shards for better scalability.
So what exactly is a cluster-aware client, and what is the difference from a non-cluster-aware one?
Also, what is a regular OSS cluster?
Thank you.
"Cluster Aware" means a Redis client that supports the OSS Cluster API (https://redis.io/topics/cluster-spec). For example, the Ruby client https://github.com/redis/redis-rb#cluster-support supports it.
A non-aware client is a client that only supports connecting to Redis in single-instance mode (and perhaps Sentinel), such as the Python client https://github.com/andymccurdy/redis-py.
The Enterprise Cluster can be used by both types of clients regardless of how the database is deployed (i.e. clustered or not).
To clear up some more of the confusion:
OSS Cluster - a mode of deployment and an API (i.e. not single-instance)
Enterprise Cluster - a product
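To make the difference concrete, here is a rough Python sketch, assuming both the redis and redis-py-cluster packages are installed; the endpoints and hostnames are placeholders. The single-endpoint client works against a Redis Enterprise database because the cluster exposes one proxy endpoint per database, while a cluster-aware client discovers the slot map and follows MOVED/ASK redirections itself.

    # Non-cluster-aware: talks to exactly one endpoint.
    import redis
    r = redis.Redis(host="my-redis-endpoint", port=12000)  # placeholder endpoint
    r.set("greeting", "hello")
    print(r.get("greeting"))

    # Cluster-aware: speaks the OSS Cluster API and follows redirections
    # between nodes (redis-py-cluster package).
    from rediscluster import RedisCluster
    startup_nodes = [{"host": "node1", "port": "7000"}]    # placeholder node
    rc = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)
    rc.set("greeting", "hello")
    print(rc.get("greeting"))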

Is it possible to install Apache Ambari on top of an existing cluster

We have an existing Hadoop cluster that is not managed by Ambari. Is it possible to install Apache Ambari on top of an existing Hadoop cluster?
No, Ambari must provision the cluster it's monitoring.
Ambari is designed around a Stack concept where each stack consists of several services. A stack definition is what allows Ambari to install, manage and monitor the services in the cluster.

How to monitor hadoop cluster using Ambari on centos 7

I have a small Hadoop cluster, i.e. one master and three slave nodes, and I need to monitor it. I have found that Ambari can be used for this. CentOS 7 is installed on all machines. Please provide complete details on how I can do that. I have also read that Ambari is meant for a new cluster, i.e. you have to install a new cluster with it; does it not work with an already running cluster?
At the moment Ambari does not support CentOS 7, so that's not going to work.
However, Ambari does not perform cluster monitoring on its own; it uses Nagios for that purpose. Nagios is an independent software project that you can set up on its own, although doing so is somewhat painful.
ambari-server for Ambari 2.2+ can be installed and works well on CentOS 7.
You have to install ambari-server on one of the hosts (the master node) and can then use the web UI at hostname:8080 to install Ambari agents on the other hosts. Alternatively, Ambari agents can be installed manually on the other hosts and configured to communicate with the ambari-server.
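As a rough illustration of the manual-agent route, the agent has to be pointed at the server host in its config file (typically /etc/ambari-agent/conf/ambari-agent.ini); a Python sketch of that edit, with a placeholder server hostname, might look like this (verify the path and section names against your Ambari version):

    # Sketch: point an ambari-agent at the ambari-server host by editing the
    # [server] section of ambari-agent.ini, then restart the agent.
    import configparser

    AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"   # typical location
    AMBARI_SERVER = "ambari-master.example.com"              # placeholder hostname

    cfg = configparser.ConfigParser()
    cfg.read(AGENT_INI)
    if not cfg.has_section("server"):
        cfg.add_section("server")
    cfg.set("server", "hostname", AMBARI_SERVER)

    with open(AGENT_INI, "w") as f:
        cfg.write(f)

    print("Updated", AGENT_INI, "- now run: ambari-agent restart")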

apache hadoop, hbase and nutch components distribution for 4 servers cluster

I have 4 systems and want to crawl some data. To do that, I first need to configure a cluster, but I am confused about the placement of the components.
Should I place all components (Hadoop, Hive, HBase, Nutch) on one machine and add the other machines as nodes in Hadoop?
Should I place HBase on one machine, Nutch on another and Hadoop on a third, and add the fourth machine as a Hadoop slave?
Should HBase be in pseudo-distributed mode or fully distributed mode?
How many slaves should I add to HBase if I run it in fully distributed mode?
What would be the best way? Please guide me step by step (for HBase and Hadoop).
Say you have 4 nodes: n1, n2, n3 and n4.
You can install Hadoop and HBase in distributed mode.
If you are using Hadoop 1.x:
n1 - Hadoop master [NameNode and JobTracker]
n2, n3 and n4 - Hadoop slaves [DataNodes and TaskTrackers]
For HBase, you can choose n1 or any other node as the Master node. Since Master daemons are usually not CPU/memory intensive, all Masters can be deployed on a single node in a test setup; in production, however, it is better to put each Master on a separate node.
Let's say n2 is the HBase Master; the remaining 3 nodes can act as RegionServers.
Hive and Nutch can reside on any node.
Hope this helps; for a test setup this should be good to go.
Update -
For Hadoop 2.x, since your cluster size is small, NameNode HA deployment can be skipped.
NameNode HA would require two nodes, one each for the active and standby NameNodes.
A ZooKeeper quorum, which requires an odd number of nodes, would need a minimum of three nodes.
A JournalNode quorum likewise requires a minimum of 3 nodes.
But for a cluster this small, HA might not be a major concern. So you can keep:
n1 - NameNode
n2 - ResourceManager (YARN)
and the remaining nodes can act as DataNodes; try not to deploy anything else on the YARN node.
The rest of the deployment for HBase, Hive and Nutch remains the same.
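As a rough sketch of how one such layout translates into config, the worker host lists for Hadoop (the slaves file) and HBase (the regionservers file) could be generated like this; the role assignments, hostnames and file locations are hypothetical and should be adapted to your own distribution and layout:

    # Sketch: write the worker host lists for a 4-node layout like the one above.
    # Assumes Hadoop 2.x configs under /etc/hadoop/conf and HBase under
    # /etc/hbase/conf; run as a user allowed to write there.
    nodes = {
        "n1": ["namenode"],
        "n2": ["resourcemanager", "hbase-master"],
        "n3": ["datanode", "nodemanager", "regionserver"],
        "n4": ["datanode", "nodemanager", "regionserver"],
    }

    datanodes = [h for h, roles in nodes.items() if "datanode" in roles]
    regionservers = [h for h, roles in nodes.items() if "regionserver" in roles]

    with open("/etc/hadoop/conf/slaves", "w") as f:        # DataNode/NodeManager hosts
        f.write("\n".join(datanodes) + "\n")

    with open("/etc/hbase/conf/regionservers", "w") as f:  # HBase RegionServer hosts
        f.write("\n".join(regionservers) + "\n")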
In my opinion, you should install Hadoop in fully distributed mode so the jobs can run in parallel and much faster, as the MapReduce tasks will be distributed across the 4 machines. Of course, Hadoop's master node should run on a single machine.
If you need to process a large amount of data, it's a good choice to install HBase on one machine and Hadoop on the other 3.
You can make all of the above very easy using tools/platforms with a friendly GUI, such as Cloudera Manager and Hortonworks. They will help you control and maintain your cluster better, and they also provide health monitoring, cluster analytics and e-mail notifications for every error that occurs in your cluster.
Cloudera Manager
http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager.html
Hortonworks
http://hortonworks.com/
In these two links you can find more guidance about how to construct your cluster.