Running the Impala service alone in Docker - impala

I am trying to install Impala in a Docker container (following the MapR documentation). Only the Impala service runs in this container; the remaining services, Hive and MapR-FS, run on a physical node. When I start impala-server (the Impala daemon) I get weird errors. I just want to know whether this kind of installation is possible.
Thanks for the help!

It is possible, but it depends on your Impala and MapR versions. Impala 2.2.0 is supported on MapR 5.x. Impala 2.5.0 is supported on MapR 5.1 and later. Check the version compatibility information before proceeding.

Related

Apache NiFi Hive Processors with Hive 1.1 (CDH 5.7.1)

I work with Cloudera Manager CDH 5.7.1, which supports only Hive 1.1.0.
NiFi 1.0.0-BETA uses Hive 1.2.1.
When I try to use SelectHiveQL processor, I get the following error: Required field 'client_protocol' is unset!, which means that there's a version mismatch between Hive client and server.
Any suggestions to solve this problem?
I thought about building NiFi with hive-jdbc dependency version 1.1.0 instead of the default 1.2.1, but I hope there's a better solution.
Since NiFi is an Apache project, it builds with Apache JARs (such as Hive and Hadoop). However, there are vendor-specific profiles and build properties you can use to build NiFi for a particular Hadoop distribution.
For example you could try the following to build a NiFi distro for CDH 5.7.1:
mvn clean install -DskipTests -Pcloudera -Dhadoop.version=2.6.0-cdh5.7.1 -Dhive.version=1.1.0-cdh5.7.1 -Dhbase.version=1.2.0-cdh5.7.1
The Hive processors use Hadoop libraries provided by the NiFi Hadoop Libraries NAR, and other NARs (like the Hadoop/HDFS processors) use this same libraries NAR, so the best approach is to build the whole thing. Otherwise you can try to replace just the Hadoop/Hive/HBase-related NARs and see if that works.
Because NiFi expects the newer version of Hive, it is necessary to remove the unsupported newer features (such as HiveStreaming and ORC support), support the older version of Thrift, and build against the Cloudera-specific libraries.
I have created a branch of the current NiFi-1.1.x release with the necessary changes to get the PutHiveQL and SelectHiveQL processors to work, which you could build as below:
git clone https://github.com/Chaffelson/nifi.git
cd nifi
git checkout nifi-1.1.x-cdhHiveBundle
mvn -T C2.0 clean install -Pcloudera -Dhive.version=1.1.0-cdh5.10.0 -Dhive.hadoop.version=2.6.0-cdh5.10.0 -Dhadoop.version=2.6.0-cdh5.10.0 -DskipTests
I have posted a more complete coverage of this on the Hortonworks Community forum: https://community.hortonworks.com/articles/93771/connecting-nifi-to-cdh-hive.html

Is there a recommended EC2 AMI for DSE 4.7.2 that includes Spark and MLlib?

I would like to install DataStax Enterprise 4.7.2 or the latest version on EC2 and take advantage of Spark and MLlib. Is there a recommended image that I can launch and SSH into?
You can use the DataStax AMI. DSE is free in Dev or in Prod for qualifying startups.
Just get your credentials by filling out the download form.

Error while executing select query in Hive - how to update Hadoop version

I am currently using Hadoop 1.0.3. I recently installed Apache Hive to run with it. Running a select * query gave me a NoSuchMethodError: org.apache.hadoop.mapred.JobConf.unset
I then found out that it's a compatibility issue with my current version of Hadoop, and that I need to upgrade to 1.2 or later.
I am fairly new to Hadoop and would like to upgrade my current version to 1.2 or later. How do I go about doing that?
I could not find any resources online to do so.
Thanks.
Just download Hadoop 1.2.x from here and make the necessary configuration changes in your new Hadoop installation. Change HADOOP_HOME to point to your new Hadoop folder.
NOTE: Change all the environment variables (including those in .bashrc) to point to your new Hadoop installation.
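As a minimal sketch of that environment change, the lines below could go in ~/.bashrc. The install path /usr/local/hadoop-1.2.1 is an assumption for illustration; use whatever directory you unpacked the new release into.

```shell
# Assumed install location of the new release; adjust to your own path.
export HADOOP_HOME=/usr/local/hadoop-1.2.1
# Put the new binaries first so an older hadoop on the PATH is not picked up.
export PATH="$HADOOP_HOME/bin:$PATH"
```

After opening a new shell (or running source ~/.bashrc), hadoop version should report the new release.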

Configure Redis Cluster in Ubuntu Server 14.04

I've installed redis-server using apt-get install redis-server and everything went fine.
Right now I'm trying to configure it in Cluster mode. The problem is that the tutorial at http://redis.io/topics/cluster-tutorial uses a script called redis-trib.rb, which I can't find on my system.
Can you please tell me how I can configure Redis to run in Cluster mode without that script?
I would like to have a setup with two masters, each on a different machine.
Thank you very much.
I had the same problem with redis-trib.rb.
This tutorial explains how to create Redis Cluster using only Redis commands: Configuring and Running Redis Cluster on Linux
You need Redis 3.0.0 beta to run Cluster. You won't find it in a Linux distribution, since they all ship a copy of the stable server (fortunately!). Redis 3.0.0 will go out as a stable release next week. You can find the source code of the stable release here: http://redis.io/download.
There is now a tutorial for Ubuntu at https://www.digitalocean.com/community/tutorials/how-to-configure-a-redis-cluster-on-ubuntu-14-04 which includes installation of a PPA to supply 3.0.x. This tutorial is only for two nodes and does not reference redis-trib.rb ...
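For the two-master setup asked about, the manual route can be sketched with plain redis-cli commands. This is only an illustration under stated assumptions: the IP addresses and port below are placeholders, both servers are Redis 3.0 or later, and both already run with cluster-enabled yes in redis.conf. The script prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical addresses of the two masters; CLUSTER MEET expects IP addresses.
NODE_A=192.0.2.10
NODE_B=192.0.2.11
PORT=6379

# Print the commands instead of executing them.
cluster_commands() {
    # Introduce the two nodes to each other.
    echo "redis-cli -h $NODE_A -p $PORT cluster meet $NODE_B $PORT"
    # Split the 16384 hash slots (0-16383) evenly between the two masters.
    # CLUSTER ADDSLOTS takes individual slot numbers, hence the seq calls.
    echo "redis-cli -h $NODE_A -p $PORT cluster addslots \$(seq 0 8191)"
    echo "redis-cli -h $NODE_B -p $PORT cluster addslots \$(seq 8192 16383)"
}

cluster_commands
```

Once the printed commands look right, they can be piped to sh. Running redis-cli cluster info against either node should then show cluster_state:ok once all slots are assigned.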

Can Presto connect to other Hadoop distributions and run queries

I see that Presto only has a plugin for CDH4. Can I connect it to other distributions such as Hortonworks, and what does it take to do so?
Without a specific plugin, I am running into "path host null" errors when executing queries from Presto. I would appreciate your help.
The Presto Hive connector supports multiple versions of Hadoop:
hive-hadoop1: Apache Hadoop 1.x
hive-hadoop2: Apache Hadoop 2.x
hive-cdh4: Cloudera CDH 4
hive-cdh5: Cloudera CDH 5
See the Hive Connector documentation for more details.
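As a rough sketch, a Hadoop 2.x based distribution would be wired up by creating a Hive catalog file that names the hive-hadoop2 connector. The catalog directory and the metastore host below are placeholders, not values from the question.

```shell
# Assumed Presto catalog directory and metastore host; both are placeholders.
CATALOG_DIR="${CATALOG_DIR:-etc/catalog}"
mkdir -p "$CATALOG_DIR"

# Write a minimal Hive catalog definition for a Hadoop 2.x cluster.
cat > "$CATALOG_DIR/hive.properties" <<'EOF'
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083
EOF
```

Presto reads catalog files at startup, so the server needs a restart before the new catalog shows up.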
Where is the code for the CDH connector in GitHub?
Briefly looking at the code on GitHub, I don't see anything specific to CDH, other than the name, in presto/presto-hive-cdh4/src/main/java. Am I looking at the wrong thing?