Which version of Hive ships with CDH 5.5.x?
I managed to find the answer: CDH 5.5.x ships with Hive 1.1.0. For the versions of the other bundled tools, check out this link.
How do I change the table version via the Hudi CLI?
Steps:
SSH into EMR.
Start the Hudi CLI with /usr/lib/hudi/cli/bin/hudi-cli.sh. The CLI reports its version as 1.
Connect to my table: connect --path s3://bucket/db/table
In the desc of the table I see that it is version=3, but I want to use Hudi 0.9.0 to write to the table so I would like to set the table to version=2.
org.apache.hudi.exception.HoodieException: Unknown versionCode:3
    at org.apache.hudi.common.table.HoodieTableVersion.lambda$versionFromCode$1(HoodieTableVersion.java:54)
    at java.util.Optional.orElseThrow(Optional.java:290)
    at org.apache.hudi.common.table.HoodieTableVersion.versionFromCode(HoodieTableVersion.java:54)
    at org.apache.hudi.common.table.HoodieTableConfig.getTableVersion(HoodieTableConfig.java:246)
Sadly, I'm not aware of any way to use Hudi 0.9.0 to downgrade the table from version 3 to 2, because of the error you are getting. Version 0.9.0 has no way of knowing what 0.10.0 writes differently, so it cannot even read the table version.
AWS recently made EMR 6.6 available, although it isn't well documented yet. I'd recommend switching to it, because it ships a Hudi 0.10 release, which understands table version 3 and can therefore perform the downgrade.
This page should be updated once 6.6 is added to the docs:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-app-versions-6.x.html
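Before attempting a downgrade, it can help to confirm which table version is actually recorded for the table. The sketch below is only an illustration. It assumes the version is stored under the hoodie.table.version key in the table's .hoodie/hoodie.properties file, uses a locally created sample file in place of the real one on S3, and treats 2 as the highest table version Hudi 0.9.0 understands (inferred from the error in the question). The sample contents and the helper name table_version are hypothetical.

```shell
#!/bin/sh
# Sample stand-in for <table path>/.hoodie/hoodie.properties (hypothetical contents).
PROPS="$(mktemp)"
cat > "$PROPS" <<'EOF'
hoodie.table.name=table
hoodie.table.version=3
EOF

# Hypothetical helper: print the value of hoodie.table.version from a properties file.
table_version() {
  sed -n 's/^hoodie\.table\.version=//p' "$1"
}

# Assumption: 2 is the newest table version a Hudi 0.9.0 writer can read.
MAX_SUPPORTED=2
VERSION="$(table_version "$PROPS")"

if [ "$VERSION" -gt "$MAX_SUPPORTED" ]; then
  RESULT="table version $VERSION is newer than $MAX_SUPPORTED: downgrade with a newer Hudi CLI first"
else
  RESULT="table version $VERSION is readable by this writer"
fi
echo "$RESULT"

rm -f "$PROPS"
```

With a CLI from a Hudi release that knows version 3, the downgrade itself would be run after connect. The Hudi documentation describes a downgrade table --toVersion command for this, but check help in your CLI for the exact form it accepts.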
Side note: if you are using the bootstrap action script provided by AWS to mitigate the log4j vulnerability, I'd recommend taking the 6.5 version of the script and editing it to target 6.6. There is no 6.6 script available at this time, but I did that and could not detect any vulnerabilities afterwards.
This link explains the bootstrap action:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-log4j-vulnerability.html
I work with Cloudera Manager CDH 5.7.1, which supports only Hive 1.1.0.
NiFi 1.0.0-BETA uses Hive 1.2.1.
When I try to use the SelectHiveQL processor, I get the following error: "Required field 'client_protocol' is unset!", which means that there is a version mismatch between the Hive client and server.
Any suggestions to solve this problem?
I thought about building NiFi with hive-jdbc dependency version 1.1.0 instead of the default 1.2.1, but I hope there's a better solution.
Since NiFi is an Apache project, it builds against the Apache JARs (such as Hive and Hadoop). However, there are vendor-specific profiles and build properties you can use to build NiFi for a particular Hadoop distribution.
For example you could try the following to build a NiFi distro for CDH 5.7.1:
mvn clean install -DskipTests -Pcloudera -Dhadoop.version=2.6.0-cdh5.7.1 -Dhive.version=1.1.0-cdh5.7.1 -Dhbase.version=1.2.0-cdh5.7.1
The Hive processors use Hadoop libraries provided by the NiFi Hadoop Libraries NAR, and other NARs (like the Hadoop/HDFS processors) use this same libraries NAR, so the best approach is to build the whole thing. Otherwise you can try to replace just the Hadoop/Hive/HBase-related NARs and see if that works.
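To make it easier to see how the version properties in the command above are put together, here is a small sketch that assembles the same Maven invocation from separate variables and prints it instead of running it. The variable names are made up for illustration, and the component base versions are simply the ones from the command above; check them against the release notes for your CDH release before building.

```shell
#!/bin/sh
# Illustrative variables (names are hypothetical); values copied from the command above.
CDH_VERSION="5.7.1"
HADOOP_BASE="2.6.0"
HIVE_BASE="1.1.0"
HBASE_BASE="1.2.0"

# CDH artifacts carry a "-cdh<release>" suffix on the upstream version number.
SUFFIX="cdh${CDH_VERSION}"

MVN_CMD="mvn clean install -DskipTests -Pcloudera"
MVN_CMD="$MVN_CMD -Dhadoop.version=${HADOOP_BASE}-${SUFFIX}"
MVN_CMD="$MVN_CMD -Dhive.version=${HIVE_BASE}-${SUFFIX}"
MVN_CMD="$MVN_CMD -Dhbase.version=${HBASE_BASE}-${SUFFIX}"

# Dry run: print the command; run it from the NiFi source root once it looks right.
echo "$MVN_CMD"
```

Keeping the base versions separate from the CDH release matters because the component versions do not all change in step between CDH releases.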
Because NiFi expects the newer version of Hive, it is necessary to remove the unsupported newer features (such as HiveStreaming and ORC support), support the older version of Thrift, and build against the Cloudera-specific libraries.
I have created a branch of the current NiFi-1.1.x release with the necessary changes to get the PutHiveQL and SelectHiveQL processors to work, which you could build as below:
git clone https://github.com/Chaffelson/nifi.git
cd nifi
git checkout nifi-1.1.x-cdhHiveBundle
mvn -T C2.0 clean install -Pcloudera -Dhive.version=1.1.0-cdh5.10.0 -Dhive.hadoop.version=2.6.0-cdh5.10.0 -Dhadoop.version=2.6.0-cdh5.10.0 -DskipTests
I have posted a more complete coverage of this on the Hortonworks Community forum: https://community.hortonworks.com/articles/93771/connecting-nifi-to-cdh-hive.html
Is there a single version of Phoenix that is compatible with the HBase provided in both Cloudera 5.5 and Hortonworks 2.4?
Hortonworks provides custom fixes and "backports" to their version of Phoenix in their HDP distribution. Cloudera may do the same as well.
I am assuming that you are asking about a client version that is compatible with both server versions.
Are you using the "thin" client jars? Do you find that your application does not work for one distribution or the other (dependent on which version jars you have)? Your application may work for both distributions if you use the non-thin jars.
If you would like to continue using the thin client, you may have to set phoenix.queryserver.serialization to JSON. HDP 2.3.4+ uses PROTOBUF by default, whereas CDH does not currently support PROTOBUF.
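As a rough illustration of the client side, the thin JDBC driver takes the serialization format as part of its connection URL. The sketch below only builds and prints such a URL. The host name is a placeholder, 8765 is assumed to be the Query Server port, and whether your driver version honours the serialization parameter is something to confirm in the Phoenix Query Server documentation for your release.

```shell
#!/bin/sh
# Placeholder host; 8765 is assumed to be the Phoenix Query Server port.
PQS_HOST="pqs.example.com"
PQS_PORT="8765"

# Must match the phoenix.queryserver.serialization setting on the server.
SERIALIZATION="JSON"

THIN_URL="jdbc:phoenix:thin:url=http://${PQS_HOST}:${PQS_PORT};serialization=${SERIALIZATION}"
echo "$THIN_URL"
```

The client and the Query Server have to agree on the format, so the same value would need to be used on both sides.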
If you are asking about manually installing a version of the Phoenix server that can be installed on both distributions, both use HBase 1.1.x. Any Phoenix version 4.4+ can be used on either distribution. But I recommend using the version that is distributed with the platform.
A Phoenix 4.5.2 package for CDH 5.5.x is available via Cloudera Labs:
http://blog.cloudera.com/blog/2015/11/new-apache-phoenix-4-5-2-package-from-cloudera-labs/
Note however that Cloudera Labs packages are for dev/test only (not supported by Cloudera).
I am currently using Hadoop 1.0.3. I recently installed Apache Hive to run with it. Running a select * query gave me a NoSuchMethodError: org.apache.hadoop.mapred.JobConf.unset
I further found out that it's a compatibility issue with my current version of Hadoop, which requires me to upgrade to 1.2 or later.
I am fairly new to Hadoop and would like to upgrade my current version to 1.2 or later. How do I go about doing that?
I could not find any resources online to do so.
Thanks.
Just download Hadoop 1.2.x from here and make the necessary configuration changes in the new installation. Change HADOOP_HOME to point to the new Hadoop folder.
NOTE: Change all the environment variables (including those set in .bashrc) to point to the new Hadoop installation.
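A minimal sketch of the environment change, assuming the new release was unpacked to a directory named hadoop-1.2.1 under your home directory (that path is hypothetical; use wherever you extracted it):

```shell
#!/bin/sh
# Hypothetical location of the unpacked Hadoop 1.2.x release; adjust to your system.
NEW_HADOOP_HOME="$HOME/hadoop-1.2.1"

export HADOOP_HOME="$NEW_HADOOP_HOME"
# Put the new bin directory first so it shadows the old installation.
export PATH="$HADOOP_HOME/bin:$PATH"

echo "HADOOP_HOME=$HADOOP_HOME"
```

Adding the same two export lines to ~/.bashrc makes the change persist across sessions. Running hadoop version afterwards should then report the new release.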
I see that Presto only has a plugin for CDH4. Can I use it to connect to other distributions such as Hortonworks, and what does it take to do so?
Without a specific plugin, I am running into "path host null" errors when executing queries from Presto. Appreciate your help.
The Presto Hive connector supports multiple versions of Hadoop:
hive-hadoop1: Apache Hadoop 1.x
hive-hadoop2: Apache Hadoop 2.x
hive-cdh4: Cloudera CDH 4
hive-cdh5: Cloudera CDH 5
See the Hive Connector documentation for more details.
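For a Hadoop 2.x based distribution such as Hortonworks HDP 2.x, that would mean pointing a catalog at the hive-hadoop2 connector. A minimal sketch of such a catalog file follows. The metastore host is a placeholder, and the file is written to a temporary directory here rather than to the etc/catalog directory of a real Presto installation.

```shell
#!/bin/sh
# Temporary stand-in for <presto install>/etc/catalog.
CATALOG_DIR="$(mktemp -d)/etc/catalog"
mkdir -p "$CATALOG_DIR"

# connector.name selects one of the connector variants listed above;
# the metastore URI below is a placeholder.
cat > "$CATALOG_DIR/hive.properties" <<'EOF'
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083
EOF

cat "$CATALOG_DIR/hive.properties"
```

Swapping the connector.name value for hive-cdh5 or one of the other names in the list would select the matching variant instead.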
Where is the code for the CDH connector on GitHub?
Briefly looking at the code on GitHub, I don't see anything specific to CDH, other than the name, in presto/presto-hive-cdh4/src/main/java. Am I looking at the wrong thing?