Error while executing select query in Hive - how to update Hadoop version - apache

I am currently using Hadoop 1.0.3. I recently installed Apache Hive to run with it. Running a select * query gave me a NoSuchMethodError: org.apache.hadoop.mapred.JobConf.unset
I then found out that it's a compatibility issue with my current version of Hadoop, and that I need to upgrade to 1.2 or later.
I am fairly new to Hadoop and would like to upgrade my current version to 1.2 or later. How do I go about doing that?
I could not find any resources online to do so.
Thanks.

Just download Hadoop 1.2.x from here and make the necessary configuration changes in your new Hadoop installation. Change HADOOP_HOME to point to your new Hadoop folder.
NOTE: Update every environment variable that references Hadoop (including those set in .bashrc) to point to your new Hadoop installation.
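
As a rough sketch of the environment change described above, the lines below could go in .bashrc. The install path /opt/hadoop-1.2.1 is only an assumption for illustration; use wherever you actually unpacked the new release.

```shell
# Hypothetical install location of the new Hadoop release (an assumption, adjust it).
export HADOOP_HOME="/opt/hadoop-1.2.1"

# Put the new Hadoop's bin directory ahead of any older one on the PATH.
export PATH="$HADOOP_HOME/bin:$PATH"

# If a hadoop binary is found, print its version so you can confirm
# that the shell now picks up the new installation.
if command -v hadoop >/dev/null 2>&1; then
  hadoop version
fi
```

After reloading the shell (for example with source ~/.bashrc), hadoop version should report the new release if the path is right.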

Related

How to change Hudi table version via Hudi CLI

How do I change the table version via the Hudi CLI?
Steps:
ssh into EMR
start the Hudi CLI: /usr/lib/hudi/cli/bin/hudi-cli.sh. The version of the Hudi CLI is 1.
connect to my table: connect --path s3://bucket/db/table
In the desc of the table I see that it is version=3, but I want to use Hudi 0.9.0 to write to the table, so I would like to set the table to version=2. Instead I get this error:
org.apache.hudi.exception.HoodieException: Unknown versionCode:3
at org.apache.hudi.common.table.HoodieTableVersion.lambda$versionFromCode$1(HoodieTableVersion.java:54)
at java.util.Optional.orElseThrow(Optional.java:290)
at org.apache.hudi.common.table.HoodieTableVersion.versionFromCode(HoodieTableVersion.java:54)
at org.apache.hudi.common.table.HoodieTableConfig.getTableVersion(HoodieTableConfig.java:246)
Sadly, I'm not aware of any way to use version 0.9.0 to downgrade 3 to 2, due to the error you are getting. There is no way for version 0.9.0 to know how 0.10.0 was writing things differently.
AWS recently made EMR 6.6 available, but it isn't well documented yet. I'd recommend switching over to that, because it ships Hudi 0.10.0, which can then do that downgrade.
This link should get updated whenever 6.6 gets updated in the docs.
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-app-versions-6.x.html
Side note: if you are using the bootstrap action script provided by AWS to repair the log4j vulnerability, I'd recommend taking the 6.5 version of the script and editing it to say 6.6. There is no 6.6 script available at this time, but I did that and was not able to detect any vulnerabilities.
This link provides an explanation on the bootstrap action:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-log4j-vulnerability.html

Apache NiFi Hive Processors with Hive 1.1 (CDH 5.7.1)

I work with Cloudera Manager CDH 5.7.1, which supports only Hive 1.1.0.
NiFi 1.0.0-BETA uses Hive 1.2.1.
When I try to use SelectHiveQL processor, I get the following error: Required field 'client_protocol' is unset!, which means that there's a version mismatch between Hive client and server.
Any suggestions to solve this problem?
I thought about building NiFi with hive-jdbc dependency version 1.1.0 instead of the default 1.2.1, but I hope there's a better solution.
Since NiFi is an Apache project, it builds with Apache JARs (such as Hive and Hadoop). However, there are vendor-specific profiles and build properties you can use to build NiFi for a particular Hadoop distribution.
For example you could try the following to build a NiFi distro for CDH 5.7.1:
mvn clean install -DskipTests -Pcloudera -Dhadoop.version=2.6.0-cdh5.7.1 -Dhive.version=1.1.0-cdh5.7.1 -Dhbase.version=1.2.0-cdh5.7.1
The Hive processors use Hadoop libraries provided by the NiFi Hadoop Libraries NAR, and other NARs (like the Hadoop/HDFS processors) use this same libraries NAR, so the best approach is to build the whole thing. Otherwise you can try to replace just the Hadoop/Hive/HBase-related NARs and see if that works.
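
To make the structure of the version properties in the build command above easier to see, here is a small sketch that assembles the same command from its parts and prints it without running it. The variable names are made up for illustration. The component versions are those given above for CDH 5.7.1, so check what your own CDH release ships before changing CDH_VERSION.

```shell
# CDH release to target; 5.7.1 is the release from the question.
CDH_VERSION="5.7.1"

# Upstream component versions used in the command above for CDH 5.7.1.
HADOOP_BASE="2.6.0"
HIVE_BASE="1.1.0"
HBASE_BASE="1.2.0"

# Compose the Maven invocation as a string. It is only printed here, not executed.
BUILD_CMD="mvn clean install -DskipTests -Pcloudera \
-Dhadoop.version=${HADOOP_BASE}-cdh${CDH_VERSION} \
-Dhive.version=${HIVE_BASE}-cdh${CDH_VERSION} \
-Dhbase.version=${HBASE_BASE}-cdh${CDH_VERSION}"

echo "$BUILD_CMD"
```

Run the printed command from the root of the NiFi source tree once the versions look right.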
Because NiFi expects the newer version of Hive, it is necessary to remove the unsupported newer features (such as HiveStreaming and ORC support), support the older version of Thrift, and build against the Cloudera-specific libraries.
I have created a branch of the current NiFi-1.1.x release with the necessary changes to get the PutHiveQL and SelectHiveQL processors to work, which you could build as below:
git clone https://github.com/Chaffelson/nifi.git
cd nifi
git checkout nifi-1.1.x-cdhHiveBundle
mvn -T C2.0 clean install -Pcloudera -Dhive.version=1.1.0-cdh5.10.0 -Dhive.hadoop.version=2.6.0-cdh5.10.0 -Dhadoop.version=2.6.0-cdh5.10.0 -DskipTests
I have posted a more complete coverage of this on the Hortonworks Community forum: https://community.hortonworks.com/articles/93771/connecting-nifi-to-cdh-hive.html

What version of Hive is packed in CDH 5.5.x?

I would like to know the version of Hive that comes along with the CDH 5.5.x version?
I have managed to find the answer. CDH 5.5.x comes with Hive 1.1.0. For the versions of the other tools, check out this link.

Where to download Hive 0.12 source?

I have raised a beeline bug and would like to test the patch, so I'm trying to recompile Hive 0.12 with the patch. The problem is that Apache seems to host only versions 0.13.1+:
http://www.apache.org/dyn/closer.cgi/hive/
Does anybody know where to find older versions (0.12)?
I think you can find the source code you're looking for on the Apache Hive releases page.
Now it seems to be hosted on GitHub.

Can Presto connect to other Hadoop distributions and run queries

I see Presto has a plugin only for CDH4. Can I connect to other distributions such as Hortonworks, and what does it take to do so?
Without a specific plugin, I am running into "path host null" errors when executing queries from Presto. I'd appreciate your help.
The Presto Hive connector supports multiple versions of Hadoop:
hive-hadoop1: Apache Hadoop 1.x
hive-hadoop2: Apache Hadoop 2.x
hive-cdh4: Cloudera CDH 4
hive-cdh5: Cloudera CDH 5
See the Hive Connector documentation for more details.
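
As a minimal sketch of how one of the connector names above gets selected, the snippet below writes a Hive catalog properties file. It assumes a Hadoop 2.x based distribution (hence hive-hadoop2). The metastore host metastore.example.com is a placeholder, and the temporary directory stands in for the etc/catalog directory of a real Presto installation.

```shell
# Stand-in for Presto's etc/catalog directory (a temp dir so the sketch is safe to run).
CATALOG_DIR="$(mktemp -d)/etc/catalog"
mkdir -p "$CATALOG_DIR"

# Write the catalog file. The connector name picks the Hadoop flavor;
# the metastore URI below is a placeholder, not a real host.
cat > "$CATALOG_DIR/hive.properties" <<'EOF'
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083
EOF

# Show what was written.
cat "$CATALOG_DIR/hive.properties"
```

Swapping connector.name for another value from the list above would be the corresponding change for a different Hadoop version or distribution.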
Where is the code for the CDH connector in GitHub?
Briefly looking at the code on GitHub, I don't see anything specific to CDH, other than the name, in presto/presto-hive-cdh4/src/main/java. Am I looking at the wrong thing?