We have a use case where Presto's Hive connector accesses S3 files stored in Avro format.
When we try to use the standalone Hive metastore and read this Avro data through an external table, we get a SerDeStorageSchemaReader class-not-found error:
MetaException(message:org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader class not found)
at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:54)
We understand this error occurs because the SerDeStorageSchemaReader class is not available in the standalone metastore.
I want to understand whether the Hive metastore can be run without installing Hive/Hadoop, or whether there is any other option.
The standalone Hive metastore does not support Avro. To fix it, we need to install the full Hadoop plus Hive distribution and start only the Hive metastore service.
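In practice that means pointing a full Apache Hive install at a Hadoop distribution and launching just the metastore service. A minimal sketch, assuming illustrative paths and a MySQL-backed metastore (adjust versions, paths, and -dbType to your environment):
export HADOOP_HOME=/opt/hadoop-3.1.1
export HIVE_HOME=/opt/apache-hive-3.1.2-bin
$HIVE_HOME/bin/schematool -dbType mysql -initSchema      # one-time metastore schema setup
$HIVE_HOME/bin/hive --service metastore -p 9083          # start only the metastore service
With the full distribution on the classpath, the class the error complains about is available to the metastore.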
Related
I cannot solve a compatibility problem between an externally created ORC file and Cloudera's Hive.
I have Cloudera Express 6.3.2 with Hive 2.1.1.
Oddly, I downloaded the latest version of Cloudera and it still ships the old Hive 2.1.1.
Case:
I create an ORC file externally (I tried creating it both in local Spark and in the same Cloudera cluster through a MapReduce job, with the same result).
When I try to read this ORC file in my Cloudera cluster, even through orcfiledump, I get:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
I downloaded the orc-tools-1.5.5-uber.jar utility to my local computer, along with the problematic ORC file, and ran:
java -jar orc-tools-1.5.5-uber.jar meta msout2o12.orc
The uber jar, with its own Hadoop bundled inside, read this ORC file fine:
Structure for msout2o12.orc
File Version: 0.12 with ORC_135
Rows: 242
Compression: ZLIB
Compression size: 262144
Even without creating any tables, Hive in Cloudera simply cannot read the ORC file using its own utility.
The problem started when I created an external table and HiveQL queries over the ORC file produced this error.
Here I have just reduced the problem to a minimum: plain hive --orcfiledump cannot read the ORC file.
How can I make Cloudera read these ORC files normally?
What do I need to adjust in my Cloudera setup?
It was a big surprise for me. I am going back to Parquet.
https://community.cloudera.com/t5/Cloudera-Labs/Problem-of-compatibility-of-an-external-orc-and-Claudera-s/m-p/299395/highlight/false#M582
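The stack trace points at OrcFile$WriterVersion.from, which fails when a file carries a newer writer version (here ORC_135) than the old ORC reader bundled with Hive 2.1.1 knows about. One possible workaround, sketched under the assumption that the file is produced from Spark 2.3 or later, is to write it through Spark's Hive-based ORC implementation, whose older writer version the Hive 2.1.1 reader recognizes; the table name and output path below are placeholders:
spark-shell --conf spark.sql.orc.impl=hive
scala> val df = spark.table("source_table")            // placeholder for however the data is produced
scala> df.write.orc("/user/hive/warehouse/msout2o12_compat")
Whether this is acceptable depends on the pipeline; otherwise, upgrading the cluster's Hive/ORC libraries (or falling back to Parquet, as above) sidesteps the mismatch.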
I have Alluxio 1.8 installed on an EMR 5.19.0 cluster, and can see my S3 tables using /usr/local/alluxio/bin/alluxio fs ls /.
However, when I start up hive and issue
hive> [[DDL w/ LOCATION = alluxio://master_host:19998/my_table]], I get the following:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
Is there a way of getting past this? I've tried starting hive with --auxpath pointing to both /usr/local/alluxio/client/alluxio-1.8.1-client.jar and a copy of the jar on hdfs without any success.
Any help?
I posted a blog talking about the reasons for the error message java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. Here are some tips, hope they can help:
For Hive, set environment variable HIVE_AUX_JARS_PATH in conf/hive-env.sh:
export HIVE_AUX_JARS_PATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar:${HIVE_AUX_JARS_PATH}
which I guess is equivalent to what you have done to set --auxpath.
Depending on your Hive setup (e.g., Hive on MR, Spark, or Tez), you may also need to make sure the runtime can access the client jar. Taking Hive on MR as an example, you may also need to append the path of the Alluxio client jar to mapreduce.application.classpath or yarn.application.classpath so that each task of the MR job can access this jar.
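On a Hive-on-MR setup, that could look roughly like the following mapred-site.xml fragment; the classpath entries shown before the Alluxio jar are illustrative, so keep whatever your distribution already lists there and just append the client jar:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar</value>
</property>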
Can Flyway support Hive or Impala JDBC drivers?
I Googled around and found almost nothing (examples, problems) about this.
If I start a migration specifying the Hive driver (previously downloaded into the /drivers path), I get:
"ERROR: Unable to autodetect JDBC driver for url: jdbc:hive2://HOSTNAME:10000/DATABASE;principal=hive/HOSTNAME#PRINCIPAL"
Thanks all in advance for the support.
Please refer to the link below; there is an open issue for Hive support in Flyway:
https://github.com/flyway/flyway/issues/142
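For anyone hitting the same autodetection error, explicitly naming the driver class in flyway.conf looks roughly like the sketch below (hostname, database, and Kerberos principal are placeholders; flyway.driver and flyway.locations are standard Flyway settings). Even with the driver spelled out, the open issue above means Hive may still not be supported end to end:
flyway.url=jdbc:hive2://HOSTNAME:10000/DATABASE;principal=hive/HOSTNAME@PRINCIPAL
flyway.driver=org.apache.hive.jdbc.HiveDriver
flyway.locations=filesystem:sql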
I am trying to configure Hive on Spark, but even after trying for 5 days I have not found a solution.
Steps followed:
1. After installing Spark, open the Hive console and set the properties below:
set hive.execution.engine=spark;
set spark.master=spark://INBBRDSSVM294:7077;
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
2. Added the spark-assembly jar to the Hive lib directory.
3. When running select count(*) from table_name, I get the error below:
2016-08-08 15:17:30,207 ERROR [main]: spark.SparkTask (SparkTask.java:execute(131))
- Failed to execute spark task, with exception
'org.apache.hadoop.hive.ql.metadata.HiveException (Failed to create spark client.)'
Hive version: 1.2.1
Spark version: tried with 1.6.1,1.3.1 and 2.0.0
I would appreciate it if anyone could suggest something.
You can download the spark-1.3.1 source from the Spark download website and build it without the Hive dependencies using:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4" -Dhadoop.version=2.7.1 -Dyarn.version=2.7.1 -DskipTests
Then copy spark-assembly-1.3.1-hadoop2.7.1.jar to the hive/lib folder.
Then follow https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-SparkInstallation to set the necessary properties.
First of all, you need to pay attention to which versions are compatible. If you are on Hive 1.2.1, I advise you to use Spark 1.3.1; you can see the version compatibility list here.
The error you are getting is a generic one. You need to start Spark and check what errors the Spark workers report. Also, have you already copied hive-site.xml to spark/conf? See the sketch below.
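A minimal sketch of those checks, assuming standalone-mode tarball installs with the usual HIVE_HOME/SPARK_HOME layout (paths are illustrative):
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/                                # let Spark pick up the Hive/metastore settings
$SPARK_HOME/sbin/start-all.sh                                                     # start the standalone master and workers
tail -f $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out      # inspect worker logs for the real error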
I am currently configuring a Cloudera HDP dev image using this tutorial on CentOS 6.5, installing the base and then adding the different components as I need them. Currently, I am installing / testing HCatalog using this section of the tutorial linked above.
I have successfully installed the package and am now testing HCatalog integration with Pig with the following script:
A = LOAD 'groups' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
I have previously created and populated a 'groups' table in Hive before running the command. When I run the script with the command pig -useHCatalog test.pig I get an exception rather than the expected output. Below is the initial part of the stacktrace:
Pig Stack Trace
---------------
ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
...
Has anyone encountered this error before? Any help would be much appreciated. I would be happy to provide more information if you need it.
The error was caused by HBase's Thrift server not being properly configured. I installed and configured Thrift and added the following to my hive-site.xml, with the proper server information filled in:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<!--URL of Your Server-->:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
I thought the snippet above was not required since I am running Cloudera HDP in pseudo-distributed mode. It turns out that both it and the HBase Thrift server are required to use HCatalog with Pig.
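For reference, a rough sketch of bringing up the two services this configuration depends on (stock commands; adjust ports and users to your install):
hive --service metastore &        # Hive metastore, listening on port 9083 by default
hbase thrift start &              # HBase Thrift server
After that, pig -useHCatalog test.pig should be able to resolve the table schema through the metastore URI configured above.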