No MR job execution info in Beeline - Hive

In Beeline, I cannot see the job execution info (like job progress), even though I have already set the following property in hive-site.xml. Could anyone help me figure out how to diagnose such issues? How can I check whether HiveServer2 has picked up the correct configuration?
hive.server2.logging.operation.level VERBOSE
I only see the following log output in Beeline:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Did you try
set hive.root.logger=INFO,console;

I forgot to set hive.async.log.enabled to false; after setting it, it works.
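For anyone hitting the same thing, here is a minimal hive-site.xml sketch with both properties mentioned above (adjust to your deployment, and restart HiveServer2 so the new values are picked up):
<!-- Operation logging level returned to Beeline clients -->
<property>
  <name>hive.server2.logging.operation.level</name>
  <value>VERBOSE</value>
</property>
<!-- Asynchronous logging had to be disabled for the progress info to show up -->
<property>
  <name>hive.async.log.enabled</name>
  <value>false</value>
</property>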


Naming the Hive session sometimes does not work

I ran SQL scripts on Hive on Tez with hive -f xxx.sql --hiveconf hive.session.id=sessionName,
but the YARN ResourceManager sometimes displays the application name like this:
HIVE-f4ea6c3f-f4cf-4db3-8801-da6f94e20237
HIVE-d920c434-d2e6-4c1c-a506-d69b580960f7
Other times it displays the name correctly.
How can I solve this problem?
The thing is that Tez can reuse containers. AM container reuse means session reuse, and it is controlled by this parameter: tez.am.container.reuse.enabled=true.
One YARN AM container can be reused for different Tez sessions, which is why the YARN application name differs.
By the way, there is one more parameter, added in HIVE-12357, that lets you set a name for each DAG:
hive.query.name
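As a sketch (the DAG name below is a hypothetical placeholder), you can pass it the same way as the session id:
# Set both the Tez session id and a per-DAG name; myDagName is just an example value
hive -f xxx.sql --hiveconf hive.session.id=sessionName --hiveconf hive.query.name=myDagName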

Alluxio + Hive on EMR

I have Alluxio 1.8 installed on an EMR 5.19.0 cluster, and can see my S3 tables using /usr/local/alluxio/bin/alluxio fs ls /.
However, when I start up Hive and issue
hive> [[ DDL w/ LOCATION = alluxio://master_host:19998/my_table ]]
I get the following:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
Is there a way to get past this? I've tried starting Hive with --auxpath pointing to /usr/local/alluxio/client/alluxio-1.8.1-client.jar and also to a copy of the jar on HDFS, without any success.
Any help?
I posted a blog post about the reasons for the error message java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. Here are some tips; I hope they help:
For Hive, set environment variable HIVE_AUX_JARS_PATH in conf/hive-env.sh:
export HIVE_AUX_JARS_PATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar:${HIVE_AUX_JARS_PATH}
which I guess is equivalent to what you did with --auxpath.
Depending on how you run Hive (e.g., Hive on MR, Spark, or Tez), you may also need to make sure the runtime can access the client jar. Taking Hive on MR as an example, you may also need to append the path to the Alluxio client jar to mapreduce.application.classpath or yarn.application.classpath so that each task of the MR jobs can access the jar.
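For example, a minimal mapred-site.xml sketch for the MR case (the first two classpath entries are the stock Hadoop defaults and only stand in for whatever your cluster already uses; the Alluxio jar path assumes the install location mentioned above):
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/local/alluxio/client/alluxio-1.8.1-client.jar</value>
</property>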

Issue with executing a Spark SQL job using an Oozie action

I am facing a weird issue trying to execute a Spark SQL (Spark 2) job using an Oozie action. The behavior is quite erratic: at times it executes fine, but sometimes it stays in the "Running" state forever. On checking the logs, I found the issue below.
WARN org.apache.spark.scheduler.cluster.YarnClusterScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The strange thing is that we have already provided sufficient resources; this can be seen in the Spark environment variables as well as in the cluster resources (the cluster has sufficient cores and RAM).
<spark-opts>--executor-memory 10G --num-executors 7 --executor-cores 3 --driver-memory 8G --driver-cores 2</spark-opts>
With the same configuration it sometimes executes fine as well. Are we missing something?
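For reference, those options sit inside our Oozie Spark action roughly like this (a sketch; the action name, driver class, and jar path here are placeholders, not our actual values):
<action name="spark-sql-node">
  <spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkSqlJob</name>
    <class>com.example.SparkSqlJob</class>
    <jar>${nameNode}/apps/spark-sql-job.jar</jar>
    <spark-opts>--executor-memory 10G --num-executors 7 --executor-cores 3 --driver-memory 8G --driver-cores 2</spark-opts>
  </spark>
  <ok to="end"/>
  <error to="fail"/>
</action>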
The issue was related to a jar conflict. The following suggestions help identify it (a command sketch follows this list):
a) Check the Maven dependency tree to make sure there is no transitive dependency conflict.
b) While the Spark job is running, check the environment variables being used via the Spark UI.
c) Resolve the conflict and run a Maven clean package.
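As a command sketch for steps (a) and (c), assuming a standard Maven build for the job jar:
# Print the full dependency tree, including versions omitted due to conflicts
mvn dependency:tree -Dverbose
# After excluding or pinning the conflicting artifacts, rebuild the job jar
mvn clean package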

Parquet Warning Filling up Logs in Hive MapReduce on Amazon EMR

I am running a custom UDAF on a table stored as Parquet in Hive on Tez. Our Hive jobs run on YARN, all set up in Amazon EMR. However, because the Parquet data we have was generated with an older version of Parquet (1.5), I am getting a warning that is filling up the YARN logs and causing the disk to run out of space before the job finishes. This is the warning:
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version
It also prints a stack trace. I have been trying to silence the warning logs to no avail. I have managed to turn off just about every type of log except this warning. I have tried modifying just about every Log4j settings file using the AWS configuration approach outlined here.
Things I have tried so far:
I set the following settings in tez-site.xml (written here in JSON format because that is what AWS requires for configuration; it is in proper XML format on the actual instance, of course):
"tez.am.log.level": "OFF",
"tez.task.log.level": "OFF",
"tez.am.launch.cluster-default.cmd-opts": "-Dhadoop.metrics.log.level=OFF -Dtez.root.logger=OFF,CLA",
"tez.task-specific.log.level": "OFF;org.apache.parquet=OFF"
I have the following settings in mapred-site.xml. These settings effectively turned off all logging in my YARN logs except for the warning in question.
"mapreduce.map.log.level": "OFF",
"mapreduce.reduce.log.level": "OFF",
"yarn.app.mapreduce.am.log.level": "OFF"
I have these settings in just about every other log4j.properties file I found in the list shown in the previous AWS link:
"log4j.logger.org.apache.parquet.CorruptStatistics": "OFF",
"log4j.logger.org.apache.parquet": "OFF",
"log4j.rootLogger": "OFF, console"
Honestly, at this point I just want to find some way to turn off the logs and get the job running. I've read about similar issues, such as this one where they fixed it by changing log4j settings, but that is for Spark and it just doesn't seem to work on Hive/Tez on Amazon. Any help is appreciated.
OK, so I ended up fixing this by modifying the Java logging.properties file for every single data node and the master node in EMR. In my case the file was located at /etc/alternatives/jre/lib/logging.properties.
I added a shell command to the bootstrap action file to automatically add the following two lines to the end of the properties file:
org.apache.parquet.level=SEVERE
org.apache.parquet.CorruptStatistics.level = SEVERE
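For reference, the bootstrap step I described roughly amounts to this shell sketch (the path assumes /etc/alternatives/jre/lib/logging.properties as above):
#!/bin/bash
# Append java.util.logging overrides for the Parquet classes on each node
sudo tee -a /etc/alternatives/jre/lib/logging.properties <<'EOF'
org.apache.parquet.level=SEVERE
org.apache.parquet.CorruptStatistics.level=SEVERE
EOF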
Just wanted to post an update in case anyone else faces the same issue, as this is really not set up properly by Amazon and required a lot of trial and error.

Hive on Spark issue

I am trying to configure Hive on Spark, but even after trying for five days I have not found a solution.
Steps followed:
1. After installing Spark, go into the Hive console and set the properties below:
set hive.execution.engine=spark;
set spark.master=spark://INBBRDSSVM294:7077;
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
2. Added the spark-assembly jar to Hive's lib directory.
3. When running select count(*) from table_name, I get the error below:
2016-08-08 15:17:30,207 ERROR [main]: spark.SparkTask (SparkTask.java:execute(131))
- Failed to execute spark task, with exception
'org.apache.hadoop.hive.ql.metadata.HiveException (Failed to create spark client.)'
Hive version: 1.2.1
Spark version: tried 1.6.1, 1.3.1, and 2.0.0
I would appreciate it if anyone could suggest something.
You can download the Spark 1.3.1 source from the Spark download site and try to build Spark 1.3.1 without Hive using:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4" -Dhadoop.version=2.7.1 -Dyarn.version=2.7.1 -DskipTests
Then copy spark-assembly-1.3.1-hadoop2.7.1.jar to the hive/lib folder.
Finally, follow https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-SparkInstallation to set the necessary properties.
First of all, you need to pay attention to which versions are compatible. If you choose Hive 1.2.1, I advise you to use Spark 1.3.1. You can see the version compatibility list here.
The error you are seeing is a generic one. You need to start Spark and see what errors the Spark workers report. Also, have you already copied hive-site.xml to spark/conf?
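As a sketch of those checks (paths assume a standalone Spark install with HIVE_HOME and SPARK_HOME set; adjust for your layout):
# Make the Hive configuration visible to Spark
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
# Start the standalone master and workers, then look for errors in the worker logs
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slaves.sh
tail -n 100 $SPARK_HOME/logs/*Worker*.out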