I have been trying to suppress the logs printed to the console while querying in Hive, but they still show up.
If you are opening the Hive console by typing
> hive
in your terminal and then running your queries, you can solve this by starting it with
> hive -S
instead. This starts Hive in silent mode.
Hope that helps.
You could increase the polling interval to minutes or hours:
SET hive.exec.counters.pull.interval=[millis];
The default is 1000 milliseconds, but you can increase it to anything you like. That should decrease the number of logs written to stdout.
If you don't want any logs on the console while starting the shell, you can set the hive.root.logger property:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,DRFA
hive.root.logger specifies the logging level as well as the log destination (DRFA is the daily rolling file appender). Specifying console as the target sends the logs to standard error instead of the log file.
If you only want to see ERROR messages on the console, you can use this command:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=ERROR,console
Start Hive in silent mode using
$ hive -S
then set the logger level to ERROR, which prevents WARN/INFO messages from being printed:
hive> set logger.PerfLogger.level = ERROR;
If "SLF4J: Class path contains multiple SLF4J bindings." appears in your log, it means there are multiple SLF4J binding jars (for example different log4j versions with different behavior) on the classpath.
I am not deeply familiar with log4j, but following the pattern of the Hadoop configuration files, these steps worked for me:
cd $HIVE_HOME/conf
cat > log4j.properties <<EOL
log4j.rootLogger=WARN, CA
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
EOL
After restarting Hive (Apache Hive 3.1.2 in my case), logging is set to the WARN level. This may not work in every setup, but it is worth a try.
I'm a new Flink user and I have the following problem.
I use Flink on a YARN cluster to transfer related data extracted from an RDBMS to HBase.
I write a Flink batch application in Java with multiple ExecutionEnvironments (one per RDB table, to transfer the rows of each table in parallel) and transfer the tables one by one sequentially (because the call to env.execute() is blocking).
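The structure is roughly like the sketch below (simplified, not my real code; buildJdbcSource and buildHBaseSink are placeholders for the actual JDBC InputFormat and HBase OutputFormat I use):
// Simplified sketch: one ExecutionEnvironment per table,
// executed sequentially because execute() blocks until the job finishes.
for (String table : tablesToTransfer) {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Row> rows = env.createInput(buildJdbcSource(table));  // placeholder for the JDBC InputFormat
    rows.output(buildHBaseSink(table));                           // placeholder for the HBase OutputFormat
    env.execute("transfer " + table);                             // blocks, so tables go one by one
}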
I start the YARN session like this:
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/yarn-session.sh -n 1 -s 4 -d -jm 2048 -tm 8096
Then I run my application on the started YARN session via the shell script transfer.sh. Its content is:
#!/bin/bash
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/flink run -p 4 transfer.jar
When I start this script from the command line manually, it works fine: jobs are submitted to the YARN session one by one without errors.
Now I need to be able to run this script from another Java program.
For this aim I use
Runtime.exec("transfer.sh");
(Maybe there are better ways to do this? I have looked at the REST API, but there are some difficulties because the JobManager is proxied by YARN.)
At the beginning it works as usual: the first several jobs are submitted to the session and finish successfully. But the following jobs are not submitted to the YARN session.
In /opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log I see this error (and no other errors are found, even at DEBUG level):
The program execution failed: JobClientActor seems to have died before the JobExecutionResult could be retrieved.
I have tried to analyse this problem myself and found out that the error occurs in the JobClient class while sending a ping request with a timeout to the JobClientActor (i.e. the YARN cluster).
I tried increasing several heartbeat and timeout options such as akka.*.timeout, akka.watch.heartbeat.* and yarn.heartbeat-delay, but it does not solve the problem: new jobs are not submitted to the YARN session from CliFrontend.
The environment in both cases (manual call and call from another program) is the same. When I run
$ ps axu | grep transfer
it gives me this output:
/usr/lib/jvm/java-8-oracle/bin/java -Dlog.file=/opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log -Dlog4j.configuration=file:/opt/flink-1.3.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/opt/flink-1.3.1/conf/logback.xml -classpath /opt/flink-1.3.1/lib/flink-metrics-graphite-1.3.1.jar:/opt/flink-1.3.1/lib/flink-python_2.11-1.3.1.jar:/opt/flink-1.3.1/lib/flink-shaded-hadoop2-uber-1.3.1.jar:/opt/flink-1.3.1/lib/log4j-1.2.17.jar:/opt/flink-1.3.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.3.1/lib/flink-dist_2.11-1.3.1.jar:::/etc/hadoop/conf org.apache.flink.client.CliFrontend run -p 4 transfer.jar
I also tried updating Flink to the 1.4.0 release and changing the parallelism of the job (even to -p 1), but the error still occurs.
I have no idea what could be different. Is there any workaround?
Thank you for any help.
Finally I found out how to resolve the error.
Just replace Runtime.exec(...) with new ProcessBuilder(...).inheritIO().start().
I really don't know why the call to inheritIO() helps in this case, because as I understand it, it just redirects the IO streams from the child process to the parent process.
But I have verified that if I remove that call, the program starts to fail again.
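For reference, the change looks roughly like this (a minimal sketch; the waitFor() call and the omitted exception handling are assumptions about the surrounding code):
// Before: Process p = Runtime.getRuntime().exec("transfer.sh");
// After: inherit the parent's stdin/stdout/stderr so the child's output
// does not sit in an unread pipe buffer (exception handling omitted).
Process process = new ProcessBuilder("transfer.sh")
        .inheritIO()
        .start();
int exitCode = process.waitFor();  // block until the script finishes
A plausible explanation is that Runtime.exec() connects the child's stdout/stderr to pipes; if nothing reads them, the OS pipe buffer eventually fills up and the child (the Flink CliFrontend) blocks on a write, which would match the symptom that only the first few jobs get submitted. inheritIO() sends the child's output straight to the parent's console, so the buffer never fills.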
I am running a custom UDAF on a table stored as parquet on Hive on Tez. Our Hive jobs are run on YARN, all set up in Amazon EMR. However, due to the fact that the parquet data we have was generated with an older version of Parquet (1.5), I am getting a warning that is filling up the YARN logs and causing the disk to run out of space before the job finishes. This is the warning:
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version
It also prints a stack trace. I have been trying to silence these warning logs to no avail. I have managed to turn off just about every type of log except this warning. I have tried modifying just about every log4j settings file using the AWS config as outlined here.
Things I have tried so far:
I set the following settings in tez-site.xml (written here in JSON format because that's what AWS requires for configuration; it is in proper XML format on the actual instance):
"tez.am.log.level": "OFF",
"tez.task.log.level": "OFF",
"tez.am.launch.cluster-default.cmd-opts": "-Dhadoop.metrics.log.level=OFF -Dtez.root.logger=OFF,CLA",
"tez.task-specific.log.level": "OFF;org.apache.parquet=OFF"
I have the following settings in mapred-site.xml. These settings effectively turned off all logging that appears in my YARN logs except for the warning in question.
"mapreduce.map.log.level": "OFF",
"mapreduce.reduce.log.level": "OFF",
"yarn.app.mapreduce.am.log.level": "OFF"
I have these settings in just about every other log4j.properties file I found in the list shown in the previous AWS link:
"log4j.logger.org.apache.parquet.CorruptStatistics": "OFF",
"log4j.logger.org.apache.parquet": "OFF",
"log4j.rootLogger": "OFF, console"
Honestly, at this point I just want to find some way to turn off the logs and get the job running. I've read about similar issues such as this link, where they fixed it by changing log4j settings, but that's for Spark and it just doesn't seem to work on Hive/Tez on Amazon. Any help is appreciated.
OK, so I ended up fixing this by modifying the Java logging.properties file on every single data node and the master node in EMR. In my case the file was located at /etc/alternatives/jre/lib/logging.properties.
I added a shell command to the bootstrap action file to automatically add the following two lines to the end of the properties file:
org.apache.parquet.level=SEVERE
org.apache.parquet.CorruptStatistics.level = SEVERE
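For context: logging.properties under the JRE is the java.util.logging (JUL) configuration, and SEVERE is a JUL level, which is presumably why none of the log4j settings touched this warning. The programmatic equivalent of those two lines would be something like the sketch below (illustrative only; the properties-file approach is still what is needed here, because the warnings come from the container JVMs on every node rather than a single JVM you control):
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: raise the JUL level for the Parquet loggers so WARNING messages are dropped.
// Strong references are kept because JUL holds loggers weakly and could otherwise discard the level.
public class SilenceParquetWarnings {
    private static final Logger PARQUET = Logger.getLogger("org.apache.parquet");
    private static final Logger CORRUPT_STATS = Logger.getLogger("org.apache.parquet.CorruptStatistics");

    public static void main(String[] args) {
        PARQUET.setLevel(Level.SEVERE);
        CORRUPT_STATS.setLevel(Level.SEVERE);
    }
}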
Just wanted to post an update in case anyone else faces the same issue, as this is really not set up properly by Amazon and required a lot of trial and error.
I am running the Hive command below from Beeline. Can someone please tell me where I can see the MapReduce logs for it?
0: jdbc:hive2://<servername>:10003/> select a.offr_id offerID , a.offr_nm offerNm , b.disp_strt_ts dispStartDt , b.disp_end_ts dispEndDt , vld_strt_ts validStartDt, vld_end_ts validEndDt from gcor_offr a, gcor_offr_dur b where a.offr_id = b.offr_id and b.disp_end_ts > '2016-09-13 00:00:00';
When using Beeline, the MapReduce logs are part of the HiveServer2 log4j logs.
If your Hive install was configured by Cloudera Manager (CM), they will typically be in /var/log/hive/hadoop-cmf-HIVE-1-HIVESERVER2-*.out on the node where HiveServer2 is running (which may or may not be the node you are running Beeline from).
A few other scenarios:
Your Hive install was not configured by CM? You will need to create the log4j config file manually:
Create a hive-log4j.properties config file in the directory specified by the HIVE_CONF_DIR environment variable. (This makes it accessible on the HiveServer2 JVM classpath.)
In this file, the log location is specified by log.dir and log.file. See conf/hive-log4j.properties.template in your distribution for an example template for this file.
You run Beeline in "embedded HS2 mode" (i.e. beeline -u jdbc:hive2:// user password)?
You will need to customize the Beeline log4j (as opposed to the HiveServer2 log4j).
The Beeline log4j properties file must be called beeline-log4j2.properties (in versions prior to Hive 2.0 it is called beeline-log4j.properties). It needs to be created and made accessible on the Beeline JVM classpath via HIVE_CONF_DIR. See HIVE-10502 and HIVE-12020 for further discussion.
You want to customize which HiveServer2 logs get printed on Beeline stdout?
This can be configured at the HiveServer2 level using the hive.server2.logging.operation.enabled and hive.server2.logging.operation.level configs.
Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead captured to a log file specified by Hive's log4j properties file. By default, Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
You can also disable asynchronous logging:
set hive.async.log.enabled=false
Is there anything to prevent the MapReduce logs from being shown when we use the DUMP command in Pig? I just want to see the output. Is there anything that can be run in silent mode?
Make changes in the Pig configuration files.
In $PIG_HOME/conf/pig.properties, enable (uncomment) the line:
# log4jconf=./conf/log4j.properties
Rename log4j.properties.template to log4j.properties.
In log4j.properties, change the Pig logger level from info to error:
log4j.logger.org.apache.pig=error, A
This may not work with Pig version 0.12+.
This is my first question. I don't know how to configure error.log to behave in the following two ways:
The log generated on the current day is written to a single log file with a fixed name, e.g. error.log. This file contains only the entries generated on the current day.
The previous day's log is backed up to its own log file. For example, if yesterday was 11/22/2013, then yesterday's error log is named 11_22_2013.error.log.
You can make use of the rotatelogs command to rotate the Apache logs. Try putting the following in a crontab.
crontab -e
Add the following there.
/usr/local/apache/bin/rotatelogs /path_to_apachelogs.%Y.%m.%d 86400
/usr/local/apache/bin/rotatelogs is the path on a cPanel server. You need to give the full path for it to work. You can use the following command to get the path:
which rotatelogs
If this does not show any output, try to find the binary with the locate command.
You can learn more from the following link.