I am new to Hive. I am trying to execute a query that outputs data to a file.
Below is my query:
hive -e "SET hive.auto.convert.join=false;set
hive.server2.logging.operation.level=NONE;SET mapreduce.map.memory.mb
= 16384; SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';SET mapreduce.reduce.memory.mb = 13107; SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true
-Xmx16384M';set hive.support.concurrency = false; SET hive.exec.dynamic.partition=true;SET
hive.exec.dynamic.partition.mode=nonstrict; SET
hive.exec.max.dynamic.partitions.pernode=10000;SET
hive.exec.max.dynamic.partitions=100000; SET
hive.exec.max.created.files=1000000;SET
mapreduce.input.fileinputformat.split.maxsize=128000000; SET
hive.hadoop.supports.splittable.combineinputformat=true;set
hive.execution.engine=mr; set hive.enforce.bucketing = true;hive query
over here;" > /tmp/analysis
But in the /tmp/analysis file I can see warnings as well, as shown below:
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
How can I suppress them?
From the Hive docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
Logging:
Hive uses log4j for logging. By default logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0. Starting with Hive 0.13.0, the default logging level is INFO. By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to the standard error (instead of the log file).
If the user wishes, the logs can be emitted to the console by adding the arguments shown below:
bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,console
Alternatively, the user can change the logging level only by using:
bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA
Another option for logging is TimeBasedRollingPolicy (applicable for Hive 1.1.0 and above, HIVE-9001) by providing DAILY option as shown below:
bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY
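For example, to quiet the console for the one-off query in the question, you could raise the root logger threshold (ERROR here is just one possible level; the query placeholder is from the question). Note this controls Hive's log4j output; the SLF4J WARN lines themselves are printed by SLF4J directly and may still need filtering:
hive --hiveconf hive.root.logger=ERROR,console -e "hive query over here;" > /tmp/analysis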
Hope it helps!
Use Hive silent mode (-S), which doesn't print any logs in the output:
hive -S -e "SET hive.auto.convert.join=false;
SET hive.server2.logging.operation.level=NONE;
SET mapreduce.map.memory.mb=16384;
SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET mapreduce.reduce.memory.mb=13107;
SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx16384M';
SET hive.support.concurrency=false;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.created.files=1000000;
SET mapreduce.input.fileinputformat.split.maxsize=128000000;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET hive.execution.engine=mr;
SET hive.enforce.bucketing=true;
hive query over here;" > /tmp/analysis
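If any warnings still leak into the output file, they can be filtered out of the result stream before redirection. A sketch, assuming all the unwanted lines carry the WARN: prefix shown in the question:
hive -S -e "hive query over here;" 2>/dev/null | grep -v '^WARN:' > /tmp/analysis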
I want to do an INSERT OVERWRITE to an HDFS folder as a CSV/text file.
In hive-site.xml, hive.exec.compress.output is set to true.
I cannot do a SET hive.exec.compress.output=false because the code is executed in a custom-built framework.
Is there an option to turn off Hive compression, such as an attribute of the INSERT OVERWRITE statement?
If you cannot modify properties in hive-site.xml, one option is to set them from the Hive CLI or Beeline. The change applies only to the current session; if you close the session and start a new one the next day, you will have to set them again.
As an example:
Log in to the Hive CLI or Beeline:
$ hive
To see the value of a property:
hive> SET hive.execution.engine;
To overwrite its value for the current session:
hive> SET hive.execution.engine=tez;
Or, in your case:
hive> SET hive.exec.compress.output;
hive> SET hive.exec.compress.output=false;
Other commands that can be useful from the Linux shell:
$ hive -e "SET;" > hive_properties
to write a file with all Hive properties, or
$ hive -e "SET;" | grep compress
to see a group of Hive properties on the console.
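Putting this together for the INSERT OVERWRITE case in the question, here is a minimal sketch of a one-session run with compression disabled (the directory, delimiter, and table name are placeholders, and the ROW FORMAT clause on INSERT OVERWRITE DIRECTORY needs Hive 0.11 or later):
$ hive -e "SET hive.exec.compress.output=false; INSERT OVERWRITE DIRECTORY '/tmp/output_dir' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM my_table;"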
When using the osqueryi interactive shell for osquery, I'm running into an issue where a WARNING is displayed even though logging is supposed to be disabled. Is this a bug?
Docs explain the following:
--logger_min_status
The minimum level for status log recording. Use the following values: INFO = 0, WARNING = 1, ERROR = 2. To disable all status messages use 3+.
--logger_min_stderr
The minimum level for status logs written to stderr. Use the following values: INFO = 0, WARNING = 1, ERROR = 2. To disable all status messages use 3+.
What I have: (results truncated for brevity)
# osqueryi --json --logger_min_status=3 --logger_min_stderr=3 'select * from block_devices'
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[{"block_size":"512","label":"","model":"VBOX HARDDISK","name":"/dev/sda","parent":"","size":"83886080","type":"","uuid":"","vendor":"ATA"},...]
What I expect:
# osqueryi --json --logger_min_status=3 --logger_min_stderr=3 'select * from block_devices'
[{"block_size":"512","label":"","model":"VBOX HARDDISK","name":"/dev/sda","parent":"","size":"83886080","type":"","uuid":"","vendor":"ATA"},...]
This logging seems to be coming from the LVM library, so it is likely not controllable by osquery. I couldn't find the exact log line in the LVM2 source. I believe it is the populatePVChildren function that calls the LVM routine that performs the logging.
Your interpretation of the documentation around those logging flags looks correct.
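If the warning gets in the way of consuming the JSON, a possible shell-level workaround, assuming the LVM warning is written to stderr while the results go to stdout, is to discard stderr:
# osqueryi --json --logger_min_status=3 --logger_min_stderr=3 'select * from block_devices' 2>/dev/null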
I'm trying to run this command using beeline.
create table <table_1> like <table_2>
but it appears my Hive is configured to run in ACID mode, so this query fails with:
Error: Error while compiling statement: FAILED: SemanticException
[Error 10265]: This command is not allowed on an ACID table
with a non-ACID transaction manager. Failed command: create table
like (state=42000,code=10265)
What's the correct syntax to run a Beeline query using the ACID transaction manager without changing any global configuration?
My Beeline command is:
beeline -u <jdbc_con> -e "create table <table_1> like <table_2>";
I suppose I should use something like
hive>set hive.support.concurrency = true;
hive>set hive.enforce.bucketing = true;
hive>set hive.exec.dynamic.partition.mode = nonstrict;
hive>set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive>set hive.compactor.initiator.on = true;
hive>set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;
But how should I include this in Beeline?
When I tried
beeline -u $jdbc_con -e "set hive.support.concurrency = true; create table <table_1>_test like <table_2>";
It seems it's not possible to change these parameters this way.
Error: Error while processing statement: Cannot modify
hive.support.concurrency at runtime. It is not in list of params that
are allowed to be modified at runtime (state=42000,code=1)
Thank you for any help.
You can set Hive properties and run the query from Beeline as below:
beeline -u $jdbc_con \
--hiveconf "hive.support.concurrency=true" \
--hiveconf "hive.enforce.bucketing=true" \
-e "create table <table_1>_test like <table_2>"
Hope this is helpful.
I am trying to trigger Hive on Spark using the Hue interface. The job works perfectly when run from the command line, but when I try to run it from Hue it throws exceptions. In Hue, I tried mainly two things:
1) When I give all the properties in the .hql file using SET commands:
set spark.home=/usr/lib/spark;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
add jar /usr/lib/spark/assembly/lib/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar;
set spark.eventLog.dir=hdfs://10.11.50.81:8020/tmp/;
set spark.executor.memory=2899102923;
I get this error:
ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Unsupported execution engine: Spark. Please set hive.execution.engine=mr)'
org.apache.hadoop.hive.ql.metadata.HiveException: Unsupported execution engine: Spark. Please set hive.execution.engine=mr
2) When I give the properties in Hue's properties panel, it works with the MR engine but not with the Spark execution engine.
Any help would be appreciated
I solved this issue by using a shell action in Oozie.
This shell action invokes a PySpark script containing my SQL.
Even though the job shows up as MR in the JobTracker, the Spark history server recognizes it as a Spark action and the output is produced.
shell file:
#!/bin/bash
export PYTHONPATH=`pwd`
spark-submit --master local testabc.py
python file:
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Create a Spark context and a Hive-aware SQL context
sc = SparkContext()
sqlContext = HiveContext(sc)

# Run the Hive SQL; the INSERT itself returns an empty DataFrame
result = sqlContext.sql("insert into table testing_oozie.table2 select * from testing_oozie.table1")
result.show()
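One note on the spark-submit line in the shell file: --master local runs everything on the single node where the shell action lands. If Spark on YARN is configured on the cluster, submitting there instead is a possible variation:
spark-submit --master yarn --deploy-mode cluster testabc.py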
I'm trying to use a message-driven bean in my webapp, but every time it throws this exception:
com.sun.messaging.jmq.jmsserver.util.BrokerException: [B4122]: Can not add message 1-127.0.1.1(b0:1a:c1:66:46:a9)-1-1336769823653 to destination PhysicalQueue [Queue]. The message size of 24968685 bytes is larger than the destination individual message byte limit (maxBytesPerMsg) of 10485760 bytes.
After some research, I found that the default limit is -1, which should mean unlimited.
I've looked everywhere in GlassFish's admin console without finding a way to remove this limit.
Even the "new JMS resource" wizard doesn't ask anything about this parameter.
Is there any way to fix it?
Why is your message so large? You might want to reconsider how you're doing this.
You can update it via the imqcmd command. The value you want to change is maxBytesPerMsg.
From the Sun GlassFish Message Queue 4.4 Administration Guide (or the 4.2 guide):
Updating Physical Destination Properties
The subcommand imqcmd update dst changes the values of specified properties of a physical
destination:
imqcmd update dst -t destType -n destName
-o property1=value1 [ [-o property2=value2] ... ]
The properties to be updated can include any of those listed in Table 18–1 (with the exception of the isLocalOnly property, which cannot be changed once the destination has been created).
For example, the following command changes the maxBytesPerMsg property of the queue
destination curlyQueue to 1000 and the maxNumMsgs property to 2000:
imqcmd update dst -t q -n curlyQueue -u admin \
  -o maxBytesPerMsg=1000 \
  -o maxNumMsgs=2000
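Applied to the error in the question, where the destination is the queue PhysicalQueue, a command along these lines should raise the limit, or remove it entirely with -1 (the admin user is an assumption; use your broker's credentials):
imqcmd update dst -t q -n PhysicalQueue -u admin -o maxBytesPerMsg=-1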