Hive beeline: use ACID transaction manager

I'm trying to run this command using beeline.
create table <table_1> like <table_2>
but it appears my Hive is configured to run in ACID mode, so this query fails with:
Error: Error while compiling statement: FAILED: SemanticException
[Error 10265]: This command is not allowed on an ACID table
with a non-ACID transaction manager. Failed command: create table
like (state=42000,code=10265)
What's the correct syntax to run a beeline query using the ACID transaction manager, without changing any global configuration?
My beeline command is:
beeline -u <jdbc_con> -e "create table <table_1> like <table_2>";
I suppose I should use something like
hive>set hive.support.concurrency = true;
hive>set hive.enforce.bucketing = true;
hive>set hive.exec.dynamic.partition.mode = nonstrict;
hive>set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive>set hive.compactor.initiator.on = true;
hive>set hive.compactor.worker.threads = <a positive number>; -- on at least one instance of the Thrift metastore service
But how should I include these settings in beeline?
When I tried
beeline -u $jdbc_con -e "set hive.support.concurrency = true; create table <table_1>_test like <table_2>";
It seems it's not possible to change these parameters this way.
Error: Error while processing statement: Cannot modify
hive.support.concurrency at runtime. It is not in list of params that
are allowed to be modified at runtime (state=42000,code=1)
Thank you for any help.

You can set Hive properties and run a Hive query from beeline as below:
beeline -u $jdbc_con \
--hiveconf "hive.support.concurrency=true" \
--hiveconf "hive.enforce.bucketing=true" \
-e "create table <table_1>_test like <table_2>"
Hope this is helpful.

Related

Option to disable Hive Compression in Insert Overwrite

I want to do an INSERT OVERWRITE to an HDFS folder as CSV/textfile.
In hive-site.xml, hive.exec.compress.output is set to true.
I cannot do a set hive.exec.compress.output=false, as the code is being executed in a custom-built framework.
Is there an option to turn off Hive compression, like an attribute of the INSERT OVERWRITE statement?
If you cannot modify properties in hive-site.xml, one option is to set them from the Hive CLI or beeline, but that applies only to the current session; if you close the session and start a new one the next day, you will have to set them again.
As an example:
Log in to the Hive CLI or beeline:
$ hive
to see the value of the property:
hive> SET hive.execution.engine;
to overwrite its value for the current session:
hive> SET hive.execution.engine=tez;
or in your case:
hive> SET hive.exec.compress.output;
hive> SET hive.exec.compress.output=false;
Other commands that can be useful from the Linux shell are:
$ hive -e "SET" > hive_properties
to write a file with all hive properties, or
$ hive -e "SET;" | grep compress
to see a group of hive properties from the console
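If the framework launches Hive or beeline itself and you can influence that command line, the property can also be passed per invocation, so no SET statement is needed inside the SQL. A sketch with beeline; the JDBC URL, output directory, and table name are placeholders, and ROW FORMAT on INSERT OVERWRITE DIRECTORY needs Hive 0.11 or later:
beeline -u <jdbc_con> \
--hiveconf "hive.exec.compress.output=false" \
-e "INSERT OVERWRITE DIRECTORY '/tmp/csv_out' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM my_table;"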

How to suppress Hive warnings

I am new to Hive. I am trying to execute one query which outputs data to one file.
Below is my query:
hive -e "SET hive.auto.convert.join=false;set
hive.server2.logging.operation.level=NONE;SET mapreduce.map.memory.mb
= 16384; SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';SET mapreduce.reduce.memory.mb = 13107; SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true
-Xmx16384M';set hive.support.concurrency = false; SET hive.exec.dynamic.partition=true;SET
hive.exec.dynamic.partition.mode=nonstrict; SET
hive.exec.max.dynamic.partitions.pernode=10000;SET
hive.exec.max.dynamic.partitions=100000; SET
hive.exec.max.created.files=1000000;SET
mapreduce.input.fileinputformat.split.maxsize=128000000; SET
hive.hadoop.supports.splittable.combineinputformat=true;set
hive.execution.engine=mr; set hive.enforce.bucketing = true;hive query
over here;" > /tmp/analysis
But in the /tmp/analysis file I can see warnings as well, like the ones below.
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
How can I suppress them?
From the Hive docs, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli:
Logging:
Hive uses log4j for logging. By default logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0. Starting with Hive 0.13.0, the default logging level is INFO. By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to the standard error (instead of the log file).
If the user wishes, the logs can be emitted to the console by adding the arguments shown below:
bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,console
Alternatively, the user can change the logging level only by using:
bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA
Another option for logging is TimeBasedRollingPolicy (applicable for Hive 1.1.0 and above, HIVE-9001) by providing DAILY option as shown below:
bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY
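Applied to the command in the question, a sketch would raise the logging threshold so that WARN messages are filtered; this is an assumption on my part, since whether it catches these particular SLF4J messages depends on when during startup they are emitted:
$ hive --hiveconf hive.root.logger=ERROR,console -e "hive query over here;" > /tmp/analysis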
Hope it helps!
Use Hive silent mode, which doesn't print any logs in the output:
hive -S -e "SET hive.auto.convert.join=false;set hive.server2.logging.operation.level=NONE;SET mapreduce.map.memory.mb = 16384; SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';SET mapreduce.reduce.memory.mb = 13107; SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx16384M';set hive.support.concurrency = false; SET hive.exec.dynamic.partition=true;SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.max.dynamic.partitions.pernode=10000;SET hive.exec.max.dynamic.partitions=100000; SET hive.exec.max.created.files=1000000;SET mapreduce.input.fileinputformat.split.maxsize=128000000; SET hive.hadoop.supports.splittable.combineinputformat=true;set hive.execution.engine=mr; set hive.enforce.bucketing = true;hive query over here;" > /tmp/analysis

Set search_path with file include in postgres psql command

How can I include multiple search paths in a psql command, so that multiple files can be run with different search_paths but all be run in one transaction?
psql \
--single-transaction \
--command="set search_path = 'a'; \i /sqlfile/a.sql; set search_path = 'b'; \i /sqlfile/b.sql;"
When I run this I get a syntax error at \i. I need to have the files included separately, and they're generated dynamically, so I'd rather run them with --command than generate a wrapper file and use --file, if possible.
The manual about the --command option:
command must be either a command string that is completely parsable by
the server (i.e., it contains no psql-specific features), or a single
backslash command. Thus you cannot mix SQL and psql meta-commands
within a -c option. To achieve that, you could use repeated -c options
or pipe the string into psql [...]
Try:
psql --single-transaction -c 'set search_path = a' -c '\i /sqlfile/a.sql' -c 'set search_path = b' -c '\i /sqlfile/b.sql'
Or use a here-document:
psql --single-transaction <<EOF
set search_path = a;
\i /sqlfile/a.sql
set search_path = b;
\i /sqlfile/b.sql
EOF
The search_path needs no quotes, btw.
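Note that support for repeated -c (and -f) options was added in psql 9.6; on older clients, use the here-document above, or pipe the string into psql as the manual suggests. A sketch of the pipe variant:
printf '%s\n' 'set search_path = a;' '\i /sqlfile/a.sql' 'set search_path = b;' '\i /sqlfile/b.sql' | psql --single-transaction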

gcloud.dataproc.jobs.submit.hive The property [proxy.port] must have an integer value

I have a Hive table created on Google Cloud Dataproc. While executing the below query I am getting an exception like this:
gcloud dataproc jobs submit hive --cluster mycluster \
-e "select * from table limit 10;"
ERROR: (gcloud.dataproc.jobs.submit.hive)
The property [proxy.port] must have an integer value: []
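The error comes from the gcloud client itself rather than from Hive: its proxy/port configuration property appears to be set to an empty string. A hedged sketch for inspecting and clearing it with standard gcloud config commands (whether this resolves the job submission is an assumption):
gcloud config get-value proxy/port   # inspect the current (empty) value
gcloud config unset proxy/port       # remove the empty setting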

PostgreSQL - Automate schema and table creation - powershell

I am trying to automate the creation of schemas and some tables in the newly created schema, using a PowerShell script. I have been able to create the schema; however, I cannot create the tables in that schema.
I am passing the new schema to be created as a variable to PowerShell.
Script so far (based on the solution from the following answer: StackOverflow Solution):
$MySchema=$args[0]
$CreateSchema = 'CREATE SCHEMA \"'+$MySchema+'\"; set schema '''+$MySchema+''';'
write-host $CreateSchema
C:\PostgreSQL\9.3\bin\psql.exe -h $DBSERVER -U $DBUSER -d $DBName -w -c $CreateSchema
# To create tables
C:\PostgreSQL\9.3\bin\psql.exe -h $DBSERVER -U $DBUSER -d $DBName -w -f 'E:\automation\scripts\create-tables.sql' -v schema=$MySchema
On execution, I see the following error:
psql:E:/automation/scripts/create-tables.sql:11: ERROR: no schema has been selected to create in
The content of create-tables.sql is:
SET search_path TO :schema;
CREATE TABLE testing (
id SERIAL,
QueryDate varchar(255) NULL
);
You've got this in your first step:
$CreateSchema = 'CREATE SCHEMA \"'+$MySchema+'\"; set schema '''+$MySchema+''';'
Take out that set schema; it's erroneous and causes the schema not to be created. Then in the next step you wind up with an empty search_path (because the schema never got created), which is why you get that error.
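A minimal corrected sketch of the two steps, keeping the question's own quoting and paths ($DBSERVER, $DBUSER, and $DBName are assumed to be defined earlier in the script):
$MySchema = $args[0]
# Create the schema only; create-tables.sql sets search_path itself via the :schema psql variable
$CreateSchema = 'CREATE SCHEMA \"' + $MySchema + '\";'
write-host $CreateSchema
C:\PostgreSQL\9.3\bin\psql.exe -h $DBSERVER -U $DBUSER -d $DBName -w -c $CreateSchema
C:\PostgreSQL\9.3\bin\psql.exe -h $DBSERVER -U $DBUSER -d $DBName -w -f 'E:\automation\scripts\create-tables.sql' -v schema=$MySchema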