Option to disable Hive Compression in Insert Overwrite - hive

I want to do an insert overwrite to an HDFS folder as CSV/text file.
In hive-site.xml, hive.exec.compress.output is set to true.
I cannot do a set hive.exec.compress.output=false as the code is being executed in a custom-built framework.
Is there an option to turn off Hive compression, like an attribute of the insert overwrite statement?

If you cannot modify properties in hive-site.xml, one option is to set them from the Hive CLI or Beeline. The change applies only to the current session; if you close the session and start a new one the next day, you will have to do the same again.
As an example:
Log in to the Hive CLI or Beeline:
$ hive
To see the current value of a property:
hive> SET hive.execution.engine;
To override its value for the current session:
hive> SET hive.execution.engine=tez;
Or, in your case:
hive> SET hive.exec.compress.output;
hive> SET hive.exec.compress.output=false;
Other commands that can be useful from the Linux shell are:
$ hive -e "SET;" > hive_properties
to write a file with all Hive properties, or
$ hive -e "SET;" | grep compress
to see a group of Hive properties on the console.
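If the framework ultimately invokes the hive binary, another session-scoped option is to pass the property on the command line with --hiveconf, which overrides hive-site.xml for that invocation only (query.hql is a hypothetical script name):
$ hive --hiveconf hive.exec.compress.output=false -f query.hql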

Related

Where to run command for pgloader

I have installed pgloader using Windows Subsystem for Linux (WSL).
I couldn't figure out where to run the pgloader commands, for example, loading CSV data: https://pgloader.readthedocs.io/en/latest/ref/csv.html
LOAD CSV
    FROM 'GeoLiteCity-Blocks.csv' WITH ENCODING iso-646-us
        HAVING FIELDS
        (
            startIpNum, endIpNum, locId
        )
    INTO postgresql://user@localhost:54393/dbname
        TARGET TABLE geolite.blocks
        TARGET COLUMNS
        (
            iprange ip4r using (ip-range startIpNum endIpNum),
            locId
        )
    WITH truncate,
        skip header = 2,
        fields optionally enclosed by '"',
        fields escaped by backslash-quote,
        fields terminated by '\t'
    SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Whenever I run the commands in the shell, it doesn't recognize the syntax:
-bash: LOAD: command not found
You are supposed to put your commands in a command file (for example yourfile.lisp) and then execute:
pgloader yourfile.lisp
Of course, ensure pgloader is installed, or use the binary you compiled.
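As a minimal sketch of the workflow inside WSL (the directory and file name are hypothetical): save the LOAD CSV block above into a file and run pgloader on it from the bash prompt, not from inside psql:
$ cd /mnt/c/Users/you/data
$ pgloader geolite.lisp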

How to suppress Hive warnings

I am new to Hive. I am trying to execute one query which outputs data to a file.
Below is my query:
hive -e "SET hive.auto.convert.join=false; SET hive.server2.logging.operation.level=NONE;
SET mapreduce.map.memory.mb=16384; SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET mapreduce.reduce.memory.mb=13107; SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx16384M';
SET hive.support.concurrency=false; SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions.pernode=10000; SET hive.exec.max.dynamic.partitions=100000; SET hive.exec.max.created.files=1000000;
SET mapreduce.input.fileinputformat.split.maxsize=128000000; SET hive.hadoop.supports.splittable.combineinputformat=true;
SET hive.execution.engine=mr; SET hive.enforce.bucketing=true;
hive query over here;" > /tmp/analysis
But in the /tmp/analysis file I can see warnings as well, such as:
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
How can I suppress them?
From the Hive doc https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli:
Logging:
Hive uses log4j for logging. By default logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0. Starting with Hive 0.13.0, the default logging level is INFO. By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to the standard error (instead of the log file).
If the user wishes, the logs can be emitted to the console by adding the arguments shown below:
bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,console
Alternatively, the user can change the logging level only by using:
bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA
Another option for logging is TimeBasedRollingPolicy (applicable for Hive 1.1.0 and above, HIVE-9001) by providing DAILY option as shown below:
bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY
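Since the goal here is fewer messages rather than more, the same mechanism can raise the threshold instead; a minimal sketch, using a standard log4j level name, that keeps only errors on the console:
bin/hive --hiveconf hive.root.logger=ERROR,console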
Hope it helps!
Use Hive silent mode, which doesn't print any logs in the output:
hive -S -e "SET hive.auto.convert.join=false; SET hive.server2.logging.operation.level=NONE;
SET mapreduce.map.memory.mb=16384; SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET mapreduce.reduce.memory.mb=13107; SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx16384M';
SET hive.support.concurrency=false; SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions.pernode=10000; SET hive.exec.max.dynamic.partitions=100000; SET hive.exec.max.created.files=1000000;
SET mapreduce.input.fileinputformat.split.maxsize=128000000; SET hive.hadoop.supports.splittable.combineinputformat=true;
SET hive.execution.engine=mr; SET hive.enforce.bucketing=true;
hive query over here;" > /tmp/analysis
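If warnings still reach the file, note that SLF4J messages like these are typically written to stderr; a minimal variant, assuming a POSIX shell, discards that stream separately:
hive -S -e "...same query as above..." > /tmp/analysis 2>/dev/null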

How To Send Output To Terminal Window with Hive Script

I am familiar with storing the output/results of a Hive query to a file, but what command do I use in the script to display the results of the HQL on the terminal?
Normally Hive prints results to stdout; if not redirected, they display on the console. You do not need any special command for this.
If you want to display results on the console screen and at the same time store them in a file, use the tee command:
hive -e "use mydb; select * from test_t" | tee ./results.txt
OK
123 {"value(B)":"Bye"}
123 {"value(G)":"Jet"}
Time taken: 1.322 seconds, Fetched: 2 row(s)
Check that the file contains the results:
cat ./results.txt
123 {"value(B)":"Bye"}
123 {"value(G)":"Jet"}
See here: https://ru.wikipedia.org/wiki/Tee
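If you rerun the query and want to keep earlier results in the same file, tee -a appends instead of overwriting:
hive -e "use mydb; select * from test_t" | tee -a ./results.txt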
In my case there was initially no output, because I had yet to properly use the LOAD DATA INPATH command to load data into HDFS. After loading, I received output from the SELECT statement in the script.

How to create a daily dump in MySQL?

I want to make a daily dump of all the databases in MySQL using the Event Scheduler. So far I have this query to create the event:
DELIMITER $$
CREATE EVENT `DailyBackup`
ON SCHEDULE EVERY 1 DAY STARTS '2015-11-09 00:00:01'
ON COMPLETION NOT PRESERVE ENABLE
DO
BEGIN
mysqldump -user=MYUSER -password=MYPASS all-databases > CONCAT('C:\Users\User\Documents\dumps\Dump',DATE_FORMAT(NOW(),%Y %m %d)).sql
END $$
DELIMITER ;
The problem is that MySQL does not seem to recognize the command 'mysqldump' and shows me an error like this: Syntax error: missing 'colon'.
I am not an expert in SQL and I've tried to find a solution, but I couldn't; I hope someone can help me with this.
Edit:
Help to make this statement a cron task
For Windows: mysqldump is an external command-line program, not an SQL statement, so it cannot run inside an event. Instead, create a .bat file with the needed command, and then create a scheduled task that runs that .bat file according to a schedule.
Create a .bat file in this fashion, replacing your username, password, and database name as appropriate:
mysqldump --opt --host=localhost --user=root --password=yourpassword dbname > C:\some_folder\some_file.sql
Then go to the Start menu > Control Panel > Administrative Tools > Task Scheduler. Hit Action > Create Task. Go to the Actions tab, hit New, browse to the .bat file and add it to the task. Then go to the Triggers tab, hit New, and define your daily schedule. Refer to http://windows.microsoft.com/en-US/windows/schedule-task
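If you prefer the command line to the GUI, the same scheduled task can be registered with schtasks (the task name and .bat path here are hypothetical):
schtasks /create /tn "DailyMySQLDump" /tr "C:\some_folder\backup.bat" /sc daily /st 00:00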
You might want to use a tool like 7-Zip to compress your backups in the same command (7-Zip can be invoked from the command line). An example with 7-Zip installed would look like:
mysqldump --opt --host=localhost --user=root --password=yourpassword dbname | 7z a -si C:\some_folder\some_file.7z
I use this to include the date and time in the filename:
set _my_datetime=%date:~-4%_%date:~4,2%_%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%_%time:~9,2%_
set _my_datetime=%_my_datetime: =_%
set _my_datetime=%_my_datetime::=%
set _my_datetime=%_my_datetime:/=_%
set _my_datetime=%_my_datetime:.=_%
echo %_my_datetime%
mysqldump --opt --host=localhost --user=root --password=yourpassword dbname | 7z a -si C:\some_folder\backup_with_datetime_%_my_datetime%_dbname.7z
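Keep in mind that slicing %date% and %time% like this depends on the Windows locale settings. A locale-independent sketch for a .bat file, assuming the wmic utility is available:
for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /value') do set _dt=%%I
set _my_datetime=%_dt:~0,8%_%_dt:~8,6%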
@Drew means to use a cron job. To add a cron job, just open the crontab using this command:
crontab -e
then add a new entry at the end like this:
0 0 * * * mysqldump -u username -ppassword databasename > /path/to/file.sql
This will perform a database dump every day at 00:00.
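If you also want dated, compressed files, a variant along these lines works; note that literal % characters must be escaped as \% inside a crontab entry:
0 0 * * * mysqldump -u username -ppassword databasename | gzip > /path/to/file_$(date +\%F).sql.gz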
Yes, program the scheduler to run something like this:
C:/path/to/mysqldump.exe -u username -ppassword databasename > /path/to/file.sql

Run OS command and set output to Hive variable

Is it possible to run something like this in Hive CLI?
I am trying to pass file contents as a variable to another query.
set column_list=!cat /home/user/filename.lst ;
create table tabname as select $column_list from ...
If you have a query file, you can pass the variables as hiveconf:
hive --hiveconf var1=abcd -f file.txt
Or you can construct your query and then pass it to the Hive CLI using -e:
hive -e "create table ..."
The file filename.lst contains a single line:
line
Make a file test.sh:
temp=$(cat /home/user/filename.lst)
hive -f test.hql --hiveconf var="$temp"
Make another file, test.hql:
create table test(${hiveconf:var} string);
Then, on the terminal:
sh -x test.sh
It will pass the line to test.hql, which will create a table with line as the column name.
Note: all files should be in the same directory. This script passes only one variable.