How to retain history for the hive command line interpreter - hive

It seems that each time I restart the hive command line (CLI), the history file ~/.hivehistory is overwritten. What can be done to retain the history for the hive interpreter between sessions?

I had multiple hive sessions open concurrently. It would be nice to know if there were a way for them to append their results to the .hivehistory - instead of overwriting each other.

I use Hive 3.1.2, and .hivehistory was not overwritten.

Related

How to load data into BigQuery from command line with WRITE_EMPTY flag?

I'm loading CSV data into BigQuery from the command line. I would like to prevent the operation from occurring if the table exists already. I do not want to truncate the table if it exists, and I do not want to append to it.
It seems that there is no command line option for this:
However, I feel like I might be missing something. Is this truly an option that is impossible to use from the command line interface?
A possible workaround is to use bq cp, as follows:
Upload your data to a side table, truncating the data on each upload:
bq --location=US load --autodetect --source_format=CSV dataset.table ./dataRaw.csv
Copy the data to your target table using bq cp, which supports a no-clobber flag (-n):
bq --location=US cp -n dataset.dataRaw dataset.tableNotToOverWrite
If the table exists you get the following error:
Table 'project:dataset.table' already exists, skipping
I think you are right that the CLI doesn't currently support WRITE_EMPTY mode.
You may file a feature request to get it prioritized.

Hive query - INSERT OVERWRITE LOCAL DIRECTORY creates multiple files for a single table

I do the following from a hive table myTable.
INSERT OVERWRITE LOCAL DIRECTORY '/myDir/out' SELECT concat_ws('',NAME,PRODUCT,PRC,field1,field2,field3,field4,field5) FROM myTable;
So, this command generates 2 files 000000_0 and 000001_0 inside the folder out/.
But, I need the contents as a single file. What should I do?
There are multiple files in the directory because every reducer writes one file. If you really need the contents as a single file, run your MapReduce job with only 1 reducer, which will write to a single file.
However, depending on your data size, running a single reducer might not be a good approach.
Edit: Instead of forcing Hive to run 1 reduce task and output a single file, it would be better to use hadoop fs operations to merge the outputs into a single file.
For example:
hadoop fs -text /myDir/out/* | hadoop fs -put - /myDir/out.txt
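Since INSERT OVERWRITE LOCAL DIRECTORY writes its part files to the local file system, ordinary shell tools can probably do the same merge. A minimal sketch, assuming the part files sit directly in the output directory; the function name merge_parts and both arguments are illustrative, not part of Hive or Hadoop:

```shell
# Merge Hive part files (000000_0, 000001_0, ...) from a local output
# directory into a single file. merge_parts is a hypothetical helper name.
merge_parts() {
  out_dir="$1"   # local directory written by INSERT OVERWRITE LOCAL DIRECTORY
  merged="$2"    # target file; keep it OUTSIDE out_dir so it is not re-read
  # The shell expands * in sorted order and skips hidden files,
  # so the parts are concatenated in file-name order.
  cat "$out_dir"/* > "$merged"
}
```

For the directory in the question, this could be called as merge_parts /myDir/out /myDir/out.txt.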
A bit late to the game, but I found that using LIMIT large_number, where large_number is bigger than the number of rows your query returns, forces Hive to use a reducer. For example:
set mapred.reduce.tasks=1; INSERT OVERWRITE LOCAL DIRECTORY '/myDir/out' SELECT * FROM table_name LIMIT 1000000000
Worked flawlessly.
CLUSTER BY will also do the job.

PostgreSQL and queue commands

I would like to know if there is a way to queue my queries. I am doing some basic text matching in psql and each query (which is saved in a different script) takes about 6 hours to run. I was wondering if there is a way to queue my scripts?
For example:
my database is called "data"
my scripts are called cancer, heart, death
and I am doing the following:
data; \i cancer;
data; \i heart;
data; \i death;
But I have to come back every so often and check whether it is still running, which doesn't seem very efficient.
I am new to PostgreSQL, so I appreciate any help.
This is the easiest/fastest solution I can think of, and it should work for your case ;)
When using psql from command line, you can start it with
-f filename
where filename is a SQL script. It will run the queries and send the output to stdout, which you can also redirect to a file. Just put your queries into that SQL file and you have your own queuing.
Assuming you run Linux, you could use screen as a simple way to leave your session open when logging off for the night.
The easiest solution was to create a separate SQL file which runs through the commands sequentially.
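The same queuing can also be sketched as a small shell loop that calls psql -f once per script. This is only an illustration under stated assumptions: the function name run_queue and the PSQL_BIN variable are made up here (PSQL_BIN exists only so the loop can be dry-run without a database), while -d, -f and -v ON_ERROR_STOP=1 are regular psql options.

```shell
# Run several SQL script files one after another against one database.
# run_queue and PSQL_BIN are hypothetical names used for this sketch.
run_queue() {
  db="$1"
  shift
  for script in "$@"; do
    echo "starting $script" >&2
    # ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error,
    # so the queue stops instead of running the remaining scripts.
    "${PSQL_BIN:-psql}" -v ON_ERROR_STOP=1 -d "$db" -f "$script" || return 1
  done
}
```

With the names from the question, the call would be run_queue data cancer heart death, ideally started inside screen so it keeps running after you log off.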

How can I copy data from a Hive table into the local system?

I have created a table "sample" in Hive and loaded a CSV file "sample.txt" into it.
Now I need the data from "sample" in my local /opt/zxy/sample.txt.
How can I do that?
Hortonworks' Sandbox lets you do it through its HCatalog menu. Otherwise, the syntax is
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/c' SELECT a.* FROM b
as per the Hive language manual.
Since your intention is just to copy the entire file from HDFS to your local FS, I would not suggest doing it through a Hive query, for the following reasons:
It'll start a MapReduce job, which will take more time than a normal copy.
It'll create file(s) with different names (000000_0, 000001_0 and so on), which you will have to rename manually afterwards.
You might have trouble opening these files, as they have no extension. Your OS would be unable to choose an application to open them on its own. In that case you either have to rename the file or manually select an application to open it.
To avoid these problems you could use the HDFS get command:
bin/hadoop fs -get /user/hive/warehouse/sample/sample.txt /opt/zxy/sample.txt
Simple n easy. But if you need to copy some selected data, then you have to use a Hive query.
HTH
I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select * from sample' > /opt/zxy/sample.txt
Hope that helps.
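One thing to watch with this approach: the Hive CLI normally prints columns separated by tabs, so the redirected file is tab-separated rather than comma-separated. A naive sketch of a converter, assuming no field itself contains a tab or a comma (the function name tsv_to_csv is made up for this example):

```shell
# Turn tab-separated text on stdin into comma-separated text on stdout.
# Naive on purpose: it does not quote fields that contain commas.
tsv_to_csv() {
  tr '\t' ','
}
```

It would sit between the query and the redirect, e.g. hive -e 'select * from sample' | tsv_to_csv > /opt/zxy/sample.txt.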
Readers who access Hive from Windows can check out this script on GitHub.
It's a Python + paramiko script that extracts Hive data to the local Windows file system.

Loading multiple files

The following is working as expected.
./bq --nosync load -F '^' --max_bad_record=30000 myvserv.xa one.txt ip:string,cb:string,country:string,telco_name:string, ...
1) But how do I send two CSV files, one.txt and two.txt, in the same command?
2) Can I not cat the files and then pipe (|) them to the bq command?
3) What does nosync mean?
Unfortunately, you can't (yet) upload two files with the same command; you'll have to run bq twice. (If you're loading data from Google Cloud Storage, though, you can specify multiple gs:// URLs separated by commas.)
Nope, bq doesn't (yet) support reading upload data from stdin, though that's a great idea for a future version.
If you just run "bq load", bq will create a load job on the server and then poll for completion. If you specify the --nosync flag, it will just create the load job and then exit without polling. (If desired, you can poll for completion separately using "bq wait".)
For 1), as Jeremy mentioned, you can't import two local files at once in the same command. However, you can start two parallel loads to the same table -- loads are atomic, and append by default, so this should do what you want and may be faster than importing both in a single job since the uploads will happen in parallel.
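The parallel-load idea can be sketched in shell by starting one background job per file and waiting for all of them. This is an assumption-laden illustration: load_one and parallel_load are hypothetical names, the bq flags are copied from the command in the question, and SCHEMA stands for the schema string that the question elides.

```shell
# Schema string from the original command; left for the reader to supply.
SCHEMA="${SCHEMA:-}"

# Hypothetical wrapper around the load command shown in the question.
load_one() {
  bq load -F '^' --max_bad_record=30000 myvserv.xa "$1" "$SCHEMA"
}

# Start one load per file in the background, then wait for every job.
# Returns non-zero if any of the loads failed.
parallel_load() {
  pids=""
  for f in "$@"; do
    load_one "$f" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For the two files in the question, this would be run as parallel_load one.txt two.txt.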