Export Hive Query Results

I'm new to Hive and could use some tips.
I'm trying to export query results from Hive as a CSV file. When I redirect them from the CLI like this:
hive -e 'select * from table' > OutPut.txt
I get a text file that has all the records but no column headers. Does anyone have a tip for exporting the query results, with the column headers, to a CSV file?
If I run the query in Hue and then download the results as a CSV, I get a file with the column headers but no records. If anyone has a tip on how to download query results from Hue with both records and column headers, I would greatly appreciate that too.

To export the column headers, set the following in your .hiverc file:
set hive.cli.print.header=true;
To get just the headers into a file, you could try the following:
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt
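Note that the Hive CLI writes tab-separated output, so the redirected file is not yet a CSV. Below is a minimal sketch of a converter, assuming no field contains a tab or a newline. The names tsv_to_csv, my_table and output.csv are made up for illustration.

```shell
# Convert tab-separated lines on stdin to CSV on stdout.
# Fields containing a comma or a double quote are wrapped in double quotes.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force awk to rebuild the line with commas
    print
  }'
}

# Hypothetical usage (table name and output path are placeholders):
# hive -e 'set hive.cli.print.header=true; select * from my_table' | tsv_to_csv > output.csv
```

If the data can never contain commas or quotes, a plain tab-to-comma substitution would do, but quoting is the safer default.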

Getting the column headers but no data is a known issue: HUE-544
The workaround is to upgrade to Hue 3 or later, or to switch to HiveServer2 (recommended starting from CDH4.6).

Related

How to export hive query result to single local file?

I want to export a Hive query result to a single local file with a pipe delimiter.
The Hive query contains an ORDER BY clause.
I have tried the solutions below.
Solution1:
hive -e "insert overwrite local directory '/problem1/solution' row format delimited fields terminated by '|' select * from table_name order by rec_date"
This solution creates multiple files. After merging the files, the row order is lost.
Solution2:
beeline -u 'jdbc:hive2://server_ip:10000/db_name' --silent --outputformat=dsv --delimiterForDSV='|' -e 'select * from table_name order by rec_date' > /problem1/solution
This solution creates a single file, but it has 2 empty lines at the top and 2 at the bottom.
I remove the empty lines with sed, but that takes a very long time.
Is there a more efficient way to achieve this?

Try this setting to run ORDER BY on a single reducer:
set hive.optimize.sampling.orderby=false; --disable parallel ORDER BY
Or try to set the number of reducers manually:
set mapred.reduce.tasks=1;
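For the beeline route in the question, the blank lines could also be dropped inside the same pipeline instead of in a separate pass over the finished file. This is only a sketch: drop_blank_lines is a made-up helper name, and it assumes no real data row is empty or whitespace-only.

```shell
# Drop empty and whitespace-only lines from stdin.
drop_blank_lines() {
  grep -v '^[[:space:]]*$'
}

# Hypothetical usage with the beeline command from the question:
# beeline -u 'jdbc:hive2://server_ip:10000/db_name' --silent --outputformat=dsv \
#   --delimiterForDSV='|' -e 'select * from table_name order by rec_date' \
#   | drop_blank_lines > /problem1/solution
```

Filtering while the rows stream out means the result is written to disk only once.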

Exporting Hive Table Data into .csv

This question may have been asked before, and I am relatively new to Hadoop and Hive. As a test, I'm trying to export some content to see if I am doing things correctly. The code is below.
Use MY_DATABASE_NAME;
INSERT OVERWRITE LOCAL DIRECTORY '/random/directory/test'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
That is what I have so far, but it generates multiple files, and I want to combine them into a single .csv or .xls file to work on. What do I do next to accomplish this?
Thanks in advance.

You can achieve this in any of the following ways:
Use a single reducer in the query, for example with ORDER BY <col_name>
Store the output in HDFS and then use the command hdfs dfs -getmerge [-nl] <src> <localdest>
Use beeline: beeline --outputformat=csv2 -f query_file.sql > <file_name>.csv
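Since the query in the question already writes to a local directory, the part files can also be concatenated locally. Here is a rough sketch, assuming the part files sort correctly by name (for example 000000_0, 000001_0) and that the output file lives outside the source directory. merge_parts and test.csv are hypothetical names.

```shell
# Concatenate every regular file in a directory, in name order, into one file.
# The output file must not be inside the source directory.
merge_parts() {
  src_dir=$1
  out_file=$2
  : > "$out_file"                 # create or truncate the output file
  for part in "$src_dir"/*; do    # the glob expands in sorted order and skips dotfiles
    [ -f "$part" ] || continue
    cat "$part" >> "$out_file"
  done
}

# Hypothetical usage with the directory from the question:
# merge_parts /random/directory/test /random/directory/test.csv
```

Row order across the part files is only meaningful if the query itself imposes one, for example with ORDER BY.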

Just get column names from hive table

I know that you can get column names from a table via the following trick in hive:
hive> set hive.cli.print.header=true;
hive> select * from tablename;
Is it also possible to just get the column names from the table?
I dislike having to change a setting for something I only need once.
My current solution is the following:
hive> set hive.cli.print.header=true;
hive> select * from tablename;
hive> set hive.cli.print.header=false;
This seems too verbose and against the DRY-principle.

If you simply want to see the column names, this one line should provide them without changing any settings:
describe database.tablename;
If that doesn't work in your version of Hive, the code below will, but it switches your current database to the one named:
use database;
describe tablename;

You could also run show columns in $table, or see "Hive, how do I retrieve all the database's tables columns" for access to the Hive metadata.

The solution is
show columns in table_name;
This is simpler than using
describe tablename;
Thanks a lot.

Use desc tablename from the Hive CLI or beeline to get all the column names. If you want the column names in a file, run the command below from the shell.
$ hive -e 'desc dbname.tablename;' > ~/columnnames.txt
where dbname is the name of the Hive database in which your table resides.
You can find the file columnnames.txt in your home directory.
$ cd ~
$ ls
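The desc output also carries the data type and comment of each column, and for partitioned tables a trailing partition section. If only the names are wanted, a small filter can trim it down. This sketch assumes the output is tab-separated, possibly padded with spaces, and that the partition section starts with a blank line or a line beginning with #. column_names is a made-up helper name.

```shell
# Print only the first column (the column name) of desc output read from stdin.
column_names() {
  awk -F '\t' '
    /^[ \t]*$/ || /^#/ { exit }                 # stop at the partition section
    { gsub(/[ \t]+$/, "", $1); print $1 }       # trim padding, print the name
  '
}

# Hypothetical usage:
# hive -e 'desc dbname.tablename;' | column_names > ~/columnnames.txt
```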

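The best way to do this is to set the properties below. The first prints the column headers, and the second stops Hive from prefixing each header with the table name: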
set hive.cli.print.header=true;
set hive.resultset.use.unique.column.names=false;

Hive External table-CSV File- Header row

Below is the Hive table I have created:
CREATE EXTERNAL TABLE Activity (
column1 type,
column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/exttable/';
In my HDFS location /exttable, I have a lot of CSV files, and each CSV file contains a header row. When I run select queries, the result contains the header rows as well.
Is there any way in Hive to ignore the header row, that is, the first line of each file?

As of Hive 0.13.0, you can skip header lines with a table property:
tblproperties ("skip.header.line.count"="1");

If you are using Hive version 0.13.0 or higher you can specify "skip.header.line.count"="1" in your table properties to remove the header.
For detailed information on the patch see: https://issues.apache.org/jira/browse/HIVE-5795

Let's say you want to load a CSV file like the one below, located at /home/test/que.csv
1,TAP (PORTUGAL),AIRLINE
2,ANSA INTERNATIONAL,AUTO RENTAL
3,CARLTON HOTELS,HOTEL-MOTEL
Next, copy the file to an HDFS location that will hold this data.
hadoop fs -put /home/test/que.csv /user/mcc
The next step is to create a table. There are two types to choose from, managed and external.
Example for an external table:
create external table industry_
(
MCC string ,
MCC_Name string,
MCC_Group string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/mcc/'
tblproperties ("skip.header.line.count"="1");
Note: When accessed via Spark SQL, the header row of the CSV will be shown as a data row.
Tested on Spark 2.4.

There is not (in Hive versions before 0.13.0). However, you can pre-process your files to drop the first row before loading them into HDFS:
tail -n +2 withfirstrow.csv > withoutfirstrow.csv
Alternatively, you can add a WHERE clause in Hive that filters out the header row.

If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) and then put the file in HDFS.
sed -n '2,$p' File_with_header.csv > File_with_No_header.csv

To remove the header from the CSV file in place, use:
sed -i 1d filename.csv
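The question mentions many CSV files, so the per-file commands above would need a loop. Here is a minimal sketch that writes header-free copies into a separate directory and leaves the originals untouched. It assumes every file has exactly one header line; strip_headers and both directory names are hypothetical.

```shell
# Copy every *.csv file from src_dir to dest_dir without its first line.
strip_headers() {
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir"
  for f in "$src_dir"/*.csv; do
    [ -f "$f" ] || continue       # skip the unexpanded pattern if nothing matches
    tail -n +2 "$f" > "$dest_dir/$(basename "$f")"
  done
}

# Hypothetical usage, followed by an upload to the table location:
# strip_headers ./with_headers ./without_headers
# hadoop fs -put ./without_headers/*.csv /exttable/
```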

Remove header and footer export data from CSV to SQL Server

I need to export data from CSV to SQL Server
My CSV file is like this
Name,CustomerID
A,1
b,2
End
I need to export only the data into the SQL Server table
A,1
b,2
I tried BULK INSERT, but the header row gets loaded too.
I need to remove the header and footer.
Is the only option to create a bcp format file?
Any help appreciated.
Thanks

You can use the FIRSTROW and LASTROW options of BULK INSERT. If the file has 100 lines, set FIRSTROW=2 and LASTROW=99.
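The two values depend on the length of the file, so they have to be worked out per file. As a sketch, assuming exactly one header line and one footer line, they could be computed from the line count, or the file could be trimmed before loading. Both function names are made up for illustration.

```shell
# Print the FIRSTROW and LASTROW values for a file that has exactly
# one header line and one footer line.
bulk_insert_range() {
  total=$(awk 'END { print NR }' "$1")   # counts the last line even without a trailing newline
  echo "FIRSTROW=2 LASTROW=$((total - 1))"
}

# Alternative: print the file without its first and last lines.
strip_header_footer() {
  sed '1d;$d' "$1"
}
```

For the four-line sample file in the question, the first function would report FIRSTROW=2 and LASTROW=3, and the second would print only the two data rows.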
