Hive - query output to file (csv/excel)

I am trying to output the results of a Hive query to a file (preferably Excel). I tried the methods below and none of them work as explained in most posts. I wonder if it is because I use the Hue environment. I am new to Hue and Hive; any help would be appreciated.
insert overwrite directory 'C:/Users/Microsoft/Windows/Data Assets' row format delimited fields terminated by '\n' stored as textfile select * from final_table limit 100;
INSERT OVERWRITE LOCAL DIRECTORY 'C:/Users/Microsoft/Windows/Data Assets'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
select * from final_table limit 100;

I have tried running the same query in my setup and it works fine.
In your case, it might be an issue with permissions on the 'C:/Users/Microsoft/Windows/Data Assets' folder.
Try writing to a different folder (for example, the user's home folder).
Query:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/fsimage' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n' STORED AS TEXTFILE SELECT * FROM stream_data_partitioned limit 100;
(Screenshots attached: query output and the destination folder.)
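If you also want the export to open cleanly as CSV/Excel, here is a minimal sketch of the same approach, assuming the final_table name from the question and a hypothetical writable path /home/youruser on the machine running Hive:
-- comma-delimited output so the file can be renamed to .csv and opened in Excel
-- '/home/youruser/final_table_export' is a placeholder path
INSERT OVERWRITE LOCAL DIRECTORY '/home/youruser/final_table_export'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
SELECT * FROM final_table LIMIT 100;
The resulting 000000_0 file is plain comma-separated text.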

Related

insert into hive external table as select and ensure it generates single file in table directory

My question is somewhat similar to the post below. I want to download some data from a Hive table using a select query, but because the data is large, I want to write it as an external table at a given path so that I can create a CSV file. I use the code below:
create external table output(col1 STRING, col2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{outdir}/output';

INSERT OVERWRITE TABLE output
SELECT col1, col2 FROM atable LIMIT 1000;
This works fine and creates a file named in the 000000_0 format, which can be copied as a CSV file.
But my question is: how do I ensure that the output is always a single file? If no partition is defined, will it always be a single file? What rule does it use to split files?
I saw a few similar questions like the one below, but they discuss HDFS file access.
How to point to a single file with external table
I know the alternative below, but I use a Hive connection object to execute queries from a remote node.
hive -e ' selectsql; ' | sed 's/[\t]/,/g' > outpathwithfilename
You can set the property below before doing the overwrite:
set mapreduce.job.reduces=1;
Note: if Hive doesn't allow this parameter to be modified at runtime, whitelist it by setting the following property in hive-site.xml:
hive.security.authorization.sqlstd.confwhitelist.append=|mapreduce.job.*|mapreduce.map.*|mapreduce.reduce.*
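Putting the two together, a minimal sketch (reusing the output/atable names from the question above):
-- force a single reducer so the overwrite produces a single output file
set mapreduce.job.reduces=1;

INSERT OVERWRITE TABLE output
SELECT col1, col2 FROM atable LIMIT 1000;
With one reducer, the write into '{outdir}/output' should end up as a single 000000_0 file that can be copied out as a CSV.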

AWS Athena / Hive - remove footer from CSV file

While creating a table in AWS Athena I have a problem with removing the footer from an imported CSV file stored in AWS S3.
The query is pretty simple, but I'm stuck on the footer part.
CREATE EXTERNAL TABLE csvfile
(
col1 string,
col2 string,
col3 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION
's3://xxxxx'
TBLPROPERTIES ("skip.header.line.count"='1',"skip.footer.line.count"='1');
The header part is working perfectly fine, but the footer is not. Any idea how to do it?
The problem is that "skip.footer.line.count"='1' is not removing the last row (the summary line in the CSV file).

Hive query insertion into other directories

I have one table called balance in the warehouse directory that has the following data.
Surender,HDFC,60000,CTS
Raja,AXIS,80000,TCS
Raj,HDFC,70000,TCS
Kumar,AXIS,70000,CTS
Remya,AXIS,40000,CTS
Arun,SBI,30000,TCS
I created an internal table called balance and loaded the above file into it using:
LOAD DATA LOCAL INPATH '/home/cloudera/bal.txt' INTO TABLE balance;
Now I want all the rows of the balance table in an HDFS directory, hence I used the query below:
Insert overwrite directory '/user/cloudera/surenhive' select * from balance;
When I run this query, all the data is loaded into the above-mentioned directory in HDFS.
If I navigate to /user/cloudera/surenhive I can see the data, but there are some junk characters between the fields. Why are these junk characters appearing, and how can I remove them?
However, the query below gives me the result without any issues:
Insert overwrite local directory '/home/cloudera/surenhive' select * from balance;
Does loading the file from local and storing the output into an HDFS directory cause a problem with those junk characters?
First of all, if you've loaded the data into a hive table, then it is already in HDFS. Do "describe formatted balance" and you will see the hdfs location of the hive table; the files are there.
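For example (output trimmed; the location shown is just a typical default warehouse path and will differ on your cluster):
hive> DESCRIBE FORMATTED balance;
...
Location:  hdfs://namenode:8020/user/hive/warehouse/balance
...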
But to answer your question more specifically: the default delimiter that Hive uses is ^A. That is probably the junk character you are seeing. You can change it by specifying a different delimiter when you do the insert:
insert overwrite directory '/user/cloudera/surenhive'
row format delimited fields terminated by ','
select * from balance;
Alternatively, since it seems you're using an older version of Hive, you could do a "create-table-as-select" with the correct file format, then make the table external and drop it. This will leave you with just the files on hdfs:
create table tmp
row format delimited fields terminated by ','
location '/user/cloudera/surenhive'
as select * from balance;
alter table tmp set tblproperties('EXTERNAL'='TRUE');
drop table tmp;
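You can then check what is left on HDFS from inside the Hive CLI (000000_0 is Hive's usual output file name; adjust if your file is named differently); the second command prints the file so you can confirm the fields are now comma-separated:
hive> dfs -ls /user/cloudera/surenhive;
hive> dfs -cat /user/cloudera/surenhive/000000_0;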

Insert large amount of data efficiently with SQL

Hi, I often have to insert a lot of data into a table. For example, I would have data from Excel or a text file in the form of:
1,a
3,bsdf
4,sdkfj
5,something
129,else
Then I construct one insert statement per row and run the SQL script. I found this is slow when I have to send thousands of small packets to the server, and it also causes extra network overhead.
What's your best way of doing this?
Update: I'm using ORACLE 10g.
Use Oracle external tables.
See also e.g.
OraFaq about external tables
What Tom thinks about external tables
René Nyffenegger's notes about external tables
A simple example that should get you started
You need a file located in a server directory (get familiar with directory objects):
SQL> select directory_path from all_directories where directory_name = 'JTEST';
DIRECTORY_PATH
--------------------------------------------------------------------------------
c:\data\jtest
SQL> !cat ~/.gvfs/jtest\ on\ 192.168.xxx.xxx/exttable-1.csv
1,a
3,bsdf
4,sdkfj
5,something
129,else
Create an external table:
create table so13t (
id number(4),
data varchar2(20)
)
organization external (
type oracle_loader
default directory jtest /* jtest is an existing directory object */
access parameters (
records delimited by newline
fields terminated by ','
missing field values are null
)
location ('exttable-1.csv') /* the file located in jtest directory */
)
reject limit unlimited;
Now you can use all the powers of SQL to access the data:
SQL> select * from so13t order by data;
ID DATA
---------- ------------------------------------------------------------
1 a
3 bsdf
129 else
4 sdkfj
5 something
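Once the data is visible through the external table, loading it into a regular table is a single set-based statement. A sketch, where target_table is a hypothetical destination with matching columns:
-- direct-path insert from the external table into the real table
INSERT /*+ APPEND */ INTO target_table (id, data)
SELECT id, data FROM so13t;
COMMIT;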
I'm not sure if this works in Oracle, but in SQL Server you can use the BULK INSERT statement to upload data from a txt or CSV file.
BULK
INSERT [TableName]
FROM 'c:\FileName.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
Just make sure that the table columns correctly match what's in the txt file. For a more complicated scenario you may want to use a format file; see the following:
http://msdn.microsoft.com/en-us/library/ms178129.aspx
There are a lot of ways to speed this up.
1) Do it in a single transaction. This will speed things up by avoiding connection opening/closing (a sketch follows this list).
2) Load directly as a CSV file. If you load data as a CSV file, the "SQL" statements aren't required at all. In MySQL the "LOAD DATA INFILE" operation accomplishes this very intuitively and simply.
3) You can also simply dump the whole file as text into a table called "raw". And then let the database parse the data on its own using triggers. This is a hack, but it will simplify your application code and reduce network usage.
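For point 1, a minimal Oracle sketch; mytable(id NUMBER, data VARCHAR2(20)) is a hypothetical target matching the sample rows, and all rows go in as a single statement followed by a single commit:
-- one multi-row INSERT, one round trip, one commit
INSERT ALL
  INTO mytable (id, data) VALUES (1, 'a')
  INTO mytable (id, data) VALUES (3, 'bsdf')
  INTO mytable (id, data) VALUES (4, 'sdkfj')
  INTO mytable (id, data) VALUES (5, 'something')
  INTO mytable (id, data) VALUES (129, 'else')
SELECT * FROM dual;
COMMIT;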

Hundreds of excel lookup values to MySQL inserts online?

I have hundreds of Excel documents which contain lookup values for all the lookup tables that I am giving to my developers. Some are small and some are super huge, like world cities. I could send them the xls files and let them import them into the DB, but I prefer to send them the SQL inserts in a text file so they can just execute it and save time loading all the data.
Now, I don't have any MySQL environment set up since I don't do development, so the question is: how do I convert the various columns of lookup values on each Excel tab into insert statements? Are there any online tools that can read the xls and create SQL inserts? I don't want to do it manually; the city table alone would take me a whole week at 12 hours a day to create the inserts for all the rows.
Within Excel, save your spreadsheets as CSV (comma separated values) files. Your developers will be able to load them into MySQL directly using LOAD DATA INFILE. If they create a table with columns that match the CSV columns, then your developers can import them with the following SQL command:
LOAD DATA INFILE 'file_name.csv'
INTO TABLE tbl_name
FIELDS
TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\r\n' -- or '\n' if you are on unix or '\r' if you are on mac
IGNORE 1 LINES; -- if you want to skip over the column headings