I am trying to output the results of Hive to a file (preferably Excel). I tried the methods below and none of them works as explained in most posts. I wonder if it is because I use the Hue environment. I am new to Hue and Hive; any help would be appreciated.
insert overwrite directory 'C:/Users/Microsoft/Windows/Data Assets' row format delimited fields terminated by '\n' stored as textfile select * from final_table limit 100;
INSERT OVERWRITE LOCAL DIRECTORY 'C:/Users/Microsoft/Windows/Data Assets'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
select * from final_table limit 100;
I have tried running the same query in my setup and it works fine.
In your case, it might be an issue with permissions on the 'C:/Users/Microsoft/Windows/Data Assets' folder.
Try writing to a different folder (for example, the user's home folder).
Query:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/fsimage' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n' STORED AS TEXTFILE SELECT * FROM stream_data_partitioned limit 100;
(Screenshots of the query output and the destination folder were attached.)
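If the query succeeds, Hive writes one or more delimited files into that local directory on the node where the query ran (typically not on your Windows client machine). A minimal check, assuming the /tmp/fsimage destination from the query above (the part file is usually named something like 000000_0, but the name may differ):
ls /tmp/fsimage/
head /tmp/fsimage/000000_0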
I have given the external table creation statement in Hive below:
create external table tablename (column details)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 'location in which the data gets stored'
However, after creating and loading that external table, I see that the underlying data file in the folder contains all the data on a single line, i.e. the newline setting is not working.
I am running the HiveQL through an Azure HDInsight cluster.
I have a CSV file that is zipped in S3. For unzipped files, I would use the code below. Is there an option I can add so it unzips before loading?
I'm on Hive and am using a SQL editor (DbVisualizer). I googled and saw some Unix steps, but I've never used Unix before, so I am wondering if there is another way within the SQL.
create external table abc (
email string,
value int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://path/'
TBLPROPERTIES ("skip.header.line.count"="1");
I want to implement Apache Hive and load data from a CSV file into a Hive table. Here's the problem:
my CSV file, generated by SQL Server, contains " signs in its structure, so a row looks like "04748/09","2248559","2248559","2009-12-03 00:00:00". How can I get only the values without the " signs?
Thanks a lot, I need your suggestions.
Regarding the problem mentioned in your comment: how can I ignore the first line when importing into Hive, like a MySQL import?
From Hive v0.13.0, you can use skip.header.line.count. You could also specify the same while creating the table. For example:
Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable'
tblproperties ("skip.header.line.count"="1");
or
CREATE TABLE TEMP (f1 STRING, f2 String, f3 String, f4 String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' TBLPROPERTIES("skip.header.line.count"="1");
load data local inpath 'test.csv'
overwrite into table TEMP;
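If the table already exists, the same property can also be set afterwards with ALTER TABLE, for example (reusing the TEMP table from above):
ALTER TABLE TEMP SET TBLPROPERTIES ("skip.header.line.count"="1");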
Below is the Hive table I have created:
CREATE EXTERNAL TABLE Activity (
column1 type,
column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/exttable/';
In my HDFS location /exttable, I have a lot of CSV files, and each CSV file also contains a header row. When I run select queries, the result contains the header row as well.
Is there any way in Hive to ignore the header row or first line?
You can now skip the header row in Hive 0.13.0 using:
tblproperties ("skip.header.line.count"="1");
If you are using Hive version 0.13.0 or higher, you can specify "skip.header.line.count"="1" in your table properties to remove the header.
For detailed information on the patch see: https://issues.apache.org/jira/browse/HIVE-5795
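For example, appended to a table definition (the table and column names here are just placeholders):
CREATE EXTERNAL TABLE my_table (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/path/to/data'
TBLPROPERTIES ("skip.header.line.count"="1");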
Let's say you want to load a CSV file like the one below, located at /home/test/que.csv:
1,TAP (PORTUGAL),AIRLINE
2,ANSA INTERNATIONAL,AUTO RENTAL
3,CARLTON HOTELS,HOTEL-MOTEL
Now, we need to create a location in HDFS that holds this data.
hadoop fs -put /home/test/que.csv /user/mcc
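You can verify the upload with a directory listing, e.g.:
hadoop fs -ls /user/mcc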
The next step is to create a table. There are two types to choose from (managed and external); refer to the Hive documentation for help choosing one.
Here is an example for an external table:
create external table industry_
(
MCC string ,
MCC_Name string,
MCC_Group string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/mcc/'
tblproperties ("skip.header.line.count"="1");
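Once the table is created, a quick query should return the data rows without the header line (using the table defined above):
SELECT * FROM industry_ LIMIT 3;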
Note: when the table is accessed via Spark SQL, the header row of the CSV will still be shown as a data row. Tested on Spark version 2.4.
There is not, at least not before Hive 0.13.0 (see the other answers about skip.header.line.count). However, you can pre-process your files to skip the first row before loading them into HDFS:
tail -n +2 withfirstrow.csv > withoutfirstrow.csv
Alternatively, you can build a WHERE clause into your Hive query to ignore the first row, as sketched below.
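A sketch of that WHERE-clause approach, assuming the header text of the first column is known (here the Activity table from the question, with 'column1' standing in for the actual header value and the column read as a string):
SELECT * FROM Activity WHERE column1 <> 'column1';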
If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) before putting the file into HDFS.
sed -n '2,$p' File_with_header.csv > File_with_No_header.csv
To remove the header from the CSV file in place, use:
sed -i 1d filename.csv