I am trying to create an external table in hive with the following query in HDFS.
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET TBLPROPERTIES("Parquet.compression"="SNAPPY")
LOCATION 'hdfs://.../post'
getting error
Error while compiling statement: FAILED: ParseException line 11:2
missing EOF at 'LOCATION' near ')'
What is the best way to create a HIVE external table with data stored in parquet format?
I am able to create table after removing property TBLPROPERTIES("Parquet.compression"="SNAPPY")
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET,
LOCATION 'hdfs://.../post'
I'd like to use this query to create a partitioned table in Amazon Athena:
CREATE TABLE IF NOT EXISTS
testing.partitioned_test(order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS 'PARQUET'
LOCATION 's3://testing-imcm-into/partitions'
Unfortunately I don't get the error message which tells me the following:
line 3:2: mismatched input 'partitioned' expecting {, 'with'}
The quotes around 'PARQUET' seemed to be causing a problem.
Try this:
CREATE EXTERNAL TABLE IF NOT EXISTS
partitioned_test (order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
STORED AS PARQUET
LOCATION 's3://testing-imcm-into/partitions/'
I have already an internal table in hive. Now I want to create an external table with partitions based on date to it. But it throws error, when I try to create it.
Sample code:
create external table db_1.T_DATA1 partitioned by (date string) as select * from db_2.temp
LOCATION 'file path';
Error:
ParseException line 2:0 cannot recognize input near 'LOCATION' ''file
path'' '' in table source
As per the answer provided at https://stackoverflow.com/a/26722320/4326922 you should be able to create external table with CTAS.
I'm trying to load a tab delimited file into a table in hive, and I want to skip the first row because it contains column names. I'm trying to run the code below, but I'm getting the error below. Does anyone see what the issue is?
Code:
set hive.exec.compress.output=false;
set hive.mapred.mode=nonstrict;
-- region to state mapping
DROP TABLE IF EXISTS StateRegion;
CREATE TEMPORARY TABLE StateRegion (Zip_Code int,
Place_Name string,
State string,
State_Abbreviate string,
County string,
Latitude float,
Longitude float,
ZIP_CD int,
District_NM string,
Region_NM string)
row format delimited fields terminated by '\t'
tblproperties("skip.header.line.count"="1");
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'StateRegion'
OVERWRITE INTO TABLE StateRegion;
--test Export
INSERT OVERWRITE LOCAL DIRECTORY './StateRegionTest/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from StateRegion;
Error:
FAILED: ParseException line 2:0 cannot recognize input near 'STORED' 'AS' 'TEXTFILE'
I have a very basic question which is: How can I add a very simple table to Hive. My table is saved in a text file (.txt) which is saved in HDFS. I have tried to create an external table in Hive which points out this file but when I run an SQL query (select * from table_name) I don't get any output.
Here is an example code:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
LOCATION 'hdfs:///KibTEst/Data.txt';
KibTEst/Data.txt is the path of the text file in HDFS.
The rows in the table are seperated by carriage return, and the columns are seperated by commas.
Thanks for your help!
You just need to create an external table pointing to your file
location in hdfs and with delimiter properties as below:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs:///KibTEst/Data.txt';
You need to run select query(because file is already in HDFS and external table directly fetches data from it when location is specified in create statement). So you test using below select statement:
SELECT * FROM Data;
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ‘,’
stored as textfile
LOCATION 'Your hdfs location for external table';
If data in HDFS then use :
LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE tablename
The use select * from table_name
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION '/Data';
Then load file into table
LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
Then
select * from Data;
I hope, below inputs will try to answer the question asked by #mshabeen.
There are different ways that you can use to load data in Hive table that is created as external table.
While creating the Hive external table you can either use the LOCATION option and specify the HDFS, S3 (in case of AWS) or File location, from where you want to load data OR you can use LOAD DATA INPATH option to load data from HDFS, S3 or File after creating the Hive table.
Alternatively you can also use ALTER TABLE command to load data in the Hive partitions.
Below are some details
Using LOCATION - Used while creating the Hive table. In this case data is already loaded and available in Hive table.
**LOAD DATA INPATH** option - This Hive command can be used to load data from specified location. Point to remember here is, the data will get MOVED from input path to Hive warehouse path.
Example -
LOAD DATA INPATH 'hdfs://cluster-ip/path/to/data/location/'
Using ALTER TABLE command - Mostly this is used to add data from other locations into the Hive partitions. In this case it is required that all partitions are already defined and the values for the partitions are already known. In case of dynamic partitions this command is not required.
Example -
ALTER TABLE table_name ADD PARTITION (date_col='2018-02-21') LOCATION 'hdfs/path/to/location/'
The above code will map the partition to the specified data location (in this case HDFS). However, the data will NOT MOVED to Hive internal warehouse location.
Additional details are available here