How to create a Hive external table with Parquet format - sql

I am trying to create an external table in Hive over data in HDFS with the following query.
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET TBLPROPERTIES("Parquet.compression"="SNAPPY")
LOCATION 'hdfs://.../post'
I am getting this error:
Error while compiling statement: FAILED: ParseException line 11:2
missing EOF at 'LOCATION' near ')'
What is the best way to create a Hive external table with data stored in Parquet format?

I was able to create the table after removing the TBLPROPERTIES("Parquet.compression"="SNAPPY") clause:
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET
LOCATION 'hdfs://.../post'
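
If the compression setting is still wanted, note that Hive's CREATE TABLE grammar expects TBLPROPERTIES after LOCATION, which would explain why the parser reported "missing EOF at 'LOCATION'" in the original statement. A minimal sketch under these assumptions: the columns and the elided path are the ones from the question, the lowercase key parquet.compression is the spelling usually used for Parquet tables (check it against your Hive version), and the ROW FORMAT DELIMITED clause is left out because Parquet files are not delimited text.

```sql
-- Sketch: same table as in the question, with TBLPROPERTIES moved after LOCATION.
CREATE EXTERNAL TABLE `post` (
  FileSK STRING,
  OriginalSK STRING,
  FileStatus STRING,
  TransactionType STRING,
  TransactionDate STRING
)
STORED AS PARQUET
LOCATION 'hdfs://.../post'  -- elided path kept exactly as in the question
TBLPROPERTIES ('parquet.compression'='SNAPPY');
```

For an external table this property affects files that Hive writes into the location; files already there keep whatever compression they were written with.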

Related

FAILED: ParseException line 5:16 cannot recognize input near 'TIME' ',' 'store_name' in column type

I am getting an error when trying to create a table in a Hive database. I am accessing it through a Docker image.
The command I am using:
root#:/opt/Hadoop#hive -f test_db.sql
Below is the content of the .sql file:
create database test_db;
use test_db;
CREATE EXTERNAL TABLE purchases (
purchase_date DATE,
Purchase_time TIME,
store_name STRING,
item_name STRING,
item_cost FLOAT,
payment STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
STORED AS TEXTFILE
LOCATION '/user/input/purchases';
The error is:
FAILED: ParseException line 5:16 cannot recognize input near 'TIME' ','
'store_name' in column type
Any idea what's wrong in this?
Thanks!
Hive has no TIME data type. When I changed the TIME column to TIMESTAMP, the table was created successfully:
create database test_db;
use test_db;
CREATE EXTERNAL TABLE purchases (
purchase_date DATE,
Purchase_time TIMESTAMP,
store_name STRING,
item_name STRING,
item_cost FLOAT,
payment STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
LOCATION '/user/input/purchases';

impala CREATE EXTERNAL TABLE and remove double quotes

I have CSV data, for example:
"Female","44","0","0","Yes","Govt_job","Urban","103.59","32.7","formerly smoked"
I put it into HDFS with hdfs dfs -put, and now I want to create an external table from it in Impala (not in Hive).
Is there a way to get the values without the double quotes?
This is what I run in impala-shell:
CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext
( `gender` STRING,`age` STRING,`hypertension` STRING,`heart_disease` STRING,`ever_married` STRING,`work_type` STRING,`Residence_type` STRING,`avg_glucose_level` STRING,`bmi` STRING,`smoking_status` STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION "/user/test/tmp/test1"
Update 28.11
I managed to do it by creating the external table and then creating a VIEW AS SELECT that applies CASE WHEN and concat() to each column.
Impala uses the Hive metastore, so anything created in Hive is available from Impala after issuing INVALIDATE METADATA dbname.tablename. However, to remove the quotes you need the Hive SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde', and that is not accessible from Impala. My suggestion would be to do the following:
Create the external table in Hive
CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext
( gender STRING, age STRING, hypertension STRING, heart_disease STRING, ever_married STRING, work_type STRING, Residence_type STRING, avg_glucose_level STRING, bmi STRING, smoking_status STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES
(
"separatorChar" = ",",
"quoteChar" = "\""
)
STORED AS TEXTFILE
LOCATION "/user/test/tmp/test1"
Create a managed table in Hive using CTAS
CREATE TABLE mytable AS SELECT * FROM test_test.test1_ext;
Make it available in Impala
INVALIDATE METADATA db.mytable;
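
For comparison, the view-based workaround mentioned in the question's update can stay entirely in Impala. The following is only a sketch: it assumes the plain comma-delimited table test_test.test1_ext created from impala-shell in the question, the view name test1_view is hypothetical, and it uses Impala's regexp_replace() rather than the CASE WHEN / concat() combination the asker described.

```sql
-- Impala sketch: strip one leading and one trailing double quote from each column.
-- test1_view is a hypothetical name; test1_ext is the delimited table from the question.
CREATE VIEW IF NOT EXISTS test_test.test1_view AS
SELECT
  regexp_replace(gender, '^"|"$', '')            AS gender,
  regexp_replace(age, '^"|"$', '')               AS age,
  regexp_replace(hypertension, '^"|"$', '')      AS hypertension,
  regexp_replace(heart_disease, '^"|"$', '')     AS heart_disease,
  regexp_replace(ever_married, '^"|"$', '')      AS ever_married,
  regexp_replace(work_type, '^"|"$', '')         AS work_type,
  regexp_replace(residence_type, '^"|"$', '')    AS residence_type,
  regexp_replace(avg_glucose_level, '^"|"$', '') AS avg_glucose_level,
  regexp_replace(bmi, '^"|"$', '')               AS bmi,
  regexp_replace(smoking_status, '^"|"$', '')    AS smoking_status
FROM test_test.test1_ext;
```

This only removes the quote characters. A value that contains a comma inside its quotes is still split by the field delimiter, which is the case the OpenCSVSerde route handles.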

Amazon Athena returning "mismatched input 'partitioned' expecting {<EOF>, 'with'}" error when creating partitions

I'd like to use this query to create a partitioned table in Amazon Athena:
CREATE TABLE IF NOT EXISTS
testing.partitioned_test(order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS 'PARQUET'
LOCATION 's3://testing-imcm-into/partitions'
Unfortunately, I get the following error message:
line 3:2: mismatched input 'partitioned' expecting {<EOF>, 'with'}
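The quotes around 'PARQUET' seemed to be causing a problem. Note also that Athena expects CREATE EXTERNAL TABLE for Hive-style table definitions like this one; a plain CREATE TABLE is parsed as a CTAS statement, which is why the parser asks for 'with'.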
Try this:
CREATE EXTERNAL TABLE IF NOT EXISTS
partitioned_test (order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
STORED AS PARQUET
LOCATION 's3://testing-imcm-into/partitions/'
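
Once the table exists, its partitions still have to be registered before queries return rows. A hedged sketch of the two usual ways to do that: the partition value 2019 and the year=2019/ sub-path are hypothetical and only illustrate the shape of the statements.

```sql
-- Option 1: data laid out Hive-style (.../partitions/year=<value>/).
-- Scans the table location and registers every partition it finds.
MSCK REPAIR TABLE partitioned_test;

-- Option 2: register a single partition explicitly.
-- The value 2019 and the sub-path below are hypothetical.
ALTER TABLE partitioned_test ADD IF NOT EXISTS
  PARTITION (year = 2019)
  LOCATION 's3://testing-imcm-into/partitions/year=2019/';
```

Option 1 only discovers directories named in the year=<value> form; any other layout needs Option 2 for each partition.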

error loading csv into hive table

I'm trying to load a tab delimited file into a table in hive, and I want to skip the first row because it contains column names. I'm trying to run the code below, but I'm getting the error below. Does anyone see what the issue is?
Code:
set hive.exec.compress.output=false;
set hive.mapred.mode=nonstrict;
-- region to state mapping
DROP TABLE IF EXISTS StateRegion;
CREATE TEMPORARY TABLE StateRegion (Zip_Code int,
Place_Name string,
State string,
State_Abbreviate string,
County string,
Latitude float,
Longitude float,
ZIP_CD int,
District_NM string,
Region_NM string)
row format delimited fields terminated by '\t'
tblproperties("skip.header.line.count"="1");
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'StateRegion'
OVERWRITE INTO TABLE StateRegion;
--test Export
INSERT OVERWRITE LOCAL DIRECTORY './StateRegionTest/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from StateRegion;
Error:
FAILED: ParseException line 2:0 cannot recognize input near 'STORED' 'AS' 'TEXTFILE'
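
The parser most likely stops at STORED because the semicolon after tblproperties(...) already ends the CREATE statement, so STORED AS TEXTFILE; is read as a separate and invalid statement. A sketch of a probable fix, assuming the rest of the script stays as it is: put STORED AS TEXTFILE before TBLPROPERTIES and terminate the statement only once.

```sql
-- Sketch: clause order is ROW FORMAT, then STORED AS, then TBLPROPERTIES, with one semicolon.
DROP TABLE IF EXISTS StateRegion;
CREATE TEMPORARY TABLE StateRegion (
  Zip_Code int,
  Place_Name string,
  State string,
  State_Abbreviate string,
  County string,
  Latitude float,
  Longitude float,
  ZIP_CD int,
  District_NM string,
  Region_NM string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
```

Whether the header row is then skipped on read depends on the Hive version and execution engine, so it is worth checking with a quick SELECT after the load.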

Impala can not drop external table

I created an external table with a wrong (non-existent) path:
create external table IF NOT EXISTS ds_user_id_csv
(
type string,
imei string,
imsi string,
idfa string,
msisdn string,
mac string
)
PARTITIONED BY(prov string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored as textfile
LOCATION 'hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id';
And now I cannot drop the table:
[cdh1:21000] > drop table ds_user_id_csv
> ;
Query: drop table ds_user_id_csv
ERROR:
ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore:
CAUSED BY: MetaException: java.lang.IllegalArgumentException: Wrong FS: hdfs://cdh0:8020/user/hive/warehouse/test.db/ds_user_id, expected: hdfs://nameservice1
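So how can I solve this? Thank you.
Use the following command to change the location to one on the filesystem the metastore expects (hdfs://nameservice1 in the error above), then drop the table again:
ALTER TABLE ds_user_id_csv SET LOCATION '{new location}';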
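
Put together, the sequence might look like the sketch below. The new location is an assumption: it reuses the original path but addresses it through the nameservice named in the error message, so adjust it to whatever is valid on your cluster.

```sql
-- Assumption: same path as before, addressed via the nameservice from the error message.
ALTER TABLE ds_user_id_csv
  SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/test.db/ds_user_id';

-- With a location on the expected filesystem, the drop should get past the "Wrong FS" check.
DROP TABLE ds_user_id_csv;
```

If Impala also rejects the ALTER, running the same two statements from Hive is worth trying, since both engines share the metastore.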